List Of User Agents For Scraping

Vilius Dumcius

When you start scraping the web, you’ll notice that scripts are sometimes blocked seemingly without reason. Somehow, the website knows you’re not using a real browser, and sensing your intentions, it blocks your access.

There’s an easy way to tackle this—by changing your user agent header. Keep reading to find out what a user agent is, which are the most common user agents, and how you can adjust your code to mimic the user agent header of a browser.

What Is a User Agent, and Why Is It Important for Web Scraping?

When you send a request to a server through a browser or an HTTP client like the Requests library in Python, the request includes HTTP headers, which contain all kinds of information about this request.

Among other things, they include a user agent header: a string that lets the server identify what kind of application the request comes from.

To see what this looks like, you can use the following code. It uses the Requests library to send a request from your device and then prints the HTTP headers of that request to the console.

import requests

# Send a GET request and print the headers that were sent with it
response = requests.get("https://example.com/")
print(response.request.headers)

It should print out something like this:

{'User-Agent': 'python-requests/2.31.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}

The relevant header in this case is the first one. As you can see, the Requests library tells the server that you're using it. What a snitch!

The user agent header of web browsers looks slightly different from the user agent header of standalone applications. Inside, it lists the version of the web browser, the operating system, and some other tidbits that can be used to determine what kind of content to serve the user.

For example, here’s the user agent string for the latest version of Chrome (at the time of writing this article) running on Windows:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36

The user agent strings of desktop and mobile browsers also differ. For example, here’s a user agent of a mobile browser:

Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Mobile Safari/537.36

By changing the user agent string from the default to one used by a real browser, you can make the website think the request originates from a regular user and hide the fact that you're doing web scraping.

How to Change Your User Agent?

Most of the HTTP client applications used in web scraping let you easily change the contents of the user agent string and, in that way, mimic using a real browser.

In this part, you’ll learn how to do it with Requests, the most popular Python HTTP client library.

Let’s say that you already have some code using Requests up and running.

import requests

response = requests.get("https://example.com/")
print(response.request.headers)

To change the user agent header that Requests uses, create a new headers variable that will contain a dictionary. The dictionary needs just one entry—the user agent header.

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36'}

Now you can pass the headers variable to the requests.get() function.

import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36'}

response = requests.get("https://example.com/", headers=headers)
print(response.request.headers)

It will include the user agent header in the request, overwriting the default header. The request will now have a user agent header that matches that of the Chrome browser.

{'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}

Which User Agents Are Commonly Used for Scraping Websites?

If you need to pick a user agent string for your web scraping script, the best bet is to just take the most commonly used one. It will blend with the other traffic sent to the website and not stand out.

Currently, the most common user agent is the following:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36

It’s the user agent header of the latest Chrome browser running on Windows. (Chrome reports both Windows 10 and Windows 11 as “Windows NT 10.0”, so the string doesn’t reveal which of the two you’re on.) While it might seem odd that this user agent mentions other browsers as well, there is a solid historical reason behind it: browsers have long included each other’s tokens for compatibility, so that websites sniffing the user agent wouldn’t serve them broken pages.

Here are other user agents you can use, in order of popularity:

1. Chrome 115.0 on macOS

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36

2. Chrome 114.0 on Windows

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36

3. Firefox 116.0 on Windows

Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/116.0

4. Firefox 115.0 on Windows

Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0

5. Chrome 114.0 on macOS

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36

6. Chrome 115.0 on Linux

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36

7. Chrome 116.0 on Windows

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36

8. Firefox 115.0 on Linux

Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0

9. Firefox 116.0 on Linux

Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/116.0

10. Edge 115.0 on Windows

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36 Edg/115.0.1901.188

Keep in mind that browser versions change frequently, so it’s a good idea to research commonly used user agent headers and update yours from time to time. One simple approach is to keep a small pool of current user agents and rotate between them, as in the sketch below.
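
Here’s a minimal sketch of that idea. The user agent strings come from the list above; by the time you read this, you may want to refresh them with newer versions.

import random

import requests

# A small pool of common user agents (from the list above); refresh
# these periodically as new browser versions are released
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/116.0",
]

def get_with_random_user_agent(url):
    # Pick a different user agent for each request so that no single
    # string dominates your traffic
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    return requests.get(url, headers=headers)

response = get_with_random_user_agent("https://example.com/")
print(response.request.headers["User-Agent"])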

Alternatively, you can just get the user agent string of your browser by googling the keyphrase “What’s my user agent?”. Google should display your user agent string, which you can then set as the user agent for your script. Given that it’s copied from a real browser, it should look natural enough to bypass most scraping restrictions.

Where Can I Find a Comprehensive List of User Agents for Web Scraping Purposes?

If you require a more detailed list of user agents you can use for web scraping, check out this blog post. It contains a list of commonly used desktop-based user agents that is continually updated based on data from people visiting the blog.

Are There Any User Agents Specifically Designed for Mobile Scraping?

Just like desktop browsers, mobile browsers have user agent strings specific to the browser and the operating system they run on.

Here are some commonly used ones:

1. Chrome on Android

Mozilla/5.0 (Linux; Android 13) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.5845.92 Mobile Safari/537.36

2. Firefox on Android

Mozilla/5.0 (Android 13; Mobile; rv:68.0) Gecko/68.0 Firefox/116.0
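
Setting a mobile user agent works the same way as before. Here’s a minimal sketch using the Chrome on Android string from above; keep in mind that many websites serve different, mobile-optimized HTML to mobile user agents, so your parsing code has to match.

import requests

# Request a page as Chrome on Android; many sites return different,
# mobile-optimized HTML when they see a mobile user agent
headers = {
    "User-Agent": "Mozilla/5.0 (Linux; Android 13) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.5845.92 Mobile Safari/537.36"
}

response = requests.get("https://example.com/", headers=headers)
print(response.status_code)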

How Can Using Different User Agents Help in Bypassing Anti-scraping Measures?

By using a user agent header that matches that of a real browser, you can evade basic anti-bot measures that block traffic coming from simple scraping scripts.

For example, Reddit can block requests that carry the default HTTP client headers. Changing the user agent to virtually anything else makes it far more likely that your requests will go through.

Of course, using a different user agent is not the ultimate solution. Otherwise, nobody would get IP banned for using browser automation libraries like Puppeteer for web scraping. In reality, websites monitor traffic patterns, user actions, IP address usage, and many other signals to determine whether you’re scraping. It’s essential to stay on top of your game, and just one small change in headers won’t cut it. A natural next step is to send the full set of headers a real browser sends, as sketched below.
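
Here’s a minimal sketch of that idea, assuming a desktop Chrome browser. The header values are illustrative; for best results, capture the headers from a real browser session (for example, via your browser’s developer tools) and copy them.

import requests

# A fuller, browser-like header set; the values are illustrative and
# mirror what a desktop Chrome browser typically sends
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Upgrade-Insecure-Requests": "1",
}

response = requests.get("https://example.com/", headers=headers)
print(response.status_code)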

One of the best ways to support large-scale web scraping is through the use of rotating proxies. Proxies act as middlemen between you and the target server, hiding the original IP address of your request. By rotating proxies throughout the scraping session, you can hide the fact that any scraping is happening at all—each request will have a different IP address, so it will look like multiple users are visiting the site. A minimal sketch of this pattern with Requests follows.
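
The proxy endpoints and credentials below are placeholders; replace them with ones from your provider. A rotating residential proxy service typically gives you a single gateway endpoint that assigns a fresh IP to each connection, which simplifies this even further.

import random

import requests

# Placeholder proxy endpoints; replace with ones from your provider
PROXIES = [
    "http://username:password@proxy1.example.com:8080",
    "http://username:password@proxy2.example.com:8080",
]

def get_via_rotating_proxy(url):
    # Route each request through a randomly chosen proxy so requests
    # appear to come from different IP addresses
    proxy = random.choice(PROXIES)
    return requests.get(url, proxies={"http": proxy, "https": proxy})

response = get_via_rotating_proxy("https://example.com/")
print(response.status_code)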

If you’re searching for a reliable proxy solution, check out our residential proxies. With our network of over 2 million ethically sourced, unique IPs, we can provide you with everything you need to scale your web scraping efforts beyond simple scripts.

FAQ

Can using a user agent help in avoiding IP blocking while scraping?

While using a proper user agent can help you avoid detection and IP blocking, it won’t do anything if your IP has already been flagged. Using the same IP for all requests creates a pattern that can lead to an IP ban with or without a proper user agent. For this reason, rotating proxies are a must for any large-scale web scraping.

Are there any limitations or restrictions when using user agents for scraping?

You can put virtually any text string in the user agent header, so there are no technical limitations. But keep in mind that using the user agent header of a different application might be against the terms of service of a website, so masking your user agent might lead to repercussions such as an IP ban or account suspension. This is easy to solve with rotating proxies and/or multiple accounts.

How can I change the user agent in popular web scraping frameworks?

All popular web scraping frameworks have a way to manipulate request headers. All you need to do is add a custom User-Agent header to the request, which will overwrite the default one used by the framework, as in the sketch below.
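
For example, here’s a minimal sketch for Scrapy, a popular Python scraping framework, which lets you override the user agent via its USER_AGENT setting:

import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"
    start_urls = ["https://example.com/"]

    # Override the default Scrapy user agent for this spider only
    custom_settings = {
        "USER_AGENT": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36",
    }

    def parse(self, response):
        # The response was fetched with the custom user agent above
        self.log(response.request.headers.get("User-Agent"))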
