
Node Unblocker for Web Scraping: Tutorial for 2024

Vilius Dumcius


Node Unblocker is a Node.js library that’s used for proxying and rewriting remote web pages. At its core, it creates a server instance that acts as a proxy within a machine. As such, it can be used to circumvent geographical or other access restrictions.

What Is Node Unblocker?

Designed as Express-compatible middleware, Node Unblocker is a Node.js library that’s used to create a proxy within a machine. Like any other proxy, it takes requests from a machine and forwards them to the destination server, then returns the server’s response to the source.

Node Unblocker is easy to set up, and an instance can be started on nearly any machine with just a few lines of code. In addition to creating a proxy within a machine, Node Unblocker also rewrites URLs by placing a prefix such as /proxy/ in front of the full target URL, protocol included (for example, /proxy/https://iproyal.com/). Such a change may help circumvent local network restrictions.

Since web scraping often relies on proxies, Node Unblocker is a popular choice for those who have access to one or more third-party machines. You can set up Node Unblocker in cloud services, creating a proxy that can be used for your web scraping needs.

There are some limitations to Node Unblocker, however, as it has trouble reading some advanced pages. You’ll have trouble accessing social media networks (as these use postMessage, which Node Unblocker can’t interact with) and some of the more advanced websites that use AJAX or OAuth login forms.

How Does Node Unblocker Work?

As mentioned above, Node Unblocker creates a web proxy server within a machine. It is then used to read and send HTTP requests that would usually transpire between the origin machine and the destination server.

While Node Unblocker can function as a basic web proxy, some advanced features make it valuable even if you have a proxy pool available. Without these features, however, Node Unblocker is far less useful if you already have a good pool of residential proxies.

Most of the advanced customization options are available through Node Unblocker’s middleware. Which ones matter will depend heavily on your web scraping use case, but there are a few features that may be extremely valuable:

  • Removing CSP headers prevents a site’s Content Security Policy from blocking proxied resources and breaking the proxy. Additionally, removing CSP can allow you to execute inline scripts, which can be useful if content is loaded dynamically through JavaScript.
  • Cookies can help maintain sessions, navigate through multi-step workflows, and even reduce block rates.
  • Redirect handling reduces the likelihood that a redirect will fail to go through the proxy.

Middleware is generally useful if you want to modify how requests are sent and how responses are parsed. Proxy providers usually restrict most of these options, so combining them with Node Unblocker gives you the best of both worlds. You can easily modify aspects such as request headers with Node Unblocker, making it useful for web scraping and other projects.
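As a rough illustration of request middleware, the sketch below overrides the User-Agent header before a request is forwarded. It assumes the “requestMiddleware” option and the “data.headers” object described in the Node Unblocker README; the function name “setUserAgent” and the header value are made up for this example.

```javascript
// Hypothetical request middleware. Assumption: Node Unblocker calls each
// function in `requestMiddleware` with a `data` object, and `data.headers`
// holds the headers that will be sent to the destination server.
function setUserAgent(data) {
  data.headers['user-agent'] =
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';
}

// Wiring it up needs the `unblocker` package, so it is left commented out:
// const Unblocker = require('unblocker');
// const unblocker = new Unblocker({
//   prefix: '/proxy/',
//   requestMiddleware: [setUserAgent],
// });

// Quick local check with a stand-in for the `data` object.
const fakeRequest = { url: 'https://example.com/', headers: { 'user-agent': 'node' } };
setUserAgent(fakeRequest);
console.log(fakeRequest.headers['user-agent']);
```

Because the middleware is a plain function, it can be checked on its own like this before it is attached to a running proxy.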

Additionally, the configuration object passed to Node Unblocker lets you further tweak the behavior of the web proxy. For example, client-side JavaScript that forces requests to go through the proxy is injected by default, but that can be turned off if required.
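A configuration that turns the client-side script injection off might look like the sketch below. The option names are assumptions taken from the Node Unblocker README, so check them against the version you install.

```javascript
// Assumed option names (from the Node Unblocker README); verify them
// against the documentation of the version you actually install.
const unblockerConfig = {
  prefix: '/proxy/',        // path segment that marks a URL as proxied
  clientScripts: false,     // do not inject the client-side helper script
  standardMiddleware: true, // keep the built-in middleware enabled
  requestMiddleware: [],    // custom functions run before a request is forwarded
  responseMiddleware: [],   // custom functions run before a response is returned
};

// The object would be passed to the constructor used later in this tutorial:
// const unblocker = new Unblocker(unblockerConfig);
console.log(Object.keys(unblockerConfig).join(', '));
```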

Node Unblocker Prerequisites

If you’re starting with a blank slate, you’ll need a few things to get you started with Node Unblocker.

1. Node.js

Before you can start creating Node Unblocker servers, you’ll need the Node.js runtime environment installed.

2. An IDE

There are numerous great IDEs and editors that you can use for Node.js, such as Visual Studio Code or WebStorm (Atom, once a common choice, was discontinued in 2022). We’ll be using WebStorm moving forward, but the principles remain the same, regardless of the IDE you use.

3. A cloud service provider

While you can run Node Unblocker on your local machine, you’d still be using your own IP address, making the web proxy significantly less effective for web scraping.

You’ll start using the cloud service at the very end of the tutorial once the application is working as intended.

Installing and Starting Node JS

Once your IDE is set up and running, you’ll have to initialize a Node JS project. In the Terminal (or any equivalent), type in:

npm init -y

The “-y” flag automatically accepts the default answer to every setup question. You can remove it and answer the questions manually, but most of them concern the application’s metadata, such as its name, so they aren’t important for our purposes.

Then you’ll need to install the Node Unblocker and Express packages:

npm install unblocker express

Unblocker is the Node Unblocker package. Express is a library that lets you create a server through Node JS.

Running these commands will create a new file, “package.json”, which will contain details about your application. Create a new file called “app.js” in the project directory and open it.

Importing the Libraries

const express = require('express');
const Unblocker = require('unblocker');

Since we won’t be reassigning the variables, we can use “const” to import Express and Node Unblocker, which means the references can’t be reassigned later on. “var” is also acceptable, although it can lead to issues in larger codebases.

Our “require” function serves as an import for the libraries themselves. “Require” functions similarly to “import” in other languages since whenever it’s called, Node JS will look for the named library in either core modules or third-party ones and load it.
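The difference between core modules and third-party ones can be seen with a few lines that rely only on what ships with Node JS. This is a small sketch for illustration and isn’t needed in “app.js”.

```javascript
// Core modules ship with Node.js and need no installation.
const os = require('os');
const { builtinModules } = require('module');

// `builtinModules` lists the names of the core modules, so it can tell them
// apart from third-party packages that live in node_modules.
console.log(builtinModules.includes('os'));      // a core module
console.log(builtinModules.includes('express')); // a third-party package

// Repeated require() calls return the same cached module object.
console.log(require('os') === os);
```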

Creating the Web Proxy

const app = express();

const unblocker = new Unblocker({prefix: '/proxy/'});
app.use(unblocker);

We start by initializing the Express application, which will allow us to set up a server and configure it later on.

The next line initializes an Unblocker instance, which will use the prefix /proxy/ to fetch requests. If you try to access a website without the /proxy/ prefix, the Unblocker instance will not attempt to take over the request, and your regular IP address will be used. As such, all proxied URLs begin with the prefix and any URLs you want to access regularly shouldn’t have it.

Finally, “app.use” tells the Express application to use the Unblocker instance as middleware. All incoming requests to the Express application will be passed through the Unblocker instance, which will then function as your web proxy.
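The prefix rule described above can be sketched as a small function. This is not Node Unblocker’s actual implementation, only an illustration of the idea; the name “extractTarget” is made up for this example.

```javascript
// Returns the target URL if the request path starts with the proxy prefix,
// or null if the request should be treated as a regular, non-proxied one.
function extractTarget(requestPath, prefix = '/proxy/') {
  if (!requestPath.startsWith(prefix)) {
    return null;
  }
  const target = requestPath.slice(prefix.length);
  // Only absolute http(s) URLs are accepted as proxy targets in this sketch.
  return /^https?:\/\//.test(target) ? target : null;
}

console.log(extractTarget('/proxy/https://iproyal.com/'));
console.log(extractTarget('/about'));
```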

You may also set a custom port for your application if necessary:

const port = 3000;

Starting the Server

While we have set up the Unblocker server, we still need to launch it and have it listen on the selected port:

app.listen(process.env.PORT || port || 8080).on('upgrade', unblocker.onUpgrade);
console.log("Node Unblocker Server Running On Port:", process.env.PORT || port || 8080);

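We’re starting the server with “app.listen”, which uses the PORT environment variable if it’s set, then the “port” constant defined earlier, and finally falls back to 8080. Note that “port” must be declared for this line to run; if you skipped the “const port” line, remove “port ||” from both expressions to avoid a ReferenceError.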

The call continues with “.on” and specifies “upgrade”, an event emitted when a client asks to switch protocols (to WebSocket, for example). Both arguments, the event name and the “unblocker.onUpgrade” handler, are required for the Node Unblocker server to pick up requests that may be using protocols other than HTTP.

Finally, “console.log” is simply a message that states that the Node Unblocker server is running and on which port.
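The port fallback and the “upgrade” listener can both be tried out with nothing but the built-in “http” module. The sketch below uses a hypothetical “resolvePort” helper and a placeholder upgrade handler, and it never starts listening, so it is safe to run anywhere.

```javascript
const http = require('http');

// Picks the first usable value: PORT from the environment, then the
// configured port, then 8080. Environment values are strings, so the
// result is converted to a number.
function resolvePort(env, configuredPort) {
  return Number(env.PORT || configuredPort || 8080);
}

console.log(resolvePort({ PORT: '5000' }, 3000));
console.log(resolvePort({}, 3000));
console.log(resolvePort({}, undefined));

// app.listen() in Express returns a regular http.Server, which is why
// .on('upgrade', ...) can be chained onto it. The same call on a bare server:
const server = http.createServer();
server.on('upgrade', (req, socket) => {
  // Placeholder handler: this is where unblocker.onUpgrade would go in the
  // tutorial code. This sketch simply closes the connection.
  socket.destroy();
});
console.log(server.listenerCount('upgrade'));
```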

Testing the Server Locally

Before launching it on a remote server, you should always run it locally to make sure that Node Unblocker is functioning as expected.

Depending on your IDE and OS, starting the server may have a few different steps. Open up the Terminal and head over to your project location:

cd X:\YOUR\PROJECT\FOLDER 

Then, launch the server:

node app.js

If you named your file anything other than “app.js”, you will have to use that name.

You can now either use cURL or your regular browser to access a website. To use the latter, simply type the URL below into your browser:

http://localhost:8080/proxy/https://iproyal.com/

Take note of the port that has been used for Node Unblocker, though, as an incorrect one will fail to load the page. If you kept “const port = 3000” and the PORT environment variable isn’t set, for example, replace 8080 with 3000 in the URL above. If the port is correct, the page should load correctly and without noticeable slowdowns.
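The same check can be done from a script, which is closer to how the proxy would be used for web scraping. The sketch below assumes the server is running on localhost:8080 and that Node JS 18 or newer is installed, so the global “fetch” is available; “fetchTitle” and “proxiedUrl” are names made up for this example.

```javascript
// Assumptions: the proxy from this tutorial is running on localhost:8080
// (adjust the port if yours differs) and Node.js 18+ provides global fetch().
const PROXY_BASE = 'http://localhost:8080/proxy/';

// Builds the proxied address by placing the target URL after the prefix.
function proxiedUrl(targetUrl) {
  return PROXY_BASE + targetUrl;
}

// Fetches a page through the proxy and returns the text of its <title>,
// or null if the page has none.
async function fetchTitle(targetUrl) {
  const response = await fetch(proxiedUrl(targetUrl));
  if (!response.ok) {
    throw new Error(`Proxy returned HTTP ${response.status}`);
  }
  const html = await response.text();
  const match = html.match(/<title[^>]*>([^<]*)<\/title>/i);
  return match ? match[1].trim() : null;
}

// Usage, with the server running in another terminal:
// fetchTitle('https://iproyal.com/').then(console.log).catch(console.error);
console.log(proxiedUrl('https://iproyal.com/'));
```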

Launching Node Unblocker on a Remote Server

As mentioned, you can run a Node Unblocker server on your local machine. It may be helpful if you need to access blocked websites that have been restricted by a local network administrator. However, you won’t be able to access geo-restricted content as the IP address will be the same.

Launching Node Unblocker on a cloud server will let you use it as a web proxy for evading internet censorship, accessing geo-restricted content, and bypassing most other blocks.

There are numerous providers you can use, such as Heroku, Render, and many others. We’ll be using Google Cloud Compute Engine as it lets us deploy low-cost virtual machines.

Start by editing the “package.json” file as such:

{
  "name": "node-unblocker",
  "version": "1.0.0",
  "description": "",
  "main": "app.js",
  "private": true,
  "keywords": [],
  "author": "",
  "license": "ISC",
  "engines": {
    "node": "21.x"
  },
  "dependencies": {
    "express": "^4.18.2",
    "unblocker": "^2.3.0"
  },
  "scripts": {
    "start": "node app.js"
  }
}

We’re adding a few new things, but the most important part is the “scripts” key-value pair, as it provides the VM with the command to run the Node Unblocker proxy server. Additionally, the “engines” field tells the VM which version of Node JS to use for the proxy server.

Then, register to Google Cloud and enable Compute Engine on your account. You’ll be brought to a new menu where you should click “Create Instance”.


Select the cheapest VM instance (E2 at the time of writing) and click “Create”. Google will provision a server with those specifications, which you’ll be able to use freely.

You’ll have to wait a bit for the instance to start. If you have some experience with cloud servers, you can use SSH to connect to it. Otherwise, you can always connect through your browser by clicking the downwards pointing arrow next to “SSH”.


You’ll be brought into the server, which should be running Ubuntu. You may need to log back into your Google Account a few times before full authorization is granted.

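If you chose Ubuntu or Debian, you’ll need to slightly adjust the “listen” function. Passing '0.0.0.0' as the host binds the server to all IPv4 network interfaces, so it can be reached through the VM’s external IP address: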

app.listen(process.env.PORT || port || 8080, '0.0.0.0').on('upgrade', unblocker.onUpgrade);
console.log("Node Unblocker Server Running On Port:", process.env.PORT || port || 8080);

Then, you’ll need to upload your project files into the VM. If you’ve connected through a browser, you can click the “Upload Files” button and simply select them. Otherwise, use the “scp” command, which copies files over SSH:

scp /path/to/file username@a:/path/to/destination

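Once the files are sent, install Node JS on the machine. First, you’ll need to add the NodeSource repository, guidelines for which can be found on the Node JS GitHub repository. Running the command should automatically install Node JS and npm. If you didn’t upload the “node_modules” folder, also run “npm install” in the project directory to fetch the dependencies listed in “package.json”.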

Then, run the application by typing in:

node app.js

If everything is working correctly, use your local machine’s browser and type in:

http://VM_EXTERNAL_IP_ADDRESS:PORT/proxy/https://iproyal.com

If you’re getting errors, you may need to enable HTTP traffic in your VM instance or create a firewall rule that would allow traffic through your selected port.

Moving Forward

You can now use the proxy server for web scraping purposes, as long as it’s allowed by the terms of your cloud service provider. However, a single proxy server won’t be enough for long as any IP address can get quickly banned.

Node Unblocker proxies are great if you have a small project or if you have access to many cloud VMs. In that case, you can create many Node Unblocker proxies, which will mitigate the threat of blocks, internet censorship, and other restrictions.
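If you do run several instances, requests can be spread across them with a simple round-robin rotation. The sketch below is one possible approach; the addresses are placeholders from a range reserved for documentation, and “createRotator” is a name made up for this example.

```javascript
// Hypothetical proxy instances. The addresses come from a range reserved
// for documentation and must be replaced with your own VM addresses.
const proxyBases = [
  'http://203.0.113.10:8080/proxy/',
  'http://203.0.113.11:8080/proxy/',
  'http://203.0.113.12:8080/proxy/',
];

// Returns a function that cycles through the proxies in round-robin order
// and builds the proxied URL for each target.
function createRotator(bases) {
  let index = 0;
  return function next(targetUrl) {
    const base = bases[index];
    index = (index + 1) % bases.length;
    return base + targetUrl;
  };
}

const nextProxiedUrl = createRotator(proxyBases);
for (let i = 0; i < 4; i += 1) {
  console.log(nextProxiedUrl('https://iproyal.com/'));
}
```

Round-robin keeps the load even across instances; a scraper could also skip an instance once it starts returning blocks, which this sketch does not attempt.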

For larger projects, however, getting access to a larger proxy pool is recommended. It would likely be less costly per IP address (or traffic) than running Node Unblocker and provide even better features.


Author

Vilius Dumcius

Product Owner

With six years of programming experience, Vilius specializes in full-stack web development with PHP (Laravel), MySQL, Docker, Vue.js, and Typescript. Managing a skilled team at IPRoyal for years, he excels in overseeing diverse web projects and custom solutions. Vilius plays a critical role in managing proxy-related tasks for the company, serving as the lead programmer involved in every aspect of the business. Outside of his professional duties, Vilius channels his passion for personal and professional growth, balancing his tech expertise with a commitment to continuous improvement.
