
Spoofing IP address when web scraping using Python

Last Updated : 08 Mar, 2024

In this article, we are going to scrape a website using Requests while rotating proxies in Python.

Modules Required

  • The Requests module lets you send HTTP requests and returns a response object containing data such as the status code and the page content.

Syntax: 

requests.get(url, params=None, **kwargs)

  • JSON (JavaScript Object Notation) is a format for structuring data. It is mainly used for storing data and transferring it between the browser and the server. Python supports JSON through a built-in package called json, which provides the tools needed to work with JSON, including parsing (deserializing) and serializing.
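As a quick illustration of the json package, the sketch below parses a JSON string into a Python dictionary and serializes it back. The sample payload and its keys are made up for this example and do not come from any real API.

```python
import json

# Hypothetical payload, invented for illustration only.
raw = '{"ip": "203.0.113.7", "port": 8080, "secure": false}'

# json.loads() parses a JSON string into Python objects.
data = json.loads(raw)
print(data["ip"], data["port"], data["secure"])

# json.dumps() serializes Python objects back into a JSON string.
text = json.dumps(data, sort_keys=True)
print(text)
```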

Approach

  • Manually create a set of HTTP proxies, or generate one with rapidapi if you have a subscription. (Here the create_proxy() function generates a set of HTTP proxies using rapidapi.)
  • Iterate over the set of proxies and send a GET request to the website with requests.get(url, proxies=proxies), passing each proxy through the proxies parameter.

Syntax:

requests.get(url, proxies=proxies)

  • If the proxy is working, the call returns a Response object for the URL; otherwise it raises a connection error.

Apart from the code, a few more setup steps are required. The details are given below.

Using Rapidapi to get a set of proxies: 

  • First, buy a subscription to the API on rapidapi, then go to the dashboard, select Python, and copy the api_key.
  • Initialize the headers with the API key and the rapidapi host.

Syntax:

headers = {
    'x-rapidapi-key': "paste_api_key_here",
    'x-rapidapi-host': "proxy-orbit1.p.rapidapi.com"
}

  • Send a GET request to the API along with the headers.

Syntax:

response = requests.request("GET", url, headers=headers)

  • The API returns JSON. After parsing the response text with json.loads(), the proxy server address is available under the "curl" key.

Syntax:

response = json.loads(response.text)

proxy = response['curl']
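The parsing step can fail if the API returns something other than the expected JSON object. A minimal defensive sketch follows; the helper name extract_proxy and the sample response body are hypothetical, and only the "curl" key comes from the text above.

```python
import json


def extract_proxy(response_text, key="curl"):
    """Return the proxy address stored under `key`, or None if it is absent."""
    try:
        payload = json.loads(response_text)
    except json.JSONDecodeError:
        # The body was not valid JSON (for example an HTML error page).
        return None
    if not isinstance(payload, dict):
        return None
    return payload.get(key)


# Hypothetical response body, invented for illustration only.
sample = '{"curl": "http://203.0.113.7:8080"}'
print(extract_proxy(sample))
print(extract_proxy("not json"))
```

Returning None instead of raising lets the caller skip a bad response and simply request another proxy.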

Sending Proxy in requests.get() as parameter:

Send a GET request with requests.get() through a proxy to the URL below, which returns the IP address that the server sees for the current session. When the proxy is in use, this is the proxy server's address.

Syntax:

 # Note: Opening https://ipecho.net/plain in a browser shows the current IP address of the session.

 proxy = 'http://78.47.16.54:80'

 page = requests.get('https://ipecho.net/plain', proxies={"http": proxy, "https": proxy})

 print(page.text)
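The same call can be wrapped in a small helper that adds a timeout and reports failure instead of raising. This is only a sketch: the names build_proxies and fetch_ip, the 5-second default timeout, and the injectable get parameter are assumptions made for illustration, not part of the Requests API.

```python
def build_proxies(proxy):
    """Map both URL schemes to the same proxy address."""
    return {"http": proxy, "https": proxy}


def fetch_ip(proxy, url="https://ipecho.net/plain", timeout=5.0, get=None):
    """Return the IP reported by `url` through `proxy`, or None on failure."""
    if get is None:
        # Imported lazily so the helper can be exercised with a stand-in
        # `get` function and without network access.
        import requests
        get = requests.get
    try:
        page = get(url, proxies=build_proxies(proxy), timeout=timeout)
    except OSError:
        # Connection and timeout errors raised by requests derive from OSError.
        return None
    return page.text.strip()
```

Calling fetch_ip('http://78.47.16.54:80') returns the address reported by ipecho.net if the proxy responds in time, and None otherwise. A timeout matters here because free proxies often accept a connection and then never answer.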

Program:

Python3




import requests
import json
  
  
# Gets proxies from rapidapi to create
# a set of proxies.
# Use this function only if you have rapidapi key.
def create_proxy():
  
    # Initialise the headers and paste the API key
    # of proxy-orbit1 from rapidapi.
    headers = {
        'x-rapidapi-key': "paste_api_key_here",
        'x-rapidapi-host': "proxy-orbit1.p.rapidapi.com"
    }
  
    # Paste the API endpoint URL shown in the code snippet
    # on your rapidapi dashboard.
    url = "paste_api_endpoint_url_here"
  
    # Sends a GET request to the above url along with the API
    # key. The response body is JSON, which is then parsed
    # using json.loads.
    response = requests.request("GET", url, headers=headers)
    response = json.loads(response.text)
  
    # The proxy server ip address is present in 'curl' key.
    proxy = response['curl']
    return proxy
  
  
# Main Function
if __name__ == "__main__":
  
    # Create an empty set and call the create_proxy()
    # function to generate a set of proxies from rapidapi.
    # Orbit proxy Rapid api key is required.
    proxies = set()
    print("Creating Proxy List")
    for __ in range(10):
        proxies.add(create_proxy())
  
    # If you do not have rapidapi then create a set of
    # proxies manually.
    # proxies = {'http://78.47.16.54:80'}
  
    # Iterate the proxies and check if it is working.
    for proxy in proxies:
        print("\nChecking proxy:", proxy)
        try:
  
            # https://ipecho.net/plain returns the ip address
            # of the current session if a GET request is sent.
            page = requests.get('https://ipecho.net/plain',
                                proxies={"http": proxy, "https": proxy},
                                timeout=10)
            print("Status OK, Output:", page.text)
        except OSError as e:
  
            # Proxy returns Connection error
            print(e)
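The program above checks each proxy once. To actually rotate proxies across a series of requests, one option is to hand them out in round-robin order. The sketch below does this with itertools.cycle; the function name assign_proxies, the example URLs, and the proxy addresses (taken from reserved documentation IP ranges) are all placeholders.

```python
from itertools import cycle


def assign_proxies(urls, proxies):
    """Pair each URL with the next proxy in round-robin order."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    # sorted() gives the set a stable order before cycling through it.
    pool = cycle(sorted(proxies))
    return [(url, next(pool)) for url in urls]


# Placeholder addresses, not real proxies.
proxies = {"http://203.0.113.7:8080", "http://198.51.100.4:3128"}
urls = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/c",
]

for url, proxy in assign_proxies(urls, proxies):
    print(url, "->", proxy)
```

Each pair can then be passed to requests.get(url, proxies={"http": proxy, "https": proxy}), so consecutive requests leave through different proxies.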

