Spoofing IP address when web scraping using Python
Last Updated :
08 Mar, 2024
In this article, we are going to scrape a website using the Requests module while rotating proxies in Python.
Modules Required
- Requests module allows you to send HTTP requests and returns a Response object containing data such as the status code, page content, etc.
Syntax:
requests.get(url, parameter)
- JSON (JavaScript Object Notation) is a format for structuring data. It is mainly used for storing and transferring data between the browser and the server. Python supports JSON with the built-in json package, which provides the necessary tools for working with JSON objects, including parsing, serializing, de-serializing, and more.
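As a quick illustration (a minimal, self-contained sketch independent of the scraping code), json.loads() parses a JSON string into a Python dictionary and json.dumps() serializes a Python object back to a JSON string:

```python
import json

# Parse a JSON string into a Python dictionary
data = json.loads('{"status": "ok", "port": 80}')
print(data["status"])   # the value is a Python str
print(data["port"])     # the value is a Python int

# Serialize a Python object back to a JSON string
text = json.dumps({"proxy": "1.2.3.4:80"})
print(text)
```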
Approach
- Manually create a set of HTTP proxies, or use RapidAPI to generate one. (Here the create_proxy() function is used to generate a set of HTTP proxies via RapidAPI.)
- Iterate over the set of proxies and send a GET request to the website using requests.get(url, proxies=proxies), passing the proxies as a parameter.
Syntax:
requests.get(url, proxies=proxies)
- If the proxy is working, the request returns a Response object for the URL.
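The steps above can be sketched as a small helper. This is an illustrative sketch, not the article's final program: fetch_with_rotation() and its getter parameter are hypothetical names, with getter defaulting to requests.get so a different fetch function can be injected for testing:

```python
import requests

def fetch_with_rotation(url, proxy_list, getter=requests.get):
    """Try each proxy in turn; return the first successful response, or None."""
    for proxy in proxy_list:
        try:
            # Route both HTTP and HTTPS traffic through the same proxy
            return getter(url, proxies={"http": proxy, "https": proxy})
        except OSError:
            continue  # this proxy failed; move on to the next one
    return None  # no proxy worked
```

In real use you would pass the set built from create_proxy(); adding a timeout= argument to the request is also advisable so that dead proxies fail fast instead of hanging.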
Apart from the code itself, there are a few more setup steps to complete; the details are given below.
Using Rapidapi to get a set of proxies:
- Firstly, subscribe to the API on RapidAPI, then go to the dashboard, select Python, and copy the api_key.
- Initialize the headers with the API key and the rapidapi host.
Syntax:
headers = {
'x-rapidapi-key': "paste_api_key_here",
'x-rapidapi-host': "proxy-orbit1.p.rapidapi.com"
}
- Send a GET request to the API along with the headers.
Syntax:
response = requests.request("GET", url, headers=headers)
- This returns JSON; parsing the text with json.loads(), we can find the proxy server address under the "curl" key.
Syntax:
response = json.loads(response.text)
proxy = response['curl']
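For example, assuming the API returned a body shaped like the hypothetical JSON below (the only field the article relies on is "curl"), the extraction looks like this:

```python
import json

# Hypothetical response body; only the "curl" key matters here
body = '{"curl": "78.47.16.54:80", "protocol": "http"}'

response = json.loads(body)
proxy = response['curl']
print(proxy)  # 78.47.16.54:80
```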
Sending Proxy in requests.get() as parameter:
Sending a GET request using requests.get() along with a proxy to https://ipecho.net/plain returns the proxy server address of the current session.
Syntax:
# Note: opening https://ipecho.net/plain in a browser shows the current IP address of the session.
proxy = 'http://78.47.16.54:80'
page = requests.get('https://ipecho.net/plain', proxies={"http": proxy, "https": proxy})
print(page.text)
Program:
Python3
import requests
import json

def create_proxy():
    # Proxy Orbit endpoint on RapidAPI (derived from the host in the headers;
    # check your RapidAPI dashboard for the exact URL)
    url = "https://proxy-orbit1.p.rapidapi.com/v1/"
    headers = {
        'x-rapidapi-key': "paste_api_key_here",
        'x-rapidapi-host': "proxy-orbit1.p.rapidapi.com"
    }
    response = requests.request("GET", url, headers=headers)
    response = json.loads(response.text)
    proxy = response['curl']
    return proxy

if __name__ == "__main__":
    url = 'https://ipecho.net/plain'
    proxies = set()
    print("Creating Proxy List")
    for __ in range(10):
        proxies.add(create_proxy())

    for proxy in proxies:
        print("\nChecking proxy:", proxy)
        try:
            page = requests.get(url, proxies={"http": proxy, "https": proxy})
            print("Status OK, Output:", page.text)
        except OSError as e:
            print(e)