Spoofing IP address when web scraping using Python

In this article, we will scrape a website with the Requests module while rotating proxies in Python.

Modules Required

  • Requests: this module lets you send HTTP requests and returns a response object carrying the status code, the page content, and other data.

Syntax: 

requests.get(url, params=None, **kwargs)

  • JSON (JavaScript Object Notation) is a format for structuring data. It is mainly used for storing data and transferring it between the browser and the server. Python supports JSON through a built-in package called json, which provides the tools needed to work with JSON data, including parsing (deserializing) and serializing.
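As a quick illustration of the json package, the sketch below serializes a Python dictionary to a JSON string and parses it back. The record contents are made up for the example and do not come from any API.

```python
import json

# Hypothetical data for illustration: a small record describing one proxy.
record = {"ip": "78.47.16.54", "port": 80, "protocol": "http"}

# Serialize the Python dict to a JSON string.
text = json.dumps(record)

# Deserialize the JSON string back into a Python dict.
restored = json.loads(text)

print(text)
print(restored == record)
```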

Approach

  • Create a set of HTTP proxies manually, or generate one with RapidAPI if you have a subscription. (Here the create_proxy() function fetches one HTTP proxy per call from RapidAPI, and the results are collected into a set.)
  • Iterate over the set of proxies and send a GET request to the website through each one, passing the proxy in the proxies argument of requests.get().

Syntax:

requests.get(url, proxies=proxies)

  • If the proxy works, the call returns a Response object for the URL; if it does not, Requests raises a connection error.
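The approach above can be sketched as a small helper that tries each proxy in turn and stops at the first one that answers. This is a minimal sketch rather than the article's program: the function name fetch_via_proxies, the get parameter, and the 10-second timeout are assumptions added for illustration.

```python
import requests


def fetch_via_proxies(url, proxies, get=requests.get, timeout=10):
    """Try each proxy in turn; return (proxy, response) for the first that works.

    `get` defaults to requests.get and is a parameter only so that the
    function can be exercised without network access (an assumption of
    this sketch, not something the article does).
    """
    for proxy in proxies:
        try:
            response = get(
                url,
                proxies={"http": proxy, "https": proxy},
                timeout=timeout,
            )
            return proxy, response
        except OSError as error:
            # The exceptions raised by Requests derive from OSError.
            print("Proxy failed:", proxy, error)
    # No proxy produced a response.
    return None, None
```

Calling fetch_via_proxies('https://ipecho.net/plain', proxies) with a set of proxy URLs would return the first working proxy together with its response, or (None, None) if every proxy fails.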

Apart from the code, a few more setup steps are needed; the details are given below.

Using Rapidapi to get a set of proxies: 

  • First, buy a subscription to this API on RapidAPI, then go to the dashboard, select Python, and copy the api_key.
  • Initialize the headers with the API key and the rapidapi host.

Syntax:

headers = {
    'x-rapidapi-key': "paste_api_key_here",
    'x-rapidapi-host': "proxy-orbit1.p.rapidapi.com"
}

  • Send a GET request to the API along with the headers.

Syntax:

response = requests.request("GET", url, headers=headers)

  • This returns JSON. After parsing the text with json.loads(), the proxy server address is found under the "curl" key.

Syntax:

response = json.loads(response.text)

proxy = response['curl']
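The steps above can be sketched as two small helpers: one prepares the API request with its headers, and one pulls the proxy address out of the JSON body. This is a sketch under stated assumptions: the host is taken from the article, but the endpoint path is not given there, so API_URL is a placeholder; the function names and the sample JSON body are hypothetical.

```python
import json

import requests

# Assumption: the host comes from the article, but the exact endpoint path
# is not given there, so this URL is a placeholder to replace.
API_URL = "https://proxy-orbit1.p.rapidapi.com/"


def build_proxy_request(api_key, url=API_URL):
    """Prepare (but do not send) the GET request that asks the API for a proxy."""
    headers = {
        "x-rapidapi-key": api_key,
        "x-rapidapi-host": "proxy-orbit1.p.rapidapi.com",
    }
    return requests.Request("GET", url, headers=headers).prepare()


def extract_proxy(body_text):
    """Return the proxy address stored under the 'curl' key of the JSON body."""
    data = json.loads(body_text)
    if "curl" not in data:
        raise KeyError("response has no 'curl' key")
    return data["curl"]


prepared = build_proxy_request("paste_api_key_here")
print(prepared.method, prepared.url)

# Hypothetical response body, shaped only by the article's mention of 'curl'.
print(extract_proxy('{"curl": "http://78.47.16.54:80"}'))
```

Preparing the request separately makes it possible to inspect the headers before anything is sent; the prepared request can then be sent with requests.Session().send(prepared).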

Sending Proxy in requests.get() as parameter:

Send a GET request with requests.get() through a proxy to the URL below. The URL returns the IP address seen for the current session, which is the proxy server's address when the proxy is in use.

Syntax:

# Note: Opening https://ipecho.net/plain in a browser shows the current IP address of the session.

proxy = 'http://78.47.16.54:80'

page = requests.get('https://ipecho.net/plain', proxies={"http": proxy, "https": proxy})

print(page.text)
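Because the same proxy address is passed for both schemes, the dictionary can be built by a small helper that also rejects malformed addresses early. This is a minimal sketch: build_proxies is a hypothetical name, and limiting the check to http and https proxy URLs is an assumption of the example.

```python
from urllib.parse import urlsplit


def build_proxies(proxy):
    """Map both URL schemes to one proxy address, in the form requests.get() accepts."""
    parts = urlsplit(proxy)
    # Require an explicit http/https scheme and a host, e.g. 'http://78.47.16.54:80'.
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not a usable proxy URL: {proxy!r}")
    return {"http": proxy, "https": proxy}


print(build_proxies("http://78.47.16.54:80"))
```

The result can be passed directly as requests.get(url, proxies=build_proxies(proxy)).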

Program:

Python3




import requests
import json
  
  
# Gets proxies from rapidapi to create
# a set of proxies.
# Use this function only if you have rapidapi key.
def create_proxy():
  
    # Initialise the headers and paste the API key
    # of proxy-orbit1 from rapidapi.
    headers = {
        'x-rapidapi-key': "paste_api_key_here",
        'x-rapidapi-host': "proxy-orbit1.p.rapidapi.com"
    }
  
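    # The endpoint URL is not given in this article; copy it
    # from the rapidapi dashboard for proxy-orbit1.
    url = "paste_api_endpoint_url_here"
  
    # Sends a GET request to the above url along with api
    # keys which returns an object containing data in json
    # format which is then parsed using json.loads.
    response = requests.request("GET", url, headers=headers)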
    response = json.loads(response.text)
  
    # The proxy server ip address is present in 'curl' key.
    proxy = response['curl']
    return proxy
  
  
# Main Function
if __name__ == "__main__":
  
    # Create an empty set and call the create_proxy()
    # function to generate a set of proxies from rapidapi.
    # Orbit proxy Rapid api key is required.
    proxies = set()
    print("Creating Proxy List")
    for __ in range(10):
        proxies.add(create_proxy())
  
    # If you do not have rapidapi then create a set of
    # proxies manually.
    # proxies = {'http://78.47.16.54:80'}
  
    # Iterate the proxies and check if it is working.
    for proxy in proxies:
        print("\nChecking proxy:", proxy)
        try:
  
            # https://ipecho.net/plain returns the ip address
            # of the current session if a GET request is sent.
            page = requests.get('https://ipecho.net/plain',
                                proxies={"http": proxy, "https": proxy})
            print("Status OK, Output:", page.text)
        except OSError as e:
  
            # Proxy returns Connection error
            print(e)


Last Updated : 13 Jul, 2021