
How to Get the Rank of a Page in Google Search Results Using BeautifulSoup

Last Updated : 12 Jul, 2021

In this article, we will learn how to find a website's Google ranking for a keyword using Python. Let's first cover the basics of Google ranking, then find the rank with Python.

Google Ranking

Google keyword ranking is the position at which a website appears in Google Search results when a user searches for a keyword. To produce those results, Google sorts through hundreds of billions of web pages in its search index to find the most relevant, useful results in a fraction of a second, and presents them in a way that helps you find what you're looking for.

How do we find the rank?

We use the requests module, whose get method returns a response containing the page content, status, and so on. We save the response in an object called page and read the page content from its page.text attribute. We then use Beautiful Soup with Python's built-in HTML parser to parse that content, so that we can access the data in the HTML document and read the URLs returned for the searched keyword.

Modules Needed

We need to install two modules, requests and bs4, via pip.

Requests: The requests module allows you to send HTTP requests and returns a response with all the data such as status, page content, etc.

Syntax: pip install requests

Beautiful Soup: The Beautiful Soup module parses raw HTML or XML documents (here with Python's built-in html.parser) so that we can extract data from the parsed document.

Syntax: pip install bs4

Approach:

1. Provide the keyword to search for, the website whose rank we want, and the number of search results to check.

Base URL for a Google search: “https://www.google.com/search?q=”
Append the keyword, replacing spaces with “+”
Append &num=30 to request 30 search results
Final URL = “https://www.google.com/search?q=Best+dsa+practice+questions&num=30”
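The URL construction above can be sketched with the standard library's urllib.parse.urlencode, which replaces spaces with "+" and also escapes other special characters. The function name build_search_url is a hypothetical helper introduced here for illustration, not part of the original approach.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.google.com/search?"


def build_search_url(keyword, num_results):
    """Build a Google search URL (hypothetical helper for illustration)."""
    # urlencode turns spaces into "+" and percent-escapes other characters
    query = urlencode({"q": keyword, "num": num_results})
    return BASE_URL + query


print(build_search_url("Best dsa practice questions", 30))
```

For plain keywords this gives the same URL as the manual replace(" ", "+") approach, and it stays valid if the keyword contains characters such as "&" or "#".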

2. Use the requests.get(url) method to send an HTTP request to Google Search. It returns a response object, which we save in page.

page = requests.get("https://www.google.com/search?q=Best+dsa+practice+questions&num=30")
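A request can fail or hang, so a slightly more defensive version of this step might look like the sketch below. The function name fetch_search_page and its get parameter are assumptions added here (the parameter only exists so the function can be exercised without a network connection); timeout and raise_for_status() are standard parts of the requests API.

```python
import requests


def fetch_search_page(url, get=requests.get, timeout=10):
    """Return the HTML text for url, raising if the request failed.

    `get` is a hypothetical hook that defaults to requests.get.
    """
    # A timeout stops the call from waiting indefinitely for a response
    response = get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError on 4xx/5xx responses
    response.raise_for_status()
    return response.text
```

Calling fetch_search_page(url) with the final URL from step 1 would return the same string as page.text, but an error status raises an exception rather than passing an error page on to the parser.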

3. Use the page.text attribute to get the page content as a string and parse the raw HTML with Beautiful Soup.

soup = BeautifulSoup(page.text, 'html.parser')

This creates a parsed tree that helps in accessing data from the HTML document.
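To see what the parsed tree offers, here is a minimal sketch that parses a small, made-up HTML string in place of page.text. The markup and the example.com URL are hypothetical sample data.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for page.text
sample_html = (
    "<html><body><div>"
    "<a href='/url?q=https://example.com/'>Example</a>"
    "</div></body></html>"
)

soup = BeautifulSoup(sample_html, "html.parser")

# find() returns the first matching tag; href=True requires an href attribute
first_link = soup.find("a", href=True)
print(first_link["href"])       # attribute access works like a dictionary
print(first_link.get_text())    # text content of the tag
```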

4. Use soup.find_all() to find every div with the class “ZINbbc xpd O9g5cc uUPGi” and store the matches in result_div. Each of these divs holds one search result, with its URL inside an <a> tag (refer to the developer tools image below). These class names are generated by Google and may change, so check them in developer tools if nothing matches.

result_div = soup.find_all('div', attrs={'class': 'ZINbbc xpd O9g5cc uUPGi'})

[Image: browser developer tools showing that every search result uses the same div class but contains a different URL. In this example the rank is 2, found by checking the URL in the <a> tag.]
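The class lookup in step 4 can be tried on a small hand-written document. The HTML below is hypothetical sample data that only imitates the structure described above. When the class value passed to find_all contains spaces, Beautiful Soup compares it with the exact string value of the class attribute.

```python
from bs4 import BeautifulSoup

RESULT_CLASS = "ZINbbc xpd O9g5cc uUPGi"

# Hypothetical markup: two result blocks and one unrelated div
sample_html = """
<div class="ZINbbc xpd O9g5cc uUPGi"><a href="/url?q=https://example.com/">A</a></div>
<div class="other"><a href="/url?q=https://example.org/">B</a></div>
<div class="ZINbbc xpd O9g5cc uUPGi"><a href="/url?q=https://example.net/">C</a></div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
result_div = soup.find_all("div", attrs={"class": RESULT_CLASS})

# Collect the href of the first link inside each matching div
hrefs = [div.find("a", href=True)["href"] for div in result_div]
print(len(result_div), hrefs)
```

Only the two divs whose class attribute equals the full string are returned; the div with class "other" is skipped.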

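5. Iterate over result_div and find the <a> tag in each div. If it has a URL, check whether the URL matches the website provided as input and, if so, add the current rank to rank_list. (rank_list is a string because the website can appear at more than one rank.) The slice starts at index 7 to skip a 7-character prefix in the href; in the HTML returned to requests, result links typically start with /url?q=.

for div in result_div:
    link = div.find("a", href=True)
    if link['href'][7:7+len(website)] == website:
        rank_list += str(rank)+","
    rank += 1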
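The fixed slice depends on the href having a 7-character prefix. As an alternative sketch, assuming result links have the form /url?q=<target>&..., the target can be read from the query string with urllib.parse. The helpers extract_target and points_to are hypothetical names introduced here. They compare host names, which is a swapped-in technique and not the prefix comparison the article uses.

```python
from urllib.parse import parse_qs, urlparse


def extract_target(href):
    """Return the destination URL from a '/url?q=<target>&...' href, or None."""
    query = parse_qs(urlparse(href).query)
    targets = query.get("q")
    return targets[0] if targets else None


def points_to(href, website):
    """True if the href's destination is on the same host as website."""
    target = extract_target(href)
    if target is None:
        return False
    return urlparse(target).netloc == urlparse(website).netloc


href = "/url?q=https://www.geeksforgeeks.org/dsa/&sa=U&ved=abc"
print(extract_target(href))
print(points_to(href, "https://www.geeksforgeeks.org"))
```

Comparing host names avoids a false match on a URL that merely begins with the same characters, such as a different domain that starts with the website's name.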

6. After the iteration is over, return the rank_list and print the rank.

Below is the full implementation:

Python3




# import the required modules
import requests
from bs4 import BeautifulSoup
 
# Function will get all the ranks of the website
# by searching the keyword in google and returns
# a string of ranks or Website Missing if the website
# doesn't occur in the given number of search queries.
def find_rank(keyword, website, search_query):
 
    # Initialise the required variables
    rank, rank_list = 1, ""
 
    # Base search url of google
    url = "https://www.google.com/search?q="
 
    # Replaces whitespace with "+" in keyword
    keyword = keyword.replace(" ", "+")
 
    # Base url is updated with the keyword to be
    # searched in given number of search results.
    url = url + keyword + "&num=" + str(search_query)
 
    # requests.get(url) returns a response that is saved
    # in a response object called page.
    page = requests.get(url)
 
    # page.text gives us access to the web data in text
    # format, we pass it as an argument to BeautifulSoup
    # along with the html.parser which will create a
    # parsed tree in soup.
    soup = BeautifulSoup(page.text, 'html.parser')
 
    # soup.find_all finds the div, all having the same
    # class "ZINbbc xpd O9g5cc uUPGi" that is stored
    # in result_div
    result_div = soup.find_all(
        'div', attrs={'class': 'ZINbbc xpd O9g5cc uUPGi'})
 
    # Iterate over result_div and check for the given website
    # inside the <a> tag, adding the rank to rank_list if found.
    for div in result_div:
        # Find the first <a> tag that has an href attribute
        link = div.find("a", href=True)
        if link is None:
            # No link in this div, so it is not counted as a result
            continue

        # Skip the 7-character prefix of the href and compare the
        # rest with the website provided in the main block
        if link['href'][7:7+len(website)] == website:
            rank_list += str(rank)+","
        rank += 1
    return rank_list if rank_list else "Website Missing"
 
# Main Function
if __name__ == "__main__":
    keyword = "dsa practice questions"
    website = "https://www.geeksforgeeks.org"
    search_query = 30
    rank = find_rank(keyword, website, search_query)
     
    if rank == "Website Missing":
        print(rank)
    else:
        print("Rank of Website :", rank[:-1])


Output:

Rank of Website : 1,2
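The output above depends on live search results, so it can differ from run to run. To check the ranking logic in a reproducible way, one option is to separate parsing from fetching and run the parser on saved or hand-written HTML. The sketch below assumes the same div class and href prefix as the implementation above. rank_from_html is a hypothetical helper and the markup is made-up sample data.

```python
from bs4 import BeautifulSoup

RESULT_CLASS = "ZINbbc xpd O9g5cc uUPGi"


def rank_from_html(html, website):
    """Return the list of ranks at which website appears in html."""
    soup = BeautifulSoup(html, "html.parser")
    ranks = []
    rank = 1
    for div in soup.find_all("div", attrs={"class": RESULT_CLASS}):
        link = div.find("a", href=True)
        if link is None:
            continue  # divs without a link are not counted as results
        # Same prefix comparison as find_rank: skip 7 characters of the href
        if link["href"][7:7 + len(website)] == website:
            ranks.append(rank)
        rank += 1
    return ranks


# Hypothetical sample page with three result blocks
sample_html = """
<div class="ZINbbc xpd O9g5cc uUPGi"><a href="/url?q=https://www.geeksforgeeks.org/a">1</a></div>
<div class="ZINbbc xpd O9g5cc uUPGi"><a href="/url?q=https://example.com/">2</a></div>
<div class="ZINbbc xpd O9g5cc uUPGi"><a href="/url?q=https://www.geeksforgeeks.org/b">3</a></div>
"""

print(rank_from_html(sample_html, "https://www.geeksforgeeks.org"))
```

Returning a list of integers instead of a comma-separated string also removes the need to trim the trailing comma before printing.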

