How to Get the Rank of a Page in Google Search Results Using BeautifulSoup
Last Updated: 12 Jul, 2021
In this article, we will learn how to find the Google ranking of a website for a given keyword using Python. We first cover the basics of Google ranking and then find the rank with Python.
Google Ranking
A Google keyword ranking is the position at which a website appears in Google Search when a user searches for that keyword. Behind the scenes, Google Search sorts through hundreds of billions of web pages in its search index to find the most relevant, useful results in a fraction of a second, and presents them in a way that helps you find what you're looking for.
How do we find the rank?
We use the requests module, whose get method returns a response containing the page content, status code, and so on. We save the response in an object named page and read the page content from its page.text attribute. We then use Beautiful Soup with Python's built-in HTML parser to parse that content, so that we can access the data in the HTML document and pick out the URLs returned for the searched keyword.
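As a rough illustration of the parsing half of this flow, the sketch below runs Beautiful Soup on a small hand-written HTML string instead of a live response, so it needs no network access. The markup, the class name "result", and the URLs are made up for the example and are not Google's actual page structure.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for page.text; not real Google output.
sample_html = """
<html><body>
  <div class="result"><a href="https://www.example.com/a">First</a></div>
  <div class="result"><a href="https://www.example.org/b">Second</a></div>
  <div class="result"><span>No link here</span></div>
</body></html>
"""

# Parse the HTML string with Python's built-in HTML parser.
soup = BeautifulSoup(sample_html, "html.parser")

# Collect the href of the first <a> tag inside each result div, if any.
links = []
for div in soup.find_all("div", attrs={"class": "result"}):
    anchor = div.find("a", href=True)
    if anchor is not None:
        links.append(anchor["href"])

print(links)
```

The third div has no <a> tag, so div.find returns None and that div is skipped.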
Modules Needed
We need to install two modules, requests and bs4, via pip.
Requests: The requests module lets you send HTTP requests and returns a response containing data such as the status code and page content.
Syntax: pip install requests
Beautiful Soup: The Beautiful Soup module lets you parse raw HTML or XML documents, here with Python's built-in HTML parser, so that we can extract data from the parsed document.
Syntax: pip install bs4
Approach:
1. Provide the keyword to search for, the website whose rank we want, and the number of search results to check.
Base URL for Google search: "https://www.google.com/search?q="
Add the keyword, replacing spaces with "+"
Add &num=30 to set the number of search results to 30
Final URL = "https://www.google.com/search?q=Best+dsa+practice+questions&num=30"
2. Use the requests.get(url) method to send an HTTP request to Google Search. The response returned by the search engine is saved in an object named page.
page = requests.get("https://www.google.com/search?q=Best+dsa+practice+questions&num=30")
3. Use the page.text attribute to get the page content, and parse the raw HTML using Beautiful Soup.
soup = BeautifulSoup(page.text, 'html.parser')
This creates a parse tree that helps in accessing data from the HTML document.
4. Use soup.find_all() to find all div elements with the class "ZINbbc xpd O9g5cc uUPGi" and store them in result_div. Each of these divs holds one search result, with its URL inside an <a> tag. (Refer to the developer tools image below.)
result_div = soup.find_all('div', attrs={'class': 'ZINbbc xpd O9g5cc uUPGi'})
The image of the developer tools shows that all the search results share the same div class but contain different URLs; checking the URL in the <a> tag shows that the rank here is 2.
5. Iterate over result_div and find the <a> tag in each div. If it has a URL, check whether that URL matches the website provided as input and, if so, append the rank to rank_list. (rank_list is a string because the website can appear at more than one rank.) The slice starts at index 7 to skip the leading characters of the href (in this result markup, typically the seven-character redirect prefix /url?q=) before the comparison.
link = div.find("a", href=True)
if link['href'][7:7+len(website)] == website:
    rank_list += str(rank)+","
6. After the iteration is over, return the rank_list and print the rank.
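Step 1 builds the URL by replacing spaces with "+", which works for plain keywords. As a hedged alternative, the sketch below uses the standard library's urlencode, which also escapes characters such as "&" and "+" inside the keyword. The function name build_search_url and the BASE_URL constant are introduced here for illustration only.

```python
from urllib.parse import urlencode

# Same search endpoint as in step 1, without the "?q=" suffix.
BASE_URL = "https://www.google.com/search"


def build_search_url(keyword, num_results):
    """Return the search URL for a keyword and a result count."""
    # urlencode turns spaces into "+" and escapes characters such as "&".
    query = urlencode({"q": keyword, "num": num_results})
    return BASE_URL + "?" + query


print(build_search_url("Best dsa practice questions", 30))
# https://www.google.com/search?q=Best+dsa+practice+questions&num=30
```

For a plain keyword this yields the same final URL as step 1; the difference only shows up when the keyword contains reserved characters.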
Below is the full implementation:
Python3
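import requests
from bs4 import BeautifulSoup


def find_rank(keyword, website, search_query):
    """Return the ranks of website as a comma-separated string,
    or "Website Missing" if it is not among the results."""
    rank, rank_list = 1, ""
    # Base search URL from step 1 of the approach.
    url = "https://www.google.com/search?q="
    # Replace spaces with "+" in the keyword.
    keyword = keyword.replace(" ", "+")
    url = url + keyword + "&num=" + str(search_query)
    # Fetch the results page and parse it.
    page = requests.get(url)
    soup = BeautifulSoup(page.text, 'html.parser')
    # Each search result sits in a div with this class.
    result_div = soup.find_all(
        'div', attrs={'class': 'ZINbbc xpd O9g5cc uUPGi'})
    for div in result_div:
        try:
            link = div.find("a", href=True)
            # Compare the href, after its 7-character prefix, with website.
            if link['href'][7:7 + len(website)] == website:
                rank_list += str(rank) + ","
            rank += 1
        except TypeError:
            # div.find returned None: this div has no link, so skip it.
            pass
    return rank_list if rank_list else "Website Missing"


if __name__ == "__main__":
    keyword = "dsa practice questions"
    # Placeholder value: the listing does not show which website was used.
    # Set this to the site to look for, including the scheme.
    website = "https://www.example.com"
    search_query = 30
    rank = find_rank(keyword, website, search_query)
    if rank == "Website Missing":
        print(rank)
    else:
        # Drop the trailing comma before printing.
        print("Rank of Website :", rank[:-1])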
Output:
Rank of Website : 1,2
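The fixed-offset slice in the implementation depends on every href starting with a prefix of exactly seven characters. As a sketch of a less position-dependent check, the code below parses the href and reads its q parameter instead, then collects the ranks in a list rather than a string. It assumes redirect-style links of the form /url?q=<destination>&...; the helper names target_of and find_ranks and the sample hrefs are hypothetical, and this is a swapped-in technique, not the method used above.

```python
from urllib.parse import parse_qs, urlparse


def target_of(href):
    """Return the destination URL of a result link.

    Assumes redirect-style links of the form "/url?q=<destination>&...";
    any other href is returned unchanged.
    """
    if href.startswith("/url?"):
        values = parse_qs(urlparse(href).query).get("q")
        if values:
            return values[0]
    return href


def find_ranks(hrefs, website):
    """Return the 1-based positions whose destination starts with website."""
    return [
        position
        for position, href in enumerate(hrefs, start=1)
        if target_of(href).startswith(website)
    ]


# Hypothetical hrefs, in the order they might appear on a results page.
sample_hrefs = [
    "/url?q=https://www.example.com/dsa&sa=U",
    "/url?q=https://www.example.org/practice&sa=U",
    "/url?q=https://www.example.com/questions&sa=U",
]

print(find_ranks(sample_hrefs, "https://www.example.com"))  # [1, 3]
```

Returning a list avoids the trailing-comma cleanup; ",".join(map(str, ranks)) reproduces the comma-separated form when needed.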