How to get the next page on BeautifulSoup?

Last Updated : 16 May, 2021

In this article, we are going to see how to get to the next page with BeautifulSoup, that is, how to request and parse the pages that a scraped page links to.

Modules Needed

BeautifulSoup: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. To install this module, run the command below in the terminal.

pip install bs4

requests: This library allows you to send HTTP/1.1 requests easily. To install this module, run the command below in the terminal.

pip install requests

Approach:

Getting the next page with BeautifulSoup means that we first scrape the content of one page and then, if that page contains links to other pages, scrape those pages as well. To do this, we request the sample website and create a soup of it. For every link we find and want to follow, we call the requests.get method again for that URL and create a soup of that page too. This way we can move from one page to the next.

Let's go through the script step by step:

Step 1: Import all dependencies.

from bs4 import BeautifulSoup
import requests

Step 2: We need to request the page URL with requests.

page=requests.get(sample_website)
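
A request can fail or hang, so it can help to check the response before parsing it. The snippet below is a minimal sketch of that idea, not part of the original script: fetch_html is a hypothetical helper name, the 10-second timeout is an arbitrary choice, and the get parameter exists only so that the function can be tried without a network connection.

```python
import requests


def fetch_html(url, get=requests.get, timeout=10):
    """Return the body of `url` as bytes, raising an error if the request fails.

    `get` defaults to requests.get; it is a parameter only so that a
    stand-in function can be passed when trying the helper offline.
    """
    # timeout stops the call from waiting forever on a slow server
    response = get(url, timeout=timeout)

    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses
    response.raise_for_status()

    # .content is the raw body, the same value the article passes to BeautifulSoup
    return response.content
```

With this helper, the request in this step could be written as page_content = fetch_html(sample_website), and page_content can be passed to BeautifulSoup directly.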

Step 3: Create a soup of the page by passing the content of the response and the HTML parser to BeautifulSoup.

soup = BeautifulSoup(page.content, 'html.parser')
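
To see what the soup gives us without making a request, we can parse a small HTML string. The markup below is made up for illustration and only stands in for the content of a fetched page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the content of a fetched page
html = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <a href="https://www.geeksforgeeks.org/python/">Python</a>
    <a name="top">Anchor without an href</a>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# The parse tree can be searched by tag name and by attribute
page_title = soup.title.string

# href=True keeps only the <a> tags that actually have an href attribute
links = [a["href"] for a in soup.find_all("a", href=True)]

print(page_title)
print(links)
```

The second a tag has no href attribute, so find_all('a', href=True) skips it.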

Step 4: Search the parse tree and find the links. For each URL we want, we call the requests module again and create a soup of that page with BeautifulSoup. This is how we get to the next page.

Python3

for link in soup.find_all('a', href=True):

    # keep only the links whose URL contains
    # the string "www.geeksforgeeks.org"
    if "www.geeksforgeeks.org" in link['href']:

        # call the get method to request the next URL
        nextpage = requests.get(link['href'])

        # create a soup for the next URL
        nextsoup = BeautifulSoup(nextpage.content, 'html.parser')

        # anything on the next page can be scraped;
        # here we print the string of its title tag
        print("next url title : ", nextsoup.find('title').string)
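
The loop above passes each href straight to requests.get, which only works for absolute URLs, and it requests the same URL again every time it is repeated on the page. The sketch below shows one possible way to handle both cases with urljoin and a set. The function name collect_links and the sample HTML are assumptions made for this example, and the check compares the host name exactly instead of searching for a substring.

```python
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup


def collect_links(html, base_url, host):
    """Return the unique absolute URLs on `host`, in the order they appear."""
    soup = BeautifulSoup(html, "html.parser")
    seen = set()
    links = []
    for a in soup.find_all("a", href=True):
        # resolve relative hrefs such as "/python/" against the page URL
        url = urljoin(base_url, a["href"])

        # skip links that point to another website
        if urlparse(url).netloc != host:
            continue

        # skip URLs that were already collected
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


# Hypothetical page content used only to demonstrate the function
html = """
<a href="/python/">Python</a>
<a href="https://www.geeksforgeeks.org/python/">Python again</a>
<a href="java/">Java</a>
<a href="https://example.com/other">Elsewhere</a>
"""

links = collect_links(
    html, "https://www.geeksforgeeks.org/start/", "www.geeksforgeeks.org"
)
print(links)
```

Each URL in the returned list can then be requested and parsed exactly as in the loop above.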

Below is the full Implementation:

Python3

from bs4 import BeautifulSoup
import requests

# sample website
sample_website = 'https://www.geeksforgeeks.org/different-ways-to-remove-all-the-digits-from-string-in-java/'

# call the get method to request the page
page = requests.get(sample_website)

# create the soup with BeautifulSoup
# and the HTML parser
soup = BeautifulSoup(page.content, 'html.parser')

# search the parse tree with find_all
# for every <a> tag that has an href
for link in soup.find_all('a', href=True):

    # keep only the links whose URL contains
    # the string "www.geeksforgeeks.org"
    if "www.geeksforgeeks.org" in link['href']:

        # call the get method to request the next URL
        nextpage = requests.get(link['href'])

        # create a soup for the next URL
        nextsoup = BeautifulSoup(nextpage.content, 'html.parser')

        # anything on the next page can be scraped;
        # here we print the string of its title tag
        print("next url title : ", nextsoup.find('title').string)

Output:

next url title :  GeeksforGeeks | A computer science portal for geeks
next url title :  Analysis of Algorithms | Set 1 (Asymptotic Analysis) - GeeksforGeeks
next url title :  Analysis of Algorithms | Set 2 (Worst, Average and Best Cases) - GeeksforGeeks
next url title :  Analysis of Algorithms | Set 3 (Asymptotic Notations) - GeeksforGeeks
next url title :  Analysis of algorithms | little o and little omega notations - GeeksforGeeks
next url title :  Lower and Upper Bound Theory - GeeksforGeeks
next url title :  Analysis of Algorithms | Set 4 (Analysis of Loops) - GeeksforGeeks
next url title :  Analysis of Algorithm | Set 4 (Solving Recurrences) - GeeksforGeeks
next url title :  Analysis of Algorithm | Set 5 (Amortized Analysis Introduction) - GeeksforGeeks
next url title :  What does 'Space Complexity' mean? - GeeksforGeeks
next url title :  Pseudo-polynomial Algorithms - GeeksforGeeks
next url title :  Polynomial Time Approximation Scheme - GeeksforGeeks
next url title :  A Time Complexity Question - GeeksforGeeks
.................................................................
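
The script above follows every link on a single page. On a paginated listing, "next page" often means one specific link instead. The sketch below assumes the site marks that link with rel="next" or with the link text "Next"; this is common but not guaranteed, so the check has to be adapted to the real markup. The names find_next_page_url and iter_pages, the fetch parameter, and the example.com pages are all made up for this illustration, and the pages are kept in a dictionary so that the example runs without a network connection.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def find_next_page_url(html, current_url):
    """Return the absolute URL of the "next" link, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    for a in soup.find_all("a", href=True):
        # html.parser returns rel as a list of values; handle a plain string too
        rel = a.get("rel") or []
        if isinstance(rel, str):
            rel = rel.split()
        label = a.get_text(strip=True).lower()
        if "next" in rel or label == "next":
            return urljoin(current_url, a["href"])
    return None


def iter_pages(start_url, fetch, max_pages=10):
    """Yield (url, html) pairs by following "next" links from start_url.

    `fetch` is any function that takes a URL and returns the page HTML.
    """
    url = start_url
    visited = set()
    # stop on a missing link, a repeated URL, or after max_pages pages
    while url and url not in visited and len(visited) < max_pages:
        visited.add(url)
        html = fetch(url)
        yield url, html
        url = find_next_page_url(html, url)


# Hypothetical in-memory "website" used instead of real requests
pages = {
    "https://example.com/list?page=1": '<a rel="next" href="?page=2">2</a>',
    "https://example.com/list?page=2": '<a href="/list?page=3">Next</a>',
    "https://example.com/list?page=3": '<a href="/list?page=2">Previous</a>',
}

visited_urls = [
    url for url, _ in iter_pages("https://example.com/list?page=1", pages.__getitem__)
]
print(visited_urls)
```

For a real website, fetch could be a small function that calls requests.get(url, timeout=10) and returns the content of the response. The max_pages limit and the visited set keep the loop from running forever if the pages link to each other in a cycle.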
