How to get the next page on BeautifulSoup?
Last Updated: 16 May, 2021
This article shows how to get the next page with BeautifulSoup, that is, how to follow links from a page you have scraped and scrape the pages they point to.
Modules Needed
- BeautifulSoup: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. To install it, run the following command in the terminal.
pip install bs4
- requests: This library lets you send HTTP/1.1 requests from Python with very little code. To install it, run the following command in the terminal.
pip install requests
Approach:
Getting the next page with BeautifulSoup means scraping one page first and then, if that page contains links we also want to scrape, following them. We start by scraping the sample website. For each link we find on it, we call requests.get() again for that URL and create a new soup from the response. Repeating this is how we move from one page to the next.
Let’s go through the script step by step:
Step 1: Import all dependencies.
from bs4 import BeautifulSoup
import requests
Step 2: Request the page with requests. Here sample_website is a placeholder for a string holding the URL of the starting page.
page = requests.get(sample_website)
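A request can fail or hang, so in practice it may help to set a timeout and check the status code before parsing. The sketch below shows one way to do that. The helper name fetch_html and the 10-second timeout are illustrative assumptions, not part of the original script.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the response body as bytes, or None if the request fails.

    fetch_html is a hypothetical helper; the timeout value is an assumption.
    """
    try:
        response = requests.get(url, timeout=timeout)
        # Raise requests.HTTPError for 4xx/5xx status codes.
        response.raise_for_status()
    except requests.RequestException as exc:
        # Covers connection errors, timeouts, bad URLs and HTTP errors.
        print("request failed:", exc)
        return None
    return response.content
```

The returned bytes can be passed straight to BeautifulSoup, and a None result signals that the page should be skipped.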
Step 3: Using the BeautifulSoup constructor and Python's built-in HTML parser, create a soup of the page. Pass the response body (page.content), not the response object itself.
soup = BeautifulSoup(page.content, 'html.parser')
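To see what the soup object offers without touching the network, you can parse a small HTML string directly. The markup and URLs below are made up for illustration.

```python
from bs4 import BeautifulSoup

# A made-up HTML snippet standing in for a downloaded page.
html = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <a href="https://example.com/page-one/">One</a>
    <a href="/page-two/">Two</a>
    <a name="no-href">No link target</a>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

title = soup.title.string
# href=True keeps only <a> tags that actually have an href attribute.
hrefs = [a["href"] for a in soup.find_all("a", href=True)]

print(title)
print(hrefs)
```

The third anchor is skipped because it has no href, which is why the later steps pass href=True to find_all.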
Step 4: Search the parse tree for links. For each URL we want to follow, use requests and BeautifulSoup again to create a soup of that page. This is how we get to the next page.
for i in soup.find_all('a', href=True):
    if "www.geeksforgeeks.org" in i['href']:
        nextpage = requests.get(i['href'])
        nextsoup = BeautifulSoup(nextpage.content, 'html.parser')
        print("next url title : ", nextsoup.find('title').string)
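The substring check above only follows absolute URLs, and it would also match a URL on another site that merely contains that text. A somewhat stricter variant, sketched here, resolves each href against the page URL with urljoin and compares host names with urlparse. The function name same_host_links and the example URLs are hypothetical.

```python
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup


def same_host_links(html, page_url):
    """Return absolute same-host link URLs in page order, without duplicates."""
    soup = BeautifulSoup(html, "html.parser")
    host = urlparse(page_url).netloc
    links = []
    for a in soup.find_all("a", href=True):
        # urljoin turns relative hrefs such as "/about" into absolute URLs.
        absolute = urljoin(page_url, a["href"])
        if urlparse(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return links


# Made-up markup: two same-host links, one foreign link, one duplicate.
html = """
<a href="/about">About</a>
<a href="https://www.example.com/contact">Contact</a>
<a href="https://other.example.org/?ref=www.example.com">Elsewhere</a>
<a href="/about">About again</a>
"""
links = same_host_links(html, "https://www.example.com/index.html")
print(links)
```

Dropping duplicates also avoids requesting the same page twice when a site repeats a link in its header and footer.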
Below is the full implementation:
from bs4 import BeautifulSoup
import requests

# sample_website is a placeholder: set it to the URL of the starting page.
page = requests.get(sample_website)
soup = BeautifulSoup(page.content, 'html.parser')

# Visit every link on the page that points to www.geeksforgeeks.org
# and print the title of the page it leads to.
for i in soup.find_all('a', href=True):
    if "www.geeksforgeeks.org" in i['href']:
        nextpage = requests.get(i['href'])
        nextsoup = BeautifulSoup(nextpage.content, 'html.parser')
        print("next url title : ", nextsoup.find('title').string)
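Note that soup.find('title') returns None when a page has no title tag, in which case accessing .string raises an AttributeError and stops the loop. One possible guard is sketched below. page_title is a hypothetical helper, not something the original script defines.

```python
from bs4 import BeautifulSoup


def page_title(soup, default="(no title)"):
    """Return the stripped <title> text, or default if the page has none."""
    tag = soup.find("title")
    if tag is None:
        return default
    text = tag.get_text(strip=True)
    return text or default


with_title = BeautifulSoup(
    "<html><head><title> Hello </title></head></html>", "html.parser"
)
without_title = BeautifulSoup(
    "<html><body><p>No head here</p></body></html>", "html.parser"
)

print(page_title(with_title))
print(page_title(without_title))
```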
Output:
next url title : GeeksforGeeks | A computer science portal for geeks
next url title : Analysis of Algorithms | Set 1 (Asymptotic Analysis) - GeeksforGeeks
next url title : Analysis of Algorithms | Set 2 (Worst, Average and Best Cases) - GeeksforGeeks
next url title : Analysis of Algorithms | Set 3 (Asymptotic Notations) - GeeksforGeeks
next url title : Analysis of algorithms | little o and little omega notations - GeeksforGeeks
next url title : Lower and Upper Bound Theory - GeeksforGeeks
next url title : Analysis of Algorithms | Set 4 (Analysis of Loops) - GeeksforGeeks
next url title : Analysis of Algorithm | Set 4 (Solving Recurrences) - GeeksforGeeks
next url title : Analysis of Algorithm | Set 5 (Amortized Analysis Introduction) - GeeksforGeeks
next url title : What does 'Space Complexity' mean? - GeeksforGeeks
next url title : Pseudo-polynomial Algorithms - GeeksforGeeks
next url title : Polynomial Time Approximation Scheme - GeeksforGeeks
next url title : A Time Complexity Question - GeeksforGeeks
.................................................................
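The script above visits every matching link on a single page. If what you need is the literal next page of a paginated listing, a common pattern is to look for the pagination link and loop until it disappears. The sketch below assumes the site marks that link with rel="next", which many sites do but not all. The function names, the fetch parameter, and the example URLs are all hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def find_next_url(soup, current_url):
    """Return the absolute URL of the rel="next" link, or None if absent."""
    link = soup.find("a", rel="next", href=True)
    if link is None:
        return None
    return urljoin(current_url, link["href"])


def crawl_pages(start_url, fetch, max_pages=10):
    """Follow rel="next" links from start_url and return (url, title) pairs.

    fetch is any callable mapping a URL to HTML; max_pages bounds the loop.
    """
    results = []
    seen = set()
    url = start_url
    while url and url not in seen and len(results) < max_pages:
        seen.add(url)
        soup = BeautifulSoup(fetch(url), "html.parser")
        title = soup.title.get_text(strip=True) if soup.title else None
        results.append((url, title))
        url = find_next_url(soup, url)
    return results


# Made-up pages standing in for a paginated site.
fake_site = {
    "https://example.com/list?page=1":
        '<title>Page 1</title><a rel="next" href="/list?page=2">Next</a>',
    "https://example.com/list?page=2":
        '<title>Page 2</title><a rel="prev" href="/list?page=1">Prev</a>'
        '<a rel="next" href="/list?page=3">Next</a>',
    "https://example.com/list?page=3":
        '<title>Page 3</title><a rel="prev" href="/list?page=2">Prev</a>',
}

pages = crawl_pages("https://example.com/list?page=1", fake_site.__getitem__)
for url, title in pages:
    print(url, title)
```

Passing the fetch function in keeps the loop usable without network access. With real pages you might pass something like lambda url: requests.get(url, timeout=10).content instead of the dictionary lookup. The seen set and max_pages guard against pagination that loops back on itself.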