How to remove empty tags using BeautifulSoup in Python?
Prerequisite: Requests, BeautifulSoup, strip
The task is to write a program that removes empty tags from HTML code. Beautiful Soup has no built-in method for removing tags that have no content.
Modules Needed:
- bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. This module does not come built in with Python. To install it, type the command below in the terminal.
pip install bs4
- requests: Requests allows you to send HTTP/1.1 requests extremely easily. This module also does not come built in with Python. To install it, type the command below in the terminal.
pip install requests
Approach:
- Get the HTML code.
- Iterate through each tag.
- Fetch the text from the tag and remove whitespace using strip.
- After removing whitespace, check whether the length of the text is zero. If it is, remove the tag from the HTML code.
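The steps above can be sketched as a small reusable function. This is only an illustration: the name `remove_empty_tags` and the sample markup are made up for this sketch, and it uses Python's built-in `html.parser` so that no extra parser has to be installed.

```python
from bs4 import BeautifulSoup


def remove_empty_tags(html: str) -> str:
    """Return the markup with every tag that has no text removed.

    Hypothetical helper written for illustration only.
    """
    # html.parser ships with Python, so no extra parser install is needed
    soup = BeautifulSoup(html, "html.parser")

    # find_all() with no arguments returns every tag in the document
    for tag in soup.find_all():
        # get_text(strip=True) strips whitespace from each piece of text
        if len(tag.get_text(strip=True)) == 0:
            tag.extract()  # detach the empty tag from the tree

    return str(soup)


print(remove_empty_tags("<div><p></p><p>   </p><b>kept</b></div>"))
# <div><b>kept</b></div>
```

Both `<p>` tags are dropped, including the one that holds only spaces, because the text is stripped before its length is checked.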
Example 1: Remove empty tags.
Python3
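from bs4 import BeautifulSoup

# HTML markup to clean (the sample string is not shown here; supply your own)
html_object = ""
soup = BeautifulSoup(html_object, "lxml")

# Remove every tag whose text is empty once whitespace is stripped
for x in soup.find_all():
    if len(x.get_text(strip=True)) == 0:
        x.extract()

print(soup)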
Output:
<html><body><strong>sometexthere</strong>
</body></html>
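One side effect of this check is that tags which are empty by design, such as `<br>`, `<hr>` or `<img>`, never contain text and are therefore removed as well. If that is not wanted, a possible variant is sketched below. The `VOID_TAGS` list and the function name `remove_empty_tags_keep_void` are assumptions made for this sketch, the list is not exhaustive, and `html.parser` is used in place of lxml to keep the example dependency-free.

```python
from bs4 import BeautifulSoup

# Illustrative, non-exhaustive list of tags that are meaningful without text
VOID_TAGS = ["br", "hr", "img", "input"]


def remove_empty_tags_keep_void(html: str) -> str:
    """Remove text-less tags but keep void tags and their wrappers.

    Hypothetical helper written for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")

    for tag in soup.find_all():
        # Keep the void tags themselves
        if tag.name in VOID_TAGS:
            continue
        # Keep a tag that wraps a void tag, e.g. <div><img src="a.png"/></div>
        if tag.find(VOID_TAGS) is not None:
            continue
        # Otherwise apply the same empty-text check as before
        if len(tag.get_text(strip=True)) == 0:
            tag.extract()

    return str(soup)


result = remove_empty_tags_keep_void(
    '<p>one<br/>two</p><p></p><div><img src="a.png"/></div>'
)
print(result)
```

Here the empty `<p>` is removed, while the `<br>` and the `<div>` wrapping the image are kept.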
Example 2: Remove empty tags from a given URL.
Python3
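from bs4 import BeautifulSoup
import requests

# URL must hold the address of the page to clean (it is not defined here)
page = requests.get(URL)
soup = BeautifulSoup(page.content, "lxml")

# Remove every tag whose text is empty once whitespace is stripped
for x in soup.find_all():
    if len(x.get_text(strip=True)) == 0:
        x.extract()

print(soup)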
Output:
Last Updated: 26 Nov, 2020