Extract feed details from RSS in Python
In this article, we will see how to extract feed and post details from the RSS feed of a Hashnode blog. Although we use a Hashnode blog here, the same code works for other RSS feeds as well.
RSS stands for Rich Site Summary (it is also expanded as Really Simple Syndication). It is an XML-based web format for publishing frequently updated content such as blog posts, news, audio, and video. An RSS document, usually called a feed, contains the published text along with metadata such as the publication time and the author’s name.
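To make the structure concrete, here is a minimal, hypothetical RSS 2.0 document read with Python’s built-in xml.etree.ElementTree module. This is only an illustration of the format (the rest of the article lets feedparser do this parsing); the feed title, URLs, and author below are made up. It shows a <channel> holding the site’s own title and link, and one <item> per post carrying the post text and its metadata.

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical RSS 2.0 document: one <channel> describes the site
# and contains one <item> per post
RSS_XML = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>https://example.com/</link>
    <description>A hypothetical blog used for illustration</description>
    <item>
      <title>First post</title>
      <link>https://example.com/first-post</link>
      <author>jane@example.com (Jane Doe)</author>
      <pubDate>Sun, 24 Jan 2021 10:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(RSS_XML)
channel = root.find("channel")
print("Feed:", channel.findtext("title"))  # Feed: Example Blog

for item in channel.findall("item"):
    # Each <item> carries the post's text plus metadata (author, publication date)
    print(item.findtext("title"), "|", item.findtext("author"), "|", item.findtext("pubDate"))
```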
Installing feedparser:
We will use the feedparser Python library to parse the blog’s RSS feed. It is a popular library for parsing RSS and Atom feeds.
pip install feedparser
Let’s go through this step by step:
Step 1: Getting the RSS feed
Use the feedparser.parse() function to create a feed object that contains the parsed blog. It takes the URL of the blog’s feed.
Python3
import feedparser
blog_feed = feedparser.parse(feed_url)  # feed_url: URL of the blog's RSS feed
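Step 2: Getting details from the blog
The feed attribute of the parsed object holds the details of the blog itself, while entries is a list with one item per post.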
Python3
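# Title and link of the blog itself
print(blog_feed.feed.title)
print(blog_feed.feed.link)

# Number of posts in the feed
print(len(blog_feed.entries))

# Details of the first post in the feed
print(blog_feed.entries[0].title)
print(blog_feed.entries[0].link)
print(blog_feed.entries[0].author)
print(blog_feed.entries[0].published)

# Tags and author names of the first post, as lists of strings
tags = [tag.term for tag in blog_feed.entries[0].tags]
authors = [author.name for author in blog_feed.entries[0].authors]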
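The published value printed above is the raw date string from the feed. RSS 2.0 writes pubDate in the RFC 822 style, which Python’s standard email.utils.parsedate_to_datetime() can turn into a timezone-aware datetime. The sketch below assumes such an RSS-style string (the value is hypothetical); for Atom feeds, whose dates use a different format, feedparser’s own published_parsed field (a time.struct_time in UTC) is the more robust choice.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical value of blog_feed.entries[0].published for an RSS 2.0 feed
published = "Sun, 24 Jan 2021 10:00:00 GMT"

# Parse the RFC 822 date string into a timezone-aware datetime
published_dt = parsedate_to_datetime(published)

# Normalise to UTC so dates taken from different feeds compare correctly
published_utc = published_dt.astimezone(timezone.utc)
print(published_utc.isoformat())  # 2021-01-24T10:00:00+00:00

# Example use: how old the post is, relative to the current time
age = datetime.now(timezone.utc) - published_utc
print(f"Published {age.days} days ago")
```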
Below is the full implementation. It wraps the code above in a function that takes the link of an RSS feed and returns the blog’s details along with the details of every post.
Python3
import json

import feedparser


def get_posts_details(rss=None):
    """Return the blog's title and link plus the details of every post in the feed."""
    if rss is None:
        return None

    blog_feed = feedparser.parse(rss)

    posts_details = {"Blog title": blog_feed.feed.get("title"),
                     "Blog link": blog_feed.feed.get("link")}

    post_list = []
    for post in blog_feed.entries:
        # .get() returns None (or an empty list) for a missing field, so a post
        # without e.g. tags still keeps every field it does have
        temp = {
            "title": post.get("title"),
            "link": post.get("link"),
            "author": post.get("author"),
            "time_published": post.get("published"),
            "tags": [tag.term for tag in post.get("tags", [])],
            "authors": [author.get("name") for author in post.get("authors", [])],
            "summary": post.get("summary"),
        }
        post_list.append(temp)

    posts_details["posts"] = post_list
    return posts_details


if __name__ == "__main__":
    # Placeholder URL: replace it with the RSS feed URL of the blog you want to read
    feed_url = "https://example.com/rss.xml"
    data = get_posts_details(rss=feed_url)
    if data:
        print(json.dumps(data, indent=2))
    else:
        print("None")
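Because get_posts_details() returns plain dictionaries and lists, its result is easy to post-process. As one illustration, the hypothetical helper posts_with_tag() below picks out the titles of posts carrying a given tag, ignoring case. It runs on a small hand-written sample in the same shape as the function’s output rather than on a live feed, so the titles and tags are made up.

```python
def posts_with_tag(posts_details, tag):
    """Return the titles of posts whose tag list contains `tag` (case-insensitive).

    Hypothetical helper: `posts_details` is assumed to have the shape returned
    by get_posts_details().
    """
    wanted = tag.lower()
    return [
        post["title"]
        for post in posts_details["posts"]
        if wanted in (t.lower() for t in post.get("tags", []))
    ]


# Hypothetical sample data in the same shape as get_posts_details() output
sample = {
    "Blog title": "Example Blog",
    "Blog link": "https://example.com/",
    "posts": [
        {"title": "First post", "tags": ["Python", "RSS"]},
        {"title": "Second post", "tags": ["JavaScript"]},
    ],
}

print(posts_with_tag(sample, "python"))  # ['First post']
```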
Last Updated: 24 Jan, 2021