
Extract feed details from RSS in Python


In this article, we will see how to extract feed and post details from the RSS feed of a Hashnode blog. Although we use a Hashnode blog here, the same approach works for any RSS feed.

RSS stands for Rich Site Summary. It uses standard web formats to publish information that changes frequently, such as blog posts, news, audio, and video. An RSS document, often called a feed, consists of text and metadata such as the publication time and the author's name.

Installing feedparser:

We will be using the feedparser Python library to parse the RSS feed of the blog. It is quite a popular library for parsing blog feeds.

pip install feedparser

Let’s understand this stepwise:

Step 1: Getting the RSS feed

Use the feedparser.parse() function to create a feed object containing the parsed blog. It takes the URL of the blog feed as its argument.

Python3

import feedparser

# URL of the blog feed (replace with your own blog's RSS URL)
feed_url = "https://your-blog.hashnode.dev/rss.xml"

blog_feed = feedparser.parse(feed_url)
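
Before reading attributes from the feed, it can be useful to check that the feed was actually fetched and parsed successfully. The sketch below uses feedparser's bozo flag; the status attribute is only present when the feed was fetched over HTTP, so it is read defensively here.

Python3

# optional sanity check on the parsed feed:
# feedparser sets 'bozo' to True when the feed is malformed
if blog_feed.bozo:
    print("Feed could not be parsed cleanly:", blog_feed.bozo_exception)

# 'status' is only available when the feed was fetched over HTTP
print("HTTP status:", blog_feed.get("status", "not available"))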


Step 2: Getting details from the blog feed

Python3

# title of the blog site
print(blog_feed.feed.title)

# link of the blog and number of
# entries (posts) in the feed
print(blog_feed.feed.link)
print(len(blog_feed.entries))

# details of an individual post can
# be accessed by attribute name
print(blog_feed.entries[0].title)
print(blog_feed.entries[0].link)
print(blog_feed.entries[0].author)
print(blog_feed.entries[0].published)

# getting the lists of tags and authors of a post
tags = [tag.term for tag in blog_feed.entries[0].tags]
authors = [author.name for author in blog_feed.entries[0].authors]
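
Not every feed exposes every field, so accessing a missing attribute (for example, tags on an untagged post) raises an AttributeError. Since parsed entries behave like dictionaries, one way to guard against this is to read fields with .get() and a default value; the snippet below is just an illustration of that pattern.

Python3

# entries behave like dictionaries, so missing fields
# can be read safely with .get() and a default value
first_post = blog_feed.entries[0]

author = first_post.get("author", "unknown")
tags = [tag.term for tag in first_post.get("tags", [])]

print(author, tags)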


Below is the full implementation. We now use the above code to write a function that takes the link of an RSS feed and returns the details of its posts.

Python3

def get_posts_details(rss=None):

    """
    Take the link of an RSS feed as argument
    and return the details of its posts.
    """
    if rss is not None:

        # import the library only when a feed URL is passed
        import feedparser

        # parsing the blog feed
        blog_feed = feedparser.parse(rss)

        # getting the list of blog entries via .entries
        posts = blog_feed.entries

        # dictionary for holding the post details
        posts_details = {"Blog title": blog_feed.feed.title,
                         "Blog link": blog_feed.feed.link}

        post_list = []

        # iterating over individual posts
        for post in posts:
            temp = dict()

            # if a post is missing an attribute, skip
            # that attribute and keep the rest of the post
            try:
                temp["title"] = post.title
                temp["link"] = post.link
                temp["author"] = post.author
                temp["time_published"] = post.published
                temp["tags"] = [tag.term for tag in post.tags]
                temp["authors"] = [author.name for author in post.authors]
                temp["summary"] = post.summary
            except AttributeError:
                pass

            post_list.append(temp)

        # storing the list of posts in the dictionary
        posts_details["posts"] = post_list

        # returning the details as a dictionary
        return posts_details
    else:
        return None


if __name__ == "__main__":
    import json

    # RSS feed URL of the blog (replace with your own)
    feed_url = "https://your-blog.hashnode.dev/rss.xml"

    # returns the blog data as a dictionary
    data = get_posts_details(rss=feed_url)

    if data:
        # printing as a JSON string with indentation level = 2
        print(json.dumps(data, indent=2))
    else:
        print("None")


Output: The blog title, the blog link, and the list of post details are printed as a JSON-formatted string.
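
If you want to keep the extracted details instead of just printing them, the dictionary returned by get_posts_details() can be written straight to a JSON file. This is a small optional sketch; the file name posts.json is only an example, and it assumes data holds the dictionary returned above.

Python3

import json

# write the extracted post details to a JSON file
with open("posts.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2, ensure_ascii=False)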


