Skip to content
Related Articles
Open in App
Not now

Related Articles

Extract feed details from RSS in Python

Improve Article
Save Article
  • Difficulty Level : Easy
  • Last Updated : 24 Jan, 2021
Improve Article
Save Article

In the article, we will be seeing how to extract feed and post details using RSS feed for a Hashnode blog. Although we are going to use it for blogs on Hashnode it can be used for other feeds as well.

RSS means Rich Site Summary and uses standard web formats to publish information that changes frequently like blog posts, news, audio, video, etc. RSS documents often know as feed which consists of text, and metadata, like time and author’s name.

Installing feed parser:

We will be using the Feedparser python library for parsing the RSS feed of the blog. It is quite a popular library for parsing blog feeds.

pip install feedparser

Let’s understand this stepwise:

Step 1: Getting RSS feed

Use the feedparser.parse() function for creating a feed object which contains parsed blog. It takes the URL of the blog feed.


# url of blog feed
blog_feed = feedparser.parse(feed_url)

Step 2: Getting details from the blog.


# returns title of the blog site
# returns the link of the blog
# and number of entries(blogs) in the site.
# Details of individual blog can
# be accessed by using attribute name
# Getting lists of tags and authors.
tags = [tag.term for tag in blog_feed.entries[0].tags]
authors= [ for author in blog_feed.entries[0].authors]

Below is the full implementation: Now use the above code to write a function that takes the link of RSS feed and returns the details.


def get_posts_details(rss=None):
    Take link of rss feed as argument
    if rss is not None:
          # import the library only when url for feed is passed
        import feedparser
        # parsing blog feed
        blog_feed = blog_feed = feedparser.parse(rss)
        # getting lists of blog entries via .entries
        posts = blog_feed.entries
        # dictionary for holding posts details
        posts_details = {"Blog title" : blog_feed.feed.title,
                        "Blog link" :}
        post_list = []
        # iterating over individual posts
        for post in posts:
            temp = dict()
            # if any post doesn't have information then throw error.
                temp["title"] = post.title
                temp["link"] =
                temp["author"] =
                temp["time_published"] = post.published
                temp["tags"] = [tag.term for tag in post.tags]
                temp["authors"] = [ for author in post.authors]
                temp["summary"] = post.summary
        # storing lists of posts in the dictionary
        posts_details["posts"] = post_list 
        return posts_details # returning the details which is dictionary
        return None
if __name__ == "__main__":
  import json
  data = get_posts_details(rss = feed_url) # return blogs data as a dictionary
  if data:
    # printing as a json string with indentation level = 2
    print(json.dumps(data, indent=2)) 


My Personal Notes arrow_drop_up
Related Articles

Start Your Coding Journey Now!