Web Scraping Financial News Using Python

  • Difficulty Level : Medium
  • Last Updated : 07 Nov, 2022

In this article, we will cover how to extract financial news seamlessly using Python.

Financial news helps traders decide when to place trades in cryptocurrencies such as Bitcoin, in the stock markets, and in other global markets, and a trading bot can use the same news as input for its analysis. Web scraping with Python makes it possible to fetch this news automatically from a given source. Before walking through the steps, let's cover the modules involved.

Modules Needed

Requests: This module provides methods for making HTTP requests to a specified URI using GET, POST, PUT, PATCH, or HEAD. An HTTP request either retrieves data from a specified URI or pushes data to a server.

pip install requests
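As a minimal sketch of how a GET request is put together with Requests, the snippet below builds a request without sending it, so it runs offline. The `fetch_html` helper, the 10-second timeout, and the `User-Agent` string are illustrative assumptions, not part of the original article.

```python
import requests

# URL of the news page used later in this article.
URL = "https://www.businesstoday.in/latest/economy"


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Hypothetical helper: download a page and return its HTML as text."""
    response = requests.get(url, timeout=timeout)
    # Raise an exception for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return response.text


# Build (but do not send) a GET request to see what would go over the wire.
# The User-Agent value is an assumption chosen for illustration.
prepared = requests.Request(
    "GET", URL, headers={"User-Agent": "news-scraper-demo/0.1"}
).prepare()

print(prepared.method, prepared.url)
```

Calling `fetch_html(URL)` would perform the actual download; it is left uncalled here so the example does not depend on network access.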

Beautiful Soup: Beautiful Soup is a Python library for parsing HTML and XML documents and extracting data from them. Web scraping is the process of extracting data from websites with automated tools instead of copying it by hand.

pip install bs4
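To illustrate what Beautiful Soup does once a page has been downloaded, here is a small sketch that parses an inline HTML string. The markup is made up for the example and is not taken from any real site.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <h1>Economy</h1>
  <a href="/story/1">Sample headline one</a>
  <a href="/story/2">Sample headline two</a>
</body></html>
"""

# "html.parser" selects the parser that ships with Python.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

page_heading = soup.h1.get_text(strip=True)
anchor_texts = [a.get_text(strip=True) for a in soup.find_all("a")]
anchor_hrefs = [a.get("href") for a in soup.find_all("a")]

print(page_heading)
print(anchor_texts)
print(anchor_hrefs)
```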

Steps Required:

Step 1: Import all the required libraries.

from bs4 import BeautifulSoup as BS
import requests as req

Step 2: Choose a website that publishes financial news daily. This article uses:

https://www.businesstoday.in/latest/economy

Step 3: Inspect the page's HTML (for example, with the browser's developer tools) to find the tag in which the news content is stored.

 

Step 4: Check the tag name and use it in the code. Here the headlines are stored in anchor tags, so we will use 'a' in our code.
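The inspection in Steps 3 and 4 can also be done from Python. The sketch below lists each anchor's tag name, `class` attribute, and text on a made-up snippet; the class names `nav-link` and `story-title` are hypothetical, and the real page's markup will differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page's tags and class names will differ.
SAMPLE_HTML = """
<ul>
  <li><a class="nav-link" href="/markets">Markets</a></li>
  <li><a class="story-title" href="/story/1">Sample headline about the economy</a></li>
  <li><a href="/about">About</a></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# For every anchor, record its tag name, its class list (None if absent), and its text.
anchors = [
    (a.name, a.get("class"), a.get_text(strip=True))
    for a in soup.find_all("a")
]
for name, classes, text in anchors:
    print(name, classes, text)

# Once a distinguishing class is known, anchors can be selected by it directly.
story_links = soup.find_all("a", class_="story-title")
```

Listing the attributes this way shows whether a class name alone is enough to isolate the headlines, or whether a filter on the text is needed instead.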

 

Step 5: Find the Python class (type) of the text inside each anchor tag, so that the news headings can be selected by that class in the following steps.

Python3




# Import the required libraries
from bs4 import BeautifulSoup as BS
import requests as req

# Page to scrape (the URL chosen in Step 2)
url = "https://www.businesstoday.in/latest/economy"

# Download the page; the timeout keeps the call from hanging indefinitely
webpage = req.get(url, timeout=10)

# "html.parser" selects Python's built-in HTML parser
trav = BS(webpage.content, "html.parser")

# Print the type of each anchor's text.
# 'a' is the anchor tag in which the news is stored
for link in trav.find_all('a'):
    print(type(link.string), " ", link.string)


Output:

The output below shows that the text of an anchor tag (link.string) has one of two types: “NoneType” or “bs4.element.NavigableString”.

Output for the type of classes in an anchor tag
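The two types arise from how Beautiful Soup defines `.string`: it returns a `NavigableString` when the tag wraps a single piece of text, and `None` when the tag contains no text or several children. A small sketch on made-up markup shows each case:

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Hypothetical anchors covering the cases seen in the output.
SAMPLE_HTML = """
<a href="/a">Plain text link</a>
<a href="/b"><img src="logo.png"></a>
<a href="/c"><span>Markets</span> <b>today</b></a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
plain, image_only, mixed = soup.find_all("a")

print(type(plain.string))   # a single text child gives a NavigableString
print(image_only.string)    # an image-only link has no text, so None
print(mixed.string)         # several children make .string ambiguous, so None

# get_text() joins all nested text and works even when .string is None.
print(mixed.get_text(" ", strip=True))
```

Filtering on `NavigableString` therefore drops image links and anchors with nested markup; if those should be kept, `get_text()` is the alternative.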

Step 6: To fetch the news-related material, we need only the anchors whose text is of the “bs4.element.NavigableString” class.

Step 7: Keep only text longer than 35 characters. This is meant to drop short links such as menu items and leave the news headlines.
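Steps 6 and 7 can be tried offline before running them against the live site. The sketch below wraps the two filters in a function and applies it to made-up HTML; the `extract_headlines` name, the `min_length` parameter, and the sample headlines are hypothetical, while the `> 35` threshold follows the article's code.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString


def extract_headlines(html: str, min_length: int = 35) -> list:
    """Return anchor texts that are plain strings longer than min_length."""
    soup = BeautifulSoup(html, "html.parser")
    headlines = []
    for link in soup.find_all("a"):
        text = link.string
        # Step 6: keep NavigableString only; Step 7: keep long strings only.
        if type(text) is NavigableString and len(text) > min_length:
            headlines.append(str(text))
    return headlines


# Hypothetical page fragment: one menu link, one image link, two headlines.
SAMPLE_HTML = """
<a href="/">Home</a>
<a href="/s/1">Hypothetical headline: central bank holds rates steady</a>
<a href="/s/2"><img src="thumb.png"></a>
<a href="/s/3">Hypothetical headline: exports rise for third straight month</a>
"""

for number, headline in enumerate(extract_headlines(SAMPLE_HTML), start=1):
    print(f"{number}. {headline}")
```

Keeping the filter in a function that takes an HTML string makes it easy to check the threshold on saved pages without downloading the site each time.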

Below is the complete implementation:

Python3




# Import the required libraries
from bs4 import BeautifulSoup as BS
from bs4.element import NavigableString
import requests as req

# Page to scrape (the URL chosen in Step 2)
url = "https://www.businesstoday.in/latest/economy"

webpage = req.get(url, timeout=10)
trav = BS(webpage.content, "html.parser")

M = 1  # running number printed before each headline
for link in trav.find_all('a'):

    # Keep only anchors whose text has the class type
    # found in the previous code (NavigableString)
    # and is longer than 35 characters
    if (type(link.string) is NavigableString
            and len(link.string) > 35):

        print(str(M) + ".", link.string)
        M += 1


Output:

 

