Extracting Code From GeeksForGeeks Article
Prerequisite:
Modules Needed
- requests: Requests lets you send HTTP/1.1 requests with very little code. It is not part of the Python standard library. To install it, run the following command in the terminal:
pip install requests
- bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. It is not part of the Python standard library. To install it, run the following command in the terminal:
pip install bs4
Approach:
- Import modules
- Get the article name as input
- Initiate a get request to the URL
- Scrape the code and the name of the language it is written in using bs4
A lot more can be done with this approach. For example, you can save each code snippet to a separate file with the matching extension, or scrape the complete article and extract other information such as author details.
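As an illustration of the first idea, here is a minimal sketch of saving each scraped snippet to its own file. It assumes the scraper has already produced (language, code) pairs. The `EXTENSIONS` mapping, the `save_snippets` helper, and the `snippet_N` file names are hypothetical choices for this example and are not part of the original script.

```python
from pathlib import Path

# Hypothetical mapping from the language label shown on a code tab
# to a file extension; extend it as needed.
EXTENSIONS = {
    "python3": ".py",
    "python": ".py",
    "c++": ".cpp",
    "c": ".c",
    "java": ".java",
    "javascript": ".js",
}


def save_snippets(snippets, out_dir="snippets"):
    """Write each (language, code) pair to its own file and return the paths."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for number, (language, code) in enumerate(snippets, start=1):
        # Fall back to .txt when the language label is not recognised.
        extension = EXTENSIONS.get(language.strip().lower(), ".txt")
        path = target / f"snippet_{number}{extension}"
        path.write_text(code, encoding="utf-8")
        written.append(path)
    return written
```

With these assumptions, a pair labelled "Python3" is written to a `.py` file, while a label missing from the mapping ends up in a `.txt` file.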
Below is the implementation.
Python3
import requests
from bs4 import BeautifulSoup

# Slug of the GeeksforGeeks article to scrape
article = 'extract-authors-information-from-geeksforgeeks-article-using-python'

# 1-based index of the code block to print;
# set to None to print every code block
index_Code = 3

# URL of the article, built from the slug
# (assumes the standard geeksforgeeks.org address)
url = 'https://www.geeksforgeeks.org/' + article


# Make a GET request to fetch the article
# from the GeeksforGeeks servers
def getdata(url):
    r = requests.get(url)
    return r.text


def codescrapper(soup, article=None):
    # Language tabs and the code containers that belong to them
    codes_languages = soup.find_all('h2', class_='tabtitle')
    codes = soup.find_all("div", class_='code-container')
    count_codes_language = len(codes_languages)
    print(url)

    # Print only the requested block if the index is valid,
    # otherwise print all of them
    if article and article <= count_codes_language:
        print(codes[article - 1].get_text())
    else:
        for x in range(count_codes_language):
            print(codes[x].get_text())


if __name__ == '__main__':
    complete_article_html = getdata(url)
    soup = BeautifulSoup(complete_article_html, 'html.parser')
    codescrapper(soup, index_Code)
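Output: the script prints the article URL, followed by the text of the selected code block (or of every code block when no valid index is given).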
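The script above prints only the code text, while the approach also mentions the language name. The following is a minimal sketch of pairing each language label with its code block. It assumes the same `h2.tabtitle` and `div.code-container` elements the script looks for, and that they appear in matching order. The `extract_snippets` helper and the `SAMPLE_HTML` markup are hypothetical; the sample stands in for a real article page so the example runs offline.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup that mimics the elements the script
# searches for; a real article page may be structured differently.
SAMPLE_HTML = """
<h2 class="tabtitle">Python3</h2>
<div class="code-container">print("hello")</div>
<h2 class="tabtitle">C++</h2>
<div class="code-container">int main() { return 0; }</div>
"""


def extract_snippets(soup):
    """Return a list of (language, code) pairs found in the parsed page."""
    titles = soup.find_all("h2", class_="tabtitle")
    blocks = soup.find_all("div", class_="code-container")
    # zip() stops at the shorter list, so an unmatched title or block is ignored.
    return [
        (title.get_text(strip=True), block.get_text().strip())
        for title, block in zip(titles, blocks)
    ]


if __name__ == "__main__":
    soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
    for language, code in extract_snippets(soup):
        print(f"{language}: {code}")
```

Returning pairs instead of printing directly keeps the parsing step separate from whatever is done with the result, such as filtering by language.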