Working with XML Files in R Programming
XML (Extensible Markup Language) stores data as a tree of elements delimited by markup tags, where each tag names the piece of information it encloses. In R, XML files can be read and processed with the XML package. The package is not part of the base distribution and has to be explicitly installed from CRAN using the following command:
install.packages("XML")
Creating XML file
An XML file can be created in any text editor by wrapping each piece of data in tags that describe its content and saving the result with the ‘.xml’ extension.
We will use the following XML file ‘sample.xml’ to see the various operations that can be performed on the file:
XML
<RECORDS>
  <STUDENT>
    <ID>1</ID>
    <NAME>Alia</NAME>
    <MARKS>620</MARKS>
    <BRANCH>IT</BRANCH>
  </STUDENT>
  <STUDENT>
    <ID>2</ID>
    <NAME>Brijesh</NAME>
    <MARKS>440</MARKS>
    <BRANCH>Commerce</BRANCH>
  </STUDENT>
  <STUDENT>
    <ID>3</ID>
    <NAME>Yash</NAME>
    <MARKS>600</MARKS>
    <BRANCH>Humanities</BRANCH>
  </STUDENT>
  <STUDENT>
    <ID>4</ID>
    <NAME>Mallika</NAME>
    <MARKS>660</MARKS>
    <BRANCH>IT</BRANCH>
  </STUDENT>
  <STUDENT>
    <ID>5</ID>
    <NAME>Zayn</NAME>
    <MARKS>560</MARKS>
    <BRANCH>IT</BRANCH>
  </STUDENT>
</RECORDS>
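The same file could also be generated from R instead of being typed by hand. The following is a minimal sketch, assuming the XML package is installed; it relies on the package's newXMLDoc(), newXMLNode() and saveXML() functions, and the `students` data frame and loop variables are illustrative names introduced here.

```r
library("XML")

# Student records used in this article (illustrative data frame)
students <- data.frame(
  ID     = 1:5,
  NAME   = c("Alia", "Brijesh", "Yash", "Mallika", "Zayn"),
  MARKS  = c(620, 440, 600, 660, 560),
  BRANCH = c("IT", "Commerce", "Humanities", "IT", "IT"),
  stringsAsFactors = FALSE
)

# Create an empty document with <RECORDS> as its root node
doc  <- newXMLDoc()
root <- newXMLNode("RECORDS", doc = doc)

# Add one <STUDENT> node per row, with one child node per column
for (i in seq_len(nrow(students))) {
  student <- newXMLNode("STUDENT", parent = root)
  for (field in names(students)) {
    newXMLNode(field, as.character(students[i, field]), parent = student)
  }
}

# Write the document to the current working directory
saveXML(doc, file = "sample.xml")
```

Building the tree from a data frame keeps the tag names and the column names in sync, which is convenient when the file is later converted back into a data frame.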
Reading XML File
Once the package is installed, the XML file can be read by parsing it with the xmlParse() function, which takes the XML file name as input and returns the parsed document as an R object; printing that object displays the content of the file. The file should be located in the current working directory, or the file name should include its path. The ‘methods’ package is loaded as well; it ships with R, so it does not have to be installed separately. The following code can be used to read the contents of the file “sample.xml”.
R
# Load the XML package and the methods package.
library("XML")
library("methods")

# Parse the contents of sample.xml.
data <- xmlParse(file = "sample.xml")
print(data)
Output:
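<?xml version="1.0"?>
<RECORDS>
  <STUDENT>
    <ID>1</ID>
    <NAME>Alia</NAME>
    <MARKS>620</MARKS>
    <BRANCH>IT</BRANCH>
  </STUDENT>
  <STUDENT>
    <ID>2</ID>
    <NAME>Brijesh</NAME>
    <MARKS>440</MARKS>
    <BRANCH>Commerce</BRANCH>
  </STUDENT>
  <STUDENT>
    <ID>3</ID>
    <NAME>Yash</NAME>
    <MARKS>600</MARKS>
    <BRANCH>Humanities</BRANCH>
  </STUDENT>
  <STUDENT>
    <ID>4</ID>
    <NAME>Mallika</NAME>
    <MARKS>660</MARKS>
    <BRANCH>IT</BRANCH>
  </STUDENT>
  <STUDENT>
    <ID>5</ID>
    <NAME>Zayn</NAME>
    <MARKS>560</MARKS>
    <BRANCH>IT</BRANCH>
  </STUDENT>
</RECORDS>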
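When no file is at hand, XML content held in a character string can be parsed in the same way. The sketch below assumes the `asText = TRUE` argument of xmlParse() and uses a shortened, two-record copy of the sample data; `xml_text` and `doc` are illustrative names.

```r
library("XML")

# A shortened copy of the sample data, held in a string (illustrative only)
xml_text <- paste0(
  "<RECORDS>",
  "<STUDENT><ID>1</ID><NAME>Alia</NAME><MARKS>620</MARKS><BRANCH>IT</BRANCH></STUDENT>",
  "<STUDENT><ID>2</ID><NAME>Brijesh</NAME><MARKS>440</MARKS><BRANCH>Commerce</BRANCH></STUDENT>",
  "</RECORDS>"
)

# asText = TRUE marks the input as XML content rather than a file name
doc <- xmlParse(xml_text, asText = TRUE)

# The result is a parsed document object, not plain text
print(class(doc))

# Name of the root element of the parsed document
print(xmlName(xmlRoot(doc)))
```

Parsing from a string is handy for quick experiments, because the example does not depend on the working directory.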
Extracting information about the XML file
Once an XML file has been parsed, operations can be performed on its individual components. The XML package provides functions to obtain the root node of the document (xmlRoot()), to count the child nodes it contains (xmlSize()), and to access a particular record, or a particular child element within a record, by index.
R
# Load the packages required to read XML files.
library("XML")
library("methods")

# Give the input file name to the function.
res <- xmlParse(file = "sample.xml")

# Extract the root node.
rootnode <- xmlRoot(res)

# Number of child nodes (records) in the root.
nodes <- xmlSize(rootnode)

# Get the entire contents of the 2nd record.
second_node <- rootnode[2]

# Get the 3rd child element (MARKS) of the 4th record.
attri <- rootnode[[4]][[3]]

cat('number of nodes:', nodes, '\n')
cat('details of 2nd record:\n')
print(second_node)

# Print the marks of the fourth record.
cat('3rd attribute of 4th record:', xmlValue(attri), '\n')
Output:
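number of nodes: 5
details of 2nd record:
$STUDENT
<STUDENT>
  <ID>2</ID>
  <NAME>Brijesh</NAME>
  <MARKS>440</MARKS>
  <BRANCH>Commerce</BRANCH>
</STUDENT>

attr(,"class")
[1] "XMLInternalNodeList" "XMLNodeList"
3rd attribute of 4th record: 660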
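Accessing nodes by position works for small files, but selecting them by name or by condition is often easier to read. The following sketch uses the package's xpathSApply() and xmlValue() functions with XPath expressions, a technique the article itself does not cover; the `student()` helper and the variable names are hypothetical and only serve to build the sample data inline.

```r
library("XML")

# Hypothetical helper that wraps one record in <STUDENT> tags
student <- function(id, name, marks, branch) {
  sprintf(
    "<STUDENT><ID>%s</ID><NAME>%s</NAME><MARKS>%s</MARKS><BRANCH>%s</BRANCH></STUDENT>",
    id, name, marks, branch
  )
}

# The five sample records, assembled into one XML string
xml_text <- paste0(
  "<RECORDS>",
  student(1, "Alia", 620, "IT"),
  student(2, "Brijesh", 440, "Commerce"),
  student(3, "Yash", 600, "Humanities"),
  student(4, "Mallika", 660, "IT"),
  student(5, "Zayn", 560, "IT"),
  "</RECORDS>"
)
doc <- xmlParse(xml_text, asText = TRUE)

# Text of every <NAME> element under a <STUDENT>
all_names <- xpathSApply(doc, "//STUDENT/NAME", xmlValue)

# Names of the students whose <BRANCH> is "IT"
it_names <- xpathSApply(doc, "//STUDENT[BRANCH='IT']/NAME", xmlValue)

print(all_names)
print(it_names)
```

Selecting by element name keeps the code working even if the order of the child elements inside a record changes.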
Conversion of XML to dataframe
To make the data easier to read, XML data can be converted into a data frame, a table made up of rows and columns. The XML package provides the function xmlToDataFrame(), which takes the XML file as input and returns the corresponding data in the form of a data frame, with one row per record and one column per child element. This makes it easier to handle and process large amounts of data.
R
# Load the required packages.
library("XML")
library("methods")

# Convert the input XML file to a data frame.
dataframe <- xmlToDataFrame("sample.xml")
print(dataframe)
Output:
  ID    NAME MARKS     BRANCH
1  1    Alia   620         IT
2  2 Brijesh   440   Commerce
3  3    Yash   600 Humanities
4  4 Mallika   660         IT
5  5    Zayn   560         IT
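The values in the resulting data frame are read from the text of the XML nodes, so numeric columns generally have to be converted before they can be used in calculations. A minimal sketch, assuming the `stringsAsFactors` argument of xmlToDataFrame() and a parsed document as input; the `student()` helper and variable names are hypothetical and only build the sample data inline.

```r
library("XML")

# Hypothetical helper that wraps one record in <STUDENT> tags
student <- function(id, name, marks, branch) {
  sprintf(
    "<STUDENT><ID>%s</ID><NAME>%s</NAME><MARKS>%s</MARKS><BRANCH>%s</BRANCH></STUDENT>",
    id, name, marks, branch
  )
}

xml_text <- paste0(
  "<RECORDS>",
  student(1, "Alia", 620, "IT"),
  student(2, "Brijesh", 440, "Commerce"),
  student(3, "Yash", 600, "Humanities"),
  student(4, "Mallika", 660, "IT"),
  student(5, "Zayn", 560, "IT"),
  "</RECORDS>"
)
doc <- xmlParse(xml_text, asText = TRUE)

# Convert the parsed document; keep text columns as character vectors
df <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# Convert the numeric columns explicitly
df$ID    <- as.integer(df$ID)
df$MARKS <- as.numeric(df$MARKS)

# Ordinary data frame operations now work on the converted columns
top_scorers <- df[df$MARKS > 600, "NAME"]
average     <- mean(df$MARKS)

print(top_scorers)
print(average)
```

After the conversion, filtering, sorting and aggregation work the same way as on any other data frame.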