How to split a big dataframe into smaller ones in R?
In this article, we are going to learn how to split and write very large data frames into slices in the R programming language.
Large data frames are hard to work with as a single object, so it often helps to split them into several smaller ones. R's built-in split() function does this job. The steps below walk through the process on a small example.
Step 1: Let’s take a data frame on which we are going to apply the split operation to break it into small chunks.
P    Q        R
SP1  2012-01  123
SP2  2022-01  143
SP3  2022-01  342
SP1  2022-02  542
SP2  2022-02  876
SP3  2022-02  982
SP1  2022-03  884
SP2  2022-03  936
SP3  2022-03  987
Step 2: Next, we need to load this data into R as a data frame, and for that we will use the read.table() function. read.table() reads data from a text file and returns it as a data frame. It supports several arguments, such as the file to read, whether the file has a header row, and the field separator.
Syntax: read.table(filename, header = FALSE, sep = "")
filename: represents the text file to read.
header: represents whether the file contains a header row.
sep: represents the delimiter used in the file; the default "" means any whitespace.
    P       Q   R
1 SP1 2012-01 123
2 SP2 2022-01 143
3 SP3 2022-01 342
4 SP1 2022-02 542
5 SP2 2022-02 876
6 SP3 2022-02 982
7 SP1 2022-03 884
8 SP2 2022-03 936
9 SP3 2022-03 987
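The data frame printed above can be built with a call along the following lines. This is a minimal sketch: the article does not show the source file, so the sample rows are passed inline through read.table()'s text argument instead of a file name, and the variable names raw and df are assumptions.

```r
# Sample rows from Step 1, kept inline so the example is self-contained.
# With a real file, pass its path as the first argument instead of text=.
raw <- "P Q R
SP1 2012-01 123
SP2 2022-01 143
SP3 2022-01 342
SP1 2022-02 542
SP2 2022-02 876
SP3 2022-02 982
SP1 2022-03 884
SP2 2022-03 936
SP3 2022-03 987"

# header = TRUE: the first line holds the column names P, Q and R.
# sep is left at its default "", so any whitespace separates fields.
df <- read.table(text = raw, header = TRUE, stringsAsFactors = FALSE)

print(df)
```

Because Q contains values such as 2012-01 that are not valid numbers, read.table() keeps that column as text, which is what we want for grouping later.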
Step 3: In this step, we will split the data frame into smaller ones, and for that we use the split() function. It is a built-in R function that divides a vector or data frame into groups defined by a factor and returns them as a named list.
Syntax: split(x, f, drop = FALSE)
x: represents the vector or data frame to be divided
f: represents the factor (or a vector that can be treated as one) that defines the groups
drop: represents a logical value which indicates whether factor levels that do not occur should be dropped
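The effect of drop is easiest to see on a factor that has an unused level. The following is a small illustration with made-up values (x, f, kept and dropped are hypothetical names, not part of the article's data).

```r
# Four values grouped by a factor whose level "c" never occurs.
x <- c(10, 20, 30, 40)
f <- factor(c("a", "b", "a", "b"), levels = c("a", "b", "c"))

# Default drop = FALSE: the unused level "c" still gets an (empty) group.
kept <- split(x, f)
print(names(kept))

# drop = TRUE: groups for levels that do not occur are left out.
dropped <- split(x, f, drop = TRUE)
print(names(dropped))
```

In the example used in this article the grouping columns are plain character vectors, so every group has at least one row and drop makes no difference.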
We can create the new data frames from the values of any column, here Q or P. We start with column Q: split() returns a list holding one data frame for each distinct value of Q. We store that list under the name df1 and print it.
$`2012-01`
    P       Q   R
1 SP1 2012-01 123

$`2022-01`
    P       Q   R
2 SP2 2022-01 143
3 SP3 2022-01 342

$`2022-02`
    P       Q   R
4 SP1 2022-02 542
5 SP2 2022-02 876
6 SP3 2022-02 982

$`2022-03`
    P       Q   R
7 SP1 2022-03 884
8 SP2 2022-03 936
9 SP3 2022-03 987
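Output of this shape can be produced with a single split() call on column Q. The sketch below rebuilds the sample data with data.frame() so that it runs on its own; the name df for the input is an assumption, while df1 follows the article.

```r
# Rebuild the sample data so the example is self-contained.
df <- data.frame(
  P = rep(c("SP1", "SP2", "SP3"), times = 3),
  Q = c("2012-01", "2022-01", "2022-01",
        rep("2022-02", 3), rep("2022-03", 3)),
  R = c(123, 143, 342, 542, 876, 982, 884, 936, 987),
  stringsAsFactors = FALSE
)

# One data frame per distinct value of Q, returned as a named list.
df1 <- split(df, df$Q)
print(df1)

# A single slice is retrieved by name; backticks or [[ ]] are needed
# because names such as 2022-02 are not syntactic R names.
feb <- df1[["2022-02"]]
print(feb)
```

Note that the first row has Q equal to 2012-01 rather than 2022-01, which is why it ends up in a group of its own.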
Step 4: In this step, we split on column P instead and store the result under the name df2. Printing df2 shows one data frame for each distinct value of P.
$SP1
    P       Q   R
1 SP1 2012-01 123
4 SP1 2022-02 542
7 SP1 2022-03 884

$SP2
    P       Q   R
2 SP2 2022-01 143
5 SP2 2022-02 876
8 SP2 2022-03 936

$SP3
    P       Q   R
3 SP3 2022-01 342
6 SP3 2022-02 982
9 SP3 2022-03 987
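The same pattern with column P as the grouping factor gives the output above. Again a minimal sketch: the data is rebuilt inline, and df and sizes are assumed names, while df2 follows the article.

```r
# Rebuild the sample data so the example is self-contained.
df <- data.frame(
  P = rep(c("SP1", "SP2", "SP3"), times = 3),
  Q = c("2012-01", "2022-01", "2022-01",
        rep("2022-02", 3), rep("2022-03", 3)),
  R = c(123, 143, 342, 542, 876, 982, 884, 936, 987),
  stringsAsFactors = FALSE
)

# One data frame per distinct value of P.
df2 <- split(df, df$P)
print(df2)

# Number of rows that landed in each slice.
sizes <- sapply(df2, nrow)
print(sizes)

# SP1, SP2 and SP3 are syntactic names, so $ access works directly.
print(df2$SP1)
```

Each slice keeps the row names of the original data frame (1, 4, 7 for SP1), which makes it easy to trace rows back to their source.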
We can see from the output that the rows for SP1, SP2, and SP3 now sit in separate data frames, and that is how we can split a large data frame into smaller ones.
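The introduction also mentions writing the slices out. The article does not show that step, so the following is only one possible sketch: it saves each slice of df2 as its own CSV file with write.csv(). The output directory (a "slices" folder under tempdir()) and the file naming scheme are assumptions chosen for illustration.

```r
# Rebuild the sample data so the example is self-contained.
df <- data.frame(
  P = rep(c("SP1", "SP2", "SP3"), times = 3),
  Q = c("2012-01", "2022-01", "2022-01",
        rep("2022-02", 3), rep("2022-03", 3)),
  R = c(123, 143, 342, 542, 876, 982, 884, 936, 987),
  stringsAsFactors = FALSE
)
df2 <- split(df, df$P)

# Hypothetical output location: a "slices" folder in the session's temp dir.
out_dir <- file.path(tempdir(), "slices")
dir.create(out_dir, showWarnings = FALSE)

# One file per slice, named after the group: SP1.csv, SP2.csv, SP3.csv.
paths <- file.path(out_dir, paste0(names(df2), ".csv"))
for (i in seq_along(df2)) {
  write.csv(df2[[i]], paths[i], row.names = FALSE)
}

print(list.files(out_dir))
```

Writing each slice as soon as it is produced keeps the individual files small, and any one of them can be read back later with read.csv() without loading the rest.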