dplyr Package in R Programming
The dplyr package in the R programming language is a grammar of data manipulation: it provides a consistent set of verbs that solve the most common data manipulation problems.
dplyr makes data manipulation faster and easier in three ways:
- By constraining your options, it lets you focus on the data manipulation problem itself.
- It provides simple "verbs", functions that each handle one common data manipulation task, so thoughts can be translated into code quickly.
- It uses efficient backends, so you spend less time waiting for the computer.
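As a minimal sketch of how the verbs translate a thought into code, the chain below reads almost like the sentence "keep the children who go to school, add their age in months, then sort by age". The data frame `kids` and the column `age_months` are illustrative assumptions, not names from dplyr itself.

```r
library(dplyr)

# Hypothetical example data (names and values are illustrative only)
kids <- data.frame(
  name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
  age = c(7, 5, 9, 16),
  school = c("yes", "yes", "no", "no")
)

result <- kids %>%
  filter(school == "yes") %>%        # keep rows where school is "yes"
  mutate(age_months = age * 12) %>%  # add a derived column
  arrange(age)                       # sort by age, ascending

print(result)
```

Each step takes a data frame and returns a data frame, which is what allows the verbs to be chained with the `%>%` pipe.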
Important Verb Functions
The dplyr package provides several important functions for data manipulation:
- filter(): Selects cases (rows) based on their values.
R
# Load dplyr (provides filter() and the %>% pipe)
library(dplyr)

# Create a data frame with missing data
d <- data.frame(
  name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
  age = c(7, 5, 9, 16),
  ht = c(46, NA, NA, 69),
  school = c("yes", "yes", "no", "no")
)
d

# Finding rows with NA value in ht
d %>% filter(is.na(ht))

# Finding rows with no NA value in ht
d %>% filter(!is.na(ht))
Output:
# A tibble: 4 x 4
  name      age    ht school
1 Abhi        7    46 yes
2 Bhavesh     5    NA yes
3 Chaman      9    NA no
4 Dimri      16    69 no

# A tibble: 2 x 4
  name      age    ht school
1 Bhavesh     5    NA yes
2 Chaman      9    NA no

# A tibble: 2 x 4
  name      age    ht school
1 Abhi        7    46 yes
2 Dimri      16    69 no
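The example above filters only on missing values. As a small sketch of filtering on ordinary value conditions, the code below reuses the same data; the result names `older_in_school` and `young_or_old` are illustrative assumptions.

```r
library(dplyr)

d <- data.frame(
  name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
  age = c(7, 5, 9, 16),
  ht = c(46, NA, NA, 69),
  school = c("yes", "yes", "no", "no")
)

# Comma-separated conditions are combined with a logical AND
older_in_school <- filter(d, age > 6, school == "yes")
print(older_in_school)

# Use | for a logical OR
young_or_old <- filter(d, age < 6 | age > 10)
print(young_or_old)
```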
- arrange(): Reorders the cases (rows).
R
# Load dplyr (provides arrange())
library(dplyr)

# Create a data frame with missing data
d <- data.frame(
  name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
  age = c(7, 5, 9, 16),
  ht = c(46, NA, NA, 69),
  school = c("yes", "yes", "no", "no")
)

# Arranging rows in ascending order of age
d.name <- arrange(d, age)
print(d.name)
Output:
# A tibble: 4 x 4
  name      age    ht school
1 Bhavesh     5    NA yes
2 Abhi        7    46 yes
3 Chaman      9    NA no
4 Dimri      16    69 no
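arrange() sorts in ascending order by default. The sketch below, on the same data, shows descending order with desc() and how missing values are placed at the end; the result names `by_age_desc` and `by_ht` are illustrative assumptions.

```r
library(dplyr)

d <- data.frame(
  name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
  age = c(7, 5, 9, 16),
  ht = c(46, NA, NA, 69),
  school = c("yes", "yes", "no", "no")
)

# Descending order of age
by_age_desc <- arrange(d, desc(age))
print(by_age_desc)

# Sorting by a column with missing values: rows with NA come last
by_ht <- arrange(d, ht)
print(by_ht)
```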
- select() and rename(): select() picks variables (columns) based on their names; rename() changes the names of variables while keeping all of them.
R
# Load dplyr (provides select() and its helper functions)
library(dplyr)

# Create a data frame with missing data
d <- data.frame(
  name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
  age = c(7, 5, 9, 16),
  ht = c(46, NA, NA, 69),
  school = c("yes", "yes", "no", "no")
)

# starts_with() to print only the ht column
select(d, starts_with("ht"))

# -starts_with() to print everything except the ht column
select(d, -starts_with("ht"))

# Printing columns 1 to 2
select(d, 1:2)

# Printing columns whose name contains 'a'
select(d, contains("a"))

# Printing columns whose name matches the regular expression 'na'
select(d, matches("na"))
Output:
# A tibble: 4 x 1
     ht
1    46
2    NA
3    NA
4    69

# A tibble: 4 x 3
  name      age school
1 Abhi        7 yes
2 Bhavesh     5 yes
3 Chaman      9 no
4 Dimri      16 no

# A tibble: 4 x 2
  name      age
1 Abhi        7
2 Bhavesh     5
3 Chaman      9
4 Dimri      16

# A tibble: 4 x 2
  name      age
1 Abhi        7
2 Bhavesh     5
3 Chaman      9
4 Dimri      16

# A tibble: 4 x 1
  name
1 Abhi
2 Bhavesh
3 Chaman
4 Dimri
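The example above covers select() only. A minimal sketch of rename() on the same data follows; the new column name `height` and the result names `renamed` and `selected` are illustrative assumptions.

```r
library(dplyr)

d <- data.frame(
  name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
  age = c(7, 5, 9, 16),
  ht = c(46, NA, NA, 69),
  school = c("yes", "yes", "no", "no")
)

# rename() uses new_name = old_name and keeps every column
renamed <- rename(d, height = ht)
print(names(renamed))

# select() can rename too, but keeps only the columns listed
selected <- select(d, name, height = ht)
print(names(selected))
```

The difference is in what survives: rename() returns all four columns, while select() returns only the two that were named.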
- mutate() and transmute(): Add new variables that are functions of existing variables. mutate() keeps the existing variables; transmute() keeps only the new ones.
R
# Load dplyr (provides mutate() and transmute())
library(dplyr)

# Create a data frame with missing data
d <- data.frame(
  name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
  age = c(7, 5, 9, 16),
  ht = c(46, NA, NA, 69),
  school = c("yes", "yes", "no", "no")
)

# Calculating a variable x3, the sum of height and age,
# and keeping all existing columns
mutate(d, x3 = ht + age)

# Calculating the same variable x3,
# but keeping only the new column
transmute(d, x3 = ht + age)
Output:
# A tibble: 4 x 5
  name      age    ht school    x3
1 Abhi        7    46 yes       53
2 Bhavesh     5    NA yes       NA
3 Chaman      9    NA no        NA
4 Dimri      16    69 no        85

# A tibble: 4 x 1
     x3
1    53
2    NA
3    NA
4    85
- summarise(): Reduces multiple values down to a single summary value.
R
# Load dplyr (provides summarise())
library(dplyr)

# Create a data frame with missing data
d <- data.frame(
  name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
  age = c(7, 5, 9, 16),
  ht = c(46, NA, NA, 69),
  school = c("yes", "yes", "no", "no")
)

# Calculating mean of age
summarise(d, mean = mean(age))

# Calculating min of age
summarise(d, min = min(age))

# Calculating max of age
summarise(d, max = max(age))

# Calculating median of age
summarise(d, med = median(age))
Output:
# A tibble: 1 x 1
   mean
1  9.25

# A tibble: 1 x 1
    min
1     5

# A tibble: 1 x 1
    max
1    16

# A tibble: 1 x 1
    med
1     8
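The summaries above use the age column, which has no missing values. For a column such as ht, most summary functions return NA unless the missing values are dropped. The sketch below computes several summaries in one call; the result name `stats` and its column names are illustrative assumptions.

```r
library(dplyr)

d <- data.frame(
  name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
  age = c(7, 5, 9, 16),
  ht = c(46, NA, NA, 69),
  school = c("yes", "yes", "no", "no")
)

stats <- summarise(
  d,
  count = n(),                      # number of rows
  mean_age = mean(age),             # no missing values in age
  mean_ht = mean(ht, na.rm = TRUE), # drop NA before averaging
  missing_ht = sum(is.na(ht))       # how many heights are missing
)
print(stats)
```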
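- sample_n() and sample_frac(): Take random samples of rows, either a fixed number of rows or a fixed fraction of them. Because the rows are drawn at random, the output differs from run to run.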
R
# Load dplyr (provides sample_n() and sample_frac())
library(dplyr)

# Create a data frame with missing data
d <- data.frame(
  name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
  age = c(7, 5, 9, 16),
  ht = c(46, NA, NA, 69),
  school = c("yes", "yes", "no", "no")
)

# Randomly sampling three rows
sample_n(d, 3)

# Randomly sampling 50% of the rows
sample_frac(d, 0.50)
Output:
# A tibble: 3 x 4
  name      age    ht school
1 Abhi        7    46 yes
2 Bhavesh     5    NA yes
3 Chaman      9    NA no

# A tibble: 2 x 4
  name      age    ht school
1 Dimri      16    69 no
2 Bhavesh     5    NA yes
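In dplyr 1.0.0 and later, sample_n() and sample_frac() are marked as superseded in favour of slice_sample(). The sketch below shows the equivalent calls, assuming a dplyr version that includes slice_sample(); the seed value and the result names `three_rows` and `half_rows` are arbitrary choices for illustration.

```r
library(dplyr)

d <- data.frame(
  name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
  age = c(7, 5, 9, 16),
  ht = c(46, NA, NA, 69),
  school = c("yes", "yes", "no", "no")
)

# Fix the random seed so the same rows are drawn on every run
# (42 is an arbitrary value)
set.seed(42)

# Equivalent of sample_n(d, 3)
three_rows <- slice_sample(d, n = 3)
print(three_rows)

# Equivalent of sample_frac(d, 0.50)
half_rows <- slice_sample(d, prop = 0.5)
print(half_rows)
```

Setting a seed is optional, but it makes a random sample reproducible, which helps when sharing results.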