Windows Function in R using Dplyr

  • Last Updated : 10 Nov, 2022

Aggregation functions in R take a set of values and return a single value; sum() and mean() are common examples. Window functions are a variation on aggregation: they return as many outputs as there are inputs, so n input values produce n output values. In this article, we will discuss the window functions that are available in R through base R and the dplyr package.
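To make the distinction concrete, the minimal sketch below contrasts an aggregation with a window function using only base R. The vector x is an arbitrary example chosen for illustration.

```r
# A small numeric vector used only for illustration
x <- c(4, 3, 1, 2, 5)

# Aggregation: many inputs, one output
total <- sum(x)

# Window function: one output per input
running_total <- cumsum(x)

print(total)          # 15
print(running_total)  # 4 7 8 10 15
print(length(running_total) == length(x))  # TRUE
```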

The functions we will cover in this article are:

row_number(): ranks the values, giving tied values distinct ranks in order of appearance.
min_rank(): ranks the values, giving tied values the same (minimum) rank.
percent_rank(): rescales the minimum rank to a number between 0 and 1.
cume_dist(): computes the proportion of values less than or equal to the current value.
lead(): returns the next element in the sequence of values in the vector.
lag(): returns the previous element in the sequence of values in the vector.
cumsum(): computes the sum of the values encountered up to each index.
cumprod(): computes the product of the values encountered up to each index.
cummin(): computes the minimum value encountered up to each index.
cummax(): computes the maximum value encountered up to each index.
cummean(): computes the mean of the values encountered up to each index.
cumany(): checks whether any of the elements up to each index satisfy a condition.
cumall(): checks whether all of the elements up to each index satisfy a condition.

Let’s look at the syntax and code for each function.

Row_number 

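The row_number method ranks the values of a vector in increasing order and gives tied values distinct ranks in order of appearance. It is equivalent to the base R rank method with ties.method = "first". Missing values are left as NA.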

Syntax: row_number(vec)

Arguments:  vec- the vector of values that have to be ranked

R




library(dplyr)

# creating a character vector
companies <- c("Geekster", "Geeksforgeeks", "Wipro", "TCS")
# printing the original vector
print(companies)
# computing the row number of each element
rn <- row_number(companies)
print(rn)


Output:

"Geekster" "Geeksforgeeks" "Wipro" "TCS"          
2 1 4 3

Explanation:

The row numbers are the positions the elements would take if the vector were sorted in increasing order (the exact ordering of strings can depend on the locale's collation rules). "Geeksforgeeks", the second element, is the smallest lexicographically, so its row number is 1. It is followed by "Geekster" with row number 2. "TCS" gets row number 3 since it is next in order, and "Wipro" gets row number 4.

Min_rank

The min_rank method also ranks the values in increasing order, but tied values all receive the same rank, namely the smallest rank in the tied group. The ranks that the tied values would otherwise have occupied are skipped.

Syntax: min_rank(vec)

Arguments: vec- the vector of values that have to be ranked

R




# creating a vector with a repeated value
companies <- c("Geekster", "Geeksforgeeks",
               "Geekster", "Wipro", "TCS")
# computing the minimum rank of each element
ranks <- min_rank(companies)
print(ranks)


Output: 

2 1 2 5 4

Both occurrences of "Geekster" are tied, so each receives the minimum rank 2. Rank 3 is skipped, and "TCS" and "Wipro" get ranks 4 and 5.
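The difference between row_number and min_rank only shows up when values are tied. The short sketch below puts the two side by side; the scores vector is a hypothetical example, and it includes one missing value to show that both functions leave NA in place.

```r
library(dplyr)

# Hypothetical scores with one tie (10 appears twice) and one missing value
scores <- c(10, 20, NA, 10, 30)

# row_number() breaks ties by order of appearance
rn <- row_number(scores)
# min_rank() gives tied values the same (smallest) rank and skips the next rank
mr <- min_rank(scores)

print(rn)  # 1 3 NA 2 4
print(mr)  # 1 3 NA 1 4
```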

Percent_rank 

The percent_rank method rescales the minimum rank to a number between 0 and 1. For a vector of n non-missing values, it returns (min_rank - 1) / (n - 1).

Syntax: percent_rank(vec)

Arguments: vec- the vector of values that have to be ranked

R




# computing the percent rank of the companies vector defined above
pct_rank <- percent_rank(companies)
print(pct_rank)


Output:

0.25 0.00 0.25 1.00 0.75

The smallest value, "Geeksforgeeks", gets 0 and the largest, "Wipro", gets 1. The two tied "Geekster" entries share the value 0.25.
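The output above can be reproduced by hand in base R. This is only a sketch of the rescaling idea, not dplyr's internal code; the numeric vector x is a stand-in chosen to have the same ordering and tie as the companies vector.

```r
# Numeric stand-in with the same ordering and tie as the companies vector
x <- c(2, 1, 2, 5, 4)

# Minimum rank of each value (ties share the smallest rank)
r <- rank(x, ties.method = "min")

# Rescale the ranks to the interval [0, 1]
pr <- (r - 1) / (length(x) - 1)
print(pr)  # 0.25 0.00 0.25 1.00 0.75
```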

Cume_dist 

The cume_dist method in R is equivalent to an empirical cumulative distribution function. For each element, it computes the proportion of all values that are less than or equal to that element.

Syntax: cume_dist(vec)

Arguments: vec- the vector of values that have to be ranked

R




#computing the cume_dist of the used vector
dist <- cume_dist(companies)
print(dist)


Output:

0.6 0.2 0.6 1.0 0.8
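The same proportions can be worked out directly from the definition. The base R sketch below uses a numeric stand-in x (an assumption for illustration) with the same ordering and tie as the companies vector, and assumes there are no missing values.

```r
# Numeric stand-in with the same ordering and tie as the companies vector
x <- c(2, 1, 2, 5, 4)

# For each value, the share of values that are less than or equal to it
cd <- sapply(x, function(v) mean(x <= v))
print(cd)  # 0.6 0.2 0.6 1.0 0.8
```

For the tied value 2, three of the five values (1, 2, 2) are less than or equal to it, which gives 3 / 5 = 0.6.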

Lead

The lead window method in R returns, by default, the next element in the sequence of values in the vector. There is no next element for the last element of the input, so its lead value is NA.

Syntax: lead(vec)

Arguments: vec - the vector of values to be shifted

R




# creating a vector
vec <- c(4, 3, 1, 2, 5)
print(vec)

# computing the next value for each element
next_vals <- lead(vec)
print(next_vals)


Output:

4 3 1 2 5
3 1 2 5 NA

Lag

The lag window method in R returns, by default, the previous element in the sequence of values in the vector. There is no element before the first element of the input, so its lag value is NA. With dplyr loaded, lag() refers to the dplyr version, which masks stats::lag().

Syntax: lag(vec)

Arguments: vec - the vector of values to be shifted

R




# computing the previous value for each element
prev_vals <- lag(vec)
print(prev_vals)


Output:

NA  4  3  1  2

Explanation : 

The lag value is not applicable for the first element, so it is NA. For the second element, 3, the lag value is the first element, 4.
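Both lead and lag shift by one position by default. The sketch below illustrates two optional arguments of the dplyr functions, n (how many positions to shift) and default (the value used where no element exists), and a common use of lag to compute the change from the previous element. The variable names are chosen for illustration only.

```r
library(dplyr)

vec <- c(4, 3, 1, 2, 5)

# Shift by two positions instead of one
lag_2 <- dplyr::lag(vec, n = 2)
# Fill the gap at the end with 0 instead of NA
lead_filled <- dplyr::lead(vec, default = 0)
# Change from the previous element
change <- vec - dplyr::lag(vec)

print(lag_2)        # NA NA 4 3 1
print(lead_filled)  # 3 1 2 5 0
print(change)       # NA -1 -2 1 3
```

Calling dplyr::lag explicitly avoids any confusion with stats::lag, which has different behaviour.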

Cum Sum Method

The cumsum() method, which is part of base R, computes the sum of the values encountered up to each index. The cumsum value of the first element is the value itself.

Syntax: cumsum(vec)

Arguments: vec - the vector of values

R




# creating a vector
vec <- 1:5
# computing the cumulative sum
cum_sum <- cumsum(vec)
print(cum_sum)


Output:

1  3  6 10 15

Explanation : 

For the second element, 2, the cumulative sum is 1 + 2 = 3. For the third element, 3, the cumulative sum is 1 + 2 + 3 = 6. The remaining cumulative sums are calculated in the same way.

Cum Prod Method

The cumprod() method is used to compute the product of values encountered till that particular index. The cumprod value of the first element is equivalent to the value itself.

Syntax: cumprod(vec)

Arguments: vec- the vector of values

R




# computing the cumulative product of the vector 1:5 defined above
cum_prod <- cumprod(vec)
print(cum_prod)


Output:

1 2 6 24 120

Explanation :

The cumulative product of the first element is the value itself, 1. For the second element, 2, the cumulative product is 1 * 2 = 2. For the third element, 3, cumprod = 1 * 2 * 3 = 6. The remaining cumulative products are calculated in the same way.

Cum Min Method

The cummin() method is used to calculate the minimum value encountered until that particular index value. 

Syntax: cummin(vec)

Arguments: vec- the vector of values

R




#creating a vector
vec <- c(3,2,1,5,3)
cum_min <- cummin(vec)
print(cum_min)


Output:

3 2 1 1 1

Explanation : 

The minimum encountered up to the first element is the element itself, 3. At the second element, 2, the minimum becomes 2. At the third element, the minimum becomes 1. The fourth and fifth elements are greater than the current minimum, so it stays at 1.

Cum Max Method

The cummax() method is used to calculate the maximum value encountered until that particular index value.

Syntax: cummax(vec)

Arguments: vec- the vector of values

R




cum_max <- cummax(vec)
print(cum_max)


Output:

3 3 3 5 5

Cum Mean Method

The cummean() method is used to calculate the mean value encountered until that particular index value.

Syntax: cummean(vec)

Arguments: vec- the vector of values

R




cum_mean <- cummean(vec)
print(cum_mean)


Output:

3.00 2.50 2.00 2.75 2.80
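The running mean is the running sum divided by the number of elements seen so far. The base R sketch below reproduces the output above from that definition; it illustrates the idea and is not dplyr's own implementation.

```r
vec <- c(3, 2, 1, 5, 3)

# Running sum divided by the number of elements seen so far
running_mean <- cumsum(vec) / seq_along(vec)
print(running_mean)  # 3.00 2.50 2.00 2.75 2.80
```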

Cum Any Method

The cumany method checks, at each index, whether any element of a logical vector up to and including that index is TRUE. Once a TRUE is encountered, every later result is TRUE.

Syntax: cumany(vec)

Arguments: vec- the vector of values

R




cum_any_3 <- cumany(vec>3)
print("Any vector values greater than 3")
print(cum_any_3)
 
cum_any_0 <- cumany(vec==0)
print("Any vector values equal to 0")
print(cum_any_0)


Output:

"Any vector values greater than 3"
FALSE FALSE FALSE  TRUE  TRUE
"Any vector values equal to 0"
FALSE FALSE FALSE FALSE FALSE

Cum All Method

The cumall method checks, at each index, whether all elements of a logical vector up to and including that index are TRUE. Once a FALSE is encountered, every later result is FALSE.

Syntax: cumall(vec)

Arguments: vec- the vector of values

R




# using the cumall method
cum_all_3 <- cumall(vec > 3)
print(cum_all_3)


Output:

FALSE FALSE FALSE FALSE FALSE
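Because the first element already fails the condition, the result above is FALSE throughout. A condition that holds for the first few elements shows the behaviour more clearly. The sketch below mimics the "any so far" and "all so far" logic in base R for the condition vec < 5, assuming the vector has no missing values; it is an illustration of the logic rather than dplyr's implementation.

```r
vec <- c(3, 2, 1, 5, 3)
cond <- vec < 5  # TRUE TRUE TRUE FALSE TRUE

# "Any so far": TRUE once at least one TRUE has been seen
any_so_far <- cumsum(cond) > 0
# "All so far": TRUE only while no FALSE has been seen
all_so_far <- cumsum(!cond) == 0

print(any_so_far)  # TRUE TRUE TRUE TRUE TRUE
print(all_so_far)  # TRUE TRUE TRUE FALSE FALSE
```

The last element satisfies the condition again, but the "all so far" result stays FALSE because the fourth element has already failed it.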
