The apply function in R is a simple, compact alternative to explicit loops. It lets you apply a function to a matrix or data frame by row, by column, or to each individual element. Below are a few basic uses of this powerful function as well as one of its sister functions, lapply. There are other functions in the apply family (sapply, mapply, rollapply, etc.) that I won't discuss in this tutorial.
The apply function has three basic arguments. First is the data to manipulate (df), second is MARGIN, which determines how the function will traverse the data frame, and third is FUN, the function to be applied (in this case, the mean).
MARGIN = 1 means apply the function by rows
MARGIN = 2 means apply by column
MARGIN = c(1,2) means apply the function to each individual element (every row and column combination) of the data frame.
Below I calculate the mean of each column of the data frame. The output is a vector of length 3.
x = apply(df, MARGIN = 2, FUN = mean)
x
## x1 x2 x3
## 10005.5 15.5 25.5
Applying with MARGIN = 1 (by row) instead returns a vector of length 10, one mean per row.
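As a minimal sketch of the row-wise and element-wise cases: the construction of df is not shown in this post, so the version here is an assumption chosen to match the column means printed above, and add_dollar is a hypothetical helper used only for illustration.

```r
# Assumed data: three numeric columns consistent with the column means shown above
df = data.frame(x1 = 10001:10010, x2 = 11:20, x3 = 21:30)

# MARGIN = 1: FUN is called once per row, giving one mean per row
row_means = apply(df, MARGIN = 1, FUN = mean)
length(row_means)   # 10

# MARGIN = c(1, 2): FUN is called once per cell
add_dollar = function(x) paste0('$', x)   # hypothetical helper
cells = apply(df, MARGIN = c(1, 2), FUN = add_dollar)
dim(cells)          # 10 3
```

Keep in mind that apply converts a data frame to a matrix before calling FUN, so a data frame with mixed column types would be coerced to character first.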
For this next example I will create a new data frame and demonstrate lapply, which applies a function to each column of a data frame and returns a list. This is powerful because it lets you apply the function conditionally, for example only to columns of a certain type.
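The code that builds this data frame is not shown, so the following is only one construction that would be consistent with the printed output; the exact call the author used is an assumption.

```r
# Assumed construction: three numeric columns and one character column
df = data.frame(
  x1 = 10001:10010,
  x2 = 11:20,
  x3 = 21:30,
  x4 = LETTERS[1:10],
  stringsAsFactors = FALSE   # keep x4 as character on older versions of R
)
df
```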
## x1 x2 x3 x4
## 1 10001 11 21 A
## 2 10002 12 22 B
## 3 10003 13 23 C
## 4 10004 14 24 D
## 5 10005 15 25 E
## 6 10006 16 26 F
## 7 10007 17 27 G
## 8 10008 18 28 H
## 9 10009 19 29 I
## 10 10010 20 30 J
Notice in the data frame I have three numeric columns and one character column. I want to format only the numeric columns as currency. First I write a user-defined function to test if the column is numeric and format if it is.
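#Apply function to only certain column types of dataset
func = function(x) {
  if (is.numeric(x)) #Test if input is numeric
    paste0('$', format(x, big.mark = ',')) #If TRUE, format as currency
  else as.character(x) #If FALSE, return as character
}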
Then I apply the function by column using lapply.
x = lapply(df, FUN = func) #Apply function to each element of the data frame
x
The output of the lapply function is always a list. You can convert to a data frame as shown below by wrapping the lapply function in a data.frame() function.
x = data.frame(lapply(df, FUN = func)) #Apply function to each element of the data frame
x
## x1 x2 x3 x4
## 1 $10,001 $11 $21 A
## 2 $10,002 $12 $22 B
## 3 $10,003 $13 $23 C
## 4 $10,004 $14 $24 D
## 5 $10,005 $15 $25 E
## 6 $10,006 $16 $26 F
## 7 $10,007 $17 $27 G
## 8 $10,008 $18 $28 H
## 9 $10,009 $19 $29 I
## 10 $10,010 $20 $30 J
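A possible alternative to wrapping the call in data.frame() is to assign the list back into the columns of a copy with `[]`, which keeps the data frame structure. This is a sketch, not the author's method: df is rebuilt here under the same assumed construction, and the name formatted is hypothetical.

```r
# Assumed data, matching the printed data frame
df = data.frame(x1 = 10001:10010, x2 = 11:20, x3 = 21:30,
                x4 = LETTERS[1:10], stringsAsFactors = FALSE)

# Same helper as in the tutorial: format numeric columns as currency
func = function(x) {
  if (is.numeric(x)) paste0('$', format(x, big.mark = ','))
  else as.character(x)
}

out = lapply(df, FUN = func)
class(out)         # "list"

# Assigning into formatted[] replaces the columns but keeps the data frame
formatted = df
formatted[] = lapply(df, FUN = func)
class(formatted)   # "data.frame"
```

Because format() pads values to a common width within each column, columns whose values differ in length may need `trim = TRUE` to avoid spaces after the dollar sign.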
Finally, in contrast to the lapply approach, if I wanted to perform that manipulation without the apply family, it would require a nested for loop.
#Nested for loop to format dataframe as currency
for(j in 1:ncol(df)){
  if (is.numeric(df[, j])) {
    for (i in 1:nrow(df)) {
      df[i,j] = paste0('$', format(df[i,j], big.mark = ','))
    }
  }
}
df
## x1 x2 x3 x4
## 1 $10,001 $11 $21 A
## 2 $10002 $12 $22 B
## 3 $10003 $13 $23 C
## 4 $10004 $14 $24 D
## 5 $10005 $15 $25 E
## 6 $10006 $16 $26 F
## 7 $10007 $17 $27 G
## 8 $10008 $18 $28 H
## 9 $10009 $19 $29 I
## 10 $10010 $20 $30 J
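This produces nearly the same output but is more complex to code. Note that only the first row of x1 keeps its thousands separator, apparently because the first assignment turns the whole column into character, so format() no longer treats the remaining values as numbers.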
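If a loop is still preferred, one sketch is to loop over columns only and format each numeric column in a single vectorized step. The column is then still numeric when format() runs, so every row gets its separator. As before, the construction of df is an assumption.

```r
# Assumed data, matching the printed data frame
df = data.frame(x1 = 10001:10010, x2 = 11:20, x3 = 21:30,
                x4 = LETTERS[1:10], stringsAsFactors = FALSE)

# Single loop over columns; each numeric column is formatted all at once
for (j in seq_along(df)) {
  if (is.numeric(df[[j]])) {
    df[[j]] = paste0('$', format(df[[j]], big.mark = ','))
  }
}
df
```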
Using the apply family of functions makes data manipulations simpler and more concise. I have only scratched the surface of what these powerful functions can do.