The apply function in R is used as a fast and simple alternative to loops. It allows users to apply a function to a matrix or data frame by row, by column, or to every element. Below are a few basic uses of this powerful function as well as one of its sister functions, lapply. There are other functions in the apply family (sapply, mapply, rollapply, etc.) that I won’t discuss in this tutorial.
First I create a data frame.
df = data.frame(x1 = 10001:10010, x2 = 11:20, x3 = 21:30)
df
##       x1 x2 x3
## 1  10001 11 21
## 2  10002 12 22
## 3  10003 13 23
## 4  10004 14 24
## 5  10005 15 25
## 6  10006 16 26
## 7  10007 17 27
## 8  10008 18 28
## 9  10009 19 29
## 10 10010 20 30
The apply function has three basic arguments. First is the data to manipulate (df), second is MARGIN which is how the function will traverse the data frame and third is FUN, the function to be applied (in this case the mean).
- MARGIN = 1 means apply the function by rows
- MARGIN = 2 means apply by column
- MARGIN = c(1,2) means apply to each individual element (by row and by column).
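To make the three MARGIN settings concrete, here is a minimal sketch on a small matrix. The matrix m and the use of sum are illustrative choices of mine, not part of the examples in this post. It shows the shape of the result in each case.

```r
# Illustrative 2 x 3 matrix (hypothetical data, small enough to check by hand)
m = matrix(1:6, nrow = 2)

by_row  = apply(m, MARGIN = 1, FUN = sum)                       # one value per row
by_col  = apply(m, MARGIN = 2, FUN = sum)                       # one value per column
by_cell = apply(m, MARGIN = c(1, 2), FUN = function(v) v * 10)  # one value per element

by_row        # 9 12
by_col        # 3 7 11
dim(by_cell)  # 2 3
```

With MARGIN = 1 or 2 the result here collapses to a vector, while MARGIN = c(1, 2) keeps the original dimensions.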
Below I calculate the mean of each column of the data frame. The output is a vector of length 3.
x = apply(df, MARGIN = 2, FUN = mean)
x
##      x1      x2      x3
## 10005.5    15.5    25.5
With MARGIN = 1 (by row), the output is a vector of length 10.
x = apply(df, MARGIN = 1, FUN = mean)
x
##  [1] 3344.333 3345.333 3346.333 3347.333 3348.333 3349.333 3350.333
##  [8] 3351.333 3352.333 3353.333
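For the specific case of means, base R also provides colMeans and rowMeans, which should give the same numbers as the two apply calls above. A quick check, assuming the same df as in this post:

```r
# Same data frame as above, recreated so the snippet runs on its own
df = data.frame(x1 = 10001:10010, x2 = 11:20, x3 = 21:30)

col_apply = apply(df, MARGIN = 2, FUN = mean)
row_apply = apply(df, MARGIN = 1, FUN = mean)

# all.equal compares values with a small numeric tolerance;
# check.attributes = FALSE ignores any difference in names
all.equal(col_apply, colMeans(df), check.attributes = FALSE)
all.equal(row_apply, rowMeans(df), check.attributes = FALSE)
```

apply remains the more general tool, since it accepts any function, not only the mean.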
The applied function can also be user-defined. Here I create a function where I calculate the mean of the input and add 1 to it.
#Define the function
func = function(x) mean(x) + 1
#Apply by column
x = apply(df, MARGIN = 2, FUN = func)
x
##      x1      x2      x3
## 10006.5    16.5    26.5
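Two common variations on this pattern, sketched with the same df: the function can be written inline as an anonymous function, and extra arguments given after FUN are passed through to the applied function. The NA value inserted below is purely illustrative and is not part of the original data.

```r
df = data.frame(x1 = 10001:10010, x2 = 11:20, x3 = 21:30)

# Anonymous function: same result as defining func separately
x_inline = apply(df, MARGIN = 2, FUN = function(x) mean(x) + 1)

# Extra arguments after FUN are forwarded to it (here, na.rm goes to mean)
df_na = df
df_na[1, "x2"] = NA   # hypothetical missing value, for illustration only
x_na = apply(df_na, MARGIN = 2, FUN = mean, na.rm = TRUE)

x_inline
x_na
```

Without na.rm = TRUE, the mean of x2 in df_na would come back as NA.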
Another usage is to apply a function to each element of a data frame. In the example below I add a dollar sign to each element of the data frame.
#Apply function to each element of data frame
func = function(x) paste0('$', x)
x = apply(df, MARGIN = c(1,2), FUN = func)
x
##       x1       x2    x3
##  [1,] "$10001" "$11" "$21"
##  [2,] "$10002" "$12" "$22"
##  [3,] "$10003" "$13" "$23"
##  [4,] "$10004" "$14" "$24"
##  [5,] "$10005" "$15" "$25"
##  [6,] "$10006" "$16" "$26"
##  [7,] "$10007" "$17" "$27"
##  [8,] "$10008" "$18" "$28"
##  [9,] "$10009" "$19" "$29"
## [10,] "$10010" "$20" "$30"
For this next example I will create a new data frame and demonstrate lapply, which applies a function to a data frame by column and returns a list. This approach is powerful because it lets you apply the function conditionally, for example only to columns of a certain type.
df = data.frame(x1 = 10001:10010, x2 = 11:20, x3 = 21:30, x4 = LETTERS[1:10])
df
##       x1 x2 x3 x4
## 1  10001 11 21  A
## 2  10002 12 22  B
## 3  10003 13 23  C
## 4  10004 14 24  D
## 5  10005 15 25  E
## 6  10006 16 26  F
## 7  10007 17 27  G
## 8  10008 18 28  H
## 9  10009 19 29  I
## 10 10010 20 30  J
Notice that the data frame has three numeric columns and one character column. I want to format only the numeric columns as currency. First I write a user-defined function that tests whether the column is numeric and formats it if it is.
#Apply function to only certain column types of dataset
func = function(x) {
  if (is.numeric(x)) #Test if input is numeric
    paste0('$', format(x, big.mark = ',')) #If TRUE, format as currency
  else as.character(x) #If FALSE, return as character
}
Then I apply the function by column using lapply.
x = lapply(df, FUN = func) #Apply function to each column of the data frame
x
## $x1
##  [1] "$10,001" "$10,002" "$10,003" "$10,004" "$10,005" "$10,006" "$10,007"
##  [8] "$10,008" "$10,009" "$10,010"
##
## $x2
##  [1] "$11" "$12" "$13" "$14" "$15" "$16" "$17" "$18" "$19" "$20"
##
## $x3
##  [1] "$21" "$22" "$23" "$24" "$25" "$26" "$27" "$28" "$29" "$30"
##
## $x4
##  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J"
The output of the lapply function is always a list. You can convert it to a data frame, as shown below, by wrapping the lapply call in data.frame().
x = data.frame(lapply(df, FUN = func)) #Apply function to each column and convert the list to a data frame
x
##         x1  x2  x3 x4
## 1  $10,001 $11 $21  A
## 2  $10,002 $12 $22  B
## 3  $10,003 $13 $23  C
## 4  $10,004 $14 $24  D
## 5  $10,005 $15 $25  E
## 6  $10,006 $16 $26  F
## 7  $10,007 $17 $27  G
## 8  $10,008 $18 $28  H
## 9  $10,009 $19 $29  I
## 10 $10,010 $20 $30  J
Finally, for contrast with lapply: if I wanted to perform that manipulation without the apply family, it would require a nested for loop.
#Nested for loop to format dataframe as currency
out = df #Copy that receives the formatted text, so df stays numeric while looping
for (j in 1:ncol(df)) {
  if (is.numeric(df[, j])) {
    for (i in 1:nrow(df)) {
      out[i, j] = paste0('$', format(df[i, j], big.mark = ','))
    }
  }
}
out
##         x1  x2  x3 x4
## 1  $10,001 $11 $21  A
## 2  $10,002 $12 $22  B
## 3  $10,003 $13 $23  C
## 4  $10,004 $14 $24  D
## 5  $10,005 $15 $25  E
## 6  $10,006 $16 $26  F
## 7  $10,007 $17 $27  G
## 8  $10,008 $18 $28  H
## 9  $10,009 $19 $29  I
## 10 $10,010 $20 $30  J
This produces the same output but is more complex to code.
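A related idiom, offered as a sketch rather than as part of the walkthrough above: assigning the lapply result into df[] keeps the data frame structure without the data.frame() wrapper. The name out is mine, and stringsAsFactors = FALSE is set as an assumption so that the character column behaves the same across R versions.

```r
df = data.frame(x1 = 10001:10010, x2 = 11:20, x3 = 21:30,
                x4 = LETTERS[1:10], stringsAsFactors = FALSE)

# Same formatting function as in the post
func = function(x) {
  if (is.numeric(x)) paste0('$', format(x, big.mark = ','))
  else as.character(x)
}

out = df
out[] = lapply(df, FUN = func)   # replace every column, keep names and row names

str(out)
```

Because the empty brackets replace the contents column by column, out remains a data frame with the original column names, and df itself is left unchanged.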
Using the apply family of functions makes data manipulations simpler and faster. I have only scratched the surface of what these powerful functions can do.