
The apply function in R is used as a fast and simple alternative to loops. It allows users to apply a function to a matrix or data frame by row, by column, or element by element. Below are a few basic uses of this powerful function, as well as one of its sister functions, lapply. There are other functions in the apply family (sapply, mapply, rollapply from the zoo package, etc.) that I won’t discuss in this tutorial.

First I create a data frame.

df = data.frame(x1 = 10001:10010
                , x2 = 11:20
                , x3 = 21:30)
df
##       x1 x2 x3
## 1  10001 11 21
## 2  10002 12 22
## 3  10003 13 23
## 4  10004 14 24
## 5  10005 15 25
## 6  10006 16 26
## 7  10007 17 27
## 8  10008 18 28
## 9  10009 19 29
## 10 10010 20 30

The apply function has three basic arguments. The first is X, the data to manipulate (here df); the second is MARGIN, which controls how the function traverses the data frame; and the third is FUN, the function to be applied (in this case, mean).

  • MARGIN = 1 means apply the function by row
  • MARGIN = 2 means apply the function by column
  • MARGIN = c(1,2) means apply the function to each individual element of the data frame
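
As a small sketch of the three MARGIN settings, the example below uses a hypothetical 2 x 3 matrix m (not part of the original data) so the results are easy to check by hand.

```r
# Hypothetical 2 x 3 matrix, filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2)

by_row  <- apply(m, MARGIN = 1, FUN = sum)  # one result per row: 9 12
by_col  <- apply(m, MARGIN = 2, FUN = sum)  # one result per column: 3 7 11
by_cell <- apply(m, MARGIN = c(1, 2), FUN = function(v) v * 10)  # same shape as m

by_row
by_col
by_cell
```

The length of the result follows the margin: two values for the two rows, three for the three columns, and a full 2 x 3 matrix when the function is applied element by element.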

Below I calculate the mean of each column of the data frame. The output is a vector of length 3.

x = apply(df, MARGIN = 2, FUN = mean)
x
##      x1      x2      x3 
## 10005.5    15.5    25.5

With MARGIN = 1 (by row), the output is a vector of length 10.

x = apply(df, MARGIN = 1, FUN = mean)
x
##  [1] 3344.333 3345.333 3346.333 3347.333 3348.333 3349.333 3350.333
##  [8] 3351.333 3352.333 3353.333
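
For the specific case of means, base R also provides the shortcuts colMeans and rowMeans. As a quick check, assuming the same three-column df as above, the sketch below confirms that they return the same values as the two apply calls. Names are dropped with unname() so that only the values are compared.

```r
df <- data.frame(x1 = 10001:10010, x2 = 11:20, x3 = 21:30)

col_means_apply <- apply(df, MARGIN = 2, FUN = mean)
row_means_apply <- apply(df, MARGIN = 1, FUN = mean)

# all.equal() compares numeric values with a small tolerance
all.equal(unname(col_means_apply), unname(colMeans(df)))
all.equal(unname(row_means_apply), unname(rowMeans(df)))
```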

The applied function can also be user-defined. Here I create a function where I calculate the mean of the input and add 1 to it.

#Define the function
func = function(x) mean(x) + 1

#Apply by column
x = apply(df, MARGIN = 2, FUN = func)
x
##      x1      x2      x3 
## 10006.5    16.5    26.5
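
Any further arguments supplied after FUN are passed through to the function on every call. The sketch below illustrates this with mean and its na.rm argument. It assumes the same df, and the copy df_na with one missing value is made up purely for illustration.

```r
df <- data.frame(x1 = 10001:10010, x2 = 11:20, x3 = 21:30)

# Hypothetical copy of df with a single missing value in x2
df_na <- df
df_na[1, "x2"] <- NA

# Without na.rm, the mean of x2 is NA
with_na <- apply(df_na, MARGIN = 2, FUN = mean)

# na.rm = TRUE is forwarded to mean() for each column
without_na <- apply(df_na, MARGIN = 2, FUN = mean, na.rm = TRUE)

with_na
without_na
```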

Another usage is to apply a function to each element of a data frame. In the example below I add a dollar sign to each element of the data frame.

#Apply function to each element of data frame
func = function(x) paste0('$', x)
x = apply(df, MARGIN = c(1,2), FUN = func)
x
##       x1       x2    x3   
##  [1,] "$10001" "$11" "$21"
##  [2,] "$10002" "$12" "$22"
##  [3,] "$10003" "$13" "$23"
##  [4,] "$10004" "$14" "$24"
##  [5,] "$10005" "$15" "$25"
##  [6,] "$10006" "$16" "$26"
##  [7,] "$10007" "$17" "$27"
##  [8,] "$10008" "$18" "$28"
##  [9,] "$10009" "$19" "$29"
## [10,] "$10010" "$20" "$30"
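
The quoted values and the [1,] row labels in this output are a hint that the result is no longer a data frame: apply converts its input to a matrix (via as.matrix) and here returns a character matrix. A short check, assuming the same three-column df and a dollar-sign function (named add_dollar here for illustration):

```r
df <- data.frame(x1 = 10001:10010, x2 = 11:20, x3 = 21:30)
add_dollar <- function(x) paste0("$", x)

x <- apply(df, MARGIN = c(1, 2), FUN = add_dollar)

is.matrix(x)  # the result is a matrix, not a data frame
typeof(x)     # its elements are character strings
dim(x)        # same shape as df: 10 rows, 3 columns

# Wrap in as.data.frame() if a data frame is needed again
x_df <- as.data.frame(x)
```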

For this next example I will create a new data frame and demonstrate lapply, which applies a function to each column of a data frame and returns a list. This is powerful because the function can inspect each column and treat it differently, for example depending on its type.

df = data.frame(x1 = 10001:10010
                , x2 = 11:20
                , x3 = 21:30
                , x4 = LETTERS[1:10])
df
##       x1 x2 x3 x4
## 1  10001 11 21  A
## 2  10002 12 22  B
## 3  10003 13 23  C
## 4  10004 14 24  D
## 5  10005 15 25  E
## 6  10006 16 26  F
## 7  10007 17 27  G
## 8  10008 18 28  H
## 9  10009 19 29  I
## 10 10010 20 30  J

Notice in the data frame I have three numeric columns and one character column. I want to format only the numeric columns as currency. First I write a user-defined function to test if the column is numeric and format if it is.

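#Apply function to only certain column types of dataset
func = function(x) {
  if (is.numeric(x)) #Test if input is numeric
      paste0('$', format(x, big.mark = ',')) #If TRUE, format as currency
  else as.character(x) #If FALSE, return as character
}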

Then I apply the function by column using lapply.

x = lapply(df, FUN = func) #Apply function to each column of the data frame
x
## $x1
##  [1] "$10,001" "$10,002" "$10,003" "$10,004" "$10,005" "$10,006" "$10,007"
##  [8] "$10,008" "$10,009" "$10,010"
## 
## $x2
##  [1] "$11" "$12" "$13" "$14" "$15" "$16" "$17" "$18" "$19" "$20"
## 
## $x3
##  [1] "$21" "$22" "$23" "$24" "$25" "$26" "$27" "$28" "$29" "$30"
## 
## $x4
##  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J"
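
One detail to be aware of: format() pads the values in a vector to a common width. That is invisible above because every value within a column has the same number of digits, but with mixed magnitudes the padding ends up between the dollar sign and the number. A hedged sketch using made-up amounts, where trim = TRUE (a documented format argument) suppresses the padding:

```r
amounts <- c(5, 1200, 25000)  # hypothetical values of different widths

# Default: values are right-justified to a common width before pasting
padded <- paste0("$", format(amounts, big.mark = ","))

# trim = TRUE drops the leading blanks used for justification
trimmed <- paste0("$", format(amounts, big.mark = ",", trim = TRUE))

padded
trimmed  # "$5" "$1,200" "$25,000"
```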

The output of the lapply function is always a list. You can convert to a data frame as shown below by wrapping the lapply function in a data.frame() function.

x = data.frame(lapply(df, FUN = func)) #Apply function to each column, then convert the list to a data frame
x
##         x1  x2  x3 x4
## 1  $10,001 $11 $21  A
## 2  $10,002 $12 $22  B
## 3  $10,003 $13 $23  C
## 4  $10,004 $14 $24  D
## 5  $10,005 $15 $25  E
## 6  $10,006 $16 $26  F
## 7  $10,007 $17 $27  G
## 8  $10,008 $18 $28  H
## 9  $10,009 $19 $29  I
## 10 $10,010 $20 $30  J
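
A common base R idiom for the same round trip is to assign the list back into the data frame with empty brackets, which keeps the data frame structure without a data.frame() call. A minimal sketch, assuming the same four-column df; the names fmt and out are hypothetical and used only here.

```r
df <- data.frame(x1 = 10001:10010, x2 = 11:20, x3 = 21:30,
                 x4 = LETTERS[1:10])

# Same logic as the formatting function above
fmt <- function(x) {
  if (is.numeric(x)) paste0("$", format(x, big.mark = ","))
  else as.character(x)
}

out <- df                 # work on a copy so df keeps its numeric columns
out[] <- lapply(df, fmt)  # replace every column, keep the data frame shell

class(out)
head(out, 3)
```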

Finally, for contrast with lapply: if I wanted to perform that manipulation without the apply family, it would require a nested for loop.

#Nested for loop to format dataframe as currency
out = df #Copy to fill in, so format() always reads the original numeric values
for (j in 1:ncol(df)) {
  if (is.numeric(df[, j])) {
    for (i in 1:nrow(df)) {
      out[i, j] = paste0('$', format(df[i, j], big.mark = ','))
    }
  }
}
out
##         x1  x2  x3 x4
## 1  $10,001 $11 $21  A
## 2  $10,002 $12 $22  B
## 3  $10,003 $13 $23  C
## 4  $10,004 $14 $24  D
## 5  $10,005 $15 $25  E
## 6  $10,006 $16 $26  F
## 7  $10,007 $17 $27  G
## 8  $10,008 $18 $28  H
## 9  $10,009 $19 $29  I
## 10 $10,010 $20 $30  J

This produces the same output but is more complex to code.
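
Because paste0 and format are vectorized, the inner loop over rows is not strictly necessary: a single loop over columns can format each column in one step. The sketch below checks that this gives the same list as lapply. The names fmt, via_lapply and via_loop are hypothetical, and it assumes the same four-column df.

```r
df <- data.frame(x1 = 10001:10010, x2 = 11:20, x3 = 21:30,
                 x4 = LETTERS[1:10], stringsAsFactors = FALSE)

fmt <- function(x) {
  if (is.numeric(x)) paste0("$", format(x, big.mark = ","))
  else as.character(x)
}

# lapply version: one call, returns a named list
via_lapply <- lapply(df, fmt)

# Loop version: pre-allocate a named list, then fill it column by column
via_loop <- vector("list", ncol(df))
names(via_loop) <- names(df)
for (j in seq_along(df)) {
  via_loop[[j]] <- fmt(df[[j]])  # whole column at once
}

identical(via_lapply, via_loop)
```

Even in this shorter form, the loop needs explicit pre-allocation and indexing that lapply handles on its own.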

Using the apply family of functions makes data manipulations simpler and faster. I have only touched the surface of the functionality of these powerful functions.

