Functions in R

A function is a set of statements organized together to perform a specific task. R has a large number of built-in functions, and users can also create their own.

In R, a function is an object, so the R interpreter can pass control to it along with any arguments the function needs to carry out its actions.

The function in turn performs its task and returns control to the interpreter, along with any result, which can be stored in other objects.
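Because a function is an ordinary object, it can be assigned to another name or passed to another function, and whatever it returns can be stored in a variable. A minimal sketch in base R (the names `square`, `f`, and `result` are only illustrative):

```r
# define a small function; `square` is an illustrative name
square <- function(x) {
  x^2
}

# a function is an object: it can be assigned to another name ...
f <- square
f(4)
## [1] 16

# ... or passed as an argument to another function
sapply(1:3, square)
## [1] 1 4 9

# the value a function returns can be stored in another object
result <- square(5)
result
## [1] 25
```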

Function Components

The different parts of a function are:

Function Name − This is the actual name of the function. It is stored in the R environment as an object with this name.
Arguments − An argument is a placeholder. When a function is invoked, you pass a value to the argument. Arguments are optional; that is, a function may have no arguments at all. Arguments can also have default values.
Function Body − The function body contains a collection of statements that defines what the function does.
Return Value − The return value of a function is the last expression evaluated in the function body, unless return() is called earlier.
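# the syntax of creating functions
new_function_name <- function(arguments){
  # FUNCTION_BODY: everything you want the function to do to the arguments
  result <- arguments   # placeholder for the real work
  result                # the last evaluated expression is returned
}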
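To see the four components in a concrete function, here is a minimal sketch: a hypothetical `add_tax()` with one required argument and one argument with a default value, inspected with the base R helpers `formals()` and `body()`.

```r
# Function name: add_tax (hypothetical example)
# Arguments: price (required) and rate (optional, default 0.1)
add_tax <- function(price, rate = 0.1) {
  # Function body: the statements that do the work
  tax <- price * rate
  price + tax   # Return value: the last expression evaluated
}

names(formals(add_tax))    # the argument names
## [1] "price" "rate"
body(add_tax)              # prints the statements in the body
add_tax(100)               # uses the default rate of 0.1
## [1] 110
add_tax(100, rate = 0.2)   # overrides the default
## [1] 120
```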

R has many built-in functions that can be called directly in a program without defining them first. We can also create and use our own functions, referred to as user-defined functions.

Built-in versus User-defined Functions

Simple examples of built-in functions are seq(), mean(), max(), sum() and paste(). They come with R itself or with installed packages, and user-written programs can call them directly.
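A quick sketch calling the built-in functions named above directly, with no setup (all of them are part of base R):

```r
x <- seq(1, 10, by = 3)   # a sequence from 1 to 10 in steps of 3
x
## [1]  1  4  7 10
mean(x)
## [1] 5.5
max(x)
## [1] 10
sum(x)
## [1] 22
paste("The sum is", sum(x))
## [1] "The sum is 22"
```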

We can create user-defined functions in R that do exactly what a user needs, and once created they can be used just like built-in functions.

A good rule of thumb: if you have copied and pasted a block of code more than twice, it’s quicker and cleaner to write a function.
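As a sketch of that rule, suppose the same rescaling step has been copied for several vectors (the rescaling task and the name `rescale01` are illustrative, not from this lesson). Writing the logic once as a function removes the repetition and the risk of a typo in one of the copies:

```r
a <- c(2, 4, 6)
b <- c(10, 20, 30)

# copy-and-paste version: every copy must be edited correctly
a_scaled <- (a - min(a)) / (max(a) - min(a))
b_scaled <- (b - min(b)) / (max(b) - min(b))

# function version: the logic lives in one place
rescale01 <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

rescale01(a)
## [1] 0.0 0.5 1.0
rescale01(b)
## [1] 0.0 0.5 1.0
```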

Creating User-defined Functions

# load the tidyverse
library(tidyverse)

# create a function that takes a name input and prints it to the stdout
my_function <- function(my_name){
  print(paste0("My name is ", my_name))
}

# call the function
my_function("Cari")
## [1] "My name is Cari"

Multiple Variables in Functions

You can also write functions that take multiple input variables (arguments) to complete the tasks you need, using the general layout below:
# the syntax of using multiple variables in a function
new_function_name <- function(x, y){
  # everything you want the function to do to x
  # additional things to do to y
  x + y   # the last expression, here x + y, is the return value
}
The example below demonstrates how to write a function with multiple arguments and the different ways those arguments can be supplied when the function is called.
pow <- function(x, y) {
  # function to print x raised to the power y
  result <- x^y
  print(paste(x,"raised to the power", y, "is", result))
}
The arguments can be unnamed, in which case they are matched by position, following the order listed in the function definition:
# unnamed arguments
pow(8, 2)
## [1] "8 raised to the power 2 is 64"
The arguments can be named, in which case order doesn’t matter:
# named arguments
pow(y = 2, x = 8)
## [1] "8 raised to the power 2 is 64"
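Lastly, named and unnamed arguments can be mixed. R matches the named arguments first and then fills the remaining ones by position, so below x is 8 and the unnamed 2 goes to y: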
# mixed named/unnamed arguments
pow(2, x = 8)
## [1] "8 raised to the power 2 is 64"
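The same matching rules apply when an argument has a default value, as mentioned under Function Components. In this sketch, a hypothetical variant `pow_default()` gives `y` a default of 2 so it can be left out, and it returns the number instead of printing a sentence, so the result can be reused:

```r
# y defaults to 2 when the caller does not supply it
pow_default <- function(x, y = 2) {
  x^y
}

pow_default(8)          # y falls back to its default
## [1] 64
pow_default(8, 3)       # positional: x = 8, y = 3
## [1] 512
pow_default(y = 3, 2)   # y is named, so the unnamed 2 fills x
## [1] 8
```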

A More Complex Example

Using the mpg dataset (from the ggplot2 package, which the tidyverse loads), write a function that takes a manufacturer, model, and year of car and calculates the average mpg, here the mean of city (cty) and highway (hwy) mileage.
# write the function in pieces
avg_mpg <- function(MANU, MODEL, YEAR){ # identify the variables
  man <- filter(mpg, manufacturer == MANU) # isolate by manufacturer
  mod <- filter(man, model == MODEL) # isolate by model
  yr <- filter(mod, year == YEAR) # isolate by year (year here is the data column)
  yr <- mutate(yr, avgmpg = (cty + hwy) / 2) # add a column averaging city and highway mpg
  return(yr)
}

# test your function and find the average mpg for the 1999 audi a4
avg_mpg("audi", "a4", 1999)
## # A tibble: 4 × 12
##   manufactu…¹ model displ  year   cyl trans drv     cty   hwy fl    class avgmpg
##   <chr>       <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>  <dbl>
## 1 audi        a4      1.8  1999     4 auto… f        18    29 p     comp…   23.5
## 2 audi        a4      1.8  1999     4 manu… f        21    29 p     comp…   25  
## 3 audi        a4      2.8  1999     6 auto… f        16    26 p     comp…   21  
## 4 audi        a4      2.8  1999     6 manu… f        18    26 p     comp…   22  
## # … with abbreviated variable name ¹​manufacturer
# use your function to find the average mpg for the 2008 toyota camry
avg_mpg("toyota", "camry", 2008)
## # A tibble: 3 × 12
##   manufactu…¹ model displ  year   cyl trans drv     cty   hwy fl    class avgmpg
##   <chr>       <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>  <dbl>
## 1 toyota      camry   2.4  2008     4 manu… f        21    31 r     mids…   26  
## 2 toyota      camry   2.4  2008     4 auto… f        21    31 r     mids…   26  
## 3 toyota      camry   3.5  2008     6 auto… f        19    28 r     mids…   23.5
## # … with abbreviated variable name ¹​manufacturer
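For comparison, the same filter-then-average pattern can be written in base R without the tidyverse. This sketch swaps in the built-in `mtcars` dataset (its `cyl`, `gear`, and `mpg` columns), because `mpg` ships with ggplot2; the function name `avg_mtcars_mpg` is hypothetical, and unlike avg_mpg() it returns a single mean rather than the matching rows:

```r
# average mpg of the mtcars cars with a given number of cylinders and gears
avg_mtcars_mpg <- function(CYL, GEAR) {
  cars <- mtcars[mtcars$cyl == CYL & mtcars$gear == GEAR, ]  # isolate matching rows
  mean(cars$mpg)                                             # one summary value
}

avg_mtcars_mpg(4, 4)
```

Bracket indexing is used instead of subset() because subset() is intended for interactive use rather than inside functions.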

Homework Problems

Problem 1

Using the ebird dataset from previous lessons, write a function that takes a scientific name as input and creates a new file containing all the observation info for that bird. Name each output file after the scientific name, for the following species:
Scientific Name
Anser caerulescens
Antrostomus carolinensis
Setophaga americana
Your output should look something like this:
## File Anser_caerulescens.csv created.
## # A tibble: 6 × 14
##    ...1 list_ID   commo…¹ scien…² date       time  count durat…³ locat…⁴ latit…⁵
##   <dbl> <chr>     <chr>   <chr>   <date>     <tim> <dbl>   <dbl> <chr>     <dbl>
## 1     1 S40748758 Snow G… Anser … 2017-11-26 10:28    16      20 US-MO      38.9
## 2     2 S33616660 Snow G… Anser … 2017-01-12 07:00     1      90 US-MO      38.6
## 3     3 S33809874 Snow G… Anser … 2017-01-20 16:26     1      59 US-MO      38.6
## 4     4 S35533959 Snow G… Anser … 2017-03-30 07:05     1     100 US-MO      38.6
## 5     5 S35698031 Snow G… Anser … 2017-04-04 07:00     1     127 US-MO      38.6
## 6     6 S35861224 Snow G… Anser … 2017-04-10 18:06     1      68 US-MO      38.6
## # … with 4 more variables: longitude <dbl>, count_tot <dbl>, month <dbl>,
## #   year <dbl>, and abbreviated variable names ¹​common_name, ²​scientific_name,
## #   ³​duration, ⁴​location, ⁵​latitude
## File Antrostomus_carolinensis.csv created.
## # A tibble: 1 × 14
##    ...1 list_ID  common…¹ scien…² date       time  count durat…³ locat…⁴ latit…⁵
##   <dbl> <chr>    <chr>    <chr>   <date>     <tim> <dbl>   <dbl> <chr>     <dbl>
## 1   872 S1665531 Chuck-w… Antros… 2004-04-22    NA     1       0 US-FL      27.2
## # … with 4 more variables: longitude <dbl>, count_tot <dbl>, month <dbl>,
## #   year <dbl>, and abbreviated variable names ¹​common_name, ²​scientific_name,
## #   ³​duration, ⁴​location, ⁵​latitude
## File Setophaga_americana.csv created.
## # A tibble: 6 × 14
##    ...1 list_ID   commo…¹ scien…² date       time  count durat…³ locat…⁴ latit…⁵
##   <dbl> <chr>     <chr>   <chr>   <date>     <tim> <dbl>   <dbl> <chr>     <dbl>
## 1  5979 S67802026 Northe… Setoph… 2020-04-25 11:48     1     140 US-OK      36.8
## 2  5980 S36840514 Northe… Setoph… 2017-05-13 10:40     1     275 US-MO      38.9
## 3  5981 S36840501 Northe… Setoph… 2017-05-13 10:20     1      20 US-MO      38.9
## 4  5982 S1666285  Northe… Setoph… 2004-04-23 13:30     1     210 US-FL      26.4
## 5  5983 S85318603 Northe… Setoph… 2021-04-11 09:50     1      74 US-OK      36.0
## 6  5984 S18555256 Northe… Setoph… 2014-05-25 08:40     3     300 US-MO      38.5
## # … with 4 more variables: longitude <dbl>, count_tot <dbl>, month <dbl>,
## #   year <dbl>, and abbreviated variable names ¹​common_name, ²​scientific_name,
## #   ³​duration, ⁴​location, ⁵​latitude
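The general pattern behind Problem 1 (filter the rows, build a file name from the input, write the file, report it) can be sketched on the built-in `iris` dataset, leaving the ebird version to you. The function name `save_species()` and writing into `tempdir()` are assumptions for illustration; the gsub() call turns a name such as "Anser caerulescens" into "Anser_caerulescens":

```r
# write all rows for one iris species to "<species>.csv" in a temporary folder
save_species <- function(species, out_dir = tempdir()) {
  rows <- iris[iris$Species == species, ]              # isolate one species
  file_name <- paste0(gsub(" ", "_", species), ".csv") # spaces become underscores
  write.csv(rows, file.path(out_dir, file_name), row.names = FALSE)
  message("File ", file_name, " created.")
  head(rows)                                           # preview of the saved rows
}

save_species("setosa")
```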

Problem 2

Your output file should look like this:
##   ...1   list_ID        common_name          scientific_name       date
## 1   14 S82037948         Snow Goose       Anser caerulescens 2021-02-20
## 2    2 S33616660         Snow Goose       Anser caerulescens 2017-01-12
## 3  872  S1665531 Chuck-will's-widow Antrostomus carolinensis 2004-04-22
## 4  872  S1665531 Chuck-will's-widow Antrostomus carolinensis 2004-04-22
## 5 5995 S18948766    Northern Parula      Setophaga americana 2014-06-29
## 6 5979 S67802026    Northern Parula      Setophaga americana 2020-04-25
##       time count duration location latitude longitude count_tot month year
## 1 17:15:00    26        4    US-OK 35.96645 -95.49374       696     2 2021
## 2 07:00:00     1       90    US-MO 38.63891 -90.28538       272     1 2017
## 3     <NA>     1        0    US-FL 27.18182 -81.35875        15     4 2004
## 4     <NA>     1        0    US-FL 27.18182 -81.35875        15     4 2004
## 5 11:00:00     6      360    US-MO 37.72918 -92.39330       101     6 2014
## 6 11:48:00     1      140    US-OK 36.78372 -98.18573        64     4 2020

Problem 3

Write a nested function that will complete Problems 1 and 2 in a single function, but with the following species names as argument values:
Scientific Name
Branta canadensis
Spatula discors
Anas platyrhynchos
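One way to structure a function that wraps several steps is an outer function that calls smaller helper functions. A minimal sketch on `iris`, where the helpers `species_rows()` and `first_rows()` and the outer `combine_species()` are hypothetical stand-ins for the Problem 1 and Problem 2 steps:

```r
# helper 1: all rows for one species
species_rows <- function(species) {
  iris[iris$Species == species, ]
}

# helper 2: the first n rows of a data frame
first_rows <- function(df, n = 2) {
  head(df, n)
}

# outer function: runs both helpers for each species and stacks the results
combine_species <- function(species_list, n = 2) {
  pieces <- lapply(species_list, function(sp) first_rows(species_rows(sp), n))
  do.call(rbind, pieces)
}

combine_species(c("setosa", "virginica"))
```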

Challenge Problem!

Write a loop that uses your function from Problem 3 and a list of bird names to completely automate the process!
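A minimal sketch of such a loop, again on `iris` rather than the ebird data. The helper `save_species()` is hypothetical and is redefined here so the block runs on its own; it writes one CSV per name into `tempdir()` and returns the file path:

```r
# hypothetical helper: write one species' rows to a CSV file, return the path
save_species <- function(species, out_dir = tempdir()) {
  rows <- iris[iris$Species == species, ]
  path <- file.path(out_dir, paste0(gsub(" ", "_", species), ".csv"))
  write.csv(rows, path, row.names = FALSE)
  path
}

species_names <- c("setosa", "versicolor", "virginica")

# call the function once per name and collect the created file paths
created <- character(0)
for (sp in species_names) {
  created <- c(created, save_species(sp))
}
created
```

An equivalent without an explicit loop is `vapply(species_names, save_species, character(1))`.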
This lesson was written using online resources here and previous lessons. My R script with the homework solutions (there are multiple ways to complete them) can be found here.