class: center, middle, inverse, title-slide

# Programming Tools in Data Science
## Lecture #8: R package
### Samuel Orso
### 8 November 2021

---

# R package

* Packages provide a convenient mechanism to distribute your code.
* They follow strict conventions (structure, folder names, ...).
* They make testing and maintenance easier.

---

# Setup

* You will need (at least) the following packages:

```r
install.packages(c("devtools", "roxygen2", "testthat", "knitr"))
```

* Make sure your system is ready!

```r
devtools::has_devel()
```

```
## Your system is ready to build packages!
```

(otherwise visit <https://r-pkgs.org/setup.html>)

---
class: sydney-blue, center, middle

# Demo

---

# DESCRIPTION file

* DESCRIPTION contains the metadata of your package (authors, description, dependencies, contact, ...)
* It should look like this:

```r
Package: pkgtest
Type: Package
Title: What the Package Does (Title Case)
Version: 0.1.0
Authors@R: person("John", "Doe", email = "john.doe@example.com",
  role = c("aut", "cre"))
Maintainer: The package maintainer <yourself@somewhere.net>
Description: More about what it does (maybe more than one line)
    Use four spaces when indenting paragraphs within the Description.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
URL: https://github.com/ptds2021/pkgtest
BugReports: https://github.com/ptds2021/pkgtest/issues
RoxygenNote: 7.1.2
```

---

* Use the `person` function for `Authors@R`; roles include:

    a. `"cre"`: (creator) the package maintainer;

    b. `"aut"`: (author) those who made substantial contributions to the package;

    c. `"ctb"`: (contributor) those who made smaller contributions;

    d. `"cph"`: (copyright holder) used for the legal name of an institution or corporate body.

* `License`: since the point of a package is to be distributed to others, you need to [choose a licence](https://choosealicense.com/licenses/).
  For example, [MIT](https://choosealicense.com/licenses/mit/) is a permissive licence that you can add with

```r
usethis::use_mit_license()
```

---

# Dependencies

* DESCRIPTION lists all the packages needed for your package to work.
* `Depends` specifies the minimum version of `R` (don't forget the space after `>=`); e.g.

```r
Depends: R (>= 4.0.0)
```

* `Imports` lists the packages that must be installed for your package to work (best practice is to call their functions as `pkg::fct()`); for example, suppose you need `ggplot2` and `dplyr`

```r
Imports:
    dplyr (>= 1.0.7),
    ggplot2 (>= 3.3.5)
```

Version constraints ensure that users have a recent enough version of each package.

* `Suggests` lists packages that can be used (for vignettes, tests, datasets, ...) but are not required.

---

# Documenting your package

* Documentation appears in the `man/` (manual) subfolder as `*.Rd` files.
* We will generate documentation automatically using `roxygen2`.
* You can either call `devtools::document()` or, perhaps more simply,

<img src="images/roxygen2.png" width="1439" style="display: block; margin: auto;" />

---

* `roxygen2` comments start with `#'`, use tags that start with `@`, and are placed right before the function they document, e.g.

```r
#' @title hello world function
#' @return print a message
#' @export
hello <- function() {
  print("Hello, world!")
}
```

* The main tags for functions are `@title`, `@param`, `@author`, `@seealso`, `@details`, `@examples`, `@return` (click [here](https://r-pkgs.org/man.html) for more details)
* **All** functions should be documented. **Some** should be exported (`#' @export`)
* **Do repeat yourself**

---

.pull-left[
<img src="images/pkgtest_hello_world.png" width="791" style="display: block; margin: auto;" />
]

.pull-right[
```r
#' `@title` hello world function
#' `@author` John Doe
#' `@details`
#' A super fancy function to print Hello World!
#' `@return` print a message
#' `@examples`
#' \dontrun{hello()}
#' `@export`
hello <- function() {
  print("Hello, world!")
}
```
]

---

# Vignettes

* A vignette is an R Markdown document that provides more insight into your package.
* Simply call `usethis::use_vignette("my-vignette")` to create `vignettes/my-vignette.Rmd`.
* Add the packages the vignette needs to DESCRIPTION under `Suggests`.

---

# Namespace

> Writing R Extensions, [Sec. 1.5](https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Package-namespaces)
>
> The namespace controls the search strategy for variables used by functions in the package. If not found locally, R searches the package namespace first, then the imports, then the base namespace and then the normal search path (so the base namespace precedes the normal search rather than being at the end of it).

* NAMESPACE is generated automatically by `roxygen2`

---

# Testing with examples

* Testing checks that your code behaves as intended and pays off in the long run.
* Examples are a good way to make sure that functions work, and they are displayed to the user.
* You can put more complex examples in `inst/examples/my_example.R` and include them with `@example inst/examples/my_example.R`

---

# Example

In `R/`

```r
#' @title Compute regression coefficients
#' @param x design \code{matrix}
#' @param y \code{vector} of responses
#' @details
#' Compute the regression coefficients using \link[stats]{lm}.
#' @importFrom stats lm coef
#' @seealso \code{\link[stats]{lm}}, \code{\link[stats]{coef}}
#' `@example inst/examples/eg_lm.R`
#' @export
regression_coefficient <- function(x, y) {
  fit <- lm(y ~ x)
  coef(fit)
}
```

In `inst/examples/eg_lm.R`

```r
## linear regression
regression_coefficient(x = cars$speed, y = cars$dist)
```

---

If you click on `check`

<img src="images/pkg_check.png" width="606" height="505" />

---

Now suppose there is a mistake in the code, for instance in `inst/examples/eg_lm.R`

```r
## linear regression
regression_coefficient(x = cars$speed, y = cars)
```

<img src="images/pkg_check2.png" width="600" height="500" />

---

# Testing with `testthat`

* Examples help to detect errors in the code, but their primary goal is to inform users.
* Examples are displayed to users and concern only user-facing functions.
* It is good practice to have broader and automated tests.
* We are going to use `testthat`. Simply call `usethis::use_testthat()`.
* When should you test a function?

> Whenever you are tempted to type something into a print statement or a debugger expression, write it as a test instead. -- Martin Fowler

---

# Structure of `testthat`

* `testthat` is organised hierarchically:

1. An **expectation** is a single check made with an `expect_*()` function; these functions evaluate an expression and throw an error if the result disagrees with what was expected.
2. A **test** groups one or several **expectations** and is created with `test_that()`.
3. A **test file** groups one or several **tests**. It is an `R` file whose name and location follow this pattern: `tests/testthat/test_something.R`.

---

For example, the file `tests/testthat/test_reg_coef.R`

```r
test_that("regression coefficient input check", {
  expect_error(regression_coefficient(x = cars$speed, y = cars))
})

test_that("regression coefficient output", {
  expect_type(regression_coefficient(x = cars$speed, y = cars$dist), "double")
})
```

---

# Automated checking

* Just because you and your team do not experience any bugs does not mean that everything is okay.
* `R` users have different configurations and operating systems.
* It is good practice to use GitHub Actions: every time you push changes to the main repository, GitHub runs the actions you specified.
* To begin with, use `usethis::use_github_action_check_standard()`
* More examples are available at <https://github.com/r-lib/actions/tree/master/examples>

---

If everything passes:

<img src="images/github_action.png" width="1413" />

---

All the code presented here is available at <https://github.com/ptds2021/pkgtest>

---

# To go further

* More details and examples in the book [An Introduction to Statistical Programming Methods with R](https://smac-group.github.io/ds/section-r-packages.html)
* More material and details in [R Packages](https://r-pkgs.org/).
* A lot of details (really!) in [Writing R Extensions](https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Creating-R-packages)

---
class: sydney-blue, center, middle

# Questions?

.pull-down[
<a href="https://ptds.samorso.ch/"> .white[<svg viewBox="0 0 384 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M369.9 97.9L286 14C277 5 264.8-.1 252.1-.1H48C21.5 0 0 21.5 0 48v416c0 26.5 21.5 48 48 48h288c26.5 0 48-21.5 48-48V131.9c0-12.7-5.1-25-14.1-34zM332.1 128H256V51.9l76.1 76.1zM48 464V48h160v104c0 13.3 10.7 24 24 24h104v288H48z"></path></svg> website] </a>
<a href="https://github.com/ptds2021/"> .white[<svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3
2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path></svg> GitHub] </a> ]