openstatsware
short course: Good Software Engineering Practice for R Packages
August 24, 2025
Any opinions expressed in this presentation and on the following slides are solely those of the presenter and not necessarily those of their employers.
Photo CC0 by Kateryna Babaieva on pexels.com
Let’s assume that you used some lines of code to create simulated data in multiple projects:
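As an illustration of the situation described above, the following sketch shows the kind of ad hoc simulation code that tends to be copied from project to project. The variable names and parameter values are hypothetical and are not taken from the course material.

```r
# Hypothetical ad hoc script; names and values are illustrative assumptions.
set.seed(123) # fix the random number generator state for reproducibility

n1 <- 50 # sample size of group 1
n2 <- 50 # sample size of group 2

group1 <- rnorm(n = n1, mean = 0, sd = 1)
group2 <- rnorm(n = n2, mean = 0.5, sd = 1)

simulatedData <- data.frame(
  group = rep(c(1, 2), times = c(n1, n2)),
  values = c(group1, group2)
)

head(simulatedData)
```

Every project that copies these lines also copies any mistakes in them, which is one motivation for maintaining the code in a single place.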
Idea: put the code into a package
Obligation level | Key word | Description |
---|---|---|
Duty | must | “must have” |
Desire | should | “nice to have” |
Intention | may | “optional” |
Purpose and Scope
The R package simulatr is intended to enable the creation of reproducible fake data.
Package Requirements
simulatr must provide a function to generate normally distributed random data for two independent groups. The function must allow flexible definition of the sample size, mean, and standard deviation per group. The reproducibility of the simulated data must be ensured via an optional seed. It should be possible to print the function result. The package may also facilitate graphical presentation of the simulated data.
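To make the requirements more concrete, here is a minimal sketch of a function that would cover the "must" and "should" items. The function name, argument names, and class name are assumptions made for illustration; the requirements text does not prescribe them.

```r
# Sketch only: function name, argument names, and class name are assumptions.
getSimulatedTwoArmMeans <- function(n1, n2, mean1, mean2, sd1, sd2, seed = NULL) {
  # "must": an optional seed ensures reproducibility
  if (!is.null(seed)) {
    set.seed(seed)
  }
  # "must": normally distributed data for two independent groups
  data <- data.frame(
    group = rep(c(1L, 2L), times = c(n1, n2)),
    values = c(
      rnorm(n = n1, mean = mean1, sd = sd1),
      rnorm(n = n2, mean = mean2, sd = sd2)
    )
  )
  structure(
    list(n1 = n1, n2 = n2, data = data),
    class = "SimulationResult"
  )
}

# "should": the result can be printed
print.SimulationResult <- function(x, ...) {
  cat("Simulated data for two independent groups\n")
  cat("  n per group: ", x$n1, ", ", x$n2, "\n", sep = "")
  cat("  observed means: ",
    paste(round(tapply(x$data$values, x$data$group, mean), 3), collapse = ", "),
    "\n",
    sep = ""
  )
  invisible(x)
}

result <- getSimulatedTwoArmMeans(
  n1 = 10, n2 = 20, mean1 = 0, mean2 = 1, sd1 = 1, sd2 = 2, seed = 42
)
print(result)
```

The "may" requirement (graphical presentation) is deliberately left out of the sketch, since it is optional.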
Useful formats / tools for design docs:
UML Diagram
R package programming
This script breaks all common clean code rules:
# fmt: skip
y=function(x){
s1=0
for(v1 in x){s1=s1+v1}
m1=s1/length(x)
i=ceiling(length(x)/2)
if(length(x) %% 2 == 0){i=c(i,i+1)}
s2=0
for(v2 in i){s2=s2+x[v2]}
m2=s2/length(i)
c(m1,m2)
}
y(c(1:7, 100))
[1] 16.0 4.5
We now refactor it by applying clean code rules…
CCR#1 Naming: Are the names of the variables, functions, and classes descriptive and meaningful?
# fmt: skip
getMeanAndMedian=function(x){
sum1=0
for(value in x){sum1=sum1+value}
meanValue=sum1/length(x)
centerIndices=ceiling(length(x)/2)
if(length(x) %% 2 == 0){
centerIndices=c(centerIndices,centerIndices+1)
}
sum2=0
for(centerIndex in centerIndices){sum2=sum2+x[centerIndex]}
medianValue=sum2/length(centerIndices)
c(meanValue,medianValue)
}
CCR#2 Formatting: Are indentation, spacing, and bracketing consistent, i.e., is the code easy to read?
getMeanAndMedian <- function(x) {
  sum1 <- 0
  for (value in x) {
    sum1 <- sum1 + value
  }
  meanValue <- sum1 / length(x)
  centerIndices <- ceiling(length(x) / 2)
  if (length(x) %% 2 == 0) {
    centerIndices <- c(centerIndices, centerIndices + 1)
  }
  sum2 <- 0
  for (centerIndex in centerIndices) {
    sum2 <- sum2 + x[centerIndex]
  }
  medianValue <- sum2 / length(centerIndices)
  c(meanValue, medianValue)
}
CCR#3 Simplicity: Did you keep the code as simple and straightforward as possible, i.e., did you avoid unnecessary complexity?
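One way to apply this rule to the function above is to replace the hand-written accumulation loops with base R's vectorized sum(). This is a sketch of what a simplified version might look like, not necessarily the exact code shown in the course.

```r
getMeanAndMedian <- function(x) {
  # sum() replaces the explicit accumulation loops
  meanValue <- sum(x) / length(x)
  centerIndices <- ceiling(length(x) / 2)
  if (length(x) %% 2 == 0) {
    centerIndices <- c(centerIndices, centerIndices + 1)
  }
  # like the original, this assumes that x is already sorted
  medianValue <- sum(x[centerIndices]) / length(centerIndices)
  c(meanValue, medianValue)
}

# same values as before the refactoring: 16 and 4.5
getMeanAndMedian(c(1:7, 100))
```

The behaviour is unchanged; only the amount of code that has to be read and maintained shrinks.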
Note:
CCR#4 Single Responsibility Principle (SRP): Does each function have only a single, well-defined purpose?
getMean <- function(x) {
  sum(x) / length(x)
}
isLengthAnEvenNumber <- function(x) {
  length(x) %% 2 == 0
}
getMedian <- function(x) {
  centerIndices <- ceiling(length(x) / 2)
  if (isLengthAnEvenNumber(x)) {
    centerIndices <- c(centerIndices, centerIndices + 1)
  }
  sum(x[centerIndices]) / length(centerIndices)
}
CCR#5 Don’t Repeat Yourself (DRY): Did you avoid duplication of code, either by reusing existing code or creating functions?
CCR#5: DRY
Suppose you have a code block that performs the same calculation multiple times:
Create a function to encapsulate this calculation and reuse it multiple times:
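The two steps above can be sketched as follows. The example data and the helper name standardize are hypothetical and were chosen only to illustrate the rule.

```r
# Hypothetical example data
height <- c(160, 172, 181)
weight <- c(55, 70, 90)

# Before: the same calculation is written out twice
heightStandardized <- (height - mean(height)) / sd(height)
weightStandardized <- (weight - mean(weight)) / sd(weight)

# After: the calculation lives in one function and is reused
standardize <- function(x) {
  (x - mean(x)) / sd(x)
}
heightStandardized2 <- standardize(height)
weightStandardized2 <- standardize(weight)
```

If the calculation ever needs to change, for example to handle missing values, only standardize() has to be edited.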
CCR#6 Comments: Did you use comments to explain the purpose of code blocks and to clarify complex logic?
# returns the mean of x
getMean <- function(x) {
  sum(x) / length(x)
}
# returns TRUE if the length of x is
# an even number; FALSE otherwise
isLengthAnEvenNumber <- function(x) {
  length(x) %% 2 == 0
}
# returns the median of x
getMedian <- function(x) {
  centerIndices <- ceiling(length(x) / 2)
  if (isLengthAnEvenNumber(x)) {
    centerIndices <- c(centerIndices, centerIndices + 1)
  }
  getMean(x[centerIndices])
}
CCR#7 Error Handling
#' returns the mean of x
getMean <- function(x) {
  checkmate::assertNumeric(x)
  sum(x) / length(x)
}
#' returns TRUE if the length of x is an even number; FALSE otherwise
isLengthAnEvenNumber <- function(x) {
  checkmate::assertVector(x)
  length(x) %% 2 == 0
}
#' returns the median of x
getMedian <- function(x) {
  checkmate::assertNumeric(x)
  centerIndices <- ceiling(length(x) / 2)
  if (isLengthAnEvenNumber(x)) {
    centerIndices <- c(centerIndices, centerIndices + 1)
  }
  getMean(x[centerIndices])
}
Recommended quality workflow for R packages:
CCR#8: TDD
Verification:
Are we building the product right?
Validation:
Are we building the right product?
Unit tests help to increase the reliability and maintainability of the code
R package testthat
Example: unit test passed
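A minimal sketch of a passing unit test with testthat, assuming the getMean() function defined earlier. It is repeated here without the checkmate assertion so that the block is self-contained.

```r
library(testthat)

getMean <- function(x) {
  sum(x) / length(x)
}

# test_that() groups expectations under a descriptive name
test_that("getMean returns the arithmetic mean", {
  expect_equal(getMean(c(1, 3, 2)), 2)
})
```

Inside a package, a test like this would typically live in a file under tests/testthat/ and be run together with all other tests.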
Example: unit test failed
Error: getMean(c(1, 3, 2, NA)) not equal to 2.
Error: getMedian(c(1, 3, 2)) not equal to 2.
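These failures can be reproduced with the versions of getMean() and getMedian() that ignore missing values and do not sort their input. The following sketch repeats those versions without the checkmate assertions, for brevity, and shows where the unexpected results come from.

```r
library(testthat)

# Versions without missing-value handling and without sorting
getMean <- function(x) {
  sum(x) / length(x)
}
getMedian <- function(x) {
  centerIndices <- ceiling(length(x) / 2)
  if (length(x) %% 2 == 0) {
    centerIndices <- c(centerIndices, centerIndices + 1)
  }
  getMean(x[centerIndices])
}

getMean(c(1, 3, 2, NA)) # NA, because sum() propagates missing values
getMedian(c(1, 3, 2)) # 3, because x is not sorted before indexing

# Called outside test_that(), an unmet expectation raises an error;
# try() keeps the script running so that both failures can be observed.
try(expect_equal(getMean(c(1, 3, 2, NA)), 2))
try(expect_equal(getMedian(c(1, 3, 2)), 2))
```

Both tests encode behaviour that a user would reasonably expect, so the failures point to bugs in the functions rather than in the tests.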
#' returns the mean of x
getMean <- function(x, na.rm = TRUE) {
  checkmate::assertNumeric(x)
  sum(x, na.rm = na.rm) / length(x[!is.na(x)])
}
#' returns TRUE if the number of non-missing values in x is even; FALSE otherwise
isLengthAnEvenNumber <- function(x) {
  checkmate::assertVector(x)
  length(x[!is.na(x)]) %% 2 == 0
}
#' returns the median of x
getMedian <- function(x, na.rm = TRUE) {
  checkmate::assertNumeric(x)
  # with missing values and na.rm = FALSE the median is NA
  # (indexing with NA and calling getMean() would yield NaN instead)
  if (anyNA(x) && !na.rm) {
    return(NA_real_)
  }
  centerIndices <- ceiling(length(x[!is.na(x)]) / 2)
  if (isLengthAnEvenNumber(x)) {
    centerIndices <- c(centerIndices, centerIndices + 1)
  }
  # sort() drops missing values by default
  getMean(sort(x)[centerIndices])
}
Function name | Does code… |
---|---|
expect_condition | signal a condition? |
expect_equal | return the expected value (within a small numeric tolerance)? |
expect_error | throw an error? |
expect_false | return ‘FALSE’? |
expect_gt | return a number greater than the expected value? |
expect_gte | return a number greater than or equal to the expected value? |
expect_identical | return exactly the expected value? |
expect_invisible | return an invisible object? |
expect_length | return a vector with the specified length? |
expect_lt | return a number less than the expected value? |
expect_lte | return a number less than or equal to the expected value? |
expect_mapequal | return a vector with the expected names and values, in any order? |
expect_message | show a message? |
expect_named | return a vector with (given) names? |
Function name | Does code… |
---|---|
expect_no_condition | run without condition? |
expect_no_error | run without error? |
expect_no_message | run without message? |
expect_no_warning | run without warning? |
expect_output | print output to the console? |
expect_s3_class | return an object inheriting from the expected S3 class? |
expect_s4_class | return an object inheriting from the expected S4 class? |
expect_setequal | return a vector containing the expected values, in any order? |
expect_silent | execute silently? |
expect_true | return ‘TRUE’? |
expect_type | return an object inheriting from the expected base type? |
expect_vector | return a vector with the expected size and/or prototype? |
expect_visible | return a visible object? |
expect_warning | throw a warning? |
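To show how a few of the expectations listed above are used in practice, here is a short sketch. The test name and the example values are made up for illustration.

```r
library(testthat)

test_that("selected expectations behave as described", {
  # expect_equal() allows for a small numeric tolerance ...
  expect_equal(0.1 + 0.2, 0.3)
  # ... whereas an exact comparison of the same values is FALSE
  expect_false(identical(0.1 + 0.2, 0.3))
  expect_identical(c(a = 1L), c(a = 1L))
  expect_length(seq_len(5), 5)
  # the order of the elements is ignored
  expect_setequal(c(3, 1, 2), c(1, 2, 3))
  # the second argument is a regular expression matched against the message
  expect_error(stop("invalid input"), "invalid")
  expect_warning(as.numeric("a"))
  expect_type(1L, "integer")
  expect_s3_class(data.frame(), "data.frame")
})
```

The difference between expect_equal() and expect_identical() matters most for floating point results, where exact equality is rarely the intended check.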
covr: Track and report code coverage for your package
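As a small, self-contained sketch of what covr measures, the following example writes a hypothetical source file and test file to a temporary directory and computes the coverage with file_coverage(). The file contents are assumptions made for illustration. For a real package, package_coverage() and report() would be the usual entry points.

```r
library(covr)

# Write a tiny source file and a test file to temporary locations
sourceFile <- tempfile(fileext = ".R")
testFile <- tempfile(fileext = ".R")

writeLines(c(
  "getMean <- function(x, na.rm = TRUE) {",
  "  if (!na.rm && anyNA(x)) {",
  "    return(NA_real_)",
  "  }",
  "  sum(x, na.rm = TRUE) / sum(!is.na(x))",
  "}"
), sourceFile)

# The test only exercises the default na.rm = TRUE path,
# so the early return is never executed
writeLines("stopifnot(getMean(c(1, 3, 2, NA)) == 2)", testFile)

coverage <- file_coverage(sourceFile, testFile)
print(coverage)
percent_coverage(coverage)
```

The line that is never executed lowers the reported percentage and indicates that a test for na.rm = FALSE is missing.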
Let’s assume we have added a generic function to cat a simulation result:
We can go into the details by clicking on a file name:
CCR#2: Formatting
Two popular R packages support the tidyverse style guide:
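Assuming the two packages meant here are styler, which reformats code, and lintr, which reports style issues without changing the code, a minimal sketch of how each can be applied to a code snippet:

```r
library(styler)
library(lintr)

messyCode <- "y=function(x){x+1}"

# styler rewrites the code according to the tidyverse style guide
styledCode <- style_text(messyCode)
styledCode

# lintr only reports style issues; the code itself is left unchanged
lints <- lint(text = messyCode)
lints
```

For a whole package, styler::style_pkg() and lintr::lint_package() apply the same checks to all source files.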
The devtools function spell_check runs a spell check on text fields in the package description file, manual pages, and optionally vignettes.
How to link the styler package to a keyboard shortcut:
Take your package project and refactor it, i.e., apply the linked clean code rules:
Apply CCR#8 to your package project:
In the current version, changes were made by (later authors): TODO Daniel Sabanés Bové
This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
The source files are hosted at github.com/openstatsware/shortcourse-iscb2025, which is forked from the original version at github.com/RCONIS/workshop-r-swe-zrh.
Important: to use this work you must provide the name of the creators (initial authors), a link to the material, a link to the license, and indicate if changes were made.