1 Introduction

openstatsware short course: Good Software Engineering Practice for R Packages

Audrey Yeo T. Y.

August 24, 2025

Disclaimer




Any opinions expressed in this presentation and on the following slides are solely those of the presenters and not necessarily those of their employers.

Welcome to GSWEP4R at ISCB46!



We have an open source spirit here. Please sit with someone you do not already know.



Be Curious. Be Respectful. Be Kind.

Audrey

  • M Sci Biostatistics (Zurich), M Nursing (Sydney)
  • Currently: founder of Finc-Research
  • Previously: master's thesis on longitudinal cluster analysis, biostatistician at Roche for 4 years, paediatric/oncology RN from Melbourne, Australia
  • Lead developer of the phase1b package (a Bayesian framework for clinical trials) and openstatsware member
  • Enjoys developing statistical software, especially writing tests
  • Connect on

Alessandro

  • Ph.D. in Biostatistics from the University of Leicester with a thesis on hierarchical modelling (longitudinal, survival)
  • Currently a biostatistician at Red Door Analytics AB in Stockholm, Sweden (since November 2022)
  • Previously: post-doc researcher and biostatistician at Karolinska Institutet, Stockholm, Sweden
  • Maintainer of multiple R packages on CRAN, contributor to several others, co-chair of the openstatsware working group
  • Connect on

Daniel

  • Ph.D. in Statistics from the University of Zurich (Bayesian model selection)
  • Biostatistician at Roche for 5 years, Data Scientist at Google for 2 years, Statistical Software Engineer at Roche for the last 4 years
  • Co-founder of inferential.bio and RCONIS
  • Multiple R packages on CRAN and Bioconductor, co-author of the book Likelihood and Bayesian Inference, chair of openstatsware
  • Feel free to connect

openstatsware

  • openstatsware.org
  • Since: 19 August 2022 - already 3 years now!
  • Where: American Statistical Association (ASA) Biopharmaceutical Section (BIOP), European Federation of Statisticians in the Pharmaceutical Industry (EFSPI)
  • Who: Currently more than 60 statisticians from more than 30 organizations
  • What: Engineer packages and promote best practices

What you will learn here

  • Understand the basic structure of an R package
  • Create your own R package
  • Learn about and apply a professional development workflow
  • Learn and apply the fundamentals of quality control for R packages
  • Get a crash course in version control and modern collaboration techniques on GitHub.com
  • Learn how to make an R package available to others

Program outline

9.00 - 9.15 Introduction and outline
9.15 - 10.00 R package syntax
10.00 - 10.30 Exercise
10.30 - 11.00 Coffee break
11.00 - 11.45 Package quality
11.45 - 12.30 Exercise
12.30 - 13.30 Lunch break
13.30 - 14.15 Collaboration via GitHub
14.15 - 15.00 Exercise
15.00 - 15.30 Coffee break
15.30 - 16.15 Publication of R packages
16.15 - 16.45 Exercise
16.45 - 17.00 Summary and Q&A

House-keeping

What you will need

  • GitHub.com (free) account
  • Local R development environment with
    • git
    • R, the RStudio IDE, and Rtools (Windows only)
  • Install additional R packages using the installation script
  • Curiosity 🦝
  • Positive attitude 😄

Speed intros and what would you like to learn?

  • Name? 🌍
  • Organization? 🏢
  • Motivation for this workshop / what would you like to learn? 🧠
  • Favorite food? 🍭
  • Favorite music? 🪗

What do we mean by GSWEP4R*?

  • Applying the concept of “Good XYZ Practice” (GxP) to software engineering (SWE) with R
  • Improve quality and longevity of R code/packages
  • Not a universal standard; we share our perspectives
  • Collection of best practices
  • Do not reinvent the wheel: learn from the community

Why care about GSWEP4R?

  • R is one of the most successful statistical programming languages
  • R is a powerful yet complex ecosystem
    • Core component: R packages
    • Mature user & contributor community
    • A deeper understanding is crucial, even just to assess quality
  • Analyses increasingly require complex scripts/programs
  • The concepts are applicable to other languages, too (Python, Julia, etc.)

Start small - from script to package

  1. Encapsulate behavior (functions)
  2. Avoid global state/variables
  3. Adopt consistent coding style
  4. Document well
  5. Add test cases
  6. Refactor and optimize code
  7. Version your code
  8. Share as ‘bundle’

Be aware that starting small also means learning a new vocabulary: engineering terms are active and specific. We’re here to bring you along!

\(\leadsto\) R package
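As a minimal sketch of steps 1, 2, 4 and 5 above, the snippet below turns a script line that relies on a global data object into a documented function and checks it with small test cases. It assumes a hypothetical trial dataset with columns `arm` and `response`; the function name `arm_mean` and the toy data are illustrative only and are not part of any course material or package.

```r
# Script style (hypothetical): relies on a global `dat` and repeats logic.
#   dat <- read.csv("trial.csv")
#   mean(dat$response[dat$arm == "treatment"], na.rm = TRUE)

#' Mean response in one study arm
#'
#' @param data A data frame with columns `arm` and `response`.
#' @param arm_name Name of the arm to summarise, e.g. `"treatment"`.
#' @return Mean of `response` within `arm_name`, ignoring missing values.
#' @export
arm_mean <- function(data, arm_name) {
  # Steps 1 and 2: all inputs arrive as arguments; no global variables are used.
  stopifnot(
    is.data.frame(data),
    all(c("arm", "response") %in% names(data))
  )
  values <- data$response[data$arm == arm_name]
  mean(values, na.rm = TRUE)
}

# Step 5: small test cases on toy data (base R stopifnot() for brevity).
toy <- data.frame(
  arm = c("treatment", "treatment", "control", "control"),
  response = c(2, 4, 10, NA)
)
stopifnot(
  isTRUE(all.equal(arm_mean(toy, "treatment"), 3)),
  isTRUE(all.equal(arm_mean(toy, "control"), 10))
)
```

Inside a package, roxygen2 would turn the `#'` comment block into a help page, and the checks would typically live under `tests/testthat/` using `testthat::expect_equal()` instead of `stopifnot()`.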

The R package ecosystem - huge success

Pharma perspective: GxP + R =

  • Quality, burden sharing: open-source pharmaverse and openstatsware
  • Open methodological packages can de-risk innovative methods
  • R packages make (statistical/methodological) code
    • testable (with documented evidence thereof, 21 CFR Part 11)
    • reusable
    • shareable
    • easier to document

Extra motivation for today’s course:

Check out the diversity alliance hackathon on Sep 30 and see whether you can apply the skills you learn today there!

Questions, comments?

License information