1 Introduction

openstatsware short course: Good Software Engineering Practice for R Packages

April 18, 2024

Disclaimer




Any opinions expressed in this presentation and on the following slides are solely those of the presenters and not necessarily those of their employers.

Audrey

  • M Nursing (Sydney), M Sci Biostatistics (Zurich)
  • Paeds / Onc RN from Melbourne, Australia
  • COVID ICU, Royal Children’s Hospital 2021
  • Biostatistician at Roche for 4 years
  • Founder of finc-research
  • Enjoys developing
  • Connect on

Alessandro

TODO

Daniel

  • Ph.D. in Statistics from the University of Zurich, Bayesian Model Selection
  • Biostatistician at Roche for 5 years, Data Scientist at Google for 2 years, Statistical Software Engineer at Roche for the last 4 years
  • Co-founder of inferential.bio and RCONIS
  • Multiple R packages on CRAN and Bioconductor, co-wrote the book “Likelihood and Bayesian Inference”, chair of openstatsware
  • Feel free to connect

openstatsware

  • openstatsware.org
  • Since: 19 August 2022 (already more than 1.5 years!)
  • Where: American Statistical Association (ASA) Biopharmaceutical Section (BIOP) and European Federation of Statisticians in the Pharmaceutical Industry (EFSPI)
  • Who: Currently more than 60 statisticians from more than 30 organizations
  • What: Engineer packages and spread best practices

What you will learn here

  • Understand the basic structure of an R package
  • Create your own R package
  • Learn about & apply a professional development workflow
  • Learn & apply the fundamentals of quality control for R packages
  • Get a crash course in version control and modern collaboration techniques on GitHub.com
  • Learn how to make an R package available to others

Program outline

9.00 - 9.15 Introduction and outline
9.15 - 10.00 R Package Syntax
10.00 - 10.30 Exercise
10.30 - 11.00 Coffee break
11.00 - 11.45 Package Quality
11.45 - 12.30 Exercise
12.30 - 13.30 Lunch break
13.30 - 14.15 Collaboration via GitHub
14.15 - 15.00 Exercise
15.00 - 15.30 Coffee break
15.30 - 16.15 Publication of R Packages
16.15 - 16.45 Exercise
16.45 - 17.00 Summary and Q&A

Housekeeping

What you will need

  • GitHub.com (free) account
  • Local R development environment with
    • git
    • R, RStudio IDE, and Rtools (Windows only)
  • Install additional R packages using the installation script
  • Curiosity 🦝
  • Positive attitude 😄

Speed intros and what would you like to learn?

  • Name? 😚
  • Organization? 🏢
  • Motivation for this workshop / what would you like to learn? 🧠
  • Favorite food? 🦑
  • Favorite music? 🪈

What do we mean by GSWEP4R*?

  • Applying the concept of “Good XYZ Practice” to software engineering (SWE) with R
  • Improve quality and longevity of R code/packages
  • Not a universal standard; we share our perspectives
  • Collection of best practices
  • Do not reinvent the wheel: learn from the community

Why care about GSWEP4R?

  • R is one of the most successful statistical programming languages
  • R is a powerful yet complex ecosystem
    • Core component: R packages
    • Mature analysts: users & contributors
    • Deep understanding crucial, even to just assess quality
  • Analyses increasingly require complex scripts/programs
  • The concepts are applicable to other languages, too (Python, Julia, etc.)

Start small - from script to package

  1. Encapsulate behavior (functions)
  2. Avoid global state/variables
  3. Adopt consistent coding style
  4. Document well
  5. Add test cases
  6. Refactor and optimize code
  7. Version your code
  8. Share as ‘bundle’

Be aware that starting small also means learning a new vocabulary. Engineering terms are active and specific. We’re here to bring you along!

⇝ R package
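A few of the “start small” steps can be illustrated with a minimal R sketch. It assumes a hypothetical helper `summarize_numeric()` (not part of any package, invented for illustration) and shows a function that receives all inputs as arguments instead of reading a global variable (steps 1 and 2), roxygen2-style documentation comments (step 4), and a few base-R checks standing in for test cases (step 5).

```r
#' Summarize a numeric vector (hypothetical helper for illustration)
#'
#' @param x A numeric vector; missing values are dropped.
#' @return A named numeric vector with the mean and standard deviation.
summarize_numeric <- function(x) {
  # Steps 1 and 2: the logic lives in a function and only uses its argument,
  # so it does not depend on any global variable defined elsewhere in a script.
  stopifnot(is.numeric(x))
  x <- x[!is.na(x)]
  c(mean = mean(x), sd = sd(x))
}

# Step 5: simple checks with base R. Inside a package, these would typically
# become testthat tests stored under tests/testthat/.
res <- summarize_numeric(c(1, 2, 3, NA))
stopifnot(
  isTRUE(all.equal(res[["mean"]], 2)),
  isTRUE(all.equal(res[["sd"]], 1))
)
```

Once behavior is wrapped like this, moving the function into a package's `R/` folder and the checks into a test file is a small step rather than a rewrite.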

The R package ecosystem - huge success

Pharma perspective: GxP + R =

  • Core infrastructure packages only through industry
  • Quality, burden sharing: open-source pharmaverse and openstatsware
  • Open methodological packages can de-risk innovative methods
  • R packages make (statistical/methodological) code
    • testable (with documented evidence thereof, CFR Part 11)
    • reusable
    • shareable
    • easier to document

Questions, comments?

License information