By Aleksandra Pawlik, Ngoni Faya, Joseph Guhlin, Megan Guidry, Tom Harrop, Dinindu Senanayake
12 August 2020
Rapid development of computational bioinformatics tools means we can more easily push research boundaries. However, it comes at a cost.
The complexity of the software chain that needs to be installed and configured to run advanced workflows results in researchers spending hours, if not days, trying to set up and debug the computational environment before they can even start to analyse their data. Complex setups also hinder reproducibility, one of the core principles of science. Fortunately, there is a solution growing in popularity - containers.
Simply put, containers allow wrapping up software packages in an executable environment which can be moved from one computer to another, largely regardless of which operating system these computers run. You can think of it as a “computer inside a computer”, contained in a relatively lightweight file. Once created, containers make the analysis workflow reproducible without the need to reinstall and reconfigure the required software on each machine.
Containers were created to ease the life of software developers, but they quickly made their way into the research world. It’s therefore useful for researchers to understand more about container processes.
There are several container systems, the most popular being Docker. However, due to several factors, it is not the best fit for High-Performance Computing (HPC), the environment commonly used for bioinformatics analysis. Singularity, with its security focus and high compatibility with Docker, is currently the best container solution for HPC.
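To make the Docker compatibility concrete, here is a minimal sketch in Python that wraps the two Singularity commands a researcher typically meets first: `singularity pull`, which converts a Docker image into a Singularity image file, and `singularity exec`, which runs a program inside it. The image name `ubuntu:20.04`, the file name `ubuntu.sif` and the helper function names are illustrative assumptions, not part of the webinar material, and the sketch assumes Singularity is already installed on the machine.

```python
import os
import shutil
import subprocess


def pull_command(docker_image, sif_path):
    """Build the command that converts a Docker image into a Singularity image file."""
    return ["singularity", "pull", sif_path, f"docker://{docker_image}"]


def exec_command(sif_path, tool_args):
    """Build the command that runs a program inside the container."""
    return ["singularity", "exec", sif_path, *tool_args]


def run_in_container(docker_image, sif_path, tool_args):
    """Pull the image if it is not already on disk, then run the program inside it."""
    if shutil.which("singularity") is None:
        raise RuntimeError("singularity was not found on PATH")
    if not os.path.exists(sif_path):
        subprocess.run(pull_command(docker_image, sif_path), check=True)
    result = subprocess.run(
        exec_command(sif_path, tool_args),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


# Illustrative values only: any Docker image and any program inside it would do.
print(" ".join(pull_command("ubuntu:20.04", "ubuntu.sif")))
print(" ".join(exec_command("ubuntu.sif", ["cat", "/etc/os-release"])))
```

On a machine with Singularity available, calling `run_in_container("ubuntu:20.04", "ubuntu.sif", ["cat", "/etc/os-release"])` would report the operating system inside the container rather than that of the host, which is one quick way to see the “computer inside a computer” idea in action.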
In July 2020, Genomics Aotearoa, New Zealand eScience Infrastructure (NeSI) and Manaaki Whenua - Landcare Research (MWLR) came together to organise a webinar introducing Singularity containers for reproducible bioinformatics. The webinar was delivered by two experienced container users, theoretical genomicists from the University of Otago, Joseph Guhlin and Tom Harrop. They gave an overview of containers and their main features, followed by an easy-to-follow tutorial, “Genome assembly of diatom Chr 17”.
The webinar attracted over 70 participants from New Zealand universities and Crown Research Institutes (CRIs), a turnout that underlines both the usefulness of containers and the widespread interest in them for modern-day research.
Learning Singularity, just like any other computational tool, takes time and practice. However, the potential gains for researchers are significant. They can easily reuse containers shared by their peers, tool and pipeline developers, and create their own, making their research robust and reproducible. It is also becoming best practice, and sometimes even a requirement, to publish not just datasets but also the workflows in containers alongside scientific papers.
Resources for learning about containers are abundant, but here are some selected online tutorials prepared specifically for researchers:
- Containers on HPC and Cloud with Singularity by Pawsey Computing Centre
- Singularity Containers for Bioinformatics by Pawsey Computing Centre
- R Docker tutorial by ROpenSci Labs
Research is underpinning New Zealand’s response to economic, health and environmental challenges. Creating opportunities for researchers to learn about containers and other digital best practices, then, is critical for ensuring tomorrow’s decisions are based on robust and reproducible research outputs.
Want to know more about what NeSI and Genomics Aotearoa are doing to support capability growth among New Zealand’s researchers?