
Epigenome-wide association study pipeline

A Genomics Aotearoa project has developed a significant new process that will help researchers interpret and develop genetic health information: an epigenome-wide association study pipeline.

An epigenome-wide association study (EWAS) explores how interactions between genetic background and the environment could affect human wellbeing. Professor Greg Jones and Genomics Aotearoa postdoctoral fellow Dr Basharat Bhat, from the University of Otago, have been looking at ways to standardise the techniques and tools used to analyse epigenetic information from their unique studies, so that the analysis is high quality and replicable and helps us better understand health conditions.

Basharat Bhat and Greg Jones teaching a workshop

Current methods for genome-wide DNA methylation studies rely heavily on careful bioinformatics analysis for interpretation, which requires experience and resources that are not readily available. Several of the bioinformatics tools developed to analyse DNA methylation array data need high-performance computing resources, programming skills and user experience to achieve valid results. They also do not support end-to-end EWAS data analysis, and the most widely used open-source tools lack some of the specific features researchers need.

Over the last year, Basharat has combined several existing techniques and introduced new functionality. The result is the Epigenome-Wide Association Study Pipeline (EWASP) - a more sophisticated method with more accuracy and greater accessibility.

This pipeline will help interpret information from Greg's unique vascular cohort studies, but it also represents the next global wave in utilising valuable knowledge from any genome. It applies not just to human health: it can also be used to analyse farmed animal genomes, and potentially in conservation too. The pipeline is now being promoted through the New Zealand genomics community.

Advantages of this new approach

The advantage of this novel approach is that users don't have to code, so more people can directly analyse the data from their own studies. Health practitioners studying vascular disease or cancer, for instance, can play a bigger role in interpreting the data they have been collecting, ultimately helping to tailor solutions to health problems. It also opens up research possibilities, for instance in the social sciences.

This new pipeline enhances Genomics Aotearoa's efforts to build the genomic and bioinformatic capability New Zealand currently needs to be able to best use information from the genomes of humans, plants, animals and bacteria specific to our country.

How it works

EWASP is an intuitive, researcher-friendly web-based platform designed around the model-view-controller (MVC) pattern.

EWASP is a comprehensive data analysis web server for population-based epigenome-wide association studies. At the front end, the user uploads raw data and a phenotype file, selects filtering parameters and potentially confounding variables, and submits a job for processing. A process scheduler then makes a rough estimate of the resources required to complete each job. EWASP integrates existing EWAS packages and modules with scripts developed in-house to provide a complete end-to-end analysis.
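To make the submission step more concrete, here is a minimal Python sketch of what a job description and a rough resource estimate could look like. It is an illustration only: the class name, field names, overhead factor and memory formula are all assumptions made for this example and are not taken from EWASP itself.

```python
from dataclasses import dataclass, field


# Hypothetical job description; none of these names come from EWASP.
@dataclass
class EwasJob:
    n_samples: int
    n_probes: int
    input_type: str = "idat"  # "idat" (raw) or "beta" (pre-processed)
    covariates: list = field(default_factory=list)  # potential confounders
    detection_p_cutoff: float = 0.01  # example filtering parameter


def estimate_memory_gb(job, bytes_per_value=8, overhead=3.0):
    """Rough memory estimate for one job, in gigabytes.

    Assumes one floating-point matrix of samples x probes, multiplied by
    an assumed overhead factor for intermediate copies.
    """
    if job.input_type not in ("idat", "beta"):
        raise ValueError("input_type must be 'idat' or 'beta'")
    matrix_bytes = job.n_samples * job.n_probes * bytes_per_value
    # Raw input holds two intensity channels (methylated and unmethylated)
    # before beta-values are computed; pre-processed input holds one matrix.
    channels = 2 if job.input_type == "idat" else 1
    return matrix_bytes * channels * overhead / 1e9


job = EwasJob(n_samples=100, n_probes=1_000_000, input_type="beta",
              covariates=["age", "sex"])
print(f"Estimated memory: {estimate_memory_gb(job):.1f} GB")
```

A real scheduler would also need to account for run time, queueing and the normalisation method chosen; the point here is only that an estimate can be derived from the size of the upload before the job starts.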

The process includes an interactive JavaScript library for data visualisation, data export options, and data backups at each step.

The EWASP web server incorporates several user-friendly features, including:

  1. Multiple data upload schema
  2. Accepting raw IDAT files (the standard array output) and (pre-processed) β-values (the typical format for publicly available data)
  3. A comprehensive range of data normalisation options
  4. Data imputation
  5. Non-parametric testing to identify significant differentially methylated CpG positions (DMPs) associated with continuous variables
  6. Interactive data visualisation
  7. A comprehensive analysis summary
  8. Methods to facilitate the identification of mQTLs
  9. Tools to correct for cell composition effects: (A) an option to automatically generate blood cell composition coefficients using the Houseman Extended method; or (B) an export option that facilitates the generation of similar coefficients using the Horvath DNAmAge tool. This also allows for the generation of Methyl_Age and GrimAge (calibrated to predict risk of chronic disease-related mortality) scores.
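Two of the features above can be illustrated with a short Python sketch: the β-value, and a non-parametric test of association between methylation at one CpG position and a continuous variable. The β-value formula, with its conventional offset of 100, is the standard definition for methylation arrays. The article does not say which non-parametric test EWASP uses, so Spearman rank correlation is an assumption here, chosen because it is a common option. The function names are made up for this example.

```python
import math


def beta_value(methylated, unmethylated, offset=100.0):
    """Proportion of methylated signal at one CpG site (between 0 and 1).

    The offset stabilises the ratio when both intensities are low.
    """
    return methylated / (methylated + unmethylated + offset)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Illustrative values only: beta-values at one CpG across five samples,
# and a continuous phenotype (age in years) for the same samples.
betas = [0.12, 0.18, 0.25, 0.31, 0.40]
ages = [34, 45, 51, 62, 70]
print(spearman_rho(betas, ages))
```

In a full EWAS this test would be repeated for every CpG position, followed by a p-value calculation and multiple-testing correction, both of which are left out of this sketch.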

For more information, contact Dr Basharat Bhat.

