Advanced genome assembly workshop

Genomics Aotearoa and the New Zealand eScience Infrastructure (NeSI) recently hosted an advanced workshop with global leaders in genomics to hear first-hand about emerging techniques used on the first draft of the human pangenome reference (published in May 2023).

Researchers from the Human Pangenome Reference Consortium (HPRC), visiting from the United States, presented to a group of New Zealand genomics researchers at the University of Otago in July 2023. The event was spearheaded by Genomics Aotearoa Variome project co-leads Associate Professor Phil Wilcox and Professor Stephen Robertson, together with Dr Ann McCartney, Assistant Researcher at the University of California Santa Cruz (UCSC) Genomics Institute.

Dr Julian Lucas (Senior Bioinformatics Systems Analyst at UCSC Genomics Institute) led the masterclass on human genome assembly using long-read sequencing platforms, supported by Brandon Pickett (NIH/NHGRI Bethesda MD) and Linelle Abueg (The Rockefeller University).

The focus was on read handling, assembly generation, evaluation and curation. This included comparisons between sequence platforms and popular long-read assemblers, examples of good and bad assemblies, and approaches to assembly phasing.

Dr Karen Miga (University of California, Santa Cruz) also presented a Genomics Aotearoa Friday seminar at the University of Otago and online on July 7, covering “Expanding studies of global genomic diversity with complete, telomere-to-telomere (T2T) assemblies.”

Dr Miga, named among Time’s 100 most influential people for 2022, co-founded the Telomere-to-Telomere (T2T) Consortium in 2012 - an open, community-based effort to generate the first complete assembly of a human genome. She is the Director of the Reference Production Center for the Human Pangenome Reference Consortium (HPRC).

This pangenome work represents a new era in genetic analysis, following on from the first human genome map completed in 2003 – at that time one of the great feats of science.

The 2003 draft of the human genome, while a tremendous resource for scientists, was an incomplete representation of the human genome.

Now that the full extent of the human genome can be sequenced using the telomere-to-telomere approach, initiatives like the pangenome can capture variation across the entire genome. This makes the reference far more representative of variation in all regions of the genome and in diverse populations across the globe. It is a step towards fairer and more equitable science.

Genomics Aotearoa’s role is to build capability and capacity in the rapidly changing fields of genomics and bioinformatics, and a workshop of this standard helps New Zealand to see how we can use and adapt new bioinformatic practices for human health, environment and primary production benefits.

Genomics Aotearoa Bioinformatics training co-ordinator Dr Tyler McInnes said it was exciting for New Zealand researchers to be learning about the processes used in this breakthrough from scientists leading the world in this research, and so soon after they have been developed.

“The T2T reference genome is a leap forward in the field of human genomics. Equally important is the development of the methodology used to build this new resource.” 

“Dr Miga’s team shared insights and skills at our workshop that will help researchers here in Aotearoa to refine our own methods and develop resources in fields including evolutionary genomics, conservation and protection genomics, aquaculture and microbiological genomics.” 

“We’re looking forward to being able to learn, discuss, try and then pass this knowledge on.”

Some explanations
A genome is the complete set of genetic material present in a cell or organism. Bioinformatics comprises the methods and software tools used to understand the biological data derived from genomics. A pangenome represents the entire set of genes within a species, covering both the invariant fraction (shared by all individuals) and the variable fraction (present in only some).
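
As a rough illustration of the pangenome definition above, the sketch below splits a set of genomes into invariant and variable gene fractions using plain set operations. The genome names and gene labels are made up for the example, and the function name is hypothetical; real pangenome construction works on sequence graphs and alignments rather than simple gene lists.

```python
# Toy gene-presence data. Genome names and gene labels are invented
# purely for illustration; they do not come from any real dataset.
genomes = {
    "genome_A": {"gene1", "gene2", "gene3"},
    "genome_B": {"gene1", "gene2", "gene4"},
    "genome_C": {"gene1", "gene2", "gene3", "gene5"},
}


def split_pangenome(gene_sets):
    """Return (pangenome, invariant, variable) gene sets.

    pangenome: every gene seen in at least one genome
    invariant: genes present in every genome
    variable:  genes present in some genomes but not all
    """
    sets = list(gene_sets.values())
    if not sets:
        return set(), set(), set()
    pangenome = set().union(*sets)
    invariant = set.intersection(*sets)
    variable = pangenome - invariant
    return pangenome, invariant, variable


pangenome, invariant, variable = split_pangenome(genomes)
print("pangenome:", sorted(pangenome))
print("invariant:", sorted(invariant))
print("variable: ", sorted(variable))
```

In this toy data, adding a fourth genome that lacks gene2 would move gene2 from the invariant fraction to the variable one, which is why sampling more diverse individuals changes what a pangenome looks like.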

Part of the process of assembling genomes is sequencing, which can use short reads or long reads. Short-read sequencing produces reads that are typically a few hundred bases long, while long-read sequencing produces reads that are thousands of bases long or more. While short reads can capture most genetic variation, long-read sequencing allows the detection of complex structural variants that may be difficult to detect with short reads.

Pangenomes are based on long-read assemblies.
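
One reason read length matters can be sketched with a toy example: a read that falls entirely inside a repeated sequence could have come from more than one place, while a read long enough to reach the unique sequence on either side can only be placed once. The sequences and the helper function below are invented for illustration, and the sketch uses exact string matching, whereas real aligners and assemblers tolerate sequencing errors.

```python
def count_placements(genome, read):
    """Count the positions (overlaps allowed) where read matches genome exactly."""
    count = 0
    start = genome.find(read)
    while start != -1:
        count += 1
        start = genome.find(read, start + 1)
    return count


# A made-up 12-base repeat that occurs twice, with different flanking sequence.
repeat = "ACGTTGCAGGCT"
genome = "TTGACC" + repeat + "GATTAC" + repeat + "CCATGG"

# A short read taken from inside the repeat: it fits both copies equally well.
short_read = repeat[2:10]

# A longer read covering the first repeat copy plus 3 unique bases on each side.
long_read = genome[3:21]

print(len(short_read), "bases:", count_placements(genome, short_read), "possible placements")
print(len(long_read), "bases:", count_placements(genome, long_read), "possible placement")
```

The short read has two equally good placements, so it cannot say which copy of the repeat it came from; the longer read is anchored by the unique bases around the repeat and fits in only one place.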

About the Human Pangenome project
The Human Pangenome Reference Consortium is a project funded by the National Human Genome Research Institute in the USA. It aims to sequence and assemble genomes from individuals from diverse populations, to better represent the genomic landscape of diverse human populations.

The human reference genome has formed the backbone of human genomics since its initial draft release more than 20 years ago. However, the primary sequences within genomes have gaps of unknown sequences; these missing reference sequences create an observational bias which has limited our ability to understand diversity. 

Pangenome references, which have progressed rapidly over the past few years, help to overcome this reference bias, to the point that it is now feasible for pangenomes to be used in common genomic analyses.

The Telomere-to-Telomere (T2T) consortium finished the first complete sequence of a haploid (single set of chromosomes) human genome - T2T-CHM13. The new human pangenome reference, published in May 2023, is more comprehensive and incorporates the missing eight percent of the human genome sequence, adding over 100 million new bases.

The significance of this is that it captures substantially more diversity from different human populations than was previously available. This is extremely important for indigenous populations who have been overlooked or labelled as “different to the reference.”