By Professor Murray Cox, Genomics Aotearoa researcher based at Massey University
Do I have to finish my favourite genome?
That’s an often-asked question. Geneticists generally strive to produce high-quality genomes that capture every last gene, making full use of state-of-the-art technologies as they become available.
Sequencing DNA means determining the order of the four chemical building blocks – called “bases” – that make up the DNA molecule.
The sequence tells scientists the kind of genetic information that is carried in a particular DNA segment. For example, scientists can use sequence information to determine which stretches of DNA contain genes and which stretches carry regulatory instructions, turning genes on or off. Importantly, data from a genome sequence can highlight changes in a gene that may cause disease.
Reference genomes are the cornerstone of modern genomics. These high-quality genomes are differentiated from draft genomes by their completeness – few gaps, few errors, and a high percentage of sequence assembled into chromosomes.
Obviously the more detailed the genome, the more information it contains.
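To make the completeness criteria above concrete, here is a minimal Python sketch that counts gaps and measures how much of an assembly sits in chromosome-scale sequences. It is an illustration only, not the pipeline used for Epichloë: the function name `assembly_summary`, the toy sequences, and the convention that a gap is a run of `N` characters are all assumptions. Error rates cannot be read from the sequence alone, so they are left out.

```python
import re


def assembly_summary(sequences, chromosome_names):
    """Summarise how complete an assembly looks.

    sequences: dict mapping sequence name -> DNA string
    chromosome_names: names treated as chromosome-scale (an assumption
        supplied by the caller, not inferred from the data)
    """
    total = sum(len(seq) for seq in sequences.values())
    # Assumption: a gap is any run of one or more 'N' characters.
    gaps = sum(len(re.findall(r"N+", seq.upper())) for seq in sequences.values())
    in_chromosomes = sum(
        len(seq) for name, seq in sequences.items() if name in chromosome_names
    )
    return {
        "total_bases": total,
        "gap_count": gaps,
        "fraction_in_chromosomes": in_chromosomes / total if total else 0.0,
    }


# Hypothetical toy assembly: two chromosomes and one unplaced scaffold.
toy = {"chr1": "ACGTNNNNACGT", "chr2": "ACGTACGT", "scaffold_7": "ACNGT"}
summary = assembly_summary(toy, {"chr1", "chr2"})
print(summary)
```

On a draft genome the gap count would typically be high and the chromosome fraction low; a finished genome pushes the first towards zero and the second towards one.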
At Massey University, we have been studying Epichloë – a genus of fungi forming an agriculturally important symbiosis with grasses. While Epichloë genomes are small, they are genuine eukaryotic genomes, meaning they have large repeats, introns and multiple chromosomes. In other words, they are messy.
That has given us a good model to play with, and we wanted to see if we could build a finished Epichloë reference genome.
We have now produced the first complete genome for this group, and one of the first completed genomes of any fungus. This reference opens the door for researchers to understand more about how DNA sequences can vary among individuals and populations, and what the implications are for evolution and population management.
But completing this genome is only part of the story – we have also refined and standardised some of the methods used to get to this point.
Using Epichloë as an exemplar provides the opportunity to answer some fundamental questions about genome sequencing for Genomics Aotearoa:
- What is the best method for finishing small eukaryote genomes?
- Why are finished genomes useful?
- When should you bother and when should you not?
Taking a draft genome to a whole genome sequence
There are several methods that are often used to produce a whole genome sequence. Different technologies offer different opportunities, and bigger genomes have more challenges to work through.
It can be hard to know which method to use, and when to combine methods or types of data.
We wanted to take some of the trial and error out of the finishing process and standardise a pipeline that could be used beyond the Epichloë project.
We worked our way methodically through some different options and came up with a hierarchy of methods combining long and short reads. Success is largely a function of data (long reads), coverage (lots) and – unfortunately for now – manual perseverance.
That said, it’s not rocket science. We think this hierarchy will help others, providing a useful road map for assembly and analysis.
How important is it to finish a genome?
This is all well and good, but the real question is actually not how to finish genomes, but when and why finished genomes are useful.
We are not producing genomes for their own sake; we are producing these DNA sequences to answer biological questions. So we need to be clear about what we are trying to achieve, and what biology we can learn from the information. We need to focus again on using genome sequences to ask big questions about biology.
And that means one of your questions should also be: when should you not bother?
If you just want a subset of genes, need a gene set for transcriptomics or are having a quick look at population diversity, then you don’t have to finish the genome.
If you want to know more about the biology, structure and function, then perhaps persevere – the finished genome will be worth it.
New technology and techniques are moving quickly, but we as researchers still need to ask a very simple question: why are we doing what we’re doing? Being clearer about that will help us use genomic information in a way that makes a real difference in the world.