
Glossary of genomic terms


Term Definition
Assembly DNA sequencing technology cannot read whole genomes in one go; instead it reads short fragments of a genomic sequence, a limited number of bases at a time. Sequence assembly is the process of aligning and merging these fragments to reconstruct the original, longer sequence.
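As a rough illustration of "aligning and merging fragments", here is a minimal sketch of a greedy overlap merge. The function names (`overlap`, `greedy_assemble`), the `min_len` threshold and the toy reads are assumptions made for this example. Real assemblers also have to cope with sequencing errors, repeats and both DNA strands, so this shows the idea only.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list, min_len: int = 3) -> list:
    """Repeatedly merge the pair of fragments with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # nothing overlaps any more: the remaining pieces stay separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Toy reads (hypothetical data) taken from the sequence ATGGCGTACCTGA
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(reads))  # ['ATGGCGTACCTGA']
```

Fragments that share no overlap are returned as separate pieces, which corresponds to the gaps that a finishing step would later try to close.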
Bioinformatics The science of analysing genomic data.
Candidate genes Genes of interest related to phenotypes or disease states.
Clinical geneticist This is a medical doctor with special training in genetics who meets with patients to evaluate, diagnose and manage genetic disorders. Clinical geneticists also assist in the management of genetic diseases by identifying preventable complications through early and accurate diagnosis and surveillance.

CRISPR-Cas9 (Clustered Regularly Interspaced Short Palindromic Repeats) is a method of genetic manipulation consisting of two key molecules that introduce a change into the DNA. The molecules are:

  • An enzyme called Cas9 which can cut strands of DNA at a specific location in the genome so that short sections of DNA can then be added or removed.
  • A piece of RNA sequence called guide RNA (gRNA), located within a longer RNA which guides Cas9 to the correct part of the genome to cut.
DNA methylation An epigenetic process where DNA bases are modified by addition of a methyl group. Methylation can change the activity of a DNA segment without changing the genome sequence.
Epigenetics The influence of non-genetic factors on the genome, which affects gene expression. Non-genetic factors include environmental influences such as diet, gut microbiota, toxin and drug exposure, psychological and physical stressors, and levels of activity throughout life. Measuring the epigenetic changes that occur in diseases, including cancer and heart disease, can provide understanding of the underlying mechanisms.
Epigenome The set of all epigenetic modifications to an individual’s genome.
Eukaryote An organism whose cells contain a nucleus surrounded by a membrane and whose DNA is bound together by proteins (histones) into chromosomes. Animals, plants and fungi are eukaryotes.
Exemplar research A research model that provides leadership and examples for further research in similar fields.
Finishing the sequence Finishing an assembly involves refining the genomic sequence to eliminate sequencing errors and to close gaps.
Genetic admix The presence of DNA in an individual from a distantly related population or species, as a result of interbreeding between populations or species that have been reproductively isolated and genetically differentiated. Admixture results in the introduction of new genetic lineages into a population.
Genetic counsellors These are healthcare professionals with training in human genetics and counselling who guide patients with a genetic disorder, and their whānau, through the process of understanding their condition and making informed healthcare decisions.
Genetic gain The rate of genetic improvement within a breeding population over time. An important concept in conventional quantitative genetics and breeding, genetic gain can be defined as the increase in performance achieved annually through artificial selection.
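The glossary defines genetic gain only in words. One widely used way to estimate it is the breeder's equation, which is not part of the original entry and is shown here as an assumption: gain per year = (selection intensity × selection accuracy × additive genetic standard deviation) / generation interval. A minimal sketch, with a hypothetical function name and made-up input values:

```python
def genetic_gain_per_year(
    selection_intensity: float,
    accuracy: float,
    additive_sd: float,
    generation_interval_years: float,
) -> float:
    """Expected annual genetic gain under the breeder's equation (an assumed model).

    selection_intensity       -- standardised selection differential (i)
    accuracy                  -- correlation between estimated and true breeding value (r)
    additive_sd               -- additive genetic standard deviation, in trait units
    generation_interval_years -- average age of parents when offspring are born (L)
    """
    if generation_interval_years <= 0:
        raise ValueError("generation interval must be positive")
    return selection_intensity * accuracy * additive_sd / generation_interval_years


# Made-up numbers for illustration only
gain = genetic_gain_per_year(2.0, 0.5, 10.0, 4.0)
print(f"{gain:.2f} trait units per year")
```

The form of the equation shows why shortening the generation interval or improving selection accuracy raises the annual rate of gain.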
Genetic-linkage mapping Illustrates the order of genes on a chromosome and the relative distances between those genes.
Genome annotation The process of attaching biological information to genetic sequences.
Genome duplication A widespread phenomenon in plant genome evolution, where an organism can sometimes inherit two copies of the genome of its parents, instead of only one copy. The doubling of DNA then persists over generations and the duplicated copies can result in the evolution of new functions.
Genome map Helps scientists to define which parts of the genome are physically linked to each other. The landmarks on a genome map that aid navigation might include short DNA sequences, regulatory sites that turn genes on and off, and genes themselves.
Genome variation types Genome variations include mutations and polymorphisms. Mutation is often used to refer to a variation that is associated with a specific human disease, while the word polymorphism implies a variation that may or may not affect a physical characteristic. Genetic variations also include gene deletions, gene additions and structural variations.
Genome-wide association studies A search for parts of the genome associated with characteristics of interest, one example being human diseases.
Genomic signature Genomic regions of DNA sequences that provide information about the activity of a specific group of genes in a cell or tissue.
Genotype (noun) An organism's set of genetic variations.
Genotype (verb) To determine genetic variation in a genome.
Germline The sequence of cells which develop into eggs and sperm.
Imputation The process of replacing missing data with substituted values.
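A minimal sketch of the idea, assuming genotypes are coded as 0, 1 or 2 copies of an allele and missing calls are stored as `None`. Each missing value is replaced with the most common value seen at that marker. The function name and data are hypothetical. Genomic imputation tools generally use much richer statistical models, so this only illustrates "replacing missing data with substituted values".

```python
from collections import Counter
from typing import List, Optional


def impute_mode(genotypes: List[List[Optional[int]]]) -> List[List[int]]:
    """Fill missing calls (None) with the most common call at the same marker.

    Rows are individuals and columns are markers. Assumes every marker has
    at least one observed call.
    """
    filled = [row[:] for row in genotypes]  # copy so the input is not modified
    for m in range(len(genotypes[0])):
        observed = [row[m] for row in genotypes if row[m] is not None]
        mode = Counter(observed).most_common(1)[0][0]
        for row in filled:
            if row[m] is None:
                row[m] = mode
    return filled


# Four individuals, three markers (hypothetical data)
calls = [
    [0, 1, 2],
    [0, None, 2],
    [None, 1, 1],
    [0, 1, None],
]
print(impute_mode(calls))  # [[0, 1, 2], [0, 1, 2], [0, 1, 1], [0, 1, 2]]
```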
Introns Non-coding sequences inside genes. They are copied into RNA but removed (spliced out) before the RNA is translated into protein.
Long read sequencing (also known as third generation sequencing)

Involves new forms of sequencers that can read long distances down one strand of DNA. There are currently two effective technologies:

  • Pacific Biosciences (PacBio) - an imaging approach that detects the incorporation of single labelled bases, one after another, into a strand of DNA being replicated.
  • Oxford Nanopore (Nanopore) - uses tiny charged pores that a strand of DNA is drawn into, and as each base passes through the hole it changes the electrical current through the pore in a way that can be measured. These changes in current are then assigned to each base and the sequence is built up from there.
Linked read technology Uses a unique barcode system to label short DNA sequences from individual molecules that are close to each other on the genome, so they can be linked to create longer sequence reads.
Metabolomics This process detects chemicals or metabolites and provides a read-out of what chemicals are in a tissue at any one time.
Microbiome The community of microorganisms (such as fungi, bacteria, and viruses) that exists in a particular environment. The environment could be anything from the body of an animal or plant to soils and waterways.
Mutations The changing of the structure of a gene, caused by the alteration of single base units in DNA, or the deletion, insertion, or rearrangement of larger sections of genes or chromosomes. The resulting variant form may be transmitted to subsequent generations.
Non-coding DNA Genes (coding DNA) account for a small percentage of the DNA in the genome. Knowing the entire genome sequence helps scientists study the parts of the genome outside the genes. Non-coding DNA includes the regulatory regions that control how genes are turned on and off, as well as long stretches of DNA of unknown function.
Nonsense mutation A DNA mutation that introduces a premature stop signal (stop codon) into a gene, resulting in a shortened and usually non-functional protein.
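To make this concrete, the sketch below assumes the usual meaning of a nonsense mutation: a base change that creates a stop codon (TAA, TAG or TGA in DNA) earlier in the coding sequence than normal. The function name and the two short sequences are invented for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(coding_sequence: str):
    """Return the 1-based position of the first in-frame stop codon, or None."""
    seq = coding_sequence.upper()
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3 + 1
    return None


normal = "ATGCAGGGATGGTAA"  # codons: ATG CAG GGA TGG TAA
mutant = "ATGCAGGGATGATAA"  # one base changed: TGG becomes TGA

print(first_stop_codon(normal))  # 5
print(first_stop_codon(mutant))  # 4
```

In the mutant, reading stops one codon earlier, so the resulting protein would be shorter than normal.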
Nucleotides The building blocks of DNA. Each nucleotide carries one of four bases: A, C, G or T. Base pairs are the two opposing nucleotides on a double-stranded DNA molecule. A pairs with T, and C pairs with G.
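The pairing rule (A with T, C with G) means one strand determines the other. A minimal sketch, with hypothetical function names, that derives the opposing strand from a given sequence:

```python
# Base-pairing rule: A pairs with T, and C pairs with G
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(seq: str) -> str:
    """Return the base that pairs with each base of `seq`, position by position."""
    return "".join(COMPLEMENT[base] for base in seq.upper())


def reverse_complement(seq: str) -> str:
    """Return the opposing strand as it is conventionally written.

    The two strands of DNA run in opposite directions, so the opposing
    strand is the complement read backwards.
    """
    return complement(seq)[::-1]


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```

Any character other than A, C, G or T raises a `KeyError` in this sketch; handling ambiguity codes such as N is left out.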
Omics or 'omics Omics is the collective name for the disciplines that characterise and quantify pools of biological molecules that translate into the structure, function, and dynamics of organisms. Examples include genomics, proteomics, metabolomics, metagenomics, phenomics and transcriptomics.
Pangenome A pangenome (also pan-genome or supragenome) is the entire set of genes from all strains or varieties within a group of organisms.
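In set terms, a pangenome as defined here is the union of the gene sets of every strain. The sketch below uses invented strain and gene names. The "core genome" (genes shared by all strains) is an extra concept added for contrast and does not come from the glossary entry.

```python
def pangenome(gene_sets: dict) -> set:
    """All genes found in at least one strain (set union)."""
    return set().union(*gene_sets.values())


def core_genome(gene_sets: dict) -> set:
    """Genes found in every strain (set intersection). Needs at least one strain."""
    return set.intersection(*gene_sets.values())


# Hypothetical strains and genes
strains = {
    "strain_A": {"geneA", "geneB", "geneC"},
    "strain_B": {"geneA", "geneB", "geneD"},
    "strain_C": {"geneA", "geneC", "geneE"},
}

print(sorted(pangenome(strains)))    # ['geneA', 'geneB', 'geneC', 'geneD', 'geneE']
print(sorted(core_genome(strains)))  # ['geneA']
```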
Pathway In genetics, a pathway is a set of genes that work together in a biological process.
Phenotype Observable characteristics influenced both by an organism’s genotype and by the environment.

Pharmacogenomics This is the study of how genomic variation within the individual or their disease (including gene expression, epigenetics, germline and somatic mutations) influences one person’s response to drugs. The aim is to optimise drug therapy by maximising therapeutic effect and minimising adverse effects.

Pipeline A process for the preparation, development, production and analysis of genomic data.
Population genomics The large-scale application of genomic information to study populations, including entire genomes of an entire species (e.g., kākāpō).
Prokaryote A unicellular microbial organism that lacks a nucleus. All bacteria are prokaryotes. Archaea are a related group: they are also unicellular but are distinct from bacteria.
Proteomics RNA (made from turned-on genes) is translated into protein. Proteomics is a technique to look at the broad range of proteins in a cell or tissue (using mass spectrometry). We can usually identify which proteins are present in a cell and what they are doing.
RNA RNA (ribonucleic acid) is the molecule that takes information from DNA to make protein and has many other activities. RNA is only made from genes in the DNA that are turned on.
Short read sequencing (also known as next generation or second generation sequencing) Reads small segments of DNA, which are then put in order and assembled into the genome. Assembly works by looking at the sequence of each segment and finding sequences that overlap (aligning).
SNPs About 90 percent of human genome variation can be accounted for by single nucleotide polymorphisms, or SNPs (pronounced "snips"). These are variations that involve just one nucleotide, or base.
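A minimal sketch of spotting single-base differences, assuming two sequences that are already aligned and of equal length. The function name and sequences are hypothetical. Strictly, a difference found between just two sequences is a single nucleotide variant; calling it a polymorphism also depends on how common it is in a population, which this sketch does not assess.

```python
def find_single_base_differences(reference: str, sample: str) -> list:
    """List (position, reference_base, sample_base) wherever the bases differ.

    Positions are 1-based. Both sequences must already be aligned and have
    the same length; insertions and deletions are not handled.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length")
    return [
        (i + 1, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


reference = "ATGCCGTA"
sample = "ATGACGTA"
print(find_single_base_differences(reference, sample))  # [(4, 'C', 'A')]
```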
Somatic Refers to the cells of the body in contrast to the cells that make sperm or eggs.
Structural variation or Structural Variants (SVs) Structural variation describes individual or group differences in genome structure, such as gene deletions, insertions, duplications, inversions, and translocations. These variant regions are scattered throughout genomes and are often associated with gene expression changes and observable differences among individuals (phenotypic differences).
Transcriptomics Sequencing RNA from a tissue or cell to measure the set of active genes.
Variome The whole set of genetic variations found in populations of species.
Whole genome sequencing (WGS) The process of determining the complete DNA sequence of an organism's genome.