Introduction to DNA barcoding

Overview

DNA barcoding of fungi is coming of age, and standard methods are emerging, so that it is now practical to prepare samples with minimal equipment and outsource part or all of the laboratory analysis. We can use DNA barcoding for the documentation and taxonomic revision of local species. With it we should be able to verify that some local species that are morphologically similar to the European species described in the fungi reference books actually belong to different taxa, and potentially identify species of fungi unique to the western Pennsylvania region.

What is DNA Barcoding and how does it work?

DNA, or deoxyribonucleic acid, is the hereditary material in humans, fungi and almost all other organisms. DNA is made from four chemical bases: adenine (A), guanine (G), cytosine (C), and thymine (T). The order, or sequence, of these bases is important because it encodes the instructions used to produce proteins, the complex molecules that do most of the work in our bodies. Most DNA is located in the nucleus of the cell, but humans, fungi and other complex organisms also have a small amount of DNA in cell structures known as mitochondria, which generate the energy the cell needs to function properly. In DNA barcoding, a short section of DNA from a standardized region of the genome is used to identify different species, in the same way a supermarket scanner uses the familiar black stripes of the UPC barcode to identify your purchases. Unlike a supermarket scanner, which cannot identify a product whose barcode is not in its database, a DNA barcode that is missing from a DNA database may still produce a partial match to related species. Shown here is the barcode for Craterellus tubaeformis, or Yellowfoot, which was obtained from a Whole Foods store and used as a test subject during our initial study.

In animals the barcode is taken from mitochondrial DNA (mtDNA), which has a relatively fast mutation rate. This results in significant sequence variation between species and, in principle, comparatively little variation within a species. The mitochondrial cytochrome c oxidase subunit I (CO1 or COX1) gene is used as the main barcode region.

The function and structure of the CO1 gene are described at Wikipedia. However, the CO1 gene was found to be unsuitable for fungi and plants because it does not vary enough between species. The fungal DNA barcoding community has proposed the internal transcribed spacer (ITS) region, the two spacers that bracket the conserved 5.8S gene within the nuclear ribosomal repeat unit, as the standard barcode marker for fungi. The ITS region is a good choice for DNA barcoding because it is easy to amplify even from small quantities of DNA (due to the high copy number of rRNA genes) and has a high degree of variation even between closely related species. More about the ITS region can be found at Wikipedia. The DNA sequences from all the different DNA sequencing projects are submitted to public databases, the largest of which is GenBank. GenBank currently contains nearly one million ITS entries that cover many of the abundant macrofungi.

DNA Barcoding Procedure

The general procedure for collecting a DNA sample and obtaining its barcode is as follows:

Collect a specimen. It can be collected from the field, natural history museums, zoos, botanical gardens and seed banks to name a few sources.
DNA is extracted from a tiny piece of tissue taken from the specimen.
The barcode region is isolated, replicated using a process called polymerase chain reaction (PCR) amplification and then sequenced. The sequence is represented by a series of the letters A, T, C and G, which stand for the bases adenine, thymine, cytosine and guanine.
The sequence is then analyzed and compared, using computer software, to entries in GenBank and other public databases such as the Barcode of Life Data Systems (BOLD) database, a reference library of DNA barcodes. If it is unique it can be added to these databases.

Polymerase chain reaction

Polymerase chain reaction (PCR) is a molecular biology technique that copies, or amplifies, a single or a few copies of a piece of DNA into many millions of copies. The technique works by repeatedly heating and cooling the target DNA together with short DNA fragments called primers that are complementary to the target DNA, a DNA polymerase enzyme and a mixture of the four nucleotide building blocks. In each thermal cycle the sample is heated so that the DNA melts and separates into single strands. As the sample cools the primers bind to the target DNA and the DNA polymerase enzyme copies the region of DNA between the primers. As the next cycle begins the heating separates the original DNA from the newly created fragment, and both the original DNA and the new fragments serve as templates in the next amplification step. Under ideal conditions the quantity of the target DNA doubles in each cycle. A single thermal cycle can be performed in 2 to 3 minutes and a typical PCR run might consist of 30 cycles. The PCR technique is described in more detail at Wikipedia.
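
The doubling per cycle can be put into numbers. The Python sketch below is only an illustration: the function names and the efficiency parameter are assumptions made here, not part of any laboratory protocol. An efficiency of 1.0 means perfect doubling; real reactions fall short of this and level off as reagents run out, which the sketch does not model.

```python
def pcr_copies(start_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Estimate the number of target copies after a number of thermal cycles.

    efficiency is an assumed parameter: 1.0 is ideal doubling per cycle,
    0.0 means no amplification at all. The plateau phase is not modeled.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return start_copies * (1.0 + efficiency) ** cycles


def cycling_minutes(cycles: int, minutes_per_cycle: float) -> float:
    """Time spent cycling, ignoring any initial or final hold steps."""
    return cycles * minutes_per_cycle


if __name__ == "__main__":
    copies = pcr_copies(start_copies=1, cycles=30)
    print(f"{copies:,.0f} copies from one molecule after 30 ideal cycles")
    print(f"{cycling_minutes(30, 2):.0f} to {cycling_minutes(30, 3):.0f} minutes of cycling")
```

With ideal doubling a single starting molecule becomes 2 to the power of 30 copies, roughly a billion, after 30 cycles, and at 2 to 3 minutes per cycle the cycling itself takes 60 to 90 minutes.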

ITS1 primer sequence TCCGTAGGTGAACCTGCGG
ITS4 primer sequence TCCTCCGCTTATTGATATGC
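
To see how a primer pair marks out the region that gets copied, the following Python sketch searches a template for the ITS1 site and for the reverse complement of ITS4, since the second primer binds the opposite strand. It is a simplified illustration under stated assumptions: the function names are made up here, only exact matches are considered, and the toy template is invented filler rather than real fungal sequence.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")

ITS1 = "TCCGTAGGTGAACCTGCGG"   # forward primer, as listed above
ITS4 = "TCCTCCGCTTATTGATATGC"  # reverse primer, as listed above


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def in_silico_pcr(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the stretch a primer pair would amplify, primers included.

    Exact matching only (an assumption of this sketch); returns None when
    either primer site is missing from the template.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    site = template.find(reverse_site, start + len(forward))
    if site == -1:
        return None
    return template[start:site + len(reverse_site)]


if __name__ == "__main__":
    # Invented filler standing in for the region between the primer sites.
    toy_region = "ACGTTGCA" * 3
    toy_template = "GATTACA" + ITS1 + toy_region + reverse_complement(ITS4) + "GATTACA"
    amplicon = in_silico_pcr(toy_template, ITS1, ITS4)
    print("ITS4 binding site on the forward strand:", reverse_complement(ITS4))
    print("amplicon length:", len(amplicon) if amplicon else None)
```

Real primers tolerate a few mismatches, so an exact string search like this one will miss binding sites that a real reaction would still use.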

It may be necessary to purify the target DNA fragment if the PCR product is contaminated by non-specific amplification products, primer-dimers or large quantities of unused PCR primers. If the PCR product is separated on an agarose gel, the DNA can be viewed under UV light and checked for purity. Alternatively, a spin column purification method might be used.

DNA sequencing

Traditional DNA sequencing, which was used to sequence the human genome for the first time, is based on the Sanger method and is very similar to PCR. The PCR product is mixed with a primer, the four nucleotide building blocks and a DNA polymerase enzyme. There is an additional component: a small fraction of dideoxyribonucleotide bases. These are just like regular DNA bases, except they are missing the 3′ hydroxyl group, one of the chemical groups used to join DNA bases together. Once a dideoxyribonucleotide base is incorporated onto the end of a DNA strand, there is no way to continue elongating it. Depending on the DNA sequencing method used, the dideoxyribonucleotide bases may be labelled with fluorescent dyes, with a different color for each of the four bases.

As in PCR, the sample is heated to melt and separate the DNA strands and cooled to anneal the primer, and then the DNA polymerase starts to extend the strands. There are many billions of strands in the starting mixture. Every so often a dideoxyribonucleotide base is used instead of a standard DNA base and extension of that strand is terminated. By the end of the reaction there are shortened strands terminated by a dideoxyribonucleotide base at every possible position in the original strand. If our PCR product is 400 bases long, by the end of the reaction there would be a mixture of strands of 400 different lengths, from the shortest, 1 base long, to the longest at 400 bases long. The mixture of products is then separated in tubes of gel by electrophoresis. The shortest molecules travel down the tube fastest and are the first to pass a laser and detector. The DNA strands fluoresce in the laser light, and the color (green, red, yellow or blue) identifies which dideoxyribonucleotide base (A, T, C or G) stopped the reaction. As the strands are separated the dye colors are recorded, and the DNA sequence can be read from the first base to the last by converting the bands of color into the sequence.
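
The logic of reading a sequence from terminated strands can be shown with a toy model in Python. This is a sketch under simplifying assumptions: each fragment is reduced to a pair of its length and the base that terminated it, the primer itself and signal quality are ignored, and the function names are invented for this illustration.

```python
import random
from typing import List, Tuple

Fragment = Tuple[int, str]  # (length in bases, base that terminated the strand)


def chain_termination_fragments(product: str) -> List[Fragment]:
    """Return one terminated fragment for every position in the product."""
    return [(position + 1, base) for position, base in enumerate(product.upper())]


def read_sequence(fragments: List[Fragment]) -> str:
    """Order fragments shortest first, as electrophoresis does, and read
    the terminating base of each fragment in turn."""
    ordered = sorted(fragments, key=lambda fragment: fragment[0])
    return "".join(base for _, base in ordered)


if __name__ == "__main__":
    # The ITS1 primer sequence is reused here only as a short stand-in product.
    product = "TCCGTAGGTGAACCTGCGG"
    fragments = chain_termination_fragments(product)
    random.Random(0).shuffle(fragments)  # strands sit in the tube in no particular order
    print(len(fragments), "fragments read back as", read_sequence(fragments))
```

Sorting by length plays the role of the gel, and looking up the terminating base plays the role of the dye color seen by the detector.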

Modern DNA sequencing methods, or next generation sequencing, use many of the same principles as the Sanger sequencing approach described above but may detect the positions of the bases at different points in the reaction and with different types of detectors, from microscopes to sensors that measure the hydrogen ions released during the reaction. The next generation sequencing instruments are much faster and cheaper than the traditional method. You can read more about them at Wikipedia.

Identification of the DNA barcode

The DNA barcodes generated by the sequencing are about 500 base pairs long. They need to be trimmed so that the sequences of the genes flanking the start and end of the ITS region are removed. The sequences can then be compared to public DNA databases using computer search algorithms, the most popular being BLAST (Basic Local Alignment Search Tool). Any high-scoring matches to the barcode are evaluated. If the species has been analyzed in the past, the search should find a match with near 100% identity. If the species has never been analyzed, a new record is prepared and submitted to the public sequence databases. Additional analysis can be performed to compare the barcode to those of similar species if any are known.
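
Trimming and percent identity can be illustrated with a short Python sketch. It rests on several assumptions that should be read as such: the primer sites are used as a crude stand-in for the flanking gene sequences, the function names are invented here, and identity is computed from a simple Needleman-Wunsch global alignment with arbitrarily chosen scores. BLAST itself uses a faster heuristic local alignment, so this is not how BLAST works internally.

```python
LEFT_FLANK = "TCCGTAGGTGAACCTGCGG"    # ITS1 primer site, used as a stand-in flank
RIGHT_FLANK = "GCATATCAATAAGCGGAGGA"  # reverse complement of the ITS4 primer


def trim_to_its(read: str, left_flank: str = LEFT_FLANK,
                right_flank: str = RIGHT_FLANK) -> str:
    """Keep what lies between the two flanks; a missing flank trims nothing."""
    read = read.upper()
    start = read.find(left_flank)
    start = 0 if start == -1 else start + len(left_flank)
    end = read.find(right_flank, start)
    end = len(read) if end == -1 else end
    return read[start:end]


def global_identity(a: str, b: str, match: int = 1,
                    mismatch: int = -1, gap: int = -2) -> float:
    """Percent identity over a Needleman-Wunsch global alignment."""
    a, b = a.upper(), b.upper()
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the corner, counting aligned columns and identical pairs.
    i, j, matches, columns = len(a), len(b), 0, 0
    while i > 0 or j > 0:
        columns += 1
        if i > 0 and j > 0:
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            if score[i][j] == diag:
                if a[i - 1] == b[j - 1]:
                    matches += 1
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
    return 100.0 * matches / columns if columns else 0.0


if __name__ == "__main__":
    raw_read = "AAA" + LEFT_FLANK + "ACGTTGCAACGT" + RIGHT_FLANK + "TTT"  # invented read
    barcode = trim_to_its(raw_read)
    print(barcode, f"{global_identity(barcode, 'ACGTTGCAACGA'):.1f}% identity")
```

A barcode from a previously sequenced species would be expected to score close to 100% against its database entry, while a related but different species would score lower.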

DNA barcoding in the press

In 2008 a 5-year, $150 million multinational effort called the International Barcode of Life Project, or iBOL, was initiated. Also in 2008 two high school students took 60 samples of sushi from New York City restaurants and stores. Their analysis showed that two of the four restaurants and 6 of the 10 grocery stores had sold mislabeled fish. In 2013 there was a European meat adulteration scandal, in which DNA barcoding was used to determine the species of origin of the meat in the sampled products. In tests reported by the Food Safety Authority of Ireland, 27 beef burger products were analyzed: 37% were positive for horse DNA and 85% were positive for pig DNA. Of 31 beef meal products tested, 21 were positive for pig DNA but all were negative for horse DNA.

More introductions to barcoding

There are quite a few introductions to DNA barcoding by organizations that specialize in the topic, which may be easier to understand than the above document or go into more detail. Ones I like are by iBOL, Cold Spring Harbor Laboratory and Barcode of Life.

WPMC DNA barcoding project

Around the time of the horse meat scandal the club decided to initiate its own DNA barcoding project and developed a system of protocols, infrastructure and documentation that are distributed in a DNA barcoding kit.