Tag: PhD projects

Using DNA barcoding to genetically identify inbred Drosophila lines

Working with a large collection of Drosophila melanogaster strains takes a large amount of manual work: first to keep the different strains growing stably, and second to generate enough sample material for experiments. Because of this, chances are that strains are sometimes mislabeled or mixed up by mistake. These small mistakes of course negatively influence all downstream analyses.

In metagenomics projects, known polymorphic sites (ranging from microsatellites to SNPs) are used to identify which organisms are present in a sample. Now that all inbred strains within the Drosophila Genetic Reference Panel (DGRP) are fully sequenced, we can use their polymorphic sites (in our case SNPs) to identify each strain uniquely with the same barcoding idea. I developed several functions in R that help select regions to target with cheap, old-fashioned RT-PCR, so that all strains can be identified uniquely without having to construct unique primers for each individual strain.

In one example we selected a region that, when sequenced, can identify 28 unique strains using the 31 SNPs inside this specific region. By selecting several of these regions you can identify every strain with high confidence.
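To make the region-selection idea concrete, here is a minimal Python sketch (not the R functions mentioned above). It assumes genotypes are available as one allele per SNP per strain; the function names, the fixed-width sliding window and the toy data are all hypothetical. A strain counts as identifiable by a region when no other strain shares its combination of alleles there.

```python
from collections import Counter


def distinguishable_strains(genotypes, positions, start, end):
    """Return (strains with a unique haplotype in [start, end], number of SNPs used).

    genotypes: dict mapping strain name -> sequence of alleles, one per SNP
    positions: list of SNP coordinates, in the same order as the alleles
    """
    # Indices of the SNPs that fall inside the candidate region.
    idx = [i for i, pos in enumerate(positions) if start <= pos <= end]
    # The haplotype of a strain is its tuple of alleles at those SNPs.
    haplotypes = {s: tuple(a[i] for i in idx) for s, a in genotypes.items()}
    counts = Counter(haplotypes.values())
    # A strain is identifiable if no other strain carries the same haplotype.
    unique = sorted(s for s, h in haplotypes.items() if counts[h] == 1)
    return unique, len(idx)


def best_window(genotypes, positions, width):
    """Try a window of `width` bases starting at every SNP and return
    (start, identifiable strains) for the first window identifying the most strains."""
    best = (None, [])
    for start in sorted(set(positions)):
        unique, _ = distinguishable_strains(
            genotypes, positions, start, start + width - 1
        )
        if len(unique) > len(best[1]):
            best = (start, unique)
    return best


# Toy data (hypothetical): four strains, four SNPs.
positions = [100, 150, 220, 900]
genotypes = {
    "strain_A": "ACGT",
    "strain_B": "ACTT",
    "strain_C": "GCGT",
    "strain_D": "GCGA",
}
print(best_window(genotypes, positions, width=200))
```

In this toy data no single 200-base window separates all four strains, which is why several regions are combined in practice: strains left ambiguous by one region can be resolved by another.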

 

De-novo sequence assembly of high coverage genomes

As part of my PhD studies I worked on some individuals from the Drosophila Genetic Reference Panel (DGRP) that were sequenced at high coverage. For two strains (RAL 375 and 852), sequence data with an average coverage of 25X is available, which makes things like de-novo sequence assembly possible. Using Velvet we constructed several de-novo assemblies for both individuals (N50 of 40k and 50k) and studied these assemblies to check for sequence divergence and potential large structural variation.
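For readers unfamiliar with the N50 statistic quoted for these assemblies: it is the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. A minimal Python sketch, assuming the contig lengths have already been extracted from the assembler output (the function name and the example lengths are hypothetical):

```python
def n50(contig_lengths):
    """Return the N50 of an assembly, given a list of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half of the assembly is covered.
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")


# Hypothetical contig lengths: 300 bases in total, half is 150,
# which is reached after the two longest contigs (80 + 70).
print(n50([80, 70, 50, 40, 30, 20, 10]))  # prints 70
```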

We took the contigs constructed by Velvet for each of the individuals and aligned them against different reference genomes of Drosophila and some closely related species using Exonerate. Plotting contig length versus Exonerate score (as a measure of the sequence similarity of contigs to the reference genomes) makes it clear that the contigs of our de-novo assemblies show the largest sequence similarity with the Drosophila melanogaster reference genome, which was of course expected. This is a positive result that strengthens our belief that sequence assemblers like Velvet can be used to reconstruct unknown genomes when sufficient sequence coverage is available.

One contig shows a more interesting result that we have not yet been able to explain sufficiently. This contig has a much lower Exonerate score than expected based on its length (the Exonerate score normally scales linearly with contig length). This could indicate either that something biologically relevant is happening in this specific contig or that Velvet made a mistake when assembling it. With the current data and coverage for that area we cannot distinguish between these two explanations.
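One way to flag such a contig automatically, rather than by eye from the plot, is sketched below in Python. This is not the analysis used above. It assumes a list of (name, length, score) tuples has already been parsed from the Exonerate output, and it uses score per base together with the median absolute deviation (MAD) so that the outlier itself does not distort the baseline. The function name, the threshold `k` and the example values are hypothetical.

```python
from statistics import median


def flag_low_scoring(contigs, k=5.0):
    """Return names of contigs whose score per base lies more than
    k MADs below the median score per base.

    contigs: list of (name, length, score) tuples
    """
    # If score scales linearly with length, score / length is roughly constant.
    ratios = {name: score / length for name, length, score in contigs}
    med = median(ratios.values())
    mad = median(abs(r - med) for r in ratios.values())
    if mad == 0:
        # All typical contigs share the same ratio; anything lower stands out.
        return [name for name, r in ratios.items() if r < med]
    return [name for name, r in ratios.items() if (med - r) / mad > k]


# Hypothetical values: four contigs score close to 5 per base, one does not.
contigs = [
    ("contig_1", 1000, 4900),
    ("contig_2", 2000, 10000),
    ("contig_3", 5000, 24500),
    ("contig_4", 40000, 198000),
    ("contig_5", 30000, 60000),
]
print(flag_low_scoring(contigs))  # prints ['contig_5']
```

A flagged contig is only a candidate for closer inspection: the score alone cannot tell a misassembly from real biological divergence.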