
ddRAGE: A data set generator to evaluate ddRADseq analysis software

Abstract

High-throughput sequencing makes it possible to evaluate thousands of genetic markers across genomes and populations. Reduced-representation sequencing approaches, such as double-digest restriction site-associated DNA sequencing (ddRADseq), are frequently applied to screen for genetic variation. Particularly in non-model organisms, where whole-genome sequencing is not yet feasible, ddRADseq has become popular because it allows genome-wide assessment of variation patterns even in the absence of other genomic resources. However, while many tools are available for the analysis of ddRADseq data, few options exist to simulate ddRADseq data in order to evaluate the accuracy of downstream tools. The available tools either focus on optimizing ddRAD experiment design or do not provide the information necessary for a detailed evaluation of different ddRAD analysis tools. This task requires a ground truth, that is, the underlying information about all effects in the data set. We therefore present ddRAGE, the ddRAD Data Set Generator, which allows both developers and users to evaluate their ddRAD analysis software. ddRAGE lets the user adjust many parameters, such as coverage and the rates of mutations, sequencing errors, and allelic dropouts, in order to generate a realistic simulated ddRADseq data set for a given experimental scenario and organism. The simulated reads can be easily processed with available analysis software such as stacks or pyrad, and the results can be evaluated against the underlying parameters used to generate the data to gauge the impact of different parameter values used during downstream data processing.

Publication
Molecular Ecology Resources (preprint)
Date