Small insertions and deletions (indels) are a common and functionally important type of sequence polymorphism. Most studies of sequence variation focus on single nucleotide variants (SNVs) and large structural variants. In principle, high-throughput sequencing studies should allow indels to be identified just as SNVs are. However, inferring indels from next-generation sequence data is challenging, and so far methods for identifying indels lag behind methods for calling SNVs in both sensitivity and specificity.
Dindel is a program for calling small indels from short-read sequence data (‘next generation sequence data’). It is currently designed to handle only Illumina data. Dindel takes BAM files with mapped Illumina read data and enables researchers to detect small indels and produce a VCF file of all the variant calls.
The simplest use of Dindel is to consider all indels in a read-alignment file (a BAM file), and to test whether each of these is a real indel, a sequencing error, or a mapping error. Dindel can also test candidate indels or sequence variants from other sources, e.g. a database of known variants, or indels called on a different sample.
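As an illustration of what "considering all indels in a read-alignment file" involves, the sketch below collects indel candidates from the CIGAR strings of aligned reads and counts how many reads support each one. It is not Dindel's own extraction code: the function names and the position convention (an insertion is reported at the reference base that follows it, a deletion at the first deleted base) are assumptions made for this example, and the alignments are passed in as plain (POS, CIGAR, SEQ) tuples rather than read from a BAM file.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def indels_from_cigar(pos, cigar, read_seq):
    """Yield (ref_pos, kind, length, inserted_bases) for each indel in one alignment.

    pos is the 1-based leftmost reference position (the SAM POS field).
    """
    ref_pos = pos   # next reference base to be consumed
    read_idx = 0    # next read base to be consumed
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "I":                      # insertion: consumes read only
            yield (ref_pos, "ins", length, read_seq[read_idx:read_idx + length])
            read_idx += length
        elif op == "D":                    # deletion: consumes reference only
            yield (ref_pos, "del", length, "")
            ref_pos += length
        elif op in "M=X":                  # match/mismatch: consumes both
            ref_pos += length
            read_idx += length
        elif op == "S":                    # soft clip: consumes read only
            read_idx += length
        elif op == "N":                    # skipped region: consumes reference only
            ref_pos += length
        # H and P consume neither read nor reference bases


def count_candidates(alignments):
    """Count supporting reads per candidate indel over (pos, cigar, seq) tuples."""
    counts = Counter()
    for pos, cigar, seq in alignments:
        counts.update(indels_from_cigar(pos, cigar, seq))
    return counts


if __name__ == "__main__":
    reads = [
        (100, "5M2I3M1D4M", "AAAAACCGGGTTTT"),
        (102, "3M2I3M1D4M", "AAACCGGGTTTT"),
    ]
    for candidate, n_reads in count_candidates(reads).items():
        print(candidate, n_reads)
```

Each candidate found this way is only a hypothesis. Whether it is a real indel or an artefact is what the later realignment step is meant to decide.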
Dindel is a Bayesian method that calls indels from short-read sequence data in individuals and populations by realigning reads to candidate haplotypes that represent alternative sequences to the reference. The candidate haplotypes are formed by combining candidate indels and SNVs identified by the read mapper, while allowing for known sequence variants or candidates from other methods to be included.
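To make the idea of candidate haplotypes concrete, here is a minimal sketch that builds alternative sequences for one window by applying combinations of candidate variants to the reference. The variant representation (1-based position, reference allele, alternative allele, in the style of VCF), the function names, and the `max_variants` cap are assumptions for illustration. How Dindel itself selects and limits haplotypes is not described here.

```python
from itertools import combinations


def apply_variants(ref_seq, window_start, variants):
    """Apply non-overlapping (pos, ref_allele, alt_allele) variants to ref_seq.

    pos is a 1-based genomic position; window_start is the position of ref_seq[0].
    Raises ValueError if variants overlap or disagree with the reference.
    """
    pieces = []
    cursor = 0  # index in ref_seq of the first base not yet copied
    for pos, ref_allele, alt_allele in sorted(variants):
        offset = pos - window_start
        if offset < cursor:
            raise ValueError("overlapping variants")
        if ref_seq[offset:offset + len(ref_allele)] != ref_allele:
            raise ValueError("reference allele does not match the window")
        pieces.append(ref_seq[cursor:offset])
        pieces.append(alt_allele)
        cursor = offset + len(ref_allele)
    pieces.append(ref_seq[cursor:])
    return "".join(pieces)


def candidate_haplotypes(ref_seq, window_start, candidates, max_variants=2):
    """Map each distinct haplotype sequence to the variant combination producing it.

    The reference itself is always included. The number of combinations grows
    quickly with the number of candidates, hence the (hypothetical) cap.
    """
    haplotypes = {ref_seq: ()}
    for k in range(1, max_variants + 1):
        for combo in combinations(candidates, k):
            try:
                hap = apply_variants(ref_seq, window_start, combo)
            except ValueError:
                continue  # skip incompatible combinations
            haplotypes.setdefault(hap, combo)
    return haplotypes


if __name__ == "__main__":
    window = "ACGTACGT"
    candidates = [
        (3, "GT", "G"),  # 1-bp deletion of the T at position 4
        (6, "C", "A"),   # SNV at position 6
    ]
    for seq, combo in candidate_haplotypes(window, 1, candidates).items():
        print(seq, combo)
```

Reads would then be realigned against every sequence in the returned set, and the haplotypes that explain the reads best indicate which candidate variants are supported.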
Dindel requires as input a file with mapped reads, a set of candidate indels, and the library insert size distributions. All of these can be inferred from the alignments produced by a read mapper, and Dindel has an option to extract candidate indels and the insert size distribution from the read-alignment file. For this, Dindel accepts files in the SAMtools BAM format. The user may choose to augment candidate indels from the read-alignment file with candidate indels or SNPs from alternative sources. For example, it is also possible to provide known SNPs and their population allele frequencies at this stage.
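The insert size distribution mentioned above can be estimated directly from paired-end alignments. The sketch below does so from SAM-formatted text lines, using the FLAG and TLEN columns defined in the SAM specification. It is an illustration only, not Dindel's implementation: the function names and the `max_size` cutoff are assumptions, and a real pipeline would read the BAM file through a library rather than parse text.

```python
from collections import Counter

# FLAG bits from the SAM specification.
PROPER_PAIR = 0x2
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def insert_size_histogram(sam_lines, max_size=2000):
    """Count observed template lengths (TLEN) over properly paired primary reads."""
    hist = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])     # column 2: FLAG
        tlen = int(fields[8])     # column 9: TLEN
        if not flag & PROPER_PAIR:
            continue
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        # TLEN is positive for the leftmost read of a pair and negative for
        # the rightmost, so keeping positive values counts each pair once.
        if 0 < tlen <= max_size:
            hist[tlen] += 1
    return hist


def normalise(hist):
    """Turn a histogram of counts into an empirical probability distribution."""
    total = sum(hist.values())
    if total == 0:
        return {}
    return {size: count / total for size, count in sorted(hist.items())}


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.6",
        "r1\t99\tchr1\t100\t60\t4M\t=\t300\t204\tACGT\tIIII",
        "r1\t147\tchr1\t300\t60\t4M\t=\t100\t-204\tACGT\tIIII",
        "r2\t99\tchr1\t500\t60\t4M\t=\t700\t204\tACGT\tIIII",
        "r3\t99\tchr1\t800\t60\t4M\t=\t1096\t300\tACGT\tIIII",
    ]
    print(normalise(insert_size_histogram(sam)))
```

When several libraries are sequenced, one such distribution per library would be needed, since the text above refers to library insert size distributions in the plural.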
The core of the Dindel program is the realignment of reads to candidate haplotypes for each realignment window defined in the preprocessing step. It is possible to test indels discovered with other methods using Dindel, for instance longer indels obtained through assembly methods. Dindel will then realign both mapped and unmapped reads to see if the candidate indel is supported by the reads. Dindel outputs genotype likelihoods and includes a script to convert these to a VCF file with indel and SNP calls.
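The step from realignment to genotype likelihoods can be illustrated with a generic diploid model. The sketch below is a textbook-style calculation under stated assumptions, not Dindel's actual model: it considers only two haplotypes (reference and one alternative), treats reads as independent, assumes each read is drawn from either chromosome with probability 1/2, and takes the per-read log-likelihoods and the genotype priors as given inputs. All names are hypothetical.

```python
import math

GENOTYPES = ("0/0", "0/1", "1/1")


def genotype_log_likelihoods(read_logliks):
    """Sum per-read log-likelihoods into one log-likelihood per genotype.

    read_logliks: list of (log P(read | ref haplotype), log P(read | alt haplotype)).
    """
    totals = {g: 0.0 for g in GENOTYPES}
    for ll_ref, ll_alt in read_logliks:
        totals["0/0"] += ll_ref
        totals["1/1"] += ll_alt
        # Heterozygote: average the two haplotype likelihoods, computed in
        # log space (log-sum-exp) to avoid underflow.
        m = max(ll_ref, ll_alt)
        totals["0/1"] += m + math.log(
            0.5 * math.exp(ll_ref - m) + 0.5 * math.exp(ll_alt - m)
        )
    return totals


def genotype_posteriors(logliks, priors):
    """Combine genotype log-likelihoods with prior probabilities via Bayes' rule."""
    log_post = {g: logliks[g] + math.log(priors[g]) for g in GENOTYPES}
    m = max(log_post.values())
    unnormalised = {g: math.exp(v - m) for g, v in log_post.items()}
    z = sum(unnormalised.values())
    return {g: v / z for g, v in unnormalised.items()}


if __name__ == "__main__":
    # Five reads that fit the reference haplotype and five that fit the indel.
    fits_ref = (math.log(0.9), math.log(0.001))
    fits_alt = (math.log(0.001), math.log(0.9))
    reads = [fits_ref] * 5 + [fits_alt] * 5

    priors = {"0/0": 0.998, "0/1": 0.001, "1/1": 0.001}  # illustrative values
    posteriors = genotype_posteriors(genotype_log_likelihoods(reads), priors)
    print(max(posteriors, key=posteriors.get), posteriors)
```

A conversion script of the kind described above would take values like these and write the most probable genotype and its quality to a VCF record.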
There is basic support for outputting a realigned BAM file for each realignment window. These realigned BAM files can be used to call SNPs near (candidate) indels.