Ph.D. student Charlotte Darby and former Master's student Ravi Gaddipati posted a preprint describing their work on Vargas, a heuristic-free read alignment tool that runs efficiently on modern SIMD and multithreaded architectures. Heuristics are rules that allow aligners to skip portions of the search space that seem to contain only low-scoring alignments. Avoiding heuristics has a major advantage and a major disadvantage. The advantage is that the answer produced is guaranteed to be optimal under the alignment scoring function, with no risk that the best alignment was eliminated by a heuristic. This gives us the unique ability to evaluate other aligners and the specific effects their heuristics are having. The disadvantage is that it is extremely work-intensive and drastically slower than heuristic aligners.
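To make "heuristic-free" concrete, here is a minimal sketch of exhaustive dynamic-programming alignment of a read against a linear reference. It is not the Vargas implementation, which relies on SIMD vectorization and multithreading; the function name `optimal_alignment`, the linear gap penalty, and the scoring values are illustrative assumptions. The point is only that every cell of the matrix is filled, so no candidate alignment is ever pruned.

```python
def optimal_alignment(read, ref, match=0, mismatch=-6, gap=-5):
    """Align all of `read` to the best-scoring substring of `ref`.

    Fills the full dynamic-programming matrix (no seeding, no pruning),
    so the returned score is the true optimum under this scoring function.
    Scoring values are illustrative assumptions, not any tool's defaults.
    Returns (best_score, end_position_in_ref), end position 1-based.
    """
    # Row 0: an empty read prefix can start anywhere in the reference for free.
    prev = [0] * (len(ref) + 1)
    for i, r in enumerate(read, start=1):
        # Column 0: i read bases aligned against nothing cost i gaps.
        curr = [i * gap] + [0] * len(ref)
        for j, c in enumerate(ref, start=1):
            diag = prev[j - 1] + (match if r == c else mismatch)
            up = prev[j] + gap        # read base aligned to a gap
            left = curr[j - 1] + gap  # reference base aligned to a gap
            curr[j] = max(diag, up, left)
        prev = curr
    best = max(prev)
    return best, prev.index(best)


print(optimal_alignment("ACGT", "TTACGTTT"))  # (0, 6): exact match ending at base 6
```

The cost of this guarantee is visible in the loops: the work grows with the product of read length and reference length, which is why heuristic aligners avoid doing it genome-wide.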
In the study, we demonstrate that our implementation is efficient thanks to hardware-level vectorization with SIMD instructions and multithreading for parallelization. Then, for tens of thousands of real sequencing reads, we measure the alignment correctness of popular aligners such as Bowtie 2, BWA-MEM, BWA aln, HISAT2 and vg, based on whether the optimal alignment score is achieved. This highlights the scenarios where each tool's heuristics fail to identify the optimal alignment for a read. We also evaluate how each aligner's reported mapping quality compares to the mathematical definition, based on whether the optimal alignment location is achieved. Finally, we show an example workflow in which Vargas is used to optimize Bowtie 2 alignment parameters: a small set of reads annotated with their optimal alignment scores guides parameter choices that improve the mapping of ChIP-seq data.
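The two evaluations described above can be sketched as follows. This is a hedged illustration, not the analysis code from the preprint: the function names and input layout are hypothetical, and it assumes the "mathematical definition" of mapping quality is the standard Phred-scaled form, MAPQ = -10 log10(probability that the reported location is wrong).

```python
import math
from collections import defaultdict


def fraction_optimal(reported_scores, optimal_scores):
    """Fraction of reads whose aligner score equals the optimal score."""
    hits = sum(1 for r, o in zip(reported_scores, optimal_scores) if r == o)
    return hits / len(optimal_scores)


def phred(p_wrong, cap=60):
    """Phred-scale an error probability; `cap` is an arbitrary ceiling
    (an assumption) used when no errors were observed."""
    if p_wrong <= 0:
        return cap
    return min(cap, -10 * math.log10(p_wrong))


def empirical_mapq(records):
    """Group reads by reported MAPQ and compute the MAPQ implied by the
    observed error rate in each group.

    `records` is an iterable of (reported_mapq, is_correct) pairs, where
    is_correct means the aligner found the optimal alignment location.
    """
    totals = defaultdict(int)
    wrong = defaultdict(int)
    for mapq, is_correct in records:
        totals[mapq] += 1
        if not is_correct:
            wrong[mapq] += 1
    return {q: phred(wrong[q] / totals[q]) for q in sorted(totals)}


# Toy input for illustration only; these are not results from the study.
toy = [(10, True)] * 9 + [(10, False)] + [(40, True)] * 5
print(fraction_optimal([-6, 0, -12], [-6, 0, -6]))
print(empirical_mapq(toy))
```

If an aligner's mapping qualities are well calibrated, the empirical value for each group should be close to the reported one; large gaps point to over- or under-confident scores.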
Please check out the paper and the software, available from GitHub.