Ph.D. candidate Charlotte Darby, extending work by former Master's student Ravi Gaddipati, published a study describing Vargas, a heuristic-free read alignment software tool. The study appeared in the journal Bioinformatics. The open source Vargas tool runs efficiently on modern SIMD and multithreaded architectures. By avoiding heuristics (rules that allow aligners to ignore portions of the search space that seem to contain only low-scoring alignments), we can ensure that the answer produced by Vargas is the best possible as dictated by the alignment scoring function. This gives us the unique ability to evaluate other aligners and the specific effects their heuristics are having. The disadvantage is that exhaustive search is extremely work-intensive: Vargas is drastically slower than heuristic aligners.
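To make "heuristic-free" concrete, here is a minimal sketch of exhaustive dynamic-programming alignment of a read against a linear reference. It is not Vargas's implementation: the function name, the scoring values, and the linear gap penalty are assumptions chosen for illustration, and there is no SIMD or multithreading here. The point is only that every cell of the search space is scored, so the returned score is the best possible under the scoring function.

```python
def best_alignment_score(read, ref, match=2, mismatch=-3, gap=-5):
    """Best score for aligning the whole read anywhere in ref.

    Hypothetical scoring parameters; linear gap penalty for simplicity.
    No cell is skipped, so the result is optimal for this scoring function.
    """
    # Row 0: the alignment may start at any reference position for free.
    prev = [0] * (len(ref) + 1)
    for i in range(1, len(read) + 1):
        # Column 0: read bases consumed before any reference base are gaps.
        curr = [prev[0] + gap] + [0] * len(ref)
        for j in range(1, len(ref) + 1):
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            diag = prev[j - 1] + sub   # read base aligned to reference base
            up = prev[j] + gap         # read base aligned to a gap
            left = curr[j - 1] + gap   # reference base aligned to a gap
            curr[j] = max(diag, up, left)
        prev = curr
    # The alignment may end at any reference position.
    return max(prev)


print(best_alignment_score("ACGT", "TTACGTTT"))
```

The cost is proportional to read length times reference length for every read, which is why this style of search is so much slower than aligners that prune the search space.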
We demonstrate that Vargas is efficient, due in large part to its use of vectorization with SIMD instructions. We also use Vargas to measure alignment correctness for popular aligners like Bowtie 2, BWA-MEM, HISAT2 and vg on tens of thousands of real sequencing reads, judging each alignment by whether it achieves the optimal alignment score. This highlights the scenarios where each tool’s heuristics fail to identify the optimal alignment for a read. We evaluate how each aligner’s reported mapping quality compares to the mathematical definition, based on whether the optimal alignment location is achieved. Finally, we show an example workflow in which alignment parameters for various aligners are optimized using Vargas, based on a small set of reads annotated with the optimal alignment score.
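The two evaluations described above can be sketched roughly as follows. The function names and the example read IDs and scores are made up for illustration and are not taken from the paper. The mapping quality formula is the standard Phred-scaled definition, -10 log10 of the probability that the reported alignment location is wrong.

```python
import math


def fraction_optimal(reported, optimal):
    """Fraction of reads whose reported score equals the optimal score.

    Both arguments map read ID -> alignment score (hypothetical inputs).
    Reads missing from `reported` (unaligned) count as not optimal.
    """
    hits = sum(1 for read_id, best in optimal.items()
               if reported.get(read_id) == best)
    return hits / len(optimal)


def phred_mapq(p_wrong):
    """Mapping quality implied by a probability of a wrong location."""
    return -10 * math.log10(p_wrong)


# Illustrative values only.
optimal_scores = {"r1": 40, "r2": 36, "r3": 38}
aligner_scores = {"r1": 40, "r2": 31, "r3": 38}
print(fraction_optimal(aligner_scores, optimal_scores))
print(phred_mapq(0.01))
```

Comparing an aligner's reported mapping quality for a group of reads with the value implied by the observed fraction of wrong locations in that group shows whether the aligner is over- or under-confident.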
Please check out the paper and the software, available from GitHub.
(Impressively, Charlotte also published a separate study, describing her work at 10x Genomics the previous summer, in the same journal two days later. Check it out: scHLAcount.)