There has been much interest in methods for building and aligning to “graph genomes,” which take variants into account when aligning sequencing reads. But basic questions remain: Which variants should we include in the reference? Is including more variants always better? How close do these promising new methods come to the ideal of aligning to a personalized genome?
Jacob Pritt and PI Langmead just finished a manuscript (posted on bioRxiv) addressing these and other important questions. We introduce models for assessing the pros and cons of including a particular genetic variant in the graph genome. We implement these methods in a new, open-source software tool called FORGe.
Our experiments show that graph-genome alignment techniques (HISAT2 in particular) are computationally affordable, genuinely improve the “bottom line” in terms of accuracy and bias, and approach the performance of an ideal personalized genome. FORGe-constructed graph genomes perform better on alignment accuracy measures than linear genomes or typical HISAT2-generated graphs. In short: graph genomes can be usable and useful, provided we take care in selecting which variants to include.