Ph.D. candidate Taher Mun, together with Alan Kuhnle and co-authors, published a journal article and an accompanying software article in the Journal of Computational Biology. We demonstrate new methods for indexing and querying text using the r-index, which advances on earlier methods such as the FM index and the RLFM index. The r-index makes it computationally feasible both to index and to query huge, highly repetitive collections of text. We show that it can index many human genome assemblies, enabling queries that are faster than Bowtie on large repetitive collections. The journal article builds on advances first presented in our RECOMB 2019 paper, “Efficient Construction of a Complete Index for Pan-Genomics Read Alignment.” In the future, this method could be used to index and query collections of thousands of human genome assemblies.
The accompanying software article shows how to get started with running the software, which already works with DNA data. The open-source software is available at: https://github.com/alshai/r-index.
This work is a collaboration with the University of Florida (Boucher Lab), Dalhousie University (Travis Gagie), and the University of Eastern Piedmont (Giovanni Manzini).