A team led by Ph.D. student Omar Ahmed just published a journal paper in iScience describing a new method and software tool called SPUMONI. SPUMONI can rapidly match sequencing reads to a pan-genome index, i.e., an index consisting of many strains or individuals. At SPUMONI’s core is a novel algorithm for computing “matching statistics” against an efficient kind of pan-genome index called an r-index. (I am copying some text I used in an earlier post to describe this project, which was then only a preprint.)
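For readers unfamiliar with the term, here is a rough sketch of what matching statistics are, using the standard definition: for each position in a read, the length of the longest substring starting there that also occurs somewhere in the reference collection. This is a brute-force illustration only, not SPUMONI's algorithm, which computes these values against an r-index; the function name and the toy sequences are made up for this example.

```python
from typing import List


def matching_statistics(read: str, references: List[str]) -> List[int]:
    """Brute-force matching statistics (illustration only, not SPUMONI's method).

    ms[i] is the length of the longest prefix of read[i:] that occurs
    as a substring of at least one reference sequence.
    """
    ms = []
    for i in range(len(read)):
        length = 0
        # Extend the candidate match one character at a time for as long as
        # the extended substring still occurs in some reference.
        while i + length < len(read) and any(
            read[i:i + length + 1] in ref for ref in references
        ):
            length += 1
        ms.append(length)
    return ms


# Toy "pan-genome" of two short sequences (hypothetical data).
print(matching_statistics("ACGTTT", ["ACGA", "GTT"]))  # [3, 2, 3, 2, 2, 1]
```

Intuitively, long matching statistics suggest that a read resembles something in the index, while consistently short ones suggest that it does not. The brute-force scan above gets slower as the references grow, which is the cost an index-based approach is meant to avoid.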
We evaluated the method in the context of targeted Nanopore sequencing. When used to target a specific strain in a mock community, SPUMONI has accuracy similar to that of minimap2 when both are run against an index containing many strains per species. However, SPUMONI is 12 times faster than minimap2. SPUMONI’s index and peak memory footprint are also 15 and 4 times smaller than minimap2’s, respectively. These improvements become more pronounced with larger reference databases; SPUMONI’s index size scales sublinearly with the number of reference genomes included. This could enable accurate targeted sequencing even in cases where the targeted strains have not necessarily been sequenced or assembled previously.
SPUMONI has been accepted to the RECOMB-SEQ workshop and will be presented there (virtually) in August 2021.
Huge congrats to the whole team, including Omar, Massimiliano Rossi, Mike Schatz, Sam Kovaka, Christina Boucher and Travis Gagie!
This work is based on the MONI method and algorithm by Massimiliano Rossi and others, to be presented (virtually) at RECOMB 2021 in August 2021.