In collaboration with groups at the University of Florida (Boucher Lab), CeBiB and Diego Portales University (Travis Gagie), and the University of Eastern Piedmont (Giovanni Manzini), Taher Mun and Ben Langmead released two preprints describing recent work on indexing and querying highly repetitive sequence collections. Both preprints, “Prefix-Free Parsing for Building Big BWTs” and “Efficient Construction of a Complete Index for Pan-Genomics Read Alignment,” build on the RLFM Index construction and present improved techniques for building the BWT and Suffix-Array-sample data structures needed to turn the RLFM into a full-fledged genomic index. With these projects, we are approaching the goal of building practical, queryable genome indexes for many individuals of the same species, a kind of “pan genome.” In particular, we are studying how to do this with human genome sequences, in preparation for the fast-approaching day when high-quality long-read assemblies of human genomes appear frequently.
Published: November 21, 2018