A paper by Li Song, Liliana Florea and Ben Langmead describing a new software tool called Lighter just appeared in Genome Biology. Lighter corrects sequencing errors in next-generation DNA sequencing datasets. It is extremely fast and memory-efficient because, unlike comparable tools, it does not count k-mers (length-k substrings) in the input reads. In large datasets, counting k-mers requires a great deal of space. Previous tools trade space against time by strategically shuffling counts onto and off of disk, or by using Bloom filters to offload some of the counting burden. Lighter avoids counting entirely, instead using a combination of sampling and Bloom filters. As sequencing depth increases, Lighter can take sparser samples without sacrificing accuracy and without requiring additional space, so it can correct a 100-fold or 1000-fold coverage dataset in the same amount of memory as a 10-fold coverage dataset. Lighter is parallelized, uses no secondary storage, and is both faster and more memory-efficient than competing approaches while achieving comparable accuracy. The Lighter code is hosted in a public GitHub repository: https://github.com/mourisl/Lighter.
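To make the "sampling plus Bloom filters" idea concrete, here is a minimal Python sketch. It is not Lighter's code and does not reproduce its correction algorithm. It only illustrates how k-mers can be subsampled into a fixed-size Bloom filter without keeping any counts. The class `BloomFilter`, the functions `kmers`, `sample_kmers` and `sampling_rate_for`, and the `target` parameter are all hypothetical names chosen for this illustration. The rule of scaling the sampling rate inversely with coverage is likewise an assumption made for the sketch.

```python
import hashlib
import random


class BloomFilter:
    """Fixed-size Bloom filter: approximate set membership, no counts stored."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by double hashing one SHA-256 digest.
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May report false positives, never false negatives.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(read, k):
    """Yield every length-k substring of a read."""
    return (read[i:i + k] for i in range(len(read) - k + 1))


def sampling_rate_for(coverage, target=5.0):
    """Hypothetical rule: sample more sparsely as coverage grows."""
    return min(1.0, target / coverage)


def sample_kmers(reads, k, sampling_rate, bloom, rng):
    """Insert each k-mer occurrence into the filter with probability sampling_rate."""
    for read in reads:
        for kmer in kmers(read, k):
            if rng.random() < sampling_rate:
                bloom.add(kmer)
```

Under these assumptions, a k-mer that occurs many times across the reads (likely genuine) is very likely to be sampled at least once, while a k-mer that occurs only once or twice (likely an error) usually is not. Because the sampling rate shrinks as coverage grows, the expected number of insertions stays roughly constant, so the same `num_bits` can serve a 10-fold or a 1000-fold dataset.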
Lighter appears in Genome Biology
Published: November 16, 2014