High-throughput life science instruments, especially DNA sequencers, are improving rapidly. A single DNA sequencer can now generate enough data to cover the human genome dozens of times over in approximately one week. Sequencing has become a ubiquitous tool in the study of biology, genetics, and disease. Because sequencing throughput is outpacing gains in computer speed and storage capacity, the most crucial bottlenecks in biological research are increasingly computational: computing, storage, labor, and power.
The laboratory’s goal is to make high-throughput life science data as useful as possible to everyday life scientists. We pursue this goal by:
- Developing methods and software tools that are efficient, allowing researchers to interact with datasets quickly and effectively. See: Bowtie, Bowtie 2, Kraken 2, Dashing, r-index, SPUMONI, Vargas, Lighter, Samovar, Arioc, HISAT. See also our read alignment review.
- Developing scalable tools that allow researchers to work with very large datasets, or large collections of datasets. See: recount3, Snaptron, Megadepth, Recount & recount2, Ascot, Intropolis, Rail-RNA, Rail-dbGaP, Myrna, Crossbow, Boiler. See also our cloud computing review.
- Making output from our software as interpretable and free of bias as possible. See: Qtip, FORGe, Reference Flow and the related LevioSAM tool.
We are passionate about teaching, both in the classroom and online, for example in our highly rated Algorithms for DNA Sequencing course on Coursera. We freely distribute various teaching materials, including lecture videos, screencasts, lecture notes, and programming notebooks. These span subjects from programming in C/C++ to applied algorithms and data structures in computational biology. See the Teaching Materials page for links and details.
The lab is located in the Johns Hopkins University Department of Computer Science.