High-throughput life science instruments, especially DNA sequencers, are improving very rapidly. A DNA sequencer is now capable of generating enough data to cover the human genome dozens of times over in approximately one week. Sequencing has become a ubiquitous tool in the study of biology, genetics and disease. Today, because sequencing throughput is outpacing computer speed and storage capacity, the most crucial biological research bottlenecks are increasingly computational: computing, storage, labor, and power.

The laboratory’s goal is to make high-throughput life science data as useful as possible to everyday life scientists. We pursue this goal by:

  1. Developing methods and software tools that are efficient, allowing researchers to interact with datasets quickly and effectively. See: Bowtie, Bowtie 2, Kraken 2, Dashing, r-index, SPUMONI, Vargas, Lighter, Samovar, Arioc, HISAT. See also our read alignment review.
  2. Developing scalable tools that allow researchers to work with very large datasets, or large collections of datasets. See: recount3, Snaptron, Megadepth, Recount & recount2, Ascot, Intropolis, Rail-RNA, Rail-dbGaP, Myrna, Crossbow, Boiler. See also our cloud computing review.
  3. Making output from our software as interpretable and free of bias as possible. See: Qtip, FORGe, Reference Flow, and the related LevioSAM tool.

We are passionate about teaching, both in the classroom and online, e.g. in our highly rated Algorithms for DNA Sequencing course on Coursera. We freely distribute various teaching materials, including lecture videos, screencasts, lecture notes, and programming notebooks. These span subjects from programming in C/C++ to applied algorithms and data structures in computational biology. See the Teaching Materials page for links and details.

The lab is located at the Johns Hopkins University Department of Computer Science.