Including preprints.  Google Scholar version.


Kempa D and Langmead B. Fast and space-efficient construction of AVL grammars from the LZ77 parsing. To appear in ESA conference, 2021.

Preprint available

Mun T, Chen NC, Langmead B. LevioSAM: Fast lift-over of variant-aware reference alignments. Bioinformatics. May 25:btab396, 2021.

Describes the LevioSAM software for translating (“lifting”) alignments between reference genomes

Pre-built LevioSAM indexes, e.g. for human major-allele references, are available from the Bowtie websites

Ahmed O, Rossi M, Kovaka S, Schatz MC, Gagie T, Boucher C, Langmead B. Pan-genomic matching statistics for targeted nanopore sequencing. iScience. Jun 8;24(6):102696, 2021.

Describes the SPUMONI software for real-time sequence classification

Boucher C, Gagie T, I T, Köppl D, Langmead B, Manzini G, Navarro G, Pacheco A, Rossi M. PHONI: Streamed matching statistics with multi-genome references. Data Compression Conference (DCC), 2021.

Rossi M, Oliva M, Langmead B, Gagie T, Boucher C. MONI: A pangenomics index for finding MEMs. To appear in RECOMB conference, 2021.

Preprint available

Wilks C, Ahmed O, Baker DN, Zhang D, Collado-Torres L, Langmead B. Megadepth: efficient coverage quantification for BigWigs and BAMs. Bioinformatics. In press, 2021.

Describes the Megadepth software for quantifying genomic intervals from bigWig and BAM files

Chen NC, Solomon B, Mun T, Iyer S, Langmead B. Reference flow: reducing reference bias using multiple population genomes. Genome Biology, 22(1):8, Jan 2021.

Describes the Reference flow software and framework for avoiding reference bias by aligning to multiple references


Darby CA, Gaddipati R, Schatz MC, Langmead B. Vargas: heuristic-free alignment for assessing linear and graph read aligners. Bioinformatics, 36(12):3712–3718, Jun 2020.

Describes the Vargas software for heuristic-free alignment to linear and graph genomes

Kuhnle A, Mun T, Boucher C, Gagie T, Langmead B, Manzini G. Efficient Construction of a Complete Index for Pan-Genomics Read Alignment. Journal of Computational Biology, 27(4):500–513, Apr 2020.

Describes the r-index software for indexing and matching against pan-genome collections

Imada EL, Sanchez DF, Collado-Torres L, Wilks C, Matam T, Dinalankara W, Stupnikov A, Lobo-Pereira F, Yip CW, Yasuzawa K, Kondo N, Itoh M, Suzuki H, Kasukawa T, Hon CC, de Hoon MJL, Shin HW, Carninci P, Jaffe AE, Leek JT, Favorov A, Franco GR, Langmead B, Marchionni L. Recounting the FANTOM CAGE-Associated Transcriptome. Genome Research, 30(7):1073–1081, Jul 2020.

Ling JP, Wilks C, Charles R, Leavey PJ, Ghosh D, Jiang L, Santiago CP, Pang B, Venkataraman A, Clark BS, Nellore A, Langmead B, Blackshaw S. ASCOT identifies key regulators of neuronal subtype-specific splicing. Nature Communications, 11(1):137, Jan 2020.

Describes the ASCOT resource for exploring alternative splicing


Baker DN, Langmead B. Dashing: fast and accurate genomic distances with HyperLogLog. Genome Biology, 20(1):265, Dec 2019.

Describes the Dashing software tool for genomic sketching and distance estimation

Wood BE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biology, 20(1):257, Nov 2019.

Describes the Kraken 2 software tool for metagenomic classification

Wulfridge P, Langmead B, Feinberg AP, Hansen KD. Analyzing whole genome bisulfite sequencing data from highly divergent genotypes. Nucleic Acids Research. 47(19):e117, Nov 2019.

Madugundu AK, Na CH, Nirujogi RS, Renuse S, Kim KP, Burns KH, Wilks C, Langmead B, Ellis SE, Collado-Torres L, Halushka MK, Kim MS, Pandey A. Integrated Transcriptomic and Proteomic Analysis of Primary Human Umbilical Vein Endothelial Cells. Proteomics. 2019 Aug;19(15):e1800315.

Darby CA, Fitch JR, Brennan PJ, Kelly BJ, Bir N, Magrini V, Leonard J, Cottrell CE, Gastier-Foster JM, Wilson RK, Mardis ER, White P, Langmead B, Schatz MC. Samovar: Single-Sample Mosaic Single-Nucleotide Variant Calling with Linked Reads. iScience. 2019 May 29;18:1-10.

Describes the Samovar software tool for mosaic variant detection from linked-read data

Winner of best paper at RECOMB-seq

Boucher C, Gagie T, Kuhnle A, Langmead B, Manzini G, Mun T. Prefix-free parsing for building big BWTs. Algorithms for Molecular Biology. 2019 May 24;14:13.

Kuhnle A, Mun T, Boucher C, Gagie T, Langmead B, Manzini G. Efficient Construction of a Complete Index for Pan-Genomics Read Alignment. 2019 Apr. Research in Computational Molecular Biology (RECOMB), pp 158-173.

Mangul S, Martin LS, Langmead B, Sanchez-Galan JE, Toma I, Hormozdiari F, Pevzner P, Eskin E. How bioinformatics and open data can boost basic science in countries and universities with limited resources. Nature Biotechnology. 2019 Mar;37(3):324-326.


Langmead B, Wilks C, Antonescu V, Charles R. Scaling read aligners to hundreds of threads on general-purpose processors. Bioinformatics. 2019 Feb 1;35(3):421-432.

Pritt J, Chen N, Langmead B. FORGe: prioritizing variants for graph genomes. Genome Biology. 2018 Dec 17;19(1):220.

Describes the FORGe software tool

Langmead B, Nellore A. Cloud computing for genomic data analysis and collaboration. Nature Reviews Genetics. 2018 May;19(5):325.

Wilks C, Gaddipati P, Nellore A, Langmead B. Snaptron: querying splicing patterns across tens of thousands of RNA-seq samples. Bioinformatics. 2018 Jan 1;34(1):114-116.

Describes the Snaptron web service and software (client and server).


Nellore A, Collado-Torres L, Jaffe AE, Alquicira-Hernández J, Wilks C, Pritt J, Morton J, Leek JT, Langmead B. Rail-RNA: scalable analysis of RNA-seq splicing and coverage. Bioinformatics. 2017 Dec 15;33(24):4033-4040.

Describes the Rail-RNA software tool, as presented at HiTSeq 2016.

Langmead B. A tandem simulation framework for predicting mapping quality. Genome Biology. 2017 Aug 10;18(1):152.

Describes the Qtip software.

Collado-Torres L, Nellore A, Kammers K, Ellis SE, Taub MA, Hansen KD, Jaffe AE, Langmead B, Leek JT. Reproducible RNA-seq analysis using recount2. Nature Biotechnology. 2017 Apr 11;35(4):319-321.

Describes the recount resource and Bioconductor package.

Collado-Torres L, Nellore A, Frazee AC, Wilks C, Love MI, Langmead B, Irizarry RA, Leek JT, Jaffe AE. Flexible expressed region analysis for RNA-seq with derfinder. Nucleic Acids Research. 2017 Jan 25;45(2):e9.

Describes the derfinder differential expression tool.


Nellore A, Jaffe AE, Fortin JP, Alquicira-Hernández J, Collado-Torres L, Wang S, Phillips RA, Karbhari N, Hansen KD, Langmead B, Leek JT. Human splicing diversity and the extent of unannotated splice junctions across human RNA-seq samples on the Sequence Read Archive. Genome Biology. 2016, 17:266.

Describes the Intropolis resource. Research highlight by Robert & Watson.

Darby MM, Leek JT, Langmead B, Yolken RH, Sabunciyan S. Widespread splicing of repetitive element loci into coding regions of gene transcripts. Human Molecular Genetics. 2016 Nov 15;25(22):4962-4982.

Pritt J, Langmead B. Boiler: lossy compression of RNA-seq alignments using coverage vectors. Nucleic Acids Research. 2016 Sep 19;44(16):e133.

Describes the Boiler RNA-seq alignment compression tool.

Nellore A, Wilks C, Hansen KD, Leek JT, Langmead B. Rail-dbGaP: analyzing dbGaP-protected data in the cloud with Amazon Elastic MapReduce. Bioinformatics. 2016 Aug 15;32(16):2551-3.

Describes the Rail-dbGaP software and protocol.

The Computational Pan-Genomics Consortium (incl Langmead B). Computational pan-genomics: Status, promises and challenges. Briefings in Bioinformatics. 2016 Oct 21.


Reinert K, Langmead B, Weese D, Evers DJ. Alignment of Next-Generation Sequencing Reads. Annual Review of Genomics and Human Genetics. 2015;16:133-51.

Frazee AC, Jaffe AE, Langmead B, Leek JT. Polyester: simulating RNA-seq datasets with differential transcript expression. Bioinformatics. 2015 Sep 1;31(17):2778-84.

Describes the Polyester software tool.

Kim D, Langmead B, Salzberg S. HISAT: a fast spliced aligner with low memory requirements. Nature Methods. 2015 Apr;12(4):357-60.

Describes the HISAT software tool, based on Bowtie 2.

Frazee AC, Pertea G, Jaffe AE, Langmead B, Salzberg SL, Leek JT. Ballgown bridges the gap between transcriptome assembly and expression analysis. Nature Biotechnology. 2015 Mar;33(3):243-6.

Describes the Ballgown software tool.

Wilton R, Budavari T, Langmead B, Wheelan S, Salzberg S, Szalay A. Faster sequence alignment through GPU-accelerated restriction of the seed-and-extend search space. PeerJ. 2015 3:e808.

Describes the Arioc software tool.


Frazee AC, Collado Torres L, Jaffe AE, Langmead B, Leek JT. Measurement, Summary, and Methodological Variation in RNA-sequencing. Statistical Analysis of Next Generation Sequencing Data. Springer International Publishing, 2014. 115-128.

Song L, Florea L, Langmead B. Lighter: fast and memory-efficient error correction without counting. Genome Biology. 2014 Nov 15;15(11):509.

Describes the Lighter software tool.

Hansen KD, Sabunciyan S, Langmead B, Nagy N, Curley R, Klein G, Klein E, Salamon D, Feinberg AP. Large-scale hypomethylated blocks associated with Epstein-Barr virus-induced B-cell immortalization. Genome Research. 2014 Feb;24(2):177-84.


Schatz MC, Langmead B. The DNA Data Deluge. IEEE Spectrum. July, 2013.

Slashdotted. JHU news release and magazine article.


Herb BR, Wolschin F, Hansen KD, Aryee MJ, Langmead B, Irizarry R, Amdam GV, Feinberg AP. Reversible switching between epigenetic states in honeybee behavioral subcastes. Nature Neuroscience. 2012 Oct;15(10):1371-3.

Johns Hopkins Medicine news piece

Gurtowski J, Schatz MC, Langmead B. Genotyping in the cloud with Crossbow. Curr Protoc Bioinformatics. 2012 Sep;Chapter 15:Unit15.3.

Hansen KD*, Langmead B*, Irizarry RA. BSmooth: from whole genome bisulfite sequencing reads to differentially methylated regions. Genome Biology, 2012;13:R83. * Equal contribution

Describes the BSmooth software tool.

Langmead B, Salzberg S. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012, 9:357-359.

Describes the Bowtie 2 software tool. Selected for author profile.


Frazee A, Langmead B, Leek JT. ReCount: A multi-experiment resource of analysis-ready RNA-seq gene count datasets. BMC Bioinformatics. 2011, 12:449.

Describes the ReCount database.

Hansen KD*, Timp W*, Corrada Bravo H*, Sabunciyan S*, Langmead B*, McDonald OG, Wen B, Wu H, Liu Y, Diep D, Briem E, Zhang K, Irizarry RA, Feinberg AP. Increased methylation variation in epigenetic domains across cancer types. Nature Genetics. 2011 Jun 26;43(8):768-75. * Equal contribution

Langmead B. Aligning Short Sequencing Reads with Bowtie. Curr Protoc Bioinformatics. 2010 Dec;Chapter 11:Unit 11.7.


Leek JT, Scharpf RB, Corrada Bravo H, Simcha D, Langmead B, Johnson WE, Geman D, Baggerly K, Irizarry RA. Tackling the widespread and critical impact of batch effects in high-throughput data. Nature Reviews Genetics. 2010 Sep 14.

Langmead B, Hansen KD, Leek JT. Cloud-scale RNA-sequencing differential expression analysis with Myrna. Genome Biology. 2010;11(8):R83.

Describes the Myrna software tool.

Langmead B. Cloud Computing for Data Analysis: Toward the Plateau of Productivity. Bio-IT World. 2010 August; Vol. 9, No. 4: 36.

Schatz MC, Langmead B, Salzberg SL. Cloud computing and the DNA data race. Nature Biotechnology. 2010 Jul;28(7):691-3.


Langmead B. Highly Scalable Short Read Alignment with the Burrows-Wheeler Transform and Cloud Computing. Master’s thesis, University of Maryland, 2009.

Langmead B, Schatz MC, Lin J, Pop M, Salzberg SL. Searching for SNPs with cloud computing. Genome Biology. 2009;10(11):R134.

Describes the Crossbow software tool.

Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology. 2009;10(3):R25.

Describes the Bowtie software tool. Winner: Genome Biology Award for outstanding article in the journal Genome Biology in 2009. Selected for minireview: The Need for Speed