We posted a preprint on our thread-scaling work in Bowtie, Bowtie 2 and HISAT: https://doi.org/10.1101/205328. While work on efficient genomics software has generally focused on speed with a small, fixed number of threads, general-purpose processors can now run hundreds of threads simultaneously. Intel’s Xeon Phi Knights Landing architecture, for example, supports 256–288 simultaneous threads. The underlying architectures are also quite complex; a many-core processor more closely resembles a small computer cluster on a chip than the Pentium chips of the past. Methods used to maximize efficiency on older processors are not well suited to these architectures, and new efforts are needed to allow genomics tools to exploit them.
In this preprint we tackle the problem of scaling read aligners to hundreds of threads on general-purpose processors. We concentrate on Bowtie, Bowtie 2 and HISAT since they are widely used and representative of a wider group of embarrassingly parallel tools. We explore key issues posed by these architectures, suggest solutions, and measure their effect on thread scaling. We ultimately achieve excellent thread scaling to hundreds of threads, and the manuscript announces new official releases of Bowtie and Bowtie 2 that implement these ideas. We also suggest a small change to common genomics file formats, e.g. FASTA and FASTQ, that can yield substantial additional thread-scaling benefits.
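For readers unfamiliar with the term, here is a minimal sketch of what "embarrassingly parallel" means for read processing: each read can be handled independently, so the only shared state is the input stream. This is a generic illustration, not the code or the specific method from the preprint. The function names (`make_reads`, `toy_align`, `process_reads`) and the synthetic reads are hypothetical, and `toy_align` is a stand-in that does no real alignment. Note also that CPython threads will not show a real speedup here; the sketch only shows the structure and how batch size affects how often the shared input lock is taken.

```python
import threading
from itertools import islice


def make_reads(n):
    """Build n synthetic (name, sequence) pairs standing in for parsed FASTQ records."""
    bases = "ACGT"
    return [
        (f"read{i}", "".join(bases[(i + j) % 4] for j in range(8)))
        for i in range(n)
    ]


def toy_align(seq):
    """Hypothetical placeholder for alignment: just counts G and C bases."""
    return sum(1 for base in seq if base in "GC")


def process_reads(reads, num_threads, batch_size):
    """Process reads with worker threads that pull batches from a shared input.

    Returns (results, lock_acquisitions), where results maps read name to the
    toy_align value and lock_acquisitions counts how many times any worker
    took the input lock.
    """
    source = iter(reads)
    input_lock = threading.Lock()
    per_thread_output = [[] for _ in range(num_threads)]
    lock_acquisitions = 0

    def worker(index):
        nonlocal lock_acquisitions
        while True:
            # Only the input stream is shared, so only this step is serialized.
            with input_lock:
                lock_acquisitions += 1
                batch = list(islice(source, batch_size))
            if not batch:
                return
            # Reads are independent, so this part needs no synchronization.
            for name, seq in batch:
                per_thread_output[index].append((name, toy_align(seq)))

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    results = {}
    for output in per_thread_output:
        results.update(output)
    return results, lock_acquisitions


if __name__ == "__main__":
    reads = make_reads(1000)
    for batch_size in (1, 100):
        results, acquisitions = process_reads(reads, num_threads=4, batch_size=batch_size)
        print(f"batch_size={batch_size}: {len(results)} reads, "
              f"{acquisitions} lock acquisitions")
```

In this toy setup, fetching 100 reads per lock acquisition instead of one cuts the number of acquisitions from about a thousand to about a dozen. With hundreds of threads contending for one input, that kind of serialized step is a plausible place for scaling to stall, which is why input handling matters even for tools whose core work is independent per read.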
This work is supported by funds for the Intel Parallel Computing Center at Johns Hopkins University, NIH/NIGMS grant R01GM118568, and the Stampede 2 resource at the Texas Advanced Computing Center (TACC) that we accessed through XSEDE research project TG-CIE170020.