Nae-Chyun’s paper describing the “reference flow” alignment framework appeared in the journal Genome Biology today. Most sequencing data analyses start by aligning sequencing reads to a linear reference genome, consisting of one string per chromosome. But a linear reference is an arbitrary point of reference; using a single linear reference causes “reference bias,” a tendency…
Chris defends
Huge congratulations to Chris Wilks, who successfully completed his Ph.D. defense on September 30th! Chris’ thesis, titled “Enabling Efficient and Streamlined Access to Large Scale Genomic Expression and Splicing Data” covers his work on Snaptron (paper), the recount3 project, a recent project studying alignment errors in long-read RNA-seq datasets, and several other collaborative efforts (e.g. Rail-RNA,…
Charlotte defends
Huge congratulations to Charlotte Darby, who successfully completed her Ph.D. defense on May 26th! Charlotte’s thesis, titled “Computational methods addressing genetic variation in next-generation sequencing data” covers her work on Samovar (paper), scHLAcount (paper) and Vargas (paper), among other projects. Charlotte was co-advised by Ben and by Dr. Mike Schatz. Next, Charlotte will join Rahul…
snapcount in Bioconductor
Led by Software Engineer Rone Charles and Ph.D. candidate Chris Wilks, we submitted an R/Bioconductor package called snapcount, which was accepted and is included in Bioconductor 3.11. snapcount makes it easy to query the powerful Snaptron server using a natural, accessible set of query functions. Specifically, you can query measurements for genes, exons, splice junctions…
Vargas in Bioinformatics
Ph.D. candidate Charlotte Darby, extending work by former Master’s student Ravi Gaddipati, published a study describing Vargas, a heuristic-free read alignment software tool. The study appeared in the journal Bioinformatics. The open source Vargas tool runs efficiently on modern SIMD and multithreaded architectures. By avoiding heuristics, that is, rules that allow aligners to ignore certain portions of…
r-index papers in JCB
Ph.D. candidate Taher Mun, together with Alan Kuhnle and co-authors, published a journal article and an accompanying software article in the Journal of Computational Biology. We demonstrate new methods for text indexing and querying using the r-index, which represents an advance on earlier methods like the RLFM index and FM index. This new method makes it…
Reference flow preprint
Student Nae-Chyun Chen and colleagues just posted a new preprint describing his work on the “reference flow” alignment framework. Most sequencing data analyses start by aligning sequencing reads to a linear reference genome, made up of a single string per chromosome. But failure to account for genetic variation causes reference bias and confounding of results…
FC-R2 in Genome Research
A new study describing our FC-R2 (for: “FANTOM-CAT recount2”) resource is out in Genome Research. FC-R2 is a new quantification of the recount2 summaries using the more inclusive annotation produced by the FANTOM CAGE-Associated Transcriptome (FANTOM-CAT) project. This annotation consists of over 109,000 coding and noncoding genes. By combining this annotation with the recount2 resource,…
ASCOT in Nature Comms
The ASCOT study appeared in Nature Communications today. ASCOT is a new resource allowing researchers to visualize and query alternative splicing patterns in public RNA-Seq data. The resource is freely available at ascot.cs.jhu.edu. To populate ASCOT, we used Snaptron to identify splice variants across tens of thousands of bulk and single-cell RNA-Seq datasets in human…
Vargas preprint
Ph.D. student Charlotte Darby and former Master’s student Ravi Gaddipati posted a preprint describing their work on Vargas, a heuristic-free read alignment software tool that runs efficiently on modern SIMD and multithreaded architectures. Heuristics are rules that allow aligners to ignore certain portions of the search space that seem to contain only low-scoring alignments. Avoiding…