We posted a preprint today describing the new recount3 resource. This effort was led by (now graduated) Ph.D. student Chris Wilks and spanned multiple research groups including our group, Kasper Hansen’s group and Leonardo Collado Torres’ group.
recount3 is a resource consisting of over 750,000 publicly available human and mouse RNA sequencing (RNA-seq) samples, uniformly processed by our new Monorail analysis pipeline. To facilitate access to the data, we provide the recount3 and snapcount R/Bioconductor packages as well as the study explorer and other complementary web resources. recount3 data can be downloaded as study-level summaries or queried for specific exon-exon junctions, genes, samples, or other features. Summaries can also be queried through the Snaptron service, either via its REST API or via the snapcount package.
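For readers who want a feel for the REST route, here is a minimal sketch of how a Snaptron region query URL might be assembled in Python. It only builds the URL and makes no network request. The base URL, the compilation name passed in, and the example filter string are assumptions for illustration, and the helper `build_snaptron_url` is hypothetical; check the Snaptron documentation for the compilation names and filter syntax that apply to recount3.

```python
from urllib.parse import urlencode

# Assumed base URL of the public Snaptron server; verify before use.
SNAPTRON_BASE = "https://snaptron.cs.jhu.edu"


def build_snaptron_url(compilation, region, rfilters=()):
    """Build a junction-query URL for a region (hypothetical helper).

    compilation: name of a Snaptron data compilation (assumed, e.g. "srav2").
    region:      a gene symbol or a "chr:start-end" interval.
    rfilters:    filter strings passed through verbatim, one per rfilter field.
    """
    params = [("regions", region)]
    params.extend(("rfilter", f) for f in rfilters)
    # Keep ':' and '>' unescaped so the URL stays readable.
    query = urlencode(params, safe=":>")
    return f"{SNAPTRON_BASE}/{compilation}/snaptron?{query}"


if __name__ == "__main__":
    # Illustrative only: the filter string is an assumed example of the syntax.
    print(build_snaptron_url("srav2", "chr6:1-514015", ["samples_count>:100"]))
```

The resulting URL could then be fetched with any HTTP client. From R, the snapcount package offers the same kind of query without hand-building URLs.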
recount3 improves on our previous recount2 resource in most respects, most notably in size. It includes summaries for 316,443 human runs from the Sequence Read Archive, 416,803 mouse runs, and all runs from GTEx v8 and TCGA.
The Monorail analysis system and workflow can be used to process local and/or private data, allowing results to be directly compared to any study in recount3. Taken together, our tools help biologists maximize the utility of publicly available RNA-seq data, especially to improve their understanding of newly collected data.
Check out the Quick Access guide to get started.
Congrats to the whole team!