Rail-dbGaP preprint

Abhinav Nellore and colleagues released a preprint describing a protocol and software tool for analyzing protected genomic data on a commercial cloud. Public sequencing archives such as the Sequence Read Archive (SRA) contain thousands of trillions of bases of valuable sequencing data. More than 40% of the SRA is human data protected by provisions such as dbGaP. To analyze dbGaP-protected data, researchers must typically work with IT administrators and signing officials to ensure that all required levels of security are implemented at their institution. This is a major obstacle, impeding reproducibility and reducing the utility of archived data.

We present a protocol and software tool for analyzing protected data in a commercial cloud. The protocol is applicable to any MapReduce tool running on Amazon Web Services. The tool, Rail-RNA v0.2, is a spliced aligner for RNA-seq data, which we demonstrate by running on 9,662 samples from the dbGaP-protected GTEx consortium dataset. These are important first steps toward making it easy for typical biomedical investigators to study protected data, regardless of their local IT resources or expertise.


Published: January 15, 2016
