Amazon, Google, IBM, Microsoft join forces with MIT and Harvard on cloud-based genome analysis toolkit
The Broad Institute of MIT and Harvard is teaming up with Amazon Web Services, Cloudera, Google, IBM, Intel and Microsoft to provide its Genome Analysis Toolkit, which it calls GATK, as a cloud-based service.
GATK will be available as software as a service (SaaS), and the Broad Institute will continue to offer the toolkit as a direct download for on-premises use.
“By providing a cloud-hosted solution, we can greatly expand access and facilitate usage of these genome analysis tools,” Eric Banks, senior director of Data Sciences and Data Engineering at Broad, said in a statement.
Banks developed the GATK software package, which has attracted more than 31,000 registered users to date.
“The vast majority set up an extensive local compute and storage infrastructure to process the huge amount of information required to conduct genomic analyses,” Banks added.
By making GATK available as a cloud-based service, Banks said, Broad hopes to remove the traditional barriers to scaling those resources.
Broad Institute executives expect users will be able to access cloud-based GATK options beginning later this year.
The new collaborations will also help the Broad Institute drive the development of GATK4, the next generation of GATK, which is based on the Apache Spark open-source distributed computing framework, according to the institute.
GATK4 will use Spark for parallelism and in-memory computation, which is expected to speed up its analysis methods. GATK4 will also extend the range of use cases supported by GATK to include cancer, structural variation, copy number variation, and more.