Scientists understand that data is the fuel that powers insight, discovery, and innovation, so they want the best data storage options available.
For example, in a recent commentary, Mark Pastor, director of data intelligence solutions at the data storage company Quantum, notes that the Institute of Cancer Research (ICR) says big data analytics plays an important role in the discovery of cancer drugs. “Scientists are analyzing vast amounts of data—from patient samples, genomic sequencing, medical images, lab results, experimental data, pharmacological data, and many other sources—to help in their efforts.”
And according to Dr. Bissan Al-Lazikani, head of data science at ICR, more data is better. “The more data we are gathering,” says Dr. Al-Lazikani, “the more patients we are profiling, the smarter the computer algorithms: the better we are becoming at discovering drugs for cancer.”
Perhaps the key to understanding the importance of storage, Pastor observes, is to remember that “[d]ata is not stagnant. It has a lifecycle; it grows and ages. In addition, it must be managed. Once data is created, it must be stored, accessed for computational analysis and collaboration, archived for future use, and protected at every step against the risk of loss. As the amount of scientific data at research institutions grows, these tasks become more difficult.”
In simple terms, Pastor says, “high performance is important in research. Faster computing power means more data can be analyzed in less time, which can accelerate the research process. Storage infrastructure plays a significant role in the performance of computing environments. High performance requires an infrastructure capable of fast I/O operations without bottlenecks. When storage capacity reaches the multiple petabyte level, maintaining high performance access is a challenge.”
For example, he says, as storage size grows, data backup procedures must change. “When data reaches the petabyte level, traditional data backup operations are no longer able to handle the volume. Still, data must be protected against hardware failures. Installing secondary storage arrays for the purpose of data replication is one way to back up data. But that can be an expensive solution.”
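To give a rough sense of why petabyte-scale volumes strain traditional backup, the short Python sketch below estimates how long a single full copy would take. The sustained throughput of 1 GB/s is purely an illustrative assumption, not a figure from Pastor's commentary, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400
PETABYTE = 10**15            # bytes, decimal definition
ASSUMED_THROUGHPUT = 10**9   # bytes per second (1 GB/s); hypothetical, not from the source


def full_copy_days(capacity_bytes: float, throughput_bytes_per_s: float) -> float:
    """Return the days needed to copy capacity_bytes at a constant throughput."""
    seconds = capacity_bytes / throughput_bytes_per_s
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    days = full_copy_days(PETABYTE, ASSUMED_THROUGHPUT)
    print(f"One full copy of 1 PB at 1 GB/s: about {days:.1f} days")
```

Under that assumption, a single full pass over one petabyte takes more than eleven days, which would be hard to fit into a conventional nightly or weekly backup window.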
As he sees it, to build a storage infrastructure capable of handling the growing volume of scientific data, research institutions must blend different storage technologies, preferably by implementing multiple tiers of storage. “In a multi-tier environment, total storage capacity is broken into different forms of media. There is high-performance disk or flash storage for active files—those files that are part of an active project or are undergoing computational analysis. The remainder of the capacity consists of tape or cloud storage.”
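The tiering idea Pastor describes can be illustrated with a minimal Python sketch. It is not Quantum's product logic or any real library's API: the `FileRecord` type, the tier labels, and the 30-day recency window are all hypothetical choices made only to show how a placement rule might separate active files from the rest.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    """Hypothetical metadata a tiering policy might track for each file."""
    path: str
    last_accessed: datetime
    in_active_project: bool


def choose_tier(record: FileRecord, now: datetime,
                active_window: timedelta = timedelta(days=30)) -> str:
    """Pick a storage tier for a file.

    Files in an active project, or accessed within active_window (an assumed
    30 days by default), go to the fast tier; everything else goes to the
    capacity tier.
    """
    recently_used = (now - record.last_accessed) <= active_window
    if record.in_active_project or recently_used:
        return "flash_or_disk"
    return "tape_or_cloud"


if __name__ == "__main__":
    now = datetime(2024, 1, 31)
    old_scan = FileRecord("scans/2019/run1.dat", datetime(2019, 6, 1), False)
    live_run = FileRecord("genomes/current/sample.bam", datetime(2024, 1, 30), True)
    print(choose_tier(old_scan, now))   # capacity tier
    print(choose_tier(live_run, now))   # fast tier
```

A real policy engine would likely weigh more signals, such as file size, retrieval cost, and compliance rules, but the basic split between active and inactive data is the same.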