Big data is, well, big. The data revolution is here, and the rate at which data is being created keeps accelerating. I was recently reading that by 2020, the total amount of data in the world will grow from 3.2 zettabytes to 40 zettabytes (one zettabyte is roughly a billion terabytes). Such a massive amount of data creates many challenges for big data storage and access. Worldwide, there are more than 500,000 data centers, covering an area equal to about 6,000 football fields (Source: Emerson Network Power).
For organizations to take advantage of big data, they need real-time analysis and reporting, along with efficient storage and processing of massive data sets. Let’s look at some of the challenges in big data storage:
Data security and sensitivity are huge issues. To protect high-value data from intrusion, theft, cyber-attacks, or corruption, data scientists need to take extra care. Considering security, privacy, and regulatory compliance, many businesses are moving away from public cloud environments and choosing to store their data on a private cloud or other protected infrastructure. Apart from protecting the environment, businesses could consider using techniques such as attribute-based encryption and applying access controls to protect their sensitive data.
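To make the access-control idea concrete, here is a minimal sketch of an attribute-based check in Python. It is an illustration only: the `Policy`, `Record`, `can_read`, and `read` names and the attribute strings are hypothetical, and the code enforces a policy in application logic rather than cryptographically. Real attribute-based encryption binds the policy to the ciphertext itself and needs a dedicated cryptographic library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Attributes a caller must hold to read a record (hypothetical model)."""
    required: frozenset


@dataclass
class Record:
    """A piece of sensitive data together with the policy that guards it."""
    payload: str
    policy: Policy


def can_read(user_attributes: set, record: Record) -> bool:
    # Access is granted only when the user holds every required attribute.
    return record.policy.required <= frozenset(user_attributes)


def read(user_attributes: set, record: Record) -> str:
    # Refuse the read outright instead of returning partial data.
    if not can_read(user_attributes, record):
        raise PermissionError("caller lacks required attributes")
    return record.payload
```

For example, a record guarded by `Policy(frozenset({"dept:finance", "clearance:high"}))` would be readable by a user holding both attributes and refused to a user holding only `"dept:finance"`.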
Data Transfer Rates
In this fast-growing business environment, data gathered from multiple primary sources needs to move quickly to multiple destinations to enable real-time analysis. Traditional transport methods struggle to move such massive amounts of data at high speed, and any delay in moving data into or out of storage is not acceptable. Using the public cloud for data storage can seriously hamper data transfer rates, so many businesses have started using private HPC (high-performance computing) infrastructure. IT systems now need to be designed to accommodate these changing requirements along with the traditional requirements of high availability, reliability, and backup.
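A quick back-of-the-envelope calculation shows why transfer rates matter at this scale. The sketch below is a simplified estimate, not a benchmark: the function name, the `efficiency` parameter, and the 10 TB over 1 Gbit/s example are assumptions chosen for illustration, and real links lose throughput to protocol overhead and contention.

```python
TB = 10**12    # bytes in a terabyte (decimal units)
GBPS = 10**9   # bits per second in one gigabit per second


def transfer_time_seconds(size_bytes: float,
                          link_bits_per_second: float,
                          efficiency: float = 1.0) -> float:
    """Estimate how long a transfer takes on a link of the given speed.

    `efficiency` is the assumed fraction of the nominal link speed
    that is actually usable (1.0 means an ideal link).
    """
    if link_bits_per_second <= 0:
        raise ValueError("link speed must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    # Convert bytes to bits, then divide by the usable bit rate.
    return size_bytes * 8 / (link_bits_per_second * efficiency)


# Moving 10 TB over an ideal 1 Gbit/s link takes 80,000 seconds.
hours = transfer_time_seconds(10 * TB, 1 * GBPS) / 3600
print(f"{hours:.1f} hours")
```

Even on an ideal link, that is most of a day for 10 TB, which is why the network path to storage deserves as much design attention as the storage itself.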
With the explosion of Big Data, cloud providers are finding ways to handle the extra storage and processing needs. Performance requirements are also rising very fast, and traditional hard disk drives (HDDs) are proving inadequate for current and future needs. For most businesses, fast access to data is a requirement today. To address the growing need for higher performance, many cloud providers are turning to flash storage. Compared with HDDs, flash storage clearly wins on performance. Its drawback is higher cost, but prices have been declining consistently, and experts predict that the cost of flash storage will be comparable to that of HDDs in the near future.
Large volumes of data can exceed the capacity threshold of traditional storage systems. When such systems cannot cope with the data volume, the result is storage sprawl: storage silos, multiple points of management, and heavy consumption of floor space, power, and cooling. To deal with these issues, businesses have started adopting object-based storage systems, which scale easily to large volumes of data objects within a single managed system. With their robust metadata, these systems make content management and tracking easier. They also use dense, low-cost disk drives to optimize physical storage space.
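The core idea of object-based storage, a flat pool of objects that each carry their own metadata, can be sketched in a few lines of Python. This is a toy in-memory model under stated assumptions: the `ObjectStore` class, its `put`, `get`, and `find` methods, and the random object IDs are all hypothetical and do not correspond to any vendor's API.

```python
import uuid
from dataclasses import dataclass


@dataclass
class StoredObject:
    data: bytes
    metadata: dict


class ObjectStore:
    """Toy object store: a flat namespace of ID -> (data, metadata)."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, **metadata) -> str:
        # Every object gets a unique ID; there is no directory hierarchy.
        object_id = uuid.uuid4().hex
        self._objects[object_id] = StoredObject(data, dict(metadata))
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id].data

    def find(self, **criteria) -> list:
        # Metadata travels with each object, so content can be located
        # by what it is rather than by where it was placed.
        return [
            object_id
            for object_id, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        ]
```

A call such as `store.put(b"log line", source="web", year=2017)` returns an ID, and `store.find(source="web")` later locates the object by its metadata alone. A production system would add durability, replication, and an index over the metadata instead of a linear scan.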
Considering the potential of Big Data and the analytics market opportunity that comes with it, it is not surprising that big companies like EMC and NetApp have introduced storage systems for Hadoop environments. These systems tackle the scalability and data protection issues that were prominent with HDFS. Of course, no one solution fits all. It is important to look at each business problem individually and define a customized storage approach.
To learn more, email: email@example.com
Anupam Bhide | Calsoft Inc.