Unveiling Emerging Data-centric Storage Architectures
Digital interaction has become the norm, and nearly every everyday activity now produces data. This is one reason even enterprises are adopting data-centric architectures. In one article, Tony Bishop estimates that by 2024, Global 2000 enterprises will create data at a rate of 1.1 million gigabytes per second and will require 15,635 exabytes of additional data storage annually. In fact, at the recent SNIA SDC USA 2020, emerging storage architectures were a hot topic among the speakers.
In one of the keynote sessions at SDC 2020, Pankaj Mehra, VP Storage Pathfinding, Samsung Electronics, spoke at length on “Emerging Data-centric Storage Architectures”. In his keynote, he covered Samsung’s advanced workload-optimized SSDs for handling data at large scale. Let’s dig deeper into the takeaways from his session.
Challenges of data at a large scale
Pankaj described the bottlenecks and inefficiencies that enterprises face today when handling data at large scale:
- Processing power and processing bandwidth must not become the bottleneck: to handle data at scale, your ability to process the data, and to move it to where it is processed, has to scale with the data.
- With large data, you inevitably end up with a very large number of objects, and if your metadata is too granular or not granular enough, you end up with metadata inefficiencies.
- Among current infrastructure trends, storage is being disaggregated from the compute node, which is increasingly meant to be stateless. We are moving toward an architecture in which storage is attached to the data center fabric, so the choice of protocol inevitably comes down to NVMe over Fabrics, NVMe over TCP, and similar options. The question then is how many times, and where, that wire protocol should be terminated. In current architectures, repeated protocol terminations, buffering, and re-buffering can create bottlenecks that lead to latency and pipeline bubbles.
These bottlenecks lead to inefficiencies such as:
- Inability to deliver both performance and scale, because processing power and processing bandwidth become the bottleneck
- Wasted endurance and wasted memory bandwidth, caused by metadata inefficiencies in object storage and retrieval
- CPU overhead for I/O and for I/O virtualization, caused by wire-protocol termination for disaggregated flash
These inefficiencies and bottlenecks are well known to the community; the problem is that the ideas proposed as solutions have so far been applied at the system and software level. Pankaj highlighted these ideas, which have been around in the industry for many years, and how they are now being revisited in a new way:
- Disaggregated Storage: An early form of this idea was Fibre Channel, where we tried to move our peripherals onto the network. Today we have more unified data center fabrics, and storage is moving to run over Ethernet and TCP. The next step is to move this idea from the system and software level down to the SSD level.
- Computational Storage: The idea predates the widespread use of flash memory in the data center; it was practiced alongside mainframes. Like disaggregated storage, it was implemented at the system and software level, but with new computational storage architectures, the idea is ready to move into the SSD itself.
- Key-value Device: This idea is more familiar to the industry in the form of object storage. SNIA and various companies have made several efforts to build object storage devices, and these have now matured into second-generation devices that are more natively key-value.
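To make the metadata point concrete, here is a minimal sketch, assuming a toy model of two storage interfaces. `BlockBackedObjectStore`, `KeyValueDevice`, and the 4 KiB `BLOCK_SIZE` are hypothetical names and values chosen for illustration, not Samsung’s or SNIA’s API. It contrasts a host that stores objects on a block device (keeping its own key-to-block index and padding each value to whole blocks) with a device that accepts variable-length values by key.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4096  # assumed logical block size in bytes (hypothetical)


@dataclass
class BlockBackedObjectStore:
    """Hypothetical model: host-managed objects on a block device.

    The host keeps its own key -> (first block, length) index and must
    pad every value up to a whole number of blocks.
    """
    next_block: int = 0
    index: dict = field(default_factory=dict)  # host-side metadata
    bytes_written: int = 0                     # bytes sent to the device

    def put(self, key: bytes, value: bytes) -> None:
        blocks = -(-len(value) // BLOCK_SIZE)  # ceiling division
        self.index[key] = (self.next_block, len(value))
        self.next_block += blocks
        self.bytes_written += blocks * BLOCK_SIZE


@dataclass
class KeyValueDevice:
    """Hypothetical model: a native key-value interface.

    Values of any length are stored by key, so the host keeps no
    mapping table and sends no padding.
    """
    store: dict = field(default_factory=dict)
    bytes_written: int = 0

    def put(self, key: bytes, value: bytes) -> None:
        self.store[key] = value
        self.bytes_written += len(value)


# Usage: 1,000 small (100-byte) objects written through both interfaces.
objects = {f"obj-{i}".encode(): b"x" * 100 for i in range(1000)}
blk, kv = BlockBackedObjectStore(), KeyValueDevice()
for k, v in objects.items():
    blk.put(k, v)
    kv.put(k, v)

print(blk.bytes_written, kv.bytes_written)  # 4096000 100000
print(len(blk.index))  # 1000
```

The model deliberately ignores the device’s own internal mapping (a real key-value SSD still keeps an index internally); it only shows how host-side metadata and block padding can multiply the bytes written for small objects, which is one way the “wasted endurance” mentioned above can arise.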
In the rest of his session, Pankaj highlighted the growing importance of computational storage and how it is scaling to accelerate data-rich workloads. He also walked through some of Samsung’s use cases for SmartSSD, Ethernet SSD, Key-value SSD, and more, with a comparative study of their features and benefits from an application point of view.
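The data-movement argument behind computational storage can be illustrated with a minimal sketch, assuming a toy record format and a simple filter query. The functions `host_side_filter` and `in_storage_filter` and the 12-byte record layout are hypothetical; this does not reflect how SmartSSD or any real device is programmed. It only counts how many bytes cross the host interface when filtering happens on the host versus next to the data.

```python
import struct

# Hypothetical fixed-size record: (sensor_id: u32, reading: u64), little-endian.
RECORD = struct.Struct("<IQ")


def encode(records):
    """Serialize records into the byte layout assumed to sit on the device."""
    return b"".join(RECORD.pack(sid, val) for sid, val in records)


def host_side_filter(device_data: bytes, threshold: int):
    """Conventional path: move every byte to the host, then filter there."""
    transferred = len(device_data)
    matches = [r for r in RECORD.iter_unpack(device_data) if r[1] > threshold]
    return matches, transferred


def in_storage_filter(device_data: bytes, threshold: int):
    """Computational-storage path: filter next to the data, ship only matches."""
    matches = [r for r in RECORD.iter_unpack(device_data) if r[1] > threshold]
    transferred = len(matches) * RECORD.size
    return matches, transferred


# Usage: 10,000 records, of which only the last 100 satisfy the query.
data = encode([(i % 16, i) for i in range(10_000)])
host_rows, t_host = host_side_filter(data, threshold=9_899)
cs_rows, t_cs = in_storage_filter(data, threshold=9_899)

print(len(host_rows), len(cs_rows))  # 100 100
print(t_host, t_cs)  # 120000 1200
```

Both paths return the same rows; the difference is that the in-storage path moves only the selected records, which is why pushing selective work toward the SSD can relieve the processing-bandwidth bottleneck described earlier. The sketch does not model the compute cost on the device.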
In our upcoming keynote session at SNIA SDC India 2020, we will elaborate on computational storage (CS) and its position in the market. You can put your questions about CS to our speaker, Rohit Srivastava, during the session. Don’t forget to register first!
You can also watch the complete keynote session by Pankaj at SNIA SDC USA 2020 here.