Data center storage plays a crucial role in the IT infrastructure of today’s enterprises, with demand for it growing rapidly and continuously. To meet this demand, organizations must manage an increasing volume of data generated by new applications, ensure fast transfer rates and low-latency access by using suitable storage technologies, and address issues related to redundancy, uptime, and resilience against various types of failures.
Data center storage comprises the integrated hardware, software, and processes used for storing, managing, and distributing digital data in a centralized location. It includes storage devices such as HDDs, SSDs, and tape drives, arranged in racks and clusters for optimized operation and efficiency.
Dgtl Infra explores the intricacies of data center storage, offering insights that range from the main types of storage solutions (DAS, NAS, and SAN) to next-generation technologies. Whether you’re focused on the kinds of information stored in data centers, the importance of these storage systems, or how the cloud impacts storage technologies, we provide comprehensive coverage of the topic. Keep reading to explore the capacity and cost considerations of data center storage, and to get familiar with the leading companies driving innovation in this space.
What is Data Center Storage?
Data center storage is the collective hardware, software, systems, and processes used to store, manage, and distribute large amounts of digital information in a centralized computing environment. This storage infrastructure consists of different types of hardware like hard disk drives (HDDs), solid-state drives (SSDs), and tape drives, as well as software for data management and backup.
The storage hardware is mounted in dedicated storage trays, which are then placed within racks, cabinets, or chassis. These are organized into rack units (U) and arranged in rows within the data center. For enhanced organization and efficiency, racks and cabinets can be further assembled into clusters. These clusters are interconnected, allowing the storage hardware to operate as a unified system within the data center.
For example, in Microsoft’s data center in Washington state, the racks on the right side (as shown in the image below) house HDD JBODs (Just a Bunch Of Disks), demonstrating a specific arrangement of storage hardware within the facility.
Additionally, data center storage involves a set of policies and procedures designed to manage the storage and retrieval of data effectively. This includes detailing methods for data collection, ensuring the security of stored data, implementing access control measures, maintaining data availability, setting storage quotas, and establishing regular backup schedules.
What Data is Stored in Data Centers?
Data centers store a wide range of digital information for various organizations and services, including:
- Web Content: Comprises websites, online applications, and services, along with their related data such as HTML files, scripts, multimedia content (including text, audio, images, and video), and databases
- Business Data: Companies rely on data centers for storing essential internal data. This includes customer records, transaction histories, employee information, financial records, and proprietary research. Many organizations also host their databases within data centers, utilizing SQL and NoSQL technologies
- Email and Communication Records: Data centers are crucial for storing email data, which includes attachments and metadata. They also store logs and content from communication platforms like Slack and Microsoft Teams
- Application and User Data: Information generated by software applications and social media platforms falls under this category, including personal profiles, communication history, usage metrics, and purchase records
- Big Data and Analytics: Data centers hold large datasets used for analytics, training machine learning models, and research purposes. This can involve data sourced from IoT devices, sensors, and user interactions
Importance of Data Center Storage
Data center storage is important for modern computing and information management for several reasons:
- Data Accessibility and Availability: Data centers are crucial for ensuring data is readily accessible and available for users and applications, facilitating efficient data retrieval and storage. This capability is vital for the smooth and continuous operation of organizations, especially for the online services and applications of online retailers, financial services, and healthcare providers, which require constant data access
- Data Security and Protection: Data centers provide security measures to protect sensitive data against unauthorized access, breaches, and cyberattacks through physical security, network security protocols, firewalls, and data encryption. They also feature backup and disaster recovery systems to safeguard data from loss due to hardware failure or natural disasters
- Scalability and Flexibility: Data centers provide scalable storage solutions that accommodate the growing data storage needs of expanding organizations, without requiring significant upfront investments in hardware and facilities. This scalability allows organizations to manage increasing data volumes efficiently, avoiding performance bottlenecks or downtime
- Cost Efficiency: Centralizing data storage in data centers allows organizations to achieve economies of scale, thereby reducing the costs associated with data management. These savings arise from reduced expenditures on hardware, maintenance, and energy. Data centers also implement technologies like data deduplication and compression to further enhance storage efficiency and lower costs
- Performance and Reliability: Utilizing high-speed storage networking technologies such as Fibre Channel, data centers process data efficiently and maintain redundant systems to offer high-performance and reliable storage solutions. This reliability is crucial for data center operations and services that depend on high-speed data retrieval and real-time processing, including big data analytics, artificial intelligence, and machine learning algorithms
- Compliance and Regulatory Requirements: Many industries are subject to strict regulations regarding data handling, privacy, and retention, such as the Health Insurance Portability and Accountability Act (HIPAA) in healthcare. Data centers support organizations in complying with these regulations by providing measures such as encryption for data at rest and in transit, implementing strong authentication mechanisms and role-based access controls (RBAC), and maintaining detailed audit trails of data access
Market Size of Data Center Storage
IDC reports indicate significant growth in global data generation, projecting an increase from 106 zettabytes (ZB) in 2022 to 291 ZB by 2027, a compound annual growth rate (CAGR) of 22%.
However, only a fraction of the data generated is actually stored. Per IDC’s findings, installed storage capacity across different media types is expected to grow by an additional 17 ZB by 2025.
On the financial side, Statista’s analysis of the global data center market’s “Storage” segment predicts revenue growth from $48.0 billion in 2023 to $62.6 billion by 2028, equating to a CAGR of 5.5%. Of this total, approximately 80% of the market is made up of on-premises enterprise storage, with the remaining 20% consisting of cloud storage.
Types of Data Center Storage
Data centers use various types of storage configurations, which are selected based on several factors, including the method by which stored data is accessed by servers – either directly or over a network.
Three main methods exist for servers in data centers to access storage devices:
- Direct-Attached Storage (DAS): In this method, the storage device is directly connected to the server. This provides dedicated storage to that server without involving a network
- Network-Attached Storage (NAS): NAS devices are connected to a standard TCP/IP Ethernet network, making the stored data accessible to multiple servers
- Storage Area Network (SAN): SAN represents a dedicated high-performance network specifically for storage, providing multiple servers with access to shared storage devices
DAS vs NAS vs SAN – Comparison of Key Features
| Feature | Direct-Attached Storage (DAS) | Network-Attached Storage (NAS) | Storage Area Network (SAN) |
|---|---|---|---|
| Connectivity | Physically connected to a single server; no network | Network-connected via a TCP/IP Ethernet network | Dedicated high-speed network fabrics, like Fibre Channel |
| Protocols | SATA, SCSI, SAS | NFS, SMB/CIFS, HTTP, FTP | Fibre Channel, iSCSI, FCoE |
| Scalability | Limited, as it’s directly attached to one device | High; easy to add more devices to the network | Very high; integrates many storage devices efficiently |
| Performance | Fast, due to the direct connection, but limited by the server’s own performance | Weaker; depends on network speed, with higher latency due to network traffic | Very high; designed for high throughput and low latency |
| Host Resource Usage | High; uses host CPU/memory | Low; operates independently | Low; offloads storage processing to its own dedicated hardware and software |
| Distance of Network | Very short, within the same enclosure or room | Moderate, within a building or campus | Long, up to several miles, across buildings or even cities |
| Management | Simple, as it’s managed by the attached device | Moderate; managed through network protocols and interfaces | Complex; requires skilled IT personnel |
| Redundancy | Limited; depends on the device’s own capabilities | Often has built-in redundancy (e.g., RAID configurations) | High; multiple layers of redundancy and fault tolerance |
| Typical Use Case | Small-scale, individual workstations with minimal data storage needs | Small to medium-sized enterprises needing to share files across multiple users | Large enterprises with significant data storage needs |
| Cost | Low; simple to set up with lower initial costs | Moderate; requires network infrastructure | High; requires specialized network components |
Traditionally, servers in data centers stored information using direct-attached storage (DAS), which involved housing local disks within the server’s physical enclosure. However, this approach has evolved, with current server storage needs increasingly being met through storage area networks (SAN) and even cloud storage.
Large data centers, in particular, often rely on rack-mounted storage arrays accessible via a dedicated SAN. SANs are preferred for high-end storage applications due to their efficiency and smaller infrastructure footprint. Compared to DAS configurations, SANs use less rack space and floor space, consume less power, require less cooling, and simplify cabling, making them a more efficient and scalable solution for managing large volumes of data.
1. Direct-Attached Storage (DAS)
Direct-attached storage (DAS) is a digital storage system used in data centers, characterized by its physical connection to the server it supports, without a network connection standing between the server and the storage device. This system allows servers to be directly linked to storage devices, such as hard disk drives (HDDs) or solid-state drives (SSDs), using standard interface cables. The connection typically involves either enterprise-level HDDs and SSDs installed inside the server or external storage arrays housing multiple drives.
The communication between the server and its storage devices is managed by RAID (Redundant Array of Independent Disks), a set of protocols designed for data striping, data mirroring, and disk management. A RAID controller, positioned between the server and the storage disks, executes the RAID protocols. This setup enables the server to treat a collection of hard drives as a single, large drive.
DAS systems are connected to servers via protocols such as SATA, SCSI, or SAS and are directly managed by the host server, heavily depending on its processing power and memory.
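To make the RAID trade-offs concrete, here is a minimal sketch of how usable capacity and fault tolerance vary across common RAID levels. The helper function is hypothetical, for illustration only, and is not tied to any controller's actual API:

```python
def raid_usable_tb(level: int, drives: int, drive_tb: float) -> tuple[float, int]:
    """Return (usable capacity in TB, drive failures tolerated)
    for a few common RAID levels. Illustrative only."""
    if level == 0:                       # striping: full capacity, no redundancy
        return drives * drive_tb, 0
    if level == 1:                       # mirroring: half the raw capacity;
        return drives * drive_tb / 2, drives // 2  # best case, one failure per pair
    if level == 5:                       # striping with single parity
        return (drives - 1) * drive_tb, 1
    if level == 6:                       # striping with double parity
        return (drives - 2) * drive_tb, 2
    raise ValueError(f"unsupported RAID level: {level}")

# Eight 10 TB drives under RAID 5: 70 TB usable, tolerates one drive failure
print(raid_usable_tb(5, 8, 10.0))   # (70.0, 1)
```

The controller hides this arithmetic from the server, which simply sees one large logical drive.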
Advantages of Direct-Attached Storage (DAS)
- Cost-Effectiveness and Simplicity: DAS is more affordable and easier to operate than networked data center storage solutions like NAS and SAN, as it requires minimal infrastructure, fewer components, and fewer management resources. Its price per gigabyte (GB) is low and continues to decrease, making it attractive to individuals and small-to-medium-sized businesses
- Performance: DAS provides high performance because it is directly connected to the server, offering faster data access and reduced latency compared to networked storage options, such as NAS and SAN, which can experience network congestion. Therefore, DAS is crucial for specific applications that demand high-speed data access, such as video editing and database-intensive tasks
Disadvantages of Direct-Attached Storage (DAS)
- Scalability Limitations: DAS is physically connected to a specific server, which limits its ability to expand easily beyond that singular connection. To scale up, additional storage devices would need to be connected to other servers, with scalability heavily dependent on the server’s capacity for expansion slots or external ports
- Reduced Accessibility and Sharing: Since DAS is directly connected to one server, it restricts remote data accessibility and sharing capabilities across a network or over the internet. This makes DAS less suitable for environments that require shared storage resources
2. Network-Attached Storage (NAS)
Network-attached storage (NAS) is a dedicated file-level storage device that is used in data centers. It facilitates data access for multiple users and client devices through TCP/IP Ethernet in local area networks (LANs). NAS systems typically use storage devices such as hard disk drives (HDDs) or solid-state drives (SSDs), organized into logical, redundant units known as RAID (Redundant Array of Independent Disks).
NAS devices are designed for simplicity and scalability, offering easy data storage, retrieval, and management. Unlike traditional file servers that may require complex setup and management, NAS devices enable direct data access by network users on a local area network (LAN) without the need for an intermediary application server.
Running on embedded system software, usually a streamlined, purpose-built operating system such as an optimized version of Linux, NAS devices focus exclusively on managing file operations. Expanding storage capacity is straightforward, akin to adding a new server to the Ethernet network.
NAS enables cross-platform file sharing across various operating systems like Windows, macOS, and Linux, using protocols like NFS, SMB/CIFS, HTTP, and FTP for seamless file access and sharing.
Advantages of Network-Attached Storage (NAS)
- Ease of Access and Sharing: NAS simplifies network-based data handling by allowing multiple users and devices to access, edit, and share files simultaneously. This enhances collaboration and efficiency for organizations
- Simplified Management and Deployment: NAS systems are designed for easy setup and management, ideal for businesses with limited IT resources. They manage file locking to prevent data conflicts and ensure data integrity, allowing concurrent access and editing of files
Disadvantages of Network-Attached Storage (NAS)
- Limited Scalability: NAS systems face scalability limitations, as they are designed for small to medium-sized network environments. Expanding storage capacity significantly without deploying additional NAS units can be challenging, since each unit is constrained by its physical capacity and single network identity. These constraints also make NAS less suitable for database applications
- Performance Constraints: NAS systems depend on shared network bandwidth, leading to potential performance bottlenecks in high data traffic environments or when processing large files. This can result in increased latency, slower data access and transfer rates, and lower reliability. Traffic from NAS devices competes with other data on an organization’s network, which can strain resources
3. Storage Area Network (SAN)
A storage area network (SAN) is a dedicated, high-speed network that connects servers to shared storage devices. Typically utilizing the Fibre Channel protocol, this network is designed exclusively for storage traffic, distinguishing it from the local area network (LAN) that manages general data communications.
Because SANs utilize a dedicated network, they enable the separation of processing functions from storage tasks, making SANs particularly well-suited for large corporate networks. In addition to Fibre Channel, other protocols used in SANs include iSCSI (Internet Small Computer Systems Interface) and Fibre Channel over Ethernet (FCoE).
SANs provide block-level access to storage within data centers, enabling servers to interact with storage devices – such as disk arrays and tape libraries – as if these were directly attached to the servers themselves. This configuration allows IT managers to run all storage as a unified entity, streamlining operations like backups and maintenance by offloading these tasks from the host servers, thus avoiding any impact on their processing capabilities.
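The distinction between block-level access (what a SAN exports) and file-level access (what a NAS exports) can be illustrated with a short sketch. An ordinary local file stands in for the raw block device here, purely as an assumption for the demo; a real SAN LUN would appear to the server as a device path such as /dev/sdb:

```python
import os

BLOCK_SIZE = 4096  # a typical logical block size

# File-level access (NAS-style): the client names a file, and the
# storage system decides how the bytes are laid out on disk.
with open("demo.txt", "w") as f:
    f.write("hello, file-level world")

# Block-level access (SAN-style): the client addresses raw numbered
# blocks and imposes its own filesystem or database layout on them.
# An ordinary file stands in for a block device in this sketch.
fd = os.open("demo.img", os.O_RDWR | os.O_CREAT)
os.pwrite(fd, b"X" * BLOCK_SIZE, 3 * BLOCK_SIZE)   # write block #3
block = os.pread(fd, BLOCK_SIZE, 3 * BLOCK_SIZE)   # read block #3 back
os.close(fd)

print(block[:4])   # b'XXXX'
```

Because the server addresses blocks directly, it can treat SAN storage exactly as it would a locally attached disk.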
A SAN includes more hardware components than DAS or NAS configurations. Below is a description of the hardware components commonly found in a SAN architecture:
- SAN Switches: SAN utilizes storage-specific Fibre Channel switches which act as the network’s core by connecting storage devices, such as RAID arrays or tape libraries, with the servers that need access to them
- Storage Arrays: Consist of multiple disk drives often organized into RAID configurations for redundancy and performance. These also include tape libraries, used mainly for long-term data storage and archival
- Host Bus Adapters (HBAs): These are installed in servers to connect them to the SAN network. They handle the input/output (I/O) processing and provide physical connectivity to the SAN, typically using the Fibre Channel protocol
- Interconnect Devices: Fibre Channel hubs, bridges, and routers help connect servers, storage arrays, and HBAs in the SAN
Advantages of Storage Area Network (SAN)
- High Performance: SANs offer high-speed connectivity, fast data transfer rates, and low latency, due to their use of Fibre Channel technology. Fibre Channel switches, for instance, can achieve speeds of 16, 32, 64, and 128 Gbps (Gigabits per second), facilitating rapid data transfer
- Scalability: SANs allow storage devices to be added or removed without interrupting server operations, enabling centralized management of large volumes of data. SAN architectures, particularly those based on switched fabric network topologies, can support thousands to millions of devices. For smaller organizations, the FC-AL (Fibre Channel-Arbitrated Loop) network topology accommodates up to 126 devices
Disadvantages of Storage Area Network (SAN)
- High Cost: SANs require a significant initial investment and ongoing maintenance costs, due to their complex infrastructure and the need for specialized hardware and software. This complexity also translates into higher training costs for IT personnel responsible for utilizing and managing SANs
- Complexity in Management: Setup and management of a SAN demand a comprehensive understanding of its components and architecture, which is intricate and vendor-specific. The various hardware elements, such as switches, hubs, and HBAs, require specialized IT expertise for their configuration, integration with existing infrastructure, and the continuous management of these storage solutions
Advancements in Data Center Storage Technologies
Data center storage has traditionally relied on physical, hardware-based storage systems and server applications specific to particular devices. However, advancements in virtualization and cloud computing have transformed data center storage, enabling the abstraction of storage resources to offer more flexible and scalable storage solutions.
Storage virtualization in data centers is the process of abstracting physical storage from multiple, diverse, and independent storage devices across a network into a single, unified storage pool. This process allows for more effective management, allocation, and utilization of storage resources. It also significantly enhances both scalability and flexibility of these storage resources.
Storage virtualization is particularly beneficial in environments where many servers are connected to multiple storage arrays. In these environments, it simplifies the execution of tasks such as backup, archiving, and recovery, making them more straightforward for IT administrators. For example, storage virtualization enables automated storage tiering tailored to specific performance needs and data retention policies.
Types of Storage Virtualization
There are three main types of storage virtualization based on where the virtualization is implemented in the storage architecture: on the host (server), in the storage array, or in the network.
- Host-Based Virtualization: Pools server-attached storage as logical volumes using software on each server, enabling tasks like mirroring without burdening the network or storage subsystem
- Array-Based Virtualization: Centralizes storage virtualization within the storage hardware itself, managed via firmware without affecting SAN switches or hosts
- Network-Based Virtualization: Manages storage functions within the network layer, potentially causing bottlenecks if the network device handling virtualization is overwhelmed by storage management demands
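Whichever layer implements it, the core idea is the same: physical devices are abstracted into one capacity pool from which logical volumes are carved. The toy model below sketches that pooling behavior; the class and method names are hypothetical, for illustration only:

```python
class StoragePool:
    """Toy model of storage virtualization: physical devices are
    abstracted into a single capacity pool from which logical
    volumes are carved. Hypothetical API, for illustration only."""

    def __init__(self):
        self.capacity_gb = 0
        self.volumes = {}          # volume name -> size in GB

    def add_device(self, size_gb: int):
        """Absorb a physical device into the unified pool."""
        self.capacity_gb += size_gb

    @property
    def free_gb(self):
        return self.capacity_gb - sum(self.volumes.values())

    def create_volume(self, name: str, size_gb: int):
        """Carve a logical volume; callers never see physical disks."""
        if size_gb > self.free_gb:
            raise ValueError("pool exhausted")
        self.volumes[name] = size_gb

pool = StoragePool()
for disk_gb in (4000, 4000, 8000):     # three physical disks
    pool.add_device(disk_gb)
pool.create_volume("db-data", 6000)    # logical volume spans the disks
print(pool.free_gb)                    # 10000
```

Consumers of "db-data" never need to know which physical disks back it, which is what makes tasks like tiering and migration transparent to applications.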
Public Cloud Storage
Cloud storage refers to the public cloud model where data is stored on remote servers and storage devices accessible via the internet, or “the cloud.” These servers and storage devices are configured, operated, maintained, and managed by cloud service providers (CSPs) like Amazon Web Services (AWS) and Microsoft Azure.
This service is offered as part of the Infrastructure as a Service (IaaS) model and includes specific types of offerings like Storage as a Service (STaaS) that provide data storage as a subscription.
Cloud storage services allow users to provision (and deprovision) storage instantaneously and pay only for the space they use, as well as potentially for data transfers out of the platform. Also, users can store and retrieve their data on demand from any location, enhancing accessibility and flexibility. The data stored in cloud storage is typically secured through encryption and multiple backups, ensuring reliability and data protection.
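The pay-for-what-you-use model amounts to simple metering arithmetic, sketched below with hypothetical per-GB rates that do not reflect any provider’s actual pricing:

```python
def monthly_cloud_storage_bill(stored_gb: float, egress_gb: float,
                               storage_rate: float = 0.023,  # $/GB-month (hypothetical)
                               egress_rate: float = 0.09):   # $/GB out (hypothetical)
    """Pay-per-use: charge only for capacity actually consumed,
    plus data transferred out of the platform."""
    return round(stored_gb * storage_rate + egress_gb * egress_rate, 2)

# 500 GB stored, 100 GB downloaded during the month
print(monthly_cloud_storage_bill(500, 100))   # 20.5
```

Deprovisioning capacity immediately stops the storage charge, which is the key economic difference from owning hardware outright.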
Cloud storage has become increasingly important as the proportion of corporate data stored in the cloud by organizations around the world has risen from 30% in 2015 to over 60% today.
Private Cloud Storage
Private cloud storage allocates dedicated storage resources within a secure, internal network, usually managed in an on-premises data center by the organization itself. This setup provides exclusive access, control, and customization of storage infrastructure to meet specific organizational needs and security requirements.
Additionally, an organization’s on-premises storage infrastructure, including servers and storage units, can be set up to replicate or back up data to the public cloud. This flexibility allows for the creation of hybrid cloud storage solutions, combining the benefits of both private and public cloud storage.
Next-Generation Storage Solutions and Technologies
Several innovative solutions and technologies are emerging in data center storage to address the growing demands for efficiency, scalability, and performance. These include:
- All-Flash Arrays: High-speed storage systems that use solid-state drives (SSDs) instead of traditional spinning hard disk drives (HDDs), offering superior performance and lower latency. Additionally, the rising adoption of storage protocols specifically designed for SSDs, like NVMe (non-volatile memory express) and NVMe-oF (NVMe over Fabrics), is further improving the performance, reducing the latency, and increasing the throughput of all-flash arrays in data centers
- Scale-Out File Systems: A storage architecture that allows for the horizontal scaling of storage capacity and performance by adding more nodes, supporting flexibility and ease of expansion
- Object Platforms: Storage solutions designed for managing large amounts of unstructured data, using a flat namespace and unique identifiers for data retrieval
- Hyper-Converged Infrastructure (HCI): An integrated system combining storage, computing, and networking into a single framework, simplifying management and enhancing scalability
- Software-Defined Storage (SDS): An approach where software manages and abstracts underlying storage resources, offering flexibility and efficiency through policy-based management. SDS technology has been adopted by several hyperscale companies such as Meta Platforms (Facebook), Google, and Amazon
- Heat-Assisted Magnetic Recording (HAMR): A data storage technology that uses localized heating to increase the magnetic recording density, enabling higher capacity hard disk drives (HDDs) to meet the growing storage requirements of modern data centers
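The flat namespace behind object platforms can be sketched in a few lines: objects are stored and retrieved by unique identifiers rather than hierarchical paths. This is a toy model of the concept, not any vendor’s API:

```python
import uuid

class ObjectStore:
    """Toy object store: a flat namespace where each object is
    retrieved by a unique identifier rather than a directory path."""

    def __init__(self):
        self._objects = {}          # identifier -> (data, metadata)

    def put(self, data: bytes, **metadata) -> str:
        obj_id = str(uuid.uuid4())  # unique identifier, no hierarchy
        self._objects[obj_id] = (data, metadata)
        return obj_id

    def get(self, obj_id: str) -> bytes:
        return self._objects[obj_id][0]

store = ObjectStore()
oid = store.put(b"unstructured blob", content_type="image/png")
print(store.get(oid) == b"unstructured blob")   # True
```

Because there is no directory tree to maintain, the namespace scales horizontally, which is why this design suits large volumes of unstructured data.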
Capacity of Data Center Storage
Data center storage capacity is the total amount of digital information that a data center can store. This capacity, often measured in terabytes (TB) and petabytes (PB), is determined by the cumulative size of all storage devices within the data center, including hard disk drives (HDDs), solid-state drives (SSDs), and tape drives.
The storage capacity of data centers can be understood on a scale that ranges from individual devices to the entire facility. Here’s a breakdown illustrating the hierarchical organization of data storage and its capacity at various levels:
- Individual Storage Devices: Within the data center, individual HDDs, SSDs, and tape drives typically offer storage capacities ranging from 1 TB to more than 30 TB each
- Data Hall: These storage devices are organized into racks and cabinets, which are then arranged in rows within a data hall. A single data hall, with its multiple rows of storage device-filled racks, can contain upwards of 1,000 terabytes (or 1 petabyte) of storage capacity
- Data Center Facility: A mid-sized data center facility usually comprises several data halls (about 3 to 10), implying that the entire facility’s storage capacity could span multiple petabytes
- Large Enterprises: Major corporations operate numerous data centers worldwide – ranging from 10 to 50 or more. Therefore, the collective storage capacity across all these data centers can reach hundreds of petabytes, or even exabytes (thousands of petabytes). For context, TikTok and Spotify have been reported to store 470 petabytes and 460 petabytes of data with Google, respectively
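The roll-up from individual drives to a whole facility is straightforward arithmetic. The device counts below are illustrative assumptions consistent with the ranges above, not figures for any specific data center:

```python
# Illustrative roll-up of the storage hierarchy described above.
# All counts are assumptions chosen for the sake of the arithmetic.
drive_tb        = 10    # capacity of one HDD
drives_per_rack = 24    # drives in one storage rack
racks_per_hall  = 10    # storage racks in one data hall
halls_per_site  = 4     # data halls in a mid-sized facility

rack_tb = drive_tb * drives_per_rack           # 240 TB per rack
hall_pb = rack_tb * racks_per_hall / 1000      # 2.4 PB per hall
site_pb = hall_pb * halls_per_site             # 9.6 PB per facility
print(f"{rack_tb} TB/rack, {hall_pb} PB/hall, {site_pb} PB/site")
```

Even with these modest assumptions, a single facility lands in the multi-petabyte range, matching the breakdown above.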
Furthermore, the storage demands for artificial intelligence applications are notably significant. For instance, Phase 1 of Meta’s AI Research SuperCluster (RSC) data center includes a storage tier with 175 petabytes of bulk storage, 46 petabytes of cache storage, and 10 petabytes of network file system (NFS) storage.
Who Has the Largest Data Center Storage Capacity?
Determining exact figures is challenging, but Amazon Web Services (AWS) stands out as having the largest data center storage capacity globally. Its portfolio operates at an immense scale, with storage spanning hundreds of exabytes, possibly approaching the zettabyte range. This capacity is distributed across AWS’ extensive footprint, which includes over 38 million square feet dedicated to its data centers.
Cost of Data Center Storage
The cost of data center storage is typically measured in dollars per gigabyte ($/GB) of capacity. This cost can significantly vary depending on multiple factors, such as the storage technology used (for instance, HDDs versus SSDs), the total storage capacity, performance requirements measured in Input/Output Operations Per Second (IOPS), and the purchasing company’s scale (notably, whether it’s a hyperscale data center operator or not).
Below are the typical cost ranges for various data center storage solutions:
- Hard Disk Drives (HDDs): $0.01 to $0.10 per GB
- Solid-State Drives (SSDs): $0.04 to $0.20 per GB
- Hybrid Flash Arrays (HFAs): $0.05 to $0.25 per GB, depending on the SSD to HDD ratio
It is crucial to recognize that these prices are for the raw storage capacity only. The total cost of ownership (TCO) for data center storage also includes factors like the data center infrastructure, power and cooling costs, maintenance, and any software or licensing fees required to manage the storage environment.
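As a quick sketch of how these ranges translate into raw capacity cost, the following uses the midpoints of the $/GB figures quoted above, and, as noted, deliberately excludes the other components of TCO:

```python
# Midpoints of the $/GB ranges quoted above (illustrative, not quotes).
PRICE_PER_GB = {"HDD": 0.055, "SSD": 0.12, "HFA": 0.15}

def raw_storage_cost(media: str, capacity_tb: float) -> float:
    """Raw capacity cost in dollars -- excludes the power, cooling,
    maintenance, and software licensing that make up the rest of TCO."""
    return PRICE_PER_GB[media] * capacity_tb * 1000   # 1 TB = 1,000 GB

for media in ("HDD", "SSD"):
    print(media, raw_storage_cost(media, 100))   # cost of 100 TB
```

At these midpoints, 100 TB of SSD capacity costs roughly twice as much as the same capacity on HDDs, which is why tiering hot data onto flash and cold data onto disk remains common.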
Data Center Storage Companies
Data center storage companies encompass a variety of companies, including cloud service providers (CSPs), colocation data center operators, storage hardware providers, storage original equipment manufacturers (OEMs), and virtualization software providers.
Cloud Service Providers
The leading cloud service providers (CSPs) offer a diverse choice of bulk storage services. These services range from local solid-state drives (SSDs) attached to virtual machines (VMs), to zonal and regional persistent disks, and to storage buckets, which provide object storage services. Below are examples of common object storage solutions offered by the major CSPs:
- Amazon Web Services (AWS): Amazon Simple Storage Service (S3) is a scalable, high-speed, web-based cloud storage service designed for online backup and archiving of data and application programs
- Microsoft Azure: Azure Blob Storage is a massively scalable object storage solution for unstructured data offering high availability, security, and performance
- Google Cloud Platform (GCP): Google Cloud Storage is a unified object storage for developers and enterprises, from live data serving to data analytics/ML to data archiving
- Oracle Cloud: Oracle Cloud Infrastructure Object Storage is a durable storage service that provides unlimited capacity for storing and accessing any type of data
- IBM Cloud: IBM Cloud Object Storage is a flexible, scalable, and simple storage solution that is designed for high durability, resiliency, and security
Colocation Data Center Operators
In a colocation data center environment, customers rent space for their servers and storage infrastructure within a third-party facility, operated by companies like Equinix and Digital Realty. These colocation providers are responsible for maintaining the building, ensuring environmental controls (like power and cooling), and providing physical security and network connectivity.
The customer, on the other hand, is responsible for purchasing their own storage hardware, as well as the installation, configuration, maintenance, and management of their storage equipment within the rented space.
Storage Hardware Providers
The largest data center storage hardware providers in the world are Dell EMC, Hewlett Packard Enterprise (HPE), NetApp, Hitachi, Pure Storage, IBM, and Huawei. These companies specialize in producing and supplying a wide range of data storage solutions and systems to meet the needs of enterprise customers.
Storage Original Equipment Manufacturers
Globally, the leading storage original equipment manufacturers (OEMs) are Seagate, Western Digital, and Toshiba. These companies design, manufacture, and supply data center storage components such as hard disk drives (HDDs) and solid-state drives (SSDs).
Virtualization Software Providers
Virtualization software providers for data center storage include:
- VMware: Offers software-defined storage (SDS) solutions for data centers through products like vSAN and vSphere Storage
- Nutanix: Offers a hyper-converged infrastructure (HCI) solution that combines computing, storage, and networking into a single system. It includes the Acropolis Operating System (AOS) software