Data center redundancy is considered a critical best practice for IT infrastructure, directly supporting the reliability and availability of these computing facilities. Today, most enterprise and cloud service provider data centers utilize redundant components and systems to avoid single points of failure.

Data center redundancy refers to the practice of deploying duplicate critical components and infrastructure to allow for continuous operation in the event of failures. It involves having backup systems for power, cooling, networking, storage, and compute to minimize downtime and data loss.

Dgtl Infra provides a comprehensive overview of data center redundancy, exploring its significance and the various components that contribute to a robust infrastructure. We also examine the different redundancy configurations, including N, N+1, N+2, 2N, and 2N+1, as well as the four tiers of data center redundancy, ranging from basic capacity to fault tolerant systems. Additionally, we provide insights into the factors that you should consider when choosing the appropriate level of redundancy for your organization’s needs.

What is Data Center Redundancy?

Data center redundancy refers to the practice of duplicating critical components and systems within a data center to provide continuous operation in the event of failures or disruptions. This includes redundant power supplies, cooling systems, backup generators, network connections, and data storage, all working together to minimize downtime and maintain uninterrupted service. By implementing redundancy, data center operators aim to provide high availability and reliability to their end users.

Importance of Data Center Redundancy

Data center redundancy is crucial for ensuring the continuous operation and availability of critical IT infrastructure and services. Here are the major reasons why data center redundancy is important:

  1. Business Continuity and Disaster Recovery: Redundancy ensures that businesses can continue their operations without interruption, even in the event of hardware failures and power outages. It also enables quick recovery from natural disasters, cyber-attacks, or human errors. According to Gartner, the average cost of IT downtime is $5,600 per minute. For larger enterprises, this can go up to $540,000 per hour. Redundancy helps minimize downtime, thus saving companies from these substantial financial losses
  2. High Availability: Redundancy enables high availability of applications and services, typically 99.99% (known as “four nines”) or 99.999% (“five nines”). With multiple redundant components, if one fails, another can take over seamlessly, ensuring uninterrupted access for users
  3. Data Protection: Redundant storage systems, such as RAID (Redundant Array of Independent Disks), protect data from loss due to disk failures. In a study by the University of Texas, it was found that 94% of companies suffering from a catastrophic data loss do not survive – 43% never reopen and 51% close within two years. Redundancy helps replicate data across multiple drives, preventing such catastrophic data losses
  4. Improved Performance: Redundancy can enhance system performance by distributing workloads across multiple components. This load balancing helps prevent bottlenecks and ensures optimal resource utilization. For example, if a data center runs redundant network switches in parallel, traffic is spread across them during normal operation and can be rerouted immediately if one switch fails, preventing any noticeable slowdown for users
  5. Maintenance and Upgrades: With redundant systems, maintenance tasks or upgrades can be performed without causing downtime. One set of components can be taken offline for maintenance while the redundant set continues to serve users. This allows data center operators to patch systems, replace faulty hardware, or install software updates without interrupting critical business operations
  6. Compliance and Regulatory Requirements: Industries such as healthcare, financial services, and government are subject to strict regulations and standards governing data availability and recovery. Compliance with these requirements helps organizations avoid legal and financial penalties while maintaining customer trust. In the healthcare industry, for example, the HIPAA Security Rule requires contingency planning, including data backup and disaster recovery plans, which redundant systems help to satisfy
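
To put the availability targets and the downtime cost figure above into perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, a 365-day year is assumed, and the $5,600 per minute default is simply the average figure quoted above, not a value that applies to every business.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Maximum downtime per year (in minutes) implied by an availability percentage."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


def downtime_cost(minutes: float, cost_per_minute: float = 5_600) -> float:
    """Rough cost of an outage, using the average per-minute figure cited above."""
    return minutes * cost_per_minute


for label, pct in [("four nines", 99.99), ("five nines", 99.999)]:
    minutes = allowed_downtime_minutes(pct)
    print(f"{label}: {minutes:.2f} min/year, about ${downtime_cost(minutes):,.0f}")
```

Under these assumptions, four nines allows roughly 52.6 minutes of downtime per year and five nines roughly 5.3 minutes, which is why each additional "nine" is progressively harder to deliver.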

Infrastructure Components for Data Center Redundancy

The main infrastructure components used for data center redundancy are:

Infrastructure Components for Data Center Redundancy in Side-by-Side Diagram of Power Cooling Network Storage Server
  • Power Infrastructure: Uninterruptible Power Supply (UPS) units, backup generators, redundant power distribution units (PDUs), dual power feeds from separate utility substations
  • Cooling Systems: Redundant air conditioning units (CRAC or CRAH), chilled water systems with backup chillers, multiple cooling towers, redundant pumps and piping
  • Network Connectivity: Redundant network switches and routers, multiple internet service providers (ISPs), diverse fiber optic paths, redundant firewalls and load balancers
  • Storage Systems: Redundant storage arrays (e.g., RAID configurations), dual storage controllers, redundant SAN (Storage Area Network) switches
  • Servers and Computing: Redundant servers and server clusters, virtualization, containerization, automated failover and load balancing mechanisms

Data Center Redundancy Configurations

Data center redundancy configurations or topologies refer to the design and implementation of backup systems and components within a data center to ensure continuous operation and minimize downtime in the event of failures or disruptions. These redundancy architectures include N, N+1, N+2, 2N, and 2N+1. The goal is to achieve high availability and reliability of the data center’s infrastructure and services.

Defining “N” in Redundancy

“N” refers to the baseline level of capacity or components needed to run a data center at full load. For example, if a data center requires three UPS modules to operate at full capacity, “N” would equal three.

“N” represents the minimum requirements for a fully functional system without any redundancy, making it susceptible to single points of failure. This means that any disruption, such as hardware issues, maintenance, or outages, would render the data center inoperable until the problem is resolved.

When discussing redundancy, the “N” is often followed by a plus sign and a number, indicating the level of redundancy built into the system.
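
As a rough illustration of how "N" is derived, the baseline is the load divided by the capacity of one unit, rounded up. The sketch below assumes identical units and uses hypothetical load figures; `baseline_units` is an illustrative name, not an industry-standard function.

```python
import math


def baseline_units(load_kw: float, unit_capacity_kw: float) -> int:
    """Return N: the fewest identical units whose combined capacity covers the load."""
    return math.ceil(load_kw / unit_capacity_kw)


print(baseline_units(750, 250))    # hypothetical 750 kW load on 250 kW modules
print(baseline_units(1_000, 400))  # capacity does not divide evenly, so round up
```

The first call gives N = 3, matching the three-module example above. The second gives N = 3 as well, because two 400 kW units would fall short of a 1,000 kW load.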

Standard Redundancy Configurations

There are several common redundancy configurations used in data centers:

| Configuration | N+1 | N+2 | 2N | 2N+1 |
|---|---|---|---|---|
| Description | 1 additional component for redundancy | 2 additional components for redundancy | 2 separate, identical systems, providing full redundancy | 2 separate, identical systems, plus 1 spare component |
| Redundancy Level | Moderate | High | Very High | Extremely High |
| Failover Capacity | Handles failure of 1 component | Handles failure of 2 components | Handles failure of an entire system | Handles failure of an entire system plus 1 extra component |
| Cost | Moderate | High | Very High | Extremely High |
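
The unit counts behind these configurations follow directly from the notation. Here is a minimal sketch, assuming a baseline of four identical units (for instance, four 250 kW modules for a 1 MW load); `total_units` is a hypothetical helper written for this illustration.

```python
def total_units(n: int, config: str) -> int:
    """Total installed units for a baseline of n under a named configuration."""
    formulas = {
        "N": n,
        "N+1": n + 1,
        "N+2": n + 2,
        "2N": 2 * n,
        "2N+1": 2 * n + 1,
    }
    return formulas[config]


n = 4  # assumed baseline for this example
for config in ("N", "N+1", "N+2", "2N", "2N+1"):
    units = total_units(n, config)
    print(f"{config:>5}: {units} units installed, {units - n} beyond the baseline")
```

The count alone does not capture everything: 2N and 2N+1 also imply that the equipment is split into separate, independent systems, which a simple total cannot express.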

Data centers can utilize different redundancy models for various components within the same facility. For instance, the UPS system might be configured as 2N, while the cooling system operates as N+1.

However, it is crucial to identify potential single points of failure that can compromise overall redundancy, such as the main switchgear, which is responsible for distributing power from the utility service or generators to the various electrical components.
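
One simple way to look for such weak spots is to list every stage of the infrastructure with the number of units required and the number installed, then flag any stage with no spare. The sketch below is illustrative only: the stage names and counts are hypothetical, and a real assessment would also trace the distribution paths between stages.

```python
# Hypothetical facility: stage -> (units required, units installed)
infrastructure = {
    "utility feed": (1, 2),
    "main switchgear": (1, 1),   # no spare unit
    "UPS modules": (4, 8),       # 2N
    "cooling units": (6, 7),     # N+1
}


def single_points_of_failure(stages: dict[str, tuple[int, int]]) -> list[str]:
    """Return the stages that have no spare unit beyond what the load requires."""
    return [name for name, (required, installed) in stages.items()
            if installed <= required]


print(single_points_of_failure(infrastructure))
```

In this made-up facility the check flags only the main switchgear, even though the UPS and cooling systems are redundant.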

N+1 Redundancy

N+1 is a data center redundancy configuration where there is one additional unit (+1) of a critical component or system on top of the minimum number required for normal operation (N). This extra unit serves as a backup in case one of the primary units fails, allowing the system to continue functioning without interruption.

In an N+1 redundant system, if any single component fails, the redundant component takes over its function, ensuring continuous operation. However, N+1 still presents a risk in the event of multiple simultaneous component failures.

The N+1 configuration is commonly used for Uninterruptible Power Supplies (UPS), cooling systems, backup generators, and network switches.

Example of N+1 Redundancy

Let’s consider a data center with a power requirement of 1 megawatt (MW). To achieve N+1 redundancy, the 1 MW load is served by five UPS (Uninterruptible Power Supply) modules, each rated at 250 kilowatts (kW). This means there is one additional or redundant 250 kW UPS module beyond the minimum four modules required to meet the 1 MW load.

N+1 Redundancy Diagram with Five 250 kW UPS Modules Connected to a Single 1 MW Load

The extra UPS module provides redundancy in case any single module fails. With N+1 redundancy, the data center can continue operating at full capacity, even if one of the five UPS modules goes offline, providing higher availability and uptime for the supported load.
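
The figures in this example can be checked with a short sketch. It assumes the load is shared equally across all running modules, which is a simplification, and the function names are illustrative.

```python
def survives_failures(units: int, unit_kw: float, load_kw: float, failed: int) -> bool:
    """True if the units still running can carry the full load."""
    return (units - failed) * unit_kw >= load_kw


def load_per_unit_kw(units: int, load_kw: float, failed: int = 0) -> float:
    """Load on each running unit, assuming the load is shared equally."""
    return load_kw / (units - failed)


UNITS, UNIT_KW, LOAD_KW = 5, 250, 1_000  # figures from the example above

print(load_per_unit_kw(UNITS, LOAD_KW))                       # all five modules running
print(load_per_unit_kw(UNITS, LOAD_KW, failed=1))             # one module offline
print(survives_failures(UNITS, UNIT_KW, LOAD_KW, failed=1))   # one module lost
print(survives_failures(UNITS, UNIT_KW, LOAD_KW, failed=2))   # two modules lost
```

With all five modules running, each carries 200 kW, or 80% of its rating. After one failure the remaining four run at their full 250 kW, and a second simultaneous failure leaves only 750 kW of capacity, which is the multiple-failure risk that N+1 does not cover.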

N+2 Redundancy

N+2 is a data center redundancy configuration that provides high availability and fault tolerance by having two additional units (+2) of a critical component or system beyond the minimum required for the system to function normally. In this setup, “N” represents the number of components needed to run the system at full capacity, and “+2” indicates that there are two extra components available as backup.

Example of N+2 Redundancy

Let’s consider a data center with a power requirement of 1 megawatt (MW). To run at full capacity, the data center requires four power supply units (PSUs), each capable of providing 250 kilowatts (kW) of power.

In an N+2 configuration, ‘N’ represents the number of units needed to fully support the load, and ‘+2’ indicates there are two additional units for redundancy. So in this case, an N+2 redundancy configuration would need a total of 6 PSUs to be installed (4+2).

N+2 Redundancy Diagram of Six 250 kW Power Supply Units Connected to a Single 1 MW Load

In this N+2 redundant design:

  • If all PSUs are functioning properly, the 4 primary PSUs will handle the 1 MW load, and the 2 additional PSUs will serve as backup
  • If one PSU fails, the remaining 5 PSUs can still handle the full load without any interruption to the system
  • If two PSUs fail simultaneously, the remaining 4 PSUs can continue to run the system at full capacity

Overall, an N+2 redundancy configuration provides the data center with a high level of fault tolerance. The two additional 250 kW units act as backup in case any of the four primary units fail or need to be taken offline for servicing.
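
The failure scenarios listed above can be verified exhaustively by trying every possible combination of failed units. This is a minimal sketch that treats the six PSUs as one pooled source of capacity; `tolerates` is a hypothetical helper, not part of any standard tool.

```python
from itertools import combinations


def tolerates(unit_ratings_kw: list[float], load_kw: float, failures: int) -> bool:
    """Check every possible set of `failures` failed units against the load."""
    indices = range(len(unit_ratings_kw))
    for failed in combinations(indices, failures):
        remaining = sum(kw for i, kw in enumerate(unit_ratings_kw) if i not in failed)
        if remaining < load_kw:
            return False
    return True


psus = [250] * 6  # N+2 with N = 4, as in the example above
for k in range(4):
    print(f"{k} failed PSUs tolerated: {tolerates(psus, 1_000, k)}")
```

Because the check works on a list of ratings, the same function also handles units of unequal size, where which unit fails matters as much as how many.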

2N Redundancy

2N is a redundancy configuration where the data center has twice the amount of equipment and infrastructure needed to support its normal operations. This means that for every critical component, there is a fully redundant backup component that can take over in case of failure, ensuring continuous operation and minimizing the likelihood of downtime. Furthermore, 2N configurations also utilize two completely separate and independent power distribution paths.

By having duplicate components and distribution paths run concurrently, 2N configurations provide what is known as active-active redundancy, which is a key strategy for increased fault tolerance.

The following diagram depicts a 2N redundant power distribution system for a data center, with duplicate components on both the A and B sides providing two independent power distribution paths to the server:

Data Center Redundancy Illustration of a 2N Configuration Diagram with Dual Pathways for UPS Generators Switchgear Transformers
Source: Eaton.

One of the key advantages of a 2N architecture is the ability to perform maintenance on an entire set of components without disrupting normal operations.

Example of 2N Redundancy

Let’s consider a data center with a power requirement of 1 megawatt (MW). In a 2N configuration, the 1 MW load is served by two separate and independent 1 MW UPS modules. This setup provides full redundancy because both of the 1 MW UPS modules are capable of delivering the full 1 MW load on their own, if the other module fails.

2N Redundancy Diagram of Two 1 MW UPS Modules Connected to a Single 1 MW Load

The 2N configuration provides a high level of redundancy by having two fully independent modules, each sized to 100% of the load requirement. This guards against failure of any one UPS module taking the data center offline.
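
What sets 2N apart from N+1 is that it survives the loss of an entire path, not just a single module. The sketch below models this under a simplifying assumption: the load stays up if any one surviving path can carry it alone. The path names and the single-path comparison case are hypothetical.

```python
def load_is_served(paths_kw: dict[str, float], load_kw: float,
                   failed_paths: set[str]) -> bool:
    """In this sketch, the load stays up if any surviving path can carry it alone."""
    return any(kw >= load_kw
               for name, kw in paths_kw.items()
               if name not in failed_paths)


LOAD_KW = 1_000
two_n = {"A": 1_000, "B": 1_000}   # 2N: two independent 1 MW paths
single_path = {"A": 1_250}         # N+1 modules that all sit on one shared path

print(load_is_served(two_n, LOAD_KW, failed_paths={"A"}))        # B carries the load
print(load_is_served(single_path, LOAD_KW, failed_paths={"A"}))  # nothing left
print(load_is_served(two_n, LOAD_KW, failed_paths={"A", "B"}))   # both paths lost
```

The trade-off is utilization. If both paths share the load equally during normal operation, each 1 MW UPS runs at about 50% of its rating.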

2N+1 Redundancy

2N+1 is a data center redundancy configuration that delivers high availability and minimizes downtime. In this setup, the data center has double the number of components (N) required to run the system (same as 2N), plus one additional component (+1) as a spare that can be used if any of the active components fail.

The 2N+1 arrangement allows the data center to continue functioning even if multiple components fail, as well as maintain N+1 redundancy in the event that the entire primary set of components goes down.

Example of 2N+1 Redundancy

Let’s consider a data center with a power requirement of 1 megawatt (MW). To run at full capacity, the data center requires two power supply units (PSUs), each capable of providing 500 kilowatts (kW) of power.

In a 2N+1 redundancy configuration, the setup would be as follows:

  • N = 2 (the number of PSUs required)
  • 2N = 4 (double the required number of PSUs)
  • +1 = 1 (one additional spare PSU)

2N+1 Redundancy Diagram of Five 500 kW Power Supply Units Connected to a Single 1 MW Load

Therefore, the data center would have a total of 5 PSUs (4 active and 1 spare). If up to 3 PSUs fail, the system can still operate at full capacity because it only needs 2 PSUs to function. The spare PSU can be used to replace any of the failed units, ensuring continuous operation.
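
The following sketch walks through every failure count for this example. It treats the five PSUs as pooled capacity, which is a simplification of a real 2N+1 design with separate systems; `spare_units` is an illustrative helper.

```python
N, UNIT_KW, LOAD_KW = 2, 500, 1_000   # figures from the example above
installed = 2 * N + 1                 # 2N+1 = 5 PSUs


def spare_units(running: int, needed: int = N) -> int:
    """Units still running beyond the baseline (negative means a shortfall)."""
    return running - needed


for failed in range(installed + 1):
    running = installed - failed
    ok = running * UNIT_KW >= LOAD_KW
    print(f"{failed} failed -> {running} running, full load served: {ok}, "
          f"spare units: {spare_units(running)}")
```

Losing two PSUs, the equivalent of one full set of N, still leaves three running, which is N+1. The load is only at risk once a fourth PSU fails.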

Data Center Redundancy Tiers or Levels

Redundancy and data center tiers or levels are closely connected. The tier of a data center indicates its infrastructure capabilities, redundancy, and uptime guarantees. These levels are defined by the Uptime Institute, a global authority on data center standards, and are commonly known as the “Tier” system. There are four primary tiers:

| Level | Tier I | Tier II | Tier III | Tier IV |
|---|---|---|---|---|
| Redundancy | No | Partial; redundant components | N+1 | 2N or 2N+1 |
| Redundant Distribution Paths | No | No | Yes, but only one path active at a time | Yes, all paths active simultaneously |
| Uptime Guarantee | 99.671% | 99.741% | 99.982% | 99.995% |
| Downtime per Year | 28.8 hours | 22 hours | 1.6 hours | 0.4 hours |
| Concurrently Maintainable | No; maintenance requires downtime | No; maintenance requires downtime | Yes, without taking data center offline | Yes, without taking data center offline |
| Cost | Moderate | High | Very High | Extremely High |
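
The downtime figures follow from the uptime percentages. As a quick check, the sketch below converts each tier's uptime into hours per year, assuming a 365-day year of 8,760 hours. The names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, assuming a non-leap year

TIER_UPTIME_PCT = {
    "Tier I": 99.671,
    "Tier II": 99.741,
    "Tier III": 99.982,
    "Tier IV": 99.995,
}


def downtime_hours_per_year(uptime_pct: float) -> float:
    """Annual downtime implied by an uptime percentage."""
    return (100 - uptime_pct) / 100 * HOURS_PER_YEAR


for tier, pct in TIER_UPTIME_PCT.items():
    hours = downtime_hours_per_year(pct)
    print(f"{tier}: {hours:.1f} hours (about {hours * 60:.0f} minutes) per year")
```

The computed values are close to the rounded figures commonly quoted for each tier. Tier II works out to about 22.7 hours, slightly above the 22 hours usually cited, and Tier IV to about 26 minutes.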

Tier I (Basic Capacity)

Tier I is the lowest level of redundancy as defined by the Uptime Institute’s data center tier classification system. Key characteristics include:

  • Redundancy: No redundancy for power, cooling, or network systems
  • Distribution Path: Single, non-redundant distribution path serving the IT equipment
  • Uptime: Expected uptime of 99.671% annually, equating to annual downtime of 28.8 hours
  • Disruption Impact: Susceptible to disruptions from both planned and unplanned activities
  • Maintenance: Partial shutdowns required for maintenance and repair work
  • Use Cases: Sufficient for small businesses with limited IT requirements

Tier II (Redundant Capacity Components)

Tier II classification builds upon the requirements of Tier I and adds redundant capacity components to the data center infrastructure. Key features include:

  • Redundancy: Partial N+1 redundancy focused on critical power and cooling components, such as Uninterruptible Power Supply (UPS) modules, Power Distribution Units (PDUs), chillers, CRAC/CRAH units, and pumps. However, Tier II does not require N+1 redundancy across all of its systems
  • Distribution Path: No redundancy in the distribution path. These data centers still have a single, non-redundant distribution path for power and cooling, which can be a single point of failure
  • Uptime: Expected uptime of 99.741% annually, which translates to annual downtime of 22 hours
  • Maintenance: Redundant capacity components can be taken offline for planned maintenance without shutting down the entire data center. However, maintenance on the single distribution path still requires downtime, and an unplanned outage or failure of that path can cause disruption
  • Use Cases: Suitable for businesses with less critical applications that can tolerate some downtime for maintenance or unplanned outages

Tier III (Concurrently Maintainable)

Tier III data centers provide a higher level of redundancy and fault tolerance compared to Tier I and Tier II data centers.

Uptime Institute Certified Tier III Facility Logo with Roman Numerals Sample
Source: Uptime Institute.

Key attributes of a Tier III data center include:

  • Redundancy: N+1 redundancy for power and cooling components, ensuring that if one component fails or needs maintenance, the redundant component can take over without disrupting operations
  • Concurrent Maintainability: Data center is served by multiple, independent power and cooling distribution paths, but only one path is active at a time. This allows for maintenance on any component without impacting the operation of the data center, meaning it is concurrently maintainable
  • Uptime: Designed to achieve a minimum of 99.982% uptime per year, which translates to no more than 1.6 hours of downtime annually
  • Maintenance: Planned maintenance activities can be performed on any component of the infrastructure without impacting data center operations. However, unplanned outages from unexpected equipment failures or human error may still cause disruptions
  • Use Cases: Appropriate for businesses that require high availability and can tolerate brief disruption from rare unplanned failures, but do not have the most stringent requirements for continuous uptime

Tier IV (Fault Tolerant)

Tier IV represents the highest level of redundancy, availability, and resilience.

Uptime Institute Certified Tier IV Facility Logo with Roman Numerals Sample
Source: Uptime Institute.

Key characteristics of a Tier IV data center include:

  • Redundancy: 2N or 2N+1 redundancy for all critical components, meaning that every critical system, such as power, cooling, and networking, has at least two fully independent and redundant components, and potentially an additional backup component
  • Uptime: Designed to achieve a minimum of 99.995% uptime per year, which translates to a maximum of 0.4 hours of downtime annually
  • Concurrently Maintainable: Allows for any planned maintenance activity of power and cooling systems to take place without disrupting the operation of IT hardware located in the data center
  • Fault Tolerance: Data center can sustain at least one worst-case unplanned failure with no critical load impact, meaning it is fault tolerant. This ensures continuous operation even in the event of a failure
  • Power Supply: Minimum of two active power sources, typically utility and on-site generation, in addition to redundant backup power sources like UPS systems and generators
  • Cooling Redundancy: Systems are independently dual-powered, including chillers and cooling towers. The data center can maintain the required temperature and humidity levels even with a failure of one set of cooling equipment
  • Network Redundancy: Requires multiple active distribution paths that are simultaneously served by independent utilities. The network infrastructure must have a redundant configuration to ensure no single point of failure
  • Use Cases: Suitable for organizations with mission-critical applications that demand near-zero downtime, such as government agencies, military operations, and financial institutions that operate trading platforms or payment processing systems

Factors to Consider for Choosing Data Center Redundancy

When determining the appropriate data center redundancy configuration, tier, or level, businesses should consider several key factors:

  1. Business Criticality: Assess the importance of the applications and data hosted in the data center. Mission-critical systems require higher levels of redundancy to ensure minimal downtime
  2. Uptime Requirements: Consider the required uptime percentage for the systems hosted in the data center. Higher tier levels provide better uptime guarantees, such as Tier IV offering 99.995% uptime
  3. Budget: Higher redundancy levels come with increased costs for infrastructure, maintenance, and operations. Businesses must balance their budget with the required redundancy level
  4. Regulatory Compliance: Industries such as healthcare, financial services, and government, have strict regulations regarding data availability and disaster recovery. Ensure that the chosen redundancy level meets these requirements
  5. Maintenance and Upgrades: Consider the impact of maintenance activities and upgrades on the data center’s availability. Higher tier levels offer better fault tolerance during these events

Relationship between Redundancy and Tiers

As the data center tier level increases, the redundancy of critical components and distribution paths also increases. Higher redundancy provides better fault tolerance, minimizes the impact of planned maintenance, and reduces the likelihood of downtime due to equipment failures or disruptions.

Data Center Tier System Relationship with Redundancy Arrows in a Diagram with Four Levels

On the other hand, the increased redundancy in higher data center tier levels comes with additional costs for infrastructure, maintenance, and operations. Organizations must balance their need for uptime and fault tolerance with the associated costs when choosing the appropriate tier level for their data centers.

Frequently Asked Questions

What is a Redundant Data Center?

A redundant data center is a facility that houses critical computing infrastructure and is designed to ensure continuous operation even in the event of failures or disruptions. It achieves this by incorporating redundant components, such as power supplies, cooling systems, generators, and network connections, which serve as backups in case the primary systems fail.

Redundant data centers are crucial for organizations that require high availability and reliability for their IT services and applications.

What is the Difference Between Redundancy, Reliability, and Availability?

The key differences between redundancy, reliability, and availability in data centers are:

  • Redundancy focuses on duplicating equipment and systems to eliminate single points of failure
  • Reliability measures the consistency of a system’s performance over time and is essential for maintaining business continuity
  • Availability is the end result of having high reliability and appropriate redundancy. It is the amount of time the system is functioning and accessible to users annually, often quoted as a percentage (e.g., 99.999%)

Overall, redundancy and reliability contribute to availability, but availability also depends on factors beyond just the physical infrastructure, such as network connectivity and planned maintenance.
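
The link between these three terms can be made concrete with two standard textbook formulas. Neither is stated in the article itself: steady-state availability is MTBF / (MTBF + MTTR), and the availability of several redundant components is 1 - (1 - A)^n when any one of them is enough. The sketch assumes failures are independent, which real facilities only approximate, and the MTBF and MTTR values are hypothetical.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of one component: uptime / (uptime + repair time)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def redundant_availability(single: float, copies: int) -> float:
    """Availability of `copies` independent components where any one suffices."""
    return 1 - (1 - single) ** copies


one = availability(mtbf_hours=10_000, mttr_hours=10)  # hypothetical figures
print(f"single unit: {one:.5%}")
print(f"two units:   {redundant_availability(one, 2):.7%}")
```

With these illustrative numbers, a single unit at roughly three nines becomes roughly six nines when duplicated. This shows how redundancy raises availability without changing the reliability of any individual component.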

What is Uptime and Downtime in Data Center Redundancy Levels?

Uptime and downtime are inversely related; the higher the uptime, the lower the downtime, and vice versa. They are critical indicators of a data center facility’s performance and reliability. Maximizing uptime and minimizing downtime can be achieved through higher data center redundancy levels, which provide backup systems and failover mechanisms that can take over in the event of a failure. In turn, data centers can deliver superior availability and service quality for their customers.

Mary Zhang covers Data Centers for Dgtl Infra, including Equinix (NASDAQ: EQIX), Digital Realty (NYSE: DLR), CyrusOne, CoreSite Realty, QTS Realty, Switch Inc, Iron Mountain (NYSE: IRM), Cyxtera (NASDAQ: CYXT), and many more. Within Data Centers, Mary focuses on the sub-sectors of hyperscale, enterprise / colocation, cloud service providers, and edge computing. Mary has over 5 years of experience in research and writing for Data Centers.
