Buyer beware of the “one giant switch” data center network model

The IT world has been buzzing with Cisco’s recent announcements about its entry into the server market and its data center strategy. Here are some of my thoughts on this strategy and what it could mean for your own data center plans, both now and in the future.
The implications
In looking at Cisco’s data center strategy, most people have focused on the addition of servers as the big news. While that is certainly significant, largely overlooked are the broader implications of Cisco’s data center “world view” from a technical standpoint.
Cisco is a networking company through and through, so it’s not surprising that they view the world of IT from that perspective. That viewpoint might sound great if you’re only a networking guy, but let’s take a closer look at what such a strongly network-centric strategy implies for real-world data centers.
In a Cisco data center, all intelligence is centralized in the network core. A networking person (or team) manages all the traffic from a single point of control. Everything beyond the core is dumb, and all traffic flows through the middle of the network. Again, to a networking person that might sound good, but it warrants a closer look.
In essence, the Cisco strategy is to view the entire data center as one giant switch, meaning everything is ultimately managed like a switch port. There are more than 1,400 commands in the Nexus CLI, so this won’t be easy, folks. The approach requires investing in an extremely large, complex, and expensive networking system at the middle of the data center. It raises some pretty serious issues, including:
- Performance. In Cisco’s one-giant-switch model, all traffic must travel over a physical wire to a physical switch for every operation – creating a configuration that looks something like “line cards on a rope.” Consequently, traffic from one virtual server must traverse the network, make an elaborate “hairpin turn” within the physical switch, and then traverse the network again before it can reach another virtual server on the same machine – and return traffic (or a “response” from the second virtual machine) must do the same.
Each of these packet traversals accounts for multiple interrupts and data copies for your multi-core CPU. I hope you weren’t expecting to use all of that CPU for applications. It’s easy to see how performance suffers with such convoluted routing required for each and every process.
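To put a number on the hairpin described above, here is a minimal counting sketch. It assumes a deliberately simple model in which each trip over the physical wire between server and switch counts as one traversal, and in which the alternative is switching between the two virtual servers locally on the host. The function name and that local-switching alternative are illustrative assumptions, not measurements of any product.

```python
def wire_traversals(hairpin_through_core: bool, round_trip: bool = True) -> int:
    """Count trips over the physical server-to-switch wire for traffic
    between two virtual servers on the SAME physical machine.

    Assumed model: with a hairpin, each direction goes out to the physical
    switch and back (2 traversals); with local switching it never leaves
    the host (0 traversals).
    """
    per_direction = 2 if hairpin_through_core else 0
    directions = 2 if round_trip else 1  # request plus response, or request only
    return per_direction * directions


if __name__ == "__main__":
    print("hairpin through the core:", wire_traversals(True))
    print("switched locally at the edge:", wire_traversals(False))
```

Under this model a single request and response costs four wire traversals when hairpinned, and every one of them brings its own interrupts and data copies.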
- Manageability. The objective of the one-giant-switch model is to reduce the number of things to manage, and from the networking perspective there is that potential. Everything is now managed from one giant, complex switch that requires your CCIE to operate. However, this does not take into account the roles and responsibilities of the different centers of expertise within the data center.
With the world view that sees the entire data center as one giant switch, manageability is stripped from everyone except the networking experts. As a result, your server administrators will need to become highly advanced, certified networking engineers.
- Security. In a Cisco data center, security happens at the core of the network – making it easier to coordinate, but requiring the network traffic to be tagged with a new proprietary and unprotected frame format in order to force it into the core.
The analogy I like to use is of a tall office tower with a security force hired to protect sensitive operations taking place in a room on the top floor. Imagine you choose to post all the members of your security team together, standing outside the room on the top floor.
For this to work, potential adversaries must enter the building and be directed to the top floor to be greeted by security. It seems pretty easy for them to gain access to the rest of the building before the security checks. Now, imagine instead that you posted some of your security guys at the ground floor entrance, as well as some stationed outside the top-floor room. Seems like a more comprehensive and effective security approach, doesn’t it?
- Automation. Automation is heralded as one of the most powerful tools for reducing costs and boosting efficiency and performance in a data center. And rightly so. Properly implemented, data center automation is like going from a laborious, one-product-at-a-time manufacturing process to a fast, modern assembly line.
The key is, what do you automate? And at what level of abstraction is your automation able to operate? In a data center where all the intelligence is centralized at the network core, you are forced to rely on the finest level of granularity for your automation: the port. Automating solely at that level won’t deliver much advantage.
Think of it like this: If you are in charge of constructing a building, and you have the choice of automating the hammering of each nail or the placement of entire walls, which do you think will help you finish the building faster? Higher-level building blocks are needed to make automation effective, and this goes well beyond the abstraction of the network port.
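The nail-versus-wall point can be sketched in a few lines. The example below assumes a hypothetical `RackProfile` object standing in for a higher-level building block; the class, its fields, and the port naming scheme are invented for illustration and do not correspond to any vendor's configuration model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RackProfile:
    """Hypothetical high-level building block: one intent for a whole rack."""
    name: str
    vlan: int
    acl: str


def expand_to_ports(profile: RackProfile, port_count: int) -> List[Dict[str, object]]:
    """Derive every per-port setting from a single rack-level profile."""
    return [
        {"port": f"{profile.name}/{i}", "vlan": profile.vlan, "acl": profile.acl}
        for i in range(1, port_count + 1)
    ]


if __name__ == "__main__":
    web_rack = RackProfile(name="rack-7", vlan=120, acl="web-tier")
    ports = expand_to_ports(web_rack, 200)
    print(len(ports), "port configurations derived from 1 profile")
```

The operator states one intent and the port-level detail is derived from it, which is the difference between placing a wall and hammering 200 nails.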
- Scalability. Perhaps one of the most important hidden dangers of a centralized, network-centric, one-giant-switch model emerges when you want to scale your data center. Advocates of the one-giant-switch model don’t tell you that you need to plan and pay for all the ports you think you’ll ever need, up front, whether you end up using them or not. Data centers grow by adding compute and storage resources to the edge of the network. You had better make sure you paid for a giant switch with enough virtual ports and packet processing to allow this growth.
Say you want to bring up a new rack of servers, with 200 ports in the rack. In Cisco’s model, all 200 ports will need to be instantiated on that one giant central switch, including configurations, memory, ACLs, packet processing, etc. At the time you buy the expensive giant switch – not later, when you actually want to bring up those 200 ports – you’ll need to pay for and install the full potential capacity.
From a business perspective, this inability to scale not only adds tremendously to the cost of your data center IT, it also threatens to make it more difficult to respond flexibly to changing conditions, such as serving a new category of customers, or maneuvering against a new competitor, or expanding into new markets, or adding more branch offices. Scaling bandwidth at the edge of the network simply makes more sense.
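A rough way to see the pay-up-front problem is to compare spending curves. The sketch below assumes a flat, hypothetical cost per port and a made-up growth plan of one 200-port rack per year; none of the figures are real prices, and the function names are illustrative.

```python
from itertools import accumulate
from typing import List


def upfront_spend(planned_ports: int, cost_per_port: float) -> float:
    """Everything is paid on day one, whether the ports are used or not."""
    return planned_ports * cost_per_port


def incremental_spend(ports_added_each_year: List[int], cost_per_port: float) -> List[float]:
    """Cumulative spend when capacity is bought at the edge as racks are added."""
    return [total * cost_per_port for total in accumulate(ports_added_each_year)]


def idle_ports(planned_ports: int, ports_added_each_year: List[int]) -> List[int]:
    """Ports already paid for but not yet in use at the end of each year."""
    return [planned_ports - used for used in accumulate(ports_added_each_year)]


if __name__ == "__main__":
    growth = [200, 200, 200]   # hypothetical: one 200-port rack per year
    cost = 100.0               # hypothetical cost per port, in any currency
    print("up front:", upfront_spend(sum(growth), cost))
    print("incremental, cumulative:", incremental_spend(growth, cost))
    print("idle ports per year if bought up front:", idle_ports(sum(growth), growth))
```

Both approaches end at the same total in this toy plan, but the up-front model carries paid-for idle capacity in every year before the last, and it only works out if the original forecast was right.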
- Reliability. Here’s another major problem with the world view that sees the data center as one giant switch: It collapses your failure domains into one. If you strive for high availability (and what data center doesn’t strive for that?), a single point of failure is a disaster waiting to happen. Sure, you can buy two of the giant switches, but remember from the scalability discussion above that you will have to pay up front for two, each big enough to anticipate your growth.
Even if the giant-switch data center system can guarantee ‘six nines’ uptime, it’s still too much risk. With the physical topology restriction, someone pulling a cable by accident can bring down large chunks of the data center in one move. This line-cards-on-a-rope topology also makes it too easy for the wrong thing to get plugged in; for two ends of the link to disagree on protocol or firmware revisions; or for rogue elements to deliberately try to attack via stacking links. If the link to the remote line card is disrupted, the remote set of virtualized interfaces is useless. They don’t operate independently.
Even if the wires aren’t crossed, with the one-giant-switch model a network operator at the CLI can wipe out racks of servers with a simple misconfiguration of one of those 1,400 cryptic commands. Many failures are the result of misconfigurations, and now the heart of the data center is controlled from one of the most complex devices in the system. A proper division of control between domain experts within the data center will lead to higher reliability.
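Two bits of arithmetic sit behind the reliability argument: how much downtime a given number of nines allows, and how many servers a single failure takes with it. The sketch below assumes a 365-day year and servers spread evenly across independent failure domains; the function names and the 1,000-server figure are illustrative assumptions.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year


def downtime_seconds_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability of N nines (6 -> 99.9999%)."""
    return SECONDS_PER_YEAR * 10 ** -nines


def servers_hit_by_one_failure(total_servers: int, failure_domains: int) -> int:
    """Worst-case servers affected by one failure, assuming servers are
    spread evenly across independent failure domains."""
    return math.ceil(total_servers / failure_domains)


if __name__ == "__main__":
    print("six nines, seconds of downtime per year:", downtime_seconds_per_year(6))
    print("one failure domain:", servers_hit_by_one_failure(1000, 1), "servers affected")
    print("ten failure domains:", servers_hit_by_one_failure(1000, 10), "servers affected")
```

Six nines allows only about half a minute of downtime a year, but with a single failure domain that half minute, or one bad command, hits every server at once.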
- Cost. This is where the disadvantages of Cisco’s one-giant-switch data center world view become painfully clear: the cost of original purchase, the cost of scaling, the cost of ensuring high availability, the cost of security (and the risks of not securing your data center at the edge), the cost of management, and other costs we haven’t even touched on, including support costs, maintenance costs, and licensing costs.
In every category I’ve examined, it’s simply more expensive to buy, provision, manage, and maintain a data center designed as one giant switch, where everything looks like a switch port.
The alternative
What, then, should a data center look like?
I think an ideal data center model would have a simple, high-performance, highly available core, as well as intelligence and automation at the network edge. It would acknowledge the various roles and skills needed to manage a real-world data center, leaving visibility and control in the hands of the experts in each area. It would be truly manageable, it would scale easily, it would automate at a level that delivers significant benefits to the various domain experts, it would offer multi-layered comprehensive security, and it would deliver great performance while keeping costs down.
If you think this approach sounds vaguely familiar, it should. At HP ProCurve, we’ve been talking about the fundamentals of our Adaptive Edge Architecture (AEA) for many years now. With the AEA – and the Adaptive Networks vision based on the AEA – policy control is managed from the center of the network, while policy enforcement occurs at the edge, at the point where users and devices attach.
Extended to the data center environment, a distributed-intelligence approach such as the AEA offers an alternative to the one-giant-switch view of the data center. In coming months, I will outline in greater detail HP’s and HP ProCurve’s vision for the next-generation data center.
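The split between central policy control and edge enforcement can be illustrated with a toy model. To be clear, this is not HP ProCurve code or the AEA implementation; `PolicyCenter`, `EdgeSwitch`, and the role-to-ports policy shape are hypothetical names chosen only to show the division of labor.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class PolicyCenter:
    """Hypothetical central point where policy is defined, not where packets are checked."""
    allowed_ports: Dict[str, Set[int]] = field(default_factory=dict)

    def set_policy(self, role: str, ports: Set[int]) -> None:
        self.allowed_ports[role] = set(ports)

    def policy_for(self, role: str) -> Set[int]:
        # Hand out a copy so each edge device holds its own local state.
        return set(self.allowed_ports.get(role, set()))


@dataclass
class EdgeSwitch:
    """Hypothetical edge device that enforces policy where users and devices attach."""
    name: str
    local_policy: Dict[str, Set[int]] = field(default_factory=dict)

    def sync(self, center: PolicyCenter, role: str) -> None:
        self.local_policy[role] = center.policy_for(role)

    def permits(self, role: str, dest_port: int) -> bool:
        # The decision is made locally; traffic is not sent to the core to be judged.
        return dest_port in self.local_policy.get(role, set())


if __name__ == "__main__":
    center = PolicyCenter()
    center.set_policy("web", {80, 443})
    edge = EdgeSwitch("rack-1-edge")
    edge.sync(center, "web")
    print("web -> 443 permitted:", edge.permits("web", 443))
    print("web -> 22 permitted:", edge.permits("web", 22))
```

Policy is authored once in the middle, while each enforcement decision happens at the edge against a local copy, so the core never has to see the traffic to judge it.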
In the meantime, if you’d like to learn more about the data center solutions we have in place today, please visit the following:
Paul Congdon is Chief Technology Officer of HP ProCurve, as well as an HP Fellow.