Pushing the limits of virtual networking

Recent developments in virtual networking open a world of possibilities but don't quite deliver -- yet

Recently, I was faced with an interesting virtual networking problem: how to allow a large virtualization environment to be failed over to a recovery data center and thoroughly tested, without impacting the production network. As is often the case, my quest for an elegant solution took me down the rabbit hole and led to consideration of several new virtualization networking technologies. In the end, I didn't achieve exactly what I set out to, but I did catch a glimpse of what the future might hold.

The challenge

In this case, a large VMware vSphere-based virtualization infrastructure had been configured to fail over to a duplicate secondary data center elsewhere on the same campus, with the entire process automated by VMware's Site Recovery Manager (SRM).

Since both data centers had visibility to the same mutually redundant core routing switches and their associated layer two (L2) networks, an actual failover would be relatively seamless from a networking perspective, requiring none of the dynamic routing changes or, worse, the re-addressing that a typical geographic site failover might entail. If the production data center became unavailable for whatever reason, SRM could simply bring the virtual machines back up at the secondary data center in virtual networks configured identically to those in the primary data center, and life would continue without any real changes to the network.

However, providing the means to thoroughly test this data center failover capability was a different matter. For a variety of reasons, the virtual machines running in the environment were divided into a fairly large number of L2 network segments, each consisting of a separate VLAN on the network and represented by a separate virtual port group on the vSphere hosts (in this case switched by Cisco's Nexus 1000V virtual switch).
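
As a concrete illustration of that VLAN-to-port-group mapping, here's a rough sketch in Python using the pyVmomi library that walks the hosts and lists each standard port group alongside the VLAN ID it carries. The connection details are placeholders, and the sketch only sees standard vSwitch port groups rather than the Nexus 1000V port profiles actually in use here, so treat it purely as an illustration of the relationship.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    # Placeholder vCenter connection details -- adjust for your environment.
    si = SmartConnect(host="vcenter.example.com",
                      user="administrator@vsphere.local",
                      pwd="secret",
                      sslContext=ssl._create_unverified_context())
    content = si.RetrieveContent()

    # Walk every host and print each standard port group and its VLAN ID.
    hosts = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    for host in hosts.view:
        for pg in host.config.network.portgroup:
            print(host.name, pg.spec.name, "VLAN", pg.spec.vlanId)
    hosts.Destroy()
    Disconnect(si)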

Fortunately, SRM treats a real site recovery and a test recovery differently, allowing the administrator to perform a test without interrupting ongoing SAN replication (by creating volume clones rather than promoting replica volumes) and to specify separate virtual networks to be used during the test. With this feature, I could create a "shadow" set of virtual networks in which the virtual machines could be started during a test, while the "real" virtual machines continued to operate and serve clients in the original networks.
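
To make clear what those separate test networks amount to: a test network mapping simply tells SRM to connect each recovered virtual machine's NICs to a designated test port group instead of its production one. The fragment below (pyVmomi again, with hypothetical port group names and standard vSwitch backings) sketches that remapping done by hand, purely to illustrate the effect; during an actual test recovery, SRM performs this step itself based on the mappings you configure.

    from pyVmomi import vim

    # Hypothetical production-to-shadow port group mapping.
    SHADOW_NETS = {"VM-Net-101": "VM-Net-101-TEST",
                   "VM-Net-102": "VM-Net-102-TEST"}

    def remap_to_shadow(vm):
        """Point each NIC of a recovered VM at its shadow port group."""
        changes = []
        for dev in vm.config.hardware.device:
            if not isinstance(dev, vim.vm.device.VirtualEthernetCard):
                continue
            shadow = SHADOW_NETS.get(getattr(dev.backing, "deviceName", None))
            if not shadow:
                continue
            spec = vim.vm.device.VirtualDeviceSpec()
            spec.operation = vim.vm.device.VirtualDeviceSpec.Operation.edit
            spec.device = dev
            spec.device.backing = (
                vim.vm.device.VirtualEthernetCard.NetworkBackingInfo())
            spec.device.backing.deviceName = shadow
            changes.append(spec)
        if changes:
            vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=changes))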

Unfortunately, providing that duplicated set of virtual networks in a way that allowed all of the recovered virtual machines to speak to each other, without requiring substantial changes to the network, turned out to be a tall order with no perfect solution.

Solution No. 1: VLANs and VRF

The first thing I considered was building a shadow set of VLANs to duplicate the existing production VLANs. Configuring virtual port groups based on these VLANs would allow virtual machines running on different members of the secondary virtualization cluster to talk to each other regardless of whether they booted up on the same host. The downside is that a separate set of VLANs, along with their associated configuration within the virtualization environment, would need to be maintained -- an extra administrative headache, but not an enormous one.
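
Below is a rough sketch of what building that shadow set might look like, again in Python with pyVmomi. The naming scheme is invented for illustration (shadow VLAN equals production VLAN plus 1,000; shadow port group name gains a "-TEST" suffix), and it creates standard vSwitch port groups rather than the Nexus 1000V port profiles this environment actually uses, which are defined on the 1000V's Virtual Supervisor Module. The shadow VLANs themselves would also still need to be created and trunked on the physical switches.

    from pyVmomi import vim

    SHADOW_OFFSET = 1000  # hypothetical: shadow VLAN = production VLAN + 1000

    def create_shadow_portgroups(host):
        """Create a '-TEST' port group for each existing port group on a host.

        'host' is a vim.HostSystem, for example obtained from a container
        view as in the earlier listing sketch.
        """
        net_sys = host.configManager.networkSystem
        for pg in host.config.network.portgroup:
            spec = vim.host.PortGroup.Specification()
            spec.name = pg.spec.name + "-TEST"
            spec.vlanId = pg.spec.vlanId + SHADOW_OFFSET
            spec.vswitchName = pg.spec.vswitchName
            spec.policy = vim.host.NetworkPolicy()
            net_sys.AddPortGroup(portgrp=spec)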
