Long Distance vMotion is Not New!
A few weeks back I was listening to a discussion about the concept of vMotion over long distances as if it were something new. I instantly disagreed, because I had been describing the concept to students three years earlier when I was a VMware instructor. I knew it was possible to do by:
A. Having the right infrastructure in place – to put it simply, a fast bridged network between the sites
B. Using storage virtualisation solutions like DataCore.
DataCore has long had the ability to synchronously mirror an active/active virtualised volume, which means two sets of ESX servers can see the same volume in a read/write state as if it were local to each of them. This feature was nothing new for DataCore, so in fact long distance vMotion has been achievable since the day vMotion went GA.
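To make the idea concrete, here is a minimal sketch (a toy model only, with made-up names, not DataCore's actual implementation) of an active/active synchronously mirrored volume: a write arriving at either site is copied to the peer node before it is acknowledged, so ESX hosts at both sites always read the same volume contents.

```python
# Conceptual sketch only: a toy model of an active/active synchronously
# mirrored volume. Names and structure are illustrative, not DataCore's
# actual implementation.

class StorageNode:
    """One storage virtualisation node (e.g. one site's DataCore server)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}   # block address -> data (stands in for cache + disk)
        self.peer = None   # the node at the other site

    def write(self, address, data):
        """Handle a write from a local ESX host.

        The block is applied locally AND pushed to the peer before the
        acknowledgement is returned, so both sites always present the
        same volume contents (synchronous mirroring).
        """
        self.blocks[address] = data
        self.peer.blocks[address] = data   # synchronous copy to the remote site
        return "ack"                       # only now does the host see the write complete

    def read(self, address):
        """Either site can serve reads from its local copy."""
        return self.blocks.get(address)


# Two sites, one logical volume presented read/write at both.
site_a, site_b = StorageNode("Site A"), StorageNode("Site B")
site_a.peer, site_b.peer = site_b, site_a

site_a.write(0x10, b"vm-disk-block")          # written by an ESX host at site A
assert site_b.read(0x10) == b"vm-disk-block"  # immediately visible at site B
```

With the volume presented read/write at both sites, vMotion between the two sets of ESX hosts behaves just like vMotion over locally shared storage.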
There were some caveats, though: it was only feasible where latency and bandwidth were not an issue, and at the time it technically wasn't supported by VMware. I tested the theory in the lab back then but didn't record my research, so as an alternative I've asked a friend at DataCore to describe a real-world case study that proves the point. This concept wasn't manufactured by DataCore; it's a side effect of combining the two technologies. What follows is a real-world implementation of what DataCore refer to as a stretched cluster:
Mike Beevor (DataCore) says:
Let's start by looking at a real-life application of SANSymphony. IoMart, a highly reputable hosting company offering 100% uptime, approached DataCore with the vision of being able to provide high availability in a potentially heterogeneous storage environment to its hosted customers, and DataCore were more than happy to oblige. The solution was delivered using industry-standard software running on readily available server and storage hardware. What followed was a solution that provided not only the HA they were looking for, but managed to deliver it on scales more akin to DR!

IoMart has five datacentres located around the UK, but the two we are particularly interested in are in The City and in Maidenhead, a distance of approximately 20 miles, which, I think you'll agree, is more than enough to satisfy most companies' DR strategies. The environment was built on standard x86 hardware, highly specified as befits a hosting operation: 128GB of RAM for caching, four quad-core processors and 8Gb HBAs for connectivity. The disk behind the servers would also be considered low-end commodity disk by the major manufacturer that was chosen. Naturally we can't divulge the full details of the environment (we wouldn't want to give away IoMart's competitive advantage), but we can say that the cost was less than a third of the equivalent software and hardware from a well-known manufacturer. DataCore SANSymphony was used to virtualise and manage the environment; the datacentres have a fibre link between them and a DataCore SANSymphony server in each location.

What we have achieved with this configuration is synchronous replication between the two sites, over a distance of 21 miles. This extends not only through the storage layer but also to the application server layer, in this instance ESX. Now, site-to-site replication is nothing new, but where this gets very interesting is that failover is seamless and automated at the storage layer, and the failback is automated and seamless too. And because we are grid-based storage rather than a cluster, there is no danger of a quorum issue, making this an extremely efficient and effective solution. It is also worth noting that performance within a grid grows linearly, as each node dedicates its full computing power to the system rather than having to aggregate performance across a cluster while also giving up some power to the cluster's arbiter.
Essentially, we have created a Stretch Virtual Storage Grid, or SVSG as you will hear it referred to in future. The benefit of this type of infrastructure is that you can distribute the environment across several locations and ensure that, unless a major city is taken out (possibly by Godzilla), you have a fully distributed HA model across DR geographies.
DataCore's synchronous replication operates on a forced cache coherency model and is based on a grid architecture: the I/O block is replicated between the cache on each DataCore server before the acknowledgement is sent to the application server and the data is committed to disk. Doing it this way obviates the problems associated with clustered storage and allows a greater degree of performance and flexibility.
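To put rough numbers on what that acknowledgement model costs over the 21-mile link in the IoMart case, here's a back-of-the-envelope calculation; the fibre propagation speed is an assumed textbook figure and switch/protocol overhead is ignored.

```python
# Back-of-the-envelope illustration (assumed figures, not measured values):
# because a write is only acknowledged after the block has been copied to the
# remote cache, every write pays roughly one round trip over the inter-site link.

distance_km = 21 * 1.609          # ~21 miles between the two datacentres
fibre_speed_km_per_ms = 200.0     # light in fibre travels roughly 200 km per ms

one_way_ms = distance_km / fibre_speed_km_per_ms
round_trip_ms = 2 * one_way_ms

print(f"Added write latency from distance alone: ~{round_trip_ms:.2f} ms")
# ~0.34 ms -- small compared with typical spinning-disk service times,
# which is why synchronous mirroring is practical at this distance but
# becomes a problem as the distance (and therefore the round trip) grows.
```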
I didn't want this to come across as a dig at Cisco, who are currently branding this as a new infrastructure application; far from it. But if someone tells me this is a new concept then I have to disagree. What I will say is that Cisco also look to be doing a good job of providing a complete solution: tools to achieve things like extended VLANs, a good I/O virtualisation platform, and standard hardware and protocols to drive it. Importantly, Cisco has a good hook into VMware in more than one way 😉. So you can see they have all the right tools to architect this concept. As for Cisco playing a big part in the cloud, that was always going to be a given, and you can see how these kinds of solutions are going to help.