Categories: Infrastructure
Overview
I’ve recently seen a number of articles on the internet regarding products related to hyperconvergence - but after reading them, I still had no idea what hyperconvergence was.
This article from The Register is a good starting point, but I still failed to grasp the concept completely. The following article is the result of a little further research.
Note: Hyperconvergence is also called Hyperconverged Infrastructure (aka HCI) or Software Defined Data Centers (aka SDDC).
AIUI, hyperconvergence is basically about bringing core GCP/AWS/Azure concepts into the on-premise datacenter. A hyperconverged platform provides hardware and software to support APIs for doing things like the following (a hypothetical sketch of such an API appears after these lists):
- setting up a virtual network
- setting up load-balancers
- allocating a block storage device
- allocating a VM
The platform also automates tasks such as:
- doing backups
- deduplicating data
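To make that concrete, here is a purely hypothetical sketch of what driving such a platform from a script might look like. The class, method and parameter names are invented for illustration only - they do not correspond to any vendor's actual SDK, and only cover a subset of the operations listed above:

```python
# Hypothetical HCI management client. The class, method and parameter names
# are invented for illustration - they are not any vendor's real SDK.
class HciClient:
    def create_network(self, name, cidr):
        """Provision a virtual network; VLAN/VXLAN details are handled by the platform."""
        return {"type": "network", "name": name, "cidr": cidr}

    def create_volume(self, size_gb, replicas=2):
        """Allocate a block device from the cluster-wide storage pool."""
        return {"type": "volume", "size_gb": size_gb, "replicas": replicas}

    def create_vm(self, name, cpus, ram_gb, volume, network):
        """Start a VM on whichever node currently has free capacity."""
        return {"type": "vm", "name": name, "cpus": cpus, "ram_gb": ram_gb,
                "volume": volume, "network": network}

# "Infrastructure as an API", as seen from a user's script:
hci = HciClient()
net = hci.create_network("branch-office", cidr="10.20.0.0/24")
disk = hci.create_volume(size_gb=100, replicas=2)
vm = hci.create_vm("erp-app", cpus=4, ram_gb=16, volume=disk, network=net)
print(vm)
```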
Often, these are commercial products sold as “preconfigured racks” containing hardware and software to achieve the above goals.
An HCI unit can also be seen as “a SAN which can run VMs too”.
Products
Companies/products in this area include:
- Nutanix
- SimpliVity
- Dell/VMware vSAN
- Dell EMC VxRail/VxRack
- Cisco HyperFlex (Springpath)
- Maxta
- DataCore
- Atlantis
- Pivot3
- Scale Computing
The Problem
Traditionally, a data-center is built from three separate components:
- a bunch of routers
- a set of racks of servers providing CPU resources
- and a separate set of racks providing storage (storage-area-network aka SAN)
However, buying and configuring such systems is complex, as is managing them long-term. This is particularly difficult for smaller companies, and for “edge computing” systems where computing resources are deployed at individual company branch offices.
There are companies that offer “prebuilt racks” that combine all of the above; you just buy a rack (various sizes available) and connect it to power. However, that still leaves the problem of how to deploy software onto this system; running a single OS on each physical server and installing applications directly onto that OS is the traditional approach - but the issues with doing that are well known. A more modern approach is to run a hypervisor on the hardware instead, and then deploy software into multiple VMs, where each application is nicely isolated from the others - no version conflicts, etc.
However, creating/managing virtual machines, their associated virtual disks, and their virtual networks is complex. Ideally there would be a nice user-friendly management interface for all this - particularly for smaller companies and edge-computing scenarios. This is the sort of thing that public clouds provide - and which converged/hyperconverged solutions aim to bring to on-premise deployments.
The growing popularity of running virtual machines on the CPU servers also causes problems for SANs. Traditionally, each physical server is allocated a virtual disk on the SAN, and the SAN can optimise sequential access patterns to that disk - i.e. an application on the server which takes care to perform IO-friendly sequential access is recognised as such by the SAN. However, when that physical server runs multiple VMs, they all share the single virtual disk. With multiple applications in different VMs performing IO at the same time, the SAN sees an interleaved stream that is effectively random IO, and can no longer optimise streaming access.
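A toy sketch of this “IO blender” effect - each VM writes sequentially within its own region of the shared virtual disk, but the interleaved stream the SAN receives jumps all over the address space (block numbers are made up for illustration):

```python
# Toy illustration of the "IO blender" effect. Three VMs each write
# sequentially within their own region of a shared virtual disk, but the
# hypervisor forwards their requests interleaved, so the SAN sees large
# jumps between consecutive block addresses - i.e. effectively random IO.

def vm_stream(start_block, count):
    """One VM issuing purely sequential writes starting at start_block."""
    return [start_block + i for i in range(count)]

vms = [vm_stream(start, 5) for start in (0, 10_000, 20_000)]

# Round-robin interleaving approximates what the SAN actually receives.
san_view = [block for requests in zip(*vms) for block in requests]
print(san_view)
# [0, 10000, 20000, 1, 10001, 20001, ...] - no sequential pattern left to optimise
```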
Similarly, when using a SAN for storage, the logical block store associated with a specific physical host (and mounted on that host) can have various performance, replication and backup policies attached to it. However, when that physical host runs multiple VMs this approach breaks down - different VMs have different requirements.
The Hyperconverged Promise
Sellers of “hyperconverged” solutions claim to solve the above problems. They generally sell racks of computers, pre-populated with routers, compute resources and storage resources. They also preinstall a hypervisor on each node, and provide management software for the whole physical system.
Their management software allows users to allocate virtual servers, and deploy user-specific software on them. Their hypervisor-based infrastructure can then allocate and manage the VMs, networks, etc. - like a public cloud does.
In addition, because the hypervisors are aware of the operations occurring in each VM, they can integrate with storage to solve the storage-related problems mentioned above (an illustrative sketch follows this list):
- streaming-vs-random-io per VM
- and replication/backup policies per VM
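For example, per-VM storage policies might be declared roughly like this - the field names and values are invented for illustration and are not any particular product's configuration format:

```python
# Illustrative per-VM storage policies. With hypervisor/storage integration,
# the policy follows the individual VM rather than the physical host's
# single SAN volume. Field names and values are invented for illustration.
vm_storage_policies = {
    "erp-db":      {"tier": "ssd", "replicas": 3, "backup": "hourly"},
    "file-share":  {"tier": "hdd", "replicas": 2, "backup": "daily"},
    "build-agent": {"tier": "ssd", "replicas": 1, "backup": "none"},
}

DEFAULT_POLICY = {"tier": "hdd", "replicas": 2, "backup": "daily"}

def policy_for(vm_name):
    """Return the storage policy the platform would apply to a VM's virtual disks."""
    return vm_storage_policies.get(vm_name, DEFAULT_POLICY)

print(policy_for("erp-db"))      # per-VM override
print(policy_for("unknown-vm"))  # falls back to the cluster default
```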
The idea is that HCI systems are not only easier and more flexible for users, but also require fewer sysadmins, as changes requested by users are applied by the HCI management software rather than being change-requests that staff apply manually (potentially requiring staff who are experts in each technical area).
The term “edge computing” means that individual company branches have their own IT environment, in addition to a central company-wide data-center or public-cloud environment. Companies selling HCI products promote their “preconfigured racks” as good solutions for deploying into “edge environments” where there may be complex software but no dedicated operations staff.
Further benefits
A suitable datacenter software infrastructure makes it easier to deal with heterogeneous hardware; as long as each manufacturer is supported by the infrastructure, the devices can be managed via a single tool. Direct management of such devices requires admin staff trained on the interfaces provided by each manufacturer.
An SDDC also provides failover at the software level - VMs are restarted on a different host, storage fails over to a replica, routing is reconfigured on the fly. The amount of hardware needed is N+F, where N is the number of nodes needed for service and F is the number of failed nodes that can be tolerated. A manually-defined high availability solution often uses N*2 instead, as “on the fly” reconfiguration is too hard.
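To put rough numbers on that (the figures below are illustrative only):

```python
# Rough node-count comparison: pooled failover (N + F) vs. per-node mirroring (2 * N).
# The numbers are illustrative, not taken from any real deployment.
def pooled_nodes(n_service, f_failures_tolerated):
    """Nodes needed when any spare node can take over any failed node's work."""
    return n_service + f_failures_tolerated

def mirrored_nodes(n_service):
    """Nodes needed when each active node has its own dedicated standby."""
    return 2 * n_service

print(pooled_nodes(10, 2))   # 12 nodes: 10 for service, tolerating 2 failures
print(mirrored_nodes(10))    # 20 nodes for the same service capacity
```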
An SDDC also makes scaling easier - all resources form a single pool, rather than hardware having to be dedicated to a specific task. In systems that are large enough, overprovisioning is also easier to manage - resources can be “time shared” between users.
Some solutions don’t have a distinct SAN, ie don’t have physical servers dedicated to storage. Instead, each node provides both CPU and storage; the hypervisor provides a “virtual SAN” which is distributed across all nodes. This gives the opportunity to provide storage for VMs on a specific physical server from disks attached to that same server - thus avoiding network traffic and greatly improving IO bandwidth. Interestingly, this is similar to the approach that “big data” solutions such as Hadoop/Spark are based on - though they implement this in user-space, not at the hypervisor level.
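A sketch of the kind of placement preference involved - purely illustrative, not any vendor's actual algorithm:

```python
# Illustrative data-locality rule for a distributed "virtual SAN": put the
# primary replica of a VM's virtual disk on the VM's own host when it has
# space, so most IO avoids the network. Not any vendor's actual algorithm.
def place_primary_replica(vm_host, free_gb_per_node, size_gb):
    """Return the node that should hold the primary replica of a new virtual disk."""
    if free_gb_per_node.get(vm_host, 0) >= size_gb:
        return vm_host  # local disk: reads/writes need no network hop
    # fall back to whichever node currently has the most free space
    return max(free_gb_per_node, key=free_gb_per_node.get)

nodes = {"node-a": 500, "node-b": 2000, "node-c": 50}
print(place_primary_replica("node-a", nodes, size_gb=100))  # node-a (local)
print(place_primary_replica("node-c", nodes, size_gb=100))  # node-b (fallback)
```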
Administration policies are clearer and more centralized - defined in one place rather than defined in each device.
In a standard datacenter:
- customer buys racks, servers, routers, etc. from multiple suppliers and integrates them
- one physical server per app
With basic integrated systems:
- customer buys racks prepopulated with servers and routers (an “appliance”)
- hypervisors usually preinstalled on servers
- customer buys SAN separately
With a converged datacenter:
- prepopulated racks also include storage (external SAN not needed)
- rack options are limited: only specific cpu:storage:network ratios are available
- one VM per app, manually allocated resources, different hardware suppliers with own admin interfaces
- deduplication, backups, etc. not part of delivered solution
With a hyperconverged datacenter:
- one software interface to manage entire cluster
- system looks like one integrated pool rather than individual racks
- each rack added to the datacenter extends the “global pool of resources”
- transparent mix of SSD and rotating storage
- storage taken preferentially from the same rack as the VM using it
Notes and Open Questions
The general concept sounds good to me - having got used to public cloud environments, having to manage software deployments in on-premise datacenters now feels clumsy. Of course, the world is moving to even more abstract container-based layers such as Kubernetes now, rather than managing software at VM level - but there is still lots of software in the corporate world that expects to be installed “on a server” (whether physical or VM).
But OpenStack and Apache CloudStack have been doing this for years. Are they “hyperconverged”?
VMware has also been doing something similar for a long time; is VMware an SDDC provider too?
Hardware Components
The following products are not complete hyperconverged solutions, but offer some basic building blocks.
Amazon Web Services uses their Nitro architecture to offer VMs to end users. This combination of hardware and software has some aspects of “hyperconvergence” - though it is effectively an internal implementation detail of AWS.
Pensando is a startup with links to Cisco and Hewlett-Packard which sells hardware cards and matching software that make it possible to build “local clouds”. Customers are likely to be companies offering cloud-like environments to external or purely internal customers. The cards plug into standard servers, and “offload” a lot of cloud-related concerns such as software-defined networking (with related security rules), encryption, load-balancing, and monitoring. The hosts on which the cards are installed see them as standard network cards at the hardware level; presumably there is a configuration protocol that the cards accept and handle directly, without involving the host.
References
- The Register: Hyperconverged Infrastructure
- Hyperconverged Infrastructure for Dummies / HPE SimpliVity Special Edition - to quote from the book itself: “this book is written primarily for IT executives and managers such as Chief Information Officers (CIOs), Chief Technology Officers (CTOs), IT directors, and technical managers.”