Hyper-converged seems to be all the rage at the moment. VMware's announcement of the EVO:RAIL has naturally got lots of tongues wagging. They are jumping into a market already well populated.
I was looking at the pricing/features and thought it all looks a little expensive. Nutanix, for example, will sell you the following:
- NX-1000: $80k+ (4 little nodes)
- NX-3000: $144k+ (4 big nodes)
- NX-6000: $120k+ (2 massive nodes)
I'd imagine that, once you include licensing, EVO:RAIL systems will come in at similar prices. What do you get for your cash:
- A COTS server (dual E5-26x0 Xeon, 64GB+ RAM)
- Networking (at least 2x 1Gb, but most are 10Gb)
- A few TB Storage
- They tend to rely on the customer to supply the network infrastructure, but I see no real problem with that. If I had all HP ProCurves in my DC, I would be pretty annoyed if I suddenly had a pair of Ciscos to worry about
- The whole system is built on top of architecturally identical building blocks
That last one does not necessarily mean that all the boxes are the same. It means there is no concept of a "storage node" or a "compute node". You just have "nodes" that all contain compute, storage, networking, whatever. You may have some that are bigger than others, but they all do the same job.
On top of that, you get some software special sauce to tie it all together. For EVO:RAIL, that looks something like this:
I decided to play a little game, a bit like when you want to buy a new computer: you go online, put together all the parts you want in a basket, look at it, and dream a little. After a few rounds of this you start justifying it to yourself, then to your wife/accountant. Eventually, you build one final basket, pull out the credit card and pull the trigger. Well, this is like that, but more expensive.
I think that we now have everything we need in the FLOSS world to implement a hyper-converged architecture. I suppose this is my attempt to document that as some sort of reference architecture. A key component will be OpenStack, but not necessarily everywhere.
The easy bit is the CPU and RAM: plenty of both (at least 6 cores with 64GB of RAM). Networking, surprisingly, is also relatively simple. Anything will do (2x 1Gb will be fine for an entry-level node), but 2x 10Gb is preferable; InfiniBand would also be great (Linux works beautifully with InfiniBand). All this would (ideally) be put into a chassis that houses multiple nodes in one box.
Storage is more complex. Tiering is essential, and I personally am not a fan of hardware RAID. Additionally, this needs to be replicated. The overall architecture would look something like:
Each node would need at least one SSD, plus a decent number of spinny disks to get a sensible capacity. The spinners have no need for RAID - data protection is done at the cluster level, higher up the stack. Right now, I would probably run with Gluster, for reasons I will revisit later. Ceph is probably where it will be at in the future, though.
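As a sketch of what the replicated storage layer might look like (hostnames and brick paths here are made up), a two-way replicated Gluster volume across four nodes could be built like this:

```shell
# Assumes glusterd is running on all four nodes, with a brick directory
# (e.g. an XFS filesystem on the spinning disks) at /bricks/vmstore on each.
gluster peer probe node2
gluster peer probe node3
gluster peer probe node4

# replica 2: every file lives on two different nodes, so losing a whole
# node loses no data -- the cluster-level protection that replaces RAID.
gluster volume create vmstore replica 2 \
    node1:/bricks/vmstore node2:/bricks/vmstore \
    node3:/bricks/vmstore node4:/bricks/vmstore
gluster volume start vmstore
```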
A good rule of thumb when using tiering is that your SSD capacity should be roughly 10% of your spinning rust. So let's say a single 400-500GB SSD alongside 5TB of spinning disk.
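The tiering itself can be done in the Linux block layer, for example with lvmcache (dm-cache). A sketch, with made-up device names (spinners at /dev/sdb-/dev/sde, the SSD at /dev/sdf):

```shell
# Pool the spinners and the SSD into one volume group
pvcreate /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
vgcreate vg_data /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf

# One big logical volume on the spinning disks only
lvcreate -n slow -l 100%PVS vg_data /dev/sdb /dev/sdc /dev/sdd /dev/sde

# The ~10% rule: 400G of SSD cache in front of ~4-5TB of spinning disk
lvcreate --type cache-pool -n cpool -L 400G vg_data /dev/sdf
lvconvert --type cache --cachepool vg_data/cpool vg_data/slow
```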
For the clustering, you would need at least 2x 1Gb NICs dedicated to this. If one could dedicate a pair of 10Gb NICs then that would be awesome.
Finally, the OS needs to be installed on something, but a USB key or SD card is more than sufficient for that.
The complex bit is the initial configuration. What really makes the likes of Nutanix and EVO:RAIL stand out is the simplicity of the install. The images above are the 4 steps it takes to get an EVO:RAIL cluster running, and I have to admit it's pretty darn good.
What we need to aim at is that you:
- download an image
- burn on to USB keys/SD cards
- boot all the nodes
- it works!
I think all the parts needed to auto-configure a cluster exist in the FOSS world. The problem is making it 100% plug and play. I would say that this is finally a genuine use for IPv6 and mDNS: let's dedicate a pair of 1Gb/s NICs to cluster communications and do all of that over IPv6.
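To make that concrete: every IPv6 interface auto-assigns itself a link-local address with zero configuration, and mDNS/DNS-SD gives us discovery on top of that. Something like the following (the interface name and the `_cluster._tcp` service type are assumptions for illustration):

```shell
# Every IPv6 host gets a link-local (fe80::/10) address automatically --
# no DHCP, no config. Ping the all-nodes multicast group on the cluster
# interface to find the neighbours:
ping6 -c 2 ff02::1%eth0

# Anything that answered is now in the neighbour table:
ip -6 neigh show dev eth0

# With an mDNS daemon (e.g. Avahi) on each node, nodes could advertise a
# hypothetical _cluster._tcp service and discover each other by name:
avahi-browse --terminate --resolve _cluster._tcp
```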
This enables us to get a fully working network going with no intervention from the user. Now we can have a Config Management system running over said network. If we use Puppet, then our secret sauce can act as an ENC (External Node Classifier) to configure all the nodes.
What this means is that the amount of new code that needs to be written is relatively small. All we need is that initial configuration utility. To make it super simple, this could even live in "the cloud", with each cluster registering itself with a UUID. I suppose this would work in much the same way as etcd. This interface could also be where the user downloads their image files, so the UUID could be baked into said image. Make all of that FLOSS, and people can host their own management portal if they prefer. It would function in much the same way as RHN/Satellite.
Anyway, once that is all done, Puppet can then go and do all the necessary configuration. There is quite a bit of integration that needs to happen here.
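As a sketch of the ENC side: Puppet, configured with `node_terminus = exec`, runs a script with the node's certname as its argument and expects YAML on stdout listing the classes to apply. The role names and certname-based matching below are invented for illustration; in reality the secret sauce would look the node up in the cluster registry by UUID.

```shell
# Minimal ENC sketch. Puppet passes the certname as $1; class and role
# names here are placeholders.
enc() {
    case "$1" in
        *ovirt*) role="roles::ovirt_host"   ;;
        *nova*)  role="roles::nova_compute" ;;
        *)       role="roles::storage_only" ;;
    esac
    printf -- '---\nclasses:\n  - gluster::server\n  - %s\nenvironment: production\n' "$role"
}

enc "node03-nova.cluster.local"
```

Every node gets the storage class, so each one contributes bricks to the Gluster pool regardless of its compute role - that is the "no dedicated storage node" property from earlier.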
There are 2 obvious choices for the virtualisation layer:
- oVirt
- OpenStack (Nova)
These are not necessarily mutually exclusive. The oVirt team are building in support for various OpenStack technologies:
- Neutron for networking
- Cinder for block storage
- Glance for template storage
Different hosts could be tagged as either oVirt or Nova nodes depending on the type of app they are running. They all then share the same pool of storage.
The configuration would then be dealt with using Puppet roles.
When you need to add a new appliance, you just download an image with the correct UUID and it will add itself to the cluster.