Confidential VMs
This page describes the Humanode take on Confidential VMs.
What are Confidential VMs?
Confidential VMs (Confidential Virtual Machines, or CVMs) are a family of hardware technologies that allow virtual machines to be executed in a special mode where the RAM and CPU register state owned by the VM are protected at the hardware level and cannot be eavesdropped on by the host OS or the hypervisor. This means that a malicious admin, or a hacker who has gained access to the host system that runs the VMs, is unable to compromise the data in the VM memory. Even the infamous cold-boot attack is blocked, because the contents of the RAM modules are encrypted.
Why do we need Confidential VMs at Humanode?
We handle biometric data at Humanode, and we need to ensure the uniqueness of people using the system, while also preserving absolute privacy according to our trust model.
Confidential VMs allow us to run a system that has no external access besides the operational API (i.e. no SSH, no maintenance shell, no software updates, etc.) and that does not contain any operational code that can leak the biometric data - which allows us to implement a system that is safe for our users to send their biometric data into.
You can read more about our trust model in this article.
What is special about the new Humanode CVMs?
We have been using CVMs for quite a while now, and they have been securing the biometric data of our users all this time. However, we were not satisfied with our solution: while we were able to verify (to a degree) the confidentiality of the data, our users were not able to do so. It is possible to do better: to build a fully transparent trust chain from the source code to the networked service available via the internet.
How is this possible? Simple, actually: open source code, reproducible builds, direct access to the CVM-capable hypervisor, autonomous operation, and remote attestation. This is, in a nutshell, what Humanode CVMs do compared to regular CVMs.
Open Source
The first step in gaining the trust of the users is opening the source code. It is a well-known formula, and we are glad to open-source everything we can.
That said, we, unfortunately, can't open-source everything. We are in a tough spot here, since the world-leading biometric providers we partner with ship proprietary code to us. We would be happy to use an open solution, but there is no open-source competition in this area yet. So, while an open-source solution is being built, we have to work with what we have - and we think we can make it work, and it is not that bad either.
So, the idea behind our biometric server is simple: take the proprietary component and sandbox it heavily to prevent it from (accidentally or maliciously) leaking private biometric data outside of the CVM. Then build an open-source gateway to this sandboxed proprietary code, and only allow the gateway to communicate with the world outside of the CVM. The gateway still executes inside the CVM, so it has an E2E encryption layer for securing data in transit, while the CVM provides memory encryption for securing data in processing - thus ensuring the private data is never handled in plain-text form outside of the source system (i.e. the user's device, which the user implicitly trusts), implementing our trust model without homomorphic encryption. We can open-source both the sandbox and the gateway, effectively enabling users to audit that we don't breach the trust model at the source code level.
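To make the shape of this architecture concrete, here is a minimal sketch in Rust. All the names in it (FaceScan, ScanResult, SandboxedEngine, decrypt_in_cvm) are hypothetical stand-ins for illustration, not the actual Humanode gateway code.

```rust
// A minimal sketch of the gateway/sandbox split, with hypothetical types.

/// Opaque biometric payload, decrypted only inside the CVM.
struct FaceScan(Vec<u8>);

/// What the proprietary engine is allowed to return: a verdict,
/// never the raw biometric data.
enum ScanResult {
    Unique,
    Duplicate,
}

/// The proprietary component, sandboxed: no network, no filesystem.
/// The gateway is its only channel to the outside world.
trait SandboxedEngine {
    fn check_uniqueness(&mut self, scan: FaceScan) -> ScanResult;
}

/// The open-source gateway: the only code with a network-facing socket.
struct Gateway<E: SandboxedEngine> {
    engine: E,
}

impl<E: SandboxedEngine> Gateway<E> {
    /// Handle one request: decrypt inside CVM-protected memory, run the
    /// sandboxed engine, and return only the non-sensitive verdict.
    fn handle_request(&mut self, e2e_encrypted: &[u8]) -> ScanResult {
        // Decryption happens inside the CVM, so plain-text biometric
        // data only ever exists in hardware-encrypted RAM.
        let scan = FaceScan(decrypt_in_cvm(e2e_encrypted));
        self.engine.check_uniqueness(scan)
    }
}

/// Placeholder for the E2E decryption step (hypothetical); real code
/// would use an AEAD cipher here.
fn decrypt_in_cvm(ciphertext: &[u8]) -> Vec<u8> {
    ciphertext.to_vec()
}
```

The key design choice is that the proprietary engine has no channel to the outside world at all: everything it receives and returns passes through the auditable gateway.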
Reproducible builds
To build a chain link from the source code to the binary artifact, we need reproducible builds. We use Nix and NixOS as the base platform for our builds and CVM OS images. Nix is engineered from the ground up with reproducible builds in mind; indeed, they are a requirement for the base concept of the Nix system to operate. Nix developers have already invested heavily in achieving reproducible builds, and we are very grateful for their continuous effort.
We have built custom integration layers for the Nix and AMD SEV-SNP components, and ended up with a robust system that allows users to run a simple command and get exactly the same (byte-equivalent) build artifacts (qemu, kernel, initrd, etc.) that we get from the source code.
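As an illustration of what byte-equivalence means in practice, the sketch below compares the SHA-256 digest of a locally reproduced artifact with a published one. The file paths are hypothetical, and using the `sha2` crate like this is just one way to do the comparison, not the exact command we ship.

```rust
// Sketch: checking that two independently produced build artifacts are
// byte-equivalent by comparing SHA-256 digests. Paths are hypothetical;
// uses the `sha2` crate (sha2 = "0.10").

use sha2::{Digest, Sha256};
use std::fs;

fn artifact_digest(path: &str) -> std::io::Result<[u8; 32]> {
    let bytes = fs::read(path)?;
    Ok(Sha256::digest(&bytes).into())
}

fn main() -> std::io::Result<()> {
    // One artifact built locally from source, one published by us.
    let local = artifact_digest("result/kernel")?;
    let published = artifact_digest("downloaded/kernel")?;

    // Reproducible builds make this an exact, byte-level equality.
    assert_eq!(local, published, "build is not reproducible!");
    println!("artifacts are byte-equivalent");
    Ok(())
}
```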
Direct access to the CVM-capable hypervisor
This point is mostly about the deployment model. We run on bare-metal servers, and this gives us low-level access to the hardware and the hypervisor. We can run the resulting artifacts as a CVM directly and get the hardware security measurements without any middleman-injected code - of the kind one would expect, for instance, from a cloud CVM offering. What this means is that the raw hardware measurements will match the offline-computed measurement of the build artifacts, linking the build and runtime stages with a trust relationship.
Remote attestation
The final link of the trust chain is remote attestation - the ability of the hardware to sign the produced measurement (alongside a nonce, of course) with a hardware-embedded, CPU-specific private key. This allows us to get a CVM measurement from the code running inside the CVM itself, signed by the actual CPU key. This signed measurement can be presented to the end user, and the end user can verify the CPU key signature via the remote attestation root key (in particular, the AMD ARK/ASK certificate chain for AMD SEV).
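Here is a sketch of the checks the verifier performs, with simplified, hypothetical types - real code parses the actual AMD SEV-SNP report format and walks the ARK/ASK certificate chain:

```rust
// Sketch of the remote attestation check from the user's side, with
// hypothetical types; not the real SEV-SNP wire format.

/// Attestation report as seen by the end user (simplified).
struct AttestationReport {
    /// Launch measurement of the CVM, produced by the hardware.
    measurement: [u8; 48],
    /// Nonce the user supplied, echoed back to prevent replay.
    nonce: [u8; 32],
    /// Signature over the report, made with the CPU-specific key.
    signature: Vec<u8>,
}

fn verify_attestation(
    report: &AttestationReport,
    expected_nonce: &[u8; 32],
    expected_measurement: &[u8; 48],
    verify_cpu_signature: impl Fn(&AttestationReport) -> bool,
) -> bool {
    // 1. The nonce must match, so the report cannot be replayed.
    if &report.nonce != expected_nonce {
        return false;
    }
    // 2. The signature must chain up to the AMD root
    //    (ARK -> ASK -> CPU key); abstracted here behind a closure.
    if !verify_cpu_signature(report) {
        return false;
    }
    // 3. The measurement must equal the one computed offline from the
    //    reproducibly built artifacts.
    &report.measurement == expected_measurement
}
```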
Autonomous operation
An important deployment aspect is that the code in a Humanode CVM is never changed after launch. This is by design: if we need to do an update, we'd rather redeploy the whole thing, losing all the private data, than upload new code into a CVM. We do not allow any operator access into the CVMs, as it would allow for potential data leakage or a security compromise of the code running in the CVM. So the CVM is, in a sense, autonomous: it runs completely without maintenance or intervention from the outside. Since the CVM itself protects its integrity and there are no maintenance backdoors, it should be impossible for the CVM operator to recover the confidential data, even if someone pushes them to do so.
This ultimately enables decentralized CVM deployments: since the operator of the CVM has no clue what data is inside, and the integrity of the code can be remotely attested before use, anyone can, in theory, be allowed to run a CVM - even with secret proprietary code in the picture, and super sensitive biometric data in mind.
Transparent chain of trust
All those links together form a transparent chain of trust for the end user: the hardware creates and signs a measurement of the code running in a CVM; the end user verifies the signature and compares the measurement with one computed offline from the build artifacts; and the build artifacts are, in turn, reproducibly built from the open source code that can be audited.
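Put together, the end user's verification flow looks roughly like the sketch below; every helper here is a hypothetical stand-in for the steps described above:

```rust
// The full chain of trust from the user's perspective, as a sketch.

fn measure_artifacts_offline(_artifacts_dir: &str) -> [u8; 48] {
    // Compute the expected CVM launch measurement from the locally
    // reproduced build artifacts (qemu, kernel, initrd, ...).
    [0; 48] // placeholder
}

fn request_attestation(_nonce: &[u8; 32]) -> ([u8; 48], bool) {
    // Ask the running CVM for a report bound to our nonce; returns the
    // reported measurement and whether the CPU signature chains up to
    // the AMD root keys.
    ([0; 48], true) // placeholder
}

/// Only if every link holds should the user send biometric data.
fn user_can_trust_service() -> bool {
    // Link 1: the source code is open and auditable.
    // Link 2: the build is reproducible, so the artifacts are known.
    let expected = measure_artifacts_offline("result/");
    // Links 3 and 4: the hardware signs a fresh measurement, and it
    // must match the one computed offline.
    let nonce = [42u8; 32];
    let (reported, signature_ok) = request_attestation(&nonce);
    signature_ok && reported == expected
}
```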
This is the first solution in existence (as far as we know) that enables the end user to truly know that the system on the other end can be trusted - in our case, trusted not to leak the private biometric data - but this model is applicable in other scenarios as well.
This solution has some overlap in properties with smart contracts, but it can run unmodified x86-64 binaries (unlike, for instance, Intel SGX) and also work with private data (which, in smart contracts, requires Zero-Knowledge or Homomorphic Encryption schemes with significant complexity and limitations).
How is it different from the other CVMs?
On the hardware level, the CVMs themselves are the same. It is the trust model that is different.
Usually, CVM platforms (i.e. cloud platforms) aim to assure the CVM integrity to the owners of the CVM.
Humanode CVMs aim to assure the end users that they interact with the code they expect - code they can audit.
Where do I get it?
This is currently a work in progress, and we will open-source things when they are ready.
If you'd still like to try it, or to develop on our CVM technology, reach out to us in the #devs channel in our Discord. We may give access to a select few - but keep in mind you will also need a bare-metal server with an AMD EPYC Zen 3 or newer platform to do anything useful.