Do not store your customers’ user data on a public blockchain network you have no control over. If the data is ever revealed, even accidentally, there is no way to take it back.
You have probably seen many blockchain implementations that promise to protect privacy. This blog post gives a short overview of the real possibilities for providing privacy on blockchains. After a brief look at the problems, I split the solutions into five layers that protect privacy on blockchains. I welcome feedback and suggestions for improvement.
There are several key challenges when using blockchains that change the game of “keeping data private” a little.
When operating a blockchain, we want everyone to be able to validate that the ledger is correct: that the data is stored correctly and is available. At this point, we do not care about the payload. We just want to make sure that all blocks are valid and that the consensus protocol is working. So all the information the blockchain needs to operate must be available to all participants, no matter whether we talk about private or public blockchains. This also includes metadata, for example the uptime or hash power of your node, timestamps, transaction information and much more, depending on your blockchain system.
The next challenge directly concerns the core concept of blockchains: we cannot delete data. Data that is not sensitive today could become sensitive over the lifespan of the blockchain, which in the case of Bitcoin is more than nine years. Imagine you rent a service but don’t want anyone to know you’re using it. Because you have heard that it is more secure, you pay with bitcoin. Now that you own bitcoin, you also spend coins in many other shops. But if one of these shops gets hacked and its customer data becomes public, everyone can link you to your bitcoin address and see where you spent money. And you can’t do anything about it.
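To make the last two points concrete, here is a minimal sketch of a toy hash-linked chain. It is not the block format of Bitcoin or any other real system; the field names (`prev`, `timestamp`, `payload`) are assumptions for illustration. It shows that any participant can validate the chain without understanding the payloads, that metadata such as timestamps must be visible for this, and that silently editing or deleting an old block breaks validation.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    # Hash a canonical JSON encoding of the block (sorted keys, no whitespace).
    encoded = json.dumps(block, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain: list, payload: str, timestamp: int) -> None:
    # Each block stores the hash of its predecessor; the first block links to zeros.
    prev = block_hash(chain[-1]) if chain else "0" * 64
    # The link and the timestamp are metadata every validator must be able to read,
    # even if the payload itself were encrypted.
    chain.append({"prev": prev, "timestamp": timestamp, "payload": payload})


def is_valid(chain: list) -> bool:
    # Any participant can check the links without interpreting the payloads.
    for earlier, later in zip(chain, chain[1:]):
        if later["prev"] != block_hash(earlier):
            return False
    return True
```

Editing the payload of an early block (for example replacing a leaked entry with "redacted") or deleting a block in the middle makes `is_valid` return `False`. The hash links only make tampering detectable; what actually keeps the data around forever is that every node holds its own copy and rejects a rewritten history.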
Another problem with revealed data: blockchain addresses are not anonymous but only pseudonymous. If enough data is revealed, a key that looked anonymous can be linked to you as a person. With the capabilities of AI, big data and graph analysis, this is already possible today.
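One well-known graph-analysis technique for Bitcoin-style ledgers is the common-input-ownership heuristic: addresses that are spent together as inputs of the same transaction probably belong to the same owner. The sketch below clusters addresses this way with a union-find structure. The input format (a list of input-address lists) and the addresses are hypothetical, real analysis tools combine many more heuristics, and this one can be wrong, for example with CoinJoin transactions.

```python
def cluster_addresses(transactions):
    """Group addresses that appear together as inputs of one transaction.

    `transactions` is a simplified, hypothetical format: a list of lists of
    input addresses.
    """
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving keeps the trees shallow
            a = parent[a]
        return a

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for inputs in transactions:
        for addr in inputs:
            find(addr)  # register addresses that are only ever spent alone
        for addr in inputs[1:]:
            union(inputs[0], addr)  # co-spent inputs are assumed to share an owner

    clusters = {}
    for addr in parent:
        clusters.setdefault(find(addr), set()).add(addr)
    return list(clusters.values())
```

With the transactions `[["A1", "A2"], ["A2", "A3"], ["B1"], ["B1", "B2"]]`, the addresses collapse into two clusters, `{"A1", "A2", "A3"}` and `{"B1", "B2"}`. If a leaked shop database then ties `A1` to a name, every address in that cluster is tied to the same person.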
But even if we get past all this, there’s another challenge: smart contracts. Smart contracts are small programs that run on a blockchain and operate on blockchain data. Probably the most famous platform for them is Ethereum, which already hosts a large number of so-called “DApps” (decentralized applications). We can use these smart contracts, for example, to start a crowdfunding project or to play the famous game “CryptoKitties”. These programs work on the data stored in the individual blocks. If we encrypt that data, the smart contracts can no longer work with it.
This is why it is important to find a suitable solution for each individual use case. Therefore, I’ve come up with a “privacy layer cake” for blockchains.
These layers can be applied to your solution independently or in combination. Each layer strengthens the protection of private data but also increases the technical and organisational effort required.
Layer Zero — Manual Encryption
The simplest and most obvious way of preserving privacy is to encrypt the payload of your transactions manually, for example with RSA or elliptic curve cryptography (ECC). The data is then shared among everyone who holds the key. As mentioned before, this solution is not suitable for smart contracts, because the functions of the contract cannot evaluate the encrypted data. The advantage of this method is that we can use it on any blockchain that lets us store a payload, e.g. Ethereum.
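The mechanics of this layer can be sketched with textbook RSA using only Python integers. This is deliberately a toy: the primes are two publicly known Mersenne primes, there is no padding, and the whole payload is encrypted directly. A real system would use a vetted library, a padding scheme such as OAEP, and usually hybrid encryption (a random symmetric key encrypts the payload, and RSA or ECC encrypts only that key). Only the ciphertext would be written to the chain.

```python
# TOY PARAMETERS, INSECURE: two publicly known Mersenne primes, no padding.
P = 2**89 - 1
Q = 2**127 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (modular inverse, Python 3.8+)


def encrypt(payload: bytes) -> int:
    # Prefix a 0x01 byte so leading zero bytes of the payload survive the round trip.
    m = int.from_bytes(b"\x01" + payload, "big")
    if m >= N:
        raise ValueError("payload too long for this toy modulus")
    return pow(m, E, N)             # this integer is what would be stored on-chain


def decrypt(ciphertext: int) -> bytes:
    m = pow(ciphertext, D, N)
    raw = m.to_bytes((m.bit_length() + 7) // 8, "big")
    return raw[1:]                  # drop the 0x01 prefix
```

Everyone who holds `D` can decrypt every payload ever written with this key, and nobody can revoke that once the ciphertext is replicated on all nodes.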
The downside of encrypting the payload manually is that once the data is revealed (for example because a key leaks or the encryption is broken in the future), there is no way back unless the whole blockchain is deleted on every node. This is practically impossible for public blockchains like Bitcoin.
Layer 1 — Hashing data and Private Collections
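This principle seems simple, but the implementation is more complex than in the first layer. In this layer, we only store the hash of our private data (e.g. of a private document) in the payload. The data is protected because it cannot be recreated from the hash. This only holds if the data is hard to guess: the hash of a birth date or a yes/no answer can be found by hashing every possible candidate, so in practice a random salt is hashed together with the data. The real data is sent directly to everyone we want to share it with.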
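A minimal sketch of this layer with Python’s standard library: only the salted SHA-256 digest would go into the on-chain payload, while the document and the salt travel off-chain to the intended recipients, who can then check that what they received matches the chain. The function names are illustrative and not taken from any particular blockchain platform.

```python
import hashlib
import hmac
import secrets


def commit(document: bytes) -> tuple[str, bytes]:
    # A random 32-byte salt stops anyone from guessing low-entropy data
    # by hashing candidate values and comparing them with the on-chain digest.
    salt = secrets.token_bytes(32)
    digest = hashlib.sha256(salt + document).hexdigest()
    return digest, salt  # digest goes on-chain; salt and document are shared off-chain


def verify(on_chain_digest: str, document: bytes, salt: bytes) -> bool:
    # A recipient recomputes the digest and compares it in constant time.
    candidate = hashlib.sha256(salt + document).hexdigest()
    return hmac.compare_digest(candidate, on_chain_digest)
```

Committing the same document twice yields two unrelated digests, so observers cannot even tell that the same data was stored twice.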
Some existing solutions use a second, private ledger to store the private data. The implementation handles the private messages automatically and treats them like actual payloads from blocks. This way we can even use them with special smart contracts (not with every smart contract, because the smart contract has to access the secret data, too). Sending the data directly to other users also has the advantage that only the intended recipients receive it.
In Hyperledger Fabric, this kind of data is called a “private data collection”, and the “gossip protocol” is used to distribute it between the authorised peers. There is also an Ethereum-based implementation called “Quorum” that offers this feature.
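The idea behind private data collections can be sketched as a toy in-memory model, loosely based on how Fabric describes them: the shared ledger records only a hash of the private value, and the plaintext is handed only to the peers that are members of the collection. The class and method names are hypothetical and do not correspond to Fabric’s API, and the gossip-based distribution is reduced to a simple loop.

```python
import hashlib


class ToyPrivateCollections:
    """Hypothetical toy model; names do not match Hyperledger Fabric's API."""

    def __init__(self, peers):
        self.ledger = []                       # shared ledger, replicated to every peer
        self.private = {p: {} for p in peers}  # side store held by each peer

    def submit(self, key, value: bytes, members):
        # Every peer records only the hash on the shared ledger ...
        self.ledger.append({"key": key, "hash": hashlib.sha256(value).hexdigest()})
        # ... while the plaintext is handed only to collection members.
        for peer in members:
            self.private[peer][key] = value

    def read(self, peer, key):
        return self.private[peer].get(key)  # None if the peer is not a member

    def matches_ledger(self, peer, key):
        # A member can check that its private value is the one anchored on the ledger.
        value = self.read(peer, key)
        if value is None:
            return False
        recorded = [e["hash"] for e in self.ledger if e["key"] == key]
        return bool(recorded) and hashlib.sha256(value).hexdigest() == recorded[-1]
```

A non-member peer still validates the same ledger as everyone else; it just never sees more than the hash.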
Layer 2 — Side-Chains and Channels
In addition to the steps above, we can split our blockchain network into smaller parts. Each part exists in a separate (blockchain) network, and there might be an intermediate blockchain that connects the networks.
There are multiple ways to realise this. On public blockchains, we often see so-called sidechains, e.g. RSK, to which assets are moved from the main chain and from which they can return after a series of transactions. (Note: RSK does not provide privacy per se. It’s just a good example of sidechains, which are also useful for scaling and for providing other features.)
There are also various solutions for private blockchains. For example, the “channels” of Hyperledger Fabric are completely separate from each other, and transferring data between channels is not intended.
To name a use case: this layer of privacy can be used when a subset of business partners does not want to give others insight into their business transactions. We could also define an internal project within a task group and delete it later without affecting the main network.
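The channel idea can be sketched as a toy model in which every channel keeps its own ledger and member list. This is a hypothetical illustration, not Fabric’s API: in a real network, peers of non-member organisations simply never receive a channel’s blocks, and the membership check below only stands in for that separation.

```python
class ToyChannelNetwork:
    """Hypothetical sketch: each channel has its own ledger and members."""

    def __init__(self):
        self.channels = {}

    def create_channel(self, name, members):
        self.channels[name] = {"members": set(members), "ledger": []}

    def _channel_for(self, org, name):
        channel = self.channels[name]
        if org not in channel["members"]:
            raise PermissionError(f"{org} is not a member of {name}")
        return channel

    def submit(self, org, name, tx):
        self._channel_for(org, name)["ledger"].append(tx)

    def read_ledger(self, org, name):
        return list(self._channel_for(org, name)["ledger"])

    def drop_channel(self, name):
        # Discarding a project channel leaves every other channel's ledger untouched.
        del self.channels[name]
```

Two partners can run a deal on their own channel while the consortium-wide channel carries on unchanged, and the project channel can be dropped later without touching the main ledger.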
Layer 3 — Application Authentication
In the future, many users won’t access the blockchain directly through a node on their own device. Especially in business blockchain applications, a company may use blockchain technology in its backend while users access the node through a simple web interface. This makes it possible to design access control at the application level.
The decision about who gets which information is still made in software. However, the data is no longer physically separated: the node that serves the users still stores all of it.
An example: a consumer protection organisation is part of a consortium with a number of car manufacturers. The blockchain can verify that all information is correct. The consumer protection organisation now provides a blockchain node through which customers receive information about the products they purchased. Each manufacturer provides various information, but it is only passed on to individual, authorised users.
You could implement this fairly easily with “Hyperledger Composer”, a tool that provides a software abstraction for Hyperledger Fabric networks. With it, we can define user access in an access control list (ACL) and even integrate authentication mechanisms such as OAuth.
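The car example can be sketched as an application-level access check in Python. The rule fields loosely mirror the participant, operation, resource, condition and action parts of a Composer ACL rule, but this is not Composer’s rule language; the roles, record fields and first-match, default-deny evaluation are assumptions made for the illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    participant: str                          # role the rule applies to
    operation: str                            # e.g. "READ" or "UPDATE"
    resource: str                             # resource type, e.g. "CarRecord"
    condition: Callable[[dict, dict], bool]   # (user, record) -> bool
    allow: bool


def is_allowed(rules, user, operation, resource_type, record):
    # The first matching rule decides; anything not explicitly allowed is denied.
    for rule in rules:
        if (rule.participant == user["role"]
                and rule.operation == operation
                and rule.resource == resource_type
                and rule.condition(user, record)):
            return rule.allow
    return False


RULES = [
    # Customers may read only the records of cars they own.
    Rule("Customer", "READ", "CarRecord",
         lambda user, rec: rec["owner"] == user["id"], True),
    # Manufacturers may read and update the records of their own cars.
    Rule("Manufacturer", "READ", "CarRecord",
         lambda user, rec: rec["manufacturer"] == user["org"], True),
    Rule("Manufacturer", "UPDATE", "CarRecord",
         lambda user, rec: rec["manufacturer"] == user["org"], True),
]
```

The check runs in the application in front of the node, so a customer only ever sees the records of their own car, while the node behind it still holds every manufacturer’s data.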
Layer 4 — Zero Knowledge
The last layer of preserving privacy on the blockchain is a cryptographic method called a zero-knowledge proof. It must be supported by the blockchain itself and can be hard to understand.
We can imagine it as follows: you want to buy a new car with Ether, but you don’t want your neighbours to know what ridiculously high price you paid. With a zero-knowledge proof, you can prove to the public that you actually sent the money while hiding the exact amount.
With zero knowledge, we can prove that something is true or that a transaction was sent without revealing a single bit of information beyond that.
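Hiding an amount while still letting everyone check that no money was created is commonly built from Pedersen commitments, which are a building block of confidential-transaction schemes rather than a zk-SNARK. The sketch below uses a toy group that is trivially breakable (real systems use groups of roughly 256-bit order), and it assumes that nobody knows the discrete logarithm of `H` to base `G`. It also leaves out the range proofs that real schemes need to stop negative amounts from sneaking in.

```python
# Toy group: P = 2*Q + 1 is a safe prime, so the squares mod P form a subgroup
# of prime order Q. These sizes are INSECURE and for illustration only.
P, Q = 2039, 1019
G, H = 4, 9  # squares, hence generators of the order-Q subgroup; log_G(H) assumed unknown


def pedersen_commit(value, blinding):
    # C = G^value * H^blinding mod P hides `value` as long as `blinding` is random.
    return pow(G, value % Q, P) * pow(H, blinding % Q, P) % P


def balances(input_commitments, output_commitments, excess):
    # Commitments multiply while the hidden values add. If the values balance,
    # product(inputs) == product(outputs) * H^excess, with excess = r_in - sum(r_out).
    lhs = pow(H, excess % Q, P)
    for c in output_commitments:
        lhs = lhs * c % P
    rhs = 1
    for c in input_commitments:
        rhs = rhs * c % P
    return lhs == rhs
```

For example, a buyer commits to an input of 700 and to outputs of 650 (to the seller) and 50 (change), each with its own blinding factor, and publishes only the commitments and the blinding difference. The public can confirm that the amounts balance, while 700, 650 and 50 never appear in the clear; if the seller’s output were changed to 660, the check would fail. In a real system the blinding factors must come from a cryptographically secure random source.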
If you see a big question mark now, don’t worry. There is a pretty cool explanation, written as a story for kids, called “The Cave of Ali Baba”.
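In the spirit of the cave story, here is a sketch of one of the simplest zero-knowledge protocols, the Schnorr identification protocol: a prover convinces a verifier that it knows the secret `x` behind a public key `y = G^x` without revealing `x`. It is interactive and only zero-knowledge against an honest verifier, and it is much simpler than the zk-SNARKs and zk-STARKs used on blockchains (it can be made non-interactive with the Fiat-Shamir heuristic). The group parameters are a toy choice and far too small for real use.

```python
import secrets

P, Q = 2039, 1019  # toy safe-prime group, P = 2*Q + 1 (INSECURE size)
G = 4              # generator of the subgroup of prime order Q


def keygen():
    x = secrets.randbelow(Q - 1) + 1   # secret, 1 <= x < Q
    return x, pow(G, x, P)             # (secret, public key y = G^x)


def prover_commit():
    r = secrets.randbelow(Q)           # fresh randomness for every round
    return r, pow(G, r, P)             # t = G^r is sent to the verifier


def prover_respond(x, r, c):
    return (r + c * x) % Q             # uniformly random because r is, so x stays hidden


def verifier_check(y, t, c, s):
    # Accept iff G^s == t * y^c (mod P).
    return pow(G, s, P) == t * pow(y, c, P) % P


def run_protocol(claimed_secret, y, rounds=20):
    """The prover tries to convince the verifier that it knows log_G(y)."""
    for _ in range(rounds):
        r, t = prover_commit()
        c = secrets.randbelow(Q - 1) + 1  # verifier's random, nonzero challenge
        s = prover_respond(claimed_secret, r, c)
        if not verifier_check(y, t, c, s):
            return False
    return True
```

Like entering the cave several times, each round gives a prover without `x` only a tiny chance (about 1/Q) of passing, while the verifier only ever sees random-looking pairs `(t, s)` it could have generated itself.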
On Ethereum, we can already use this method. It’s called “zk-SNARKs”, and its younger but stronger family member is called “zk-STARKs”. The latter is even considered resistant to quantum computers, but needs more computing power and memory space.
In Hyperledger Fabric, a zero-knowledge-based authentication method called “Identity Mixer” is available. It could be used to prove that we are of legal age without revealing our name. Or, thinking a little bigger, we could use it for a voting system that keeps voters anonymous.
As complementary information, I want to note that there are several other methods of preserving privacy on blockchains. These methods, however, were developed for very specific purposes and cannot easily be transferred to other systems. A great example of such an implementation is the cryptocurrency “Monero”, which advertises that money can be transferred without leaving a trace.
I would also like to point out that data privacy is a huge issue, especially in view of the General Data Protection Regulation (GDPR). Please think twice before you put your data, or even your customers’ data, on a blockchain network.
Blockchains are designed to be distributed and persistent. There is no way to withdraw leaked personal information from nodes you can’t control.
Stick to the model and implement as many layers as you can to keep your private data secure and separated.
Thanks for reading! Let me know if you have any questions.