Proof of data integrity – Solidity code

Proof of data integrity – Solidity code

Update:

I’ve simplified to code to take more advantage of the boolean data type Ethereum offers.

Now the mapping is (bytes32=>bool) instead of (bytes32=>bytes32).

The boolean array is used to prove the existence of a single document. The root of the tree is stored once and is hashed again with the new input.

The complete code can be found on Github. The old test files can be used on this code as well

struct tree{
    bytes32 root;
    mapping(bytes32=>bool) dataExist;
}

bytes32 public empty;

mapping (address=>tree) public users;    
****

function addData(
			uint256 _input,
			address _user)
			returns (bool success)
{
	var data   = keccak256(_input);        
	var oldRoot = getUserRoot(_user);
    var newRoot = hashTheTwo(data, oldRoot);

    users[_user].dataExist[data] = true;

    users[_user].root = newRoot;

    return true;
}
****

function checkDataIntegrity
        (uint256[] _data, 
        address _user)
        constant         
        returns (bool complete){ 

	var oldRoot = empty;                         
    
    for (uint i = 0; i < _data.length; i++) {     
        var data = keccak256(_data[i]);
        if(users[_user].dataExist[data]){
            var root = hashTheTwo(data, oldRoot);
            oldRoot = root;
            continue;             
        }else{
            return false;
        }
    }

    if(root == getUserRoot(_user)){
    	return true;
    }else{
    	return false;
    }
}
Contract: test 01
    The first stage is Deploying Data
      ✓ Deploys the Data contract
      ✓ Register account 0 user (69ms)
    Adds three datas to account 0 user tree
      ✓ get account 0 user root - should be undefined (79ms)
      ✓ Adds the first data 1 (108ms)
      ✓ Adds the second data 2 (86ms)
      ✓ Adds the third data 5 (79ms)
    Check data integrity
      ✓ Pass the complete array [1, 2, 5] - expect true (122ms)
      ✓ Pass the  array [1, 2] - expect false (93ms)
      ✓ Pass the  array [1, 5, 2] - expect false (83ms)


  9 passing (818ms)
    

Storing data on the blockchain

Current blockchain architecture allows us to decentralized valuable information. The most obvious example is the blockchain itself, which is nothing more than just a database that each user can interact with. The users can download a copy of the blockchain, parse it and extract any information that is meaningful to them, to add information to that database, to check its integrity and so on.

But adding information directly to the blockchain is a problematic process. For one, it’s highly expensive. Whether you’re using Bitcoin or Ethereum as the blockchain on which to store your data, you’ll soon find out that any attempt to save more than a few bytes of data at a time can get ridiculously expensive. For that reason, many have started to use the blockchain as a method to “proof existence” of said document. In this process, instead of publishing the full document on the blockchain, the document is hashed using a prespecified hashing algorithm. This practice means that the owner of the document uses the blockchain not as a mean to store his or hers document, but to prove:

  1. Ownership over the said document (As long as he keep the private key from which the transaction was deployed)
  2. The existence of the said document at a specific point in time (by looking at the block header timestamp)
  3. The integrity of that specific document, as each minor change to the original file, will result in an entirely different hash.

By keeping the document yourself, you’re also able to better handle your privacy, as now instead of publishing your own private documents on a public blockchain for all to see, you’re only posting the result of a hash function, which is extremely difficult to Didact the contains of the original document from.

Such a system might be sufficient for sporadic use. But what happens if we want to create a system that new documents are continuously added to it. And we want to be able to prove the integrity of each individual document, both by itself and in conjunction to those preceding it?

 

Binary trees.

Binary trees are not a new thing in blockchains. Merkle trees and roots are used in Bitcoin and Ethereum to store and organize transactions and to allow for merged mining. In Ethereum the trees are also used to access the storage (variables) and states of the blockchain.

One of the great characteristics of binary trees is the ability to use them, plus some hashing algorithm to prove the integrity of the data stored in it.

Let’s have a look at the most common example, the Bitcoin Merkle tree. In this tree, each leaf represents one transaction. These transactions are hashed together again and again until finally, the final hash (the root) is produced. Storing the root require much less space than when storing all of the transactions data. But if I want to check that a specific transaction is indeed a part of a specific block, I can reconstruct that said block Merkle root by myself. In this case, that means that instead of storing all of the transactions that took place in the blockchain, I maintain only copies of the transaction that is relevant to me (and usually it requires less than a half of the transactions in a block).

In Bitcoin this tree is used both to proof the integrity of the block and to make it easier to validate transaction without having a full copy of the blockchain

 

My proposal 

Knowing the advantages of binary trees, hashing, key encryptions, and filled with the motivation to create a user specific database that will allow him/her to maintain control over his/hers private information, while still being able to prove their ownership over the information and the integrity of that information, I decided to play a little with different Solidity codes. The idea was to use mapping as the mean of creating pairs of leaves and root.

 

Each leaf is hashed with the previous root to produce the new root of the tree

Each root is the hashed of all of the chain that lies bellow its level, plus the new leaf added. This way, each attached leaf if linked and chained with the rest.

pragma solidity ^0.4.6;

contract Data{
    
    struct tree{
        bytes32 root;
        mapping(bytes32=>bytes32) leafAndRoot;
    }

    bytes32 public empty;                                           // Hard codded         
    
    mapping (address=>tree) public users;
    
    function newUser(){                                             // To do - Modifer "onlyNewUser" 
        users[msg.sender];
    }
    
    function addData(
            uint256 _data,              // To do - serialize data/non empty
            address _user)
            returns (bool success){  
        
        var leaf    = keccak256(_data);   // Hashing the input
        var oldRoot = getUserRoot(_user);
        var newRoot = hashTheTwo(leaf, oldRoot);
        
        users[_user].leafAndRoot[leaf] = newRoot;
        users[_user].root = newRoot;

        return true;
    }
    
    function getRoot(
            uint256 _leafData,      // The input is in plain uint256 and hashed format to allow for future UI to be devloped
            address _user)
            constant 
            returns (bytes32 root){ // The root of specific leaf
        
        var leaf = keccak256(_leafData);                                            // Hashing the input
        return users[_user].leafAndRoot[leaf];
    }
    
    function getUserRoot(
                address _user)
                constant
                returns (bytes32 root){ // The higest (last) root      
        return users[_user].root;
    }

    function hashTheTwo
                (bytes32 _a, // To do - serialize data/non empty
                bytes32 _b)  // To do - serialize data/non empty
                constant
                returns (bytes32 hashed){         
        return keccak256(_a, _b);
    }
    
    function checkDataIntegrity
            (uint256[] _data, // To do - serialize data/non empty
            address _user)
            constant         // Run localy
            returns (bool complete){ 
         
         

        var oldRoot = empty;                         // Hard codded                        
        for (uint i = 0; i < _data.length; i++) {    // Reconstructing the tree     
            var data = keccak256(_data[i]);          // Hashing the input
            var root = hashTheTwo(data, oldRoot);
            
            if(root == getRoot(_data[i], _user)){         
                oldRoot = root;
                continue;
            }else{
                return false;
            }
        }        

        if (oldRoot == getUserRoot(_user)){
            return true;
        }else{
            return false;
        }
    }
}

For each new user, a new struct object is created containing two parts, The latest root in the tree, and the tree itself. The tree maps from bytes32 (the data/leaf) to the bytes32 of the root. That way a user can look up for a specific information and, if the root is valid, attest that the said information is indeed present in the database, while others cannot tell what the real information is just by looking at the blockchain.

Currently, due to input limitations in solidity, the easiest way to input and parse and the array is by using u/int array. Future implementation might include bytes32[] array or even direct string array as input.

All the input values are hashed to get a uniform 32 bytes result and to increase privacy.

function hashTheTwo(
        bytes32 _a,
        bytes32 _b)
        constant
        returns (bytes32){         // To do - serialize data/non empty
    return keccak256(_a, _b);
}

The data is then hashed together with the highest existing root to receive the new root of the tree (If the tree is empty, meaning no root exist yet, the first leaf is hashed with empty bytes32 variable).

The new root is then stored in a dedicated variable to allow adding extra information without manually looking for the latest existing root.

function addData
        (uint256 _data,
        address _user)
        returns (bool){                 // To do - serialize data/non empty
    
    var leaf    = keccak256(_data);     // Hashing the input
    var oldRoot = getUserRoot(_user);
    var newRoot = hashTheTwo(leaf, oldRoot);

    users[_user].leafAndRoot[leaf] = newRoot;
    users[_user].root = newRoot;

    return true;
}

When trying to prove the authenticity of a single entry, it’s enough to just check for the existence of a (none empty) root that corresponds to that specific piece of information.

function getRoot(
            uint256 _leafData,
            address _user)
            constant
            returns (bytes32){ // The root of specific leaf

    var leaf = keccak256(_leafData);                                            // Hashing the input
    return users[_user].leafAndRoot[leaf];
}

// Can also be rewtiren to give bool result

function isExist(
            uint256 _leafData,
            address _user)
            constant
            returns (bool exist){
            
    var leaf = keccak256(_leafData);                                            // Hashing the input
    if(getRoot(_leafData, _user) != 0x00){
        return true;
    }else{
        return false;
    }            
}

Proving the existence of the entire database is done by providing all the pieces in their proper order and reconstructing the finale root. If the results match the one stored on the blockchain, that means that the owner of that data array has a complete copy of that array.

function checkDataIntegrity(
            uint256[] _data,
            address _user)
            constant
            returns (bool){ 
     
     // To do - serialize data/non empty
     // Run localy

    var oldRoot = empty;    // Hard codded                        
    for (uint i = 0; i < _data.length; i++) {

        var data = keccak256(_data[i]);         // Hashing the input
        var root = hashTheTwo(data, oldRoot);

        if(root == getRoot(_data[i], _user)){   // Reconstructing the tree 
            oldRoot = root;
            continue;
        }else{
            return false;
        }
    }        

    if (oldRoot == getUserRoot(_user)){
        return true;
    }else{
        return false;
    }
}

The complete code plus test file can be found on my github page

Contract: test 01
    The first stage is Deploying Data
      ✓ Deploys the Data contract
      ✓ Register account 0 user (69ms)
    Adds three datas to account 0 user tree
      ✓ get account 0 user root - should be undefined (79ms)
      ✓ Adds the first data 1 (108ms)
      ✓ Adds the second data 2 (86ms)
      ✓ Adds the third data 5 (79ms)
    Check data integrity
      ✓ Pass the complete array [1, 2, 5] - expect true (122ms)
      ✓ Pass the  array [1, 2] - expect false (93ms)
      ✓ Pass the  array [1, 5, 2] - expect false (83ms)


  9 passing (818ms)
    

What can I do with it?

I can use the above system to prove that I’m in control over my own data, that I have the original data and that I maintain a complete copy of my database. Such a system can be combined with other types of encryptions to prove that the data is both belongs to me, complete, and recognized by other authorities.

I prove that I have a my complete medical file, I prove that the said message belongs to me and that it was signed by myself and by the doctor

Leave a Reply

Your email address will not be published. Required fields are marked *