Where and how is application data stored in Ethereum?

2 February 2021

This article was published in 2017 and updated in February 2021.

Ethereum is used to build decentralized applications, a.k.a. DAPPs. These applications rely on small programs that live on the Blockchain, called smart contracts.

Before jumping into the platform and writing a smart contract, it’s really important to understand where your application data is stored. Code execution, servers and programming languages are rarely critical to the design of an application. Data is different: its structure and its security are what will constrain our design the most.

Let’s imagine we are porting apps to Ethereum:

  • For a Facebook-like app, where are the posts and comments stored?
  • For a Dropbox-like app, where are my private files?
  • For a Slack-like chat app, where do we store discussion channels? And what about private messages?

The Account Machine

Let’s skip the explanation of blockchain for a minute (you can read my post on why Blockchain is best understood as a machine that generates Consensus here). Let’s look at Ethereum from a higher level of abstraction: the software that powers it, which is basically a big, slow, reliable computer.

Ethereum holds a set of accounts. Every account has an owner and a balance (a quantity of Ether).

If I prove my identity, I can transfer Ether from my account to another. The money will flow from one account to the other. It’s an atomic operation called a “transaction”.

In other words, the Ethereum Software is a transaction processing system that works as follows:

  1. The system is in a certain state, i.e. every account has a certain balance.
  2. We carry out one or more transactions.
  3. We get a new state: an updated set of accounts and their balances.

It’s as simple as that!
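To make the three steps concrete, here is a toy Python model of that state machine. It is a sketch under simplifying assumptions, not how an Ethereum client works: account names stand in for addresses, identity checks (signatures) are skipped, and `apply_transaction` and `apply_block` are hypothetical names. Each transaction builds a new state instead of editing the old one, so a rejected transfer leaves nothing half-done.

```python
from typing import Dict, List, Tuple

# Hypothetical model: the state maps each account name to its Ether balance.
State = Dict[str, int]
# A transaction moves `amount` from `sender` to `recipient`.
Transaction = Tuple[str, str, int]


def apply_transaction(state: State, tx: Transaction) -> State:
    """Return a new state with `tx` applied, or raise if the transfer is invalid.

    The input state is never mutated, so a failed transaction leaves no
    partial update behind: the transfer is atomic.
    """
    sender, recipient, amount = tx
    if amount <= 0 or state.get(sender, 0) < amount:
        raise ValueError(f"invalid transaction: {tx}")
    new_state = dict(state)
    new_state[sender] -= amount
    new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state


def apply_block(state: State, txs: List[Transaction]) -> State:
    """Old state + transactions -> new state."""
    for tx in txs:
        state = apply_transaction(state, tx)
    return state


genesis = {"alice": 10, "bob": 0}
print(apply_block(genesis, [("alice", "bob", 3), ("bob", "carol", 1)]))
# prints {'alice': 7, 'bob': 2, 'carol': 1}
```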

With that out of the way, we can turn our attention to how to execute code and programs within a transaction. And that’s where Smart Contracts come into play.

Robot Accounts

Every account has an owner and a balance. But some of these accounts are special: they own themselves. At creation time, we give them a piece of code and memory. That’s a Smart Contract.

A smart contract is really a smart bank account. The term “contract” is misleading; I prefer to think of them as Robot Accounts.

A smart contract is basically a robot that executes some code when it receives transactions. This execution happens within the blockchain: it is public, replicated and validated by the network. That means a smart contract won’t stop because of a power outage in a single data center.

A smart contract has a balance, some code, and some storage. This storage is persistent, and that’s where we’ll find DAPP data.
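A robot account can be pictured as an ordinary account with two extra parts: code that runs whenever a transaction reaches it, and storage that survives between transactions. The Python sketch below models only that idea; `RobotAccount`, `receive` and `counter_code` are hypothetical names, and real contract code is EVM bytecode, not a Python function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class RobotAccount:
    """Hypothetical model of a smart contract: a balance, some code, some storage."""

    balance: int
    code: Callable[["RobotAccount", str, int], None]
    storage: Dict[int, int] = field(default_factory=dict)  # persists between transactions

    def receive(self, sender: str, value: int) -> None:
        """A transaction targets this account: credit it, then wake up its code."""
        self.balance += value
        self.code(self, sender, value)


def counter_code(account: RobotAccount, sender: str, value: int) -> None:
    """Example code: count how many transactions the account has received."""
    account.storage[0] = account.storage.get(0, 0) + 1


robot = RobotAccount(balance=0, code=counter_code)
robot.receive("alice", 5)
robot.receive("bob", 2)
print(robot.balance, robot.storage)  # prints 7 {0: 2}
```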

Storage of Robot Accounts

When a smart contract is created or when a transaction awakens it, the contract’s code can read and write to its storage space.
Here’s a breakdown of its Storage Specifications:

  • It’s a big dictionary (key-value store) that maps keys to values.
  • Keys are strings of 32 bytes, so we can have $2^{32 \times 8} = 2^{256}$ different keys. Same for values.
  • It’s like Redis, RocksDB or LevelDB storage.
  • For a DAPP, this storage plays the role that hard-drive storage plays for a regular program.
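The specification above amounts to a dictionary restricted to 32-byte keys and 32-byte values. The Python sketch below models it under that assumption; `ContractStorage` and `word` are hypothetical names. It also reflects one real EVM behavior: a slot that was never written reads back as zero, so every one of the $2^{256}$ keys "exists" without taking any space.

```python
from typing import Dict

WORD_SIZE = 32  # keys and values are 32-byte words
ZERO_WORD = bytes(WORD_SIZE)


class ContractStorage:
    """Hypothetical model of one contract's storage: 32-byte keys -> 32-byte values."""

    def __init__(self) -> None:
        self._slots: Dict[bytes, bytes] = {}

    def load(self, key: bytes) -> bytes:
        if len(key) != WORD_SIZE:
            raise ValueError("keys must be 32 bytes")
        return self._slots.get(key, ZERO_WORD)  # unwritten slots read as zero

    def store(self, key: bytes, value: bytes) -> None:
        if len(key) != WORD_SIZE or len(value) != WORD_SIZE:
            raise ValueError("keys and values must be 32 bytes")
        self._slots[key] = value


def word(n: int) -> bytes:
    """Encode an unsigned integer as a 32-byte big-endian word."""
    return n.to_bytes(WORD_SIZE, "big")


storage = ContractStorage()
storage.store(word(0), word(42))
print(int.from_bytes(storage.load(word(0)), "big"))  # prints 42
print(int.from_bytes(storage.load(word(1)), "big"))  # prints 0: never written
```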

Here’s an example of a data structure a Smart Contract can keep in its storage, written in the Solidity programming language:

// Solidity code (from solidity.readthedocs.io)
struct Voter {
    uint weight;
    bool voted;
    uint8 vote;
    address delegate;
}
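Since every storage value is a 32-byte word, the compiler has to place the struct’s fields into slots. The Python sketch below applies a simplified version of Solidity’s packing rule: consecutive fields share a slot when they fit, otherwise a new slot starts. It assumes `uint` is `uint256` (32 bytes), `bool` and `uint8` take 1 byte, and `address` takes 20 bytes; slot numbers are relative to the struct’s first slot, and `pack_fields` is a hypothetical helper, not part of any tool.

```python
from typing import List, Tuple

SLOT_SIZE = 32  # bytes per storage slot

# Field sizes in bytes for the Voter struct (uint is an alias for uint256).
VOTER_FIELDS = [("weight", 32), ("voted", 1), ("vote", 1), ("delegate", 20)]


def pack_fields(fields: List[Tuple[str, int]]) -> List[Tuple[str, int, int]]:
    """Assign (slot, byte offset) to each field, packing small consecutive
    fields into the same 32-byte slot when they fit (simplified rule)."""
    layout = []
    slot, offset = 0, 0
    for name, size in fields:
        if offset + size > SLOT_SIZE:  # does not fit: start the next slot
            slot, offset = slot + 1, 0
        layout.append((name, slot, offset))
        offset += size
    return layout


for name, slot, offset in pack_fields(VOTER_FIELDS):
    print(f"{name}: slot {slot}, byte offset {offset}")
```

With these sizes, `weight` fills slot 0 on its own, while `voted`, `vote` and `delegate` (22 bytes together) share slot 1. Each Voter therefore costs two storage slots.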

$2^{256}$ keys × 32 bytes (values) is around $10^{63}$ petabytes. You would need far more than billions of times the age of the universe to go through this amount of data with an SSD.
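The arithmetic behind these figures fits in a few lines of Python. The SSD throughput (500 MB/s) and the age of the universe (13.8 billion years) are assumptions chosen for illustration. Under them, reading everything would take about $10^{52}$ times the age of the universe, so the exact drive speed hardly matters.

```python
import math

# 32-byte keys give 2**256 possible keys, each holding a 32-byte value.
keys = 2 ** 256
value_size = 32  # bytes
total_bytes = keys * value_size
petabytes = total_bytes / 10**15
print(f"about 10^{math.floor(math.log10(petabytes))} petabytes")

# Assumptions: an SSD reading 500 MB/s, a universe about 13.8 billion years old.
ssd_bytes_per_second = 500 * 10**6
seconds_per_year = 365.25 * 24 * 3600
age_of_universe_s = 13.8e9 * seconds_per_year
ratio = total_bytes / ssd_bytes_per_second / age_of_universe_s
print(f"about 10^{math.floor(math.log10(ratio))} times the age of the universe")
```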

Basically, we can safely assume that there’s no storage limit for a DAPP.

But there is a cost:

DAPPs Fuel

For every transaction, we attach some Ether to buy the fuel, called gas, needed to power it. The sender of the transaction pays this fee to motivate the miners to process the transaction. Miners keep the network reliable, and we reward them with some Ether.
So we send transactions, along with some fuel, to this big machine. When a transaction targets a Smart Contract, the Ethereum machine starts the account’s robot. Each action of this robot burns some more gas.
The actions taken by this robot are translated into instructions for the Ethereum Virtual Machine (EVM). There are instructions to read from storage, instructions to write to it, and so on. Each of these instructions has a cost in gas, and that cost constrains how much storage we can use.
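Gas metering can be sketched as a tiny interpreter that subtracts a price for every instruction and stops when the budget runs out. In this Python sketch the prices in `GAS_COST` are illustrative assumptions, not the real EVM schedule (which has changed across network upgrades), and storage keys are passed inline instead of on the stack. Writes are committed only if the whole run succeeds. As on Ethereum, running out of gas undoes the writes, although in reality the sender still pays for the gas already burned.

```python
from typing import Dict, List, Tuple

# Illustrative per-instruction prices only; real EVM costs differ.
GAS_COST = {"PUSH": 3, "ADD": 3, "SLOAD": 800, "SSTORE": 20_000}


class OutOfGas(Exception):
    pass


def run(program: List[Tuple], storage: Dict[int, int], gas_limit: int) -> int:
    """Execute a tiny instruction list, burning gas at each step.

    Writes go to a scratch copy that is committed only if the program
    finishes; running out of gas discards them. Returns the gas used.
    """
    scratch = dict(storage)
    stack: List[int] = []
    gas_left = gas_limit
    for op, *args in program:
        gas_left -= GAS_COST[op]
        if gas_left < 0:
            raise OutOfGas(f"ran out of gas at {op}")
        if op == "PUSH":
            stack.append(args[0])
        elif op == "SLOAD":
            stack.append(scratch.get(args[0], 0))
        elif op == "ADD":
            stack.append(stack.pop() + stack.pop())
        elif op == "SSTORE":
            scratch[args[0]] = stack.pop()
    storage.update(scratch)  # commit only after the whole program succeeded
    return gas_limit - gas_left


storage: Dict[int, int] = {}
increment = [("SLOAD", 0), ("PUSH", 1), ("ADD",), ("SSTORE", 0)]
print(run(increment, storage, gas_limit=50_000), storage)  # prints 20806 {0: 1}
```

Calling `run(increment, storage, gas_limit=1_000)` raises `OutOfGas` at the `SSTORE` step and leaves `storage` unchanged.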

Storage Cost

The cost of each instruction limits how much storage a Smart Contract can use in practice. In theory, Ethereum’s storage space is virtually unlimited, but in return you have to provide gas for every read and write operation.
This cost changes all the time, depending on network load, the market and how the Ethereum specifications evolve. To get a general idea of the pricing, I simulated a few Smart Contracts:

I tried three operations:

  1. Writing a uint8 (one byte) to storage
  2. Incrementing a uint8 in storage (read, then write)
  3. A simple voting function, which checks whether the sender of the transaction has the right to vote and then updates the vote result. You can vote only once; a second attempt is short-circuited.
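Why do these three operations cost different amounts? The Python model below mirrors the logic of the benchmarked Solidity functions from the Appendix and counts storage reads and writes for each. It works at the source-code level only: `CountingStorage` and the string keys are hypothetical, and the compiled EVM code may touch a different number of slots (for example, `voted` and `vote` share one packed slot). It does show why the second vote is cheap: after the first check, nothing else is read or written.

```python
from typing import Callable, Dict, Tuple


class CountingStorage:
    """Hypothetical storage model that counts every read and write."""

    def __init__(self) -> None:
        self.slots: Dict[str, int] = {}
        self.reads = 0
        self.writes = 0

    def get(self, key: str) -> int:
        self.reads += 1
        return self.slots.get(key, 0)

    def set(self, key: str, value: int) -> None:
        self.writes += 1
        self.slots[key] = value


def one_set(s: CountingStorage) -> None:
    s.set("tests[0]", 0)  # 1. write only


def two_increment(s: CountingStorage) -> None:
    s.set("tests[0]", s.get("tests[0]") + 1)  # 2. read, then write


def vote(s: CountingStorage, sender: str, proposal: int) -> None:
    # 3. `or` short-circuits like Solidity's `||`: once voted, nothing else is read.
    if s.get(f"voters[{sender}].voted") or proposal >= s.get("proposals.length"):
        return
    s.set(f"voters[{sender}].voted", 1)
    s.set(f"voters[{sender}].vote", proposal)
    weight = s.get(f"voters[{sender}].weight")
    count = s.get(f"proposals[{proposal}].voteCount")
    s.set(f"proposals[{proposal}].voteCount", count + weight)


def accesses(s: CountingStorage, op: Callable[..., None], *args) -> Tuple[int, int]:
    """Return the (reads, writes) performed by one call to `op`."""
    reads_before, writes_before = s.reads, s.writes
    op(s, *args)
    return s.reads - reads_before, s.writes - writes_before


s = CountingStorage()
s.slots.update({"proposals.length": 3, "voters[alice].weight": 1})  # setup, not counted
print(accesses(s, one_set))           # prints (0, 1)
print(accesses(s, two_increment))     # prints (1, 1)
print(accesses(s, vote, "alice", 2))  # prints (4, 3)
print(accesses(s, vote, "alice", 2))  # prints (1, 0): short-circuited
```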

Code and tools are in the Appendix below. Here are the numbers:

(Note from 2021: these numbers are outdated, and the currency is so volatile that you should use them only as a rough order of magnitude.)

Based on this table, this article would cost around 50 Euros to store with a Smart Contract, excluding pictures.

At these prices, posting a tweet-sized message would cost a few euros, and recording an Amazon-style order a few cents.

Of course, these are order-of-magnitude estimates. The exact cost depends on the instructions you use, on the network load, on the current price of gas, etc. New consensus algorithms, such as Proof of Stake, might also bring these costs down.
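The factors listed above combine into a simple product: slots written × gas per slot × gas price × Ether price. The Python sketch below computes it. Every number in it is an assumption to replace with current values: 20,000 gas per freshly written slot, a 50 gwei gas price and 1,000 EUR per Ether. It also ignores the fixed per-transaction cost and any non-storage instructions, and `storage_cost_eur` is a hypothetical helper.

```python
SLOT_BYTES = 32
GAS_PER_NEW_SLOT = 20_000  # assumption: cost of writing one fresh 32-byte slot
GWEI_PER_ETH = 10**9


def storage_cost_eur(num_bytes: int, gas_price_gwei: float, eth_price_eur: float) -> float:
    """Rough cost in euros of writing `num_bytes` of new data to contract storage."""
    slots = -(-num_bytes // SLOT_BYTES)  # ceiling division: a partial slot costs a full one
    gas = slots * GAS_PER_NEW_SLOT
    eth = gas * gas_price_gwei / GWEI_PER_ETH
    return eth * eth_price_eur


# Hypothetical inputs: a 280-byte message, 50 gwei per gas, 1,000 EUR per ETH.
print(round(storage_cost_eur(280, gas_price_gwei=50, eth_price_eur=1000), 2))  # prints 9.0
```

The cost is linear in both prices, which is why the same data can be cheap one month and expensive the next.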

Finally, where should I store my data?

Well, maybe not on the Ethereum Blockchain. The data stored there, with Smart Contracts, is safe and easy to access. But the cost and the key-value structure of this store make it best suited to metadata.
Taking the examples from the introduction: user posts, files and message boxes will probably live on another platform, such as IPFS. On the Ethereum Blockchain, we would store critical data, like encryption keys, roots of storage trees and authorizations.
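A common way to split the work is to keep the bulky content off-chain and store only its 32-byte hash on-chain, which is exactly one storage slot. The Python sketch below models that pattern. The two dictionaries are hypothetical stand-ins for IPFS and for contract storage. It also uses a raw SHA-256 digest, whereas real IPFS content identifiers (CIDs) wrap the hash in their own encoding. Anyone who fetches the content can check it against the on-chain hash.

```python
import hashlib
from typing import Dict

# Hypothetical stand-in for IPFS: content addressed by its hash.
off_chain: Dict[bytes, bytes] = {}
# Hypothetical stand-in for contract storage: post id -> 32-byte digest.
on_chain: Dict[int, bytes] = {}


def publish(post_id: int, content: bytes) -> None:
    digest = hashlib.sha256(content).digest()  # 32 bytes: fits in one storage slot
    off_chain[digest] = content  # the bulky data lives off-chain
    on_chain[post_id] = digest  # only the fingerprint goes on-chain


def fetch(post_id: int) -> bytes:
    """Retrieve a post and verify it against the hash stored on-chain."""
    digest = on_chain[post_id]
    content = off_chain[digest]
    if hashlib.sha256(content).digest() != digest:
        raise ValueError("off-chain content does not match the on-chain hash")
    return content


publish(1, b"Hello, Ethereum!")
print(fetch(1))  # prints b'Hello, Ethereum!'
```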

Appendix

Code used for the table:

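pragma solidity ^0.4.0;

// Operations 1 and 2: writing and incrementing a value in storage.
contract Test {
    mapping(uint => uint) tests;

    function Test() public {
    }

    // 1. Write a value to storage.
    function one_set() public {
        tests[0] = 0;
    }

    // 2. Read a value from storage, then write it back incremented.
    function two_increment() public {
        tests[0] = tests[0] + 1;
    }
}

// Operation 3: the vote function from the Ballot example in the Solidity
// documentation, trimmed to the declarations it needs.
contract Ballot {
    struct Voter {
        uint weight;
        bool voted;
        uint8 vote;
        address delegate;
    }

    struct Proposal {
        uint voteCount;
    }

    mapping(address => Voter) voters;
    Proposal[] proposals;

    /// Give a single vote to proposal $(proposal).
    function vote(uint8 proposal) public {
        Voter storage sender = voters[msg.sender];
        // Short-circuit: a second vote (or an unknown proposal) changes nothing.
        if (sender.voted || proposal >= proposals.length) return;
        sender.voted = true;
        sender.vote = proposal;
        proposals[proposal].voteCount += sender.weight;
    }
}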

Tools used to run the code and evaluate the costs:
  • Ethereum Gas Station, to follow gas prices
  • Remix Solidity IDE, to write and run Smart Contracts


Laurent Senta

I wrote software for large distributed systems, web applications, and even robots. These days I focus on making developers, creators, and humans more productive through IPDX.co.