Parse bytes argument in Solidity
January 3, 2019

The bytes type in Solidity is a dynamically-sized byte array, so it can contain any number of bytes. That makes it ideal for interfaces that delegate work to other contracts and want to stay generic: all function arguments (and return values) can be encapsulated in a single bytes argument. Examples are ERC223's transfer function, ERC677's transferAndCall and onTokenTransfer functions, and ERC777's send, tokensToSend and tokensReceived functions.

interface ERC223Token {
    function transfer(address to, uint value, bytes data);
}

interface ERC677Token {
    function transferAndCall(address receiver, uint amount, bytes data) returns (bool success);
}
interface ERC677Receiver {
    function onTokenTransfer(address from, uint256 amount, bytes data) returns (bool success);
}

interface ERC777Token {
    function send(address to, uint256 amount, bytes data) external;
}
interface ERC777TokensRecipient {
    function tokensReceived(address operator, address from, address to, uint256 amount, bytes data, bytes operatorData) external;
}
interface ERC777TokensSender {
    function tokensToSend(address operator, address from, address to, uint amount, bytes userData, bytes operatorData) external;
}

With a fixed-size byte array such as bytes32, some direct conversions are possible. For example, when the 32 bytes in a bytes32 represent an integer, they can be parsed with a simple type conversion:

function decodeBytes32ToUint256(bytes32 data) public {
    uint256 parsed = uint256(data);
}

Because the bytes type has no fixed size, such a conversion is not available. This article gives an overview of how to parse the values that were concatenated into one generic bytes instance back into the types they had before concatenation.

We will use Solidity's inline assembly to accomplish this. When the bytes value is exactly 32 bytes long, this is very easy:

function parse32BytesToUint256(bytes data) public {
    uint256 parsed;
    assembly {parsed := mload(add(data, 32))}
}

function parse32BytesToAddress(bytes data) public {
    address parsed;
    assembly {parsed := mload(add(data, 32))}
}

function parse32BytesToBool(bytes data) public {
    bool parsed;
    assembly {parsed := mload(add(data, 32))}
}

function parse32BytesToBytes32(bytes data) public {
    bytes32 parsed;
    assembly {parsed := mload(add(data, 32))}
}

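The previous examples all work the same way. First declare a local variable of the target type (uint256, address, bool, ...). Value types like these live on the stack, so no memory is allocated for them. Then load 32 bytes from the input data and assign them to that variable. The assembly assignment does not validate or truncate the word, so for types narrower than 32 bytes, such as address and bool, the unused high-order bytes of the word are expected to be zero.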

In memory, the first 32 bytes of a bytes value hold its length, and the content follows directly after. The variable data is a pointer to the start of that length word, so add(data, 32) adds 32 to the pointer, which essentially means: skip the length. The mload operation reads one 32-byte word at a time, so mload(add(data, 32)) means: read bytes 32 to 63, counted from the start of the length word.
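The layout described above can be made visible with a small sketch. It is written for Solidity 0.8, which is why the argument carries an explicit memory keyword, unlike the older listings in this article. The contract and function names are made up for illustration.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical helper, only meant to illustrate the memory layout of bytes.
contract BytesLayout {
    // Returns the length word and the first 32-byte content word of `data`.
    function lengthAndFirstWord(bytes memory data)
        public
        pure
        returns (uint256 len, bytes32 firstWord)
    {
        require(data.length >= 32, "data shorter than 32 bytes");
        assembly {
            len := mload(data)                // bytes 0 to 31: the length
            firstWord := mload(add(data, 32)) // bytes 32 to 63: the content
        }
    }
}
```

The value returned in len is the same number that data.length reports, which confirms that the pointer refers to the length word rather than to the content.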

The exact same reasoning applies to concatenated values. When our bytes are 64 bytes long because they are a concatenation of two uint256 integers, we can extract the integers by reading bytes 32 to 63 and bytes 64 to 95:

function parse64BytesToTwoUint256(bytes data) public {
    uint256 parsed1;
    uint256 parsed2;
    assembly {
	    parsed1 := mload(add(data, 32))
	    parsed2 := mload(add(data, 64))
    }
}

Because the mload operation works with 32-byte words, this approach works well for bytes whose length is a multiple of 32.
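For data whose 32-byte words follow the standard ABI encoding, newer compilers offer a built-in alternative to hand-written assembly. The sketch below assumes Solidity 0.5.0 or later, where abi.decode is available, and compares it with the mload approach for two uint256 values. The contract and function names are illustrative.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical comparison of the assembly approach with the built-in abi.decode.
contract DecodeTwoWords {
    // Same technique as in the article: one mload per 32-byte word.
    function viaAssembly(bytes memory data) public pure returns (uint256 a, uint256 b) {
        require(data.length == 64, "expected 64 bytes");
        assembly {
            a := mload(add(data, 32))
            b := mload(add(data, 64))
        }
    }

    // Built-in decoding; reverts if `data` is too short for the requested types.
    function viaAbiDecode(bytes memory data) public pure returns (uint256 a, uint256 b) {
        (a, b) = abi.decode(data, (uint256, uint256));
    }
}
```

Both functions return the same pair for input produced by abi.encode(a, b). The built-in variant also validates the input length, which the assembly variant has to do by hand.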

But what if our concatenated bytes represent a uint256 integer (32 bytes long) and an address (20 bytes long)? This can still be solved by iterating byte by byte over the 20 address bytes:

function parse52BytesToUint256AndAddress(bytes data) public {
    uint256 parsed;
    address parsedAddress;
    uint160 addressAsInt = 0;
    uint8 addressByte = 0;

    assembly {
        parsed := mload(add(data, 32))
    }

    for (uint8 i = 32; i < 52; i++) {
        addressAsInt *= 256;
        addressByte = uint8(data[i]);
        addressAsInt += (addressByte);
    }
    parsedAddress = address(addressAsInt);
}

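Solutions like this are not very gas-efficient. Better alternatives exist, such as the memcpy function in Arachnid's solidity-stringutils library. Nevertheless, it is generally more efficient to keep the length of your bytes a multiple of 32 so that every value can be read with a single mload. In the previous example, this would mean padding the 20 bytes of the address to 32 by prepending 12 zero bytes, which is also how the standard ABI encoding lays out an address. Zero bytes are preferable to arbitrary ones because assigning the loaded word to an address variable in assembly does not clear its upper 12 bytes.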
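When the 52-byte layout cannot be changed, the loop can also be replaced by a single mload at a shifted offset. This is a sketch of that trick and not part of the original example: the word loaded at offset 52 ends exactly where the address ends, so the address sits in its low 20 bytes and a mask removes the rest. It is written for Solidity 0.8, and the names are illustrative. Input of this shape is what abi.encodePacked(uint256, address) produces.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical loop-free variant of parse52BytesToUint256AndAddress.
contract ParsePacked {
    function parse52Bytes(bytes memory data)
        public
        pure
        returns (uint256 value, address account)
    {
        require(data.length == 52, "expected 52 bytes");
        assembly {
            // Content bytes 0 to 31: the uint256.
            value := mload(add(data, 32))
            // The word at offset 52 spans content bytes 20 to 51. The address
            // occupies its low 20 bytes; the mask clears the 12 bytes above it,
            // which still belong to the uint256.
            account := and(
                mload(add(data, 52)),
                0xffffffffffffffffffffffffffffffffffffffff
            )
        }
    }
}
```

The mask matters here: without it, the upper bytes of the address variable would contain leftovers of the preceding integer.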

Be aware that only a limited number of local variables fit on the stack: the EVM can reach just the top 16 stack slots, and function arguments count toward that limit. The following example fails to compile with CompilerError: Stack too deep, try removing local variables.

function parse352Bytes(address _operator, address _from, address _to, uint256 _amount, bytes _data, bytes _operatorData) public {
    uint256 d1;
    uint256 d2;
    uint256 d3;
    uint256 d4;
    uint256 d5;
    uint256 d6;
    uint256 d7;
    uint256 d8;
    uint256 d9;
    uint256 d10;
    uint256 d11;
    assembly {
        d1 := mload(add(_data, 32))
        d2 := mload(add(_data, 64))
        d3 := mload(add(_data, 96))
        d4 := mload(add(_data, 128))
        d5 := mload(add(_data, 160))
        d6 := mload(add(_data, 192))
        d7 := mload(add(_data, 224))
        d8 := mload(add(_data, 256))
        d9 := mload(add(_data, 288))
        d10 := mload(add(_data, 320))
        d11 := mload(add(_data, 352))
    }
}

The CompilerError: Stack too deep, try removing local variables. error can be circumvented by using an array, because its elements live in memory and only the pointer to it occupies a stack slot. For example, 13 words can be parsed in a loop:

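function parseAsArray(address _operator, address _from, address _to, uint256 _amount, bytes _data, bytes _operatorData) public {
    // public rather than external, so that _data is copied into memory where mload can reach it
    uint256[] memory parsed = new uint256[](13);
    // offsets 32, 64, ..., 416 address the 13 words that follow the length word in both arrays
    for (uint256 i = 32; i <= 416; i += 32) {
        assembly {mstore(add(parsed, i), mload(add(_data, i)))}
    }
}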
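The hard-coded count of 13 can also be derived from the length of the input. The following is a minimal sketch under the assumption that the data consists only of whole 32-byte words. It targets Solidity 0.8, and the contract and function names are illustrative.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical generalisation of parseAsArray to any number of words.
contract ParseWords {
    function parseWords(bytes memory data)
        public
        pure
        returns (uint256[] memory parsed)
    {
        require(data.length % 32 == 0, "length not a multiple of 32");
        uint256 count = data.length / 32;
        parsed = new uint256[](count);
        for (uint256 i = 0; i < count; i++) {
            // Both arrays start with a 32-byte length word, so the same
            // offset addresses word i in the source and in the target.
            uint256 offset = 32 * (i + 1);
            assembly {
                mstore(add(parsed, offset), mload(add(data, offset)))
            }
        }
    }
}
```

Because the array lives in memory, the function uses only a handful of stack slots no matter how many words are parsed.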

I'll conclude this article with an example of a function that can receive ERC777 tokens. The function tokensReceived takes a bytes argument, and I'll read its first byte as an integer, so a value between 0 and 255. That byte acts as the selector, or action, that determines which function of the ERC777TokensRecipient contract was intended to be called.

function tokensReceived(address _operator, address _from, address _to, uint256 _amount, bytes _data, bytes _operatorData) public {
    // public rather than external, so that _data is copied into memory where mload can reach it
    require(_data.length > 0, "Missing action byte");
    uint8 action;
    // Bytes 0 to 31 hold the length of _data. The word at offset 1 spans bytes 1 to 32,
    // so the first content byte (byte 32) is its lowest-order byte; the mask drops the rest.
    assembly {action := and(mload(add(_data, 1)), 0xff)}
    if (action == 1) {
        myFunction(_data);
    } else if (action == 2) {
        anotherFunction(_data);
    } else {
        revert("Invalid function");
    }
}

function myFunction(bytes _data) public {
    require(_data.length == 33, "Invalid input for function myFunction");
    // parse the 32 required bytes by skipping the first action byte
}

function anotherFunction(bytes _data) public {
    require(_data.length == 65, "Invalid input for function anotherFunction");
    // parse the 64 required bytes by skipping the first action byte
}
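The parsing step that the comments in myFunction leave open could look like the sketch below. It assumes the 32 bytes after the action byte encode a single uint256, which the article does not specify. It is written for Solidity 0.8, and the names are illustrative.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical parser for a 33-byte payload: 1 action byte + 1 uint256.
contract ActionPayload {
    function parseAfterAction(bytes memory data)
        public
        pure
        returns (uint8 action, uint256 value)
    {
        require(data.length == 33, "expected 1 + 32 bytes");
        assembly {
            // The word at offset 1 ends at the first content byte; keep only it.
            action := and(mload(add(data, 1)), 0xff)
            // The word at offset 33 skips the length word and the action byte.
            value := mload(add(data, 33))
        }
    }
}
```

For input built with abi.encodePacked(uint8(1), uint256(42)), this returns action 1 and value 42. For anotherFunction, a second word would be read the same way at offset 65.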