Analyzing the ERC20 Short Address Attack

Jun 25, 2017

Back in April of 2017, the Golem Project published a blog post about the discovery of a security bug affecting some exchanges such as Poloniex. According to the post, when certain exchanges processed transactions of ERC20 tokens, they did not validate the length of the account address. As a result, malformed input data was passed to the contract’s transfer function, and a subsequent underflow condition manipulated the amount being sent. The impact was that an attacker could potentially rob an exchange account of tokens.

The attack explained by the Golem Project exemplifies a rather unusual case, in which an exchange acts as both a client and a server. That is, the exchange is a server for users to buy tokens as well as a client to the Ethereum network. This differs from typical contract interaction, in which a client uses the Ethereum network directly and any transaction error would likely be the sole fault of the client and not a third party. Luckily for the Golem Project, the vulnerability is not known to have ever been exploited. It has since been dubbed the "ERC20 short address attack."

This got me curious about the feasibility of exploiting this vulnerability today. First and foremost, an attacker would need to manipulate the input data such that A) the provided address resolved to a valid account (ideally one owned by the attacker) and B) the amount specified was less than or equal to that of the exchange’s own supply of ERC20 tokens. How trivial would this be to exploit in the wild today? What can contract developers do to protect themselves?

Contract ABI And Input Data

First let’s set aside exploitability for a minute and explore how this vulnerability is even possible in the first place. Ethereum’s Contract ABI (application binary interface) allows clients to call a contract’s function. I think that the use of this binary protocol is best illustrated by an example.

Let’s say we have the following contract written in Solidity:

pragma solidity ^0.4.11;
 
contract MyToken {
	mapping (address => uint) balances;
 
	event Transfer(address indexed _from, address indexed _to, uint256 _value);
 
	function MyToken() {
		balances[tx.origin] = 10000;
	}
 
	function sendCoin(address to, uint amount) returns(bool sufficient) {
		if (balances[msg.sender] < amount) return false;
		balances[msg.sender] -= amount;
		balances[to] += amount;
		Transfer(msg.sender, to, amount);
		return true;
	}
 
	function getBalance(address addr) constant returns(uint) {
		return balances[addr];
	}
}

If I wanted to send coins to another address, I would create a transaction with input data that would look something like this (with added line breaks for clarity):

0x90b98a11
00000000000000000000000062bec9abe373123b9b635b75608f94eb8644163e
0000000000000000000000000000000000000000000000000000000000000002

Where:

0x90b98a11 is the method ID (4 bytes), which is the first four bytes of the Keccak-256 hash of the method signature, sendCoin(address,uint256).
00000000000000000000000062bec9abe373123b9b635b75608f94eb8644163e is the "to" address (20 bytes), left-padded to 32 bytes.
0000000000000000000000000000000000000000000000000000000000000002 is the "amount" unsigned integer (a uint256; the value 2 needs only 1 byte), left-padded to 32 bytes.
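To make the layout concrete, here is a minimal Python sketch that assembles the same input data by hand. The function name encode_send_coin is my own for illustration, and the method ID is copied from the example above rather than computed, since Python's hashlib.sha3_256 implements standardized SHA-3 and not the Keccak-256 variant that Ethereum uses.

```python
# Method ID copied from the example above (assumption: not recomputed here,
# because the standard library has no Keccak-256 implementation).
SEND_COIN_SELECTOR = bytes.fromhex("90b98a11")


def encode_send_coin(to_address: str, amount: int) -> bytes:
    """Build input data for sendCoin(address,uint256): method ID + two 32-byte words."""
    hex_part = to_address[2:] if to_address.startswith("0x") else to_address
    raw = bytes.fromhex(hex_part)
    if len(raw) != 20:
        raise ValueError("address must be exactly 20 bytes")
    address_word = raw.rjust(32, b"\x00")     # left-pad the address to 32 bytes
    amount_word = amount.to_bytes(32, "big")  # big-endian, left-padded to 32 bytes
    return SEND_COIN_SELECTOR + address_word + amount_word


data = encode_send_coin("0x62bec9abe373123b9b635b75608f94eb8644163e", 2)
print("0x" + data.hex())
print(len(data))  # 4 + 32 + 32 = 68 bytes
```

Note that this sketch refuses any address that is not exactly 20 bytes, which is precisely the check the affected exchanges were missing.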

Triggering an Underflow

Now what would happen if the client were to send something malformed? We could, for example, choose not to pad the address properly, so that the encoded address word is shorter than 32 bytes. The total input data would then be shorter than expected, creating a condition in which a data underflow may occur.

Let us suppose that we want to send some coins again to 0x62bec9abe373123b9b635b75608f94eb8644163e. However, this time we decide to drop the last byte in the address which is 3e. We end up with the following input data:

0x90b98a11
00000000000000000000000062bec9abe373123b9b635b75608f94eb86441600
00000000000000000000000000000000000000000000000000000000000002  
                                                              ^^
                                           Note the missing byte

This is where things begin to get interesting. How does Solidity/EVM handle underflows? The contract event generated by our transaction tells the answer:

Event name: Transfer
Return values:
- _from: 0x58bad47711113aea5bc5de02bce6dd7aae55cce5
- _to: 0x62bec9abe373123b9b635b75608f94eb86441600
- _value: 512

The missing byte has been replaced with 00. The EVM treats any read past the end of the input data as zeros, so every byte after the short address effectively shifted 8 bits to the left. As a result, the address had 00 appended to it (taken from the leading zeros of the following word), and the number of coins transferred became 512 (2 << 8 = 512).
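The shift can be reproduced off-chain with a small Python model. This is only a sketch of the decoding behavior described above, assuming that reads past the end of the input data return zero bytes; decode_send_coin is a hypothetical helper, not part of any Ethereum library.

```python
def decode_send_coin(calldata: bytes):
    """Model of argument decoding for sendCoin(address,uint256).

    Assumption: bytes read past the end of the input data come back as zeros.
    """
    padded = calldata + b"\x00" * max(0, 68 - len(calldata))
    to_word = padded[4:36]       # first 32-byte argument word
    amount_word = padded[36:68]  # second 32-byte argument word
    to_address = "0x" + to_word[12:].hex()  # an address is the low 20 bytes
    amount = int.from_bytes(amount_word, "big")
    return to_address, amount


selector = bytes.fromhex("90b98a11")
# The address from the example with its final byte (3e) dropped: 19 bytes.
short_address = bytes.fromhex("62bec9abe373123b9b635b75608f94eb864416")
# A careless client prepends the usual 12 zero bytes and appends the amount word.
calldata = selector + b"\x00" * 12 + short_address + (2).to_bytes(32, "big")

to_address, amount = decode_send_coin(calldata)
print(len(calldata))  # 67 bytes instead of the expected 68
print(to_address)     # 0x62bec9abe373123b9b635b75608f94eb86441600
print(amount)         # 512
```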

Exploitability

Let’s explore how feasible this vulnerability would be to exploit in the wild today. We can confirm that it would be easy enough for an attacker to use the address of a contract under their own control. This is because the only requirement is that the address ends in 00. We can also confirm that it would be quite feasible for an exchange account to have at least 512 ERC20 tokens. These two conditions suggest that an attack like this would be relatively easy to pull off even today.
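A rough way to size up both conditions is sketched below. It extrapolates from the single-byte example above and assumes that addresses are uniformly distributed and that each dropped trailing byte shifts the amount by a further 8 bits. Both helper names are my own.

```python
def inflated_amount(requested: int, dropped_bytes: int) -> int:
    """Amount the contract would see if that many trailing address bytes are missing."""
    return requested << (8 * dropped_bytes)


def expected_address_attempts(trailing_zero_bytes: int) -> int:
    """Average number of random addresses to try before one ends in that many 00 bytes.

    Assumption: addresses are uniformly distributed.
    """
    return 256 ** trailing_zero_bytes


print(inflated_amount(2, 1))         # 512, matching the example above
print(expected_address_attempts(1))  # 256
```

Under these assumptions, the cost of finding a suitable address grows at the same rate as the multiplier on the amount, so a single trailing 00 is by far the cheapest case for an attacker.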

However, an important requirement remains for a successful exploitation of this vulnerability. An exchange with a supply of ERC20 tokens must be neither validating user input nor padding properly when preparing transaction input data. If the exchange does pad the short address correctly, the address will be invalid but no underflow will occur. The result would likely be the transfer of tokens to an Ethereum account that would never exist due to an invalid address (recall here that an ERC20 token "transfer" is a change of state within the contract itself, not an actual transaction from an existing address to an invalid one).
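On the exchange side, the missing check is small. The following Python sketch shows one way a withdrawal form could reject short addresses before any input data is built. The function name is hypothetical, and it checks only length and hex characters, not whether the account is one the user actually controls.

```python
import re

# "0x" followed by exactly 40 hex characters (20 bytes).
_ADDRESS_PATTERN = re.compile(r"0x[0-9a-fA-F]{40}")


def is_well_formed_address(address: str) -> bool:
    """Return True only if the string is a full-length, hex-encoded address."""
    return _ADDRESS_PATTERN.fullmatch(address) is not None


print(is_well_formed_address("0x62bec9abe373123b9b635b75608f94eb8644163e"))  # True
print(is_well_formed_address("0x62bec9abe373123b9b635b75608f94eb864416"))    # False
```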

Mitigation

The community’s awareness of this vulnerability probably makes it less likely to be an exploitable bug today. However, terms like "probably" and "less likely" don’t always sit well in a threat model. Therefore, if you are thinking about handing over a supply of tokens to an exchange that you may not trust (put bluntly, should not trust), performing your own input validation within your ERC20 contract is generally a good idea.

Unfortunately, this is not as straightforward as simply checking the length of an address. By the time you are validating something inside a Solidity function body, it is too late: your data has already been decoded, and an underflow has already altered it. One solution, which others have pointed out, is to check the length of the raw input data received in the transaction:

assert(msg.data.length == size + 4);

Calculating the value of size should be easy, since you can expect every argument to be 32 bytes in length and you know how many arguments your function takes. Using this to validate that an underflow hasn’t occurred seems to work, although I haven’t formally verified it.
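The arithmetic behind that check can be modeled in a few lines of Python. This is a sketch of the length calculation only, not contract code. It assumes every argument is a statically sized type occupying one 32-byte word, and the helper names are my own.

```python
METHOD_ID_SIZE = 4  # bytes taken by the method ID
WORD_SIZE = 32      # bytes per argument (assumption: statically sized types only)


def expected_input_length(num_args: int) -> int:
    """Length the input data should have for a function with num_args arguments."""
    return METHOD_ID_SIZE + WORD_SIZE * num_args


def has_expected_length(calldata: bytes, num_args: int) -> bool:
    """Off-chain mirror of the check: msg.data.length == size + 4."""
    return len(calldata) == expected_input_length(num_args)


well_formed = bytes(68)  # method ID + two full words
short = bytes(67)        # one byte missing, as in the short address case

print(expected_input_length(2))            # 68
print(has_expected_length(well_formed, 2)) # True
print(has_expected_length(short, 2))       # False
```

For sendCoin, which takes two arguments, size is 64 and the check rejects the 67-byte input produced by a short address.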

Hopefully this issue will be properly addressed in an upcoming Solidity and/or EVM update.
