Hacking Blind Revisited

This post is a report of my study of the excellent Hacking Blind paper as well as a write up for the Pinata CTF challenge which I prepared for JustCTF 2020 (moved to January 2021). The paper is from 2014 and my goal was to evaluate how the blind ROP attack works nowadays. The idea was to go through all the exploitation steps described in the paper, see if they were not mitigated by ever-evolving security protections and prepare a vulnerable target which can be realiably exploited as a CTF challenge.

Paper study #

I highly recommend reading the paper as it’s really well written but I’ll give a high-level summary.

The goal is to gain code execution on a remote service, without having access to the target binary, by observing side effects from requests sent to the service. There are two basic requirements for this attack to succeed:

The service is architected as a forking server. This means that requests are handled in newly forked processes or are handed to a set of pre-forked workers.
The service has a stack buffer overflow vulnerability which the attacker can trigger by sending a request.

The attack assumes that common mitigations are present and can work around them, specifically the stack canaries and ASLR. This is why forking server requirement is important as neither canary nor address space are re-randomized on fork which allows to incrementally gain information about the binary.

At a very high level, the attack is divided into blind and non-blind part. The blind part is the most difficult and ends when we are able to leak memory at the given address. From this point we have virtually full knowledge of the binary and the running program and can follow with regular exploitation.

The attack roughly consists of these steps (for x86-64 architecture):

Discover the vulnerability, the connection will terminate in an unusual way because we triggered server process crash.
Perform “stack reading” that leaks the stack canary and a return address (or an address close to it) that allows to probe the code around it, effectively defeating the ASLR for this part of the address space.
Find a primitive called a stop gadget. This is an address that when jumped to, will inform us that the program has not crashed. The behavior observed from the attckers perspective must be different from the crash in step 1.
Find the ROP gadgets to set arguments to write syscall. This often boils down to finding the characteristic “BROP gadget” that allows us to set rsi and rdi (first and second arguments). Gadgets setting rdi (third argument) are not common and it’s more practical to use a side effect from calling functions such as strcmp that will set rdi to some value. Luckily, we don’t need an exact value for rdi, it just needs to be positive as write’s third argument is length.
Find the Procedure Linkage Table (PLT).
Find the address of write to leak memory and strcmp to set rdi (third argument) of write.
Blind part ends here, proceed with whatever is needed to gain code execution on the server.

Target dummy #

I picked nginx for the target and added a custom module written in C to introduce a vulnerability. The module has a simple stack buffer overflow vulnerability where the buffer is controlled by an attacker.

A convenient way to give unconstrained control of the buffer is to decode arbitrary bytes from a base64-encoded HTTP header value. The decode_credentials function is called on the value of the Authorization request header.

static int __attribute__ ((noinline)) decode_credentials(char* str) {
  ngx_str_t src;
  src.data = (unsigned char*) str;
  src.len = strlen(str);
  unsigned char buf[16] = {0};
  ngx_str_t dst;
  dst.data = buf;
  dst.len = 0;

  ngx_decode_base64(&dst, &src); // boom

  src.data = 0; // prevent inf loops when jumping back

  return dst.len;
}

When passing more than 16 bytes, ngx_decode_base64 will overwrite memory on stack. From this point, the exploitation appeared straightforward given clear instructions from the paper and the accompanying “Braille” exploit¹. As it quickly turned out, it was not so easy.

The “Braille” exploit is advertised as follows:

A fully automated tool that conducts a BROP attack (from crash to remote shell) when supplied with an input string that crashes a server due to a stack overflow.

Overall, I found the code very helpful, however I couldn’t make it work as advertised. I also doubt it would work with modern binaries, as PLT finding is done with an optimization (checking addr + 6):

# http://www.scs.stanford.edu/brop/braille.rb
def try_plt(addr, inf = @inf)
  rop = []
  rop << addr
  rop << (addr + 6)
  rop << inf
  rop << inf

  r = try_rop_print(addr, rop)

  return false if r != RC_INF

  rop = []
  rop << addr
  rop << (addr + 6)
  rop << DEATH

  r = try_rop_print(addr, rop)
  return false if r != RC_CRASH

  return true
end

As I discovered, this that doesn’t work with full RELRO² (more on that later).

Some exploitation steps required tweaks but worked in the end. The hardest parts turned out to be dealing with hangs and finding a reliable stop gadget.

Dealing with hangs and infinite loops #

While guessing bytes of the return address, I was already getting stuck in infinite loops or long hangs. One of such places was my custom module code, which I managed to get rid of with the src.data = 0; in the first listing. It’s worth mentioning here that nginx (unlike Apache Server) is not forking per new connection and it reuses a configured number of worker processes. This means the exploitation is harder because it’s easy to block all workers and nginx has to be restarted.

While it is possible to go around it for example by remembering “bad bytes” from previous trials, I found it to be quite annoying myself and imagined it will be frustrating for future CTF players. At this point, I also considered switching to Apache Server though after giving it some thought it wouldn’t really improve the situation as the number of forked connection would grow with time and having many processes running in infinite loops would require complex monitoring to not bring down the competition’s infrastructure.

Having all those constraints in mind, I decided to introduce a 1 second timeout for worker processes upon entering request handler for the module. If a worker goes into an infinite loop, it will be killed after 1 second and nginx master process will spawn a fresh one.

Finding a stop gadget #

Stop gadget is an address used to discover other useful places in the binary such as ROP gadgets or PLT. When the program jumps to it, we should be able to recognize that and differentiate from a regular crash. This could be something as straightforward as sending different data to the socket or a side effect such as a connection hang. Infinite loops and hangs are troublesome stop gadgets as pointed out in the previous section.

The Pinata Challenge #

The challenge obviously does not include the binary and the intended solution requires using the techniques from the paper. The only given data is an URL. Blind does not mean random guessing! There are some estimations to make, however, the challenge can be solved in a systematic way, incrementally learning more about the binary.

Vulnerability discovery #

The module performs basic authentication, after vising the page, it will prompt for a username and password.

Basic auth login - challenge entry point

Trying out some longer credentials will crash the worker process due to overwriting the stack canary. The socket will be closed without sending any data. The proxy will return 502 in this case.

At this point it’s good to prepare a function which will reproduce the crash and will allow us to precisely control the payload:

from pwn import *
from base64 import b64encode

HOST = b"..."
PORT = 80

def request(payload):
    with remote(HOST, PORT) as s:
        req  = b"GET / HTTP/1.1\r\n"
        req += (b"Host: %b\r\n" % HOST)
        req += (b"Authorization: Basic %s\r\n" % b64encode(payload))
        req += b"\r\n"
        print(req)

        s.send(req)

        try:
            resp = s.recv(1024, timeout=5)
        except Exception as e:
            print(e)
            return None

        print(resp)

    if  b"502" in resp or resp == '':
        return None # crashed
    else:
        return resp

The exact buffer size can be determined manually by using the auth form though it’s more practical to automate that. We can start with a single byte and add one by one until we get the crash:

MAX_BUFFER_SIZE = 1024

def detect_buffer_size():
    for i in range(1, MAX_BUFFER_SIZE):
        if not request(b"A" * i):
            return i - 1
    sys.exit("couldn't detect buffer size")

The buffer holding decoded base64 data from the Authorization header is only 16 bytes. There is also another word on the stack after the buffer, that’s why it appears as buffer is larger (24 bytes). This doesn’t really influence the attack in any meaningful way. Here we can begin the stack reading phase which signals that the BROP environment exists.

Stack reading phase #

The only part of the challenge which is randomized is the stack reading phase. In case of Pinata it’s stack canary and return address (7 + 5 bytes). Rest of the offsets can be hardcoded relative to the return address while incrementally exploring the binary for interesting places. I call it return address but stack reading can in fact return other addresses from the binary, in our case it’s also a stop gadget. The vulnerable function is compiled in a way that the word after canary doesn’t matter and can be all zero. This is convenient as the value is not requred for the task and it’s less requests to complete this step. So we get 3 words, first - canary, second - doesn’t matter, can be all zero, third - an address from the binary (stop gadget).

Stop gadget #

The challenge was specifically designed to encounter a “perfect” stop gadget very early (when going byte by byte ascendingly) while reading the return address form the stack. I call it perfect, because it’s a write straight to the socket without any additinal conditions/constraints. It’s not the original return address but that doesn’t matter. Good stop gadgets are hard to find and require a lot of trials, I wanted to spare this part.

Obviously, we don’t know that right away and might look for other addresses but it’s a strong sign and it’s worth to verify by trying to find some ROP gadgets by using it.

BROP gadget #

You can use the stop gadget to find the BROP gadget. It’s good to collect a few to avoid false positives. Later, we can loop through candidates and based on the behavior we can hardcode it’s offset for the rest of the challenge. BROP gadget is used to control rdi and rsi registers (first two arguments).

Finding PLT #

While finding the PLT is well described in the paper, there is a slight difference. The method of verifying that slow path does not crash at offsets +6 from a PLT entry does not work as Full RELRO is turned on, so we can’t rely on that. I guess it was not that common at the time of writing the paper. The pattern with just checking if a few subsequent PLT entries do not crash works well here, 3 should do the job. The binary is fairly large so skipping PLT entries is a must but it requires just one good hit. On the way, we will see that some addresses are executing (this is a side effect of verifying PLT address), we can use this information to estimate binary size and start from this offset next time as PLT is towards the beginning of the binary. Once we land on a promising address we can explore more from there, PLT is quite characteristic as subsequent entries do not crash and there is no other place like that in the program.

We could also grab here an nginx binary to do some estimations about the binary size and number of PLT entries which won’t be accurrate but can at give some perspective.

Finding strcmp #

After we know an estimated address of PLT we can start to look for strcmp, by iterating over entries using the pattern is described in the paper. The trick with using the PLT slow path won’t work due to Full RELRO hence the iteration. There are a few candidates similarly to the BROP gadget so it’s good to collect them all and later pick the winner. The address of strcmp is required to control rdx (third parameter to write).

Finding write #

To find write, we also iterate on PLT entries like with strcmp. The fd number is 3. It’s good to try all the strcmp candidates here. Eventually a leak from the binary will happen.

Dumping the binary #

It’s good to adjust arguments to strcmp to maximize rdx which will give us more leaked bytes per request.

Finalizing the attack #

Now we can perform a regular ROP attack by searching gadgets in the dumped binary and take control over the server. For example, load the /flag.txt contents into memory and write it to the socket or launch a reverse shell.

Congratulations to the Dice Gang and Kalmarunionen teams for solving it!