Machium Source Code
Let's build a debugger for iOS
by PsychoBird - December 9, 2021
My original plan was to ptrace() my way through this entire project and add some memory writing / reading after I was done.
Nothing is that easy though, and what's unfortunate is that I found that out wayyyyy too late into the process of making this project.
I planned on starting with ptrace(PTRACE_ATTACH) and working from that function to access registers, set breakpoints, and pause / continue the target process. My thinking process was quickly detoured and my dreams were squashed in the span of a few minutes due to an issue I found.
As one would, I went to include the ptrace() header in my project and noticed that the header didn't exist and ptrace() couldn't be found.
I struggled with this problem for a few minutes and then stumbled across a StackOverflow article that revealed the bad news to me.
Apple, in their infinite wisdom, decided to remove ptrace() from the iOS SDK. Yeah, seriously. I have absolutely zero idea why they would remove it, so instead of using that function, I had to use a bunch of random APIs to achieve just part of what was possible through ptrace().
The first thing we need is the "inheritance", or control abilities, that ptrace() gives the caller.
Fortunately, there is one function that allows us to take control of a remote task, and that's task_for_pid().
When I say "take control", what I really mean is that the function will give us a Mach send right to the task port, essentially allowing us to do whatever we want to the process.
The function is loosely defined as task_for_pid(mach_port_t owned_task, pid_t pid, mach_port_t* remote_task). (I couldn't find a formal definition, so I took some creative liberty with the types.)
Once the function is called (check Machium.c for an example), remote_task should contain a reference to the target task. remote_task is what we'll be using throughout the entire debugger.
For memory access, the functions to use are vm_read_overwrite() and vm_write().
These two functions cover memory read/write in a specific task. For example usage and how I was able to create a wrapper for them, check Memory.c.
Using vm_read_overwrite() was fairly easy, but vm_write() was a bit more difficult.
I needed to use the vm_region_64() function to grab the memory protection information of the memory page I would be writing to.
After that, I had to change the memory protection through vm_protect() to VM_PROT_READ | VM_PROT_WRITE | VM_PROT_COPY to write to memory, and then reinstate the stored protection.
ptrace() in the trash can.
task_threads() returns an array of threads for a given task (in this case our debug task).
Generally speaking, the first index in the array is the thread that is responsible for the task's main execution.
Since it is assumed the first thread is the most important one, it is what will be used for getting register states.
The function thread_get_state() returns a set of registers for a given thread.
It is important to note that the function input depends on your CPU architecture and desired register types, because the kernel won't just sort that out for you. What we'll want returned are the ARM64 register states, defined in the struct behind the typedef arm_thread_state64_t.
For the calling convention, here is an example ripped from the Machium project. The context behind the code is self-explanatory, but comments are in Register.c if needed.
thread_get_state(thread_list[0], ARM_THREAD_STATE64, (thread_state_t) &state, &state_count)
Once the call returns, the registers can be read from state. Just remember, however, that the contents of state are simply the register values at the time of the call and don't change in real time. The thread state / register struct looks like this:
_STRUCT_ARM_THREAD_STATE64
{
    __uint64_t __x[29];   /* General purpose registers x0-x28 */
    __uint64_t __fp;      /* Frame pointer x29 */
    __uint64_t __lr;      /* Link register x30 */
    __uint64_t __sp;      /* Stack pointer x31 */
    __uint64_t __pc;      /* Program counter */
    __uint32_t __cpsr;    /* Current program status register */
};
Source: https://opensource.apple.com/source/cctools/cctools-870/include/mach/arm/_structs.h
For example, a register can be printed with printf("0x%llx", state.__x[5]). Writing to the registers is easy too, and can be done with a plain = assignment.
After accessing and writing to the registers, thread_set_state() must be called for the change to take place. The calling convention is the same as thread_get_state(), except that the state count is passed by value rather than by pointer.
Around the calls to thread_get_state() and thread_set_state(), I call thread_suspend(thread_list[0]) beforehand and thread_resume(thread_list[0]) afterwards on the first index of the thread list, which in theory should always pause the main execution thread of the target task.
While the thread is paused, thread_get_state() returns the last register values of the target task and not debugger code.
Breakpoints and watchpoints go through the same two functions, but instead of the arm_thread_state64_t typedef'd struct and ARM_THREAD_STATE64, we use arm_debug_state64_t and ARM_DEBUG_STATE64. The struct is defined as follows:
_STRUCT_ARM_DEBUG_STATE64
{
    __uint64_t __bvr[16];
    __uint64_t __bcr[16];
    __uint64_t __wvr[16];
    __uint64_t __wcr[16];
    __uint64_t __mdscr_el1;   /* Bit 0 is SS (Hardware Single Step) */
};
Source: https://opensource.apple.com/source/cctools/cctools-870/include/mach/arm/_structs.h
__bvr[16] and __wvr[16] are the respective breakpoint / watchpoint value registers. In order to set them, the same process as above is followed.
The breakpoint registers should be set to an address in executable memory and the watchpoint registers should be set to an address in readable / writeable memory.
Once the address set in a breakpoint register is executed, a breakpoint exception will be raised and execution will be halted.
Similarly, once the memory address set in a watchpoint register is accessed or its content is changed, a watchpoint exception will be raised and execution will halt.
So what are the __bcr[16] and __wcr[16] variables used for then? Those are used for enabling / disabling breakpoints and watchpoints.
After some digging through the ARM Debug Manual I was able to find the exact bits I needed to flip in order to enable a breakpoint.
I won't bore you with the details, but know that setting an entry of __bcr[] or __wcr[] to 0x1E1 will enable the corresponding breakpoint or watchpoint.
If you want to disable a breakpoint or watchpoint, set its control register back to 0. Just know that each value in the array corresponds with its respective break/watchpoint, so for example __bcr[3] must be set to the enable value in order for __bvr[3] to work.
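For anyone curious about what 0x1E1 contains, here is a small sketch that pairs a value register with its control register and decodes the breakpoint control value. The breakpoint_regs struct is a hypothetical stand-in for the __bvr / __bcr arrays of arm_debug_state64_t so the example builds anywhere, and the field positions (E at bit 0, PMC at bits 2:1, BAS at bits 8:5) are my reading of the Arm architecture manual's breakpoint control register layout. The watchpoint control register is laid out differently and is not decoded here.

```c
#include <stdbool.h>
#include <stdint.h>

#define BCR_ENABLE_VALUE 0x1E1u   /* enable value quoted in the text */

/* Hypothetical stand-in for the __bvr / __bcr arrays of arm_debug_state64_t. */
typedef struct {
    uint64_t bvr[16];   /* breakpoint value registers (addresses) */
    uint64_t bcr[16];   /* breakpoint control registers */
} breakpoint_regs;

typedef struct {
    bool     enabled;   /* E,   bit 0 */
    unsigned pmc;       /* PMC, bits 2:1 (privilege mode control) */
    unsigned bas;       /* BAS, bits 8:5 (byte address select) */
} bcr_fields;

/* Split a breakpoint control value into the fields listed above. */
bcr_fields decode_bcr(uint64_t bcr)
{
    bcr_fields f;
    f.enabled = (bcr & 0x1u) != 0;
    f.pmc = (unsigned)((bcr >> 1) & 0x3u);
    f.bas = (unsigned)((bcr >> 5) & 0xFu);
    return f;
}

/* Slot n of bvr only takes effect when slot n of bcr holds the enable value. */
bool enable_breakpoint(breakpoint_regs *regs, unsigned slot, uint64_t address)
{
    if (slot >= 16)
        return false;
    regs->bvr[slot] = address;
    regs->bcr[slot] = BCR_ENABLE_VALUE;
    return true;
}

/* Writing 0 to the control register switches the breakpoint off again. */
bool disable_breakpoint(breakpoint_regs *regs, unsigned slot)
{
    if (slot >= 16)
        return false;
    regs->bcr[slot] = 0;
    return true;
}
```

Decoding 0x1E1 this way gives the enable bit set and all four byte-select bits set, with the privilege field left at zero. In a real debugger the same assignments would be made on an arm_debug_state64_t fetched with thread_get_state() and written back with thread_set_state().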
Left unhandled, a breakpoint or watchpoint exception will crash the target process. It can be handled with either signal() or task_set_exception_ports().
The easiest way of solving this problem is using signal(), but it requires one extra step. Compiling and linking a dynamic library to the target process that uses __attribute__((constructor)) to call signal(SIGTRAP, handler) before the main program runs will prevent crashing upon hitting a break/watchpoint, but that's sloppy.
The other solution of using task_set_exception_ports() is much more elegant.
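For reference, the signal() route boils down to something like the sketch below, built as a dynamic library and loaded into the target. The names trap_handler, trap_count and install_trap_handler are hypothetical, and what the handler should actually do once a breakpoint fires is left open here.

```c
#include <signal.h>

/* Counts how many SIGTRAPs were caught; sig_atomic_t is safe to touch
 * from inside a signal handler. */
volatile sig_atomic_t trap_count = 0;

/* Hypothetical handler: swallow the trap instead of letting it kill us. */
static void trap_handler(int signo)
{
    (void)signo;
    trap_count++;
}

/* Constructors run when the library is loaded, before main() starts. */
__attribute__((constructor))
static void install_trap_handler(void)
{
    signal(SIGTRAP, trap_handler);
}
```

Because the handler is installed from a constructor, the target program itself needs no changes. The library just has to be loaded into it.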
To use task_set_exception_ports(), we start by allocating a Mach port for the remote task using mach_port_allocate(). We give the Mach port that we created a receive right (MACH_PORT_RIGHT_RECEIVE) so it's able to catch the breakpoint / watchpoint exception.
Next, we use mach_port_construct() to construct a Mach port with a send right (MPO_INSERT_SEND_RIGHT) so we can build our exception server that will receive the Mach message containing the exception.
Lastly, we register the exception port using task_set_exception_ports() and set the exception mask to EXC_MASK_BREAKPOINT.
We're able to register the port on the remote task because we hold a send right to its task port thanks to task_for_pid()!
For context and more detailed code, the function start_exception_server(Machium* machium) in Breakpoint.c shows this code to completion.
Pausing and continuing the whole target process is done with task_suspend(machium->debug_task) and task_resume(machium->debug_task).
It's trivial so I didn't mention it earlier, but since pausing and resuming is a core part of a debugger I figured I would include it here at the end.
To sign the binary correctly, you'll need ldid and this entitlements XML file.
Entitlements such as task_for_pid-allow are needed for debugging remote processes, unlocking task_for_pid() usage.
Almost all of Machium can be slightly modified and inserted into a dynamic library to run under a self task context, but that's not as useful.