Getting started
Welcome to Maat's hands on tutorial! In this first tutorial we will use the Python API and learn to:
Have fun!
Hands on
Let's start this tutorial series by introducing Maat's most important object: the
MaatEngine
.
A MaatEngine
is designed to symbolically execute code expected to run inside a
single-threaded process.
It exposes various methods and attributes that enable to load a binary, execute code,
instrument the execution, take snapshots, gather path constraints, and so on. Today we will
focus on the essential features that are used to execute simple programs,
and interact with CPU registers and memory.
First things first, let's create an engine. To do that, we need to specify which architecture and
operating system
we want to emulate (see ARCH
and OS
for available architectures and systems).
For this tutorial we will use 64-bit Linux:
from maat import *
m = MaatEngine(ARCH.X64, OS.LINUX)
The next thing we want to do is symbolically execute some code. While the API allows to arbitrarily map
code in memory and execute it, Maat has a loader interface that can properly load an executable file.
For this tutorial we are going to
work on a simple ELF binary compiled for X64: print.bin
(download it here).
It's source code is shown below:
#include <stdio.h>
int main(int argc, char** argv)
{
printf("Argument starts with: '%c'\n", argv[1][0]);
return 42;
}
The program is fairly simple, all it does is print the first
character of the first argument in the command line, and then
exit with status 42
.
In order to load print.bin
in Maat, we will use the
engine's load()
method as follows:
m.load("print.bin", BIN.ELF64, args=[b'hello'], load_interp=False)
In the parameters to load()
, we specify the path of the binary to load,
the type of binary (ELF64
), and the command line arguments ('hello'
).
The load_interp
parameter is a bit special and enables to select between two very distinct ways Maat can
load binaries:
- when
load_interp
isTrue
(which is the default), Maat will try to find the interpreter requested by the program, load it, and then symbolically execute the interpreter code to load the program and its dependencies. This ensures that the binary and its dependencies are properly loaded and initialised. The drawback is that the user must provide the binaries for the interpreter and the shared libraries. It also induces runtime overhead, since the interpreter code must be executed before reaching the actual program - when
load_interp
isFalse
, Maat will manually load the program and its dependencies, and perform the relocations. If Maat can't find binaries for the dependencies, Maat will try to emulate missing functions with user-defined callbacks. Emulating imported functions can work for some simple libc functions, but will quickly become blocking on big real-world programs. Therefore this method should be reserved for very simple binaries (likeprint.bin
used in this tutorial), or when shared libraries can't be provided
There are other parameters to load()
to specify environment variables, directories where to look
for shared libraries, the base address of the program, etc. If you are interested, have a look at the
documentation.
Run a binary
Now that the binary is loaded, we can call the run()
method to
start the execution.
By default, run()
will execute instructions until the program exits or until an error
occurs. However it is possible to limit the number of instructions to execute with: run(nb_instr)
.
Maat also has a run_from(addr)
method, which sets the instruction pointer to an
arbitrary address and calls run()
from there.
To summarize:
m.run() # Run from current position
m.run(100) # Run 100 instructions from current position
m.run_from(0x1234) # Run starting from address 0x1234
m.run_from(0x1234, 100) # Run 100 instructions from address 0x1234
Our script to load and run print.bin
becomes:
# run.py
from maat import *
m = MaatEngine(ARCH.X64, OS.LINUX)
m.load("print.bin", BIN.ELF64, args=[b'hello'], load_interp=False)
m.run()
Let's run it!
$ python3 -i run.py
Argument starts with: 'h'
>>>
Great! The binary was properly executed. When the engine stops executing code,
we can look at the info
attribute. It always gives the reason why the engine stopped.
Depending on the context it can also contain a lot of additional information (address of the current instruction,
register and memory accesses, path constraint, branch conditions and target(s), etc).
We will learn how and when all this information can be used in later tutorials, but you can always have
a glance at
Info
in the documentation for more details.
In our case we have:
>>> m.info
Stop: program exited # Stop reason
Status: 0x2a # Exit status (0x2a = 42)
Info fields can of course be accessed separately:
>>> m.info.stop
>>> m.info.exit_status
...
The MaatEngine
also has a settings
attribute that can control
many aspects of the engine behaviour during execution. The various available settings
are documented here and will for the most
part be covered in later tutorials. As an example, we can enable the log_insts
setting
to have the engine display every instruction that is executed. Simply add the following statement
before calling run()
:
m.settings.log_insts = True
And the output of our script becomes:
$ python3 -i run.py
[Info] Run 0x540: XOR EBP,EBP
[Info] Run 0x542: MOV R9,RDX
[Info] Run 0x545: POP RSI
[Info] Run 0x546: MOV RDX,RSP
[Info] Run 0x549: AND RSP,-0x10
[Info] Run 0x54d: PUSH RAX
[Info] Run 0x54e: PUSH RSP
[Info] Run 0x54f: LEA R8,[0x700]
[Info] Run 0x556: LEA RCX,[0x690]
[Info] Run 0x55d: LEA RDI,[0x64a]
...
Play with CPU registers
Now that we have learned how to load and run binaries, let's move on
and see how we can interact with the engine's CPU.
The registers can be read and written through cpu
attribute:
>>> m.cpu.rax = 0x1234
>>> m.cpu.rax
0x1234
Please note that Maat doesn't use regular integers to represent
register values, but instances of Value
,
which can be either concrete values or abstract expressions:
>>> type(m.cpu.rax)
<class 'Value'>
The simplest values that we can create are constants
(2, 4, 0x20, ...
) and
abstract variables which are basically just named bitvectors with no concrete value (x, y, esp_0, ...
).
In the present tutorial we will only use concrete values built from constants.
Usage of abstract variables will be introduced in later tutorials.
Constant values can be created by calling Cst()
:
x = Cst(32, 1) # Constant 1 on 32 bits
y = Cst(64, -78) # Constant -78 on 64 bits
z = Cst(27, 0x45) # Constant 0x45 on 27 bits
Values always represent a bitvector value on a fixed number of bits.
The size of an value is accessible through the size
attribute:
>>> x = Cst(32, 1)
>>> x.size
32
For convenience Value
objects
can be combined with both Value
and regular integers to create new values:
>>> rax = m.cpu.rax
>>> rax
0x1234
>>> type(rax)
<class 'Value'>
>>> x = rax + 0xff
>>> x
(0xff+0x1234)
>>> type(x)
<class 'Value'>
All common arithmetic and logic operations on bitvectors are suppported. They are for the most part implemented using the built-in python operators:
x = Cst(32, 0xcafe)
y = Cst(32, 0xbabe)
x + y # Addition
x - y # Subtraction
x * y # Multiplication
x / y # Division
x & y # Logical AND
x | y # Logical OR
x ^ y # Logical XOR
x >> y # Shift right
x << y # Shift left
x % y # Remainder
-x # Unary minus
~x # Logical NOT
Extract(x, 10, 2) # Extract bitfield x<10:2>
Concat(x, y) # Concatenate x and y (x in higher significant bits)
Last but not least, Value
s can be converted back to regular python
integers thanks to the as_uint()
and as_int()
methods. They concretize the value, and interpret it respectively
as an unsigned and a signed value:
>>> x = Cst(32, -67)
>>> x.as_int()
-67
>>> x.as_uint()
4294967229
Play with memory
Memory can be accessed through the mem
attribute which
is a Memory
instance.
The map()
method can be used to map memory regions and choose which of
RWX
permissions to give. When loading a binary Maat automatically creates
the mappings for the process's memory. However, if we don't use load()
to initialise the engine, we'll need to explicitely call map()
before
reading or writing to memory regions.
# Map 0x0 - 0xfff with RW perms
m.mem.map(0x0, 0xfff, PERM.RW)
The write()
method can be used to write values, integers,
and byte buffers in memory:
# Write the constant 10 on 4 bytes at address 0x100
m.mem.write(0x100, 10, 4)
# Write a concrete buffer at address 0x200
buf = b'I love pineapples'
m.mem.write(0x200, buf, len(buf))
# Write the current value of RAX at address pointed by RSP
m.mem.write(m.cpu.rsp, m.cpu.rax)
It is also possible to read memory with the read()
method (it always
returns Value
instances):
>>> m.mem.write(0x100, 0x12345678, 4)
>>> m.mem.read(0x100, 4)
0x12345678
>>> m.mem.read(0x100, 2)
0x5678
The read_buffer()
and read_str()
methods can be used
to read a buffer (with a fixed number of elements and element size) or a null
terminated C string from memory. To illustrate their usage, let's modify our
loading script to have it display the argv[]
array just after the
binary was loaded:
# show_args.py
from maat import *
m = MaatEngine(ARCH.X64, OS.LINUX)
m.load("print.bin", BIN.ELF64, args=[b'hello'], load_interp=False)
# When entering _start(), argc and argv[] are on the stack
argc = m.mem.read(m.cpu.rsp, 8) # Read argc
argv = m.mem.read_buffer(m.cpu.rsp+8, argc.as_uint(), 8) # Read buffer of 'argc' 8-byte elements
print(f"argc is: {argc}")
print(f"argv is: {argv}")
print(f"argv[0] is: {m.mem.read_str(argv[0])}") # Read C string at argv[0]
print(f"argv[1] is: {m.mem.read_str(argv[1])}") # Read C string at argv[1]
The output is:
$ python3 -i show_args.py
argc is: 0x2
argv is: [0x7fffffffaf0, 0x7fffffffb00]
argv[0] is: b'print.bin'
argv[1] is: b'hello'
>>>
Alright! We will stop here for this introduction to Maat. Hopefully it helped you
grasp the basics of the API and gave you motivation to continue learning
and discover cool features that the framework can offer.
Next tutorial: Events!