Getting started

Welcome to Maat's hands on tutorial! In this first tutorial we will use the Python API and learn to:

Have fun!


Hands on

Let's start this tutorial series by introducing Maat's most important object: the MaatEngine.

A MaatEngine is designed to symbolically execute code expected to run inside a single-threaded process. It exposes various methods and attributes that enable to load a binary, execute code, instrument the execution, take snapshots, gather path constraints, and so on. Today we will focus on the essential features that are used to execute simple programs, and interact with CPU registers and memory.

First things first, let's create an engine. To do that, we need to specify which architecture and operating system we want to emulate (see ARCH and OS for available architectures and systems). For this tutorial we will use 64-bit Linux:

from maat import *
m = MaatEngine(ARCH.X64, OS.LINUX)

The next thing we want to do is symbolically execute some code. While the API allows to arbitrarily map code in memory and execute it, Maat has a loader interface that can properly load an executable file. For this tutorial we are going to work on a simple ELF binary compiled for X64: print.bin (download it here).

It's source code is shown below:

#include <stdio.h>

int main(int argc, char** argv)
{
    printf("Argument starts with: '%c'\n", argv[1][0]);
    return 42;
}

The program is fairly simple, all it does is print the first character of the first argument in the command line, and then exit with status 42.

In order to load print.bin in Maat, we will use the engine's load() method as follows:

m.load("print.bin", BIN.ELF64, args=[b'hello'], load_interp=False)

In the parameters to load(), we specify the path of the binary to load, the type of binary (ELF64), and the command line arguments ('hello').

The load_interp parameter is a bit special and enables to select between two very distinct ways Maat can load binaries:

  • when load_interp is True (which is the default), Maat will try to find the interpreter requested by the program, load it, and then symbolically execute the interpreter code to load the program and its dependencies. This ensures that the binary and its dependencies are properly loaded and initialised. The drawback is that the user must provide the binaries for the interpreter and the shared libraries. It also induces runtime overhead, since the interpreter code must be executed before reaching the actual program

  • when load_interp is False, Maat will manually load the program and its dependencies, and perform the relocations. If Maat can't find binaries for the dependencies, Maat will try to emulate missing functions with user-defined callbacks. Emulating imported functions can work for some simple libc functions, but will quickly become blocking on big real-world programs. Therefore this method should be reserved for very simple binaries (like print.bin used in this tutorial), or when shared libraries can't be provided

There are other parameters to load() to specify environment variables, directories where to look for shared libraries, the base address of the program, etc. If you are interested, have a look at the documentation.


Run a binary

Now that the binary is loaded, we can call the run() method to start the execution.

By default, run() will execute instructions until the program exits or until an error occurs. However it is possible to limit the number of instructions to execute with: run(nb_instr).

Maat also has a run_from(addr) method, which sets the instruction pointer to an arbitrary address and calls run() from there.

To summarize:

m.run() # Run from current position
m.run(100) # Run 100 instructions from current position
m.run_from(0x1234) # Run starting from address 0x1234
m.run_from(0x1234, 100) # Run 100 instructions from address 0x1234

Our script to load and run print.bin becomes:

# run.py
from maat import *
m = MaatEngine(ARCH.X64, OS.LINUX)  
m.load("print.bin", BIN.ELF64, args=[b'hello'], load_interp=False)
m.run()

Let's run it!

$ python3 -i run.py
Argument starts with: 'h'
>>>

Great! The binary was properly executed. When the engine stops executing code, we can look at the info attribute. It always gives the reason why the engine stopped. Depending on the context it can also contain a lot of additional information (address of the current instruction, register and memory accesses, path constraint, branch conditions and target(s), etc). We will learn how and when all this information can be used in later tutorials, but you can always have a glance at Info in the documentation for more details.

In our case we have:

>>> m.info

Stop:       program exited  # Stop reason
Status:     0x2a            # Exit status (0x2a = 42)

Info fields can of course be accessed separately:

>>> m.info.stop
>>> m.info.exit_status
...

The MaatEngine also has a settings attribute that can control many aspects of the engine behaviour during execution. The various available settings are documented here and will for the most part be covered in later tutorials. As an example, we can enable the log_insts setting to have the engine display every instruction that is executed. Simply add the following statement before calling run():

 m.settings.log_insts = True

And the output of our script becomes:

 $ python3 -i run.py
[Info] Run 0x540: XOR EBP,EBP
[Info] Run 0x542: MOV R9,RDX
[Info] Run 0x545: POP RSI
[Info] Run 0x546: MOV RDX,RSP
[Info] Run 0x549: AND RSP,-0x10
[Info] Run 0x54d: PUSH RAX
[Info] Run 0x54e: PUSH RSP
[Info] Run 0x54f: LEA R8,[0x700]
[Info] Run 0x556: LEA RCX,[0x690]
[Info] Run 0x55d: LEA RDI,[0x64a]
...


Play with CPU registers

Now that we have learned how to load and run binaries, let's move on and see how we can interact with the engine's CPU. The registers can be read and written through cpu attribute:

>>> m.cpu.rax = 0x1234
>>> m.cpu.rax
0x1234

Please note that Maat doesn't use regular integers to represent register values, but instances of Value, which can be either concrete values or abstract expressions:

>>> type(m.cpu.rax)
<class 'Value'>

The simplest values that we can create are constants (2, 4, 0x20, ... ) and abstract variables which are basically just named bitvectors with no concrete value (x, y, esp_0, ...). In the present tutorial we will only use concrete values built from constants. Usage of abstract variables will be introduced in later tutorials.

Constant values can be created by calling Cst():

x = Cst(32, 1) # Constant 1 on 32 bits
y = Cst(64, -78) # Constant -78 on 64 bits
z = Cst(27, 0x45) # Constant 0x45 on 27 bits

Values always represent a bitvector value on a fixed number of bits. The size of an value is accessible through the size attribute:

>>> x = Cst(32, 1)
>>> x.size
32

For convenience Value objects can be combined with both Value and regular integers to create new values:

>>> rax = m.cpu.rax
>>> rax
0x1234
>>> type(rax)
<class 'Value'>
>>> x = rax + 0xff
>>> x
(0xff+0x1234)
>>> type(x)
<class 'Value'>

All common arithmetic and logic operations on bitvectors are suppported. They are for the most part implemented using the built-in python operators:

x = Cst(32, 0xcafe)
y = Cst(32, 0xbabe)

x + y # Addition
x - y # Subtraction
x * y # Multiplication
x / y # Division
x & y # Logical AND
x | y # Logical OR
x ^ y # Logical XOR
x >> y # Shift right
x << y # Shift left
x % y # Remainder
-x # Unary minus
~x # Logical NOT
Extract(x, 10, 2) # Extract bitfield x<10:2>
Concat(x, y) # Concatenate x and y (x in higher significant bits)

Last but not least, Values can be converted back to regular python integers thanks to the as_uint() and as_int() methods. They concretize the value, and interpret it respectively as an unsigned and a signed value:

>>> x = Cst(32, -67)
>>> x.as_int()
-67
>>> x.as_uint()
4294967229


Play with memory

Memory can be accessed through the mem attribute which is a Memory instance.

The map() method can be used to map memory regions and choose which of RWX permissions to give. When loading a binary Maat automatically creates the mappings for the process's memory. However, if we don't use load() to initialise the engine, we'll need to explicitely call map() before reading or writing to memory regions.

# Map 0x0 - 0xfff with RW perms
m.mem.map(0x0, 0xfff, PERM.RW)

The write() method can be used to write values, integers, and byte buffers in memory:

# Write the constant 10 on 4 bytes at address 0x100
m.mem.write(0x100, 10, 4)

# Write a concrete buffer at address 0x200
buf = b'I love pineapples'
m.mem.write(0x200, buf, len(buf))

# Write the current value of RAX at address pointed by RSP
m.mem.write(m.cpu.rsp, m.cpu.rax)

It is also possible to read memory with the read() method (it always returns Value instances):

>>> m.mem.write(0x100, 0x12345678, 4)
>>> m.mem.read(0x100, 4)
0x12345678
>>> m.mem.read(0x100, 2)
0x5678

The read_buffer() and read_str() methods can be used to read a buffer (with a fixed number of elements and element size) or a null terminated C string from memory. To illustrate their usage, let's modify our loading script to have it display the argv[] array just after the binary was loaded:

# show_args.py
from maat import *
m = MaatEngine(ARCH.X64, OS.LINUX)
m.load("print.bin", BIN.ELF64, args=[b'hello'], load_interp=False)

# When entering _start(), argc and argv[] are on the stack
argc = m.mem.read(m.cpu.rsp, 8) # Read argc
argv = m.mem.read_buffer(m.cpu.rsp+8, argc.as_uint(), 8) # Read buffer of 'argc' 8-byte elements

print(f"argc is: {argc}")
print(f"argv is: {argv}")
print(f"argv[0] is: {m.mem.read_str(argv[0])}") # Read C string at argv[0]
print(f"argv[1] is: {m.mem.read_str(argv[1])}") # Read C string at argv[1]

The output is:

$ python3 -i show_args.py 
argc is: 0x2
argv is: [0x7fffffffaf0, 0x7fffffffb00]
argv[0] is: b'print.bin'
argv[1] is: b'hello'
>>>


Alright! We will stop here for this introduction to Maat. Hopefully it helped you grasp the basics of the API and gave you motivation to continue learning and discover cool features that the framework can offer.

Next tutorial: Events!