Dynamic symbolic execution

Maat enables users to perform advanced binary analysis thanks to its symbolic execution features. By design, Maat supports both full symbolic execution and dynamic symbolic execution (abreviated DSE, also refered to as concolic execution).

Full symbolic execution makes it possible to explore program states without spending too much time solving path constraints, however it can run into well-known limitations such as symbolic memory pointers, impossible paths, unbounded loops, or OS/network simulation.

Concolic execution combines concrete and symbolic execution to overcome symbolic's execution limitations by executing programs in a concrete manner while still propagating symbolic information. This way when the program reaches a symbolic branch it can use the concrete value associated with the symbolic input to resolve the branch. Concolic execution can thus be seen as symbolic execution over a concrete execution trace. Maat supports both types of execution but at the moment it works best with concolic execution.

In this tutorial we show how to create and use concolic data and symbolic constraints. We also demonstrate how concolic execution can be used to solve a simple reverse-engineering challenge.

Have fun!


Abstract values

To begin with, let's see how we can create and manipulate symbolic values in Maat. A symbolic value is a value containing symbolic variables. Symbolic variables represent abstract data that has no concrete value on its own. They can be created with the Var() constructor:

from maat import *
m = MaatEngine(ARCH.X64, OS.LINUX)

m.cpu.rax = Var(64, "a") # Variable "a" on 64 bits
m.cpu.rbx = Var(64, "b") # Variable "b" on 64 bits

Just like concrete values in Maat, symbolic variables are Value objects and classical operators can be applied on them to create more complex expressions, as seen in previous tutorials:

>>> type(m.cpu.rax)
<class 'Value'>
>>> x = (m.cpu.rax + 0x100)*m.cpu.rbx + 2*m.cpu.rax  # Create a complex expression
>>> print(x)
((0x2*a)+((0x100+a)*b))
>>> type(x) # Combining variables with concrete integers still gives an 'Value' object
<class 'Value'>

Since symbolic variables don't have a concrete value, calling as_int() and as_uint() will fail on symbolic expressions:

>>> m.cpu.rax.as_int()
Traceback (most recent call last):
  File "", line 1, in 
RuntimeError: Cannot concretize symbolic variable without supplying a context

If we want to associate concrete values to the variables we can use the vars attribute of our MaatEngine, which is a VarContext object. This makes the symbolic variables become concolic, since they can now be concretized:

>>> m.vars.set("a", 0x12345678) # Set "a" to 0x12345678
>>> m.cpu.rax
a # RAX is still an abstract concolic value... 
>>> m.cpu.rax.as_uint(m.vars)
0x12345678 # ... but it can be concretized according to the symbolic variable context

# Note: for CPU registers only, the VarContext argument defaults to m.vars and can be omitted 
>>> m.cpu.rax.as_uint()
0x12345678

To summarize, we can have the following types of abstract values:

  • concrete values: they contain no symbolic variables

  • symbolic values: they contain at least one symbolic variable which doesn't have a value set in the VarContext. They can not be concretized and can also be refered to as 'fully symbolic' expressions

  • concolic values: (concrete + symbolic) they contain at least one symbolic variable and all the variables it contains have a value set in the VarContext. They can thus be concretized

It is possible to check whether a value is concrete, concolic, or symbolic by using the is_concrete(), is_concolic(), and is_symbolic() methods:

# Constant value is always concrete:
>>> c = Cst(32, 0x1234)
>>> c.is_concrete(m.vars)
True
>>> c.is_concolic(m.vars)
False
>>> c.is_symbolic(m.vars)
False

# "a" is concolic when it has a value in the VarContext
>>> a = Var(64, "a")
>>> m.vars.set("a", 0x12345678) # Set "a" to 0x12345678
>>> a.is_concolic(m.vars)
True
>>> a.is_symbolic(m.vars)
False

# "a" becomes symbolic when its value is removed from the VarContext
>>> m.vars.remove("a")
>>> a.is_concolic(m.vars)
False
>>> a.is_symbolic(m.vars)
True


Making data concolic

To make a register symbolic, we can simply assign an abstract variable to it, just as shown previously:

>>> m.cpu.rax = Var(64, "a")
>>> m.cpu.rax
a   # RAX is now the symbolic variable "a"   

In order to make a register concolic, we simply have to save its concrete value in the engine's VarContext, before assigning the abstract variable:

>>> m.cpu.rax = 0x111 # Set some concrete value
>>> m.vars.set("a", m.cpu.rax.as_uint())
>>> m.cpu.rax = Var(64, "a")
>>> m.cpu.rax
a   # RAX is now the symbolic variable "a" 
>>> m.cpu.rax.as_uint()
0x111 # RAX can still be concretized to its concrete value

To symbolize data stored in memory, Maat provides the make_symbolic() and make_concolic() helper methods. They can be used as follows:

# Make memory purely symbolic
>>> m.mem.map(0x0, 0xfff)
# Symbolize memory at address 0x100 as an array named "buf" of 2 variables of 4 bytes each
>>> m.mem.make_symbolic(0x100, 2, 4, "buf")
>>> m.mem.read(0x100, 4)
buf_0   # Symbolic variable 'buf_0'
>>> m.mem.read(0x104, 4)
buf_1   # Symbolic variable 'buf_1'

# Make memory concolic
>>> m.map(0x0, 0xfff)
>>> m.mem.write(0x100, b'whatever') # Write concrete data
# Make memory at address 0x100 a concolic array named "buf" of 2 variables of 4 bytes each
>>> m.mem.make_concolic(0x100, 2, 4, "buf")
>>> m.mem.read(0x100, 4)
buf_0   # Symbolic variable 'buf_0'
>>> m.mem.read(0x104, 4)
buf_1   # Symbolic variable 'buf_1'

# The abstract variables context has kept the concrete memory values
>>> m.vars.get("buf_0")
0x74616877
>>> m.mem.read(0x100, 4).as_uint(m.vars)
0x74616877 # reading memory returns "buf_0", but it can be concretized
>>> m.vars.get_as_buffer("buf", 4) # Interpret 'buf_' vars as a buffer of 4-byte elements
b'whatever'

In addition to making memory symbolic, it is possible to generate standalone symbolic buffers with VarContext's new_symbolic_buffer() and new_concolic_buffer() methods. Those can be used to pass symbolic program arguments, or write symbolic data to the filesystem, as will be shown in a moment.

# Create symbolic buffer of 4 8-bytes values
>>> m.vars.new_symbolic_buffer("x", 4, elem_size=8)
[x_0, x_1, x_2, x_3]

# Create concolic buffer from a list of concrete values [1,2,3,4]
>>> m.vars.new_concolic_buffer("y", [1,2,3,4], 4, elem_size=8)
[y_0, y_1, y_2, y_3] # Returned buffer contains abstract concolic values...
>>> m.vars.get("y_0")
1 # ... so they can be concretized

# Create a concolic buffer of 1-byte values from concrete bytes
>>> m.vars.new_concolic_buffer("z", b'foo')
[z_0, z_1, z_2]
>>> chr(m.vars.get("z_0"))
'f'
>>> m.vars.get_as_str("z")
b'foo'


Solving symbolic constraints

Now that we have seen how to manipulate symbolic expressions and how to symbolize registers and memory, let's talk about constraints.

Constraints are instances of the Constraint class and represent arithmetic and logical constraints on expressions. Basic constraints can be built using all standard arithmetic comparison operators:

a = Var(64, "a")

# Create constraints
a == 100
a != 100
a < 100
a ≤ 100
a > 100
a >= 100 

Basic constraints can be combined with the built-in & and | operators:

(a == 100) & (a > 120)
(a < 100) | (a > 30)

Constraints can also be inverted by either using the built-in ~ operator, or by calling constraint.invert() (both are equivalent):

~(a == 100)
(a == 100).invert()

Great! Now let's see how we can solve constraints. Solving constraints means finding values for all the symbolic variables it contains so that the constraint becomes true. For instance, the constraint (a == 100) has only one solution for variable a which is 100. The constraint (a < 100) would have a 100 possible solutions for a (0, 1, 2, 3, ..., 99)

In Maat, constraints can be solved by using a Solver object. A solver can be created this way:

>>> solver = Solver()

Then constraints can be added with the add() method:

>>> a = Var(64, "a")
>>> b = Var(64, "b")
>>> c = Var(64, "c")

>>> solver.add(a > b)
>>> solver.add(b == c)
>>> solver.add(c == 0x42)

Once constraints have been added, we can try to solve them by calling the check() method. check() returns True if the solver could solve all constraints, and False otherwise:

>>> solver.check()
True

If check() was succesful, the solution to the constraint(s) (called a model), can be retrieved as a VarContext object with the get_model() method:

>>> model = solver.get_model()
>>> type(model)
<class 'VarContext'>
>>> model

a : 0x43
b : 0x42
c : 0x42

If we want to apply a given model to a current MaatEngine instance, we can use the update_from() method:

>>> m.vars.update_from(model)
>>> m.vars

a : 0x43
b : 0x42
c : 0x42
...

Finally, after solving the constraints, we can reset the solver and remove all the constraints it currently contains by calling reset():

>>> solver.reset() 


Cracking a basic challenge

Alright! We have learned everything we need about symbolic values and constraint solving and will now apply it on a concrete example. To conclude this tutorial, we will use Maat to automatically solve a simple crackme binary.

The challenge is called crackme1. Download it here (the archive contains the binary along with ld and libc libraries needed to load and execute it). The binary asks for a serial of 8 characters and will print Correct! only if the correct serial is entered:

$ ./crackme1 
Enter the 8-characters serial: abcdefgh
Wrong!

$ ./crackme1
Enter the 8-characters serial: ********
Correct!

Our goal will be to find out the correct serial. We could obviously solve the challenge without the source code, but to know what's going on let's take a glance:

// crackme1.c
#include <string.h>
#include <stdio.h>

unsigned long long transform(char* str){
    unsigned long long res = 0;
    for( int i = 0; str[i] != '\0'; i++ ){
        res = res << 8;
        res = res + ((str[i]*7 - 0x20) & 0xff);
    }
    return res;
}

int main(int argc, char ** argv){
    char serial[256];
    // Get serial
    printf("Enter the 8-characters serial: ");
    if( !scanf("%s", serial))
        return 1;
    if( strlen(serial) != 8 ){
        printf("Serial length must be 8\n");
        return 1;
    }
    
    // Check serial
    if( transform(serial) == 0x378ed80c535a3630 ){
        printf("Correct!\n");
    }else{
        printf("Wrong!\n");
    }

   return 0;
}

We can see that the source code doesn't contain directly the correct serial value. Instead, the user-supplied serial is processed by the transform() function which returns an integer computed from the serial (calling transform() with different serials will return different integers).

The integer returned by transform() is then checked against the hardcoded value: 0x378ed80c535a3630. If it is equal, the program deduces that the serial was correct.

Let's use Maat to automatically find the serial that makes the program print Correct!

First of all, we write a script that loads and run the binary. Since the program reads data from stdin with scanf, we will also add symbolic data in stdin (the symbolic filesystem will be explained in details in another tutorial, for now simply trust the example):

# solve.py
from maat import *
m = MaatEngine(ARCH.X64, OS.LINUX)
m.load("crackme1", BIN.ELF64, base=0x04000000, libdirs=["."])

# Fill stdin with 8-bytes concolic input
# Note: On a single-process linux environment, stdin handle is always 0 
stdin = m.env.fs.get_fa_by_handle(0)
buf = m.vars.new_concolic_buffer(
    "input",
    b'aaaaaaaa',
    nb_elems=8,
    elem_size=1,
    trailing_value=ord('\n') # concrete new line at the end of the input
)
stdin.write_buffer(buf)

Next, we will need to add hooks at the right place in order to find the input that makes the program print Correct!. To find the location we are interested in, we open the challenge in our favorite disassembler and find the branch that follows the input check:


We see that the branch occurs at address 0x040008b1. Since the branch depends on the symbolic input, it corresponds to a path constraint. If you remember our previous tutorial, you know that we can hook the PATH event in order to instrument this symbolic branch. The goal is to get the branch condition, and then invert it so that we continue to 0x040008b3 instead of jumping to 0x040008c1.

As an exercice, try to write a PATH event callback that uses a solver to find an input that doesn't satisfy the branch condition. The solution is given below:

# <previous part of the solve.py script>

# Add hook on symbolic branch
def find_serial(m: MaatEngine):
    if m.info.addr == 0x040008b1: # We care only about that particular branch
        s = Solver()
        s.add(m.info.branch.cond.invert())
        if s.check():
            model = s.get_model()
            print(f"Found serial: {model.get_as_str('input')}")
        else:
            print("Failed to find serial")
        return ACTION.HALT
    
m.hooks.add(EVENT.PATH, WHEN.BEFORE, callbacks=[find_serial])

# Run analysis
m.run()

Running the entire script gives:

$ python3 solve.py
Found serial: 1bHt56z0

We can validate the solution:

$ ./crackme1
Enter the 8-characters serial: 1bHt56z0 
Correct!


We hope you enjoyed this introduction to dynamic symbolic execution. We showed how it can help to reason about potential user inputs and automatically find inputs that satisfy arbitrary runtime constraints. Of course we just scratched the surface of possible usage for symbolic execution, make sure to checkout our next tutorials to discover other applications and use-cases!