# Dynamic symbolic execution

Maat enables users to perform advanced binary analysis thanks to its symbolic execution features.
By design, Maat supports both **full symbolic execution** and **dynamic symbolic execution**
(abreviated DSE, also refered to as **concolic execution**).

Full symbolic execution makes it possible to explore program states without spending too much time solving path constraints, however it can run into well-known limitations such as symbolic memory pointers, impossible paths, unbounded loops, or OS/network simulation.

Concolic execution combines concrete and symbolic execution to overcome symbolic's execution
limitations by executing programs in a concrete manner while still propagating symbolic information.
This way when the program reaches a symbolic branch it can use
the *concrete* value associated with the symbolic input to resolve the branch. Concolic execution
can thus be seen as *symbolic execution* over a concrete execution trace. Maat supports both
types of execution but at the moment it works best with concolic execution.

In this tutorial we show how to create and use concolic data and symbolic constraints. We also demonstrate how concolic execution can be used to solve a simple reverse-engineering challenge.

- Abstract values in Maat
- Making data concolic
- Solving symbolic constraints
- Cracking a basic challenge

*Have fun! *

## Abstract values

To begin with, let's see how we can create and manipulate symbolic values in Maat.
A symbolic value is a value containing symbolic variables. Symbolic variables represent abstract data that
has no concrete value on its own. They can be created with the `Var()`

constructor:

```
from maat import *
m = MaatEngine(ARCH.X64, OS.LINUX)
m.cpu.rax = Var(64, "a") # Variable "a" on 64 bits
m.cpu.rbx = Var(64, "b") # Variable "b" on 64 bits
```

Just like concrete values in Maat, symbolic variables are `Value`

objects and classical operators can be applied on them to create more complex expressions, as seen in previous tutorials:

```
>>> type(m.cpu.rax)
<class 'Value'>
>>> x = (m.cpu.rax + 0x100)*m.cpu.rbx + 2*m.cpu.rax # Create a complex expression
>>> print(x)
((0x2*a)+((0x100+a)*b))
>>> type(x) # Combining variables with concrete integers still gives an 'Value' object
<class 'Value'>
```

Since symbolic variables don't have a concrete value, calling `as_int()`

and `as_uint()`

will
fail on symbolic expressions:

```
>>> m.cpu.rax.as_int()
Traceback (most recent call last):
File "
```", line 1, in
RuntimeError: Cannot concretize symbolic variable without supplying a context

If we want to associate concrete values to the variables we can use the `vars`

attribute
of our `MaatEngine`

, which is a VarContext object.
This makes the symbolic variables become **concolic**, since they can now be concretized:

```
>>> m.vars.set("a", 0x12345678) # Set "a" to 0x12345678
>>> m.cpu.rax
a # RAX is still an abstract concolic value...
>>> m.cpu.rax.as_uint(m.vars)
0x12345678 # ... but it can be concretized according to the symbolic variable context
# Note: for CPU registers only, the VarContext argument defaults to m.vars and can be omitted
>>> m.cpu.rax.as_uint()
0x12345678
```

To summarize, we can have the following types of abstract values:

**concrete values**: they contain no symbolic variables**symbolic values**: they contain at least one symbolic variable which doesn't have a value set in the`VarContext`

. They can not be concretized and can also be refered to as 'fully symbolic' expressions**concolic values**: (**conc**rete + symb**olic**) they contain at least one symbolic variable and all the variables it contains have a value set in the`VarContext`

. They can thus be concretized

It is possible to check whether a value is concrete, concolic, or symbolic by using the
`is_concrete()`

, `is_concolic()`

, and `is_symbolic()`

methods:

```
# Constant value is always concrete:
>>> c = Cst(32, 0x1234)
>>> c.is_concrete(m.vars)
True
>>> c.is_concolic(m.vars)
False
>>> c.is_symbolic(m.vars)
False
# "a" is concolic when it has a value in the VarContext
>>> a = Var(64, "a")
>>> m.vars.set("a", 0x12345678) # Set "a" to 0x12345678
>>> a.is_concolic(m.vars)
True
>>> a.is_symbolic(m.vars)
False
# "a" becomes symbolic when its value is removed from the VarContext
>>> m.vars.remove("a")
>>> a.is_concolic(m.vars)
False
>>> a.is_symbolic(m.vars)
True
```

## Making data concolic

To make a register symbolic, we can simply assign an abstract variable to it, just as shown previously:

```
>>> m.cpu.rax = Var(64, "a")
>>> m.cpu.rax
a # RAX is now the symbolic variable "a"
```

In order to make a register concolic, we simply have to save its concrete value
in the engine's `VarContext`

, before assigning the abstract variable:

```
>>> m.cpu.rax = 0x111 # Set some concrete value
>>> m.vars.set("a", m.cpu.rax.as_uint())
>>> m.cpu.rax = Var(64, "a")
>>> m.cpu.rax
a # RAX is now the symbolic variable "a"
>>> m.cpu.rax.as_uint()
0x111 # RAX can still be concretized to its concrete value
```

To symbolize data stored in memory, Maat provides the `make_symbolic()`

and
`make_concolic()`

helper methods. They can be used as follows:

```
# Make memory purely symbolic
>>> m.mem.map(0x0, 0xfff)
# Symbolize memory at address 0x100 as an array named "buf" of 2 variables of 4 bytes each
>>> m.mem.make_symbolic(0x100, 2, 4, "buf")
>>> m.mem.read(0x100, 4)
buf_0 # Symbolic variable 'buf_0'
>>> m.mem.read(0x104, 4)
buf_1 # Symbolic variable 'buf_1'
```

```
# Make memory concolic
>>> m.map(0x0, 0xfff)
>>> m.mem.write(0x100, b'whatever') # Write concrete data
# Make memory at address 0x100 a concolic array named "buf" of 2 variables of 4 bytes each
>>> m.mem.make_concolic(0x100, 2, 4, "buf")
>>> m.mem.read(0x100, 4)
buf_0 # Symbolic variable 'buf_0'
>>> m.mem.read(0x104, 4)
buf_1 # Symbolic variable 'buf_1'
# The abstract variables context has kept the concrete memory values
>>> m.vars.get("buf_0")
0x74616877
>>> m.mem.read(0x100, 4).as_uint(m.vars)
0x74616877 # reading memory returns "buf_0", but it can be concretized
>>> m.vars.get_as_buffer("buf", 4) # Interpret 'buf_' vars as a buffer of 4-byte elements
b'whatever'
```

In addition to making memory symbolic, it is possible to generate standalone
symbolic buffers with `VarContext`

's `new_symbolic_buffer()`

and `new_concolic_buffer()`

methods. Those can be used to pass symbolic program
arguments, or write symbolic data to the filesystem, as will be shown in a moment.

```
# Create symbolic buffer of 4 8-bytes values
>>> m.vars.new_symbolic_buffer("x", 4, elem_size=8)
[x_0, x_1, x_2, x_3]
# Create concolic buffer from a list of concrete values [1,2,3,4]
>>> m.vars.new_concolic_buffer("y", [1,2,3,4], 4, elem_size=8)
[y_0, y_1, y_2, y_3] # Returned buffer contains abstract concolic values...
>>> m.vars.get("y_0")
1 # ... so they can be concretized
# Create a concolic buffer of 1-byte values from concrete bytes
>>> m.vars.new_concolic_buffer("z", b'foo')
[z_0, z_1, z_2]
>>> chr(m.vars.get("z_0"))
'f'
>>> m.vars.get_as_str("z")
b'foo'
```

## Solving symbolic constraints

Now that we have seen how to manipulate symbolic expressions and how to symbolize registers and memory, let's talk about constraints.

Constraints are instances of the Constraint class and represent arithmetic and logical constraints on expressions. Basic constraints can be built using all standard arithmetic comparison operators:

```
a = Var(64, "a")
# Create constraints
a == 100
a != 100
a < 100
a ≤ 100
a > 100
a >= 100
```

Basic constraints can be combined with the built-in `&`

and `|`

operators:

```
(a == 100) & (a > 120)
(a < 100) | (a > 30)
```

Constraints can also be inverted by either using the built-in `~`

operator, or by calling `constraint.invert()`

(both are equivalent):

```
~(a == 100)
(a == 100).invert()
```

Great! Now let's see how we can solve constraints. Solving constraints
means finding values for all the symbolic variables it contains so that the
constraint becomes true. For instance, the constraint `(a == 100)`

has
only one solution for variable `a`

which is `100`

.
The constraint `(a < 100)`

would have a 100 possible solutions for
`a`

(`0, 1, 2, 3, ..., 99`

)

In Maat, constraints can be solved by using a
`Solver`

object. A solver can be created this way:

```
>>> solver = Solver()
```

Then constraints can be added with the `add()`

method:

```
>>> a = Var(64, "a")
>>> b = Var(64, "b")
>>> c = Var(64, "c")
>>> solver.add(a > b)
>>> solver.add(b == c)
>>> solver.add(c == 0x42)
```

Once constraints have been added, we can try to solve them by
calling the `check()`

method. `check()`

returns `True`

if the solver could solve all constraints, and `False`

otherwise:

```
>>> solver.check()
True
```

If `check()`

was succesful, the solution to the constraint(s)
(called a **model**), can be retrieved as a `VarContext`

object
with the `get_model()`

method:

```
>>> model = solver.get_model()
>>> type(model)
<class 'VarContext'>
>>> model
a : 0x43
b : 0x42
c : 0x42
```

If we want to apply a given model to a current `MaatEngine`

instance,
we can use the `update_from()`

method:

```
>>> m.vars.update_from(model)
>>> m.vars
a : 0x43
b : 0x42
c : 0x42
...
```

Finally, after solving the constraints, we can reset the solver and
remove all the constraints it currently contains by
calling `reset()`

:

```
>>> solver.reset()
```

## Cracking a basic challenge

Alright! We have learned everything we need about symbolic values and constraint solving and will now apply it on a concrete example. To conclude this tutorial, we will use Maat to automatically solve a simple crackme binary.

The challenge is called `crackme1`

. Download it
here (the archive contains
the binary along with `ld`

and `libc`

libraries needed
to load and execute it). The binary
asks for a serial of 8 characters and will print `Correct!`

only if the correct serial is entered:

```
$ ./crackme1
Enter the 8-characters serial: abcdefgh
Wrong!
$ ./crackme1
Enter the 8-characters serial: ********
Correct!
```

Our goal will be to find out the correct serial. We could obviously solve the challenge without the source code, but to know what's going on let's take a glance:

```
// crackme1.c
#include <string.h>
#include <stdio.h>
unsigned long long transform(char* str){
unsigned long long res = 0;
for( int i = 0; str[i] != '\0'; i++ ){
res = res << 8;
res = res + ((str[i]*7 - 0x20) & 0xff);
}
return res;
}
int main(int argc, char ** argv){
char serial[256];
// Get serial
printf("Enter the 8-characters serial: ");
if( !scanf("%s", serial))
return 1;
if( strlen(serial) != 8 ){
printf("Serial length must be 8\n");
return 1;
}
// Check serial
if( transform(serial) == 0x378ed80c535a3630 ){
printf("Correct!\n");
}else{
printf("Wrong!\n");
}
return 0;
}
```

We can see that the source code doesn't contain directly the
correct serial value. Instead, the user-supplied serial is processed
by the `transform()`

function which returns an integer
computed from the serial (calling `transform()`

with
different serials will return different integers).

The integer returned by `transform()`

is then checked
against the hardcoded value: `0x378ed80c535a3630`

. If it
is equal, the program deduces that the serial was correct.

Let's use Maat to automatically find the serial that makes
the program print `Correct!`

First of all, we write a script that loads and run the binary. Since the program
reads data from `stdin`

with `scanf`

, we will
also add symbolic data in `stdin`

(the symbolic filesystem will be explained
in details in another tutorial, for now simply trust the example):

```
# solve.py
from maat import *
m = MaatEngine(ARCH.X64, OS.LINUX)
m.load("crackme1", BIN.ELF64, base=0x04000000, libdirs=["."])
# Fill stdin with 8-bytes concolic input
# Note: On a single-process linux environment, stdin handle is always 0
stdin = m.env.fs.get_fa_by_handle(0)
buf = m.vars.new_concolic_buffer(
"input",
b'aaaaaaaa',
nb_elems=8,
elem_size=1,
trailing_value=ord('\n') # concrete new line at the end of the input
)
stdin.write_buffer(buf)
```

Next, we will need to add hooks at the right place in order to find
the input that makes the program print `Correct!`

. To find
the location we are interested in, we open the challenge in our favorite
disassembler and find the branch that follows the input check:

We see that the branch occurs at address `0x040008b1`

. Since the branch depends
on the symbolic input, it corresponds to a *path constraint*. If you remember
our previous tutorial, you know that we can hook the `PATH`

event in order
to instrument this symbolic branch. The goal is to get the branch condition, and then
invert it so that we continue to `0x040008b3`

instead of jumping to `0x040008c1`

.

As an exercice, try to write a `PATH`

event callback that uses a solver to
find an input that doesn't satisfy the branch condition.
The solution is given below:

```
# <previous part of the solve.py script>
# Add hook on symbolic branch
def find_serial(m: MaatEngine):
if m.info.addr == 0x040008b1: # We care only about that particular branch
s = Solver()
s.add(m.info.branch.cond.invert())
if s.check():
model = s.get_model()
print(f"Found serial: {model.get_as_str('input')}")
else:
print("Failed to find serial")
return ACTION.HALT
m.hooks.add(EVENT.PATH, WHEN.BEFORE, callbacks=[find_serial])
# Run analysis
m.run()
```

Running the entire script gives:

```
$ python3 solve.py
Found serial: 1bHt56z0
```

We can validate the solution:

```
$ ./crackme1
Enter the 8-characters serial: 1bHt56z0
Correct!
```

We hope you enjoyed this introduction to dynamic symbolic execution. We showed how it can help to reason about potential user inputs and automatically find inputs that satisfy arbitrary runtime constraints. Of course we just scratched the surface of possible usage for symbolic execution, make sure to checkout our next tutorials to discover other applications and use-cases!