Path exploration

Path exploration is a very common task performed by symbolic executors. It consists in symbolically executing many possible code paths of a program. It allows to visit various possible states for the program and systematically check if they fulfil certain conditions. Path exploration can be used to help verify software against desired properties, or detect runtime vulnerabilities.

The philosophy behind Maat is to provide a multi-purpose framework rather than a monolithic tool, so there is no built-in path exploration API that works out-of-the-box. However, as this tutorial will show, different exploration strategies can be implemented with very simple scripts.

In this tutorial we show two ways of exploring all possible program paths with Maat, and use them to solve a reserve-engineering challenge. We also introduce Maat's snapshoting and serialization capabilities, upon whom the path exploration strategies rely.

Exploration using snapshots
Exploration using state serialization
Which exploration strategy should I choose?

Have fun!

Exploration using snapshots

Maat offers the possibility to take snapshots of the symbolic engine. A snapshot is a backtracking point that can be set at any time during execution. After a snapshot is taken, execution resumes normally, but at any time it is possible to "go back in time" and restore the snapshot, which resets the engine to the state it had when the snapshot was taken.

It is important to understand that snapshots are not copies of the symbolic engine. They are backtracking points, and are hence incremental. This means that if we take a first snapshot A and later a second snapshot B, it is possible to restore snapshot B without deleting snapshot A, but restoring snapshot A will delete snapshot B, because at point A the snapshot B didn't exist yet.

To take a snapshot, we can use the take_snapshot() method. It returns a unique ID that can later be used to restore that particular snapshot:

from maat import *
m = MaatEngine(ARCH.X64, OS.LINUX)
snap = m.take_snapshot()

Restoring a snapshot is done by calling restore_snapshot(). This method takes two optional arguments:

snapshot_id: the snapshot to restore, as returned by take_snapshot(). If not specified, it restores the most recent snapshot
remove: whether to remove the snapshot after restoring it

# Restore snapshot 'snap'
m.restore_snapshot(snap)

# Restore last snapshot
m.restore_snapshot()

# Restore last snapshot and remove it
m.restore_snapshot(remove=True)

Good! Now let's try some snapshots on today's challenge named crackme2. You can download the binary and libraries here. The binary takes a password as argument and prints Correct password! if the supplied password is the correct one.

$ ./crackme2 aaaaaaaaaaaaaaa 
Wrong password!

$ ./crackme2 ****************
Correct password!

Before diving into path exploration, let's illustrate snapshoting with the script below. It takes a snapshot just after the program is loaded, executes it one time, then restores the snapshot and runs it a second time:

# test_snapshot.py
from maat import *
m = MaatEngine(ARCH.X64, OS.LINUX)
m.load("./crackme2", BIN.ELF64, base=0x04000000, args=[b'aaaaaaaaaaaaa'], libdirs=["."])

m.take_snapshot()
m.run()
m.restore_snapshot()
m.run()

When we run the script, we indeed see that the program is executed two times and prints Wrong password! twice.

$ python3 test_snapshot.py
[Info] Adding object 'ld-linux-x86-64.so.2' to virtual fs at '/usr/lib/ld-linux-x86-64.so.2'
[Info] Adding object 'libc.so.6' to virtual fs at '/usr/lib/libc.so.6'
[Info] Adding object 'crackme2' to virtual fs at '/crackme2'
Wrong password! Wrong password!

Alright, now let's get serious and actually write a script that explores all paths in crackme2, and use it to find the correct password!

The idea for the exploration algorithm is to explore execution paths using a depth-first-search approach:

Run until we find a symbolic branch
When a symbolic branch is found, take a snapshot and continue execution
Detect success or failure:
- If we reach the code that prints Good password! we can stop exploring
- Otherwise if we reach the code that prints Wrong password!, we go back to the lastest symbolic branch by restoring the latest snapshot, and invert the branch condition. This way when resuming execution the program will take the path that hadn't yet been explored
Go back to step 1 and repeat, until the correct password is found, or until all paths have been explored

Using any disassembler, we can easily find the addresses that are jumped to on password verification success or failure. 0x7c6 is reached on success, and 0x7d9 on failure:

Now we do have every information we need to implement the exploration strategy described above. As an exercice, try to implement it on your own, and then compare with the solution given below.

# explore_dfs.py
from maat import *

BASE = 0x04000000 # address where to load crackme2
SUCCESS_ADDR = BASE + 0x7c6 # address we want to reach
FAILURE_ADDR = BASE + 0x7d9 # address we don't want to reach
INPUT_LEN = 50 # length of the input

# Load crackme2 into a symbolic engine, with a concolic argument
m = MaatEngine(ARCH.X64, OS.LINUX)
args = [
    m.vars.new_concolic_buffer("input", b'a'*INPUT_LEN, INPUT_LEN)
]
m.load("./crackme2", BIN.ELF64, base=BASE, args=args, libdirs=["."])

## Main exploration loop
def explore(m: MaatEngine):
    global snapshot_next
    # We keep trying new paths as long as execution is stopped by reaching
    # SUCCESS_ADDR or FAILURE_ADDR
    while m.run() == STOP.HOOK:
        # If we found the path for the correct password, print it and stop exploring
        if m.info.addr == SUCCESS_ADDR:
            print(f"Found correct password: {m.vars.get_as_str('input')}")
            return

        # Otherwise, restore previous snapshots until we find a branch condition
        # that can successfuly be inverted to explore a new path 
        while True:
            # Restore latest snapshot. This brings us back to the last
            # symbolic branch that was taken
            m.restore_snapshot(remove=True)
            # Use the solver to invert the branch condition, and find an
            # input that takes the other path
            s = Solver()
            # We start by adding all constraints that led to the current path.
            # Those constraints need to be preserved to ensure that the new input
            # we compute will still reach the current branch.
            # Since the snapshots are taken *before* branches are resolved,
            # m.path.constraints() doesn't contain the current branch as a constraint.
            for c in m.path.constraints():
                s.add(c)
            if m.info.branch.taken:
                # If branch was previously taken, we negate the branch condition
                # so that this time it is not taken 
                s.add(m.info.branch.cond.invert())
            else:
                # If the branch was not previously taken, we solve the branch condition
                # so that this time it is taken
                s.add(m.info.branch.cond)
            # If the solver found a model that inverts the current branch, apply this model
            # to the current symbolic variables context and continue exploring the next path! 
            if s.check():
                m.vars.update_from(s.get_model())
                # When successfully inverting a branch, we set snapshot_next to False. We do
                # this to avoid taking yet another snapshot of the current branch when 
                # resuming execution. We just inverted the branch, which means that one of
                # both possibilities (taken and not taken) has been explored already, and
                # that the other will get explored now. So there is no need to take a 
                # snapshot to go back to that particular branch.
                snapshot_next = False
                break

# Flag telling whether we should take a snapshot on the next symbolic branch
snapshot_next = True
## Callback to be executed on every symbolic branch
def path_cb(m: MaatEngine):
    global snapshot_next
    if snapshot_next:
        m.take_snapshot()
    # We can skip only one branch when we just inverted it, but then
    # we want to take snapshots for the next ones
    snapshot_next = True

## Exploration hooks
m.hooks.add(EVENT.PATH, WHEN.BEFORE, name="path", callbacks=[path_cb])
m.hooks.add(EVENT.EXEC, WHEN.BEFORE, name="fail", filter=FAILURE_ADDR)
m.hooks.add(EVENT.EXEC, WHEN.BEFORE, name="success", filter=SUCCESS_ADDR)

# Run exploration
explore(m)

Running this script should explore all possible paths until it finds one that reaches the code that prints Good password!, and then gives us the concrete input that triggered that path, which is the correct password:

$ python3 explore_dfs.py
...
Found correct password: ZaHyu_TdbaVxIka_sJ

Voilà!

Exploration using state serialization

Maat allows to serialize a symbolic engine. It consists in dumping the whole symbolic state of the engine in a file on the disk. This file can later be loaded again in a fresh symbolic engine, and execution can continue seemlessly.

The easiest way to serialize and deserialize states for a symbolic engine is through the SimpleStateManager class. SimpleStateManager is a helper that manages a queue of symbolic states. It can be used to dump/load states from/into an engine very easily:

m = MaatEngine(ARCH.X64, OS.LINUX)
# Serialized states will be stored in '/tmp/'
state_manager = SimpleStateManager("/tmp")

# Save the current engine state
state_manager.enqueue_state(m)

# Load a set from the queue into the engine
state_manager.dequeue_state(m)

How can we use state serialization to perform path exploration? The idea for the exploration algorithm is actually even simpler than the one we implemented with snapshots. Instead of exploring paths in a depth-first-search approach, we can use a breadth-first-search algorithm:

Run until we find a symbolic branch
When a symbolic branch is found:
1. serialize the current state
2. use the solver to invert the branch condition, and continue exploring the path that was not serialized
Detect success or failure:
- If we reach the code that prints Good password! we can stop exploring
- Otherwise if we reach the code that prints Wrong password!, load the next pending state and go back to step 1

As an exercice, try to implement this exploration method using the SimpleStateManager, and compare with the solution below:

# explore_bfs.py
from maat import *

BASE = 0x04000000 # Address where to load crackme2
SUCCESS_ADDR = BASE + 0x7c6
FAILURE_ADDR = BASE + 0x7d9
INPUT_LEN = 50 # Length of the concolic input

# Load crackme2 into a symbolic engine, with a concolic argument
m = MaatEngine(ARCH.X64, OS.LINUX)
args = [
    m.vars.new_concolic_buffer("input", b'a'*INPUT_LEN, INPUT_LEN)
]
m.load("./crackme2", BIN.ELF64, base=BASE, args=args, libdirs=["."])

# Create a global state manager that stores states in /tmp/
state_manager = SimpleStateManager("/tmp")

## Main exploration loop
def explore(m: MaatEngine):
    global save_next
    global state_manager

    # We keep exploring new paths as long as execution it stopped by reaching
    # SUCCESS_ADDR or FAILURE_ADDR
    while m.run() == STOP.HOOK:
        # If we found the path for the correct password, print it and stop exploring
        if m.info.addr == SUCCESS_ADDR:
            print(f"Found correct password: {m.vars.get_as_str('input')}")
            return

        # Load the next state to explore into the engine.
        # dequeue_state() returns False is there are no more states to load
        if not state_manager.dequeue_state(m):
            break

        # When reloading a state, we set save_next to False. We do this
        # to avoid serialising the same state again when resuming execution.
        # The state we just loaded was about to branch, which means that one of both
        # possibilities (taken and not taken) has been explored already, and that the
        # other will get explored now. So there is no need to save that state again.
        save_next = False

# Flag telling whether we should take a snapshot on the next symbolic branch
save_next = True
## Callback to be executed on every symbolic branch
def path_cb(m: MaatEngine):
    global save_next
    global state_manager

    # If we can successfully invert the condition of the current
    # branch, we save the current state for later exploration, and
    # take the other path first
    if save_next:
        # Check if we can invert the branch condition
        s = Solver()
        for c in m.path.constraints():
            s.add(c)
        if m.info.branch.taken:
            s.add(m.info.branch.cond.invert())
        else:
            s.add(m.info.branch.cond)
        if s.check():
            # We found a model that inverts the branch condition
            model = s.get_model()
            # Save current state for later execution
            state_manager.enqueue_state(m)
            # Update symbolic variables to explore the other path first
            m.vars.update_from(model)

    # We can skip only one branch when we just loaded a state, but then
    # we want to take save the next ones
    save_next = True

# Exploration hooks
m.hooks.add(EVENT.PATH, WHEN.BEFORE, name="path", callbacks=[path_cb])
m.hooks.add(EVENT.EXEC, WHEN.BEFORE, name="fail", filter=FAILURE_ADDR)
m.hooks.add(EVENT.EXEC, WHEN.BEFORE, name="success", filter=SUCCESS_ADDR)

# Explore
explore(m)

Now we just have to run the script to verify that it founds the correct password:

$ python3 explore_bfs.py
  ...
  Found correct password: ZaHyu_TdbaVxIka_sJ

Which exploration strategy should I choose?

This tutorial showed two ways of exploring programs with Maat, one using snapshots and the other using serialization. But which one is the best? As always, the answer is it-depends. Each option comes with advantages and drawbacks.

Snapshoting stores all state modifications that occured since the last snapshot as dynamic data in the RAM. Restoring a snapshot simply means rewinding all the recorded state modifications. The main advantage is that the engine saves only what is modified (typically CPU and memory) and has no need of copying large memory areas or symbolic files if they aren't written to. The drawback is an increased load put on the RAM, which could become impractical on very long executions.

Serialization is the opposite of snapshoting. It stores the whole state in a file on the disk. The advantages of serialization are that disk usage scales better than RAM usage, so with serialization limited resources are less likely to become a problem. It also doesn't put any additional Load on the RAM. Last but not least, serializing states on the disk allows parallelising exploration by using several explorer processes or threads that share a common pending states pool. The main downside of serialization is that serializing a state is a costly operation, since the whole state must be written into a file, which includes the whole process's memory and all symbolic files contents.

The following table summarizes differences between snapshoting and serialization for path exploration:

#	Snapshots	Serialization
Saving a state	Taking a snapshot is instantaneous. Once at least one snapshot is taken, a small execution overhead is induced, because the engine starts recording state modifications	Serializing a state takes a lot of time because the entire state is written into a file on the disk
Restoring a state	Restoring a snapshot is usually fast, but the cost is proportional to the number of state modifications that happened since the snapshot to restore. It can thus become significant on large execution traces	Deserializing a state takes a lot of time because the entire state must be loaded from a file on the disk, and all serialized objects must be recreated in dynamic memory
RAM usage	Usage is usually proportional to the number of instructions symbolically executed after a snapshot is taken	No additional RAM is used
Disk usage	No additional disk space is used	Uses the disk to store serialized states. However available disk space is unlikely to become a problem
Parallelism (multiprocess & multithreading)	Not possible	Possible

Given a specific use-case and available resources, the above table should contain enough information to help you choose between snapshot-based or serialization-based exploration.

Following up our introduction to dynamic symbolic execution, this tutorial focused on snapshoting and serialization features, and showed how they can be used to explore multiple possible paths in a program. This opens the door to more advanced binary analysis stuff, and cool applications, which we will cover in next tutorials.