Ruby Security Field Guide

Parser

Parsing is a classic (Hard™) computer science problem and a common source of security vulnerabilities. YAML parsing, for example, recently had a serious problem of this kind, as described in CVE-2013-0156. So a good question to review is: how do parsers work in the first place? Most parsers follow something like the steps below:

Accept a set of rules (and some accompanying actions) and a sequence of tokens. Producing those tokens from the raw input is commonly referred to as lexical analysis.
Attempt to match a sequence of tokens against a supplied grammar. Most programming languages can be specified as Context-Free Grammars. Nearly all serialization formats are Regular or Context-Free Grammars.
When the parser encounters a sequence of tokens matching a rule specified in the grammar, it will perform the affiliated action. Generally the goal is to build a tree.
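The steps above can be sketched in plain Ruby. This is a minimal illustration, not how any particular library works: the grammar (integers joined by `+`), the token names, and the helper methods `tokenize` and `parse_sum` are all hypothetical and exist only for this example.

```ruby
require 'strscan'

# Lexical analysis: turn raw text into [type, value] tokens.
def tokenize(input)
  scanner = StringScanner.new(input)
  tokens = []
  until scanner.eos?
    if scanner.skip(/\s+/)
      next
    elsif (digits = scanner.scan(/\d+/))
      tokens << [:int, digits.to_i]
    elsif scanner.scan(/\+/)
      tokens << [:plus, '+']
    else
      raise ArgumentError, "unexpected character at offset #{scanner.pos}"
    end
  end
  tokens
end

# Grammar (hypothetical): sum <- int (plus int)*
# Action: every time "plus int" matches, wrap the tree so far in an :add node.
def parse_sum(tokens)
  tokens = tokens.dup
  type, value = tokens.shift
  raise ArgumentError, 'expected an integer' unless type == :int

  tree = { int: value }
  until tokens.empty?
    op, = tokens.shift
    type, value = tokens.shift
    raise ArgumentError, 'expected "+ integer"' unless op == :plus && type == :int

    tree = { add: [tree, { int: value }] }
  end
  tree
end

tree = parse_sum(tokenize('1 + 2 + 3'))
# tree is left-nested: the (1 + 2) node is the left child of the outer :add node
```

Input that does not match the grammar, such as `'1 +'`, raises an `ArgumentError` instead of producing a partial tree. Rejecting anything the grammar does not describe is the behavior you want from a parser that handles untrusted data.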

If the above series of steps sounds familiar, it's because this is the same process used when compiling a program, where the goal is typically to build an abstract syntax tree. If none of this material makes sense, or you have not taken an introductory compilers course, see the 'Additional Resources' section below for some educational material.

Parslet

Parslet is a pure Ruby library (no compiling!) for constructing lexers/parsers with a clean Ruby DSL. It uses a Parsing Expression Grammar (PEG), which is essentially a Context-Free Grammar with no ambiguity, to perform greedy parsing. Most parsers built on Parslet work in two stages. First, the input is parsed into an intermediary tree (JSON 1 exercise). Second, the intermediary tree is transformed into the desired representation (JSON 2 exercise).

Parslet Pointers

More information is detailed in the Parslet Getting Started doc.

Parslet::Parser

Specify rules
- rule(name) { definition }
Define the root rule:
- root :name
str(...) matches a literal string
repeat is equivalent to regex*
repeat(1) is equivalent to regex+
repeat(1,5) is equivalent to regex{1,5}
match(...) matches data against the specified regular expression
match['a-z'] is shorthand for match('[a-z]')
| denotes union and >> denotes concatenation

Parslet::Transform

Specify rules (different meaning here)
- rule(pattern) { action }
Parslet automatically walks the intermediary tree for you and performs action whenever it encounters pattern
simple(:name) matches a single value that is neither a Hash nor an Array (typically a String).
sequence(:name) matches an Array of simple values.
subtree(:name) matches anything, including a whole Hash or Array subtree within the Parslet tree.