Parsing is a classic (Hard™) computer science problem and a common source of security vulnerabilities. YAML parsing in particular recently had a serious problem of this kind, as described in CVE-2013-0156. So a good question to review is: how do parsers work in the first place? Most parsers follow something similar to the below set of logical steps:
If the above series of steps sounds familiar, it's because this is the common process used for compiling a program. The goal here is typically to build an abstract syntax tree (AST). If none of this material makes sense, or you have not taken an introductory compilers course, take a look at the 'Additional Resources' section below for some educational material.
Parslet is a pure Ruby library (no compiling!) for constructing lexers/parsers with a clean Ruby DSL. It uses a Parsing Expression Grammar (PEG), which resembles a context-free grammar but resolves alternatives by ordered choice, so a grammar is never ambiguous and matching is greedy. Most parsers built on Parslet work in two stages. First, the input is parsed into an intermediate tree (JSON 1 exercise); second, that intermediate tree is transformed into the desired representation (JSON 2 exercise).
More information is detailed in the Parslet Getting Started doc.
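rule(name) { definition } defines a named grammar rule inside a Parslet::Parser subclass.
root :name declares the rule at which parsing starts.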
str(...) matches a literal string.
repeat is equivalent to regex *
repeat(1) is equivalent to regex +
repeat(1,5) is equivalent to regex {1,5}
match(...) matches data against the specified regular expression character class.
match['a-z'] is shorthand for match('[a-z]')
| denotes union (ordered choice) and >> denotes concatenation.
rule(pattern) { action }, inside a Parslet::Transform subclass, runs action whenever it encounters pattern in the tree.
simple(:name) matches a single simple value, such as a String (anything that is not a Hash or an Array).
sequence(:name) matches an Array of simple values, such as Strings.
subtree(:name) matches any subtree within the Parslet tree, including a Hash.