Parsing is a classic (Hard™) computer science problem and a common source of security vulnerabilities. YAML parsing, for example, recently had a serious problem of this kind, as described in CVE-2013-0156. So a good question to review is: how do parsers work in the first place? Most parsers follow something similar to the set of logical steps below:
If the above series of steps sounds familiar, it's because this is the common process used for compiling a program. The goal is typically to build an abstract syntax tree. If none of this material makes sense, or you have not taken an introductory compilers course, take a look at the 'Additional Resources' section below for some educational material.
Parslet is a pure Ruby library (no compiling!) for constructing lexers/parsers with a clean Ruby DSL. It uses a Parsing Expression Grammar (PEG), which resembles a context-free grammar but has no ambiguity: alternatives are tried in order and the first match wins, so parsing is greedy. Most parsers built on Parslet work in two stages. First, the input is parsed into an intermediate tree (JSON 1 exercise). Second, that intermediate tree is transformed into the desired representation (JSON 2 exercise).
More information is detailed in the Parslet Getting Started doc.
Parser DSL:

- `rule(name) { definition }` defines a parser rule.
- `root :name` declares the rule where parsing starts.
- `str(...)` matches a literal string.
- `repeat` is equivalent to regex `*`.
- `repeat(1)` is equivalent to regex `+`.
- `repeat(1,5)` is equivalent to regex `{1,5}`.
- `match(...)` matches data against the specified regular expression.
- `match['a-z']` is shorthand for `match('[a-z]')`.
- `|` denotes union and `>>` denotes concatenation.

Transform DSL:

- `rule(pattern) { action }` defines a transform rule; the transform executes `action` whenever it encounters `pattern`.
- `simple(:name)` matches a single String value.
- `sequence(:name)` matches an Array of String values.
- `subtree(:name)` matches a Hash within the Parslet tree.