Let's parse the search path grammar! The USP (User Services Platform) committee has given us a BNF in their pdf that describes the object path model (including search path) using a context-free language.
This is how the BNF of Path Objects looks like:
idmpath ::= objpath | parampath | cmdpath | evntpath | searchpath objpath ::= name '.' (name (('.' inst)|(reffollow '.' name) )? '.')* parampath ::= objpath name cmdpath ::= objpath name '()' evntpath ::= objpath name '!' inst ::= posnum | expr | '*' expr ::= '[' (exprcomp ( '&&' exprcomp )*) ']' exprcomp ::= relpath oper value relpath ::= name (reffollow? '.' name )* reffollow ::= ( '#' (posnum | '*') '+' )| '+' oper ::= '==' | '!=' | '<' | '>' | '<=' | '>=' value ::= literal | number name ::= [A-Za-z_] [A-Za-z_0-9]* literal ::= '"' [^"]* '"' posnum ::= [1-9] [0-9]* number ::= '0' | ( '-'? posnum )
I've interpreted a searchpath based on the HTML specification, which unfortunately is not defined above, as follows:
searchpath ::= objpath name?
The pyparsing PEG language looks like this:
# Helpers. alphas = pyparsing.alphas.upper() + pyparsing.alphas.lower() dot = Suppress(Char('.')) # Basic stuff such as names and operators. name = Word(alphas + '_', alphas + pyparsing.nums + '_') literal = Char('"') + CharsNotIn('"') + Char('"') posnum = Word(pyparsing.nums[1:], pyparsing.nums) number = Char('0') | (Optional(Char('-')) + posnum) value = literal | number oper = Literal('==') | Literal('!=') | Literal('<') | Literal('>') | Literal('<=') | Literal('>=') # The real stuff - references, expressions, paths. reffollow = (Char('#') + (posnum | Char('*')) + Char('+')) | Char('+') relpath = name + ZeroOrMore(Optional(reffollow) + dot + name) exprcomp = relpath + oper + value expr = Char('[') + exprcomp + ZeroOrMore(Literal('&&') + exprcomp) + Char(']') inst = posnum | expr | Char('*') objpath = name + dot + ZeroOrMore(name + Optional((dot + inst) | (reffollow + dot + name)) + dot) parampath = objpath + name searchpath = objpath + Optional(name)
Note: PEG languages are not equivalent to context-free languages. In fact they are less powerful, since they are more strict in terms of ambiguity. E.g. the following rule cannot be parsed by a PEG:
S ::= 'x' S 'x' | 'x'
So why not simply do a Regex? BNFs describe context-free languages and often
cannot be implemented using a regular language. Specifically, context-free
languages should strictly contain context-free rules only (i.e. the left side
of each rule in the BNF further above has only one variable) and they can have
recursion on both sides. Regular languages do not allow that and are even more
strict - they can only contain a rule that ends in a terminal (e.g. a
character) or a terminal combined with a variable (given terminal
Ba. In above example it could be
'*' posnum or
For instance, the rule for exprcomp as it stands is not allowed in regular
languages, since it contains three variables. Now, these rules can still
possibly be converted into a regular language by enumerating all possible
combinations of a variable and a terminal. E.g. oper consists of 6 possible
values, so why not add 6 rules
a → '==' value,
a → '>=' value, and so on.
However, this is a lot of work. Let's just stay context-free and use pyparsing!
There are lemmas to prove that a language is not regular such as the Pumping Lemma. However, I haven't been able to find a contradiction (i.e. any word in the above language can be pumped). This does not mean it is regular, so it is not a proof, but I have a gut feeling the above language is in fact regular. One way to actually prove this would be to create a finite automaton, i.e. a DFA or NFA from the BNF. Another way could be to try to convert every single rule into a left-/right-regular rule. If this is successful, it is indeed a regular language.