Support escaping in regular expression replacement

by Arnold Daniels on 01/9/2009

Simple replacement
String replacement is often used as a way to apply templating. You might replace “%a:test” with “~~test~~” using the regexp: %a:(\w+), replacing it with “~~$1~~”.
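As a minimal sketch of that replacement, here it is in Python (the article itself is written with PHP in mind, so the language, the function name `simple_replace` and the sample string are assumptions made for illustration). Python writes the backreference `$1` as `\1`.

```python
import re

def simple_replace(text):
    # \w+ captures the word that follows the "%a:" marker;
    # \1 in the replacement is Python's spelling of $1.
    return re.sub(r"%a:(\w+)", r"~~\1~~", text)

# Hypothetical template string, used only for illustration.
print(simple_replace("Hello %a:test and %a:world!"))
# prints: Hello ~~test~~ and ~~world~~!
```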

Trying to escape
The only problem now is that I can’t use a literal “%a:” anywhere in my string any more. This can be solved by allowing it to be escaped with a backslash. In the regexp we can use a negative lookbehind to check that the character before the % isn’t a backslash: (?<!\\)%a:(\w+).
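The lookbehind version can be tried out with a short Python sketch (again an illustration rather than the article's own code; `replace_unescaped` and the sample input are made-up names). Raw strings keep the backslashes readable.

```python
import re

# (?<!\\) is a negative lookbehind: the match is only allowed when the
# character directly before "%" is not a backslash.
PATTERN = re.compile(r"(?<!\\)%a:(\w+)")

def replace_unescaped(text):
    return PATTERN.sub(r"~~\1~~", text)

# The first marker is replaced, the escaped one is left alone.
print(replace_unescaped(r"%a:test and \%a:test"))
# prints: ~~test~~ and \%a:test
```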

Escaping the escaping
Now we’re close, but it’s no longer possible to use a literal backslash directly in front of a replaceable “%a:”, since “\%a:” is always treated as escaped. We need to be able to escape the backslash as well. We can state the problem as: match %a: unless there is an odd number of backslashes in front of it. Unfortunately, checking for an odd number in a negative lookbehind isn’t possible, since a lookbehind (in PCRE at least) can’t contain an unbounded repetition, so we need to get the backslashes into the match. We can say: match 0 or more pairs of backslashes, followed by “%a:”, if there is no backslash in front of the pairs. This results in the regexp:
(?<!\\)((?:\\{2})*+)%a:(\w+), replacing it with “$1~~$2~~”.
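A rough Python sketch of this final pattern follows. One assumption to flag: Python's `re` module only accepts the possessive `*+` from version 3.11 on, so the sketch uses a plain greedy `*`. That matches exactly the same strings here, because a run with an odd number of backslashes can never end right before the `%`; the possessive form only saves the engine some backtracking. The name `replace_with_escapes` is hypothetical.

```python
import re

# Group 1: zero or more *pairs* of backslashes (kept in the output).
# Group 2: the word after the "%a:" marker.
# The lookbehind makes sure the run of pairs is not preceded by yet
# another backslash, which would make the total count odd.
PATTERN = re.compile(r"(?<!\\)((?:\\{2})*)%a:(\w+)")

def replace_with_escapes(text):
    return PATTERN.sub(r"\1~~\2~~", text)

for sample in (r"%a:x", r"\%a:x", r"\\%a:x", r"\\\%a:x"):
    print(sample, "->", replace_with_escapes(sample))
```

With zero or two backslashes in front the marker is replaced (and the backslashes are kept); with one or three it is left untouched.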

To finish up
The only thing left is that \% and \\ will still be displayed as written, backslashes included. This can simply be solved with a str_replace.
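Putting both steps together might look like the Python sketch below. Note that it swaps in a different technique for the last step: instead of a plain string replace it uses a second regexp that unescapes \% and \\ in a single pass, so that the result of one replacement is never scanned again by the other. That choice, and the names `render`, `MARKER` and `ESCAPE`, are assumptions of this sketch, not something taken from the article.

```python
import re

# Same marker pattern as before (greedy * instead of possessive *+).
MARKER = re.compile(r"(?<!\\)((?:\\{2})*)%a:(\w+)")
# A backslash followed by either a backslash or a percent sign.
ESCAPE = re.compile(r"\\([\\%])")

def render(text):
    # Step 1: replace the unescaped markers, keeping the backslash pairs.
    marked = MARKER.sub(r"\1~~\2~~", text)
    # Step 2: drop the escaping backslash, left to right, in one pass.
    return ESCAPE.sub(r"\1", marked)

for sample in (r"%a:x", r"\%a:x", r"\\%a:x", r"\\\%a:x"):
    print(sample, "->", render(sample))
```

An escaped marker comes out as a literal “%a:x”, and an escaped backslash in front of a real marker comes out as a single backslash followed by “~~x~~”.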

Arnold Daniels

I've spent a big part of my life behind a computer, learning about databases (MySQL), programming (PHP) and system administration (Linux). Currently I'm playing with HTML5, jQuery and node.js.


There are 5 comments in this article:

  1. 9 January 2009, Jordi Boggiano says:

    Just a small note, although it doesn’t change anything: in the final regex, “*+” has a useless +.

  2. 12 January 2009, Arnold Daniels says:

    Hi Jordi,

    The + isn’t useless, it’s there to increase performance. When the expression fails because of an odd number of backslashes, the engine doesn’t need to retry with fewer pairs of backslashes, because those attempts will all fail as well. Using a possessive quantifier (*+ or ++) prevents this backtracking.

  3. 11 October 2009, Dave says:

    Thanks for the really helpful post, this is going to come in very handy. However, I think the negative lookbehind should have the question mark before the less than symbol.

    http://www.regular-expressions.info/lookaround.html

  4. 12 October 2009, Arnold Daniels says:

    You seem to be right, Dave. I’ve changed it in the article.

