7/18/2014

Regexp Lookahead & Lookbehind


Source: http://denis-zhdanov.blogspot.com/2009/10/regexp-lookahead-and-lookbehind.html
Source: http://www.regular-expressions.info/lookaround.html

Lookahead


Generally speaking, 'lookahead' lets you check whether the characters that follow the current position match a particular regexp. For example, take the word 'murmur' as input and suppose we want to capitalize every 'r' that is not followed by an 'm' (i.e. we expect to get 'murmuR' as the output). Here is a naive approach:

    System.out.println("murmur".replaceAll("r[^m]", "R"));


However, it doesn't change anything: 'murmur' is printed. The reason is that 'r[^m]' means an 'r' followed by some character other than 'm', so the pattern needs a character after the 'r'. The first 'r' is followed by 'm', and there is no character at all after the last 'r', so the pattern never matches.
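Besides failing at the end of the input, 'r[^m]' has a second drawback: wherever it does match, the character after the 'r' becomes part of the match and gets replaced too. The sketch below shows this on the input 'murmurs' (an illustrative input, not from the original post), together with a lookahead-free workaround that captures the following character (or the end of input) and writes it back with '$1'. The class name NaiveReplaceDemo and the helper capitalizeWithoutLookahead are hypothetical names used only for this illustration.

```java
public class NaiveReplaceDemo {

    // Hypothetical helper: capitalizes every 'r' not followed by 'm' without lookahead.
    // Group 1 captures either the following non-'m' character or nothing (end of input),
    // and "$1" writes that captured text back after the 'R'.
    static String capitalizeWithoutLookahead(String s) {
        return s.replaceAll("r([^m]|$)", "R$1");
    }

    public static void main(String[] args) {
        // The naive pattern matches "rs" and replaces both characters, losing the 's'.
        System.out.println("murmurs".replaceAll("r[^m]", "R"));    // prints 'murmuR'
        // The workaround keeps the following character and also handles end of input.
        System.out.println(capitalizeWithoutLookahead("murmurs")); // prints 'murmuRs'
        System.out.println(capitalizeWithoutLookahead("murmur"));  // prints 'murmuR'
    }
}
```

The capturing workaround works, but it has to restore the consumed character by hand; lookahead avoids consuming it in the first place.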

This is where lookahead comes to the rescue: it checks whether the subsequent character(s) match the provided regexp without making them part of the match. A lookahead can be 'positive' or 'negative', i.e. it requires that the subsequent input does or does not match. Here is an example:

public static void main(String[] args) throws Exception {
    // 'Negative' lookahead: 'r' is not followed by 'm'.
    System.out.println("murmur".replaceAll("r(?!m)", "R")); // prints 'murmuR'
    // 'Positive' lookahead: 'r' is followed by 'm'.
    System.out.println("murmur".replaceAll("r(?=m)", "R")); // prints 'muRmur'
    // It's possible to use a regexp as the 'lookahead' pattern.
    System.out.println("murmur".replaceAll("m(?!u[^u]+u)", "M")); // prints 'murMur'
}
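A lookahead is zero-width: it tests the following characters but does not consume them. The sketch below makes that visible with java.util.regex.Matcher: the match for 'r(?=m)' spans only the 'r', and because the zero-width '(?=aa)' consumes nothing, it finds overlapping positions that the plain pattern 'aa' skips. The class name ZeroWidthDemo, the helper countMatches and the input 'aaaa' are hypothetical, chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroWidthDemo {

    // Hypothetical helper: counts how many times Matcher.find() succeeds.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // The lookahead checks the 'm' but does not include it in the match.
        Matcher m = Pattern.compile("r(?=m)").matcher("murmur");
        if (m.find()) {
            // prints: 'r' at [2, 3)
            System.out.println("'" + m.group() + "' at [" + m.start() + ", " + m.end() + ")");
        }

        // 'aa' consumes two characters per match, so matches cannot overlap.
        System.out.println(countMatches("aa", "aaaa"));     // prints 2
        // '(?=aa)' consumes nothing, so it succeeds at positions 0, 1 and 2.
        System.out.println(countMatches("(?=aa)", "aaaa")); // prints 3
    }
}
```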


Lookbehind


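'Lookbehind' behaves very similarly to 'lookahead' but works backwards: it checks the characters before the current position. Another difference is that the 'lookbehind' pattern must have an obvious maximum length, so finite repetition (such as 'e{2}') is allowed, while unbounded repetition (such as '+') is not: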

public static void main(String[] args) throws Exception {
    // 'Negative' lookbehind: 'c' is not preceded by 'b'.
    System.out.println("abcadcaeec".replaceAll("(?<!b)c", "C")); // prints 'abcadCaeeC'
    // 'Positive' lookbehind: 'c' is preceded by 'b'.
    System.out.println("abcadcaeec".replaceAll("(?<=b)c", "C")); // prints 'abCadcaeec'
    // It's possible to use a finite-repetition regexp as the 'lookbehind' pattern.
    System.out.println("abcadcaec".replaceAll("(?<=e{2})c", "C")); // prints 'abcadcaec' unchanged: no 'c' is preceded by two 'e's
    // It's not possible to use a regexp without an obvious maximum length as the 'lookbehind' pattern.
    System.out.println("abcadcaec".replaceAll("(?<=a[^a]+ae?)c", "C")); // PatternSyntaxException is thrown here
}
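When Java rejects a lookbehind like '(?<=a[^a]+ae?)', two common workarounds are sketched below. The first replaces '+' with an explicit upper bound; the limit of 10 is an arbitrary assumption and only behaves like the original if the gap between the two 'a's never exceeds it. The second drops the lookbehind, captures the prefix instead and writes it back with '$1'; unlike a lookbehind, the prefix is then consumed and cannot take part in a later match. The class name BoundedLookbehindDemo and both helper names are hypothetical.

```java
public class BoundedLookbehindDemo {

    // Hypothetical workaround 1: give the lookbehind an obvious maximum length
    // by bounding the repetition ({1,10} is an arbitrary assumed limit).
    static String boundedLookbehind(String s) {
        return s.replaceAll("(?<=a[^a]{1,10}ae?)c", "C");
    }

    // Hypothetical workaround 2: no lookbehind at all; group 1 captures the
    // prefix and "$1C" puts it back in front of the capitalized 'c'.
    static String captureInstead(String s) {
        return s.replaceAll("(a[^a]+ae?)c", "$1C");
    }

    public static void main(String[] args) {
        String input = "abcadcaec";
        System.out.println(boundedLookbehind(input)); // prints 'abcadcaeC'
        System.out.println(captureInstead(input));    // prints 'abcadcaeC'
    }
}
```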
