Regex Module

Iñaki Baz Castillo

   <ibc@aliax.net>

Edited by

Iñaki Baz Castillo

   <ibc@aliax.net>

   Copyright © 2009 Iñaki Baz Castillo
     __________________________________________________________________

   Table of Contents

   1. Admin Guide

        1. Overview
        2. Dependencies

              2.1. Kamailio Modules
              2.2. External Libraries or Applications

        3. Parameters

              3.1. file (string)
              3.2. max_groups (int)
              3.3. group_max_size (int)
              3.4. pcre_caseless (int)
              3.5. pcre_multiline (int)
              3.6. pcre_dotall (int)
              3.7. pcre_extended (int)

        4. Functions

              4.1. pcre_match (string, pcre_regex)
              4.2. pcre_match_group (string [, group])

        5. MI Commands

              5.1. regex_reload

        6. Installation and Running

              6.1. File format

   List of Examples

   1.1. Set file parameter
   1.2. Set max_groups parameter
   1.3. Set group_max_size parameter
   1.4. Set pcre_caseless parameter
   1.5. Set pcre_multiline parameter
   1.6. Set pcre_dotall parameter
   1.7. Set pcre_extended parameter
   1.8. pcre_match usage (forcing case insensitive)
   1.9. pcre_match usage (using "end of line" symbol)
   1.10. pcre_match_group usage
   1.11. pcre_match_group usage (using a pseudo-variable as group)
   1.12. regex file
   1.13. Using with pua_usrloc
   1.14. Incorrect groups file

Chapter 1. Admin Guide

1. Overview

   This module offers matching operations against regular expressions
   using the powerful PCRE library.

   A text file containing regular expressions categorized in groups is
   compiled when the module is loaded, storing the compiled PCRE objects
   in an array. A function to match a string or pseudo-variable against
   any of these groups is provided. The text file can be modified and
   reloaded at any time via an MI command. The module also offers a
   function to perform a PCRE matching operation against a regular
   expression provided as a function parameter.

   For a detailed list of PCRE features read the man page of the library.

2. Dependencies

2.1. Kamailio Modules

   The following modules must be loaded before this module:
     * No dependencies on other Kamailio modules.

2.2. External Libraries or Applications

   The following libraries or applications must be installed before
   running Kamailio with this module loaded:
     * libpcre - the libraries of PCRE.

3. Parameters

3.1. file (string)

   Text file containing the regular expression groups. It must be set in
   order to enable the group matching function.

   Default value is “NULL”.

   Example 1.1. Set file parameter
...
modparam("regex", "file", "/etc/kamailio/regex_groups")
...

3.2. max_groups (int)

   Maximum number of regular expression groups in the text file.

   Default value is “20”.

   Example 1.2. Set max_groups parameter
...
modparam("regex", "max_groups", 40)
...

3.3. group_max_size (int)

   Maximum content size of a group in the text file.

   Default value is “8192”.

   Example 1.3. Set group_max_size parameter
...
modparam("regex", "group_max_size", 16384)
...

3.4. pcre_caseless (int)

   If this option is set, matching is case-insensitive. It is equivalent
   to Perl's /i option, and it can be changed within a pattern by a (?i)
   or (?-i) option setting.

   Default value is “0”.

   Example 1.4. Set pcre_caseless parameter
...
modparam("regex", "pcre_caseless", 1)
...
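
   As a rough illustration of the difference between the global option
   and the inline setting, the sketch below uses Python's re module as a
   stand-in for PCRE. This is an assumption for demonstration only: re is
   not PCRE, although its IGNORECASE flag and inline (?i) follow Perl's
   /i. The User-Agent value is made up.

```python
import re

subject = "TWINKLE/1.4.2"  # hypothetical User-Agent value

# Comparable to pcre_caseless = 0: matching is case-sensitive.
case_sensitive = re.search(r"^twinkle", subject) is not None

# Comparable to pcre_caseless = 1: every pattern is matched caseless.
caseless_flag = re.search(r"^twinkle", subject, re.IGNORECASE) is not None

# Inline setting at the start of one pattern, leaving the default untouched.
caseless_inline = re.search(r"(?i)^twinkle", subject) is not None

print(case_sensitive, caseless_flag, caseless_inline)
```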

3.5. pcre_multiline (int)

   By default, PCRE treats the subject string as consisting of a single
   line of characters (even if it actually contains newlines). The "start
   of line" metacharacter (^) matches only at the start of the string,
   while the "end of line" metacharacter ($) matches only at the end of
   the string, or before a terminating newline.

   When this option is set, the "start of line" and "end of line"
   constructs match immediately following or immediately before internal
   newlines in the subject string, respectively, as well as at the very
   start and end. This is equivalent to Perl's /m option, and it can be
   changed within a pattern by a (?m) or (?-m) option setting. If there
   are no newlines in a subject string, or no occurrences of ^ or $ in a
   pattern, setting this option has no effect.

   Default value is “0”.

   Example 1.5. Set pcre_multiline parameter
...
modparam("regex", "pcre_multiline", 1)
...
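
   The effect described above can be tried out with a small sketch. It
   uses Python's re module as a stand-in for PCRE (an assumption: re is
   not PCRE, but its MULTILINE flag follows Perl's /m). The subject
   strings are invented for the illustration.

```python
import re

subject = "first line\nsecond line"

# Without the option, ^ matches only at the very start of the subject.
single_line = re.findall(r"^\w+", subject)

# With the option, ^ also matches right after each internal newline.
multi_line = re.findall(r"^\w+", subject, re.MULTILINE)

# Even without the option, $ matches before a terminating newline.
before_final_newline = re.search(r"line$", "one line\n") is not None

print(single_line, multi_line, before_final_newline)
```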

3.6. pcre_dotall (int)

   If this option is set, a dot metacharacter in the pattern matches all
   characters, including those that indicate newline. Without it, a dot
   does not match when the current position is at a newline. This option
   is equivalent to Perl's /s option, and it can be changed within a
   pattern by a (?s) or (?-s) option setting.

   Default value is “0”.

   Example 1.6. Set pcre_dotall parameter
...
modparam("regex", "pcre_dotall", 1)
...
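
   A minimal sketch of the dot behaviour, again using Python's re module
   as a stand-in for PCRE (an assumption: re is not PCRE, but its DOTALL
   flag and inline (?s) follow Perl's /s). The subject string is invented.

```python
import re

subject = "abc\ndef"

# Without the option, the dot refuses to match the newline.
without_dotall = re.search(r"abc.def", subject) is not None

# With the option, the dot matches any character, newline included.
with_dotall = re.search(r"abc.def", subject, re.DOTALL) is not None

# The same effect for a single pattern through the inline setting.
inline_dotall = re.search(r"(?s)abc.def", subject) is not None

print(without_dotall, with_dotall, inline_dotall)
```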

3.7. pcre_extended (int)

   If this option is set, whitespace data characters in the pattern are
   totally ignored except when escaped or inside a character class.
   Whitespace does not include the VT character (code 11). In addition,
   characters between an unescaped # outside a character class and the
   next newline, inclusive, are also ignored. This is equivalent to Perl's
   /x option, and it can be changed within a pattern by a (?x) or (?-x)
   option setting.

   Default value is “0”.

   Example 1.7. Set pcre_extended parameter
...
modparam("regex", "pcre_extended", 1)
...
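
   The sketch below shows what ignoring whitespace and # comments means
   in practice. Python's re module is used as a stand-in for PCRE (an
   assumption: re is not PCRE, but its VERBOSE flag follows Perl's /x).
   The pattern reuses two User-Agent names purely as sample data.

```python
import re

# Whitespace and "#" comments are dropped; "\ " keeps a literal space.
pattern = r"""
    ^SIP\ Communicator   # a literal space must be escaped
    | ^Linphone          # second alternative
"""
rx = re.compile(pattern, re.VERBOSE)

matches_escaped_space = rx.search("SIP Communicator 1.0") is not None
matches_alternative = rx.search("Linphone/3.5") is not None
rejects_missing_space = rx.search("SIPCommunicator") is None

# Without the option a space in the pattern is an ordinary character.
spaces_are_literal = re.search(r"a b", "ab") is None
spaces_are_ignored = re.search(r"a b", "ab", re.VERBOSE) is not None
```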

4. Functions

4.1.  pcre_match (string, pcre_regex)

   Matches the given string parameter against the regular expression
   pcre_regex, which is compiled at runtime into a PCRE object. Returns
   TRUE if it matches, FALSE otherwise.

   Meaning of the parameters is as follows:
     * string - String or pseudo-variable to compare.
     * pcre_regex - Regular expression to be compiled into a PCRE object.
       It can be a string or pseudo-variable.

   NOTE: To use the "end of line" symbol '$' in the pcre_regex parameter,
   write '$$', because a single '$' is parsed as the start of a
   pseudo-variable.

   This function can be used from REQUEST_ROUTE, FAILURE_ROUTE,
   ONREPLY_ROUTE, BRANCH_ROUTE and LOCAL_ROUTE.

   Example 1.8.  pcre_match usage (forcing case insensitive)
...
if (pcre_match("$ua", "(?i)^twinkle")) {
    xlog("L_INFO", "User-Agent matches\n");
}
...

   Example 1.9.  pcre_match usage (using "end of line" symbol)
...
if (pcre_match("$rU", "^user[1234]$$")) {  # Will be converted to "^user[1234]$"
    xlog("L_INFO", "RURI username matches\n");
}
...

4.2.  pcre_match_group (string [, group])

   Tries to match the given string against a specific group in the text
   file (see Section 6.1, “File format”). Returns TRUE if it matches,
   FALSE otherwise.

   Meaning of the parameters is as follows:
     * string - String or pseudo-variable to compare.
     * group - Number of the group to use in the operation. If not specified
       then 0 (the first group) is used. A pseudo-variable containing an
       integer can also be used.

   This function can be used from REQUEST_ROUTE, FAILURE_ROUTE,
   ONREPLY_ROUTE, BRANCH_ROUTE and LOCAL_ROUTE.

   Example 1.10.  pcre_match_group usage
...
if (pcre_match_group("$rU", "2")) {
    xlog("L_INFO", "RURI username matches group 2\n");
}
...

   Example 1.11.  pcre_match_group usage (using a pseudo-variable as
   group)
...
$avp(i:10) = 5;  # For example, a value obtained from a DB query.
if (pcre_match_group("$ua", "$avp(i:10)")) {
    xlog("L_INFO", "User-Agent matches group 5\n");
}
...

5. MI Commands

5.1.  regex_reload

   Causes the regex module to re-read the content of the text file and
   re-compile the regular expressions. The number of groups in the file
   can be modified safely.

   Name: regex_reload

   Parameters: none

   MI FIFO Command Format:
:regex_reload:_reply_fifo_file_
_empty_line_
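
   The request text shown above can also be assembled by a script. The
   following is a minimal sketch, assuming a hypothetical helper named
   build_mi_fifo_command and an example reply FIFO name; neither is part
   of the module. It only builds the text in the format shown above: the
   command line followed by one empty line.

```python
def build_mi_fifo_command(command: str, reply_fifo: str) -> str:
    """Return ":<command>:<reply_fifo>" followed by one empty line."""
    return f":{command}:{reply_fifo}\n\n"


# "regex_reply" is an example reply FIFO name chosen for this sketch.
request = build_mi_fifo_command("regex_reload", "regex_reply")
print(repr(request))
```

   The resulting text would then be written to the MI FIFO of the running
   server, whose path depends on the local installation.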

6. Installation and Running

6.1. File format

   The file contains regular expressions categorized in groups. Each group
   starts with a "[number]" line. Lines starting with a space, tab, CR,
   LF or # (comments) are ignored. Each regular expression must occupy
   exactly one line; it cannot be split across several lines.

   An example of the file format would be the following:

   Example 1.12. regex file
### List of User-Agents publishing presence status
[0]

# Softphones
^Twinkle/1
^X-Lite
^eyeBeam
^Bria
^SIP Communicator
^Linphone

# Deskphones
^Snom

# Others
^SIPp
^PJSUA


### Blacklisted source IPs
[1]

^190\.232\.250\.226$
^122\.5\.27\.125$
^86\.92\.112\.


### Free PSTN destinations in Spain
[2]

^1\d{3}$
^((\+|00)34)?900\d{6}$

   The module compiles the text above to the following regular
   expressions:
group 0: ((^Twinkle/1)|(^X-Lite)|(^eyeBeam)|(^Bria)|(^SIP Communicator)|
          (^Linphone)|(^Snom)|(^SIPp)|(^PJSUA))
group 1: ((^190\.232\.250\.226$)|(^122\.5\.27\.125$)|(^86\.92\.112\.))
group 2: ((^1\d{3}$)|(^((\+|00)34)?900\d{6}$))
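
   The transformation from file to compiled groups can be sketched as
   follows. This is an illustration of the rules described in this
   section, not the module's actual parser: compile_groups is a
   hypothetical helper, and skipping expressions that appear before the
   first header is an assumption of the sketch.

```python
from typing import List


def compile_groups(text: str) -> List[str]:
    """Join the expressions of each group into one "((a)|(b))" pattern."""
    groups: List[List[str]] = []
    for line in text.split("\n"):
        # Empty lines and lines starting with space, tab, CR or '#' are ignored.
        if line == "" or line[0] in " \t\r#":
            continue
        # A line starting with '[' opens a new group; its number is not used.
        if line.startswith("["):
            groups.append([])
            continue
        # Assumption: expressions found before the first header are skipped.
        if groups:
            groups[-1].append(line)
    return ["(" + "|".join("(" + rx + ")" for rx in g) + ")" for g in groups]


sample = "\n".join([
    "### Softphones",
    "[0]",
    "",
    "^Twinkle/1",
    "^X-Lite",
    "",
    "### Free PSTN destinations in Spain",
    "[1]",
    r"^1\d{3}$",
    r"^((\+|00)34)?900\d{6}$",
])

compiled = compile_groups(sample)
for index, pattern in enumerate(compiled):
    print("group %d: %s" % (index, pattern))
```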

   The first group can be used to avoid auto-generated PUBLISH requests
   (pua_usrloc module) for UAs that already support presence:

   Example 1.13. Using with pua_usrloc
route[REGISTER] {
    if (! pcre_match_group("$ua", "0")) {
        xlog("L_INFO", "Auto-generated PUBLISH for $fu ($ua)\n");
        pua_set_publish();
    }
    save("location");
    exit;
}

   NOTE: The numbers in the group headers ([number]) must start at 0 and
   follow the order of the groups in the file. If they do not, the real
   group number will not match the number appearing in the file. For
   example, the following text file:

   Example 1.14. Incorrect groups file
[1]
^aaa
^bbb

[2]
^ccc
^ddd

   will generate the following regular expressions:
group 0: ((^aaa)|(^bbb))
group 1: ((^ccc)|(^ddd))

   Note that the real index does not match the group number in the file.
   That is, compiled group 0 always points to the first group in the file,
   regardless of its number in the file. In fact, the group number
   appearing in the file serves only to delimit the groups.
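
   A groups file can be checked for this mistake before it is loaded. The
   sketch below is one possible check, assuming a hypothetical helper
   named misnumbered_headers; it is not part of the module. It treats
   every line starting with '[' as a header and reports each header whose
   number differs from its position in the file.

```python
def misnumbered_headers(text):
    """Return (real_index, declared_number) pairs that disagree."""
    mismatches = []
    real_index = 0
    for line in text.split("\n"):
        if not line.startswith("["):
            continue
        # Text between '[' and the first ']' (empty if ']' is missing).
        declared = line[1:line.index("]")] if "]" in line else ""
        if declared != str(real_index):
            mismatches.append((real_index, declared))
        real_index += 1
    return mismatches


incorrect = "[1]\n^aaa\n^bbb\n\n[2]\n^ccc\n^ddd"
correct = "[0]\n^aaa\n^bbb\n\n[1]\n^ccc\n^ddd"

print(misnumbered_headers(incorrect))
print(misnumbered_headers(correct))
```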

   NOTE: A line containing a regular expression cannot start with '['
   since it would be treated as a new group. The same applies to lines
   starting with a space, a tab or '#' (the parser would ignore them). As
   a workaround, wrap the expression in parentheses:
[0]
([0-9]{9})
( #abcde)
( qwerty)