src/modules/db_cassandra/README
c0a92201
 DB Cassandra Module
 
 Anca Vamanu
 
    <anca.vamanu@1and1.ro>
 
18c5f3fe
 Boudewyn Ligthart
 
    <bligthart@btlnet.co.uk>
 
c0a92201
 Edited by
 
 Anca Vamanu
 
    <anca.vamanu@1and1.ro>
 
c3987b2e
    Copyright © 2012 1&1 Internet AG
c0a92201
      __________________________________________________________________
 
    Table of Contents
 
    1. Admin Guide
 
         1. Overview
         2. Dependencies
 
               2.1. SIP Router Modules
               2.2. External Libraries or Applications
 
         3. Parameters
 
               3.1. schema_path (string)
 
         4. Functions
         5. Installation
         6. Table schema
         7. Limitations
 
    List of Examples
 
    1.1. Set schema_path parameter
 
 Chapter 1. Admin Guide
 
    Table of Contents
 
    1. Overview
    2. Dependencies
 
         2.1. SIP Router Modules
         2.2. External Libraries or Applications
 
    3. Parameters
 
         3.1. schema_path (string)
 
    4. Functions
    5. Installation
    6. Table schema
    7. Limitations
 
 1. Overview
 
b6a0d86b
    Note: the module requires old version of external library, not
    compiling with those available out of the stock in the Linux
    distributions. It is going to be kept for a while in case someone wants
    to pick it up and upgrade. Also, the module was never extensively
    tested, therefore take the appropriate actions in case you plan to use
    it.
 
e1306157
    Db_cassandra is one of the Kamailio database modules. It does not
c0a92201
    export any functions executable from the configuration scripts, but it
e1306157
    exports a subset of functions using the database API, and thus, other
c0a92201
    modules can use it as a database driver, instead of, for example, the
    Mysql module.
 
    The storage backend is a Cassandra cluster and this module provides an
    SQL interface to be used by other modules for storing and retrieving
    data. Because Cassandra is a NoSQL distributed system, there are
    limitations on the operations that can be performed. The limitations
401a81e1
    concern the indexes on which queries are performed, as it is only
    possible to have simple conditions (equality comparison only) and only
    two indexing levels. These issues will be explained in an example
    below.
c0a92201
 
e1306157
    Cassandra DB is especially suited for storing large amounts of data or
    data that requires distribution, redundancy or replication. One usage
    example is a distributed location system in a platform that has a
    cluster of SIP Router servers, with several proxies and registration
    servers accessing the same location database. This was actually the
    main use case we had in mind when implementing this module. Please NOTE
    that it has only been tested with the usrloc, auth_db and domain
    modules.
c0a92201
 
    You can find a configuration file example for this usage in the module
    - kamailio_cassa.cfg.
 
    Because the module has to do the translation from SQL to Cassandra
401a81e1
    NoSQL queries, the schemas for the tables must be known by the module.
c0a92201
    You will find the schemas for location, subscriber and version tables
    in utils/kamctl/dbcassandra directory. You have to provide the path to
    the directory containing the table definitions by setting the module
29a87c81
    parameter schema_path.
 
    There is no need to configure a table metadata in Cassandra cluster.
    You only need to define a keyspace with the name of the database and
    for each table a column family inside that keyspace with the name of
    the table. The comparator and validators should be either UTF8Type or
    ASCIIType. Example:
    ...
c3987b2e
    create keyspace kamailio;
    use kamailio;
29a87c81
    create column family 'location' with comparator='UTF8Type' and
 default_validation_class='UTF8Type' and key_validation_class='UTF8Type';
    ...
c0a92201
 
401a81e1
    Special attention was given to performance in Cassandra. Therefore, the
    implementation uses only the native row indexing in Cassandra and no
    secondary indexes, because they are costly. Instead, we simulate a
    secondary index by using the column names and putting information in
    them, which is very efficient. Also, for deleting expired records, we
    let Cassandra take care of this with its own mechanism (by setting the
    TTL for columns).
c0a92201
 
18c5f3fe
    The module supports raw queries. However these queries must follow the
    CQL (Cassandra Query Language) syntax. The queries can be issued in the
    script by means of the AVPOPS module. Keep in mind that when passing
    back the results from the database only the first row is used to set
    the AVP variables. (default AVPOPS behaviour) The script lines below
    can be used as an example for issuing the query towards an cassandra
    instance.(This example will work once the column family `location` is
    configured correctly in the cassandra keyspace)
    ...
    $var(dballowed)="select * from location where key = 'userx' limit 1;";
    avp_db_query("$var(dballowed)");
    xlog("L_INFO","Got result here: [$avp(i:1)] [$avp(i:2)] [$avp(i:3)].\n");
    ...
 
c0a92201
 2. Dependencies
 
    2.1. SIP Router Modules
    2.2. External Libraries or Applications
 
 2.1. SIP Router Modules
 
    The following modules must be loaded before this module:
      * No dependencies on other SIP Router modules.
 
 2.2. External Libraries or Applications
 
    The following libraries or applications must be installed before
    running SIP Router with this module loaded:
ff18475e
      * Thrift library (tested with version 0.6.1 and version 0.7.0). You
        can download it from http://archive.apache.org/dist/thrift .
c0a92201
 
ff18475e
    The implementation was tested with Cassandra version 1.0.1 and version
    1.1.6. I used the sourced from DataStax Community Edition
    (http://www.datastax.com/download/community).
c0a92201
 
 3. Parameters
 
    3.1. schema_path (string)
 
 3.1. schema_path (string)
 
    The directory where the files with the table schemas are located. This
401a81e1
    directory has to contain the subdirectories corresponding to the
e1306157
    database name (name of the directory = name of the database). These
401a81e1
    directories, in turn, contain the files with the table schemas. See the
c0a92201
    schemas in utils/kamctl/dbcassandra directory.
 
    Example 1.1. Set schema_path parameter
    ...
    modparam("db_cassandra", "schema_path",
e1306157
                "/usr/local/kamailio/etc/kamctl/dbcassandra")
c0a92201
    ...
 
 4. Functions
 
    NONE
 
 5. Installation
 
8868e624
    Because the db_cassandra module depends on an external library, it is
c0a92201
    not compiled and installed by default. You can use one of these
401a81e1
    options:
c0a92201
      * - edit the "Makefile" and remove "db_cassandra" from
        "excluded_modules" list. Then follow the standard procedure to
        install SIP Router: "make all; make install".
401a81e1
      * - from command line, run: 'make all include_modules="db_cassandra";
c0a92201
        make install include_modules="db_cassandra"'.
 
 6. Table schema
 
    The module must know the table schema for the tables that will be used.
    You must configure the path to the schema directory by setting the
    schema_path parameter.
 
    A table schema document has the following structure:
      * First row: the name and type of the columns in the form name(type)
        separated by spaces. The possible types are: string, int, double
        and timestamp.
        Thetimestamp type has a special meaning. Only one column of this
        type can be defined for a table, and it should contain the expiry
        time for that record. If defined this value will be used to compute
        the ttl for the columns and Cassandra will automatically delete the
        columns when they expire. Because we want the ttl to have meaning
        for the entire record, we must ensure that when the ttl is updated,
        it is updated for all columns for that record. In other words, to
        update the expiration time of a record, an insert operation must be
401a81e1
        performed from the point of view of the db_cassandra module
        ("insert" in Cassandra means "replace if exists or insert new
        record otherwise"). So, if you define a table with a timestamp
        column, the update operations on that table that also update the
        timestamp must update all columns. So, these update operations must
        in fact be insert operations.
c0a92201
      * Second row: the columns that form the row key separated by space.
      * Third row: the columns that form the secondary key separated by
        space.
 
e1306157
    Below you can see the schema for the location table (when use_domain
b7fde234
    not set):
c0a92201
 
    ...
    callid(string) cflags(int) contact(string) cseq(int) expires(timestamp) flags
 (int) last_modified(int) methods(int) path(string) q(double) received(string) so
82bb7a8b
 cket(string) user_agent(string) username(string) ruid(string) instance(string) r
 eg_id(int)
c0a92201
    username
    contact
    ...
 
    Observe first that the row key is the username and the secondary index
b7fde234
    is the contact. We have also defined a timestamp column - expires.
 
    If you need to use the domain part of the AOR also (you have set
    use_domain parameter for usrloc in the script), you should include the
    domain column in the list of columns and in the primary key. The schema
    will then look like this:
    ...
    callid(string) cflags(int) contact(string) cseq(int) domain(string) expires(t
 imestamp) flags(int) last_modified(int) methods(int) path(string) q(double) rece
 ived(string) socket(string) user_agent(string) username(string) ruid(string) ins
 tance(string) reg_id(int)
    username domain
    contact
    ...
 
    Notice that a key (primary or secondary) can be composed from more
    columns, in which case you have to specify them separated by space.
c0a92201
 
401a81e1
    To understand why the schema looks like this, we must first see which
    queries are performed on the location table. (The 'callid' condition
    was ignored as it doesn't really have a well defined role in the SIP
    RFC).
c0a92201
      * When Invite received, lookup location: select where username='..'.
      * When Register received, update registration: update where
        username='..' and contact='..'.
 
401a81e1
    So, the relation between these keys is the following:
c0a92201
      * The unique key for a table is actually the combination of row key +
        secondary key.
      * A row defined by a row key will contain more records with different
        secondary keys.
 
401a81e1
    The timestamp column that leaves the Cassandra cluster to deal with
b7fde234
    deleting the expired records. For this to work right we needed to
    modify a bit the behavior of usrloc module and replace update sql query
    performed at re-registration with an insert sql query (so that all
    columns are updated and the new timestamp is set for all columns). This
    behavior is enabled by setting a parameter in the usrloc module
401a81e1
    db_update_as_insert:
c0a92201
 
    ...
    modparam("usrloc", "db_update_as_insert", 1)
    ...
 
b7fde234
    Also you should disable in usrloc module the timer routine that checks
    for expired records. You can do this by setting the timer interval to
    0. timer_interval:
 
    ...
    modparam("usrloc", "timer_interval", 0)
    ...
 
c0a92201
    The alternative would have been to define an index on the expire column
401a81e1
    and run a external job to periodically delete the expired records.
    However, obviously, this would be more costly.
c0a92201
 
 7. Limitations
 
401a81e1
    The module can be used only when the queries use only one index, which
    is also the unique key, or have two indexes that form the unique key
    like in the usrloc usage.