Wednesday, November 23, 2011

DCI Sample Implementation

Download: https://github.com/dimitrs/DCI-NIDS/tree/DCI-NIDS-1

In this post I present an experimental network protocol analyzer implementation (in C++) based on the Data, Context and Interaction (DCI) paradigm and code snippets from Snort. My intention was to get first-hand experience with DCI in C++, understand its benefits and limitations, and evaluate its applicability to full-scale projects. Since I was unsuccessful in finding DCI C++ examples anywhere, my code represents the understanding of DCI I have gained from piecing together the various code fragments in the documentation. I admit that I find it a bit strange that there are no examples (that I could find). Is there an impediment to using DCI in C++? I did come across a couple of issues in my implementation. But first, let me list the goals of DCI as described by the DCI wiki entry, and state that to some degree or another I did experience the benefit of each of them:


  • To improve the readability of object-oriented code by giving system behavior first-class status;
  • To cleanly separate code for rapidly changing system behavior (what the system does) from code for slowly changing domain knowledge (what the system is), instead of combining both in one class interface;
  • To help software developers reason about system-level state and behavior instead of only object state and behavior;
  • To support an object style of thinking that is close to peoples' mental models, rather than the class style of thinking that overshadowed object thinking early in the history of object-oriented programming languages.


DCI consists of three parts: Data, Context and Interaction. I list each of them below, together with some observations from the implementation I present in this post. One thing I should mention first is that I took most of the underlying code from Snort. My goal is not to re-design Snort. After all, it only took me about half an hour to find and understand the TCP/IP decoding and processing parts of the code, which could be a sign that it is fairly well designed even though it's implemented in C. What’s more, I have re-implemented only a very small part of Snort – just enough to take DCI out for a test-run. Take a quick look at the source code before reading on.

Data

The domain objects. They contain very little interaction code and mainly consist of getter and setter methods. I have used the prefix “Do” to denote data classes e.g. DoIPv4Packet, DoIPv6Packet and DoTCPpacket. They represent the IPv4, IPv6, TCP parts of the packet. You will notice that DoIPv4Packet and DoIPv6Packet are practically identical (see “Interaction”).
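To make this concrete, here is a simplified sketch of what such a data class looks like. The field selection and the Protocol enum are illustrative assumptions; the real classes in the repository carry the full set of header fields.

// Simplified sketch of a DCI data class: domain state plus getters/setters,
// no use-case logic. The Protocol enum mirrors the TCP_PROTO/UDP_PROTO
// constants used later in the post, but its exact definition is an assumption.
enum Protocol { TCP_PROTO, UDP_PROTO, UNKNOWN_PROTO };

class DoIPv4Packet   // (in the repository the data class also mixes in its roles – see "Interaction")
{
public:
    DoIPv4Packet() : proto_(UNKNOWN_PROTO), totallength_(0), headerLength_(0) {}

    void proto(Protocol p)               { proto_ = p; }
    Protocol proto() const               { return proto_; }

    void totallength(unsigned short len) { totallength_ = len; }
    unsigned short totallength() const   { return totallength_; }

    void headerLength(unsigned char len) { headerLength_ = len; }
    unsigned char headerLength() const   { return headerLength_; }

private:
    Protocol proto_;
    unsigned short totallength_;
    unsigned char headerLength_;
};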


Context

Implements the use cases of the system. A context includes the roles and data objects and knows how to bind them. The context also provides a mechanism by which any role executing within that context can refer to the other roles in that context. I have used the prefix “Context” to denote context classes e.g. ContextIP. The ContextIP context decodes IP packet headers and applies IP layer rules, and includes the roles and data objects specific to those tasks. A single data object may simultaneously play several roles. In the ContextIP context, a DoIPv4Packet object can play the Role_IPv4decoder and Role_IPrules roles (see “Interaction”).

The context knows which roles and data objects are to be used within its scope. Data objects are retrieved or created by the context. Below is the ContextIP constructor. It creates the relevant data object and assigns the roles it is to play.


ContextIP::ContextIP(const void* packet) :
    decode_(NULL), rules_(NULL), pkt_(packet)
{
    const iphdr* hdr = static_cast<const iphdr*>(packet);   
    if (hdr->version == 4)
    {
        DoIPv4Packet* obj = new DoIPv4Packet;        
        decode_ = obj;
        rules_ = obj;
    }
    else if (hdr->version == 6)
    {
        DoIPv6Packet* obj = new DoIPv6Packet;        
        decode_ = obj;
        rules_ = obj;
    }
}


In one context I had a difficult time implementing the creation/retrieval of the data objects. The type of data object to be created/retrieved depends on what came before, i.e. the state of previous data objects. Is it permissible to use a role from within a context constructor? I ended up in the strange situation where a role has no SELF because it has not been created yet, but uses other roles (see ContextStream). ContextStream creates or retrieves a TCP/UDP stream (or 5-tuple flow) and processes it. I am sure that this part needs to be reworked.


Interaction 


The Interaction is “what the system does” and is implemented as roles that objects play at run time. I have used the prefix “Role” to denote role classes e.g. Role_IPv4decoder, Role_IPv6decoder, Role_TCPdecoderImpl.

Role objects combine the state and methods of a Data object with methods (but no state, as Roles are stateless) from one or more Roles. Role methods are injected into Data objects by use of traits. Unfortunately, the injection cannot be done at run-time. The IP packet data object could play either the Role_IPv4decoder role or the Role_IPv6decoder role depending on the IP version. However, since both roles cannot be injected onto the same data object, I had to duplicate the data object itself – hence DoIPv4Packet and DoIPv6Packet are practically identical.

According to the DCI documentation, a Role should only address another object in terms of its methodless Role, i.e. a pure virtual base class. I found that this was essential if only to compile successfully, because of the dependencies between the Role, Data and Context classes. That is the reason for Role_TCPdecoder and Role_TCPdecoderImpl. Even though there may never be a different kind of TCP decoder, I had to give Role_TCPdecoderImpl a pure base class.

Below, you can see an IPv4 decoder role method. The role uses SELF, which binds to the object playing the current Role. Code within the method invokes methods of the Data part of the object through SELF. Methods of other roles can also be invoked: RULES binds to the Role_IPrules role and allows one role to invoke methods on another role. A single data object may simultaneously play several roles; the DoIPv4Packet object plays both the Role_IPv4decoder and Role_IPrules roles.

template <class ConcreteDerived>
void Role_IPv4decoder<ConcreteDerived>::accept(const void* packet)
{
    const iphdr* ip = static_cast<const iphdr*>(packet);
   
    SELF->data(packet);
   
    SELF->srcip()->setAddr(ip->saddr);
    SELF->dstip()->setAddr(ip->daddr);
       
    if (ip->protocol==6)
    {
        SELF->proto(TCP_PROTO);
    }
    else if (ip->protocol==17)
    {
        SELF->proto(UDP_PROTO);       
    }
    else {
        SELF->proto(UNKNOWN_PROTO);               
    }
   
    SELF->totallength(ntohs(ip->tot_len));
    SELF->headerLength(ip->ihl*4);
    SELF->id(ntohs(ip->id));
    SELF->frag_flag(ntohs(ip->frag_off) & 0x3fff);
    SELF->frag_offset(ntohs(ip->frag_off));
   
    // Apply IP header rules
    RULES->match();
                   
    if (Role_IPv4decoder<ConcreteDerived>::getProto() == TCP_PROTO)
    {
        ContextTCP context(SELF, packet);
        context.doit();      
    }
}
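For reference, here is how the SELF and RULES bindings can be realised with the curiously recurring template pattern. This is my reading of the trait-based injection; the macro definitions and class layout below are assumptions for illustration, not necessarily what the repository does.

// Sketch of the trait-based role injection. SELF casts `this` down to the
// object currently playing the role; RULES reaches the companion role
// through the same object.
#define SELF  static_cast<ConcreteDerived*>(this)
#define RULES static_cast<Role_IPrules<ConcreteDerived>*>(SELF)

// Methodful role: stateless, mixed into the data class as a trait.
template <class ConcreteDerived>
class Role_IPrules
{
public:
    void match() { /* apply the IP header rules to SELF's fields */ }
};

template <class ConcreteDerived>
class Role_IPv4decoder
{
public:
    void accept(const void* packet)
    {
        // Decode the header into the Data part via SELF (full body shown above),
        // then call across to the companion role.
        RULES->match();
    }
};

// The data object plays both roles simultaneously by inheriting the traits.
class DoIPv4Packet
    : public Role_IPv4decoder<DoIPv4Packet>
    , public Role_IPrules<DoIPv4Packet>
{
    // IPv4 header fields with getters and setters ...
};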

Friday, September 30, 2011

Web applications in Python without Javascript, CSS, HTML


Download: rctk_zk.tar.gz

If you would like to know how to write rich web applications in pure Python, without any knowledge of Javascript, CSS, HTML or Ajax, then continue reading this post.


Take a look at the screen-shot. It is the demo application of a framework called RCTK – which makes the development of complex web applications as easy as that of desktop applications. As the RCTK documentation puts it, the API has been inspired by Tkinter. So if you are comfortable with Tkinter (or, for example, wxPython), you don't have to be a web programmer to create impressive web applications. Even though applications like the one in the screen-shot are programmed in pure Python, RCTK still has to render Javascript widgets in the browser, transparently to the programmer of course. Currently, two different front-end tool-kits are supported for this: jQuery (see demo) and Qooxdoo (see demo).

If you have just looked at the demos you will be wondering why my screen-shot looks different. Not even the creators of RCTK will fully recognise it. On the other hand, if you have ever used the ZK Java framework then you will probably be familiar with the “classic” blue colour theme and “Grid” widget shown in the screen-shot. I have extracted the Javascript part of ZK and added it to RCTK as an optional third front-end tool-kit. I will contribute the source code to the RCTK project as soon as I can. However, not all ZK components are available. For the moment, it is limited to the current RCTK API. Hopefully, we will expand the API to add other widgets, layouts and features such as a border layout, an HTML editor, drag-and-drop and other ZK components. Take a look at the ZK demo to see what components are available.

Below you can see the source code for the grid in the screen-shot. Simple!

class Demo(object):
    title = "List"
    description = "Demonstrates the List control"

    def build(self, tk, parent):
        parent.setLayout(GridLayout(rows=4, columns=2))
        parent.append(StaticText(tk, "Single"))
        parent.append(StaticText(tk, "Multiple"))
        grid = Grid(tk,
        [
            Column('Inv. No', width=55, sorttype=Column.INT),
            Column('Date', width=90, sorttype=Column.DATE),
            Column('Amount', width=80, sorttype=Column.FLOAT),
            Column('Tax', width=80, align=Column.RIGHT, sorttype=Column.FLOAT),
            Column('Total', width=80, align=Column.RIGHT, sorttype=Column.FLOAT),
            Column('Notes', width=150, sortable=False)
        ])

        parent.append(grid,colspan=2)

        for i in range(0, 20):
            grid.add((str(1000+i), "2010-01-%d" % (i+1), str(i), str(i*10.19), str(i*1.19), "Hello world, %d" % i))




Friday, September 02, 2011

A case for replacing polymorphism with switch-statements

Download: switch_benchmark.tar.gz

In this post I benchmark and compare virtual function calls to functionally equivalent if-else and switch statements. In object-oriented design, switch statements are usually replaced by polymorphism. The reason is to make the code more compact, readable and maintainable. However, the benchmark indicates that there is a case for doing the opposite -- replacing polymorphism with switch statements -- if only for very specific and limited applications where performance considerations are the highest priority. Of course, switch statements can also be made more compact, readable and maintainable using other C++ techniques such as metaprogramming. The source code for these techniques and the benchmark is available for download. Tested with gcc version 4.3.2.


The polymorphic benchmark uses a container (an array) that holds pointers to a base class from which several classes are derived. The time taken to make a virtual function call is measured for arrays of between 2 and 10 objects. A call takes 14 ns on average with an array of 2 objects and 18 ns with 10 objects.

A functionally equivalent if-else function varies between 10 ns and nearly 18 ns.

The switch statement was the best performing, at a constant 4 ns irrespective of the number of objects. The switch statement is evidently optimized by the compiler into a jump table.
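For reference, the polymorphic variant being timed is essentially the following. This is a simplified sketch with illustrative names; the actual benchmarked code is in the download.

// Simplified sketch of the polymorphic benchmark: an array of base-class
// pointers, one virtual call per element. Class and function names are illustrative.
struct Base
{
    virtual ~Base() {}
    virtual void function(long* x) = 0;
};

struct Derived1 : Base { void function(long* x) { ++*x; } };
struct Derived2 : Base { void function(long* x) { *x += 2; } };

int main()
{
    Base* objects[2] = { new Derived1, new Derived2 };
    long sum = 0;

    // The benchmark repeats calls like these many times and averages the cost.
    objects[0]->function(&sum);   // virtual dispatch: 14-18 ns in the measurements
    objects[1]->function(&sum);

    delete objects[0];
    delete objects[1];
    return 0;
}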

Should I care about a 14 ns virtual function call overhead? For most applications a few nanoseconds here or there make no difference. However, there are probably some software applications that need to make every nanosecond count. For example, a network device may be described as having a performance of 10 gigabits per second (Gb/s). A 10-Gb/s Ethernet interface can deliver between 812,740 and 14,880,960 packets per second depending on the size of the packet; in other words, a packet arrives on average every 67 ns to 1.2 us (see this article). The 67 ns figure follows from the minimum-size frame: 84 bytes on the wire (frame plus preamble and inter-frame gap) is 672 bits, and 672 bits at 10 Gb/s take about 67 ns. In this context, a 14 ns overhead is significant. Therefore, if you are building a software application on top of 10 gigabit Ethernet interfaces, if-else and especially switch statements might be the way to go. You cannot make many virtual function calls in 67 ns. Granted, my hardware may not be cutting-edge, but I think a case can be made for switch statements over polymorphism. Function call overhead can be improved by a factor of at least 4.

Getting back to the compactness, readability and maintainability of the code: we can improve the if-else and switch statements using Boost’s MPL, Preprocessor and Tuple libraries. Below you can see a typical switch-statement and how it is used in place of an array of pointers.

struct SwitchBase
{
    void function(long* x, int type)
    {
        switch(type) {
            case 0:
                obj1.function(x);   
                break;
            case 1:
                obj2.function(x);   
                break;
            case 2:
                obj3.function(x);   
                break;               
            case 3:
                obj4.function(x);   
                break;
            case 4:
                obj5.function(x);   
                break;
            case 5:
                obj6.function(x);   
                break;               
            case 6:
                obj7.function(x);   
                break;
            case 7:
                obj8.function(x);   
                break;
            case 8:
                obj9.function(x);   
                break;               
            case 9:
                obj10.function(x);   
                break;               
        }
    }

    ImpOfStatic1 obj1;
    ImpOfStatic2 obj2;
    ImpOfStatic3 obj3;   
    ImpOfStatic4 obj4;
    ImpOfStatic5 obj5;
    ImpOfStatic6 obj6;   
    ImpOfStatic7 obj7;
    ImpOfStatic8 obj8;
    ImpOfStatic9 obj9;   
    ImpOfStatic10 obj10;
};

long sum=0; 
SwitchBase sw;
sw.function(&sum, 2); // calls object 3 with a 4 ns overhead
sw.function(&sum, 0); // calls object 1 with a 4 ns overhead


Writing switch-statements like this by hand is tedious and error-prone. Instead, we can automate their construction. I provide a class called “switch_case”. The compiler automatically constructs all the case statements required, and the performance (Switch-Boost) is exactly the same as for the hand-written version.
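To give an idea of how the case labels can be generated automatically, here is a minimal sketch using Boost.Preprocessor. This is not the switch_case class from the download, just an illustration of the idea; the element count is fixed at 10 here and f is assumed to be a caller-supplied functor that forwards to the element's member function.

#include <boost/preprocessor/repetition/repeat.hpp>
#include <boost/tuple/tuple.hpp>

// Expand one "case n:" per tuple element at preprocessing time, so the
// compiler still sees an ordinary switch and can build a jump table.
#define CASE_ENTRY(z, n, tuple) \
    case n: f(boost::get<n>(tuple), x); break;

template <class Tuple, class F>
void dispatch(Tuple& tup, const F& f, int type, long* x)
{
    switch (type)
    {
        BOOST_PP_REPEAT(10, CASE_ENTRY, tup)   // case 0: ... case 9:
    }
}

#undef CASE_ENTRY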


Usage: Create instances of each class and add them to a container (a Boost tuple).

// Instantiate the class instances
ImpOfStatic1 s1;
ImpOfStatic2 s2;
ImpOfStatic3 s3;   
ImpOfStatic4 s4;
ImpOfStatic5 s5;
ImpOfStatic6 s6;   
ImpOfStatic7 s7;
ImpOfStatic8 s8;
ImpOfStatic9 s9;   
ImpOfStatic10 s10;

// Create a tuple and store the instances
typedef boost::tuple<
      ImpOfStatic1*,
      ImpOfStatic2*,
      ImpOfStatic3*,
      ImpOfStatic4*,
      ImpOfStatic5*,
      ImpOfStatic6*,
      ImpOfStatic7*,
      ImpOfStatic8*,
      ImpOfStatic9*,
      ImpOfStatic10*> StatTuple;

// Store the instances in the tuple
StatTuple tup(&s1,&s2,&s3,&s4,&s5,&s6,&s7,&s8,&s9,&s10);
           

// Create a switch-statement for all types in the tuple.
switch_case<StatTuple> call(tup);
Func f;
call(f, 2, &sum);    // calls object s3 with a 4 ns overhead
call(f, 0, &sum);    // calls object s1 with a 4 ns overhead


In this case, you do not have to write a switch-case function by hand – the compiler does it automatically. It is almost as elegant as the polymorphic solution and comes with some real advantages:

1) Function call overhead is improved by a factor of four.
2) Functions can be inlined -- virtual functions called through a base pointer generally can not.
3) No loss of type identity specific to derived classes. 
4) No derived classes are needed. Classes can have different member function names.




Wednesday, July 27, 2011

Python Bindings for Sally (a machine learning tool)

Download:   sally-0.6.1-with-bindings.tar.gz

One of the tools I have used recently in my machine-learning projects is Sally. As Sally’s web page describes it: “There are many applications for Sally, for example, in the areas of natural language processing, bioinformatics, information retrieval and computer security”. You can look at the example page to see more details. It is written in C, which makes it fast, but as is usually the case, using a tool like this directly from Python would make life easier. It would make for faster prototyping and system development, and since it is a tool that I think I will be using repeatedly in the future, I gave the library Python bindings. In this post I would like to outline the technique I use to create a Python module from a C library. I use SWIG for the bindings, and you will have to be, to some extent, familiar with SWIG to follow the rest of this post.

As input, SWIG takes a file containing ANSI C/C++ declarations: a special "interface file" (usually given an .i suffix). At its simplest, an interface file looks something like the one below, where a module called "example" is created with all C/C++ functions and variables in example.h available from Python.
%module example

%{
#include "example.h"
%}

%include "example.h"
 
Unfortunately, interface files are not usually that simple. There are limitations to what SWIG will parse correctly. For example, complex declarations such as function pointers and arrays are problematic.

In the case of Sally, libconfig is used for configuration management, and one needs to include libconfig in the interface file. Take a look at the interface file below. Libconfig's config_lookup_string function is problematic: SWIG cannot deal with the char** without extra work from us. I created a function called config_lookup_string_2 that wraps config_lookup_string and, with the help of the cstring.i library, becomes usable from Python. Unfortunately, this is quite typical -- it often becomes a time-consuming process to check every function and structure you want to provide bindings for, and to look for ways of making problematic functions and structures work correctly from Python.
%module pysally

%{
#include <libconfig.h>
%}

%include <cstring.i>
%cstring_output_allocate(char **out1, free(*$1));

%{
#include <string.h>

void config_lookup_string_2(
    const config_t *config, const char *path, char **out1)
{
    const char *value = NULL;

    /* Allocate the buffer handed to Python (the %cstring_output_allocate
       typemap frees it) and copy libconfig's internal string into it. */
    *out1 = (char *) malloc(1024);
    (*out1)[0] = 0;
    if (config_lookup_string(config, path, &value) == CONFIG_TRUE && value != NULL)
    {
        strncpy(*out1, value, 1023);
        (*out1)[1023] = 0;
    }
}

%}
%include <libconfig.h>

The above interface file can quickly grow into a bit of a nightmare, in terms of development time and complexity, as you add additional functions that SWIG can’t deal with transparently. I take an alternative route. The approach I use is to create a facade over the API I want to use from Python. The facade consists of one or more C++ classes, and it is the facade for which I provide bindings. The facade is kept only as complex as SWIG's parser can handle without my having to add complicated SWIG directives to the interface file.

Getting back to Sally. Essentially the library does three things:
1) Read a config file.
2) Read text from a file or files and process them.
3) Write features to an output file.

As far as Sally's configuration processing is concerned, one could provide some getter and setter member functions. It’s not strictly necessary to access the configuration from Python, because the facade takes care of the configuration details in the load_config and init methods. All I have to do from Python is pass the name of the configuration file Sally is to use. Below is my initial attempt at creating a facade and its interface file. You can pass the input and output paths (in the constructor) and the configuration path (in load_config). As you can see, I make use of std::string because SWIG deals with it semi-transparently once %include "std_string.i" is added to the interface file.

swig.i
%module pysally

%{
#include "pysally.h"
%}

%include "std_string.i"
%include "pysally.h"

pysally.h
class Sally
{
public:

    Sally(int verbose, std::string in, std::string out) :
        entries_(0), input_(in), output_(out) {}

    ~Sally();

    /// Load the configuration of Sally
    void load_config(const std::string& config_file);

    /// Init the Sally tool
    void init();

    /// Main processing routine of Sally.
    /// This function processes chunks of strings.
    void process();

    /// Get/Set configuration
    std::string getConfigAttribute(std::string name);

    void setConfigAttribute(std::string name, std::string value);

    // etc
    // ...
    // ...

private:
    config_t cfg_;
    int verbose_;
    long entries_;
    std::string input_;
    std::string output_;
};

From Python you would use it like this:
verbose = 0
inp = "/tmp/input"   # 'in' is a Python keyword, so a different name is needed
out = "/tmp/output"
config = "/tmp/sally.cfg"

s = Sally(verbose, inp, out)
s.load_config(config)
s.init()
s.process()

As a result, I can now use Sally from Python, which is nice, but it doesn’t really provide anything that I can’t already do with the C executable Sally provides. The Sally library allows you to configure its outputs for a specified format, such as plain text, LibSVM or Matlab formats. Even though it’s not too difficult to add C code for other formats, it is even easier to do from Python. I provide two additional C++ classes: Reader and Writer (see the code below). The reader and writer facades use the underlying Sally library to read and write files using the format specified in the configuration file, just as the original Sally binary does. But by extending these classes in Python, one can override the default behaviour -- read and write other formats, read from and write to a database instead, or even write Sally's output directly to another machine-learning module or read its input directly from a web-scraping Python module instead of a file.

Below, you can see the final interface file, the C++ Reader/Writer classes that provide the default implementation, and the Python extension Reader/Writer classes. The interface file is still very simple. The only new addition is the director directives. Directors allow C++ classes to be extended in Python, and from C++ these extensions look exactly like native C++ classes. Neither the C++ code nor the Python code needs to know where a particular method is implemented.

swig.i
%module(directors="1") pysally
%{
#include "pysally.h"
%}

%feature("director") Reader;        
%feature("director") Writer;       

%include "std_string.i"
%include "pysally.h"

pysally.h
class Writer
{
public:

    Writer(std::string out);

    virtual ~Writer();
   
    virtual void init();   
   
    virtual const std::string getName();

    virtual int write(const output_list& output, int len);
   
private:   
    config_t& cfg_;   
    std::string output_;
    bool hasout_;
};

class Reader
{
public:

    Reader(std::string in);

    virtual ~Reader();
   
    virtual void init();   
   
    virtual const std::string getName();
   
    virtual long getNrEntries();

    virtual int read(string_list& strs, int len);

private:   
    config_t& cfg_;   
    std::string input_; 
    long entries_;   
};

run.py
class MyReader(Reader):
    def __init__(self, input):
        super(MyReader, self).__init__(input)

    def read(self, strings, len):       
        return super(MyReader, self).read(strings, len)

    def init(self):
        super(MyReader, self).init()

    def getNrEntries(self):
        return super(MyReader, self).getNrEntries()


class MyWriter(Writer):
    def __init__(self, output):
        super(MyWriter, self).__init__(output)

    def init(self):        
        pass   

    def write(self, fvec, len):       
        for j in range(len):           
            print "l:", fvec.getFeaturesLabel(j),           
            for i in range(fvec.getListLength(j)):
                print fvec.getDimension(j, i), fvec.getValue(j, i),
                print  fvec.getValue(j, i)           
            print fvec.getFeaturesSource(j)           
            print       
        return 1

input = "/home/edimchr/reuters.zip"
output = "/home/edimchr/tmp/pyreuters.libsvm"
verbose = 0
r = MyReader(input)
w = MyWriter(output)
#r = Reader(input)
#w = Writer(output)

s = Sally(verbose, r, w)
s.load_config("./example.cfg")
s.init()
s.process()

From Python then, you can extend the Reader and/or Writer classes defined in C++. MyReader and MyWriter are passed to the Sally facade via its constructor, and from then on the underlying C++ code uses the derived Python implementations. MyReader simply defers to its base class, i.e. Reader, and MyWriter prints the various pieces of output information Sally generated.

You may have noticed that the Reader class defines the member function:

    virtual int read(string_list& strs, int len);

And Writer defines the member function:

    virtual int write(const output_list& output, int len);

What are string_list and output_list? Sally defines a couple of structures that it uses to store the text read (string_t) and the output features calculated (fvec_t). These two structures are especially problematic for SWIG. As a result, I created a facade over each one, called string_list and output_list.
class string_list
{   
private:
    string_t* str_;
   
public:   
    string_list(string_t* str) :
        str_(str) {}
  
    /// Length for element i
    void setStringLength(int i, int len) 
      { str_[i].len = len ; }

    /// String data for element i
    void setStringData(int i, char* data) 
      { str_[i].str = strdup(data); } 
   
    /// Optional label of string
    void setLabel(int i, float label) 
      { str_[i].label = label; }
       
    /// Optional description of source
    void setSource(int i, char* src) 
      { str_[i].src = strdup(src); } 
   
    string_t* getString() const { return str_; }
};

class output_list
{   
private:
    fvec_t** vec_;
   
public:   
    output_list(fvec_t** vec) :
        vec_(vec) {}
  
    /// Length for element i
    unsigned long getListLength(int i) const 
      { return vec_[i]->len; }

    /// Nr of features for element i
    unsigned long getTotalFeatures(int i) const 
      { return vec_[i]->total; }
   
    /// Label for element i
    float getFeaturesLabel(int i) const 
      { return vec_[i]->label; }
   
    /// List of dimensions j
    unsigned long getDimension(int i, int j) 
      { return vec_[i]->dim[j]; }
   
    /// List of values for element i
    float getValue(int i, int j) 
      { return vec_[i]->val[j]; }   
   
    char* getFeaturesSource(int i) const 
      { return vec_[i]->src; } 
   
    fvec_t** getFvec() const { return vec_; }
};

With a simple C++ facade over the API, SWIG can parse the interface file without difficulties. In general, one could use std::string, std::vector, and std::map.


Building the Python module

Sally is a C library and is built with Autotools.

1) You need additional Autoconf macros to enable SWIG and Python support. I added ac_pkg_swig.m4, ax_pkg_swig.m4 and ax_python_devel.m4 to the m4 subdirectory.
 
    sally-0.6.1/
        m4/
            ac_pkg_swig.m4
            ax_pkg_swig.m4
            ax_python_devel.m4
        pysally/
            Makefile.am
            swig.i
        src/
            Makefile.am
        Makefile.am
        configure.in
 
2) Add pysally to sally-0.6.1/Makefile.am
    ……
    SUBDIRS = src doc tests contrib pysally
    ……
    ……
 
3) Add the following to sally-0.6.1/configure.in
    AC_PROG_CXX
    AC_DISABLE_STATIC
    AC_PROG_LIBTOOL
    AX_PYTHON_DEVEL(>= '2.3')
    AM_PATH_PYTHON
    AC_PROG_SWIG(1.3.21)
    SWIG_ENABLE_CXX
    SWIG_PYTHON
 
4) Add pysally/Makefile to AC_CONFIG_FILES in sally-0.6.1/configure.in

    AC_CONFIG_FILES([
       Makefile \
       src/Makefile \
       src/input/Makefile \
       src/output/Makefile \
       src/fvec/Makefile \
       doc/Makefile \
       tests/Makefile \
       contrib/Makefile \
       pysally/Makefile \
    ])

5) Create the pysally subdirectory and add the files
    Makefile.am
    swig.i       <-- interface file
    globals.h 
    pysally.h    <-- wrapper facades
    pysally.cpp
    run.py       <-- example code to use the module

6) To build:
    cd sally-0.6.1
    ./autogen.sh
    ./configure --prefix=/home/yourhome/sally_install/ --enable-libarchive
    make
    make install

Tuesday, June 28, 2011

Restructured text table reports

Download: rst2pdf-read-only.tar.gz

I use rst2pdf to create PDF documents from reStructuredText (ReST). It is relatively easy to generate table reports from a database, like the ones shown below. In this example, the table does not fit into one page and spans two.




Although it is easy to run statistical queries, such as summing a column in a database table, using SQL, it is not possible to add summary rows to each page when a table spans multiple pages. The difficulty with ReST and rst2pdf is that you do not know how many rows will fit on a page, because that depends on the page size (A4, etc.), the font size and the table content. The modified version of rst2pdf I provide in this post allows you to attach additional rows (with column-sum information) to the bottom of the table, automatically, on each page. In the example you can see how to insert the current page's column sum, the previous pages' column sum and the total column sum into the summary rows. The ":sum:" is a nonstandard interpreted text role, which means it will only work with this version of rst2pdf.

The syntax to generate the table is this:

.. list-table:: Title
   :widths: 15 10 30
   :header-rows: 1

   * - Description
     - Weight
     - Height
   * - A
     - 1
     - 2 
   * - B
     - 1
     - 2
   * - C
     - 1
     - 2
   * - D
     - 1
     - 2
   * - E
     - 1
     - 2
   * - F
     - 1
     - 2 
   * - G
     - 1
     - 2
   * - H
     - 1
     - 2
   * - I
     - 1
     - 2 
   * - J
     - 1
     - 2
   * - k
     - 1
     - 2
   * - L
     - 1
     - 2
   * - M
     - 1
     - 2
   * - N
     - 1
     - 2
   * - O
     - 1
     - 2
   * - P
     - 1
     - 2
   * - Q
     - 1
     - 2
   * - R
     - 1
     - 2
   * - Previous Total 
     - :sum:`previous_pages_col`
     - :sum:`previous_pages_col`
   * - Current Page Total 
     - :sum:`current_pages_col`
     - :sum:`current_pages_col`
   * - Total 
     - :sum:`total_pages_col`
     - :sum:`total_pages_col`


The command to create the table:

python createpdf.py --repeat-table-rows table.txt -o table.pdf

Wednesday, June 15, 2011

Benchmarking function call overhead in C++

Download: benchmark.tar.gz

There are two types of function calls I benchmark here:

1) Virtual functions
2) CRTP (the curiously recurring template pattern), used to simulate dynamic binding (with and without inlining)

One particular use of the CRTP is to simulate dynamic binding. The technique achieves a similar effect to the use of virtual functions, without the costs of dynamic polymorphism. The reason for this stems from the ability to inline function calls, which is not possible with virtual functions bound at run-time. The use of CRTP is only worthwhile when the cost of a call is dominated by call and return overhead rather than by the work done in the function body. Getter and setter functions are good candidates for inlined CRTP.

Dynamic.h
#ifndef _Dynamic_H
#define _Dynamic_H

struct DynamicBase
{
    DynamicBase() {}
    virtual ~DynamicBase() {}
    virtual void function(long*) = 0;
};

DynamicBase* createDynamic();

#endif  /* _Dynamic_H */
Dynamic.cc
#include "Dynamic.h"
struct ImpOfDynamic : public DynamicBase
{
    void function(long* x);
};

void ImpOfDynamic::function(long* x) { ++*x; }

DynamicBase* createDynamic() {
    return new ImpOfDynamic;
}
Static.h
#ifndef _Static_H
#define _Static_H

template <class Derived>
struct StaticBase
{
    void function(long* x)
    {
        static_cast<Derived*>(this)->function(x);
    }
};

struct ImpOfStatic : StaticBase<ImpOfStatic>
{
    // Without inlining  
    // void function(long* x);    

    // With inlining 
    inline void function(long* x);
};

void ImpOfStatic::function(long* x) { ++*x; }

#endif

main.cc
#include "Static.h"
#include "Dynamic.h"
#include <iostream>
#include <sys/time.h>
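The rest of main.cc is not reproduced in the post (it is in benchmark.tar.gz). As a rough sketch of what such a driver could look like -- the iteration count, the gettimeofday-based clock and the output format below are my assumptions, not necessarily what the download does:

// Sketch of a timing driver: average the cost of many calls so the per-call
// overhead shows up in nanoseconds.
static double elapsed_ns(const timeval& start, const timeval& end)
{
    return (end.tv_sec - start.tv_sec) * 1e9 + (end.tv_usec - start.tv_usec) * 1e3;
}

int main()
{
    const long iterations = 100000000;
    long sum = 0;
    timeval t0, t1, t2, t3;

    DynamicBase* dyn = createDynamic();
    gettimeofday(&t0, 0);
    for (long i = 0; i < iterations; ++i)
        dyn->function(&sum);                 // virtual call through a base pointer
    gettimeofday(&t1, 0);

    ImpOfStatic stat;
    StaticBase<ImpOfStatic>& base = stat;
    gettimeofday(&t2, 0);
    for (long i = 0; i < iterations; ++i)
        base.function(&sum);                 // CRTP call, candidate for inlining
    gettimeofday(&t3, 0);

    std::cout << elapsed_ns(t0, t1) / iterations << " Virtual function (ns)\n"
              << elapsed_ns(t2, t3) / iterations << " CRTP function (ns)\n"
              << "checksum: " << sum << "\n"; // keep sum observable so the loops are not optimized away

    delete dyn;
    return 0;
}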

Benchmark Results:

Without CRTP inlining:
6.01975 Virtual function (ns)
6.0212 CRTP function (ns)

With CRTP inlining:
6.16391 Virtual function (ns)
1.99611 CRTP function (ns)

There is no point in using CRTP instead of virtual functions unless inlining is used.

Tuesday, March 22, 2011

Automated Pattern Discovery From Network Traffic (2)

Last time, I described a way to find pattern strings in network traffic using machine-learning tools and techniques. The goal of this post is to describe the method and results of applying those techniques to real network traffic. As an experiment, or proof-of-concept, I looked for patterns in BitTorrent traffic. The results look very promising. We will see how I uncovered a couple of patterns that could be used, and probably are used, by NIDS and DPI products to identify BitTorrent traffic.

I will not go into any detail on how to capture BitTorrent traffic using Wireshark, because I believe it is incidental to what I really want to show: how to apply machine-learning techniques using Sally and Cluto for pattern discovery. However, it is important to say that I will only be looking for patterns in the first packet of each flow or stream. I copied the contents of the first packet of each flow to a separate file. These files provide the input data to Sally. As I described in my last post, Sally maps strings into a vector space which we then use as input to Cluto, a toolkit for clustering.
One problem I did run into when I tried to combine these two tools is that Cluto's expected input file format is different from the format Sally provides. I hacked Sally's source code slightly so that its output matched Cluto's expected format. You can download the patch here: sally_patch. If you wish to reproduce my results, you can also download the Sally configuration file I used: sally_configuration. The script I used to glue together Sally and Cluto can be found here: glue_sally_cluto.py. This being a proof-of-concept, do not expect to find production-ready code.

The glue_sally_cluto.py script first runs Sally, then Cluto; the results are then copied into a separate directory called "clusters". The directory holds the contents of each cluster Cluto generated, and each cluster contains the packets Cluto lumped together in the clustering process. In my experiment, I used the tool chain to separate 750 packets into 10 clusters. As it turns out, a simple visual inspection of the clusters gave me the patterns. I found two: "BitTorrent protocol" and "d1:ad2:id20".

As an example, I looked at the contents of each packet in cluster "0":

dimitri@dimitri-laptop:/tmp/sig_analysis/clusters/0$ find -exec more {} \; 


BitTorrent protocol
BitTorrent protocol
BitTorrent protocol
BitTorrent protocol
BitTorrent protocol
BitTorrent protocol
BitTorrent protocol
BitTorrent protocol
BitTorrent protocol
BitTorrent protocol
BitTorrent protocol
BitTorrent protocol
BitTorrent protocol
BitTorrent protocol
BitTorrent protocol
BitTorrent protocol
BitTorrent protocol
BitTorrent protocol
BitTorrent protocol
BitTorrent protocol
BitTorrent protocol
BitTorrent protocol
BitTorrent protocol
BitTorrent protocol
BitTorrent protocol
BitTorrent protocol
BitTorrent protocol
BitTorrent protocol
BitTorrent protocol
BitTorrent protocol
BitTorrent protocol
BitTorrent protocol
BitTorrent protocol
BitTorrent protocol


The contents of each packet in cluster "5" indicate that an overwhelming number of packets start with the pattern "d1:ad2:id20":

dimitri@dimitri-laptop:/tmp/sig_analysis/clusters/5$ find -exec more {} \; 


L�RY � �O� 3�s!p�������� ��F� ;!7�����,��T������
�[��P ��7�ݞ �ڍQȝ�' ާ�                           ��V���v�� G�v=_/��@���G I���3 ]lV�˗|�� �t�f���Ż����ܐB32( ���>� �b C���#<y胀Y�:��v5��P�_��6[�5K󤜌�F `S
                     +�yp
d1:ad2:id20:
� h�B=A�~���lv� ,e1:q4:ping1:t4:�~
L6�J�A�\%<�c�e1:q4:ping1:t4:
d1:ad2:id20:Y�]{]� Pg=CA�,�B��0}e1:q4:ping1:t4:�6
d1:ad2:id20:{aB�Dǹ< d 0
                        A��e1:q4:ping1:t4:<b
d1:ad2:id20:hh ��Jb�νd��W%��|�e1:q4:ping1:t4:�(
d1:ad2:id20:���� �Y y���fc�] ���e1:q4:ping1:t4:�
d1:ad2:id20:<@˩
d1:ad2:id20:O�؇� ��"BG􃢅�e1:q4:ping1:t4:�l
d1:ad2:id20:CgC ���u ȡ��N�p���e1:q4:ping1:t4:rB
d1:ad2:id20:�T_}8 S��:�*�4���قe1:q4:ping1:t4:��
d1:ad2:id20:f$� �F0ik�D �_I k��e1:q4:ping1:t4:L�
�e1:q4:ping1:t4:e� 2~ג�=��ƷZ
d1:ad2:id20:�|IM��� ���r��E��e1:q4:ping1:t4:�m
d1:ad2:id20:JЅN�#�\bN�.|��}��e1:q4:ping1:t4:�+
d1:ad2:id20:��a�6ދO��˞ \��~ ��e1:q4:ping1:t4:}�
d1:ad2:id20: Z
d1:ad2:id20: ��� ^L!?b
d1:ad2:id20:
            D����{ �jUh�D



All the other clusters shared the same patterns; either "BitTorrent protocol" or "d1:ad2:id20".



Wednesday, March 16, 2011

Automated Pattern Discovery From Network Traffic

From time to time I browse the projects listed on freshmeat.net. I came across Sally, a tool for mapping strings into vector spaces. This mapping allows one to apply machine learning and data mining techniques to the analysis of string data. I looked at two of the examples given: Sally can be used to map text documents to a vector space and build a classifier that categorizes them using Support Vector Machines, and to map DNA sequences to a vector space and build a classifier that detects the start of DNA sequences.


I wondered whether this tool could be applied to automated pattern generation for network intrusion detection or network protocol identification. Patterns are used by various network intrusion detection systems (NIDS) and packet identifiers like Linux's Netfilter. For example, to detect bittorrent traffic, packet payloads are searched for a string of characters that uniquely identifies the bittorrent protocol. L7 (http://l7-filter.sourceforge.net/protocols) is a good source for patterns; it lists "bittorrent protocol" as one of the patterns that can be used for identifying bittorrent on your network. The idea, therefore, would be to generate or discover patterns like "bittorrent protocol" automatically using Sally. This is a little different from the Sally examples I looked at, where a classifier was generated by a supervised learner. In my push for an automated pattern generation tool, I will be using Sally together with an unsupervised machine-learning technique, clustering, to discover patterns in network traffic for any particular protocol. The idea is to map bittorrent packets to a vector space using Sally, and then cluster those vectors. The generated clusters would each represent one or more possible patterns.


But first, how does Sally's mapping work? I didn't get it until I looked at the source code. It calculates a hash for every n-gram in the string and uses that hash value as a feature. The value for that feature is the total number of times the hash value is found in the string. If you choose to work with byte 8-grams then you have exactly 12 non-zero hash-value features for the string "bittorrent protocol", one each for "bittorre", "ittorren", "ttorrent", "torrent ", "orrent p", "rrent pr", "rent pro", "ent prot", "nt proto", "t protoc", " protoco" and "protocol". The hash values calculated for these n-grams, with their corresponding counts, look like this:


313166:1 575006:1 667871:1 1104474:1 1213512:1 1355539:1 1382445:1 1532866:1 1559069:1 2132221:1 2874899:1 2985843:1
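
To make the mapping concrete, here is a small sketch of the idea. The hash function below is only an example; Sally's hashing, n-gram length and normalisation are configurable and differ in detail.

// Sketch of n-gram feature hashing: slide an 8-byte window over the string,
// hash each window, and count occurrences per hash value. The DJB-style hash
// here is only illustrative, not the hash Sally uses.
#include <iostream>
#include <map>
#include <string>

static unsigned long hash_ngram(const std::string& s)
{
    unsigned long h = 5381;
    for (std::size_t i = 0; i < s.size(); ++i)
        h = h * 33 + static_cast<unsigned char>(s[i]);
    return h;
}

int main()
{
    const std::string text = "bittorrent protocol";
    const std::size_t n = 8;
    std::map<unsigned long, int> features;   // hash value -> count

    for (std::size_t i = 0; i + n <= text.size(); ++i)
        ++features[hash_ngram(text.substr(i, n))];   // 12 n-grams for this string

    for (std::map<unsigned long, int>::const_iterator it = features.begin();
         it != features.end(); ++it)
        std::cout << it->first << ":" << it->second << " ";
    std::cout << std::endl;
    return 0;
}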


A sparse vector of count values is generated, and not all clustering algorithms can readily deal with a high-dimensional sparse feature set. The Cluto clustering toolkit can. It's not open-source, but they do provide a binary for non-commercial use. Next time, I will be attempting to cluster the feature set Sally generates and hopefully use the clusters to discover patterns in network traffic.

Wednesday, March 02, 2011

Test Code Coverage with Objdump and Valgrind

I was working on a C++ project recently where gcov inexplicably failed to report correct results, especially for template files. Some templated code was incorrectly being reported as not covered by my tests. Instead of doing the right thing by finding out why gcov was not working with my source code, I decided to write a Python script that combines the outputs of objdump and valgrind into a test coverage report. It worked well enough and I thought I would share it. Here is an extract of a test coverage report:

---------------------------------
File: HTTPmsg.h
Code not executed in: HTTPmsg<TCPfragmsg>::getContentEncoding() const
line no: 110

Code not executed in: HTTPmsg<TCPfragmsg>::DJBHash(char const*, char const*)
line no: 271

Code not executed in: HTTPmsg<TCPmsg>::isCandidateRequest() const
line no: 215

Code not executed in: HTTPmsg<TCPfragmsg>::isCandidateResponse() const
line no: 236 238 239 242 248 251 252

Code not executed in: HTTPmsg<TCPfragmsg>::getContentDirection() const
line no: 105

Compiled lines: 82, Not executed: 11
Code coverage:  86%
---------------------------------

The script can be found here: coverage

Tested on objdump (GNU Binutils for Ubuntu) 2.20.1 and valgrind-3.6.0.SVN-Debian.


Below, you can see the output of objdump and callgrind for the 'isCandidateRequest' member function. The objdump output describes what code was compiled, and the callgrind output describes what code was executed. The script works by checking for common line numbers between them. Line numbers that appear in the objdump output and not in the valgrind output are marked as non-executed source code.

objdump:  'isCandidateRequest' member function:

080e9498 <HTTPmsg<TCPmsg>::isCandidateRequest() const>:
_ZNK7HTTPmsgI6TCPmsgE18isCandidateRequestEv():
/projects/conduits/conduits/HTTPmsg.h:203
 80e9498:       55                       push   %ebp
 80e9499:       89 e                    mov    %esp,%ebp
 80e949b:       83 e28                sub    $0x28,%esp
/projects/conduits/conduits/HTTPmsg.h:205
 80e949e:       8b 45 08              mov    0x8(%ebp),%eax
 80e94a1:       8b 80 e0 00         mov    0xe0(%eax),%eax
 80e94a7:       85 c0                   test   %eax,%eax
 80e94a9:       75 26                   jne    80e94d1 <HTTPmsg<TCPmsg>::isCandidateRequest() const+0x39>
/projects/conduits/conduits/HTTPmsg.h:206
 80e94ab:       b9 00 00 00 00    mov    $0x0,%ecx
 80e94b0:       a1 e8 7e 26 08    mov    0x8267ee8,%eax
 80e94b5:       8b 15 ec 7e 26    mov    0x8267eec,%edx
 80e94bb:       83 c0 01              add    $0x1,%eax
 80e94be:       83 d2 00              adc    $0x0,%edx
/projects/conduits/conduits/HTTPmsg.h:215
 80e9624:       b9 00 00 00 00    mov    $0x0,%ecx
 80e9629:       a1 20 7f 26 08     mov    0x8267f20,%eax
 80e962e:       8b 15 24 7f 26     mov    0x8267f24,%edx
 80e9634:       83 c0 01              add    $0x1,%eax
 80e9637:       83 d2 00              adc    $0x0,%edx
 80e963a:       a3 20 7f 26 08     mov    %eax,0x8267f20


Callgrind: 'isCandidateRequest' member function:

fn=HTTPmsg<TCPmsg>::isCandidateRequest() const
203 87
205 116
jcnd=29/29 206
205
206 174