• 19Nov

    Hi there!

    In my current job I work on optimizing parallel software, which basically means trying to make programs run faster. This work has focus areas such as I/O and memory utilization, which are key when optimizing software. Generally, when people think of highly optimized software, they think of C/C++ and possibly assembly.

    Python has a very simple syntax, and its speed is not bad at all. From an HPC (High Performance Computing) perspective Python may not look that interesting, but by combining Python’s simplicity with C’s speed you get the best of both worlds.

    Since Python itself and many of its core modules are written in C, it is possible to extend Python further in C. The official Python documentation (http://docs.python.org) even has a whole section covering the Python C API. After downloading the Python development files (and of course the GNU compiler; apt-get install build-essential), you’re ready to create C extensions for Python. The overall goal for your C extensions should be to do only the very compute-intensive tasks, and keep everything else in Python. The key is to identify which parts of your program need a C extension, and to find them you can use a profiler. Python (“batteries included”) ships with a couple of profilers:

    • import profile
    • import cProfile
    • import hotshot

    I will not cover these Python profilers here, but rather inspire you to try them out. They might save you a lot of work and give you valuable knowledge of your software. Do some reading on Python profiling here.
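    To give a taste of what a profiler run looks like, here is a minimal sketch using cProfile together with pstats. The functions slow_sum, main and profile_report are made-up names for illustration, not part of any real program; the sketch is written for a current Python 3 interpreter.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately naive loop that stands in for a compute-heavy hot spot."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def main():
    """Hypothetical program entry point that calls the hot spot a few times."""
    return [slow_sum(50_000) for _ in range(5)]


def profile_report(func, limit=5):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(limit)
    return result, stream.getvalue()


if __name__ == "__main__":
    _, report = profile_report(main)
    print(report)
```

    The function that dominates the cumulative time column is the first candidate for a C extension.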

    Now let’s create a very simple (and extremely stupid) Python extension in C. We want it to contain two methods:

    1. a method which allows you to run shell commands by passing a string
    2. a method which returns some text based on your input

    To create this extension we follow these steps: a) decide on the Python module name, b) create a C file with the same name as the module, c) start programming C! Let’s name our module “cool” (not the best name).

    This is our cool.c file:

    #include <Python.h>   /* must come before the standard headers */
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>   /* strlen, strcpy, strcat */
     
    /*
     * Takes a string argument, which is the shell command, and runs it.
     * Returns the return code of the system(cmd) call.
     * */
    static PyObject *cool_command(PyObject *self, PyObject *args) {
        const char *cmd;
        int retval;
     
        if(!PyArg_ParseTuple(args, "s", &cmd))
            return NULL;
     
        retval = system( cmd );
        return Py_BuildValue("i", retval);
    }
     
    /*
     * Takes a string argument, and assembles it into a new python string
     * object and returns it.
     * */
    static PyObject *cool_greet(PyObject *self, PyObject *args) {
        char *input;
        char *resp = "Hi there: ";
     
        if(!PyArg_ParseTuple(args, "s", &input))
            return NULL;
     
        /* +1 for the terminating null byte */
        char *retstr = (char *) malloc( strlen(input) + strlen(resp) + 1 );
        if (retstr == NULL)
            return PyErr_NoMemory();  /* set MemoryError before returning NULL */
     
        PyObject *retString;
     
        strcpy( retstr, resp );
        strcat(retstr, input);
     
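        /* Py_BuildValue copies the string and returns a new reference,
         * so no extra Py_INCREF is needed (it would leak the object). */
        retString = Py_BuildValue("s", retstr);
        free(retstr);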
     
        return retString;
    }
     
    /* an array of PyMethodDef structure. A PyMethodDef structure is
    used to describe a method of an extension type. */
    static PyMethodDef CoolMethods[] = {
        {"command", cool_command, METH_VARARGS, "Execute a shell command." },
        {"greet", cool_greet, METH_VARARGS, "Returns a greeting to the caller."},
        {NULL, NULL, 0, NULL}
    };
     
    PyMODINIT_FUNC initcool(void)
    {
        (void) Py_InitModule( "cool", CoolMethods );
    }

    This C extension file is built up with the following key sections:

    • Inclusion of headers.
    • static functions returning PyObject pointers, which constitute the visible Python functions of our extension module.
    • An array of PyMethodDef structures. A PyMethodDef structure describes a method of an extension type.
    • An initialization function named “init<modulename>”. Its sole purpose is to call the Py_InitModule function, which takes the module name and the PyMethodDef array. (This is the Python 2 API; Python 3 uses PyInit_<modulename> together with PyModule_Create.)

    So there you go, this is our Python extension implemented in C with the help of the Python development headers. Now we’ll have a look at our Distutils script (setup.py) that will assist us with the compilation and creation of the actual python module.

    This is our setup.py file:

    from distutils.core import setup, Extension
     
    module1 = Extension('cool',
            sources = ['cool.c'] )
     
    setup ( name = 'CoolPackage',
            version = '0.1',
            description = 'A descriptive and informal C extension to Python',
            ext_modules = [module1] )


    This file is pretty much self-explanatory. We basically have one function named “setup” which takes a bunch of arguments: the package name, version, description and a list of Extension objects. These Extension objects describe our C extension files. Build the extension by simply running this command:

    python setup.py build

    When the compilation is done, you’ll have a build folder in the same directory as setup.py. Go into “build/lib.linux-i686-2.6/” (the exact name depends on your platform and Python version). Then type “python” or “ipython”, and then “import cool”. Now you have actually loaded the “cool” module, and you may call “cool.greet('Alex')” and/or “cool.command('ls /')”, and the actual computing happens in the C world instead of the Python world.
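    For comparison, here is roughly what the two functions do, expressed in plain Python. This is only a reference sketch for checking the extension’s behaviour (for instance in a test), not a replacement for it; the names greet and command simply mirror the ones in cool.c.

```python
import os


def greet(name):
    """Mirror of cool.greet: prefix the input with a fixed greeting."""
    return "Hi there: " + name


def command(cmd):
    """Mirror of cool.command: run a shell command, return system()'s status."""
    return os.system(cmd)


if __name__ == "__main__":
    print(greet("Alex"))
    print(command("exit 0"))
```

    Calling the compiled module with the same inputs should give the same results as these two helpers.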

    Now, keep in mind that this C extension isn’t actually doing anything useful. But given that you have some algorithm or other problem to solve, and time is of the essence, utilizing the power that lies in C extensions can give you significant time savings.

  • 10Nov

    Hi there.

    I just recently discovered the power of the function named partial inside the Python module “functools”. Functools provides “Tools for working with functions and callable objects”, and comes with Python v2.5 and up.

    But, let’s illustrate some of the beauty with examples.

    import functools
    def check_balance(amount, limit=0):
        if amount <= limit:
            return False
        return True
    positive_account_balance = functools.partial( check_balance, limit=0 ) # new func object to check if account is more than 0
    rich_dude = functools.partial( check_balance, limit=1000 ) # new func object to check if this person has more than 1000 in his account
    poor_dude = functools.partial( check_balance, limit=-100 ) # new func object that returns False when this person is poor (i.e. -100 or less in his account)

    So, let me explain some of this code. We first define a function named “check_balance” which takes two arguments (amount and limit). This function is very simple: it returns False if amount is less than or equal to limit, and True otherwise.

    So, what does the partial function really do? It returns a new function object based on the function object passed in as the first argument and the predefined arguments provided. In other words, we construct new function objects where some or all of the arguments are given, and then call that particular function later on, thus saving us from writing the function arguments again.
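    To make this a bit more concrete, the sketch below peeks inside a partial object and compares it with a closure written by hand. The name rich_dude_by_hand is made up for the comparison, and check_balance is a condensed version of the function above.

```python
import functools


def check_balance(amount, limit=0):
    """Return True when amount is strictly greater than limit."""
    return amount > limit


# partial() stores the wrapped function and the frozen arguments.
rich_dude = functools.partial(check_balance, limit=1000)


# Roughly the same thing written by hand as a closure.
def rich_dude_by_hand(amount):
    return check_balance(amount, limit=1000)


if __name__ == "__main__":
    print(rich_dude.func is check_balance)
    print(rich_dude.keywords)
    print(rich_dude(1500), rich_dude_by_hand(1500))
    # Keyword arguments frozen by partial() can still be overridden per call.
    print(rich_dude(1500, limit=2000))
```

    The partial object keeps the original function in its func attribute and the frozen keyword arguments in keywords, which makes it easier to inspect than a hand-written wrapper.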

    Another example could be that you want a function to check if a user is also an administrator:

    import functools
    groups = {'admin': ['asbjorn'], 'users': ['asbjorn', 'lolcat', 'frank']}
    is_admin = functools.partial( lambda user, group: user in groups.get(group, []), group="admin" )
    is_admin("asbjorn") # will return True

    This functionality makes it very simple to create detailed and specific functions based on more generically designed functions. It also makes it simpler to follow the design approach Domain-driven Design (DDD) and create a Domain Specific Language, as it’s easier to create domain specific functions.

  • 13Sep

    Hi there.

    Recently I installed Shiretoko 3.5.2, which is not an official Firefox release, but more of a release candidate, so to speak. One major problem I discovered was that it loses its sessions on many major websites (on Facebook at least).

    Now, it turns out that this is a mistake shared between Shiretoko and Facebook, as Facebook tries to recognize the web browser used, and acts based on that. When it doesn’t recognize the web browser, some undefined behavior clearly happens. Well, undefined is maybe a little harsh, but at least the behavior is very strange.

    But I found out that if you change the setting “general.useragent.extra.firefox” in the about:config page from “Shiretoko/3.5.X” to “Firefox/3.5.X”, then you’re back in business.

    I hope that they can finish and put a final Firefox 3.5 release into Ubuntu. It’s not a good idea to wait 6 months for a new web browser release, as the policy of the Ubuntu release cycle dictates. They should include such software releases during the 6-month period.

  • 20Aug

    One day I needed to make an embedded Python interpreter in a Fortran/C program aware of its path, since it should search for a predefined Python source code file in the same directory as the binary.

    Now, this introduces some issues, as the Fortran/C program can be run in many ways:

    1. Absolute path (/home/asbjorn/test1/main)
    2. Standing in the directory and run ./main
    3. Having the /home/asbjorn/test1/ in your $PATH variable (Linux/UNIX), and type ‘main’.

    Now, the simplest approach is to use the “int argc, char *argv[]” parameters of your ‘main’ function. Then “argv[0]” contains the binary file name, including any absolute path if that was used. But if your application is found through your $PATH variable, it won’t work, since “argv[0]” then holds only the bare name.

    A ‘dirty’ trick could be to use:

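    char path[255];
    FILE *fp = popen("which main", "r");  /* system() returns only the exit status, so read the output through a pipe */
    if (fp == NULL || fgets(path, sizeof(path), fp) == NULL) path[0] = '\0';  /* path keeps the trailing newline */
    if (fp != NULL) pclose(fp);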

    but this approach is not recommended. Another very hackish and cool solution (Linux only) is the following:

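    #include <stdlib.h>
    #include <stdio.h>
    #include <unistd.h>      /* readlink */
    #include <sys/param.h>   /* MAXPATHLEN */
    int main(int argc, char* argv[])
    {
       char path[MAXPATHLEN];
       ssize_t length;
       /* leave room for the terminating null byte, readlink does not add one */
       length = readlink("/proc/self/exe", path, sizeof(path) - 1);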
       if(length<0) {
          fprintf(stderr,"error resolving /proc/self/exe!\n");
          exit(1);
       }
       path[length] = '\0';
       printf("The absolute path to this running binary is: %s\n", path);
     
       return 0;
    }
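
    The same lookup can also be done from the Python side, which may be handy once the embedded interpreter is up and running. This is only a sketch, assuming Linux with /proc mounted; the function name executable_dir is made up, and the fallback to sys.executable is my own assumption, one that suits a plain Python process better than an embedded interpreter.

```python
import os
import sys


def executable_dir(proc_link="/proc/self/exe"):
    """Return the directory holding the running binary.

    Reads the /proc/self/exe symlink (Linux only). If that fails, fall back
    to sys.executable, which is an assumption, not a guarantee.
    """
    try:
        exe_path = os.readlink(proc_link)
    except OSError:
        exe_path = sys.executable
    if not exe_path:
        raise RuntimeError("cannot determine the running binary")
    return os.path.dirname(os.path.abspath(exe_path))


if __name__ == "__main__":
    print("Directory of the running binary:", executable_dir())
```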

    Another approach which is often used is to include some kind of configuration file that contains this path. Or, in my case, I could specify another folder which should hold my Python files. This approach would probably make more sense in the long run, as it is more flexible.

    So, in case you find yourself in this situation, then feel free to copy-paste the code and save the day ;) Have a good summer people!

  • 16Jul
    Categories: development

    Hi there!

    “Wow”, you may think. Are people still using the Fortran programming language? I know, I was shocked too. But apparently Fortran is very much used for large mathematical problems and scientific work, and can therefore be seen within HPC communities. As I’m currently an IT consultant working in this kind of environment, I’ve got some hands-on experience with C and Fortran.

    One of the applications I’m working on is a mix of Fortran and some C. Since I’m a software developer at heart and have almost never touched Fortran, I prefer C code over Fortran. However, rewriting all the Fortran code is not an option, as it is very time consuming and one should respect the saying “if it works, don’t change it”. But one of my tasks is to try to optimize the code, and running the whole application through the GNU gprof tool reveals some bad code. I’m not going to dive into exactly what it was, but to solve it I wanted to use C instead of Fortran. I then discovered that I could simply reimplement the subroutine (basically just a function in C terms) in C and append a “_” to the end of the function name.

    This Fortran program, for example, calls the C function “my_func_” under the name “my_func”:

    program testme
        implicit none
        integer*4 dx, dy, direction, i, j
        !real*4, dimension(:,:), allocatable :: data
        real*4 data(5,5)
        dx = 5
        dy = 5
        direction = 1
     
        do i = 1, dy
          do j = 1, dx
              data(i,j) = (i+j)**2
          end do
        end do
     
        ! call our C library function
        call my_func(dx, dy, data, direction)
     
        write (*,*) "Hello world!"
    end program testme

    And then this is the C function “my_func_”:

    #include <stdlib.h>
    #include <stdio.h>
     
    void my_func_(int *dx, int *dy, float *data, int *direction)
    {
      int i,j;
     
      printf("You are now inside the C function.\nAwesome, right?\n\n");
      printf("dx=%d, dy=%d, direction=%d\n", *dx, *dy, *direction);
     
      printf("Lets go through our data matrix..\n");
     
      for(i=0;i<(*dy);i++)
      {
          for(j=0;j<(*dx);j++)
          {
              /* Fortran stores arrays column by column, so data(i+1,j+1) is at i + dy*j */
              printf("data[%d][%d] = %.2f\n", i+1, j+1, data[i+(*dy)*j]);
          }
      }
    }

    Let’s call the Fortran file test.f90 and the C file c_lib.c, and let’s create an SCons file (SConstruct) instead of a Makefile to compile this example. Also, I have only tested this with the “gfortran v4.2” compiler and gcc v4.3.3.

    import os
    env = Environment( ENV = os.environ )  # ENV (uppercase) is the environment SCons passes to the build commands
     
    env.Program( target = "program1", source = ["test.f90", "c_lib.c"])

    And now for the grand finale. Type scons, and the compilation should run. There should now be a binary named “program1”. Run it, and have a closer look at the output.

    So, to summarize, the only requirement for Fortran files to call C functions (with gfortran, at least) is that the respective C function has an underscore right after its function name. Also, before the Fortran program is linked, the C file containing the function needs to be compiled into an object file and be reachable.
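
    One thing to watch out for when passing matrices across the language border is the memory layout: Fortran stores 2-D arrays column by column, while C code usually indexes row by row. The test matrix above is square and symmetric, so a swapped index would not show up in its output. The sketch below spells out the offset arithmetic in Python; fortran_offset and flatten_column_major are made-up helper names.

```python
def fortran_offset(row, col, n_rows):
    """0-based flat offset of the 1-based Fortran element data(row, col)."""
    return (row - 1) + (col - 1) * n_rows


def flatten_column_major(n_rows, n_cols, value):
    """Lay out value(row, col) the way Fortran stores a 2-D array."""
    flat = [None] * (n_rows * n_cols)
    for col in range(1, n_cols + 1):
        for row in range(1, n_rows + 1):
            flat[fortran_offset(row, col, n_rows)] = value(row, col)
    return flat


if __name__ == "__main__":
    # A non-symmetric 2x3 example so that a transposed index would show up.
    flat = flatten_column_major(2, 3, lambda r, c: 10 * r + c)
    print(flat)  # [11, 21, 12, 22, 13, 23]
```

    On the C side, the element data(row, col) is therefore found at data[(row-1) + (col-1)*n_rows].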

  • 04Jun

    Hellu!

    Lately I’ve been working on securing some web-accessible resources, especially Subversion access to repositories through the Apache web server. One aspect we found very difficult was securing Subversion access through our Apache server when we had an Active Directory server to authenticate against (I know..).

    Apache has directives such as “Require valid-user”, which signals that a user has to be authenticated against some authentication provider. In most cases a standard “.htaccess” and “.htpasswd” combination provides this. For small projects, this may be a workable approach. However, in a large-scale organization where you want dynamic handling of users and their access, using groups to reflect the users’ access to resources may be a better solution.

    For one of our projects, we wanted all LDAP (AD) users to have read access, while members of certain groups have read and write access. We solved this with the following Apache config:

    <Location /wicked_project>
       DAV svn
       SVNParentPath /var/svn/wicked_project
       AuthzLDAPAuthoritative off
       AuthType basic
       AuthBasicProvider ldap
       AuthName "Need to authenticate here"
       AuthLDAPBindDN "ldap_user@domain.net"
       AuthLDAPBindPassword secretPassword
       AuthLDAPURL "ldap://ad.domain.net/dc=domain,dc=net?sAMAccountName?sub?(objectClass=*)"
     
       <Limit GET PROPFIND OPTIONS CHECKOUT>
          Require valid-user
       </Limit>
       <Limit REPORT MKACTIVITY PROPPATCH PUT MKCOL MOVE COPY DELETE LOCK UNLOCK MERGE>
          Require ldap-filter |(memberOf=CN=Staff,OU=GROUPS,DC=DOMAIN,DC=NET) \
                       (memberOf=CN=Wicked_project_rw,OU=GROUPS,DC=DOMAIN,DC=NET)
       </Limit>
    </Location>

    Another LDAP directive which can work if you only need one group to have read and write access is the “ldap-group” directive. However, in our case we needed multiple groups, which is not supported by the “ldap-group” directive.

    To solve this problem we used “ldap-filter” with multiple group filters inside the same filter, combined with the boolean OR (the leading “|”). I don’t know if there are more elegant ways of achieving the same result, but this solved our problem.
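
    If the list of groups grows, writing the filter by hand gets error-prone. Below is a small sketch of a helper that assembles the OR filter from a list of group DNs. The function name ldap_or_filter is made up, and the sketch assumes the DNs contain no characters that need LDAP escaping.

```python
def ldap_or_filter(group_dns):
    """Build an OR filter over memberOf for the given group DNs.

    Returns the '|(...)(...)' form used after 'Require ldap-filter' in the
    config above. Assumes the DNs need no LDAP escaping.
    """
    if not group_dns:
        raise ValueError("at least one group DN is required")
    clauses = "".join("(memberOf={})".format(dn) for dn in group_dns)
    return "|" + clauses


if __name__ == "__main__":
    groups = [
        "CN=Staff,OU=GROUPS,DC=DOMAIN,DC=NET",
        "CN=Wicked_project_rw,OU=GROUPS,DC=DOMAIN,DC=NET",
    ]
    print("Require ldap-filter " + ldap_or_filter(groups))
```

    The printed line can be pasted into the second Limit block of the config.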

    Having a second look at this “ldap-filter” directive, I see that it has a significant strength in terms of flexibility. However, one aspect I have not considered is the performance of this approach. Without having looked in depth into the mod_ldap Apache module, my guess is that for each filter inside the ldap-filter directive, it has to make a query to the LDAP (AD) server to retrieve the wanted resource. So each group filter inside the ldap-filter would need a call, and our approach would need two LDAP queries. If that guess holds, the more groups there are to filter, the more LDAP queries are made, and performance degrades as the ldap-filter gets more complex.

  • 20Apr
    Categories: VIM, development

    Hi there.

    These days it is all about being efficient in your working process. Now, I find myself using Vim a lot, both for regular text processing, e-mail, Wiki page editing (Firefox + ViewSourceWith), and Python/C/C++ programming.

    I should be clear that the usefulness of Vim as an IDE pretty much depends on the size of the software project. If we are making very small software, I most often find Vim most efficient. However, if the project is large, then maybe Eclipse or KDevelop is more efficient. I should also state that I keep my Java programming in the Eclipse IDE.

    So, how can you use Vim marks to be more efficient? Well, say you have a large source code file, and you find yourself moving a lot between some blocks of code. To save yourself the trouble of holding down keys until you’re at the requested position, you can just jump to a previously declared mark.

    Vim screenshot

    Now, if we have the cursor over some frequently used function/class/variable (or something), and we want to store a Vim mark there, we hit the keys: ma.

    This stores the position as mark a. So, if we’re somewhere else in our file and want to jump back, we may type: 'a.

    Notice, however, that there is a distinction between lowercase and uppercase mark names. If we type mA instead of ma, then we are able to jump between files (often you’re writing source code in multiple files). You may use the marks a-z for in-file references, and A-Z for between-file references.

    To delete your mark, type :delmarks a (note that d'a would instead delete the lines between the cursor and the mark). If you only want to replace it with a new mark, then simply type ma again. You may also list all known marks with:


    :marks

    I would recommend reading this Vim wiki page regarding Vim marks. I find them very useful, and hopefully you will too.

    Vim screenshot

  • 21Mar

    Since the very important and big release of Python v3.0 (also known as “Python 3000”) in December, there have been some minor bugfixes and further improvements. Now, in Python v3.1, we get to feel these bugfixes and enhancements.

    Python Logo

    Among the important changes we may highlight:

    • The IO module has been reimplemented in C to gain even more speed
    • Decoding of utf-8, utf-16 and latin1 is now 2x to 4x faster than before
    • int and str comparisons are now faster

    So, for all Python v3.0 people out there, I would suggest upgrading to this latest version. Visit this site: http://www.python.org/download/releases/3.1/

  • 25Feb
    Categories: Funny

    I was at the AC/DC Black Ice concert in Oslo on 18 Feb, and I must say, it was an incredible live performance by AC/DC. The concert was held in the new “Telenor Arena”.

    AC/DC

    Classic songs like “Highway to hell”, “Thunderstruck”, “T.N.T.”.. Even though these guys ain’t getting any younger, they sure did impress me. I would seriously consider attending another concert with these guys, even if it means I have to put my arse on a plane to another country.

  • 30Jan

    Hi everyone.

    I just want to inform you that I’ve taken some further steps to describe and provide my master thesis. I have written a page (http://asbjorn.fellinghaug.com/blog/master-thesis/) whose goal is to summarize and further describe the overall goals and design of my master thesis.

    I will also, in time, work further on the bigram index, as I want to see its full working potential on a more real-life collection. In the beginning I will use the dumps provided by the wonderful Wikipedia foundation. These dumps are several gigabytes of pure text (and some metadata). I realize that the content of each Wikipedia article may not fully reflect typical websites on the internet, but it is a start. The next step I’ve set myself is to find a sufficiently large website, index all the data on it, and then check how the bigram index performs on it.

    I will most likely continue development in the Java programming language, as it is the language Apache Lucene is written in. However, I’m also quite interested in writing a Python analyzer for the PyLucene package (Python port of Lucene).