• 19Nov

    Hi there!

    In my current work I'm optimizing some parallel software, which basically means trying to make programs run faster. This work has focus areas such as I/O and memory utilization, which are key when optimizing software. Generally, when people think of highly optimized software, they think of C/C++ and possibly assembly.

    Python is very simple in its syntax, and its speed is not at all bad. From an HPC (High Performance Computing) perspective Python may not look that interesting, but by combining Python's simplicity with C's speed, you get the best of both worlds.

    Since Python itself and many of its core modules are written in C, it is possible to extend Python further in C. The official Python documentation (http://docs.python.org) even has a whole section covering the Python C API. Once you have downloaded the Python development files (and of course the GNU compiler; apt-get install build-essential), you're ready to create C extensions for Python. The overall goal for your C extensions should be to handle only the very compute-intensive tasks and keep everything else in Python. The key is to identify which parts of your program need a C extension, and for that you can use a profiler. Python ("batteries included") ships with a couple of profilers:

    • import profile
    • import cProfile
    • import hotshot

    I will not cover these Python profilers here, but rather encourage you to try them out. They might save you a lot of work and give you valuable knowledge of your software. Do some reading on Python profiling here.
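    To give a feel for what a profiler reports, here is a minimal sketch using cProfile together with pstats, assuming a Python 3 interpreter. The function slow_sum_of_squares and the helper profile_call are made-up names for illustration only; in practice you would profile your own hot code.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive loop standing in for a hot spot (hypothetical workload)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and print the five most expensive entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


if __name__ == "__main__":
    result, report = profile_call(slow_sum_of_squares, 100000)
    print(result)
    print(report)
```

    The functions that dominate the cumulative-time column are the first candidates for a C extension.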

    Now let's create a very simple (and extremely stupid) Python extension in C. We want it to contain two methods:

    1. a method which allows you to run shell commands by passing a string
    2. a method which returns some text based on your input

    To create this extension we follow these steps: a) decide on the Python module name, b) create a C file with the same name as the module, c) start programming C! Let's name our module "cool" (not the best name).

    This is our cool.c file:

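    #include <Python.h>   /* must be included before the standard headers */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>   /* strlen, strcpy, strcat */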
     
    /*
     * Takes a string argument, which is the shell command, and runs it.
     * Returns the return code of the system(cmd) call.
     * */
    static PyObject *cool_command(PyObject *self, PyObject *args) {
        const char *cmd;
        int retval;
     
        if(!PyArg_ParseTuple(args, "s", &cmd))
            return NULL;
     
        retval = system( cmd );
        return Py_BuildValue("i", retval);
    }
     
    /*
     * Takes a string argument, and assembles it into a new python string
     * object and returns it.
     * */
    static PyObject *cool_greet(PyObject *self, PyObject *args) {
        char *input;
        char *resp = "Hi there: ";
     
        if(!PyArg_ParseTuple(args, "s", &input))
            return NULL;
     
        /* +1 for the terminating null byte */
        char *retstr = (char *) malloc( strlen(input) + strlen(resp) + 1 );
        if (retstr == NULL)
            return PyErr_NoMemory();
     
        PyObject *retString;
     
        strcpy( retstr, resp );
        strcat(retstr, input);
     
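        /* Py_BuildValue copies the string and returns a new reference that
         * the caller owns, so an extra Py_INCREF here would leak it. */
        retString = Py_BuildValue("s", retstr);
        free(retstr);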
     
        return retString;
    }
     
    /* an array of PyMethodDef structure. A PyMethodDef structure is
    used to describe a method of an extension type. */
    static PyMethodDef CoolMethods[] = {
        {"command", cool_command, METH_VARARGS, "Execute a shell command." },
        {"greet", cool_greet, METH_VARARGS, "Returns a greeting to the caller."},
        {NULL, NULL, 0, NULL}
    };
     
    PyMODINIT_FUNC initcool(void)
    {
        (void) Py_InitModule( "cool", CoolMethods );
    }

    This C extension file is built up of the following key sections:

    • Inclusion of headers.
    • Static functions returning PyObject *, which constitute the visible Python functions of our extension module.
    • An array of PyMethodDef structures. A PyMethodDef structure describes a method of an extension type.
    • An initialization function named "init<modulename>". Its sole purpose is to call Py_InitModule, which takes the module name and the PyMethodDef array.

    So there you go, this is our Python extension implemented in C with the help of the Python development headers. Now we’ll have a look at our Distutils script (setup.py) that will assist us with the compilation and creation of the actual python module.

    This is our setup.py file:

    from distutils.core import setup, Extension
     
    module1 = Extension('cool',
            sources = ['cool.c'] )
     
    setup ( name = 'CoolPackage',
            version = '0.1',
            description = 'A descriptive and informal C extension to Python',
            ext_modules = [module1] )

    This file is fairly self-explanatory. We basically have one function named "setup" which takes a bunch of arguments: the package name, version, description and a list of Extension objects. These Extension objects describe our C extension files. We build the extension by simply running this command:

    python setup.py build

    When the compilation is done, you'll have a build folder in the same directory as setup.py. Go into "build/lib.linux-i686-2.6/" (the exact name depends on your platform and Python version). Then start "python" or "ipython" and type "import cool". You have now loaded the "cool" module and may call cool.greet('Alex') and/or cool.command('ls /'), and the actual computing happens in the C world instead of the Python world.
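    For comparison, the same two functions can be sketched in pure Python. This is not part of the extension; it is only an assumed reference implementation that is handy when checking that the C versions return what you expect.

```python
import os


def greet(name):
    """Pure-Python counterpart of cool.greet(): prefix the input with a greeting."""
    return "Hi there: " + name


def command(cmd):
    """Pure-Python counterpart of cool.command(): run cmd through the shell
    and return whatever os.system() reports as the exit status."""
    return os.system(cmd)


if __name__ == "__main__":
    print(greet("Alex"))
    print(command("exit 0"))
```

    With the extension built, cool.greet('Alex') should give the same string as greet('Alex') here.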

    Now, keep in mind that this C extension isn't actually doing anything useful. But if you have some algorithm or other problem to solve and time is of the essence, then the power that lies in these C extensions can give you significant time savings.

  • 21Mar

    Since the very important and big release of Python v3.0 (also known as "Python 3000") in December, there have been some minor bugfixes and further improvements. Now, in Python v3.1, we get to feel these bugfixes and enhancements.

    Among the important changes we may highlight:

    • The IO module has been reimplemented in C for even more speed
    • Decoding of utf-8, utf-16 and latin1 is now 2x to 4x faster than before
    • int and str comparisons are now faster

    So, for all Python v3.0 people out there, I would suggest upgrading to this latest version. Visit this site: http://www.python.org/download/releases/3.1/

  • 30Jan

    Hi everyone.

    I just want to let you know that I've taken some further steps to describe and publish my master thesis. I have written a page (http://asbjorn.fellinghaug.com/blog/master-thesis/) whose goal is to summarize and further describe the overall goals and design of my master thesis.

    I will also, in time, work further on the bigram index, as I want to see its full potential on a more real-life collection. In the beginning I will use the dumps provided by the wonderful Wikipedia foundation. These dumps are several gigabytes of pure text (and some metadata). I realize that the content of each Wikipedia article may not fully reflect typical websites on the internet, but it is a start. The next step I've set for myself is to find a sufficiently large website and index all the data on it, and then check how the bigram index performs on it.

    I will most likely continue development in the Java programming language, as it is the language Apache Lucene is written in. However, I'm also quite interested in writing a Python analyzer for the PyLucene package (Python port of Lucene).

  • 18Nov

    Hi everyone.

    A couple of times now I've been amazed at how many people are still unaware of IPython. The IPython webpage gives a very short summary of what IPython is: "Enhanced interactive Python shell". The Python programming language is built around its interpreter, which facilitates dynamic typing and execution. This significantly increases productivity, as it is easy to test and try code snippets on the fly.

    IPython is a further extension of the standard Python interpreter: it provides more features in the Python shell, such as auto-completion of imported modules, syntax highlighting, colors, and various other useful commands and features.

    IPython is highly flexible in that it lets the user extend the Python shell even further with custom commands (called magic commands). There is also an even tighter integration between the Python interpreter and the underlying shell, such as bash, csh, etc. It is for instance much simpler to list files and folders, by just typing "ls" directly into the shell. Even commands such as "mkdir", "mv" and "rm" are built in, and it's trivial to extend the shell command vocabulary with more complex commands. We'll show an example of how to add custom commands below.

    Like all flexible software, IPython comes with a main configuration file ($HOME/.ipython/ipythonrc). If we wanted a custom command, such as "chmod <mod> <file>" (chmod 755 myfile.py), we could add this to the "ipythonrc" file:


    # my custom chmod alias. By typing '>>> chmod 755 myprog.py' or
    # '>>> chmod a+rx myprog.py' IPython will execute this
    # statement as a shell command.
    alias chmod chmod %s %s

    Also, printed lists (tuples, dictionaries, etc.) are more readable when debugging in IPython, as it passes such output through the "pprint" (pretty-print) module, which gives a comprehensible representation.
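    The effect is easy to reproduce in the plain interpreter with the standard pprint module. The sketch below only illustrates the kind of output you get; the helper pretty and the sample dictionary are made up for the example and are not taken from IPython itself.

```python
from pprint import pformat


def pretty(obj, width=40):
    """Return the pretty-printed representation of obj as a string."""
    return pformat(obj, width=width)


if __name__ == "__main__":
    # Hypothetical sample data, only for illustration.
    hosts = {
        "crawler-1": {"status": "idle", "queue": [], "visited": 1024},
        "crawler-2": {"status": "busy", "queue": ["a.html", "b.html"], "visited": 88},
    }
    print(repr(hosts))    # everything on one long line
    print(pretty(hosts))  # wrapped and indented over several lines
```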

    So, if you often find yourself in the Python interpreter, I would highly recommend spending half an hour getting to know IPython. I promise you it will save you a lot of headaches in the future.

  • 27Oct

    Finally!

    It's time for the monthly edition of the Python Magazine, which is a highly interesting and technical magazine about the Python programming language.

    Everyone who is into Python should subscribe to this magazine, as it covers many "hot" topics and presents many how-to tutorials for everyday challenges, from cutting-edge web applications and frameworks to desktop applications and backbone server implementations.

  • 17Oct

    Hey hey!

    VIM (Vi IMproved) is a text editor that builds on the very old and well-regarded editor vi. VIM (hereafter Vim) is a very advanced text editor that has been designed from the start to be highly configurable. The editor has its own scripting language in which simple, trivial operations are represented by small commands, which can be combined to carry out many specialized operations (whatever you need).

    Vim has a reputation for being a difficult editor, since it requires a somewhat unusual way of thinking at first. However, once you are past the first milestone on the learning curve, the world makes sense again. Vim has different "modes" in which you can do different things. For example, there is insert mode, in which whatever you type on the keyboard is written to the document. There is also a command mode, where you type commands that Vim should carry out on the document. Typical commands include "delete the line, delete a character, find text, find and replace text, move text, copy, cut, paste, etc.".

    I'm drifting a bit off topic here, but I can recommend this document (in Norwegian), which describes Vim much better. Because Vim is as flexible and modular as it is, many active users have written "plugins", small specialized pieces of code for Vim. These are usually aimed at one specific task: highlighting particular words or characters, ensuring correct formatting for a given file type, etc. For those of you who know the Python programming language, I hardly need to say that you ask a fair bit of your IDE (Integrated Development Environment): it must ensure correct indentation for blocks of code, correct syntax, etc. In other words, a good IDE prevents unnecessary and sloppy mistakes in Python code, so a good IDE is needed to write fast, good code efficiently.

    It will probably come as no surprise that Vim works excellently as an IDE, even though it was never meant to be one. This is one of Vim's great qualities: it is written in such a modular way that you can actually attach building blocks to specialize it for a field. There are many Python plugins for Vim, such as syntax highlighting of Python code (see the picture below/to the side). Another automatic task you would like to delegate to plugins is staying at the same indentation level when you press Enter for a new line. This matters because Python does not use curly braces {} the way C++ and Java do.

    Automatic error checking is also something you want from an IDE, and there are good solutions for that in Vim. Among them are plugins that make sure the syntax is right for logical constructs such as if-elif-else, that there is always a colon at the far right of such statements, and that all strings are opened and closed by quotes, such as 'string1', "string2", """string3""". Another factor that increasingly expresses the strength of an IDE is its ability to offer auto-completion of code. Take for example the following little snippet:

    import sys
    if sys.platform == 'linux2':
        print "you're surely using Linux, right?"

    So when I type 'sys.' and want to see which functions/classes/variables can be accessed from the module 'sys', I would like the IDE to present a list of these alternatives.

    Vim has a built-in auto-completion feature, but it is based on all the individual words already written in the document. So, given the snippet above, you can press "CTRL+n" to get a list of alternatives. But this won't work for our problem, since we want all functions/classes/variables that live under the module "sys". Fear not, there is a remedy. The blog post referenced at the end of this entry describes how a few simple settings enable a more advanced auto-completion for Python. Note that this feature requires Vim to be compiled with Python support (most Linux distros have this). So, edit your $HOME/.vimrc file and add the following:


    " enables python completion
    autocmd FileType python set omnifunc=pythoncomplete#Complete
    " maps CTRL+SPACE to autocompletion function
    inoremap <Nul> <C-x><C-o>

    Now you can easily get the desired functionality by typing "sys." and then pressing CTRL+SPACE. Vim will then open an extra window at the very top with explanatory text for the function/class/variable in question, and you can use the arrow keys to move down the list.

    Note that this is only one of several useful features for Vim as a Python IDE. I strongly recommend reading this entire blog post* if you find yourself wanting to use Vim for writing Python. Also note that Vim runs on many platforms: Linux, Mac, Windows. And it does not have to run in a terminal window, as you often see in screenshots. There is a GUI version (called gVim).
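    To see what kind of list such a completion has to produce, you can ask Python itself which names live under a module. The sketch below is only an illustration using the built-in dir(); the helper name complete is made up, and this is not a description of how the Vim plugin works internally.

```python
import sys


def complete(module, prefix=""):
    """Return the public attribute names of module that start with prefix."""
    return sorted(
        name for name in dir(module)
        if name.startswith(prefix) and not name.startswith("_")
    )


if __name__ == "__main__":
    # Candidates an editor could offer after typing "sys.pl"
    print(complete(sys, "pl"))
```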

    * http://blog.sontek.net/2008/05/11/python-with-a-modular-ide-vim/

  • 03Oct

    Hi everyone!

    On October 2 the Python development team released Python v2.6. At http://planet.python.org there are a whole lot of other blogs listing all the cool new features, so I won't use any space outlining them here.

    However, I will encourage everyone to have a look at the documentation, as there are some valuable key features in this release which will in the long run revolutionize the way we program in Python.

    In time I will cover some of these features.

  • 27Jun

    Since my master thesis is now delivered, I will dedicate some time to clean up the code and thoroughly document it. When I’m finished and the code is clean, I will make it freely available to the Apache Lucene community.

    I will also make my master thesis freely available for download (see my "master thesis" page for more info), so documentation of the code is somewhat covered by the thesis. The abstract goals for the code (since the code reflects the experiment) are also outlined in the thesis, in addition to a presentation of the results and observations made.

    Since I'm a huge fan of Python, I have also thought of experimenting with the performance of Python and my bigram index. I would love to enhance it further and maybe introduce some new improvements. In time, I will create a project page for the "Bigram index" beneath my future django/turbogears website http://asbjorn.fellinghaug.com/

  • 05May

    I've been taken by the Twitter storm these days. Damn, I should focus a whole lot more on my master report. Well, this took me only one little hour, so it's not that much of a waste of time. :)

    So, I guess you have heard about the new "facebook" called Twitter? It's this new web community thing where people can post their current status, that is, what they are doing in the world. And, of course, one can follow friends and keep track of where they are and what they are doing.

    After some time I found it rather tedious to open the Twitter webpage, log in, and then post a new message every time I wanted to update my status. So, being the Python fan that I am, I wrote myself a little Python script to solve this problem. It relies on the python-twitter module available at the Google Code pages. So, let's have a look at the code. I have named this file "update.py", but feel free to rename it.

    #!/usr/bin/env python
    # Python 2 script; relies on the python-twitter module.
    import sys
    import twitter

    # Fill in your Twitter credentials here.
    USERNAME = ""
    PASSWORD = ""

    def postNewMessage(msg):
        """Post msg (a string or a list of words) as a status update."""
        api = twitter.Api(username=USERNAME, password=PASSWORD)

        if isinstance(msg, list):
            msg = " ".join(msg)
        # Decode the byte string so that len() counts characters, not bytes.
        msg = unicode(msg, "utf-8")
        if len(msg) > 140:
            print "ERROR: Message can't be over 140 chars."
            return
        try:
            api.PostUpdate(msg)
            print "OK. Was %i chars in msg." % len(msg)
        except Exception, e:
            print "ERROR: Could not post the update: %s" % e

        api.ClearCredentials()

    if __name__ == "__main__":
        if len(sys.argv) > 1:
            # Handles both ./update.py "hi there mate" and ./update.py hello world
            postNewMessage(sys.argv[1:])
        else:
            print "Usage: %s <message>" % sys.argv[0]
  • 01May

    Hi.

    In the worst writing rush of my master thesis I found it quite relaxing to enjoy some reading other than research papers. What I wanted to do was to create a master robot to control a set of other robots, where a robot is just a simple program running on a host. It is pretty much the same thing that goes on in botnets, but this is not a botnet. My goal is to create a crawling system that crawls URIs, then downloads and parses the retrieved documents into plain text, and then creates a forward index in an efficient data structure. When this is done, I would make my not-yet-constructed indexer index those documents and construct an inverted index.

    Now, this project is not meant to be any "enterprise" software or such, only a hobby, or shall I say an interest of mine.

    However, the core of such a crawling system is that the robots need a way to communicate with each other. So I've explored some distributed approaches in Python and come up with some alternatives:

    • Pyro: Python Remote Objects is very much the same as the good old CORBA.
    • XML-RPC: Has the advantage of being extremely simple. However, there is some overhead in using XML to transport remote procedure calls. I would guess that the choice is quite application dependent (do you need speed or simplicity?).
    • SOAP: Doh.. SOAP is a very heavy protocol widely used in web services. It is also very simple, given the appropriate tools, but a little dated in light of the other approaches.
    • Sockets: Using pure TCP/UDP sockets (low-level) is significantly faster than the above approaches. However, it requires more programming and hence more time. There is a tradeoff here somewhere.
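    To give an impression of how little code the XML-RPC alternative needs, here is a minimal sketch built on the XML-RPC modules in the Python standard library. It runs the server in a background thread on a free local port purely for demonstration; the names add, start_server and remote_add are made up for this example.

```python
import threading

try:  # Python 3 module names
    from xmlrpc.server import SimpleXMLRPCServer
    from xmlrpc.client import ServerProxy
except ImportError:  # Python 2 module names
    from SimpleXMLRPCServer import SimpleXMLRPCServer
    from xmlrpclib import ServerProxy


def add(a, b):
    """The remote procedure: add two values."""
    return a + b


def start_server():
    """Start an XML-RPC server on a free local port in a background thread."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(add, "add")
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()
    return server


def remote_add(a, b):
    """Call add() over XML-RPC against a temporary local server."""
    server = start_server()
    try:
        host, port = server.server_address
        proxy = ServerProxy("http://%s:%d" % (host, port))
        return proxy.add(a, b)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(remote_add(1000, 2500))
    print(remote_add("hello ", "world"))
```

    In a real setup the server and the client would of course be two separate programs on different hosts.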

    Here we will illustrate a simple "client-server" example based on the Pyro module. Our goal is to add two integers and return the result. Let's first consider the server:

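    #!/usr/bin/env python
    import Pyro.core
    import Pyro.naming

    class MathServer(Pyro.core.ObjBase):
        """Remote object exposing a single add() method."""
        def add(self, a, b):
            return a+b

    # Initialize Pyro for server use before creating the daemon.
    Pyro.core.initServer()
    daemon=Pyro.core.Daemon()
    # Locate the running Pyro name server and register our object with it.
    ns=Pyro.naming.NameServerLocator().getNS()
    daemon.useNameServer(ns)
    uri=daemon.connect(MathServer(),"mathserver")
    # Block here and serve incoming requests.
    daemon.requestLoop()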

    So, basically what we do here is create a simple class (MathServer) which inherits from Pyro.core.ObjBase. Next we simply define our methods as for a standard class. Later on we "start" the server, connect an instance of MathServer to our Pyro daemon and await incoming connections. Let's now consider the Pyro client:

    #!/usr/bin/env python
    import Pyro.core
    mathserver = Pyro.core.getProxyForURI("PYRONAME://mathserver")
    print mathserver.add(1000, 2500) # will return 3500.
    print mathserver.add("hello ", "world") # will return "hello world"

    In our client code we take advantage of the simple name service which is shipped with Pyro. Basically, we instruct our client to look up the Pyro name server and then connect to the object registered under the name "mathserver" (recall the server code).

    One important note: to be able to use the name server that is shipped with Pyro, it needs to be started. In Ubuntu Gutsy the name server is disabled by default, so changing the value "ENABLED=0" to 1 in the file /etc/default/pyro-nsd, and then running "/etc/init.d/pyro-nsd start", will do the trick.