• 30Aug
    Categories: java, lucene, school Comments: 0

    Hi everyone.

    I have now rewritten parts of the source code from my master thesis and added some Javadoc to make it more comprehensible. So I will now publish the whole code, although a lot later than initially planned. I am not totally satisfied with the final code, however, since it may give the impression of being “run-and-play” code, which it is not. I would also recommend reading my master thesis, as many of the concepts in the source code are defined in much more detail there.

    I would also like to emphasize that the important part of the source code is DocumentAnalyzer.java#PhraseFilter3, which is responsible for manipulating the Lucene index to promote phrase searching capabilities, as discussed in my master thesis.

    The code is available in both tar.gz and zip compression:

  • 02Aug

    Hi there.

    Today is my last working day at NTNU-IT, also known as ITEA. NTNU-IT is the central IT department at NTNU in Trondheim. I have worked there as a student employee since the autumn of 2003, when I started studying at the university. I must absolutely say that it has been an extremely educational and inspiring job, and that I have been very lucky to find a job alongside my studies that is so relevant to them. And one that is located in the same place (literally) as where I study.. :)

    NTNU-IT has a great many servers, including its own high-performance computing centre. Pretty cool machines, so of course I had to show a picture of a small cluster they have..

    I would like to thank all the staff at NTNU-IT for many good memories, and I wish you all the best of luck.

    The next job awaiting me is at the consulting company Bouvet here in Trondheim, where I will work as an IT consultant. In terms of tasks, I will primarily work with software development, especially open source and Java. I must say I am looking forward to getting started. It is a bit scary though, since this is the first step towards “grown-up” life, but I think I will enjoy the new job and situation. Looking forward to good and challenging days at Bouvet.

  • 27Jun

    Since my master thesis is now delivered, I will dedicate some time to cleaning up the code and documenting it thoroughly. When I’m finished and the code is clean, I will make it freely available to the Apache Lucene community.

    I will also make my master thesis freely available for download (see my “master thesis” page for more info), so documentation regarding the code is somewhat covered by the thesis. Also, the abstract goals for the code (since the code reflects the experiment) are outlined in the thesis, along with a presentation of the results and observations made.

    Since I’m a huge fan of Python, I have also thought of experimenting with the performance of Python and my bigram index. I would love to enhance it further and maybe introduce some new improvements.. In time, I will create a project page for the “Bigram index” on my future Django/TurboGears website http://asbjorn.fellinghaug.com/
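
    To give an idea of what such a Python experiment could start from, here is a minimal sketch of a word-bigram index. It is an assumption-laden illustration, not the code from the thesis: the function names (tokenize, build_bigram_index, phrase_candidates), the naive whitespace tokenizer and the sample documents are all made up for this example.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()


def build_bigram_index(docs):
    """Map each pair of adjacent words to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        for first, second in zip(tokens, tokens[1:]):
            index[(first, second)].add(doc_id)
    return index


def phrase_candidates(index, phrase):
    """Return ids of documents that contain every bigram of the phrase."""
    tokens = tokenize(phrase)
    bigrams = list(zip(tokens, tokens[1:]))
    if not bigrams:
        return set()  # a single word has no bigram to look up
    result = set(index.get(bigrams[0], set()))
    for bigram in bigrams[1:]:
        result &= index.get(bigram, set())
    return result


# Hypothetical sample documents, only for illustration.
docs = {
    1: "apache lucene is a search library",
    2: "a library for phrase search",
    3: "lucene search library notes",
}
index = build_bigram_index(docs)
print(sorted(phrase_candidates(index, "search library")))
```

    For a two-word phrase a single lookup answers the query. For longer phrases the intersection only gives candidates, since the bigrams may occur at different places in a document, so a real implementation would still have to verify positions.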

  • 09Jun

    Woho!

    I’ve finally completed and delivered my master thesis. At exactly 23:54 yesterday (Sunday 8 June 2008), I delivered my master thesis to my faculty’s delivery system, also known as DAIM. It was a huge relief.. :)

    In a short time, I will provide the PDF here, as well as code and such.

  • 26May

    Puff. This semester must be the most demanding one, as the master thesis has to be completed, and all the necessary writing, coding, analyzing, etc. has to be presented comprehensibly in the report.

    One observation I’ve made is that concepts and ideas which I find quite simple and understandable in my own words may have the opposite effect on others. It is hard to mentally put yourself in another person’s shoes and write explanations so simple and understandable that almost everyone gets the idea at the first try. This is where figures come in handy. One tip: learn to use the Dia tool early, because you will need it. Also, gnuplot is extremely useful.. :)

    It seems like for each day that goes by, I only produce more and more post-its on my desk.

    The final delivery date for my master thesis is now 8 June 2008, as I managed to get a postponement from my faculty. The second after delivery, I’ll go straight home and pop open an ice-cold refreshing beer. Can’t wait.. :)

    The above picture is from my work desk. Notice the left-most PC beneath the desk: that’s my feet-relaxer box. Efficient resource usage, one might say. Also, the post-its; every geek has got to love post-its…

    Every working environment needs a whiteboard; in fact, every home needs a whiteboard. That shall be one of the first things I purchase when I’ve delivered my master thesis… :) This working environment is shared with 12 other master students, and a room full of 13 geeks needs some “time-killers”.. For instance, we have a racetrack, as shown below:

  • 05May

    I’ve been taken by the Twitter storm these days.. Damn, I should focus a whole lot more on my master report. Well, this took me only one little hour, so it’s not that much of a waste of time.. :)

    So, I guess you have heard about the new “facebook” called Twitter? Well, it’s this new web community thing where people can write their current status, telling the world what they are doing.. And, of course, one can follow friends and keep track of where they are and what they are doing..

    Now, after some time I found it rather cumbersome to open the Twitter web page, log in, and then post a new message each time I wanted to update my status. So, being the Python fan I am, I wrote myself a little Python script to solve this problem. It relies on the python-twitter module available at the Google Code pages. So, let’s have a look at the code. I have named this file “update.py”, but feel free to rename it.

    #!/usr/bin/env python
    # Python 2 script: posts its command-line arguments as a Twitter status.
    import twitter
    import sys

    # Fill in your Twitter account details here.
    USERNAME = ""
    PASSWORD = ""

    def postNewMessage(msg):
        # Authenticate with the credentials configured above.
        api = twitter.Api(username=USERNAME, password=PASSWORD)

        # Accept either a single string or a list of words.
        if isinstance(msg, list):
            msg = " ".join(msg)
        msg = unicode(msg, "utf-8")
        if len(msg) > 140:
            print "ERROR: Message can't be over 140 chars."
            return
        try:
            api.PostUpdate(msg)
            print "OK. Was %i chars in msg." % len(msg)
        except Exception as e:
            print "ERROR: Could not post the update: %s" % e
        finally:
            api.ClearCredentials()

    if __name__ == "__main__":
        if len(sys.argv) > 1:
            # Works for both ./update.py "hi there mate"
            # and ./update.py hello world
            postNewMessage(sys.argv[1:])
        else:
            print "Usage: %s <message>" % sys.argv[0]
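
    The argument joining and the length check do not need Twitter at all, so they can be pulled out and tried without an account. The following is only a sketch under that assumption; prepare_message and MAX_LENGTH are hypothetical names that are not part of the script above or of python-twitter.

```python
MAX_LENGTH = 140  # the same limit the script above checks against


def prepare_message(args):
    """Join command-line words into one status text and validate its length.

    Raises ValueError when there is nothing to post or the text is too long.
    """
    text = " ".join(args).strip()
    if not text:
        raise ValueError("no message given")
    if len(text) > MAX_LENGTH:
        raise ValueError(
            "message is %d chars, limit is %d" % (len(text), MAX_LENGTH)
        )
    return text


# Both calling styles end up as one plain string.
print(prepare_message(["hello", "world"]))
print(prepare_message(["hi there mate"]))
```

    Raising an exception instead of printing leaves it to the caller to decide how to report the problem.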
  • 01May

    Hi.

    In the worst writing rush of my master thesis I found it quite relaxing to enjoy some reading other than research papers and such. What I wanted to do is to create a master robot to control a set of other robots, where a robot is just a simple program running on a host. Pretty much the same thing that goes on in botnets, although this is not a botnet. My goal is to create a crawling system that crawls URIs, and downloads and parses the retrieved documents into plain text. Then, to create a forward index in an efficient data structure. When this is done, I would make my not-yet-constructed indexer index those documents and construct an inverted index.

    Now, this project is not meant to be any “enterprise” software or such, only a hobby, or shall I say an interest of mine.

    However, the core of such a crawling system is that the robots need a way to communicate with each other. So, I’ve explored some distributed approaches available in Python, and come up with some alternatives:

    • Pyro: Python Remote Objects is very much the same as the good old CORBA.
    • XML-RPC: Has the advantage of being extremely simple. However, there is some overhead in using XML to transport remote procedure calls. I would guess that the choice is quite application dependent (do you need speed or simplicity?)
    • SOAP: Doh.. SOAP is a very heavy protocol widely used in web services. It is also very simple, given the appropriate tools, but a little dated in light of the other approaches.
    • Sockets: Using pure TCP/UDP sockets (low-level) is significantly faster than the above approaches. However, it requires some more programming and hence more time. There is a tradeoff here somewhere..
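
    To give a feel for the extra programming the plain-socket route involves, here is a minimal sketch of a tiny “add two integers” service over TCP. The one-line text protocol (two integers in, one integer out) and the helper names serve_once and remote_add are assumptions made up for this illustration, and it uses modern Python syntax.

```python
import socket
import threading


def serve_once(server_sock):
    """Accept one connection, read a line 'a b', reply with their sum."""
    conn, _ = server_sock.accept()
    with conn, conn.makefile("r") as reader:
        a, b = (int(part) for part in reader.readline().split())
        conn.sendall(("%d\n" % (a + b)).encode("ascii"))


def remote_add(port, a, b):
    """Send two integers to the local server and return its answer."""
    with socket.create_connection(("127.0.0.1", port)) as sock, \
            sock.makefile("r") as reader:
        sock.sendall(("%d %d\n" % (a, b)).encode("ascii"))
        return int(reader.readline())


server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
server_sock.listen(1)
port = server_sock.getsockname()[1]

# Run the server side in a background thread so one script shows both ends.
worker = threading.Thread(target=serve_once, args=(server_sock,))
worker.start()
result = remote_add(port, 1000, 2500)
worker.join()
server_sock.close()
print(result)
```

    Even this toy version has to invent its own message format, encoding and connection handling, which is exactly the work the higher-level alternatives take care of.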

    We will here illustrate a simple “client-server” example based on the Pyro module. Our goal is to add two integers and return the result. Let’s first consider the server:

    #!/usr/bin/env python
    import Pyro.core
    import Pyro.naming

    class MathServer(Pyro.core.ObjBase):
        # Remote methods are written like ordinary methods.
        def add(self, a, b):
            return a+b

    daemon=Pyro.core.Daemon()
    # Locate the running Pyro name server and register our object with it.
    ns=Pyro.naming.NameServerLocator().getNS()
    daemon.useNameServer(ns)
    uri=daemon.connect(MathServer(),"mathserver")
    # Wait for incoming requests.
    daemon.requestLoop()

    So, basically what we do here is create a simple class (MathServer) which inherits from Pyro.core.ObjBase. Next we simply write our methods as for a standard class. Later on we “start” the server, connect an instance of MathServer to our Pyro daemon, and await incoming connections. Let’s now consider the Pyro client:

    #!/usr/bin/env python
    import Pyro.core
    # Look up "mathserver" through the Pyro name server.
    mathserver = Pyro.core.getProxyForURI("PYRONAME://mathserver")
    print mathserver.add(1000, 2500) # prints 3500
    print mathserver.add("hello ", "world") # prints hello world

    In our client code we take advantage of the simple name service which is shipped with Pyro. Basically what happens is that we instruct our client to look up the Pyro name server and then connect to the object registered as “mathserver” (recall the server code?).

    One important note: to be able to use the name server that ships with Pyro, it needs to be started. In Ubuntu Gutsy the name server is disabled by default, so changing the value “ENABLED=0” to 1 in the file /etc/default/pyro-nsd and then running “/etc/init.d/pyro-nsd start” will do the trick.
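
    For comparison, here is a sketch of the same add service over XML-RPC, one of the alternatives listed earlier. It uses the modern standard library modules xmlrpc.server and xmlrpc.client (in the Python 2 of this post’s era they were called SimpleXMLRPCServer and xmlrpclib). Running server and client in one script on a local, OS-chosen port is an assumption made only to keep the example self-contained.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add(a, b):
    """The remote procedure: works for any two values that support +."""
    return a + b


# Port 0 lets the OS pick a free port; logRequests=False keeps output quiet.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(add, "add")
port = server.server_address[1]

# Serve in a background thread so the client can run in the same script.
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()

proxy = ServerProxy("http://127.0.0.1:%d/" % port)
total = proxy.add(1000, 2500)
greeting = proxy.add("hello ", "world")

server.shutdown()
server.server_close()

print(total)
print(greeting)
```

    No name server is needed here since the client addresses the server directly by URL, but every call travels as an XML document over HTTP, which is the overhead mentioned in the list of alternatives.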

  • 25Apr
    Categories: Funny Comments: 2

    Hi everyone.

    Allow me to share a rather embarrassing episode I had yesterday that resulted in the doom of my laptop screen. So, I was on my way home from another hard-working-good-lookin day at school, and as always I was kind of stressed. The “walking” part was to the parking lot not far from the school, and I was talking to a friend at the same time as I had my brother on the phone. (You know, we boys can’t multitask.) When I was about to unlock my car from the passenger side, I had to drop my backpack on the ground in front of my car. Unlocking my car from the passenger side and then opening the driver door from the inside is the algorithm to unlock my car (quite an old car). The idea was that I was supposed to fetch my backpack when walking around the car. However, in all the mess and stress, I took the other way around the car.

    Jumping into the car thinking everything is OK, igniting the car, adjusting the stereo; need the right mood and sound before driving, you know. Then, shifting into 1st gear and flooring the throttle. I soon discovered a rather unusual sound from the front of the car. Hmm, what could it be? Hitting the brake a millisecond later, I came to realize that I was missing my backpack. WTF?! Backing up the car, I found my backpack rather flat and dusty. The first thing that went through my mind was the laptop…. Was it OK? A quick physical check told me it survived, so far so good. But, what’s that next to my laptop? Noooooooo, my coffee cup is broooken!! So, I drove home rather grumpy, but thinking it went OK since my laptop survived. I kept that thought until the next morning. Sitting in the reading hall at the university I suddenly realized that my laptop screen was big time fucked. Unable to read a single character.

    Now, you may have your bad mornings, but this has so far been my worst..

    EDIT: I’ve now added a photo of my laptop.. I haz proof!

  • 21Apr
    Categories: school Comments: 0

    Woho, the deadline for the master report (also known as “Diplomen”) is approaching fast. A little over a month left until the deadline (1 June). There will be an enormous amount of espresso and late evenings ahead, since there are always things one just has to change.

    To everyone else who is about to write, or is busy writing, their thesis, I just want to wish you good luck. At first it can seem like an insurmountably large task, but as long as you take it one step at a time, there actually is light at the end of the tunnel.

    Another thing: creativity for everything other than the master thesis itself grows by about 80%!

  • 20Apr
    Categories: Funny Comments: 2

    Okay, so this post is somewhat special since it’s not geeky in any way; it is much more an observation from my side.

    At NTNU there are several lavatories around, and there is one observation in the gentlemen’s lavatory that’s been bugging me. In most lavatories at NTNU there are two urinals per lavatory, and here’s the case. You know when you’re just letting the “water” flow and there is one “spare” place beside you and there is not much space between the urinals? Well, for some reason many people find it natural to occupy the urinal right next to the one you’re using. Those people are wrong! The available urinal next to another one in use is not meant to be used; its purpose is only psychological. This behavior is only allowed in cases where there is more than half a meter between each urinal!

    So, to summarize: if you find yourself in this situation and you’re that poor person already in action at the urinal, then you’re entitled to take verbal action. If you’re that other horrible person, then I would urge you to think twice before you use the urinal.
