• 01May
    Categories: distributed, python

    Hi.

    In the middle of the worst writing rush of my master's thesis, I found it quite relaxing to read something other than research papers. What I want to do is create a master robot that controls a set of other robots, where a robot is just a simple program running on a host. This is pretty much what goes on in botnets, although this is not a botnet. My goal is to create a crawling system that crawls URIs, downloads the retrieved documents, and parses them into plain text, and then to build a forward index in an efficient data structure. When this is done, my not-yet-constructed indexer will index those documents and construct an inverted index.
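To make the two index types concrete, here is a minimal sketch of how a forward index could be turned into an inverted index. The function names, the document ids, and the whitespace tokenizer are all assumptions made for illustration; nothing here is the indexer described in this post, which does not exist yet.

```python
from collections import defaultdict


def build_forward_index(documents):
    """Map each document id to the list of terms it contains.

    Assumption: documents is a dict of {doc_id: plain_text}, and terms
    are lowercased, whitespace-separated words.
    """
    return {doc_id: text.lower().split() for doc_id, text in documents.items()}


def invert_index(forward_index):
    """Map each term to the sorted list of document ids that contain it."""
    inverted = defaultdict(set)
    for doc_id, terms in forward_index.items():
        for term in terms:
            inverted[term].add(doc_id)
    return {term: sorted(doc_ids) for term, doc_ids in inverted.items()}


if __name__ == "__main__":
    # Hypothetical documents, standing in for parsed crawl results.
    docs = {"doc1": "The quick fox", "doc2": "the lazy dog"}
    forward = build_forward_index(docs)
    print(forward)
    print(invert_index(forward))
```

The forward index answers "which terms are in this document?", while the inverted index answers "which documents contain this term?", which is what a search needs.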

    Now, this project is not meant to be “enterprise” software or anything like that; it is only a hobby, or shall I say an interest of mine.

    However, the core of such a crawling system is that the robots need a way to communicate with each other. So I’ve explored some approaches to distributed programming in Python and come up with some alternatives:

    • Pyro: Python Remote Objects is very much the same as the good old CORBA.
    • XML-RPC: Has the advantage of being extremely simple. However, there is some overhead in using XML to transport remote procedure calls. I would guess that the choice is quite application dependent (do you need speed or simplicity?).
    • SOAP: D'oh. SOAP is a very heavy protocol widely used in web services. It is also very simple to use, given the appropriate tools, but it feels a little dated in light of the other approaches.
    • Sockets: Using plain TCP/UDP sockets (low-level) is significantly faster than the approaches above. However, it requires more programming and hence more time. There is a trade-off here somewhere.
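To give a feel for the XML-RPC option, here is a minimal sketch that uses only the standard library. It is written for Python 3 (in Python 2 the same modules are called SimpleXMLRPCServer and xmlrpclib). The add function and the helper names start_server and remote_add are made up for this illustration, and the server runs in a background thread only so that the whole thing fits in one file; in a real setup the server and the client would be separate programs on separate hosts.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add(a, b):
    """The procedure exposed to remote callers."""
    return a + b


def start_server(host="127.0.0.1", port=0):
    """Start an XML-RPC server in a background thread and return it.

    Port 0 asks the operating system for any free port.
    """
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(add, "add")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def remote_add(server, a, b):
    """Call add() over HTTP, the way a separate client program would."""
    host, port = server.server_address
    proxy = ServerProxy("http://%s:%d" % (host, port))
    return proxy.add(a, b)


if __name__ == "__main__":
    server = start_server()
    print(remote_add(server, 1000, 2500))
    print(remote_add(server, "hello ", "world"))
    server.shutdown()
    server.server_close()
```

Every call here is wrapped in an XML document and sent over HTTP, which is where the overhead mentioned above comes from.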

    Here we will illustrate a simple “client-server” example based on the Pyro module. Our goal is to add two integers and return the result. Let's first consider the server:

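    #!/usr/bin/env python
    import Pyro.core
    import Pyro.naming

    class MathServer(Pyro.core.ObjBase):
        """Remote object whose methods can be called by Pyro clients."""
        def add(self, a, b):
            return a + b

    # Create the daemon that listens for incoming calls.
    daemon = Pyro.core.Daemon()
    # Locate the running Pyro name server and attach the daemon to it.
    ns = Pyro.naming.NameServerLocator().getNS()
    daemon.useNameServer(ns)
    # Register a MathServer instance under the name "mathserver".
    uri = daemon.connect(MathServer(), "mathserver")
    # Serve requests until the process is stopped.
    daemon.requestLoop()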

    So, basically, we create a simple class (MathServer) that inherits from Pyro.core.ObjBase and define its methods just as we would for an ordinary class. We then start the Pyro daemon, connect an instance of MathServer to it under the name "mathserver", and wait for incoming connections. Let's now consider the Pyro client:

    #!/usr/bin/env python
    import Pyro.core
    mathserver = Pyro.core.getProxyForURI("PYRONAME://mathserver")
    print mathserver.add(1000, 2500) # will return 3500.
    print mathserver.add("hello ", "world") # will return "hello world"

    In our client code we take advantage of the simple name service that ships with Pyro. Basically, we instruct our client to look up the Pyro name server and then connect to the object registered there under the name “mathserver” (recall the server code).

    One important note: to use the name server that ships with Pyro, it must be running. In Ubuntu Gutsy the name server is disabled by default, so change “ENABLED=0” to “ENABLED=1” in the file /etc/default/pyro-nsd and then run “/etc/init.d/pyro-nsd start” to do the trick.
