Apache Lucene - search engine

Have you ever heard of the Apache Lucene open-source search library? Well, now you have. It’s basically a big library that has all the technology needed for a high-performance search engine. Lucene is focused on text indexing and searching.

For my master’s thesis, which I’m currently working on, I’ve created prototype software whose main goal is to build different kinds of indexes and run a huge number of searches against them. I then collect measurements such as the time it takes to construct each index, the disk space it needs, and the time it takes to run a large batch of queries against it. My thesis is concerned with how to enhance phrase searching in text indexes, so I will analyze and discuss these results in light of phrase searching.
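To give a rough idea of what such a measurement loop looks like, here is a minimal sketch in plain Java. It is not the thesis prototype and it does not use Lucene: the class name, the method names and the toy in-memory index are all made up for illustration, and disk space is not measured at all.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: times index construction and a batch of queries
// on a toy in-memory inverted index (a stand-in for a real Lucene index).
final class IndexBenchmarkSketch {

    /** Builds a toy inverted index: term -> ids of the documents containing it. */
    static Map<String, List<Integer>> buildIndex(List<String> docs) {
        Map<String, List<Integer>> index = new HashMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String term : docs.get(docId).toLowerCase().split("\\s+")) {
                if (term.isEmpty()) {
                    continue;
                }
                List<Integer> postings = index.computeIfAbsent(term, t -> new ArrayList<>());
                // Record each document at most once per term.
                if (postings.isEmpty() || postings.get(postings.size() - 1) != docId) {
                    postings.add(docId);
                }
            }
        }
        return index;
    }

    /** Runs every single-term query once and returns the total number of matching documents. */
    static int runQueries(Map<String, List<Integer>> index, List<String> queries) {
        int hits = 0;
        for (String query : queries) {
            hits += index.getOrDefault(query.toLowerCase(), List.of()).size();
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("to be or not to be", "the quick brown fox", "to the lighthouse");
        List<String> queries = List.of("to", "the", "fox", "missing");

        long buildStart = System.nanoTime();
        Map<String, List<Integer>> index = buildIndex(docs);
        long buildNanos = System.nanoTime() - buildStart;

        long queryStart = System.nanoTime();
        int hits = runQueries(index, queries);
        long queryNanos = System.nanoTime() - queryStart;

        System.out.printf("build: %d ns, %d queries: %d ns, total hits: %d%n",
                buildNanos, queries.size(), queryNanos, hits);
    }
}
```

For real numbers you would want a much larger collection, several warm-up runs, and averages over many repetitions, since single timings on the JVM are noisy.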

So, if you would like to learn more about search technology, I recommend having a look at Apache Lucene.


4 Responses to “Apache Lucene - search engine”

  1. Antti K. Says:

    Hi,

    A great subject for a master’s thesis. Have you thought about sharing the code created for testing Lucene as an open-source project?

    Best regards,
    Antti

  2. Asbjørn Alexander Fellinghaug Says:

    Hi Antti.

    Thanks for the reply. After the delivery date (June), I will try to share the whole experiment, both the report and the code. One major component of the experiment is that I’ve manipulated the index (to some extent). I’ve achieved this by creating an Analyzer that constructs new Tokens from pairs of words. These word pairs are built around high-frequency terms (stopwords).

    I will write a new blog post when my master’s thesis is complete. :)
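The word-pair idea from the comment above can be sketched roughly as follows. This is plain Java, not a Lucene Analyzer, and it is not the thesis code: the class name, the `_` separator and the exact pairing rule (a stopword is joined with the word that follows it, and the single words are kept as well) are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: emits an extra "stopword_nextword" token
// whenever a stopword is followed by another word.
final class WordPairTokens {

    static List<String> tokenize(String text, Set<String> stopwords) {
        String[] words = text.toLowerCase().trim().split("\\s+");
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            if (words[i].isEmpty()) {
                continue; // happens only for empty input
            }
            tokens.add(words[i]); // always keep the single word
            if (stopwords.contains(words[i]) && i + 1 < words.length) {
                // Assumed pairing rule: stopword + following word.
                tokens.add(words[i] + "_" + words[i + 1]);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("The quick fox", Set.of("the")));
    }
}
```

The point of such pair tokens is that a phrase containing a very frequent word can be looked up through a much rarer combined term instead of the long postings list of the stopword itself.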

  3. Murf Says:

    Hi,
    I am also interested in experimenting with Lucene, and I plan to do my master’s thesis on something related to it. Brother, can you share your ideas and code with me? It would be very beneficial to me and to others reading this blog. Good luck! Kindly reply.

  4. Asbjørn Alexander Fellinghaug Says:

    Hi.

    Sorry for the slow response here, I’ve been kind of busy with my new job. I will share my code soon; I just need to rewrite some parts and add more comments to make it more readable.

    Hopefully, I will be able to publish the code this week or the next. :)
