Apache Lucene - search engine
Have you ever heard of the Apache Lucene open-source search library? Well, now you have. It’s basically a big library which has all the necessary technology for a high-performance search engine. Lucene is focused on text indexing and searching.
In my master’s thesis, which I’m currently working on, I’ve created a software prototype whose main goal is to build different kinds of indexes and perform a huge number of searches on them. I then collect numbers such as the time it takes to construct each index, the disk space it needs, the time to run a large batch of queries against it, and more. My thesis is concerned with how to enhance phrase searching in text indexes, so I will analyze and discuss these numbers in light of phrase searching.
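The kind of measurement described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the thesis prototype and not Lucene's API: `build_index` and `measure` are hypothetical helpers, the toy inverted index stands in for a real index, and the pickled size is only an assumed stand-in for disk space.

```python
import pickle
import time
from collections import defaultdict


def build_index(docs):
    """Map each lowercased term to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def measure(docs, queries):
    """Collect build time, serialized size, and total query time for one index."""
    start = time.perf_counter()
    index = build_index(docs)
    build_seconds = time.perf_counter() - start

    # Assumed stand-in for disk space: size of the pickled index in bytes.
    size_bytes = len(pickle.dumps(index))

    start = time.perf_counter()
    hits = [index.get(q.lower(), []) for q in queries]
    query_seconds = time.perf_counter() - start

    return {
        "build_seconds": build_seconds,
        "size_bytes": size_bytes,
        "query_seconds": query_seconds,
        "hits": hits,
    }


# Toy data, invented for this example.
docs = ["the quick brown fox", "the lazy dog", "a quick search engine"]
stats = measure(docs, ["quick", "dog", "lucene"])
```

Running the same `measure` call once per index variant would give comparable numbers for each one; with real data the timings should be averaged over many runs.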
So, if you would like to learn more about search technology, I recommend having a look at Apache Lucene.
May 12th, 2008 at 3:35 pm
Hi,
A great subject for a master’s thesis. Have you thought about sharing the code created for testing Lucene as an open source project?
Best regards,
Antti
May 12th, 2008 at 3:44 pm
Hi Antti.
Thanks for the reply. After the delivery date (June), I will try to share the whole experiment, both the report and the code. One major component of the experiment is that I’ve manipulated the index (to some extent). I’ve achieved this by creating an Analyzer that constructs new Tokens based on pairs of words. These word pairs are constructed based on high-frequency terms (stopwords).
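The word-pair idea can be illustrated with a small sketch. It is plain Python rather than a Lucene Analyzer, and it is not the thesis code: the stopword list, the `pair_tokens` name, the underscore separator, and the rule of joining each stopword with the word that follows it are all assumptions made for this example.

```python
STOPWORDS = {"the", "of", "a", "to", "in"}  # assumed list, for illustration only


def pair_tokens(text, stopwords=STOPWORDS):
    """Emit every word as a token, plus a combined token for each stopword
    that is followed by another word."""
    words = text.lower().split()
    tokens = []
    for i, word in enumerate(words):
        tokens.append(word)
        # Assumed pairing rule: stopword + next word becomes one extra token.
        if word in stopwords and i + 1 < len(words):
            tokens.append(word + "_" + words[i + 1])
    return tokens


tokens = pair_tokens("The history of the search engine")
```

With pair tokens such as `of_the` in the index, a phrase query containing frequent words could look up the pair directly instead of intersecting two very long posting lists, at the cost of a larger index.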
I will publish a new blog post when my master’s thesis is complete.
August 18th, 2008 at 11:02 pm
Hi,
I am also interested in experimenting with Lucene, and I plan to do my master’s thesis on something related to it. Brother, can you share your ideas and code with me? It would be very beneficial to me and to others reading this blog. Good luck! Kindly reply.
August 19th, 2008 at 8:19 am
Hi.
Sorry for the slow response here, I’ve been kind of busy with my new job. I will share my code soon; I just need to rewrite some parts and add more comments to make it more readable.
Hopefully, I will be able to publish the code this week or the next. :)