30Aug
Categories: java, lucene, school
Hi everyone.
I have now rewritten some parts of the source code of my master thesis, in addition to writing some javadoc to make it more comprehensible. So I will now publish the whole code, a lot later than initially planned though. However, I’m not totally satisfied with the final code, since it may give the impression that it is “plug-and-play” code, which it is not. Also, I recommend reading my master thesis, as many of the concepts in the source code are defined in much more detail there.
I would also like to emphasize that the most important part of the source code is DocumentAnalyzer.java#PhraseFilter3, which is responsible for manipulating the Lucene index to improve its phrase searching capabilities, as discussed in my master thesis.
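To give an idea of what a bigram index does for phrase searching, here is a minimal, self-contained Java sketch. It is not PhraseFilter3 and does not use the Lucene API; the class `BigramIndexSketch` and its methods are hypothetical names chosen for illustration. The assumption is simply that each pair of adjacent words is indexed as an extra term, so a two-word phrase query becomes a single term lookup instead of a positional match.

```java
import java.util.*;

/** Hypothetical sketch of a word-bigram inverted index (not the thesis code). */
class BigramIndexSketch {
    // Maps a term (a single word or a "word1 word2" bigram) to the IDs of documents containing it.
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    /** Splits text into lowercase word tokens on runs of non-letter, non-digit characters. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Returns adjacent word pairs, e.g. [a, b, c] -> ["a b", "b c"]. */
    static List<String> bigrams(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            out.add(tokens.get(i) + " " + tokens.get(i + 1));
        }
        return out;
    }

    /** Indexes both the single words and the bigrams of a document. */
    void addDocument(int docId, String text) {
        List<String> tokens = tokenize(text);
        List<String> terms = new ArrayList<>(tokens);
        terms.addAll(bigrams(tokens));
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    /**
     * Answers a phrase query by intersecting the postings of its bigrams.
     * A two-word phrase is a single lookup; longer phrases may yield false
     * positives because the order of the bigrams is not verified.
     */
    SortedSet<Integer> phraseQuery(String phrase) {
        List<String> tokens = tokenize(phrase);
        List<String> terms = tokens.size() < 2 ? tokens : bigrams(tokens);
        SortedSet<Integer> result = null;
        for (String term : terms) {
            SortedSet<Integer> docs = postings.getOrDefault(term, Collections.emptySortedSet());
            if (result == null) result = new TreeSet<>(docs);
            else result.retainAll(docs);
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        BigramIndexSketch index = new BigramIndexSketch();
        index.addDocument(1, "Apache Lucene is a search library");
        index.addDocument(2, "A library for search, by Apache");
        System.out.println(index.phraseQuery("search library")); // [1]
        System.out.println(index.phraseQuery("library search")); // []
    }
}
```

The trade-off this illustrates is extra index size (one additional term per word position) in exchange for cheaper phrase lookups. Within Lucene itself, word n-grams of this kind can be emitted as tokens by `ShingleFilter`.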

The code is available as both a tar.gz and a zip archive:
27Jun
Since my master thesis has now been delivered, I will dedicate some time to cleaning up the code and documenting it thoroughly. When I’m finished and the code is clean, I will make it freely available to the Apache Lucene community.
I will also make my master thesis freely available for download (see my “master thesis” page for more info), so the code is to some extent documented by the thesis. The thesis also outlines the abstract goals of the code (since the code reflects the experiment), and presents the results and observations made.
Since I’m a huge fan of Python, I have also thought of experimenting with the performance of Python and my bigram index. I would love to enhance it further and maybe introduce some new improvements. In time, I will create a project page for the “Bigram index” on my future Django/TurboGears website http://asbjorn.fellinghaug.com/
19Apr
Have you ever heard of the Apache Lucene open-source search library? Well, now you have. It’s basically a large library that provides the core technology needed to build a high-performance search engine. Lucene is focused on text indexing and searching.
For my master thesis, which I’m currently working on, I’ve created a prototype whose main goal is to build different kinds of indexes and perform a huge number of searches on them. I then collect figures such as the time it takes to construct each index, the disk space it needs, the time it takes to run a large batch of queries against it, and more. Using these numbers from my experiments, I will analyze and discuss the results in light of phrase searching, since the main goal of my thesis is to find out how to enhance phrase searching in text indexes.
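As a rough illustration of how such measurements can be taken, here is a minimal Java timing harness. It is a hypothetical sketch, not the prototype from the thesis: the class `ExperimentTimer`, the toy documents and the simple word-to-document map are placeholders standing in for a real index.

```java
import java.util.*;

/** Hypothetical timing harness; the "index" and queries are toy placeholders. */
class ExperimentTimer {
    /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
    static double timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Placeholder corpus and index: a word -> document-ID map.
        List<String> docs = Arrays.asList("apache lucene search", "phrase search in text indexes");
        Map<String, Set<Integer>> index = new HashMap<>();

        // Phase 1: time the index construction.
        double buildMs = timeMillis(() -> {
            for (int id = 0; id < docs.size(); id++) {
                for (String word : docs.get(id).split(" ")) {
                    index.computeIfAbsent(word, k -> new TreeSet<>()).add(id);
                }
            }
        });

        // Phase 2: time a batch of queries against the finished index.
        List<String> queries = Arrays.asList("search", "lucene", "missing");
        int[] hits = {0}; // array so the lambda can update the counter
        double queryMs = timeMillis(() -> {
            for (String q : queries) {
                hits[0] += index.getOrDefault(q, Collections.emptySet()).size();
            }
        });

        System.out.printf("build: %.3f ms, %d queries: %.3f ms, hits: %d%n",
                buildMs, queries.size(), queryMs, hits[0]);
    }
}
```

Single runs on the JVM are noisy because of class loading and JIT compilation, so a real experiment would typically warm up first and repeat each measurement several times before comparing indexes.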
So, if you would like to learn more about search technology, I recommend having a look at Apache Lucene.