<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Fellinghaug Blog &#187; lucene</title>
	<atom:link href="http://asbjorn.fellinghaug.com/blog/tag/lucene/feed/" rel="self" type="application/rss+xml" />
	<link>http://asbjorn.fellinghaug.com/blog</link>
	<description>&#62;&#62;&#62; from fellinghaug import asbjorn; asbjorn.play()</description>
	<lastBuildDate>Thu, 19 Nov 2009 21:22:01 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Concerning my master thesis</title>
		<link>http://asbjorn.fellinghaug.com/blog/2009/01/concerning-my-master-thesis/</link>
		<comments>http://asbjorn.fellinghaug.com/blog/2009/01/concerning-my-master-thesis/#comments</comments>
		<pubDate>Fri, 30 Jan 2009 14:36:37 +0000</pubDate>
		<dc:creator>Asbjørn Alexander Fellinghaug</dc:creator>
				<category><![CDATA[development]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[bigram]]></category>
		<category><![CDATA[bigram index]]></category>
		<category><![CDATA[fellinghaug]]></category>
		<category><![CDATA[index]]></category>
		<category><![CDATA[master]]></category>
		<category><![CDATA[master thesis]]></category>
		<category><![CDATA[search engine]]></category>

		<guid isPermaLink="false">http://asbjorn.fellinghaug.com/blog/?p=181</guid>
		<description><![CDATA[Hi everyone.
I just want to let you know that I&#8217;ve taken some further steps to describe and publish my master thesis. I have written a page (http://asbjorn.fellinghaug.com/blog/master-thesis/) whose goal is to summarize and further describe the overall goals and design of my master thesis.
I will also &#8211; in time &#8211; further work on the bigram index, as [...]]]></description>
			<content:encoded><![CDATA[<p>Hi everyone.</p>
<p>I just want to let you know that I&#8217;ve taken some further steps to describe and publish my master thesis. I have written a page (<a title="Master thesis" href="http://asbjorn.fellinghaug.com/blog/master-thesis/">http://asbjorn.fellinghaug.com/blog/master-thesis/</a>) whose goal is to summarize and further describe the overall goals and design of my master thesis.</p>
<p>I will also &#8211; in time &#8211; work further on the <strong>bigram index</strong>, as I want to see its full potential on a more <em>real-life</em> collection. To begin with, I will use the dumps provided by the wonderful Wikimedia Foundation. These dumps are several gigabytes of pure text (and some metadata). I realize that the content of a Wikipedia article may not fully reflect typical websites on the internet, but it is a start. The next step I&#8217;ve set myself is to find a sufficiently large website, index all the data on it, and then check how the bigram index performs on it.</p>
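<p>To make the idea concrete, here is a minimal Python sketch. It assumes that a bigram index maps each pair of adjacent words to the documents containing that pair; the function names, the naive whitespace tokenizer and the sample documents are made up for this illustration and are not taken from the thesis code.</p>

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_bigram_index(documents):
    """Map each pair of adjacent words to the set of document ids containing it.

    `documents` is a dict of {doc_id: text}. This is an illustrative
    structure only, not the index layout used in the thesis.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        tokens = tokenize(text)
        # zip(tokens, tokens[1:]) yields every adjacent word pair.
        for pair in zip(tokens, tokens[1:]):
            index[pair].add(doc_id)
    return index


def two_word_phrase_search(index, phrase):
    """Return the sorted ids of documents containing the two-word phrase."""
    first, second = tokenize(phrase)
    return sorted(index.get((first, second), set()))


# Hypothetical sample collection.
docs = {
    1: "Lucene is a search library",
    2: "A search engine needs an index",
    3: "The library is open source",
}
bigram_index = build_bigram_index(docs)
print(two_word_phrase_search(bigram_index, "search library"))
print(two_word_phrase_search(bigram_index, "search engine"))
```

<p>With such a structure a two-word phrase is answered by a single lookup instead of intersecting the position lists of two separate words, at the cost of a larger index.</p>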
<p>I will most likely continue development in the Java programming language, as it is the language Apache Lucene is written in. However, I&#8217;m also quite interested in writing a Python analyzer for the PyLucene package (Python bindings for Lucene).</p>
]]></content:encoded>
			<wfw:commentRss>http://asbjorn.fellinghaug.com/blog/2009/01/concerning-my-master-thesis/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The code for my master thesis</title>
		<link>http://asbjorn.fellinghaug.com/blog/2008/08/the-code-for-my-master-thesis/</link>
		<comments>http://asbjorn.fellinghaug.com/blog/2008/08/the-code-for-my-master-thesis/#comments</comments>
		<pubDate>Sat, 30 Aug 2008 10:06:04 +0000</pubDate>
		<dc:creator>Asbjørn Alexander Fellinghaug</dc:creator>
				<category><![CDATA[java]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[school]]></category>
		<category><![CDATA[bigram]]></category>
		<category><![CDATA[master]]></category>
		<category><![CDATA[phrase]]></category>
		<category><![CDATA[search engine]]></category>
		<category><![CDATA[thesis]]></category>

		<guid isPermaLink="false">http://asbjorn.fellinghaug.com/blog/?p=52</guid>
		<description><![CDATA[Hi everyone.
I have now rewritten some parts of the source code of my master thesis and written some Javadoc to make it more comprehensible. So, I will now publish the whole code &#8211; a lot later than initially planned, though. However, I&#8217;m not totally satisfied with the final code, since it may give [...]]]></description>
			<content:encoded><![CDATA[<p>Hi everyone.</p>
<p>I have now rewritten some parts of the source code of my master thesis and written some Javadoc to make it more comprehensible. So, I will now publish the whole code &#8211; a lot later than initially planned, though. However, I&#8217;m not totally satisfied with the final code, since it may give the impression that it is &#8220;run-and-play&#8221; code, which it is not. Also, I would recommend reading my <a title="Master thesis" href="http://asbjorn.fellinghaug.com/blog/master-thesis/">master thesis</a>, as many of the concepts in the source code are defined in much more detail there.</p>
<p>I would also like to emphasize that the most important part of the source code is <strong>DocumentAnalyzer.java</strong>#PhraseFilter3, which is responsible for manipulating the Lucene index to improve its phrase searching capabilities, as discussed in my master thesis.</p>
<p><a href="http://asbjorn.fellinghaug.com/blog/wp-content/uploads/2008/08/lucene_green_300.gif"><img class="alignnone size-medium wp-image-53" title="lucene_green_300" src="http://asbjorn.fellinghaug.com/blog/wp-content/uploads/2008/08/lucene_green_300.gif" alt="" width="300" height="46" /></a></p>
<p>The code is available as both a tar.gz and a zip archive:</p>
<ul>
<li><a title="Baldr code" href="http://asbjorn.fellinghaug.com/filer/master/Baldr_code.tar.gz">Baldr_code.tar.gz</a></li>
<li><a title="Baldr code" href="http://asbjorn.fellinghaug.com/filer/master/Baldr_code.zip">Baldr_code.zip</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://asbjorn.fellinghaug.com/blog/2008/08/the-code-for-my-master-thesis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>My master thesis</title>
		<link>http://asbjorn.fellinghaug.com/blog/2008/06/my-master-thesis/</link>
		<comments>http://asbjorn.fellinghaug.com/blog/2008/06/my-master-thesis/#comments</comments>
		<pubDate>Fri, 27 Jun 2008 09:12:43 +0000</pubDate>
		<dc:creator>Asbjørn Alexander Fellinghaug</dc:creator>
				<category><![CDATA[lucene]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[school]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[master]]></category>
		<category><![CDATA[thesis]]></category>

		<guid isPermaLink="false">http://asbjorn.fellinghaug.com/blog/?p=36</guid>
		<description><![CDATA[Since my master thesis is now delivered, I will dedicate some time to clean up the code and thoroughly document it. When I&#8217;m finished and the code is clean, I will make it freely available to the Apache Lucene community.
I will also make my master thesis freely available for download (see my &#8220;master thesis&#8221; page [...]]]></description>
			<content:encoded><![CDATA[<p>Since my master thesis is now delivered, I will dedicate some time to clean up the code and thoroughly document it. When I&#8217;m finished and the code is clean, I will make it freely available to the Apache Lucene community.</p>
<p>I will also make my master thesis freely available for download (see my <a href="http://asbjorn.fellinghaug.com/blog/master-thesis/">&#8220;master thesis&#8221;</a> page for more info), so documentation regarding the code is somewhat covered by the thesis. Also, the abstract goals for the code (since the code reflects the experiment) are outlined in the thesis, along with a presentation of the results and observations made.</p>
<p>Since I&#8217;m a huge fan of Python, I have also thought of experimenting with the performance of Python and my bigram index. I would love to enhance it further and maybe introduce some new improvements. In time, I will create a project page for the &#8220;Bigram index&#8221; under my future Django/TurboGears website <a title="Asbjørn Alexander Fellinghaug" href="http://asbjorn.fellinghaug.com">http://asbjorn.fellinghaug.com/</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://asbjorn.fellinghaug.com/blog/2008/06/my-master-thesis/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Apache Lucene &#8211; search engine</title>
		<link>http://asbjorn.fellinghaug.com/blog/2008/04/apache-lucene-search-engine/</link>
		<comments>http://asbjorn.fellinghaug.com/blog/2008/04/apache-lucene-search-engine/#comments</comments>
		<pubDate>Sat, 19 Apr 2008 19:54:11 +0000</pubDate>
		<dc:creator>Asbjørn Alexander Fellinghaug</dc:creator>
				<category><![CDATA[lucene]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[index]]></category>
		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://asbjorn.fellinghaug.com/wp/?p=10</guid>
		<description><![CDATA[Have you ever heard of the Apache Lucene open-source search library? Well, now you have. It&#8217;s basiclly a big library which have all the necessary technology for high-performance search engine. Lucene is focused on text indexing and searching.
In my master thesis which I&#8217;m currently working on, I&#8217;ve created a prototype software which&#8217;s main goal is [...]]]></description>
			<content:encoded><![CDATA[<p>Have you ever heard of the <a href="http://lucene.apache.org/">Apache Lucene</a> open-source search library? Well, now you have. It&#8217;s basically a big library that has all the technology needed for a high-performance search engine. Lucene is focused on text indexing and searching.</p>
<p>In my master thesis, which I&#8217;m currently working on, I&#8217;ve created a software prototype whose main goal is to build different kinds of indexes and perform a huge number of searches on them. I then collect numbers such as the time it takes to construct the indexes, the disk space needed, the time to perform a large batch of queries on each index, and more. With this information I will analyze and discuss the results in light of phrase searching, which is the main topic of my master thesis. The thesis is concerned with how to enhance phrase searching in text indexes, and this is what I will discuss using the numbers from my experiments.</p>
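<p>The kind of measurement described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the toy in-memory word index, the pickle file used to approximate disk usage, and all function and variable names are hypothetical and have nothing to do with the actual prototype or with the Lucene API.</p>

```python
import os
import pickle
import tempfile
import time
from collections import defaultdict


def build_word_index(documents):
    """Build a toy inverted index mapping each word to a set of document ids."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return dict(index)


def measure(build, documents, queries):
    """Collect build time, on-disk size and total query time for one index type."""
    start = time.perf_counter()
    index = build(documents)
    build_seconds = time.perf_counter() - start

    # Serialize the index to a temporary file to approximate its disk footprint.
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        pickle.dump(index, handle)
        path = handle.name
    size_bytes = os.path.getsize(path)
    os.remove(path)

    # Run the whole batch of single-word queries and count the matches.
    start = time.perf_counter()
    hits = sum(len(index.get(query, ())) for query in queries)
    query_seconds = time.perf_counter() - start

    return {
        "build_seconds": build_seconds,
        "size_bytes": size_bytes,
        "query_seconds": query_seconds,
        "hits": hits,
    }


# Hypothetical sample collection and query batch.
docs = {
    1: "lucene is a search library",
    2: "a search engine needs an index",
}
queries = ["search", "index", "missing"] * 1000
results = measure(build_word_index, docs, queries)
print(results)
```

<p>Passing a different <code>build</code> function for each kind of index to the same <code>measure</code> function keeps the numbers comparable across index types.</p>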
<p>So, if you would like to learn more about search technology, I recommend having a look at Apache Lucene.</p>
]]></content:encoded>
			<wfw:commentRss>http://asbjorn.fellinghaug.com/blog/2008/04/apache-lucene-search-engine/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
