<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Fellinghaug Blog &#187; bigram</title>
	<atom:link href="http://asbjorn.fellinghaug.com/blog/tag/bigram/feed/" rel="self" type="application/rss+xml" />
	<link>http://asbjorn.fellinghaug.com/blog</link>
	<description>&#62;&#62;&#62; from fellinghaug import asbjorn; asbjorn.play()</description>
	<lastBuildDate>Thu, 19 Nov 2009 21:22:01 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Concerning my master thesis</title>
		<link>http://asbjorn.fellinghaug.com/blog/2009/01/concerning-my-master-thesis/</link>
		<comments>http://asbjorn.fellinghaug.com/blog/2009/01/concerning-my-master-thesis/#comments</comments>
		<pubDate>Fri, 30 Jan 2009 14:36:37 +0000</pubDate>
		<dc:creator>Asbjørn Alexander Fellinghaug</dc:creator>
				<category><![CDATA[development]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[bigram]]></category>
		<category><![CDATA[bigram index]]></category>
		<category><![CDATA[fellinghaug]]></category>
		<category><![CDATA[index]]></category>
		<category><![CDATA[master]]></category>
		<category><![CDATA[master thesis]]></category>
		<category><![CDATA[search engine]]></category>

		<guid isPermaLink="false">http://asbjorn.fellinghaug.com/blog/?p=181</guid>
		<description><![CDATA[Hi everyone.
I just want to let you know that I&#8217;ve taken some further steps to describe and publish my master thesis. I have written a page (http://asbjorn.fellinghaug.com/blog/master-thesis/) whose goal is to summarize and further describe the overall goals and design of my master thesis.
I will also &#8211; in time &#8211; continue working on the bigram index, as [...]]]></description>
			<content:encoded><![CDATA[<p>Hi everyone.</p>
<p>I just want to let you know that I&#8217;ve taken some further steps to describe and publish my master thesis. I have written a page (<a title="Master thesis" href="http://asbjorn.fellinghaug.com/blog/master-thesis/">http://asbjorn.fellinghaug.com/blog/master-thesis/</a>) whose goal is to summarize and further describe the overall goals and design of my master thesis.</p>
<p>I will also &#8211; in time &#8211; continue working on the <strong>bigram index</strong>, as I want to see its full potential on a more <em>real-life</em> collection. To begin with, I will use the database dumps provided by the wonderful Wikimedia Foundation. These dumps contain several gigabytes of plain text (and some metadata). I realize that the content of Wikipedia articles may not fully reflect typical websites on the internet, but it is a start. The next step I&#8217;ve set for myself is to find a sufficiently large website, index all of its data, and then check how the bigram index performs on it.</p>
<p>I will most likely continue development in Java, as it is the language Apache Lucene is written in. However, I&#8217;m also quite interested in writing a Python analyzer for the PyLucene package (a Python extension for accessing Java Lucene).</p>
]]></content:encoded>
			<wfw:commentRss>http://asbjorn.fellinghaug.com/blog/2009/01/concerning-my-master-thesis/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The code for my master thesis</title>
		<link>http://asbjorn.fellinghaug.com/blog/2008/08/the-code-for-my-master-thesis/</link>
		<comments>http://asbjorn.fellinghaug.com/blog/2008/08/the-code-for-my-master-thesis/#comments</comments>
		<pubDate>Sat, 30 Aug 2008 10:06:04 +0000</pubDate>
		<dc:creator>Asbjørn Alexander Fellinghaug</dc:creator>
				<category><![CDATA[java]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[school]]></category>
		<category><![CDATA[bigram]]></category>
		<category><![CDATA[master]]></category>
		<category><![CDATA[phrase]]></category>
		<category><![CDATA[search engine]]></category>
		<category><![CDATA[thesis]]></category>

		<guid isPermaLink="false">http://asbjorn.fellinghaug.com/blog/?p=52</guid>
		<description><![CDATA[Hi everyone.
I have now rewritten parts of the source code of my master thesis and added some Javadoc to make it more comprehensible, so I will now publish the whole code &#8211; a lot later than initially planned, though. However, I&#8217;m not totally satisfied with the final code, since it may give [...]]]></description>
			<content:encoded><![CDATA[<p>Hi everyone.</p>
<p>I have now rewritten parts of the source code of my master thesis and added some Javadoc to make it more comprehensible, so I will now publish the whole code &#8211; a lot later than initially planned, though. However, I&#8217;m not totally satisfied with the final code, since it may give the impression that it is &#8220;run-and-play&#8221; code, which it is not. I would also recommend reading my <a title="Master thesis" href="http://asbjorn.fellinghaug.com/blog/master-thesis/">master thesis</a>, as many of the concepts in the source code are defined in much greater detail there.</p>
<p>I would also like to emphasize that the most important part of the source code is <strong>DocumentAnalyzer.java</strong>#PhraseFilter3, which is responsible for modifying the Lucene index to support phrase searching, as discussed in my master thesis.</p>
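<p>For readers unfamiliar with the idea: one common way to make phrase queries cheaper, and the general idea behind a bigram index, is to index each pair of adjacent words as a single term, so that a two-word phrase can be answered with one term lookup. The snippet below is a toy, in-memory sketch of that idea in plain Python. It does not use the Lucene or PyLucene API, the function names (<code>word_bigrams</code>, <code>build_bigram_index</code>, <code>phrase_candidates</code>) are hypothetical, and it is not taken from PhraseFilter3, which may work quite differently.</p>

```python
from collections import defaultdict


def word_bigrams(text):
    # Hypothetical helper: lower-case, split on whitespace,
    # and join each pair of adjacent words with "_".
    words = text.lower().split()
    return [words[i] + "_" + words[i + 1] for i in range(len(words) - 1)]


def build_bigram_index(docs):
    # Toy inverted index: maps each bigram term to the set of
    # ids of the documents that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for bigram in word_bigrams(text):
            index[bigram].add(doc_id)
    return index


def phrase_candidates(index, phrase):
    # Intersect the postings of every bigram in the phrase.
    # Exact for two-word phrases; for longer phrases this is only a
    # candidate set, since the toy index stores no word positions.
    # Single-word phrases have no bigrams and return an empty set here.
    bigrams = word_bigrams(phrase)
    if not bigrams:
        return set()
    result = set(index.get(bigrams[0], set()))
    for bigram in bigrams[1:]:
        result = result.intersection(index.get(bigram, set()))
    return result


docs = {
    1: "the new york times",
    2: "new ideas from york",
    3: "a new york minute",
}
index = build_bigram_index(docs)
print(sorted(phrase_candidates(index, "New York")))  # [1, 3]
print(sorted(phrase_candidates(index, "new york times")))  # [1]
```

<p>Document 2 contains both &#8220;new&#8221; and &#8220;york&#8221;, but not as adjacent words, so the bigram lookup skips it without any positional check. For phrases longer than two words, a real index would still need positions to confirm that the bigrams occur consecutively, and single-word queries would go to an ordinary word index, which is left out here.</p>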
<p><a href="http://asbjorn.fellinghaug.com/blog/wp-content/uploads/2008/08/lucene_green_300.gif"><img class="alignnone size-medium wp-image-53" title="lucene_green_300" src="http://asbjorn.fellinghaug.com/blog/wp-content/uploads/2008/08/lucene_green_300.gif" alt="" width="300" height="46" /></a></p>
<p>The code is available as both a tar.gz and a zip archive:</p>
<ul>
<li><a title="Baldr code" href="http://asbjorn.fellinghaug.com/filer/master/Baldr_code.tar.gz">Baldr_code.tar.gz</a></li>
<li><a title="Baldr code" href="http://asbjorn.fellinghaug.com/filer/master/Baldr_code.zip">Baldr_code.zip</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://asbjorn.fellinghaug.com/blog/2008/08/the-code-for-my-master-thesis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
