e-Research: A Journal of Undergraduate Work

Authors

Matthew Shaffer

Abstract

In this paper, I will explain how I used Markov models, a probabilistic modeling tool, together with the Hadoop MapReduce parallel programming platform to analyze documents quickly and efficiently and to build a probability model of them. I will explain what Markov models are, give a brief overview of MapReduce, explain why Markov models are suited to document analysis, walk through the code of my modeling program, and examine the performance of various MapReduce platforms and techniques for analyzing documents.
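The following minimal sketch shows how a Markov model can be split into map and reduce steps. It is not the program described in the paper. It assumes a first-order, word-level Markov chain, in which each word's next word depends only on the current word:

1. The map step emits a (word, next word) pair for every adjacent pair of words.
2. The framework groups those pairs by word.
3. The reduce step turns each word's successor counts into transition probabilities.

The phases are simulated in plain Python rather than through Hadoop's API. The names `map_phase`, `shuffle`, `reduce_phase`, and `build_markov_model` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, str]]:
    """Mapper (hypothetical): emit (current_word, next_word) for each adjacent pair.

    Assumes a first-order, word-level chain; pairs never cross line boundaries.
    """
    words = line.lower().split()
    for current, nxt in zip(words, words[1:]):
        yield current, nxt


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group mapper output by key, standing in for MapReduce's shuffle/sort step."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word: str, successors: List[str]) -> Tuple[str, Dict[str, float]]:
    """Reducer (hypothetical): count successors and normalize them to probabilities."""
    counts: Dict[str, int] = defaultdict(int)
    for successor in successors:
        counts[successor] += 1
    total = len(successors)
    return word, {s: c / total for s, c in counts.items()}


def build_markov_model(lines: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Run map -> shuffle -> reduce in one process and return the transition table."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(word, succ) for word, succ in shuffle(mapped).items())


if __name__ == "__main__":
    document = ["the cat sat", "the cat ran", "the dog sat"]
    model = build_markov_model(document)
    # "the" is followed by "cat" twice and "dog" once in this toy document.
    print(model["the"])
```

On a real cluster, many mapper and reducer instances would run in parallel, and the framework would perform the grouping. With Hadoop Streaming, for example, mappers and reducers can be written as scripts that read standard input and write standard output. Here, everything runs in a single process for readability.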
