News Search Engine

In this project we implemented a fully working news search engine with crawling, page ranking, indexing, and a simple, easy-to-use interface for retrieving the indexed documents.

Our design goal was to build a modular system in which each module works independently. We tried to design it so that the failure of any single module would not bring down the whole system.
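The fault-isolation idea can be illustrated with a minimal sketch. This is not the project's actual code: the `ModuleRunner` class, the `Module` interface, and the module names are hypothetical, and the sketch assumes the modules can be run one after another in a single process. It only shows how an exception in one module can be contained so that the remaining modules still run.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ModuleRunner {

    /** Hypothetical stand-in for one stage, such as crawling or indexing. */
    public interface Module {
        void run() throws Exception;
    }

    /**
     * Runs every module in insertion order and records whether each succeeded.
     * An exception thrown by one module is caught and logged, so later
     * modules still get a chance to run.
     */
    public static Map<String, Boolean> runAll(Map<String, Module> modules) {
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (Map.Entry<String, Module> entry : modules.entrySet()) {
            try {
                entry.getValue().run();
                results.put(entry.getKey(), true);
            } catch (Exception e) {
                System.err.println(entry.getKey() + " failed: " + e.getMessage());
                results.put(entry.getKey(), false);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Placeholder modules; the real stages are assumed, not taken from the project.
        Map<String, Module> modules = new LinkedHashMap<>();
        modules.put("crawler", () -> System.out.println("crawling..."));
        modules.put("ranker", () -> {
            throw new IllegalStateException("link graph unavailable");
        });
        modules.put("indexer", () -> System.out.println("indexing..."));

        // The ranker fails, but the indexer still runs afterwards.
        System.out.println(runAll(modules));
    }
}
```

In a deployed system the same isolation would more likely come from running each module as a separate process or service; the try/catch loop here is only the smallest form of the idea.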

We crawled 574,000 pages, and the total size of our document collection is about 79 GB. Here are the Report and Presentation for the project.


Java, MongoDB, Apache Lucene, Spark Java, MapDB

Development Time

December 2013


Information Retrieval