Open Source Tools Applied to Text Data Recovery in Big Data Environments

Abstract

As the volume of data on the web continues to increase, it is becoming more challenging for search mechanisms to find, with a high precision rate, what users are looking for. One way to improve these results is a recommender engine based on the content of the documents. In this context, this research aims to show how current search and indexing tools could be improved with recommendation, Machine Learning, and textual analysis algorithms. The idea behind this project is to take the documents retrieved by a search and, based on their content, find similar documents, relying as much as possible on currently available Open Source technology.

Similar works