40,975 research outputs found
BlogForever D2.6: Data Extraction Methodology
This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform
A pattern mining approach for information filtering systems
It is a big challenge to clearly identify the boundary between positive and negative streams for information filtering systems. Several attempts have used negative feedback to solve this challenge; however, there are two issues for using negative relevance feedback to improve the effectiveness of information filtering. The first one is how to select constructive negative samples in order to reduce the space of negative documents. The second issue is how to decide noisy extracted features that should be updated based on the selected negative samples. This paper proposes a pattern mining based approach to select some offenders from the negative documents, where an offender can be used to reduce the side effects of noisy features. It also classifies extracted features (i.e., terms) into three categories: positive specific terms, general terms, and negative specific terms. In this way, multiple revising strategies can be used to update extracted features. An iterative learning algorithm is also proposed to implement this approach on the RCV1 data collection, and substantial experiments show that the proposed approach achieves encouraging performance and the performance is also consistent for adaptive filtering as well
The contribution of data mining to information science
The information explosion is a serious challenge for current information institutions. On the other hand, data mining, which is the search for valuable information in large volumes of data, is one of the solutions to face this challenge. In the past several years, data mining has made a significant contribution to the field of information science. This paper examines the impact of data mining by reviewing existing applications, including personalized environments, electronic commerce, and search engines. For these three types of application, how data mining can enhance their functions is discussed. The reader of this paper is expected to get an overview of the state of the art research associated with these applications. Furthermore, we identify the limitations of current work and raise several directions for future research
Discovering the Impact of Knowledge in Recommender Systems: A Comparative Study
Recommender systems engage user profiles and appropriate filtering techniques
to assist users in finding more relevant information over the large volume of
information. User profiles play an important role in the success of
recommendation process since they model and represent the actual user needs.
However, a comprehensive literature review of recommender systems has
demonstrated no concrete study on the role and impact of knowledge in user
profiling and filtering approache. In this paper, we review the most prominent
recommender systems in the literature and examine the impression of knowledge
extracted from different sources. We then come up with this finding that
semantic information from the user context has substantial impact on the
performance of knowledge based recommender systems. Finally, some new clues for
improvement the knowledge-based profiles have been proposed.Comment: 14 pages, 3 tables; International Journal of Computer Science &
Engineering Survey (IJCSES) Vol.2, No.3, August 201
Deep Learning for User Comment Moderation
Experimenting with a new dataset of 1.6M user comments from a Greek news
portal and existing datasets of English Wikipedia comments, we show that an RNN
outperforms the previous state of the art in moderation. A deep,
classification-specific attention mechanism improves further the overall
performance of the RNN. We also compare against a CNN and a word-list baseline,
considering both fully automatic and semi-automatic moderation
- …