Evaluating Multilingual Gisting of Web Pages
We describe a prototype system for multilingual gisting of Web pages, and
present an evaluation methodology based on the notion of gisting as decision
support. This evaluation paradigm is straightforward, rigorous, permits fair
comparison of alternative approaches, and should easily generalize to
evaluation in other situations where the user is faced with decision-making on
the basis of information in restricted or alternative form.
Comment: 7 pages, uses psfig and aaai style
Detection of topic on Health News in Twitter Data
The development and rapid popularization of the internet have led to exponential growth of data on the network, making text mining increasingly important. Users search for information within the immense volume available online, and the need to extract valuable information and to classify, organize and manage vast amounts of text automatically makes text processing more difficult. To address these problems and requirements, intelligent information processing has been studied extensively. Topic modelling has been widely employed in natural language processing. Current research focuses on improving the speed and accuracy of text classification and topic detection, and on feature selection methods that achieve better dimensionality reduction. The Latent Dirichlet Allocation (LDA) topic model works well for reducing noise in data, and LDA is widely used as a feature model combined with a classifier to achieve good classification performance. This study aims to conduct data mining and reduce the load of working with a huge database. Three supervised learning algorithms are run: Naïve Bayes, Decision Tree and Random Forest. The Random Forest classifier outperforms the other two with 99.99% accuracy. Seven clusters for topic modelling are revealed using the Random Forest classifier. Each output is limited to the four highest-weighted words and shows the top term and its weight. The highest-ranked term in the dataset is ‘Ebola’. The findings show that combining LDA with a supervised learning algorithm effectively addresses the problem of data sparseness in short text sets. Selecting the microblogs most likely to discuss news topics significantly reduces the volume of data of concern and, to a certain extent, eliminates the interference of non-news blogs.
Recent Trends in Computational Intelligence
Traditional models struggle to cope with complexity, noise, and changing environments, while Computational Intelligence (CI) offers solutions to complicated problems as well as inverse problems. The main feature of CI is adaptability, spanning the fields of machine learning and computational neuroscience. CI also comprises biologically inspired technologies such as swarm intelligence, as part of evolutionary computation, and encompasses wider areas such as image processing, data collection, and natural language processing. This book discusses the use of CI to solve a variety of applications optimally, demonstrating its wide reach and relevance. The combination of optimization methods and data mining strategies makes a strong and reliable prediction tool for handling real-life applications.
MultiGBS: A multi-layer graph approach to biomedical summarization
Automatic text summarization methods generate a shorter version of the input
text to assist the reader in gaining a quick yet informative gist. Existing
text summarization methods generally focus on a single aspect of text when
selecting sentences, causing the potential loss of essential information. In
this study, we propose a domain-specific method that models a document as a
multi-layer graph to enable multiple features of the text to be processed at
the same time. The features we used in this paper are word similarity, semantic
similarity, and co-reference similarity, which are modelled as three different
layers. The unsupervised method selects sentences from the multi-layer graph
based on the MultiRank algorithm and the number of concepts. The proposed
MultiGBS algorithm employs UMLS and extracts the concepts and relationships
using different tools such as SemRep, MetaMap, and OGER. Extensive evaluation
by ROUGE and BERTScore shows increased F-measure values.
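The idea of ranking sentences over several similarity layers can be illustrated with a minimal sketch. This is not the MultiRank algorithm or the MultiGBS implementation: it collapses the layers into one weighted graph and runs ordinary PageRank by power iteration. The function names, the equal layer weights, and the toy similarity values are all assumptions made for illustration.

```python
def layer_from_edges(n, edges):
    """Build a symmetric n x n similarity matrix from (i, j, weight) triples."""
    matrix = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        matrix[i][j] = matrix[j][i] = w
    return matrix


def rank_sentences(layers, layer_weights, damping=0.85, iterations=100):
    """Score sentences with PageRank on the weighted sum of the layers.

    Simplification (assumption): layers are merged before ranking, whereas
    a true multi-layer method ranks nodes and layers jointly.
    """
    n = len(layers[0])
    combined = [
        [sum(w * layer[i][j] for w, layer in zip(layer_weights, layers))
         for j in range(n)]
        for i in range(n)
    ]
    out_weight = [sum(row) for row in combined]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Sentences with no edges spread their score uniformly.
        dangling = sum(scores[i] for i in range(n) if out_weight[i] == 0)
        new_scores = []
        for j in range(n):
            incoming = sum(
                scores[i] * combined[i][j] / out_weight[i]
                for i in range(n) if out_weight[i] > 0
            )
            new_scores.append(
                (1 - damping) / n + damping * (incoming + dangling / n)
            )
        scores = new_scores
    return scores


def select_top(scores, k):
    """Return the indices of the k best sentences, in document order."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


# Toy document with four sentences and three hypothetical layers.
word_layer = layer_from_edges(4, [(0, 1, 0.5), (0, 2, 0.4), (0, 3, 0.3)])
semantic_layer = layer_from_edges(4, [(0, 1, 0.6), (0, 2, 0.5), (1, 2, 0.1)])
coref_layer = layer_from_edges(4, [(0, 3, 1.0)])

scores = rank_sentences(
    [word_layer, semantic_layer, coref_layer], [1.0, 1.0, 1.0]
)
summary = select_top(scores, 2)
```

Here sentence 0 is linked to every other sentence in at least one layer, so it receives the highest score and is included in the two-sentence summary.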
Web news mining in an evolving framework
Online news has become one of the major channels through which Internet users get news. News websites are overwhelmed daily with news articles: huge numbers of online articles are generated and updated every day, and processing and analysing this large corpus is an important challenge. The challenge needs to be tackled with big data techniques that process large volumes of data within limited run times. Also, since we are heading into a social-media data explosion, techniques such as text mining and social network analysis need to be taken seriously into consideration. In this work we focus on one of the most common daily activities: web news reading. News websites produce thousands of articles covering a wide spectrum of topics or categories, which can be considered a big data problem. To extract useful information, these articles need to be processed using big data techniques. In this context, we present an approach for classifying huge numbers of news articles into various categories (topic areas) based on the text content of the articles. Since these categories are constantly updated with new articles, our approach is based on Evolving Fuzzy Systems (EFS). An EFS can update in real time the model that describes a category according to changes in the content of the corresponding articles. The novelty of the proposed system lies in the treatment of the web news articles to be used by these systems, and in the implementation and adjustment of the systems for this task. Our proposal not only classifies news articles but also creates human-interpretable models of the different categories. This approach has been successfully tested using real online news.
(C) 2015 Elsevier B.V. All rights reserved.
This work has been supported by the Spanish Government under the i-Support (Intelligent Agent Based Driver Decision Support) Project (TRA2011-29454-C03-03).
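The notion of a category model that is updated as each new article arrives can be sketched as follows. This is not an Evolving Fuzzy System and not the authors' method; it is a much simpler stand-in that keeps one running-mean bag-of-words prototype per category and classifies by cosine similarity. The class name, the whitespace tokenizer, and the sample headlines are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def _cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class EvolvingCentroidClassifier:
    """Hypothetical online classifier: one evolving prototype per category."""

    def __init__(self):
        self.prototypes = defaultdict(Counter)  # category -> mean term counts
        self.counts = Counter()                 # category -> articles seen

    @staticmethod
    def _vectorize(text):
        # Assumption: lowercase whitespace tokens stand in for real preprocessing.
        return Counter(text.lower().split())

    def update(self, text, category):
        """Fold one labelled article into its category's running mean."""
        vec = self._vectorize(text)
        self.counts[category] += 1
        n = self.counts[category]
        proto = self.prototypes[category]
        for term in set(proto) | set(vec):
            proto[term] += (vec[term] - proto[term]) / n

    def classify(self, text):
        """Return the category whose prototype is most similar, or None."""
        vec = self._vectorize(text)
        best, best_sim = None, -1.0
        for category, proto in self.prototypes.items():
            sim = _cosine(vec, proto)
            if sim > best_sim:
                best, best_sim = category, sim
        return best

    def top_terms(self, category, k=3):
        """List the k highest-weighted terms as a readable category summary."""
        proto = self.prototypes.get(category, Counter())
        return [term for term, _ in proto.most_common(k)]


clf = EvolvingCentroidClassifier()
clf.update("team wins the match", "sport")
clf.update("player scores a goal in the match", "sport")
clf.update("parliament votes on the budget", "politics")
clf.update("minister presents the budget law", "politics")
label = clf.classify("the team scores a late goal")
```

Because each prototype is a plain list of weighted terms, `top_terms` gives a rough, human-readable view of what a category currently contains, and each call to `update` shifts that view without retraining from scratch.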