16,905 research outputs found
Time Aware Knowledge Extraction for Microblog Summarization on Twitter
Microblogging services like Twitter and Facebook collect millions of user
generated content every moment about trending news, occurring events, and so
on. Nevertheless, it is really a nightmare to find information of interest
through the huge amount of available posts that are often noise and redundant.
In general, social media analytics services have caught increasing attention
from both side research and industry. Specifically, the dynamic context of
microblogging requires to manage not only meaning of information but also the
evolution of knowledge over the timeline. This work defines Time Aware
Knowledge Extraction (briefly TAKE) methodology that relies on temporal
extension of Fuzzy Formal Concept Analysis. In particular, a microblog
summarization algorithm has been defined filtering the concepts organized by
TAKE in a time-dependent hierarchy. The algorithm addresses topic-based
summarization on Twitter. Besides considering the timing of the concepts,
another distinguish feature of the proposed microblog summarization framework
is the possibility to have more or less detailed summary, according to the
user's needs, with good levels of quality and completeness as highlighted in
the experimental results.Comment: 33 pages, 10 figure
Improving the translation environment for professional translators
When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view, as well as from a purely technological side.
This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project
Automatic domain ontology extraction for context-sensitive opinion mining
Automated analysis of the sentiments presented in online consumer feedbacks can facilitate both organizations’ business strategy development and individual consumers’ comparison shopping. Nevertheless, existing opinion mining methods either adopt a context-free sentiment classification approach or rely on a large number of manually annotated training examples to perform context sensitive sentiment classification. Guided by the design science research methodology, we illustrate the design, development, and evaluation of a novel fuzzy domain ontology based contextsensitive opinion mining system. Our novel ontology extraction mechanism underpinned by a variant of Kullback-Leibler divergence can automatically acquire contextual sentiment knowledge across various product domains to improve the sentiment analysis processes. Evaluated based on a benchmark dataset and real consumer reviews collected from Amazon.com, our system shows remarkable performance improvement over the context-free baseline
A Latent Dirichlet Allocation and Fuzzy Clustering Based Machine Learning Model for Text Thesaurus
It is not quite possible to use manual methods to process the huge amount of structured and semi-structured data. This study aims to solve the problem of processing huge data through machine learning algorithms. We collected the text data of the company’s public opinion through crawlers, and use Latent Dirichlet Allocation (LDA) algorithm to extract the keywords of the text, and uses fuzzy clustering to cluster the keywords to form different topics. The topic keywords will be used as a seed dictionary for new word discovery. In order to verify the efficiency of machine learning in new word discovery, algorithms based on association rules, N-Gram, PMI, andWord2vec were used for comparative testing of new word discovery. The experimental results show that the Word2vec algorithm based on machine learning model has the highest accuracy, recall and F-value indicators
- …