Predicting Role Relevance with Minimal Domain Expertise in a Financial Domain
Word embeddings have made enormous inroads in recent years in a wide variety
of text mining applications. In this paper, we explore a word embedding-based
architecture for predicting the relevance of a role between two financial
entities within the context of natural language sentences. In this extended
abstract, we propose a pooled approach that uses a collection of sentences to
train word embeddings using the skip-gram word2vec architecture. We use the
word embeddings to obtain context vectors that are assigned one or more labels
based on manual annotations. We train a machine learning classifier using the
labeled context vectors, and use the trained classifier to predict contextual
role relevance on test data. Our approach serves as a good minimal-expertise
baseline for the task as it is simple and intuitive, uses open-source modules,
requires little feature crafting effort and performs well across roles.
Comment: DSMM 2017 workshop at ACM SIGMOD conference
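The pipeline described in this abstract (word embeddings, then context vectors, then a classifier) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tiny `EMBEDDINGS` table holds made-up values standing in for trained skip-gram vectors, the context vector is assumed to be a plain average of word vectors, and a nearest-centroid rule stands in for the unspecified machine learning classifier. It also predicts a single label per context, whereas the abstract allows one or more. All function names are hypothetical.

```python
from math import sqrt

# Hypothetical 3-dimensional vectors standing in for trained skip-gram embeddings.
EMBEDDINGS = {
    "acts": [0.9, 0.1, 0.0],
    "as": [0.5, 0.5, 0.0],
    "trustee": [1.0, 0.0, 0.1],
    "for": [0.4, 0.4, 0.1],
    "sued": [0.0, 0.9, 0.2],
    "by": [0.3, 0.5, 0.1],
    "against": [0.1, 1.0, 0.0],
}
DIM = 3


def context_vector(tokens):
    """Average the embeddings of the known tokens (assumed pooling scheme)."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train_centroids(examples):
    """Compute one mean context vector per label from (tokens, label) pairs."""
    grouped = {}
    for tokens, label in examples:
        grouped.setdefault(label, []).append(context_vector(tokens))
    return {
        label: [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]
        for label, vecs in grouped.items()
    }


def predict(centroids, tokens):
    """Return the label whose centroid is most similar to the context vector."""
    vec = context_vector(tokens)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy annotated contexts (hypothetical labels).
centroids = train_centroids([
    (["acts", "as", "trustee", "for"], "relevant"),
    (["sued", "by"], "irrelevant"),
])
print(predict(centroids, ["trustee", "for"]))
```

In practice the embeddings would be trained on the sentence collection and the centroid rule replaced by whichever classifier suits the labeled data; the sketch only shows how the three stages connect.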
Cross-Domain Labeled LDA for Cross-Domain Text Classification
Cross-domain text classification aims at building a classifier for a target
domain which leverages data from both the source and target domains. One promising
idea is to minimize the feature distribution differences of the two domains.
Most existing studies explicitly minimize such differences by an exact
alignment mechanism (aligning features by one-to-one feature alignment,
projection matrix etc.). Such exact alignment, however, will restrict models'
learning ability and will further impair models' performance on classification
tasks when the semantic distributions of different domains are very different.
To address this problem, we propose a novel group alignment which aligns the
semantics at group level. In addition, to help the model learn better semantic
groups and semantics within these groups, we also propose partial supervision
for the model's learning in the source domain. To this end, we embed the group
alignment and partial supervision into a cross-domain topic model, and
propose Cross-Domain Labeled LDA (CDL-LDA). On the standard 20Newsgroup and
Reuters datasets, extensive quantitative (classification, perplexity etc.) and
qualitative (topic detection) experiments are conducted to show the
effectiveness of the proposed group alignment and partial supervision.
Comment: ICDM 201
Automatic domain ontology extraction for context-sensitive opinion mining
Automated analysis of the sentiments presented in online consumer feedback can facilitate both organizations’ business strategy development and individual consumers’ comparison shopping. Nevertheless, existing opinion mining methods either adopt a context-free sentiment classification approach or rely on a large number of manually annotated training examples to perform context-sensitive sentiment classification. Guided by the design science research methodology, we illustrate the design, development, and evaluation of a novel fuzzy domain ontology-based context-sensitive opinion mining system. Our novel ontology extraction mechanism, underpinned by a variant of Kullback-Leibler divergence, can automatically acquire contextual sentiment knowledge across various product domains to improve the sentiment analysis processes. Evaluated based on a benchmark dataset and real consumer reviews collected from Amazon.com, our system shows remarkable performance improvement over the context-free baseline.
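To illustrate the kind of measure this abstract builds on, the sketch below computes the standard Kullback-Leibler divergence between term distributions in positive and negative reviews and reports each term's contribution. It is not the authors' variant, which the abstract does not specify; the add-one smoothing, the toy reviews, and the function names are all assumptions made for illustration.

```python
from collections import Counter
from math import log


def term_distribution(docs, vocab, smoothing=1.0):
    """Smoothed relative frequency of each vocabulary term across tokenized docs."""
    counts = Counter(tok for doc in docs for tok in doc if tok in vocab)
    total = sum(counts.values()) + smoothing * len(vocab)
    return {term: (counts[term] + smoothing) / total for term in vocab}


def kl_contributions(p, q):
    """Per-term summands p(t) * log(p(t) / q(t)) of the divergence KL(p || q)."""
    return {term: p[term] * log(p[term] / q[term]) for term in p}


# Hypothetical tokenized reviews for one product domain.
positive = [["battery", "great", "great"], ["screen", "great"]]
negative = [["battery", "poor"], ["screen", "poor", "poor"]]
vocab = {tok for doc in positive + negative for tok in doc}

p = term_distribution(positive, vocab)
q = term_distribution(negative, vocab)
contrib = kl_contributions(p, q)

# Terms with the largest contribution are the most characteristic of positive reviews.
for term in sorted(contrib, key=contrib.get, reverse=True):
    print(term, round(contrib[term], 3))
```

Terms that occur equally often in both sets contribute nothing, so domain words such as "battery" drop out while polarity-bearing words stand out, which is the intuition behind using a divergence to pick up contextual sentiment terms.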