116 research outputs found
Semi-supervised latent variable models for sentence-level sentiment analysis
We derive two variants of a semi-supervised model for fine-grained sentiment analysis. Both models leverage abundant natural supervision in the form of review ratings, as well as a small amount of manually crafted sentence labels, to learn sentence-level sentiment classifiers. The proposed model is a fusion of a fully supervised structured conditional model and its partially supervised counterpart. This allows for highly efficient estimation and inference algorithms with rich feature definitions. We describe the two variants as well as their component models and verify experimentally that both variants give significantly improved results for sentence-level sentiment analysis compared to all baselines
Automated Big Text Security Classification
In recent years, traditional cybersecurity safeguards have proven ineffective
against insider threats. Famous cases of sensitive information leaks caused by
insiders, including the WikiLeaks release of diplomatic cables and the Edward
Snowden incident, have greatly harmed the U.S. government's relationship with
other governments and with its own citizens. Data Leak Prevention (DLP) is a
solution for detecting and preventing information leaks from within an
organization's network. However, state-of-art DLP detection models are only
able to detect very limited types of sensitive information, and research in the
field has been hindered due to the lack of available sensitive texts. Many
researchers have focused on document-based detection with artificially labeled
"confidential documents" for which security labels are assigned to the entire
document, when in reality only a portion of the document is sensitive. This
type of whole-document based security labeling increases the chances of
preventing authorized users from accessing non-sensitive information within
sensitive documents. In this paper, we introduce Automated Classification
Enabled by Security Similarity (ACESS), a new and innovative detection model
that penetrates the complexity of big text security classification/detection.
To analyze the ACESS system, we constructed a novel dataset, containing
formerly classified paragraphs from diplomatic cables made public by the
WikiLeaks organization. To our knowledge this paper is the first to analyze a
dataset that contains actual formerly sensitive information annotated at
paragraph granularity.Comment: Pre-print of Best Paper Award IEEE Intelligence and Security
Informatics (ISI) 2016 Manuscrip
Generating Aspect-oriented Multi-document Summarization with Event-Aspect Model
In this paper, we propose a novel approach to automatic generation of aspect-oriented summaries from multiple documents. We first develop an event-aspect LDA model to cluster sentences into aspects. We then use extended LexRank algorithm to rank the sentences in each cluster. We use Integer Linear Programming for sentence selection. Key features of our method include automatic grouping of semantically related sentences and sentence ranking based on extension of random walk model. Also, we implement a new sentence compression algorithm which use dependency tree instead of parser tree. We compare our method with four baseline methods. Quantitative evaluation based on Rouge metric demonstrates the effectiveness and advantages of our method.
Cross-Cultural Comparisons of Review Aspect Importance
Previous text mining studies identified key common aspects in online restaurant reviews. However, it is not clear how important these aspects are for consumers. In this exploratory study, we used Yelp restaurant reviews on an ethnic food item, ramen noodles, and assessed the importance of each aspect to both U.S. and Japanese consumers. The results show that food and atmosphere are far more important than the other common aspects in both the U.S. and Japan. However, we found noticeable differences between consumers in the two countries regarding how the food aspect plays a role on star ratings. Both implications and a future research agenda are discussed
- ā¦