35 research outputs found

Thumbs up? Sentiment Classification using Machine Learning Techniques

Author: Lillian Lee
Bo Pang
Shivakumar Vaithyanathan
Publication date: 01/01/2002

We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging. Comment: To appear in EMNLP-200

arXiv.org e-Print Archive

CiteSeerX

Generalized Model Selection For Unsupervised Learning In High Dimensions

Author: Byron Dom
Shivakumar Vaithyanathan
Publication venue: MIT Press

In this paper we describe an approach to model selection in unsupervised learning. This approach determines both the feature set and the number of clusters. To this end we first derive an objective function that explicitly incorporates this generalization. We then evaluate two schemes for model selection: one using this objective function (a Bayesian estimation scheme that selects the best model structure using the marginal or integrated likelihood) and the second based on a technique using a cross-validated likelihood criterion. In the first scheme, for a particular application in document clustering, we derive a closed-form solution of the integrated likelihood by assuming an appropriate form of the likelihood function and prior. Extensive experiments are carried out to ascertain the validity of both approaches and all results are verified by comparison against ground truth. In our experiments the Bayesian scheme using our objective function gave better results than cross-validatio..

CiteSeerX

Clustering with model-level constraints

Author: Ashutosh Garg
David Gondek
Shivakumar Vaithyanathan
Publication date: 01/01/2005

In this paper we describe a systematic approach to uncovering multiple clusterings underlying a dataset. In contrast to previous approaches, the proposed method uses information about structures that are not desired and consequently is very useful in an exploratory data mining setting. Specifically, the problem is formulated as constrained model-based clustering where the constraints are placed at the model level. Two variants of an EM algorithm for this constrained model are derived. The performance of both variants is compared against a state-of-the-art information bottleneck algorithm on both synthetic and real datasets.

CiteSeerX

Crossref

OLAP over Imprecise Data With Domain Constraints

Author: Doug Burdick
AnHai Doan
Raghu Ramakrishnan
Shivakumar Vaithyanathan
Publication venue: University of Wisconsin-Madison Department of Computer Sciences
Publication date: 01/01/2007

Several recent works have focused on OLAP over imprecise data, where each fact can be a region, instead of a point, in a multi-dimensional space. They have provided a multiple-world semantics for such data, and developed efficient solutions to answer OLAP aggregation queries over the imprecise facts. These solutions however assume that the imprecise facts can be interpreted independently of one another, a key assumption that is often violated in practice. Indeed, imprecise facts in real-world applications are often correlated, and such correlations can be captured as domain integrity constraints (e.g., repairs with the same customer names and models took place in the same city, or a text span can refer to a person or a city, but not both). In this paper we provide a solution to answer OLAP aggregation queries over imprecise data, in the presence of such domain constraints. We first describe a relatively simple yet powerful constraint language, and define what it means to take into account such constraints in query answering. Next, we prove that OLAP queries can be answered efficiently given a database D* of fact marginals. We then exploit the regularities in the constraint space (captured in a constraint hypergraph) and the fact space to efficiently construct D*. Extensive experiments over real-world and synthetic data demonstrate the effectiveness of our approach.

CiteSeerX

MINDS@UW (Univ. of Wisconsin)

OLAP Over Uncertain and Imprecise Data

Author: Doug Burdick
Prasad M. Deshpande
Shivakumar Vaithyanathan et al.

We extend the OLAP data model to represent data ambiguity, specifically imprecision and uncertainty, and introduce an allocation-based approach to the semantics of aggregation queries over such data. We identify three natural query properties and use them to shed light on alternative query semantics. While there is much work on representing and querying ambiguous data, to our knowledge this is the first paper to handle both imprecision and uncertainty in an OLAP setting.

CiteSeerX

Massively parallel analog tabu search using neural networks applied to simple plant location problems

Author: Laura I. Burke
Michael A. Magent
Shivakumar Vaithyanathan

Research Papers in Economics