Classification Algorithm Sensitivity to Training Data with Non Representative Attribute Noise

Authors: Mannino Michael; Ryu Young; Yang Yanjuan
Publication venue: AIS Electronic Library (AISeL)
Publication date: 31/12/2007

AIS Electronic Library (AISeL)

Incremental learning of independent, overlapping, and graded concept descriptions with an instance-based process framework

Author: Aha David W.
Publication venue: eScholarship, University of California
Publication date: 23/05/1989

Supervised learning algorithms make several simplifying assumptions concerning the characteristics of the concept descriptions to be learned. For example, concepts are often assumed to (1) be defined with respect to the same set of relevant attributes, (2) be disjoint in instance space, and (3) have uniform instance distributions. While these assumptions constrain the learning task, they unfortunately limit an algorithm's applicability. We believe that supervised learning algorithms should learn attribute relevancies independently for each concept, allow instances to be members of any subset of concepts, and represent graded concept descriptions. This paper introduces a process framework for instance-based learning algorithms that exploit only specific instance and performance feedback information to guide their concept learning processes. We also introduce Bloom, a specific instantiation of this framework. Bloom is a supervised, incremental, instance-based learning algorithm that learns relative attribute relevancies independently for each concept, allows instances to be members of any subset of concepts, and represents graded concept memberships. We describe empirical evidence to support our claims that Bloom can learn independent, overlapping, and graded concept descriptions.

eScholarship - University of California

Hacking Smart Machines with Smarter Ones: How to Extract Meaningful Data from Machine Learning Classifiers

Authors: Ateniese Giuseppe; Felici Giovanni; Mancini Luigi V.; Spognardi Angelo; Villani Antonio; Vitali Domenico
Publication date: 19/06/2013

Machine Learning (ML) algorithms are used to train computers to perform a variety of complex tasks and improve with experience. Computers learn how to recognize patterns, make unintended decisions, or react to a dynamic environment. Certain trained machines may be more effective than others because they are based on more suitable ML algorithms or because they were trained through superior training sets. Although ML algorithms are known and publicly released, training sets may not be reasonably ascertainable and, indeed, may be guarded as trade secrets. While much research has been performed on the privacy of the elements of training sets, in this paper we focus our attention on ML classifiers and on the statistical information that can be unconsciously or maliciously revealed from them. We show that it is possible to infer unexpected but useful information from ML classifiers. In particular, we build a novel meta-classifier and train it to hack other classifiers, obtaining meaningful information about their training sets. This kind of information leakage can be exploited, for example, by a vendor to build more effective classifiers or to simply acquire trade secrets from a competitor's apparatus, potentially violating its intellectual property rights.

arXiv.org e-Print Archive

CiteSeerX

Mining the Demographics of Political Sentiment from Twitter Using Learning from Label Proportions

Authors: Ardehaly Ehsan Mohammady; Culotta Aron
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 26/08/2017

Opinion mining and demographic attribute inference have many applications in social science. In this paper, we propose models to infer daily joint probabilities of multiple latent attributes from Twitter data, such as political sentiment and demographic attributes. Since it is costly and time-consuming to annotate data for traditional supervised classification, we instead propose scalable Learning from Label Proportions (LLP) models for demographic and opinion inference using U.S. Census, national and state political polls, and Cook partisan voting index as population level data. In LLP classification settings, the training data is divided into a set of unlabeled bags, where only the label distribution of each bag is known, removing the requirement of instance-level annotations. Our proposed LLP model, Weighted Label Regularization (WLR), provides a scalable generalization of prior work on label regularization to support weights for samples inside bags, which is applicable in this setting where bags are arranged hierarchically (e.g., county-level bags are nested inside of state-level bags). We apply our model to Twitter data collected in the year leading up to the 2016 U.S. presidential election, producing estimates of the relationships among political sentiment and demographics over time and place. We find that our approach closely tracks traditional polling data stratified by demographic category, resulting in error reductions of 28-44% over baseline approaches. We also provide descriptive evaluations showing how the model may be used to estimate interactions among many variables and to identify linguistic temporal variation, capabilities which are typically not feasible using traditional polling methods.
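
To make the bag-level supervision described in this abstract concrete, here is a minimal sketch of a label-proportion loss with per-sample weights. It is an illustration only, not the authors' WLR implementation: the function names, the choice of KL divergence as the penalty, and the toy numbers are all assumptions made for this example.

```python
import math

def weighted_bag_posterior(probs, weights):
    """Weighted average of per-instance class probabilities within one bag."""
    total = sum(weights)
    n_classes = len(probs[0])
    return [
        sum(w * p[k] for p, w in zip(probs, weights)) / total
        for k in range(n_classes)
    ]

def label_proportion_loss(probs, weights, target, eps=1e-12):
    """KL(target || bag posterior): close to zero when the bag-level
    prediction matches the known label proportions (assumed penalty)."""
    posterior = weighted_bag_posterior(probs, weights)
    return sum(
        t * math.log((t + eps) / (q + eps))
        for t, q in zip(target, posterior)
        if t > 0
    )

# Toy bag: two instances, two classes. Only the bag-level proportion is
# known; the individual instance labels are never used.
probs = [[0.9, 0.1], [0.1, 0.9]]
print(label_proportion_loss(probs, [1.0, 1.0], [0.5, 0.5]))  # predictions agree with the bag
print(label_proportion_loss(probs, [1.0, 1.0], [0.9, 0.1]))  # predictions disagree with the bag
```

In a training loop, a loss of this kind would be summed over bags and minimized with respect to the classifier parameters that produce `probs`; the weights let some samples count more than others inside a bag.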

arXiv.org e-Print Archive

Crossref

Attribute Noise-Sensitivity Impact: Model Performance and Feature Ranking

Author: Sidahmed Mohamed
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/01/2008

Developing robust and less complex models capable of coping with environment volatility is the quest of every data mining project. This study attempts to establish heuristics for investigating the impact of noise in instance attribute data on learning model volatility. In addition, an alternative method for determining attribute importance and feature ranking, based on attribute sensitivity to noise, is introduced. We present empirical analysis of the effect of attribute noise on model performance and how it impacts the overall learning process. Datasets drawn from different domains including Medicine, CRM, and security are employed by the study. The proposed technique has practical implications: it supports building low-volatility, high-performance predictive models prior to production deployment. The study also has implications for research, helping to fill the gap in work on attribute noise and its impact.
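
The abstract does not spell out how attribute sensitivity to noise is measured, so the following is only one plausible sketch of the idea: add Gaussian noise to one attribute at a time, record the mean drop in accuracy, and rank attributes by that drop. The function names, the Gaussian noise model, and the toy classifier are assumptions for illustration, not the study's procedure.

```python
import random

def accuracy(model, X, y):
    """Fraction of instances for which the model's prediction equals the label."""
    return sum(model(x) == label for x, label in zip(X, y)) / len(y)

def noise_sensitivity(model, X, y, sigma=1.0, repeats=20, seed=0):
    """Mean accuracy drop when Gaussian noise is added to one attribute at a time."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    drops = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(repeats):
            # Perturb only attribute j; all other attributes stay untouched.
            noisy = [x[:j] + [x[j] + rng.gauss(0.0, sigma)] + x[j + 1:] for x in X]
            total += baseline - accuracy(model, noisy, y)
        drops.append(total / repeats)
    return drops

def rank_attributes(drops):
    """Attribute indices ordered from most to least noise-sensitive."""
    return sorted(range(len(drops)), key=lambda j: -drops[j])

# Hypothetical toy model: attribute 0 is ignored, attribute 1 decides the class.
def toy_model(x):
    return int(x[1] > 0)

X = [[float(i % 3), i / 10] for i in range(-10, 11)]
y = [toy_model(x) for x in X]
drops = noise_sensitivity(toy_model, X, y)
print(drops, rank_attributes(drops))
```

Because the toy model never reads attribute 0, noise on that attribute cannot change any prediction, so its measured drop is exactly zero; any accuracy loss comes from attribute 1.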

AIS Electronic Library (AISeL)

One-Class Classification: Taxonomy of Study and Review of Techniques

Authors: Khan Shehroz S.; Madden Michael G.
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 29/11/2013

One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers by defining the class boundary just with the knowledge of the positive class. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, algorithms used and the application domains applied. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research. Comment: 24 pages + 11 pages of references, 8 figures.

arXiv.org e-Print Archive

Access to Research at National University of Ireland, Galway

Predictive User Modeling with Actionable Attributes

Authors: Pechenizkiy Mykola; Zliobaite Indre
Publication date: 01/01/2013

Different machine learning techniques have been proposed and used for modeling individual and group user needs, interests and preferences. In traditional predictive modeling, instances are described by observable variables, called attributes. The goal is to learn a model for predicting the target variable for unseen instances. For example, for marketing purposes a company may consider profiling a new user based on her observed web browsing behavior, referral keywords or other relevant information. In many real world applications the values of some attributes are not only observable, but can be actively decided by a decision maker. Furthermore, in some such applications the decision maker is interested not only in generating accurate predictions, but also in maximizing the probability of the desired outcome. For example, a direct marketing manager can choose which type of special offer to send to a client (actionable attribute), hoping that the right choice will result in a positive response with a higher probability. We study how to learn to choose the value of an actionable attribute in order to maximize the probability of a desired outcome in predictive modeling. We emphasize that not all instances are equally sensitive to changes in actions. Accurate choice of an action is critical for those instances that are on the borderline (e.g. users who do not have a strong opinion one way or the other). We formulate three supervised learning approaches for learning to select the value of an actionable attribute at an instance level. We also introduce a focused training procedure which puts more emphasis on the situations where varying the action is most likely to have an effect. A proof-of-concept experimental validation on two real-world case studies in the web analytics and e-learning domains highlights the potential of the proposed approaches.
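
As a rough illustration of the decision rule described in this abstract, the sketch below scores every candidate value of an actionable attribute with a probability model and picks the one that maximizes the probability of the desired outcome. It is not one of the three approaches the authors formulate, which the abstract does not detail; the response model, its coefficients, and the offer names are hypothetical.

```python
import math

def choose_action(prob_model, x, actions):
    """Score each candidate action and return the best one with all scores.

    prob_model(x, action) is assumed to return P(desired outcome | x, action).
    """
    probs = {a: prob_model(x, a) for a in actions}
    best = max(probs, key=probs.get)
    return best, probs

def action_sensitivity(probs):
    """Spread between the best and worst action for one instance."""
    return max(probs.values()) - min(probs.values())

# Hypothetical response model: a logistic curve over user engagement
# plus an assumed additive effect for each special offer.
def toy_response_model(x, action):
    effect = {"discount": 0.8, "free_shipping": 0.3, "none": 0.0}[action]
    score = x["engagement"] + effect
    return 1.0 / (1.0 + math.exp(-score))

actions = ["discount", "free_shipping", "none"]
borderline_user = {"engagement": 0.0}
committed_user = {"engagement": 5.0}

for user in (borderline_user, committed_user):
    best, probs = choose_action(toy_response_model, user, actions)
    print(best, round(action_sensitivity(probs), 4))
```

The sensitivity value reflects the abstract's point that the choice of action matters most for borderline instances: for a user who is already very likely to respond, every action yields nearly the same probability.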

arXiv.org e-Print Archive

Repository TU/e

Pure OAI Repository