Search CORE

3,355 research outputs found

Pruning training sets for learning of object categories

Author: Abu-Mostafa Yaser S.
Angelova Anelia
Perona Pietro
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005

Training datasets for learning of object categories are often contaminated or imperfect. We explore an approach to automatically identify examples that are noisy or troublesome for learning and exclude them from the training set. The problem is relevant to learning in semi-supervised or unsupervised settings, as well as to learning when the training data is contaminated with wrongly labeled examples or when correctly labeled but hard-to-learn examples are present. We propose a fully automatic mechanism for noise cleaning, called 'data pruning', and demonstrate its success on learning of human faces. It is not assumed that the data or the noise can be modeled or that additional training examples are available. Our experiments show that data pruning can improve generalization performance for algorithms with varying robustness to noise. It outperforms methods with regularization properties and is superior to commonly applied aggregation methods, such as bagging.

CiteSeerX

Crossref

Caltech Authors

Viewing the process of generating counterfactuals as a source of knowledge -- Application to the Naive Bayes classifier

Author: Boudec Nathan Le
Fessant Françoise
Guyomard Victor
Lemaire Vincent
Publication date: 08/09/2023

There are now many comprehension algorithms for understanding the decisions of a machine learning algorithm, among them those based on the generation of counterfactual examples. This article proposes to view this generation process as a source of knowledge that can be stored and used later in different ways. The process is illustrated for the additive model and, more specifically, for the naive Bayes classifier, whose useful properties for this purpose are shown. Comment: 12 pages
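To make the link between additive models and counterfactuals concrete, here is a minimal sketch in Python. It is not the authors' algorithm. It assumes a binary naive Bayes classifier written in its additive log-odds form, where each feature value contributes an independent weight, and it greedily picks the feature changes that flip the predicted class with the fewest edits. The names `log_odds`, `counterfactual`, `bias` and `weights` are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: weights[j][v] is the log-likelihood ratio contributed by
# feature j taking value v; bias is the log prior ratio.
Weights = List[Dict[int, float]]


def log_odds(x: List[int], bias: float, weights: Weights) -> float:
    """Additive naive Bayes score: positive means class 1, otherwise class 0."""
    return bias + sum(weights[j][v] for j, v in enumerate(x))


def counterfactual(
    x: List[int], bias: float, weights: Weights
) -> Optional[List[Tuple[int, int]]]:
    """Return (feature index, new value) changes that flip the class, or None."""
    score = log_odds(x, bias, weights)
    target_sign = -1 if score > 0 else 1  # direction the score must move

    # For each feature, find the single value change that helps the most.
    moves = []
    for j, v in enumerate(x):
        best_val, best_gain = None, 0.0
        for val, w in weights[j].items():
            gain = (w - weights[j][v]) * target_sign
            if gain > best_gain:
                best_val, best_gain = val, gain
        if best_val is not None:
            moves.append((best_gain, j, best_val))

    # Features contribute independently, so applying the largest gains
    # first uses the fewest changes.
    moves.sort(reverse=True)
    changes = []
    for gain, j, val in moves:
        if score * target_sign > 0:
            break
        score += gain * target_sign
        changes.append((j, val))
    return changes if score * target_sign > 0 else None


if __name__ == "__main__":
    example_weights = [{0: -1.0, 1: 1.0}, {0: -0.5, 1: 0.5}, {0: -0.2, 1: 0.2}]
    print(counterfactual([1, 1, 1], 0.0, example_weights))
```

The returned list of changes is the kind of record that could be stored and reused later, which is one possible reading of the "source of knowledge" idea in the abstract.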

arXiv.org e-Print Archive

Exploiting and Ranking Dominating Product Features through Communal Sentiments

Author: Mr. S. P. Ghode, Prof. S. S. Bere
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 30/06/2015

The rapid expansion of e-commerce has made it easy for consumers to purchase products online, where many brands and millions of products are offered. A wide variety of customer reviews is now available on the internet. These reviews are important for consumers as well as merchants. Most reviews are disorganized, which makes the information hard to use. In this paper we propose a product feature ranking framework that identifies important product features from online customer opinions and aims to improve the usability of the reviews. The important product features are recognized using two observations: 1) important features are commented on by a large number of users, and 2) users' opinions on important features greatly influence their overall opinions of the product. We first identify product features with a shallow dependency parser and determine customers' opinions on these features via a sentiment classifier. We then develop a probabilistic feature ranking algorithm that infers the importance of each feature by considering both its frequency and the influence of the users' opinions on that feature over their overall opinions. DOI: 10.17762/ijritcc2321-8169.15068
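The two observations in the abstract above (frequency of mention, and influence on the overall opinion) can be illustrated with a small Python sketch. This is a simplified stand-in, not the paper's probabilistic algorithm: it assumes feature extraction and sentiment classification have already been done, represents each review as an overall polarity plus per-feature polarities (+1 or -1), and scores a feature as its mention frequency multiplied by how often its polarity agrees with the overall polarity. The function name `rank_features` and the data layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumption: each review is (overall polarity, {feature: polarity}),
# with polarities already produced by some upstream sentiment classifier.
Review = Tuple[int, Dict[str, int]]


def rank_features(reviews: List[Review]) -> List[str]:
    """Order features by frequency times agreement with the overall opinion."""
    total = len(reviews)
    mentions: Dict[str, int] = defaultdict(int)
    agreements: Dict[str, int] = defaultdict(int)

    for overall, opinions in reviews:
        for feature, polarity in opinions.items():
            mentions[feature] += 1
            if polarity == overall:
                agreements[feature] += 1

    scores = {}
    for feature, count in mentions.items():
        frequency = count / total               # observation 1
        influence = agreements[feature] / count  # observation 2 (crude proxy)
        scores[feature] = frequency * influence

    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda f: (-scores[f], f))


if __name__ == "__main__":
    sample = [
        (1, {"battery": 1, "screen": 1}),
        (-1, {"battery": -1, "case": 1}),
        (1, {"battery": 1}),
        (-1, {"screen": 1}),
    ]
    print(rank_features(sample))
```

Multiplying the two factors is only one way to combine them; a weighted sum or a fitted model would be equally reasonable under the same assumptions.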

International Journal on Recent and Innovation Trends in Computing and Communication

Deducing and Ordering Most-influencing Product Features through Well-established Sentiments using NLP

Author: Mr. Amit S. Kamale, Prof. Prakash B. Dhainje, Dr. Pradip K. Deshmukh
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 31/07/2015

The rapid expansion of e-commerce has encouraged shoppers to buy items on the web, where many brands and a huge number of items are offered. A variety of customer reviews is now available on the web. These free reviews are important for buyers as well as merchants. Most of the reviews are disorganized, which makes it unclear how useful the data is. In this paper we propose a product feature ranking framework that identifies important features (aspects) of products from online customer reviews and aims to improve the usability of these reviews. The important aspects of a product can usually be identified using two observations: 1) the critical aspects are commented on by a larger audience, and 2) customers' opinions on the key aspects significantly influence their overall opinions of the product. We first identify product aspects with a shallow dependency parser and determine customers' opinions on these aspects by means of a sentiment classifier. We then propose a probabilistic algorithm that detects features and orders them by rank, determining the importance of each feature by considering its frequency and the influence of customers' opinions on that feature over their overall reviews. DOI: 10.17762/ijritcc2321-8169.150711

International Journal on Recent and Innovation Trends in Computing and Communication

Flexible Graph-based Learning with Applications to Genetic Data Analysis

Author: Liu Jianyu
Publication venue: University of North Carolina at Chapel Hill Graduate School
Publication date: 01/01/2019

With the abundance of increasingly complex and high dimensional data in many scientific disciplines, graphical models have become an extremely useful statistical tool to explore data structures. In this dissertation, we study graphical models from two perspectives: i) to enhance supervised learning, classification in particular, and ii) graphical model estimation for specific data types. For classification, the optimal classifier is often connected with the feature structure within each class. In the first project, starting from the Gaussian population scenario, we aim to find an approach to utilize the graphical structure information of the features in classification. With respect to graphical models, many existing graphical estimation methods have been proposed based on a homogeneous Gaussian population. Due to the Gaussian assumption, these methods may not be suitable for many typical genetic data. For instance, gene expression data may come from individuals of multiple populations with possibly distinct graphical structures. Another instance is single cell RNA-sequencing data, which are characterized by substantial sample dependence and zero-inflation. In the second and third projects, we propose multiple graphical model estimation methods for these scenarios respectively. In particular, two dependent count-data graphical models are introduced for the latter case. Both numerical and theoretical studies are performed to demonstrate the effectiveness of these methods. Doctor of Philosophy

Carolina Digital Repository