
    Learning From Labeled And Unlabeled Data: An Empirical Study Across Techniques And Domains

    There has been increased interest in devising learning techniques that combine unlabeled data with labeled data, i.e., semi-supervised learning. However, to the best of our knowledge, no study has been performed across various techniques and different types and amounts of labeled and unlabeled data. Moreover, most published work on semi-supervised learning assumes that the labeled and unlabeled data come from the same distribution. It is possible for the labeling process to be associated with a selection bias such that the distributions of data points in the labeled and unlabeled sets differ. Not correcting for such bias can result in biased function approximation with potentially poor performance. In this paper, we present an empirical study of various semi-supervised learning techniques on a variety of datasets. We attempt to answer questions such as the effect of independence or relevance among features, the effect of the sizes of the labeled and unlabeled sets, and the effect of noise. We also investigate the impact of sample-selection bias on the semi-supervised learning techniques under study and implement a bivariate probit technique specifically designed to correct for such bias.
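
    As a concrete illustration of the kind of technique surveyed here, below is a minimal self-training sketch in Python: pseudo-label the unlabeled points the current model is most confident about, fold them into the labeled set, and retrain. The base classifier, confidence threshold, and iteration cap are our assumptions, not the paper's experimental setup.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def self_train(X_lab, y_lab, X_unlab, threshold=0.95, max_iter=10):
            """Self-training: iteratively pseudo-label confident unlabeled points."""
            for _ in range(max_iter):
                clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
                if len(X_unlab) == 0:
                    break
                proba = clf.predict_proba(X_unlab)
                sure = proba.max(axis=1) >= threshold  # keep confident predictions only
                if not sure.any():
                    break
                X_lab = np.vstack([X_lab, X_unlab[sure]])
                y_lab = np.concatenate([y_lab, clf.classes_[proba[sure].argmax(axis=1)]])
                X_unlab = X_unlab[~sure]
            return clf

    Note that if the labeled and unlabeled sets come from different distributions (the sample-selection bias discussed above), the pseudo-labels inherit that bias, which is precisely why a correction such as the bivariate probit technique matters.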

    Box Drawings for Learning with Imbalanced Data

    The vast majority of real-world classification problems are imbalanced, meaning there are far fewer data from the class of interest (the positive class) than from other classes. We propose two machine learning algorithms to handle highly imbalanced classification problems. The classifiers constructed by both methods are created as unions of axis-parallel rectangles around the positive examples, and thus have the benefit of being interpretable. The first algorithm uses mixed integer programming to optimize a weighted balance between positive and negative class accuracies. Regularization is introduced to improve generalization performance. The second method uses an approximation to improve scalability. Specifically, it follows a "characterize then discriminate" approach, where the positive class is first characterized by boxes, and then each box boundary becomes a separate discriminative classifier. This method has the computational advantages that it can be easily parallelized and considers only the relevant regions of feature space.
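
    A hypothetical sketch of the "characterize then discriminate" idea follows; the clustering used to group positives into boxes and the margin parameter are our assumptions, not the paper's mixed integer formulation. Positive examples are wrapped in axis-parallel boxes, and a point is predicted positive if it falls inside the union.

        import numpy as np
        from sklearn.cluster import KMeans

        def fit_boxes(X_pos, n_boxes=3, margin=0.0):
            """Characterize the positive class as a union of axis-parallel boxes."""
            labels = KMeans(n_clusters=n_boxes, n_init=10).fit_predict(X_pos)
            boxes = []
            for k in range(n_boxes):
                pts = X_pos[labels == k]
                if len(pts):
                    boxes.append((pts.min(axis=0) - margin, pts.max(axis=0) + margin))
            return boxes

        def predict(X, boxes):
            """Label a point 1 (positive) if it lies inside any box, else 0."""
            inside = np.zeros(len(X), dtype=bool)
            for lo, hi in boxes:
                inside |= np.all((X >= lo) & (X <= hi), axis=1)
            return inside.astype(int)

    In the discriminate step, each box boundary would then be refined by its own classifier, which is what makes the approach easy to parallelize.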

    Class Balanced Similarity-Based Instance Transfer Learning for Botnet Family Classification

    The use of transfer learning algorithms for enhancing the performance of machine learning algorithms has gained attention over the last decade. In this paper we introduce an extension and evaluation of our novel approach, Similarity-Based Instance Transfer Learning (SBIT). The extended version is denoted Class-Balanced SBIT (or CB-SBIT for short) because it ensures that the dataset resulting from instance transfer does not contain class imbalance. We compare the performance of CB-SBIT against the original SBIT algorithm. In addition, we compare its performance against that of the classical Synthetic Minority Over-sampling Technique (SMOTE) using network traffic data. We also compare the performance of CB-SBIT against the performance of the open source transfer learning algorithm TransferBoost using text data. Our results show that CB-SBIT outperforms the original SBIT and SMOTE on varying sizes of network traffic data but falls short when compared to TransferBoost on text data.
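
    A rough Python sketch of class-balanced, similarity-based instance transfer is shown below; the cosine scoring and the fixed per-class cap are our assumptions rather than the actual CB-SBIT selection rule. Each source instance is scored by its best similarity to the target data, and the top-scoring instances are transferred in equal numbers per class.

        import numpy as np
        from sklearn.metrics.pairwise import cosine_similarity

        def cb_transfer(X_src, y_src, X_tgt, y_tgt, per_class=50):
            """Augment the target set with similar source instances, kept class-balanced."""
            sims = cosine_similarity(X_src, X_tgt).max(axis=1)  # best match in target
            X_out, y_out = [X_tgt], [y_tgt]
            for c in np.unique(y_tgt):
                idx = np.where(y_src == c)[0]
                top = idx[np.argsort(sims[idx])[::-1][:per_class]]  # most similar first
                X_out.append(X_src[top])
                y_out.append(y_src[top])
            return np.vstack(X_out), np.concatenate(y_out)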

    SMOTE: Synthetic Minority Over-sampling Technique

    An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominantly composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. It also shows that this combination can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
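
    The core of the method reduces to a short sketch (parameter names and random-number handling are ours): pick a minority point, pick one of its k nearest minority-class neighbors, and create a synthetic example a random fraction of the way between them.

        import numpy as np
        from sklearn.neighbors import NearestNeighbors

        def smote(X_min, n_synthetic, k=5, seed=0):
            """Generate synthetic minority examples by neighbor interpolation."""
            rng = np.random.default_rng(seed)
            nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)  # +1: nearest is self
            _, nbrs = nn.kneighbors(X_min)
            synthetic = []
            for _ in range(n_synthetic):
                i = rng.integers(len(X_min))
                j = nbrs[i][rng.integers(1, k + 1)]  # skip column 0 (the point itself)
                gap = rng.random()                   # uniform fraction in [0, 1)
                synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
            return np.asarray(synthetic)

    In the paper's experiments this over-sampling is combined with random under-sampling of the majority class.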

A review on quantification learning

    The task of quantification consists in providing an aggregate estimation (e.g. the class distribution in a classification problem) for unseen test sets, applying a model that is trained on a training set with a different data distribution. Several real-world applications demand methods of this kind, which do not require predictions for individual examples and instead focus on obtaining accurate estimates at an aggregate level. During the past few years, several quantification methods have been proposed from different perspectives and with different goals. This paper presents a unified review of the main approaches with the aim of serving as an introductory tutorial for newcomers in the field.
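
    As one concrete example of the methods such a review covers, below is a minimal sketch of Adjusted Classify & Count, a classic quantification approach; the function and argument names are ours. The raw fraction of test items the classifier labels positive is corrected using the true and false positive rates estimated on held-out training data.

        def adjusted_classify_and_count(cc_rate, tpr, fpr):
            """Estimate positive-class prevalence p from a raw classify-and-count rate.

            Inverts cc_rate = p * tpr + (1 - p) * fpr for p.
            """
            if tpr == fpr:                     # degenerate classifier: no correction possible
                return cc_rate
            p = (cc_rate - fpr) / (tpr - fpr)
            return min(1.0, max(0.0, p))       # clip to a valid prevalence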

    Effect of plant growth regulators on flowering behavior of cashew cv. Vengurla-4 grown in the hilly tracts of South Gujarat

    A trial was conducted at the Subhir and Chikhalda locations in Dang district of South Gujarat, India to assess the effect of Ethrel, NAA and GA3 on the flowering behavior of cashew cultivar Vengurla-4 during 2013-14. Three concentrations each of GA3 (50, 75, 100 ppm), Ethrel (10, 30, 50 ppm) and NAA (50, 75, 100 ppm) were applied as foliar sprays 20 days before blossoming and 20 days after full bloom in twenty-year-old trees of cashew cultivar Vengurla-4. Trees sprayed with 50 ppm Ethrel recorded the significantly highest number of flowering panicles per square meter (13.09), number of perfect flowers per panicle (87.11) and sex ratio (0.24) across locations and in the pooled data. However, this was at par with 10 ppm Ethrel, which emerged as the second best treatment of the trial. This study demonstrated the potential of Ethrel in improving various flowering parameters of cashew, which are important determinants of increased nut production.

    Identification of microsatellite markers on chromosomes of bread wheat showing an association with Karnal bunt resistance

    A set of 104 wheat recombinant inbred lines developed from a cross between parents resistant (HD 29) and susceptible (WH 542) to Karnal bunt (caused by Neovossia indica) were screened and used to identify SSR markers linked with resistance to Karnal bunt, as these would allow indirect marker-assisted selection of Karnal bunt resistant genotypes. The two parents were analysed with 46 SSR primer pairs. Of these, 15 (32%) were found polymorphic between the two parental genotypes. Using these primer pairs, we carried out bulked segregant analysis on two bulked DNAs, one obtained by pooling DNA from 10 Karnal bunt resistant recombinant inbred lines and the other similarly derived by pooling DNA from 10 Karnal bunt susceptible recombinant inbred lines. Two molecular markers, Xgwm 337-1D and Xgwm 637-4A, showed apparent linkage with resistance to Karnal bunt. This was confirmed following selective genotyping of the individual recombinant inbred lines included in the bulks. These markers may be useful in marker-assisted selection for Karnal bunt resistance in wheat.