62 research outputs found

Improved imbalanced classification through convex space learning

Author: Bej Saptarshi (gnd: 1252250940)
Publication venue: Universität Rostock Rostock

Imbalanced datasets for classification problems, characterised by an unequal distribution of samples across classes, are abundant in practical scenarios. Oversampling algorithms generate synthetic data to improve classification performance on such datasets. In this thesis, I discuss two algorithms, LoRAS and ProWRAS, which improve on the state of the art, as shown through rigorous benchmarking on publicly available datasets. A biological application, the detection of rare cell types from single-cell transcriptomics data, is also discussed. The thesis also provides a better theoretical understanding of oversampling

Rostocker Dokumentenserver

SMOTEFUNA: Synthetic Minority Over-Sampling Technique Based on Furthest Neighbour Algorithm

Author: Almohammadi Khalid
Bellinger Colin
Csetverikov Dmitrij
Hassanat Ahmad B. A.
Tarawneh Ahmad S.
Publication date: 01/01/2020

SZTAKI Publication Repository

Learning from Multi-Class Imbalanced Big Data with Apache Spark

Author: Sleeman William C, IV
Publication venue: VCU Scholars Compass
Publication date: 01/01/2021

With data becoming a new form of currency, its analysis has become a top priority in both academia and industry, furthering advancements in high-performance computing and machine learning. However, these large, real-world datasets come with additional complications such as noise and class overlap. Problems are magnified when multi-class data is presented, especially since many of the popular algorithms were originally designed for binary data. Another challenge arises when the examples are not evenly distributed across all classes in a dataset. This often causes classifiers to favor the majority class over the minority classes, leading to undesirable results, as learning from the rare cases may be the primary goal. Many of the classic machine learning algorithms were not designed for multi-class, imbalanced data or parallelism, and so their effectiveness has been hindered. This dissertation addresses some of these challenges through in-depth experimentation with novel implementations of machine learning algorithms built on Apache Spark, a distributed computing framework based on the MapReduce model and designed to handle very large datasets. Experimentation showed that many of the traditional classifier algorithms do not translate well to a distributed computing environment, indicating the need for a new generation of algorithms targeting modern high-performance computing. A collection of popular oversampling methods, originally designed for small binary-class datasets, has been implemented using Apache Spark for the first time to improve parallelism and add multi-class support. An extensive study on how instance-level difficulty affects learning from large datasets was also performed

VCU Scholars Compass

Iterative Training Sample Expansion to Increase and Balance the Accuracy of Land Classification from VHR Imagery

Author: Benediktsson Jon Atli
Foody Giles M.
Jin Zhenong
Li Guangfei
Lv Zhiyong
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2021

Imbalanced training sets are known to produce suboptimal maps for supervised classification. Therefore, one challenge in mapping land cover is acquiring training data that will allow classification with high overall accuracy (OA) in which each class is also mapped with a similar user's accuracy. To solve this problem, we integrated local adaptive region and box-and-whisker plot (BP) techniques into an iterative algorithm to expand the size of the training sample for selected classes in this article. The major steps of the proposed algorithm are as follows. First, a very small initial training sample (ITS) for each class set is labeled manually. Second, potential new training samples are found within an adaptive region by conducting local spectral variation analysis. Lastly, three new training samples are acquired to capture information regarding intraclass variation; these samples lie in the lower, median, and upper quartiles of BP. After these new training samples are added to the ITS, the classifier is retrained and the process is continued iteratively until termination. The proposed approach was applied to three very high-resolution (VHR) remote-sensing images and compared with a set of cognate methods. The comparison demonstrated that the proposed approach produced the best result in terms of OA and exhibited superiority in balancing user's accuracy. For example, the proposed approach was typically 2%-10% more accurate than the compared methods in terms of OA and it generally yielded the most balanced classification

Repository@Nottingham

Tracking the Temporal-Evolution of Supernova Bubbles in Numerical Simulations

Author: Bunte Kerstin
Canducci Marco
De Rijcke Sven
Mastropietro Michele
Peletier Reynier
Taghribi Abolfazl
Tino Peter
Yin H.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/11/2021

The study of low-dimensional, noisy manifolds embedded in a higher-dimensional space has been extremely useful in many applications, from the chemical analysis of multi-phase flows to simulations of galactic mergers. Building a probabilistic model of the manifolds has helped in describing their essential properties and how they vary in space. However, when the manifold is evolving through time, joint spatio-temporal modelling is needed to fully comprehend its nature. We propose a first-order Markovian process that propagates the spatial probabilistic model of a manifold at a fixed time to its adjacent temporal stages. The proposed methodology is demonstrated using a particle simulation of an interacting dwarf galaxy to describe the evolution of a cavity generated by a Supernov

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

University of Birmingham Research Portal

Dissertations of the University of Groningen

Approach to identify product and process state drivers in manufacturing systems using supervised machine learning

Author: Wuest Thorsten
Publication date: 01/01/2014

The developed concept allows relevant state drivers of complex, multi-stage manufacturing systems to be identified holistically. It is able to utilize the complex, diverse and high-dimensional data sets which often occur in manufacturing applications and to integrate the important process intra- and inter-relations. The evaluation was conducted using three different scenarios from distinct manufacturing domains (aviation, chemical and semiconductor). The evaluation confirmed that it is possible to incorporate implicit process intra- and inter-relations at the process as well as the programme level by applying SVM-based feature ranking. The analysis outcome presents a direct benefit for practitioners in the form of the most important process parameters and state characteristics, so-called state drivers, of a manufacturing system. Given the increasing availability of data and information, this selection support can be directly utilized in, e.g., quality monitoring and advanced process control

E-LIB Dokumentserver - Staats und Universitätsbibliothek Bremen

Dealing with imbalanced and weakly labelled data in machine learning using fuzzy and rough set methods

Author: Vluymans Sarah
Publication venue: Ghent University. Faculty of Medicine and Health Sciences ; University of Granada. Department of Computer Science and Artificial Intelligence
Publication date: 01/01/2018

Ghent University Academic Bibliography