SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary
The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is
considered the "de facto" standard in the framework of learning from imbalanced data. This
is due to its simplicity in the design of the procedure, as well as its robustness when applied
to different types of problems. Since its publication in 2002, SMOTE has proven
successful in a variety of applications from several different domains. SMOTE has also inspired
several approaches to counter the issue of class imbalance, and has also significantly
contributed to new supervised learning paradigms, including multilabel classification, incremental
learning, semi-supervised learning, and multi-instance learning, among others. It is a
standard benchmark for learning from imbalanced data. It is also featured in a number of
different software packages, from open source to commercial. In this paper, marking the
fifteen-year anniversary of SMOTE, we reflect on the SMOTE journey, discuss the current
state of affairs with SMOTE, its applications, and also identify the next set of challenges
to extend SMOTE for Big Data problems. This work has been partially supported by the Spanish Ministry of Science and Technology
under projects TIN2014-57251-P, TIN2015-68454-R and TIN2017-89517-P; the Project
887 BigDaP-TOOLS - Ayudas Fundación BBVA a Equipos de Investigación Científica 2016;
and the National Science Foundation (NSF) Grant IIS-1447795.
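The core idea behind SMOTE, as published in 2002, is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbors. The following is a minimal sketch of that idea, not the authors' reference implementation: the function name `smote`, its parameters, and the choice to pick base samples at random (rather than looping over every minority sample) are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority neighbors.

    minority: list of equal-length numeric tuples (minority-class samples).
    Simplification (assumption): base samples are drawn at random.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbors of x (Euclidean), excluding x itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbors)
        # Random point on the segment between x and the chosen neighbor
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


if __name__ == "__main__":
    points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    for sample in smote(points, 3, k=2):
        print(sample)
```

Because each synthetic point lies on a segment between two existing minority samples, the new points stay inside the region the minority class already occupies.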
Recommended from our members
Similarity hash based scoring of portable executable files for efficient malware detection in IoT
The current rise in malicious attacks shows that existing security systems are bypassed by malicious files. Similarity hashing has been adopted for sample triaging in malware analysis and detection. File similarity is used to cluster malware into families so that a common signature can be designed for each. This paper explores four hash types currently used in malware analysis for portable executable (PE) files. Although each hashing technique produces interesting results, when applied independently they have high false detection rates. This paper investigates the central issue of how different hashing techniques can be combined to provide a quantitative malware score and to achieve better detection rates. We design and develop a novel approach for malware scoring based on the hash results. The proposed approach is evaluated through a number of experiments. The evaluation clearly demonstrates a significant improvement (> 90%) in true detection rates of malware.
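The abstract does not state how the four hash results are combined, so the following is only an illustrative sketch of one simple possibility: a weighted average of per-hash similarity scores. The function name `combined_score`, the placeholder hash names, the 0 to 100 score range, and the weights are all hypothetical and are not taken from the paper.

```python
def combined_score(similarities, weights=None):
    """Weighted average of per-hash similarity scores (assumed range 0-100).

    similarities: dict mapping a hash name to its similarity score.
    weights: optional dict mapping the same names to non-negative weights;
             defaults to equal weighting (an assumption for illustration).
    """
    if not similarities:
        raise ValueError("no similarity scores given")
    if weights is None:
        weights = {name: 1.0 for name in similarities}
    total = sum(weights[name] for name in similarities)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[n] * s for n, s in similarities.items()) / total


if __name__ == "__main__":
    # Placeholder hash names; the paper's four hash types are not named here.
    scores = {"hash_a": 80.0, "hash_b": 40.0}
    print(combined_score(scores))
    print(combined_score(scores, {"hash_a": 3.0, "hash_b": 1.0}))
```

A single combined score lets one threshold be tuned instead of four, which is one plausible way a combination could reduce the false detections seen when each hash is used alone.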
Generalized topographic block model
Co-clustering leads to parsimony in data visualisation with a number of parameters dramatically reduced in comparison to the dimensions of the data sample. Herein, we propose a new generalized approach for nonlinear mapping by a re-parameterization of the latent block mixture model. The densities modeling the blocks are in an exponential family such that the Gaussian, Bernoulli and Poisson laws are particular cases. The inference of the parameters is derived from the block expectation–maximization algorithm with a Newton–Raphson procedure at the maximization step. Empirical experiments with textual data validate the interest of our generalized model
Serotonergic modulation of cognitive computations
Serotonin is a neuromodulator that is implicated in the sleep-wake cycle, motor behaviors, reward, motivation, and mood. Recent molecular tools for cell-type-specific activity recording and manipulation with fine temporal and spatial resolution are providing unprecedentedly detailed data about serotonergic neuromodulation. This newly gained information shows substantial differences in the signaling and effects of serotonergic neuromodulation depending on the projection targets. To find the common denominator for this diversity, we conjecture that the evolution of serotonergic neuromodulation originates from signaling the time and resources available for action, learning, and development.
A Comprehensive Study on Pedestrians' Evacuation
Unexpected events put people at risk, and an adequate crisis evacuation plan is
vital to preventing the injuries and deaths that can result. Consequently, a
variety of pedestrian evacuation models have been created. Moreover, through
applied research, these models have been examined across various applications,
simulations, and conditions in order to arrive at an operational model.
Furthermore, new models have been developed to support evacuation systems in
residential places in case of unexpected events. This research presents an
inclusive, systematic survey of pedestrian evacuation that describes the
models and methods by focusing on their applications' features, techniques,
and implications, and then groups them under various types, for example
classical models, hybridized models, and generic models. The current analysis
assists scholars in this field in writing their forthcoming papers, and may
suggest a novel structure for recent typical intelligent simulations with
novel features.
Engaging adolescents with Down syndrome in an educational video game
This article describes the design, implementation and evaluation of an educational video game that helps individuals with Down syndrome to improve their speech skills, specifically those related to prosody. Special attention has been paid to the design of the user interface, taking into account the cognitive, learning, and attentional limitations of people with Down syndrome. The learning content is conveyed by activities of production and perception of prosodic phenomena, aimed at increasing their communicative competence. These activities are introduced within the narrative of a video game so that the players do not perceive the tool as a mere succession of learning activities, but instead learn and improve their speech while playing. The evaluation strategy that has been followed involves real users and combines different evaluation activities. Results show a high level of acceptance by participants and also by professionals, speech therapists, and special education teachers. 2018-09-01. MEC-FEDER Grant TIN2014-59852-R and the Junta de Castilla y León Regional Grant VA145U1
Get Your Foes Fooled: Proximal Gradient Split Learning for Defense Against Model Inversion Attacks on IoMT Data
The past decade has seen a rapid adoption of Artificial Intelligence (AI), specifically deep learning networks, in the Internet of Medical Things (IoMT) ecosystem. However, it has been shown recently that deep learning networks can be exploited by adversarial attacks that make the IoMT vulnerable not only to data theft but also to the manipulation of medical diagnosis. Existing studies consider adding noise to the raw IoMT data or model parameters, which not only reduces the overall performance of medical inference but is also ineffective against methods such as deep leakage from gradients. In this work, we propose the proximal gradient split learning (PGSL) method for defense against model inversion attacks. The proposed method intentionally attacks the IoMT data while it undergoes the deep neural network training process at the client side. We propose the use of the proximal gradient method to recover gradient maps and a decision-level fusion strategy to improve the recognition performance. Extensive analyses show that PGSL not only provides an effective defense mechanism against model inversion attacks but also helps improve the recognition performance on publicly available datasets. We report 14.0%, 17.9%, and 36.9% gains in accuracy over reconstructed and adversarial attacked images, respectively.
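For readers unfamiliar with the proximal gradient method mentioned above, the generic iteration alternates a gradient step on a smooth term with a proximal step on a non-smooth term. The sketch below applies it to a toy problem, minimizing 0.5 * ||x - b||^2 + lam * ||x||_1, whose proximal step is soft-thresholding. This toy objective, the function names, and the step size are assumptions chosen for illustration; the abstract does not specify the objective used in PGSL.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * |v|: shrink v toward zero by tau."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def proximal_gradient(b, lam, step=0.5, iters=100):
    """Minimize 0.5 * ||x - b||^2 + lam * ||x||_1 by proximal gradient steps."""
    x = [0.0] * len(b)
    for _ in range(iters):
        # Gradient of the smooth term 0.5 * ||x - b||^2
        grad = [xi - bi for xi, bi in zip(x, b)]
        # Gradient step, then proximal step on the L1 term
        x = [soft_threshold(xi - step * gi, step * lam)
             for xi, gi in zip(x, grad)]
    return x


if __name__ == "__main__":
    print(proximal_gradient([3.0, -0.5, 1.2], lam=1.0))
```

For this toy objective the minimizer is known in closed form (soft-thresholding of `b` by `lam`), which makes it a convenient check that the iteration converges.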