SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary
The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is
considered the "de facto" standard in the framework of learning from imbalanced data. This
is due to the simplicity of its design, as well as its robustness when applied
to different types of problems. Since its publication in 2002, SMOTE has proven
successful in a variety of applications from several different domains. SMOTE has also inspired
several approaches to counter the issue of class imbalance, and has significantly
contributed to new supervised learning paradigms, including multilabel classification, incremental
learning, semi-supervised learning, and multi-instance learning, among others. It is
a standard benchmark for learning from imbalanced data. It is also featured in a number of
different software packages, from open source to commercial. In this paper, marking the
fifteen-year anniversary of SMOTE, we reflect on the SMOTE journey, discuss the current
state of affairs with SMOTE and its applications, and identify the next set of challenges
to extend SMOTE for Big Data problems.
This work has been partially supported by the Spanish Ministry of Science and Technology
under projects TIN2014-57251-P, TIN2015-68454-R and TIN2017-89517-P; the Project
887 BigDaP-TOOLS - Ayudas Fundación BBVA a Equipos de Investigación Científica 2016;
and the National Science Foundation (NSF) Grant IIS-1447795.
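The abstract above does not spell out the procedure, so the following is only a minimal sketch of the core SMOTE idea as it is commonly described: each synthetic point is placed at a random position on the segment between a minority sample and one of its k nearest minority neighbors. The function name `smote`, its parameters, and the brute-force neighbor search are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np


def smote(X_min, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by neighbor interpolation.

    Illustrative sketch only: names and defaults are assumptions.
    X_min       -- array of shape (n_samples, n_features), minority class only
    n_synthetic -- number of synthetic samples to create
    k           -- number of nearest minority neighbors to choose from
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Brute-force pairwise squared Euclidean distances between minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a sample is never its own neighbor

    # Indices of the k nearest minority neighbors of every sample.
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Pick a base sample and one of its neighbors for each synthetic point.
    base = rng.integers(0, n, size=n_synthetic)
    chosen = neighbors[base, rng.integers(0, k, size=n_synthetic)]

    # Interpolate at a random position along the segment base -> neighbor.
    gap = rng.random((n_synthetic, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])
```

Because every synthetic point is a convex combination of two existing minority samples, the output stays inside the bounding box of the minority class. Applying such oversampling only to the training split, never to the test data, keeps synthetic points out of the evaluation.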
An Empirical Study of AML Approach for Credit Card Fraud Detection—Financial Transactions
Credit card fraud is one of the downsides of the digital world, in which transactions are made without the knowledge of the genuine user. Based on a study of papers published between 1994 and 2018 on credit card fraud, the following objectives are achieved: the various types of credit card fraud are identified, and the adaptive machine learning techniques (AMLTs) used to detect these frauds automatically are studied, with their pros and cons summarized. The datasets used in the literature are studied and categorized into real and synthesized datasets. The performance metrics and evaluation criteria used to evaluate fraud detection systems are summarized. This study also covers an in-depth analysis and comparison of the performance (i.e., sensitivity, specificity, and accuracy) of existing machine learning techniques in the credit card fraud detection area. The findings clearly show that supervised learning, card-not-present fraud, skimming fraud, and the website cloning method have been used more frequently. This study helps new researchers by discussing the limitations of existing fraud detection techniques and providing helpful directions for research in the credit card fraud detection field.
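The three performance measures named in this abstract follow directly from the confusion matrix. The sketch below uses the standard definitions and treats fraud as the positive class; the function name `fraud_metrics` and the toy labels are hypothetical and do not come from the study.

```python
def fraud_metrics(y_true, y_pred, positive=1):
    """Return sensitivity, specificity, and accuracy for binary labels.

    Illustrative helper: the positive class is assumed to be "fraud".
    """
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1  # fraud correctly flagged
        elif truth == positive:
            fn += 1  # fraud missed
        elif pred == positive:
            fp += 1  # genuine transaction wrongly flagged
        else:
            tn += 1  # genuine transaction correctly passed

    def ratio(num, den):
        # Undefined ratios (empty class) are reported as NaN, not as an error.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of frauds detected
        "specificity": ratio(tn, tn + fp),  # share of genuine transactions passed
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


# Toy example: 2 frauds among 10 transactions, one caught, one false alarm.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(fraud_metrics(y_true, y_pred))
# {'sensitivity': 0.5, 'specificity': 0.875, 'accuracy': 0.8}
```

The toy example shows why accuracy alone can mislead on imbalanced fraud data: accuracy is 0.8 even though only half of the frauds are detected.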
Model-based and Model-free Approaches for Power System Security Assessment
Continuous security assessment of a power system is necessary to ensure a reliable, stable, and continuous supply of electrical power to customers. To this end, this dissertation identifies and explores some of the various challenges encountered in the field of power system security assessment. Accordingly, several model-based and/or model-free approaches were developed to overcome these challenges.
First, a voltage stability index, named TAVSI, is proposed. This index has three important features: TAVSI applies to general load models including ZIP, exponential, and induction motor loads; TAVSI can be used for both measurement-based and model-based voltage stability assessment; and finally, TAVSI is calculated based on normalized sensitivities which enables identification of weak buses and the definition of a global instability threshold. TAVSI was tested on both the IEEE 14-bus and the 181-bus WECC systems. Results show that TAVSI gives a reliable assessment of system stability.
Second, a data-driven and model-based hybrid reinforcement learning approach is proposed for training a control agent to re-dispatch generators’ output power in order to relieve stressed branches. For large power systems, the agent’s action space is high-dimensional, which makes it difficult to train data-driven agents successfully. Therefore, we propose a hybrid approach where model-based actions are utilized to help the agent learn an optimal control policy. The proposed approach was tested and compared to the generic data-driven DDPG-based approach on the IEEE 118-bus system and a larger 2749-bus real-world system. Results show that the hybrid approach performs well for large power systems and that it is superior to the DDPG-based approach.
Finally, a Convolutional Neural Network (CNN) based approach is proposed as a faster alternative to the classical AC power flow-based contingency screening. The proposed approach is investigated on both the IEEE 118-bus system and the Texas 2000-bus synthetic system. For such large systems, the implementation of the proposed approach came with several challenges, such as computational burden, learning from imbalanced datasets, and performance evaluation of trained models. Accordingly, this work contributes a set of novel techniques and best practices that enable both efficient and successful implementation of CNN-based multi-contingency classifiers for large power systems.
Event program
UNLV undergraduates from all departments, programs and colleges participated in a campus-wide symposium on April 16, 2011. Undergraduate posters from all disciplines, as well as oral presentations of research activities, readings and other creative endeavors, were exhibited throughout the festival.
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Recent AI-assistant agents, such as ChatGPT, predominantly rely on supervised
fine-tuning (SFT) with human annotations and reinforcement learning from human
feedback (RLHF) to align the output of large language models (LLMs) with human
intentions, ensuring they are helpful, ethical, and reliable. However, this
dependence can significantly constrain the true potential of AI-assistant
agents due to the high cost of obtaining human supervision and the related
issues on quality, reliability, diversity, self-consistency, and undesirable
biases. To address these challenges, we propose a novel approach called
SELF-ALIGN, which combines principle-driven reasoning and the generative power
of LLMs for the self-alignment of AI agents with minimal human supervision. Our
approach encompasses four stages: first, we use an LLM to generate synthetic
prompts, and a topic-guided method to augment the prompt diversity; second, we
use a small set of human-written principles for AI models to follow, and guide
the LLM through in-context learning from demonstrations (of principles
application) to produce helpful, ethical, and reliable responses to users'
queries; third, we fine-tune the original LLM with the high-quality
self-aligned responses so that the resulting model can generate desirable
responses for each query directly without the principle set and the
demonstrations anymore; and finally, we offer a refinement step to address the
issues of overly-brief or indirect responses. Applying SELF-ALIGN to the
LLaMA-65b base language model, we develop an AI assistant named Dromedary. With
fewer than 300 lines of human annotations (including < 200 seed prompts, 16
generic principles, and 5 exemplars for in-context learning), Dromedary
significantly surpasses the performance of several state-of-the-art AI systems,
including Text-Davinci-003 and Alpaca, on benchmark datasets with various
settings. Comment: Accepted at NeurIPS 2023 (Spotlight). Project page:
https://github.com/IBM/Dromedar