7 research outputs found
EFN-SMOTE: An effective oversampling technique for credit card fraud detection by utilizing noise filtering and fuzzy c-means clustering
Credit card fraud poses a significant challenge for both consumers and organizations worldwide, particularly with the increasing reliance on credit cards for financial transactions. It is therefore crucial to establish effective mechanisms for detecting credit card fraud. However, the uneven distribution of instances between the two classes in credit card datasets hinders traditional machine learning techniques, as they tend to prioritize the majority class, leading to inaccurate fraud predictions. To address this issue, this paper focuses on the Elbow Fuzzy Noise Filtering SMOTE (EFN-SMOTE) technique, an oversampling approach for handling imbalanced data. EFN-SMOTE partitions the dataset into multiple clusters using the Elbow method, applies noise filtering to each cluster, and then employs SMOTE to synthesize new minority instances based on the nearest majority instance to each minority instance, thereby improving the model's ability to perceive the decision boundary. EFN-SMOTE's performance was evaluated using an Artificial Neural Network with four hidden layers, yielding significant improvements in classification performance: an accuracy of 0.999, precision of 0.998, sensitivity of 0.999, specificity of 0.998, F-measure of 0.999, and G-Mean of 0.999.
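To make the noise-filtering and oversampling steps concrete, here is a minimal Python sketch. The neighbourhood rule, parameter values, and the classic SMOTE interpolation toward the nearest minority neighbour are all illustrative assumptions (the paper's variant instead positions synthetic samples relative to the nearest majority instance, and runs per Elbow-derived cluster), not the authors' implementation.

```python
import math
import random

def _dist(a, b):
    # Euclidean distance between two feature vectors.
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def noise_filter(X, y, k=3):
    """Drop minority points (label 1) whose k nearest neighbours are all
    majority -- a simplified stand-in for the paper's per-cluster noise
    filtering."""
    kept_X, kept_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi == 0:
            kept_X.append(xi)
            kept_y.append(yi)
            continue
        neighbours = sorted((j for j in range(len(X)) if j != i),
                            key=lambda j: _dist(xi, X[j]))[:k]
        if any(y[j] == 1 for j in neighbours):
            kept_X.append(xi)
            kept_y.append(yi)
    return kept_X, kept_y

def smote_sketch(X, y, n_new=10, seed=0):
    """Synthesise minority samples by interpolating a minority point
    toward its nearest minority neighbour (classic SMOTE)."""
    rng = random.Random(seed)
    minority = [x for x, lab in zip(X, y) if lab == 1]
    new = []
    for _ in range(n_new):
        a = rng.choice(minority)
        b = min((m for m in minority if m is not a),
                key=lambda m: _dist(a, m))
        t = rng.random()  # interpolation factor in [0, 1)
        new.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return new
```

Synthetic samples stay inside the convex hull of the minority pairs they interpolate, which is why downstream classifiers see a denser, cleaner minority region.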
Recognition and Exploitation of Gate Structure in SAT Solving
In theoretical computer science, the SAT problem is the archetypal representative of the class of NP-complete problems, which is why efficient SAT solving is generally considered impossible.
Nevertheless, surprisingly good results are often achieved in practice, where some applications generate problems with millions of variables that recent SAT solvers can solve in reasonable time.
The practical success of SAT solving is due to current implementations of the Conflict-Driven Clause Learning (CDCL) algorithm, whose performance depends largely on the heuristics used, which implicitly exploit the structure of the instances generated in industrial practice.
In this thesis, we present a new generic algorithm for efficiently recognizing the gate structure in CNF encodings of SAT instances, as well as three approaches in which we exploit this structure explicitly.
Our contributions also include the implementation of these approaches in our SAT solver Candy and the development of a tool for the distributed management of benchmark instances and their attributes, the Global Benchmark Database (GBD).
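As a toy illustration of what gate recognition means at the clause level, the snippet below checks whether a clause set matches the standard Tseitin encoding of an AND gate, with literals written as signed DIMACS-style integers. This is only an assumed pattern check for one gate type; the paper's algorithm is generic and considerably more involved.

```python
def is_and_gate(clauses, out, inputs):
    """Check whether `clauses` is exactly the Tseitin encoding of
    out <-> AND(inputs): one binary clause (-out, i) per input i,
    plus the clause (out, -i1, ..., -in)."""
    need = {frozenset([-out, i]) for i in inputs}
    need.add(frozenset([out] + [-i for i in inputs]))
    return {frozenset(c) for c in clauses} == need
```

A solver that recognises such patterns can treat `out` as a defined variable rather than a free one, which is the kind of structural knowledge the heuristics can then exploit.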
Variational Imbalanced Regression: Fair Uncertainty Quantification via Probabilistic Smoothing
Existing regression models tend to fall short in both accuracy and uncertainty estimation when the label distribution is imbalanced. In this paper, we propose a probabilistic deep learning model, dubbed Variational Imbalanced Regression (VIR), which not only performs well in imbalanced regression but naturally produces reasonable uncertainty estimation as a byproduct. Different from typical variational autoencoders assuming I.I.D. representations (a data point's representation is not directly affected by other data points), our VIR borrows data with similar regression labels to compute the latent representation's variational distribution; furthermore, different from deterministic regression models producing point estimates, VIR predicts entire normal-inverse-gamma distributions and modulates the associated conjugate distributions to impose probabilistic reweighting on the imbalanced data, thereby providing better uncertainty estimation. Experiments on several real-world datasets show that VIR can outperform state-of-the-art imbalanced regression models in terms of both accuracy and uncertainty estimation. Code will soon be available at https://github.com/Wang-ML-Lab/variational-imbalanced-regression
Comment: Accepted at NeurIPS 202
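For reference, the predictive moments of a normal-inverse-gamma output can be read off in closed form; this is the standard NIG parameterisation used in evidential-style probabilistic regression, and the function below is a generic sketch rather than code taken from VIR itself.

```python
def nig_moments(gamma, nu, alpha, beta):
    """Closed-form moments of a Normal-Inverse-Gamma(gamma, nu, alpha, beta)
    over (mu, sigma^2). Requires alpha > 1 for the moments to be finite."""
    if alpha <= 1:
        raise ValueError("alpha must exceed 1 for finite moments")
    mean = gamma                            # E[mu]: the point prediction
    aleatoric = beta / (alpha - 1)          # E[sigma^2]: data noise
    epistemic = beta / (nu * (alpha - 1))   # Var[mu]: model uncertainty
    return mean, aleatoric, epistemic
```

Separating the two variance terms is what lets a model of this kind report both data noise and model uncertainty from a single predicted distribution.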
Translation Alignment and Extraction Within a Lexica-Centered Iterative Workflow
This thesis addresses two closely related problems. The first, translation alignment, consists of identifying bilingual document pairs that are translations of each other within multilingual document collections (document alignment); identifying sentences, titles, etc., that are translations of each other within bilingual document pairs (sentence alignment); and identifying corresponding word and phrase translations within bilingual sentence pairs (phrase alignment). The second is the extraction of bilingual pairs of equivalent word and multi-word expressions, which we call translation equivalents (TEs), from sentence- and phrase-aligned parallel corpora.
While these same problems have been investigated by other authors, their focus has been on fully unsupervised methods based mostly or exclusively on parallel corpora. Bilingual lexica, which are basically lists of TEs, have not been considered or given enough importance as resources in the treatment of these problems. Human validation of TEs, which consists of manually classifying TEs as correct or incorrect translations, has also not been considered in the context of alignment and extraction. Validation strengthens the importance of infrequent TEs (most of the entries of a validated lexicon) that would otherwise be statistically unimportant.
The main goal of this thesis is to revisit the alignment and extraction problems in the context of a lexica-centered iterative workflow that includes human validation. Therefore, the methods proposed in this thesis were designed to take advantage of knowledge accumulated in human-validated bilingual lexica and in translation tables obtained by unsupervised methods. Phrase-level alignment is a stepping stone for several applications, including the extraction of new TEs, the creation of statistical machine translation systems, and the creation of bilingual concordances. Therefore, for phrase-level alignment, the higher accuracy of human-validated bilingual lexica is crucial for achieving higher quality results in these downstream applications.
There are two main conceptual contributions. The first is the coverage maximization approach to alignment, which makes direct use of the information contained in a lexicon, or in translation tables when the lexicon is small or does not exist. The second is the introduction of translation patterns, which combine novel and old ideas and enable precise and productive extraction of TEs. As material contributions, the alignment and extraction methods proposed in this thesis have produced source materials for three lines of research, in the context of three PhD theses (two of them already defended), all sharing my advisor as supervisor. The topics of these lines of research are statistical machine translation, algorithms and data structures for indexing and querying phrase-aligned parallel corpora, and bilingual lexica classification and generation. Four publications have resulted directly from the work presented in this thesis and twelve from the collaborative lines of research.
Using Genetic Programming to Build Self-Adaptivity into Software-Defined Networks
Self-adaptation solutions need to periodically monitor, reason about, and adapt a running system. The adaptation step involves generating an adaptation strategy and applying it to the running system whenever an anomaly arises. In this article, we argue that, rather than generating individual adaptation strategies, the goal should be to adapt the control logic of the running system in such a way that the system itself learns how to steer clear of future anomalies, without triggering self-adaptation too frequently. While the need for adaptation is never eliminated, especially given the uncertain and evolving environment of complex systems, reducing the frequency of adaptation interventions is advantageous for various reasons, e.g., to increase performance and to make a running system more robust. We instantiate and empirically examine the above idea for software-defined networking -- a key enabling technology for modern data centres and Internet of Things applications. Using genetic programming (GP), we propose a self-adaptation solution that continuously learns and updates the control constructs in the data-forwarding logic of a software-defined network. Our evaluation, performed using open-source synthetic and industrial data, indicates that, compared to a baseline adaptation technique that attempts to generate individual adaptations, our GP-based approach is more effective in resolving network congestion and further reduces the frequency of adaptation interventions over time. In addition, we show that, for networks with the same topology, reusing over larger networks the knowledge learned on smaller networks leads to significant improvements in the performance of our GP-based adaptation approach. Finally, we compare our approach against a standard data-forwarding algorithm from the network literature, demonstrating that our approach significantly reduces packet loss.