857 research outputs found
Outlier detection using flexible categorisation and interrogative agendas
Categorization is one of the basic tasks in machine learning and data
analysis. Building on formal concept analysis (FCA), the starting point of the
present work is that different ways to categorize a given set of objects exist,
which depend on the choice of the sets of features used to classify them, and
different such sets of features may yield better or worse categorizations,
relative to the task at hand. In turn, the (a priori) choice of one set of
features over another might be subjective and express a
certain epistemic stance (e.g. interests, relevance, preferences) of an agent
or a group of agents, namely, their interrogative agenda. In the present paper,
we represent interrogative agendas as sets of features, and explore and compare
different ways to categorize objects w.r.t. different sets of features
(agendas). We first develop a simple unsupervised FCA-based algorithm for
outlier detection which uses categorizations arising from different agendas. We
then present a supervised meta-learning algorithm to learn suitable (fuzzy)
agendas for categorization as sets of features with different weights or
masses. We combine this meta-learning algorithm with the unsupervised outlier
detection algorithm to obtain a supervised outlier detection algorithm. We show
that these algorithms perform on a par with commonly used outlier detection
algorithms on standard benchmark datasets for outlier detection. These
algorithms provide both local and global explanations of their results.
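The core idea of the abstract above — that an object is more outlying when few other objects share its features under a given agenda — can be sketched in a few lines. This is an illustrative toy, not the paper's algorithm: `extent`, `outlier_scores`, the toy context, and the scoring rule (average inverse extent size over agendas) are all assumptions made for the example.

```python
# Minimal sketch (hypothetical, not the paper's algorithm): score objects as
# outliers by how few other objects share their agenda features, in the spirit
# of FCA, where that shared set of objects is a concept extent.

def extent(table, obj, agenda):
    """Objects sharing every agenda feature that `obj` has (a concept extent).

    If `obj` has none of the agenda features, the extent is all objects."""
    feats = {f for f in agenda if table[obj][f]}
    return {o for o, row in table.items() if all(row[f] for f in feats)}

def outlier_scores(table, agendas):
    """Average inverse extent size across agendas: higher = more outlying."""
    scores = {}
    for obj in table:
        sizes = [len(extent(table, obj, a)) for a in agendas]
        scores[obj] = sum(1.0 / s for s in sizes) / len(agendas)
    return scores

# Toy formal context: objects x binary features.
table = {
    "a": {"f1": 1, "f2": 1, "f3": 0},
    "b": {"f1": 1, "f2": 1, "f3": 0},
    "c": {"f1": 0, "f2": 0, "f3": 1},  # shares no features with a and b
}
agendas = [("f1", "f2"), ("f3",)]
scores = outlier_scores(table, agendas)  # "c" gets the highest score
```

With these agendas, `c` sits alone in the extent of its `f3` concept, so it scores higher than `a` and `b`, which share an extent of size two.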
A social media and crowd-sourcing data mining system for crime prevention during and post-crisis situations
A number of large crisis situations, such as natural disasters, have affected the planet over the last decade. The outcomes of such disasters are catastrophic for the infrastructures of modern societies. Furthermore, after large disasters, societies come face-to-face with important issues, such as the loss of human lives, missing people and an increase in the crime rate. On many occasions, societies seem unprepared to face such issues. This paper aims to present an automated system for the synchronization of the police and Law Enforcement Agencies (LEAs) for the prevention of criminal activities during and after a large crisis situation. The paper presents a review of the literature focusing on the necessity of using data mining in combination with advanced web technologies, such as social media and crowd-sourcing, for the resolution of problems related to criminal activities caused during and after crisis situations. The paper provides an introduction to examples of different techniques and algorithms used for social media and crowd-sourcing scanning, such as sentiment analysis and link analysis. The main focus of the paper is the ATHENA Crisis Management system. The function of the ATHENA system is based on the use of social media and crowd-sourcing for collecting crisis-related information. The system uses a number of data mining techniques to collect and analyze data from social media for the purpose of crime prevention. A number of conclusions are drawn on the significance of social media and crowd-sourcing data mining techniques for the resolution of problems related to large crisis situations, with emphasis on the ATHENA system.
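The sentiment analysis mentioned above can be illustrated with a deliberately simple lexicon-based scorer. This is purely illustrative and not the ATHENA system: the cue-word lists, the scoring rule, and the flagging threshold are all assumptions made for the example.

```python
# Illustrative sketch (not the ATHENA system) of lexicon-based sentiment
# triage for crisis-related posts: count positive and negative cue words
# and flag strongly negative posts for attention.

NEGATIVE = {"looting", "robbed", "stolen", "attack", "danger"}  # hypothetical lexicon
POSITIVE = {"safe", "rescued", "helped", "secure"}              # hypothetical lexicon

def sentiment_score(post):
    """Positive minus negative cue-word count; a negative score flags concern."""
    words = post.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

posts = [
    "shops robbed and looting reported downtown",
    "family rescued and safe at the shelter",
]
flagged = [p for p in posts if sentiment_score(p) < 0]  # first post only
```

Real systems use far richer models (trained classifiers, negation handling, emoji and slang lexicons), but the flag-and-triage pattern is the same.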
A Comparison on the Classification of Short-text Documents Using Latent Dirichlet Allocation and Formal Concept Analysis
With the increasing amounts of textual data being collected online, automated text classification techniques are becoming increasingly important. However, a lot of this data is in the form of short-text with just a handful of terms per document (e.g. text messages, tweets or Facebook posts). This data is generally too sparse and noisy to obtain satisfactory classification. Two techniques which aim to alleviate this problem are Latent Dirichlet Allocation (LDA) and Formal Concept Analysis (FCA). Both techniques have been shown to improve the performance of short-text classification by reducing the sparsity of the input data. The relative performance of classifiers that have been enhanced using each technique has not been directly compared so, to address this issue, this work presents an experiment to compare them, using supervised models. The experiment shows that FCA leads to a much higher degree of correlation among terms than LDA and initially gives lower classification accuracy. However, once a subset of features is selected for training, the FCA models can outperform those trained on LDA-expanded data.
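The FCA-based expansion compared above can be sketched with the standard closure operation on a formal context of documents and terms. This is a hypothetical toy, not the paper's pipeline: the `closure` helper and the tiny corpus are assumptions made for the example.

```python
# Minimal sketch (hypothetical, not the paper's pipeline) of FCA-style feature
# expansion for short texts: a document's term set is replaced by its closure
# in the formal context (documents x terms), pulling in terms that always
# co-occur with it across the corpus.

def closure(terms, corpus):
    """Closed term set: terms shared by every document containing `terms`."""
    extent = [doc for doc in corpus if terms <= doc]
    if not extent:
        return set(terms)
    common = set(extent[0])
    for doc in extent[1:]:
        common &= doc
    return common

corpus = [
    {"flight", "delayed", "airport"},
    {"flight", "delayed"},
    {"goal", "match", "football"},
]
# "delayed" closes to {"flight", "delayed"}: every document containing
# "delayed" also contains "flight", so the sparse vector gains a feature.
expanded = closure({"delayed"}, corpus)
```

This also makes the abstract's observation about term correlation concrete: closure deliberately adds terms that co-occur perfectly, which densifies short documents but correlates the resulting features.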
The AI Revolution: Opportunities and Challenges for the Finance Sector
This report examines Artificial Intelligence (AI) in the financial sector,
outlining its potential to revolutionise the industry and identifying its
challenges. It underscores the criticality of a well-rounded understanding of
AI, its capabilities, and its implications to effectively leverage its
potential while mitigating associated risks. The potential of AI extends from
augmenting existing operations to paving the way for novel applications in the
finance sector. The application of AI in the financial
sector is transforming the industry. Its use spans areas from customer service
enhancements, fraud detection, and risk management to credit assessments and
high-frequency trading. However, along with these benefits, AI also presents
several challenges. These include issues related to transparency,
interpretability, fairness, accountability, and trustworthiness. The use of AI
in the financial sector further raises critical questions about data privacy
and security. A further issue identified in this report is the systemic risk
that AI can introduce to the financial sector. Being prone to errors, AI can
exacerbate existing systemic risks, potentially leading to financial crises.
Regulation is crucial to harnessing the benefits of AI while mitigating its
potential risks. Despite the global recognition of this need, there remains a
lack of clear guidelines or legislation for AI use in finance. This report
discusses key principles that could guide the formation of effective AI
regulation in the financial sector, including the need for a risk-based
approach, the inclusion of ethical considerations, and the importance of
maintaining a balance between innovation and consumer protection. The report
provides recommendations for academia, the finance industry, and regulators.
Bagged Randomized Conceptual Machine Learning Method
Formal concept analysis (FCA) is a scientific approach aiming to investigate, analyze and represent the conceptual knowledge deduced from data in conceptual structures (lattices). Recently, many researchers have been counting on the potential of FCA to resolve, or contribute to addressing, machine learning problems. However, some of these heuristics are still far from achieving this goal. In another context, ensemble-learning methods are deemed effective in addressing the classification problem; in addition, introducing randomness into ensemble learning has been found effective in certain scenarios. We exploit the potential of FCA and the notion of randomness in ensemble learning, and propose a new machine learning method based on random conceptual decomposition. We also propose a novel approach for rule optimization. We develop an effective learning algorithm that is capable of handling some aspects of the learning problem, with results that are comparable to those of other ensemble learning algorithms.
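The bagging-with-randomness idea described above can be sketched generically. This is not the paper's method: the 1-NN base learner, the Hamming distance, the toy data, and all parameter choices are assumptions made to illustrate bootstrap sampling plus random feature subsets plus majority voting.

```python
import random

# Minimal sketch (not the paper's method) of bagging with injected randomness:
# each base learner sees a bootstrap sample of the data and a random subset of
# features, and the ensemble predicts by majority vote.

def nearest_label(x, sample, feats):
    """1-NN base learner on the chosen feature subset (Hamming distance)."""
    def dist(a, b):
        return sum(a[f] != b[f] for f in feats)
    return min(sample, key=lambda item: dist(item[0], x))[1]

def bagged_predict(x, data, n_learners=15, n_feats=2, seed=0):
    rng = random.Random(seed)
    all_feats = list(range(len(data[0][0])))
    votes = []
    for _ in range(n_learners):
        sample = [rng.choice(data) for _ in data]  # bootstrap sample
        feats = rng.sample(all_feats, n_feats)     # random feature subspace
        votes.append(nearest_label(x, sample, feats))
    return max(set(votes), key=votes.count)        # majority vote

data = [([1, 1, 0], "pos"), ([1, 0, 0], "pos"),
        ([0, 0, 1], "neg"), ([0, 1, 1], "neg")]
pred = bagged_predict([1, 1, 1], data)
```

In the paper's setting the base learners are conceptual (lattice-derived rules) rather than nearest-neighbour, but the decomposition-and-vote structure is the same.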
DiffVein: A Unified Diffusion Network for Finger Vein Segmentation and Authentication
Finger vein authentication, recognized for its high security and specificity,
has become a focal point in biometric research. Traditional methods
predominantly concentrate on vein feature extraction for discriminative
modeling, with limited exploration of generative approaches. Existing methods
often fail to obtain authentic vein patterns through segmentation, which leads
to verification failures. To fill this gap, we introduce DiffVein, a unified
diffusion model-based framework which simultaneously addresses vein
segmentation and authentication tasks. DiffVein is composed of two dedicated
branches: one for segmentation and the other for denoising. For better feature
interaction between these two branches, we introduce two specialized modules to
improve their collective performance. The first, a mask condition module,
incorporates the semantic information of vein patterns from the segmentation
branch into the denoising process. The second, a Semantic
Difference Transformer (SD-Former), employs Fourier-space self-attention
and cross-attention modules to extract category embedding before feeding it to
the segmentation task. In this way, our framework allows for a dynamic
interplay between diffusion and segmentation embeddings, so that the vein
segmentation and authentication tasks can inform and enhance each other during
joint training. To further optimize our model, we introduce a Fourier-space
Structural Similarity (FourierSIM) loss function, which is tailored to improve
the denoising network's learning efficacy. Extensive experiments on the USM and
THU-MVFV3V datasets substantiate DiffVein's superior performance, setting new
benchmarks in both vein segmentation and authentication tasks.
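The idea behind a Fourier-space similarity loss can be illustrated on a 1-D signal. This is a hypothetical sketch, not the paper's FourierSIM loss (which is defined for images and used to train a denoising network): the naive DFT, the mean-absolute-difference of magnitude spectra, and the toy signals are all assumptions made for the example.

```python
import cmath

# Hypothetical sketch (not the paper's FourierSIM loss) of comparing signals
# in Fourier space: take DFT magnitude spectra and measure their mean absolute
# difference. Magnitudes are shift-invariant, so this penalizes structural
# differences rather than spatial misalignment.

def dft_mag(x):
    """Naive DFT magnitude spectrum of a 1-D signal."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))) for k in range(n)]

def fourier_loss(pred, target):
    """Mean absolute difference between the two magnitude spectra."""
    p, t = dft_mag(pred), dft_mag(target)
    return sum(abs(a - b) for a, b in zip(p, t)) / len(p)

clean = [0, 1, 0, -1] * 4             # a pure oscillation
same_shifted = clean[1:] + clean[:1]  # same pattern, cyclically shifted
noise = [0.5, -0.2, 0.1, 0.4] * 4     # structurally different signal

low = fourier_loss(same_shifted, clean)   # ~0: shift leaves magnitudes intact
high = fourier_loss(noise, clean)         # clearly larger
```

The same intuition carries to 2-D: comparing spectra lets a loss reward recovering the right vein structure even when pixel-wise alignment is imperfect.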