14,533 research outputs found
Intelligent Financial Fraud Detection Practices: An Investigation
Financial fraud is an issue with far reaching consequences in the finance
industry, government, corporate sectors, and for ordinary consumers. Increasing
dependence on new technologies such as cloud and mobile computing in recent
years has compounded the problem. Traditional methods of detection involve
extensive use of auditing, where a trained individual manually observes reports
or transactions in an attempt to discover fraudulent behaviour. This method is
not only time consuming, expensive and inaccurate, but in the age of big data
it is also impractical. Not surprisingly, financial institutions have turned to
automated processes using statistical and computational methods. This paper
presents a comprehensive investigation on financial fraud detection practices
using such data mining methods, with a particular focus on computational
intelligence-based techniques. Classification of the practices based on key
aspects such as detection algorithm used, fraud type investigated, and success
rate have been covered. Issues and challenges associated with the current
practices and potential future direction of research have also been identified.Comment: Proceedings of the 10th International Conference on Security and
Privacy in Communication Networks (SecureComm 2014
Online Embedding Compression for Text Classification using Low Rank Matrix Factorization
Deep learning models have become state of the art for natural language
processing (NLP) tasks, however deploying these models in production system
poses significant memory constraints. Existing compression methods are either
lossy or introduce significant latency. We propose a compression method that
leverages low rank matrix factorization during training,to compress the word
embedding layer which represents the size bottleneck for most NLP models. Our
models are trained, compressed and then further re-trained on the downstream
task to recover accuracy while maintaining the reduced size. Empirically, we
show that the proposed method can achieve 90% compression with minimal impact
in accuracy for sentence classification tasks, and outperforms alternative
methods like fixed-point quantization or offline word embedding compression. We
also analyze the inference time and storage space for our method through FLOP
calculations, showing that we can compress DNN models by a configurable ratio
and regain accuracy loss without introducing additional latency compared to
fixed point quantization. Finally, we introduce a novel learning rate schedule,
the Cyclically Annealed Learning Rate (CALR), which we empirically demonstrate
to outperform other popular adaptive learning rate algorithms on a sentence
classification benchmark.Comment: Accepted in Thirty-Third AAAI Conference on Artificial Intelligence
(AAAI 2019
Taming Wild High Dimensional Text Data with a Fuzzy Lash
The bag of words (BOW) represents a corpus in a matrix whose elements are the
frequency of words. However, each row in the matrix is a very high-dimensional
sparse vector. Dimension reduction (DR) is a popular method to address sparsity
and high-dimensionality issues. Among different strategies to develop DR
method, Unsupervised Feature Transformation (UFT) is a popular strategy to map
all words on a new basis to represent BOW. The recent increase of text data and
its challenges imply that DR area still needs new perspectives. Although a wide
range of methods based on the UFT strategy has been developed, the fuzzy
approach has not been considered for DR based on this strategy. This research
investigates the application of fuzzy clustering as a DR method based on the
UFT strategy to collapse BOW matrix to provide a lower-dimensional
representation of documents instead of the words in a corpus. The quantitative
evaluation shows that fuzzy clustering produces superior performance and
features to Principal Components Analysis (PCA) and Singular Value
Decomposition (SVD), two popular DR methods based on the UFT strategy
Classifying Amharic News Text Using Self-Organizing Maps
The paper addresses using artificial neural networks for classification of Amharic news items. Amharic is the language for countrywide communication in Ethiopia and has its own writing system containing extensive systematic redundancy. It is quite dialectally diversified and probably representative of the languages of a continent that so far has received little attention within the language processing field.
The experiments investigated document clustering around user queries using Self-Organizing Maps, an unsupervised learning neural network strategy. The best ANN model showed a precision of 60.0% when trying to cluster unseen data, and a 69.5% precision when trying to classify it
End-to-End Cross-Modality Retrieval with CCA Projections and Pairwise Ranking Loss
Cross-modality retrieval encompasses retrieval tasks where the fetched items
are of a different type than the search query, e.g., retrieving pictures
relevant to a given text query. The state-of-the-art approach to cross-modality
retrieval relies on learning a joint embedding space of the two modalities,
where items from either modality are retrieved using nearest-neighbor search.
In this work, we introduce a neural network layer based on Canonical
Correlation Analysis (CCA) that learns better embedding spaces by analytically
computing projections that maximize correlation. In contrast to previous
approaches, the CCA Layer (CCAL) allows us to combine existing objectives for
embedding space learning, such as pairwise ranking losses, with the optimal
projections of CCA. We show the effectiveness of our approach for
cross-modality retrieval on three different scenarios (text-to-image,
audio-sheet-music and zero-shot retrieval), surpassing both Deep CCA and a
multi-view network using freely learned projections optimized by a pairwise
ranking loss, especially when little training data is available (the code for
all three methods is released at: https://github.com/CPJKU/cca_layer).Comment: Preliminary version of a paper published in the International Journal
of Multimedia Information Retrieva
- …