Textual Data Mining for Financial Fraud Detection: A Deep Learning Approach
In this report, I present a deep learning approach to a natural language
processing (hereafter NLP) binary classification task for analyzing
financial-fraud texts. First, I searched regulatory announcements and
enforcement bulletins from HKEX news to identify fraudulent companies and to
extract their MD&A reports, then organized the sentences from those reports
with labels and reporting times. My methodology involved several kinds of
neural network models, including Multilayer Perceptrons with Embedding layers,
vanilla Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), and
Gated Recurrent Units (GRU), for the text classification task. By utilizing this
diverse set of models, I aim to perform a comprehensive comparison of their
accuracy in detecting financial fraud. My results have significant
implications for financial fraud detection: this work contributes to the
growing body of research at the intersection of deep learning, NLP, and
finance, providing valuable insights for industry practitioners, regulators,
and researchers in the pursuit of more robust and effective fraud detection
methodologies.
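The model families compared above share a common pipeline: embed each token, encode the sequence with a recurrent layer, and apply a sigmoid output for the binary fraud label. A minimal NumPy sketch of the vanilla-RNN variant (all sizes, weights, and token ids here are illustrative, not taken from the report):

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, EMB, HID = 100, 8, 16  # toy dimensions, purely for illustration

# Illustrative parameters (randomly initialized; a real model trains these).
E   = rng.normal(scale=0.1, size=(VOCAB, EMB))   # embedding table
Wxh = rng.normal(scale=0.1, size=(EMB, HID))     # input -> hidden weights
Whh = rng.normal(scale=0.1, size=(HID, HID))     # hidden -> hidden weights
w_o = rng.normal(scale=0.1, size=HID)            # hidden -> output logit

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rnn_classify(token_ids):
    """Embed tokens, run a vanilla RNN, return P(sentence is fraudulent)."""
    h = np.zeros(HID)
    for t in token_ids:
        h = np.tanh(E[t] @ Wxh + h @ Whh)  # recurrent state update
    return sigmoid(h @ w_o)                # binary decision from final state

p = rnn_classify([3, 17, 42, 9])  # hypothetical token ids for one sentence
print(float(p))
```

The LSTM and GRU variants differ only in the recurrent update (gated cells in place of the single `tanh` step), which is what the accuracy comparison in the report isolates.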
Dialogue Act Recognition via CRF-Attentive Structured Network
Dialogue Act Recognition (DAR) is a challenging problem in dialogue
interpretation, which aims to attach semantic labels to utterances and
characterize the speaker's intention. Currently, many existing approaches
formulate the DAR problem as anything from multi-class classification to
structured prediction, but these suffer from handcrafted feature engineering
and fail to capture attentive contextual structural dependencies. In this
paper, we consider the problem of DAR from the viewpoint of extending richer
Conditional Random Field (CRF) structural dependencies without abandoning
end-to-end training. We incorporate hierarchical semantic inference with a
memory mechanism into the utterance modeling. We then extend the structured
attention network to a linear-chain conditional random field layer that takes
into account both contextual utterances and the corresponding dialogue acts.
Extensive experiments on two major benchmark datasets, Switchboard Dialogue
Act (SWDA) and Meeting Recorder Dialogue Act (MRDA), show that our method
achieves better performance than other state-of-the-art solutions to the
problem. Remarkably, our method comes within a 2% gap of the human
annotator's performance on SWDA.
Comment: 10 pages, 4 figures
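The linear-chain CRF layer described above scores a label sequence by summing per-utterance emission scores and pairwise label-transition scores; normalizing requires summing over all label sequences, which the forward algorithm does in polynomial time. A minimal NumPy sketch (sizes and scores are illustrative, not the paper's model), checked against brute-force enumeration:

```python
import numpy as np
from itertools import product

def crf_log_partition(emissions, transitions):
    """Forward algorithm: log-sum over all label sequences.
    emissions: (T, K) per-utterance label scores; transitions: (K, K)."""
    alpha = emissions[0].copy()
    for t in range(1, len(emissions)):
        # scores[i, j] = alpha[i] + transition(i -> j) + emission at t for j
        scores = alpha[:, None] + transitions + emissions[t][None, :]
        m = scores.max(axis=0)                      # stabilized logsumexp
        alpha = m + np.log(np.exp(scores - m).sum(axis=0))
    m = alpha.max()
    return m + np.log(np.exp(alpha - m).sum())

def brute_force(emissions, transitions):
    """Enumerate every label sequence explicitly (exponential; tiny cases only)."""
    T, K = emissions.shape
    totals = []
    for seq in product(range(K), repeat=T):
        s = emissions[0, seq[0]]
        for t in range(1, T):
            s += transitions[seq[t - 1], seq[t]] + emissions[t, seq[t]]
        totals.append(s)
    totals = np.array(totals)
    m = totals.max()
    return m + np.log(np.exp(totals - m).sum())

rng = np.random.default_rng(1)
E = rng.normal(size=(4, 3))  # 4 utterances, 3 toy dialogue-act labels
A = rng.normal(size=(3, 3))  # toy transition scores between labels
print(np.isclose(crf_log_partition(E, A), brute_force(E, A)))  # True
```

In the paper's setting, the emission scores would come from the attentive utterance encoder rather than random values; the forward recursion itself is unchanged.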