8 research outputs found

    Combining Static and Dynamic Features for Multivariate Sequence Classification

    Model precision in a classification task depends heavily on the feature space used to train the model. Moreover, whether the features are sequential or static dictates which classification method can be applied, as most machine learning algorithms are designed to handle only one of these two types of data. In real-life scenarios, however, both static and dynamic features are often present or can be extracted from the data. In this work, we demonstrate how generative models such as Hidden Markov Models (HMM) and Long Short-Term Memory (LSTM) artificial neural networks can be used to extract temporal information from the dynamic data. We explore how the extracted information can be combined with the static features to improve classification performance. We evaluate existing techniques and suggest a hybrid approach, which outperforms other methods on several public datasets. Comment: Presented at IEEE DSAA 201
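    The idea of feeding a generative model's output into a static feature matrix can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: a first-order Markov chain stands in for the HMM or LSTM, and all function names and the toy data are hypothetical.

```python
import numpy as np

def fit_markov_chain(sequences, n_symbols, alpha=1.0):
    """Estimate a first-order transition matrix with add-alpha smoothing.

    Assumption: a plain Markov chain is used here as a simple stand-in
    for the HMM/LSTM generative models mentioned in the abstract.
    """
    counts = np.full((n_symbols, n_symbols), alpha)
    for seq in sequences:
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def log_likelihood(seq, trans):
    """Log-probability of the transitions in seq under the chain."""
    return float(sum(np.log(trans[a, b]) for a, b in zip(seq[:-1], seq[1:])))

def hybrid_features(static, sequences, trans_pos, trans_neg):
    """Append a per-sequence log-likelihood ratio to the static features."""
    llr = np.array([
        log_likelihood(s, trans_pos) - log_likelihood(s, trans_neg)
        for s in sequences
    ])
    return np.column_stack([static, llr])

# Hypothetical toy data: positive sequences alternate, negative ones repeat.
seqs_pos = [[0, 1, 0, 1, 0, 1], [1, 0, 1, 0, 1]]
seqs_neg = [[0, 0, 0, 0, 0], [1, 1, 1, 1]]
static = np.array([[0.5], [0.1], [0.9], [0.3]])  # one static attribute per case

trans_pos = fit_markov_chain(seqs_pos, n_symbols=2)
trans_neg = fit_markov_chain(seqs_neg, n_symbols=2)
features = hybrid_features(static, seqs_pos + seqs_neg, trans_pos, trans_neg)
```

    The resulting matrix has one row per case, with the static attribute followed by the sequence-derived score, and could be passed to any standard classifier. In practice the generative models would be fitted on training data only, so that the score is not computed on sequences the models have already seen.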

    Detection of Abuse in Financial Transaction Descriptions Using Machine Learning

    Since the New Payments Platform (NPP) was changed to allow longer messages as payment descriptions, people have been found to use it for communication, and in some cases the system has been used as a targeted form of domestic and family violence. This type of tech-assisted abuse poses new challenges in terms of identification, actions and approaches to rectify this behaviour. Commonwealth Bank of Australia's Artificial Intelligence Labs team (CBA AI Labs) has developed a new system that uses advances in deep learning models for natural language processing (NLP) to create a powerful abuse detector, which periodically scores all transactions and identifies cases of high-risk abuse among millions of records. In this paper, we describe the problem of tech-assisted abuse in the context of banking services, outline the developed model and its performance, and describe the operating framework more broadly. Comment: 7 pages, 3 figures

    Generatiivsete mudelite kasutamine staatiliste ja jadatunnuste kombineerimiseks klassifitseerimise eesmärgil (Using generative models to combine static and sequential features for classification)

    Nowadays, a major part of our daily activities takes place online: we chat in social networks, shop in online stores, and manage our bank accounts through online banking. These activities often involve financial services, which carry the risk that money is stolen. Service providers try to protect their clients, but this is challenging because fraud schemes are numerous and constantly changing. Because these are online services, most of the activity can be recorded, and the data can be used to automate fraud detection. Data come from different sources and in different forms. Some data are static attributes that do not change over time; other data are sequential, collected over a period and capturing client behavior over time. To train a model that discriminates between clients and fraudulent users as well as possible, it is important to use all of the available data. Capturing fraudulent activity is just one example of the many problems that can be solved with automatic classification. In this thesis we investigate how to use different types of data, such as static and sequential attributes, and combine them for classification. We apply various combination schemes to three tasks from different domains. The first is automatic detection of fraudulent users. The second is classification of subjects' imagined movements from their brain activity signals. The third is predicting the outcome of a business process as early as possible: the earlier we can predict that a process may end in failure, the more time there is to intervene and change an undesired outcome. We show that we can detect fraudsters using only 4 months of data, distinguish a subject's imagined movements from brain signals with 80% accuracy, and predict the outcome of a business process early, after only 5 events have occurred. These results suggest that the proposed approach complements existing techniques and is potentially useful for classification problems in other domains.

    Complex symbolic sequence encodings for predictive monitoring of business processes

    This paper addresses the problem of predicting the outcome of an ongoing case of a business process based on event logs. In this setting, the outcome of a case may refer, for example, to the achievement of a performance objective or the fulfillment of a compliance rule upon completion of the case. Given a log consisting of traces of completed cases, a trace of an ongoing case, and two or more possible outcomes (e.g., a positive and a negative outcome), the paper addresses the problem of determining the most likely outcome for the case in question. Previous approaches to this problem are largely based on simple symbolic sequence classification, meaning that they extract features from traces seen as sequences of event labels and use these features to construct a classifier for runtime prediction. In doing so, these approaches ignore the data payload associated with each event. This paper approaches the problem from a different angle by treating traces as complex symbolic sequences, that is, sequences of events each carrying a data payload. In this context, the paper outlines different feature encodings of complex symbolic sequences and compares their predictive accuracy on real-life business process event logs.
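    The contrast between a simple symbolic sequence (event labels only) and a complex symbolic sequence (labels plus payload) can be sketched as follows. The two encodings below are one plausible illustration and are not claimed to be the encodings the paper evaluates; the function names, the trace and the `amount` attribute are all hypothetical.

```python
from collections import Counter

def frequency_encode(trace):
    """Simple symbolic view: count event labels, ignoring payloads."""
    return dict(Counter(label for label, _payload in trace))

def index_encode(trace, prefix_len, payload_keys, pad=None):
    """Complex symbolic view: one column per position for the event label
    and for each payload attribute, over the first prefix_len events.
    Positions beyond the end of the trace are filled with pad."""
    row = {}
    for i in range(prefix_len):
        if i < len(trace):
            label, payload = trace[i]
        else:
            label, payload = pad, {}
        row[f"event_{i + 1}"] = label
        for key in payload_keys:
            row[f"{key}_{i + 1}"] = payload.get(key, pad)
    return row

# Hypothetical ongoing case: each event is (label, payload).
trace = [
    ("register", {"amount": 100}),
    ("check", {"amount": 100}),
    ("approve", {"amount": 90}),
]

label_counts = frequency_encode(trace)
encoded_prefix = index_encode(trace, prefix_len=2, payload_keys=["amount"])
```

    Here `encoded_prefix` keeps the order of the first two events together with their `amount` values, whereas `label_counts` discards both the order and the payload. Either dictionary could serve as one row of a feature table for a classifier trained on completed cases.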

    Network of genes associated with Alzheimer's diseases. Data description.

    No full text
    The file contains a data description for both parts of the dataset: a graph of biological interactions and node attributes.