
    Exploring Text Mining and Analytics for Applications in Public Security: An in-depth dive into a systematic literature review

    Text mining and related analytics emerge as a technological approach to support human activities in extracting useful knowledge from texts in several formats. From a managerial point of view, text mining can help organizations in planning and decision-making processes, providing information that was not previously evident in textual materials produced internally or externally. In this context, within the public/governmental scope, public security agencies are major beneficiaries of the tools associated with text mining, in several respects, from applications in the criminal area to the collection of people's opinions and sentiments about the actions taken to promote their welfare. This article reports the details of a systematic literature review focused on identifying the main areas of text mining application in public security, the most recurrent technological tools, and future research directions. The searches covered four major article bases (Scopus, Web of Science, IEEE Xplore, and ACM Digital Library), selecting 194 materials published between 2014 and the first half of 2021, among journals, conferences, and book chapters. Several findings concerning the targets of the literature review are presented in the results of this article.

    LDA-BASED PERSONALIZED DOCUMENT RECOMMENDATION


    Prototype/topic based Clustering Method for Weblogs

    In the last 10 years, the information generated on weblog sites has increased exponentially, resulting in a clear need for intelligent approaches to analyse and organise this massive amount of information. In this work, we present a methodology to cluster weblog posts according to the topics discussed therein, which we derive by text analysis. We call the methodology Prototype/Topic Based Clustering, an approach based on a generative probabilistic model in conjunction with a Self-Term Expansion methodology. The Self-Term Expansion methodology is used to improve the representation of the data, while the generative probabilistic model is employed to identify relevant topics discussed in the weblogs. We have modified the generative probabilistic model in order to exploit predefined initialisations of the model and have performed our experiments on narrow- and wide-domain subsets. The results of our approach demonstrate a considerable improvement over the predefined baseline and alternative state-of-the-art approaches, achieving an improvement of up to 20% in many cases. Both narrow- and wide-domain datasets showed gains, with the latter showing greater improvement; in both cases, our results outperformed the baseline and state-of-the-art algorithms. The work of the third author was carried out in the framework of the WIQ-EI IRSES project (Grant No. 269180) within the FP7 Marie Curie programme, the DIANA APPLICATIONS Finding Hidden Knowledge in Texts: Applications (TIN2012-38603-C02-01) project and the VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems. Perez-Tellez, F.; Cardiff, J.; Rosso, P.; Pinto Avendaño, DE. (2016). Prototype/topic based Clustering Method for Weblogs. Intelligent Data Analysis. 20(1):47-65. https://doi.org/10.3233/IDA-150793
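    The Self-Term Expansion idea described above — enriching a post's representation with terms that frequently co-occur with its own terms across the corpus — can be sketched in a few lines of Python. This is an illustrative stand-in, not the paper's exact formulation: the toy corpus, the co-occurrence weighting, and the `top_k` cutoff are all assumptions.

```python
from collections import Counter, defaultdict

def self_term_expansion(docs, top_k=2):
    """Expand each document with the terms that most often
    co-occur with its own terms across the whole corpus."""
    # Corpus-wide co-occurrence counts: cooc[t][u] = how often u
    # appears in the same document as t.
    cooc = defaultdict(Counter)
    for doc in docs:
        terms = set(doc.split())
        for t in terms:
            for u in terms:
                if u != t:
                    cooc[t][u] += 1
    expanded = []
    for doc in docs:
        terms = doc.split()
        extra = Counter()
        for t in set(terms):
            for u, n in cooc[t].most_common(top_k):
                extra[u] += n
        # Append the strongest expansion terms to the original ones.
        expanded.append(terms + [u for u, _ in extra.most_common(top_k)])
    return expanded

docs = [
    "python code bug",
    "python code test",
    "soccer goal match",
    "soccer goal referee",
]
expanded = self_term_expansion(docs)
for original, exp in zip(docs, expanded):
    print(original, "->", exp)
```

    Posts about the same topic end up sharing expansion terms, which is what makes a subsequent topic-based clustering easier on short, sparse documents.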

    Learning Context on a Humanoid Robot using Incremental Latent Dirichlet Allocation

    In this article, we formalize and model context in terms of a set of concepts grounded in the sensorimotor interactions of a robot. The concepts are modeled as a web using a Markov Random Field, inspired by the concept web hypothesis for representing concepts in humans. On this concept web, we treat context as a latent variable of Latent Dirichlet Allocation (LDA), a widely used method in computational linguistics for modeling topics in texts. We extend the standard LDA method to make it incremental, so that (i) it does not re-learn everything from scratch given new interactions (i.e., it is online) and (ii) it can discover and add a new context to its model when necessary. We demonstrate on the iCub platform that, partly owing to modeling context on top of the concept web, our approach is adaptive, online and robust: it is adaptive and online since it can learn and discover a new context from new interactions, and it is robust since it is not affected by irrelevant stimuli and can discover contexts after only a few interactions. Moreover, we show how to use the context learned in such a model for two important tasks: object recognition and planning. Funding: Scientific and Technological Research Council of Turkey; Marie Curie International Outgoing Fellowship titled “Towards Better Robot Manipulation: Improvement through Interaction”.
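    The two properties the abstract emphasizes — updating online without re-learning from scratch, and discovering a new context when none fits — can be illustrated with a deliberately simplified sketch. This is not the paper's incremental LDA; it is a hypothetical count-based stand-in (the class name, threshold, and smoothing values are all assumptions) that shows the same control flow.

```python
import math
from collections import Counter

class IncrementalContextModel:
    """Toy online context model: each context is a word-count profile.
    New observations update counts in place (no retraining), and a new
    context is created when no existing one fits well enough."""

    def __init__(self, threshold=0.2, smoothing=0.5):
        self.contexts = []          # list of Counters (word counts)
        self.threshold = threshold  # min geometric-mean prob. to accept
        self.smoothing = smoothing

    def _score(self, ctx, words):
        # Geometric mean of smoothed word probabilities under a context
        # (a crude normalization; real LDA would smooth over the vocabulary).
        total = sum(ctx.values()) + self.smoothing * len(words)
        probs = [(ctx[w] + self.smoothing) / total for w in words]
        return math.exp(sum(math.log(p) for p in probs) / len(words))

    def observe(self, words):
        """Assign the observation to the best context, or open a new one."""
        best, best_score = None, 0.0
        for i, ctx in enumerate(self.contexts):
            s = self._score(ctx, words)
            if s > best_score:
                best, best_score = i, s
        if best is None or best_score < self.threshold:
            self.contexts.append(Counter(words))
            return len(self.contexts) - 1
        self.contexts[best].update(words)   # incremental update, no re-fit
        return best

m = IncrementalContextModel()
a = m.observe(["cup", "spoon", "table"])    # opens the first context
b = m.observe(["cup", "plate", "table"])    # similar -> same context, updated
c = m.observe(["wrench", "bolt", "screw"])  # unrelated -> new context
```

    The key design point mirrored from the abstract is that `observe` never revisits past data: each interaction either refines an existing context's counts or opens a new one.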

    From feature engineering and topics models to enhanced prediction rates in phishing detection

    Phishing is a type of fraud attempt in which the attacker, usually by e-mail, pretends to be a trusted person or entity in order to obtain sensitive information from a target. Most recent phishing detection research has focused on obtaining highly distinctive features from the metadata and text of these e-mails. The obtained attributes are then used to feed classification algorithms in order to determine whether they are phishing or legitimate messages. In this paper, an approach based on machine learning is proposed to detect phishing e-mail attacks. The methods that compose this approach are performed through a feature engineering process based on natural language processing, lemmatization, topic modeling, improved learning techniques for resampling and cross-validation, and hyperparameter configuration. The first proposed method uses all the features obtained from the Document-Term Matrix (DTM) in the classification algorithms. The second one uses Latent Dirichlet Allocation (LDA) as an operation to deal with the problems of the “curse of dimensionality”, the sparsity, and the portion of textual context included in the obtained representation. The proposed approach achieved an F1-measure of 99.95% using the XGBoost algorithm. It outperforms state-of-the-art phishing detection research on an accredited data set, in applications based only on the body of the e-mails, without using other e-mail features such as the header, IP information or number of links in the text.
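    The first method — feeding bag-of-words Document-Term Matrix features into a classifier — can be sketched with the standard library. A multinomial Naive Bayes classifier stands in here for XGBoost, and the toy e-mails and function names are illustrative assumptions, not the paper's data or code.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes over bag-of-words (DTM-style) features."""
    vocab = {w for d in docs for w in d.split()}
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc.split())
    model = {}
    for y in class_counts:
        total = sum(word_counts[y].values()) + alpha * len(vocab)
        model[y] = {
            "prior": math.log(class_counts[y] / len(docs)),
            "loglik": {w: math.log((word_counts[y][w] + alpha) / total)
                       for w in vocab},
            "unk": math.log(alpha / total),  # fallback for unseen words
        }
    return model

def predict(model, doc):
    def score(y):
        m = model[y]
        return m["prior"] + sum(m["loglik"].get(w, m["unk"])
                                for w in doc.split())
    return max(model, key=score)

train = [
    ("verify your account password urgent click", "phishing"),
    ("urgent click here to claim your prize", "phishing"),
    ("meeting agenda attached for tomorrow", "legit"),
    ("quarterly report draft attached", "legit"),
]
model = train_nb([d for d, _ in train], [y for _, y in train])
print(predict(model, "click to verify your password"))  # → phishing
```

    The real pipeline would first lemmatize the text, build the DTM (or its LDA-compressed topic representation), and tune hyperparameters with resampled cross-validation before training the final classifier.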

    Changepoint model for Bayesian online fraud detection in call data

    Illegal use of the phone network is a massive problem for both telecommunication companies and their users. By gaining criminal access to customers' telephones, fraudsters make an illicit profit and cause heavy traffic in the call network. After the rising trend in mobile phone fraud, telecommunication companies' security departments mainly focused on increasing the efficiency of fraud detection algorithms and decreasing the number of false alarms. In this thesis, we present an online event-based fraud detection algorithm based on Hidden Markov Models (HMM). The detection problem is formulated as a changepoint model on the caller's behavior. To capture call behavior more specifically, we split it into three parts: call frequency, call duration and call features. We prefer to adopt a changepoint model for call data because of its memoryless property: the data before the changepoint does not depend on the data after the changepoint. To investigate the performance of our algorithm, we conducted an extensive computational study on our generated data. Our results indicate that the algorithm is practical and that resampling methods can control the otherwise linearly increasing computational cost.
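    The changepoint idea can be illustrated with a Poisson model of daily call counts: the memoryless property means the two segments around a candidate changepoint can be scored independently. This sketch is a simplification of the thesis's online HMM-based detector — the offline single-changepoint search and the synthetic data are assumptions for illustration only.

```python
import math

def poisson_loglik(xs):
    """Max log-likelihood of counts xs under a single Poisson rate."""
    if not xs:
        return 0.0
    lam = sum(xs) / len(xs)   # MLE of the rate
    if lam == 0:
        return 0.0
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in xs)

def best_changepoint(xs):
    """Return the split index maximizing the two-segment likelihood gain.

    Because the segments are independent given the changepoint, each
    side is scored on its own and the scores simply add up."""
    base = poisson_loglik(xs)
    best_k, best_gain = None, 0.0
    for k in range(1, len(xs)):
        gain = poisson_loglik(xs[:k]) + poisson_loglik(xs[k:]) - base
        if gain > best_gain:
            best_k, best_gain = k, gain
    return best_k, best_gain

# Daily call counts: normal usage, then a jump after day 6
# (e.g. a fraudster starts placing calls on a hijacked line).
calls = [3, 2, 4, 3, 2, 3, 15, 18, 14, 16]
k, gain = best_changepoint(calls)
print(k, round(gain, 2))
```

    An online detector would instead update the changepoint posterior event by event, which is where the HMM formulation and the resampling methods mentioned above come in.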

    Artificial Intelligence in Banking Industry: A Review on Fraud Detection, Credit Management, and Document Processing

    AI is likely to alter the banking industry over the next several years. It is progressively being utilized by banks for analyzing and executing credit applications and examining vast volumes of data. This helps to avoid fraud and enables resource-heavy, repetitive procedures and client operations to be automated without any sacrifice in quality. This study reviews how the three most promising AI applications can make the banking sector robust and efficient. Specifically, we review AI fraud detection and prevention, AI credit management, and intelligent document processing. Since the majority of transactions have become digital, there is a great need for enhanced fraud detection algorithms and fraud prevention systems in banking. We argue that the conventional strategy for identifying bank fraud may be inadequate to combat complex fraudulent activity; instead, artificial intelligence algorithms might be very useful. Credit management is time-consuming and expensive in terms of resources, and because of the number of phases involved, these processes require a significant amount of work involving many laborious tasks. Banks can assess new clients for credit services, calculate loan amounts and pricing, and decrease the risk of fraud by using strong AI/ML models to assess these large and varied data sets in real time. Documents perform critical functions in the financial system and have a substantial influence on day-to-day operations. Currently, a large percentage of this data is preserved in email messages, online forms, PDFs, scanned images, and other digital formats. Using such a massive dataset is a difficult undertaking for any bank. We discuss artificial intelligence techniques that automatically pull critical data from all documents received by the bank, regardless of format, and feed it to the bank's existing portals/systems while maintaining consistency.