Machine learning-driven credit risk: a systemic review
Credit risk assessment is at the core of modern economies. Traditionally, it has been measured by statistical methods and manual auditing. Recent advances in financial artificial intelligence have produced a new wave of machine learning (ML)-driven credit risk models that have gained tremendous attention from both industry and academia. In this paper, we systematically review a series of major research contributions (76 papers) from the past eight years that use statistical, machine learning and deep learning techniques to address the problems of credit risk. Specifically, we propose a novel classification methodology for ML-driven credit risk algorithms and rank their performance using public datasets. We further discuss challenges including data imbalance, dataset inconsistency, model transparency, and inadequate utilization of deep learning models. Our review shows that: 1) most deep learning models outperform classic machine learning and statistical algorithms in credit risk estimation, and 2) ensemble methods provide higher accuracy than single models. Finally, we present summary tables of the datasets and proposed models.
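The data-imbalance challenge highlighted in this review (defaults are rare relative to good loans) is commonly addressed by class weighting. The following minimal sketch applies inverse-frequency class weights in an online logistic regression; the toy features, figures and weighting scheme are illustrative assumptions, not taken from any surveyed paper.

```python
# Sketch: class-weighted logistic regression trained by SGD on a tiny
# synthetic credit dataset. Inverse-class-frequency weighting is one
# common remedy for imbalanced default data; details here are invented.
import math
import random

def train_weighted_logreg(X, y, epochs=200, lr=0.1):
    """Online logistic regression; minority-class errors get larger steps."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    # Inverse-frequency class weights: the rarer class counts more.
    w_pos = len(y) / (2.0 * n_pos)
    w_neg = len(y) / (2.0 * n_neg)
    w = [0.0] * len(X[0])
    b = 0.0
    rng = random.Random(0)
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            z = b + sum(wj * xj for wj, xj in zip(w, X[i]))
            p = 1.0 / (1.0 + math.exp(-z))
            cw = w_pos if y[i] == 1 else w_neg
            g = cw * (p - y[i])            # class-weighted log-loss gradient
            w = [wj - lr * g * xj for wj, xj in zip(w, X[i])]
            b -= lr * g
    return w, b

def predict(w, b, x):
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0

# Toy features: (debt ratio, late payments); defaults (y=1) are the minority.
X = [(0.2, 0), (0.3, 1), (0.1, 0), (0.4, 0), (0.2, 1), (0.9, 4), (0.8, 5)]
y = [0, 0, 0, 0, 0, 1, 1]
w, b = train_weighted_logreg(X, y)
print([predict(w, b, x) for x in X])
```

Without the weighting, a model trained on heavily skewed data can minimize loss simply by predicting "no default" everywhere; the weights make that shortcut expensive.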
MapReduce based RDF assisted distributed SVM for high throughput spam filtering
This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel University.
Electronic mail has become deeply embedded in our everyday lives. Billions of legitimate emails are sent on a daily basis. The widely established underlying infrastructure, its widespread availability and its ease of use have all acted as catalysts for such pervasive proliferation. Unfortunately, the same can be said of unsolicited bulk email, or spam. Various methods and enabling architectures are available to try to mitigate the permeation of spam. In this respect, this dissertation complements existing survey work in the area by contributing an extensive literature review of traditional and emerging spam filtering approaches. Techniques, approaches and architectures employed for spam filtering are appraised, critically assessing their respective strengths and weaknesses.
Velocity, volume and variety are key characteristics of the spam challenge. MapReduce (M/R) has become increasingly popular as an Internet-scale, data-intensive processing platform. In the context of machine-learning-based spam filter training, support vector machine (SVM) techniques have proven effective. SVM training is, however, a computationally intensive process. In this dissertation, an M/R-based distributed SVM algorithm for scalable spam filter training, designated MRSMO, is presented. By distributing and processing subsets of the training data across multiple participating computing nodes, the distributed SVM reduces spam filter training time significantly. To mitigate the accuracy degradation introduced by this approach, a Resource Description Framework (RDF)-based feedback loop is evaluated. Experimental results demonstrate that this improves the accuracy of the distributed SVM beyond that of the original sequential counterpart.
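The partition-train-combine structure described above can be sketched in a few lines. Below, a simple perceptron stands in for the per-node SVM solver (MRSMO's actual SMO-based solver is more involved), and the reduce step naively averages the local weight vectors; the toy spam features are invented for illustration.

```python
# Sketch of the M/R training pattern: each "map" task trains a local
# linear classifier on its data partition; the "reduce" step combines the
# local models. A perceptron stands in for the per-node SVM solver.
def train_perceptron(part, epochs=20):
    """Map task: fit a linear model to one partition (y in {-1, +1})."""
    dim = len(part[0][0])
    w = [0.0] * (dim + 1)                    # last slot is the bias
    for _ in range(epochs):
        for x, y in part:
            score = w[-1] + sum(wi * xi for wi, xi in zip(w, x))
            if y * score <= 0:               # misclassified: update
                w = [wi + y * xi for wi, xi in zip(w, x + (1.0,))]
    return w

def reduce_models(models):
    """Reduce step: average the local weight vectors into one model."""
    k = len(models)
    return [sum(ws) / k for ws in zip(*models)]

def predict(w, x):
    return 1 if w[-1] + sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1

# Toy messages as (link count, CAPS ratio); y=+1 spam, y=-1 ham, sharded
# into two partitions the way an M/R job would split the training set.
partitions = [
    [((5.0, 0.9), 1), ((0.0, 0.1), -1), ((4.0, 0.8), 1)],
    [((6.0, 0.7), 1), ((1.0, 0.2), -1), ((0.0, 0.0), -1)],
]
local = [train_perceptron(p) for p in partitions]    # "map" phase
w = reduce_models(local)                             # "reduce" phase
print([predict(w, x) for x, _ in partitions[0] + partitions[1]])
```

Averaging local models is the crudest possible combination and can lose accuracy relative to a single model trained on all the data (the borderline ham message above may come out misclassified), which is exactly the kind of degradation the RDF-based feedback loop is meant to claw back.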
Effectively exploiting large-scale, 'Cloud'-based, heterogeneous processing capabilities for M/R in what can be considered a non-deterministic environment requires the consideration of a number of perspectives. In this work, gSched, a Hadoop M/R-based, heterogeneity-aware task-to-node matching and allocation scheme, is designed. Using MRSMO as a baseline, experimental evaluation indicates that gSched improves on the performance of the out-of-the-box Hadoop counterpart in a typical Cloud-based infrastructure.
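Since the abstract does not spell out gSched's matching rule, the sketch below shows one plausible greedy, heterogeneity-aware task-to-node assignment (best fit on declared CPU and I/O capabilities). It is an illustrative stand-in, not gSched's actual algorithm; the task and node profiles are invented.

```python
# Illustrative greedy matcher in the spirit of a heterogeneity-aware M/R
# scheduler: demanding tasks are placed first, each on the free node that
# satisfies its needs with the least leftover capacity (tightest fit).
def match(tasks, nodes):
    """Greedily assign each task to the free node with the best fit."""
    free = dict(nodes)                 # node name -> (cpu, io) capability
    plan = {}
    # Schedule the most demanding tasks first.
    for name, (cpu_need, io_need) in sorted(
            tasks.items(), key=lambda t: -(t[1][0] + t[1][1])):
        best, best_score = None, None
        for node, (cpu, io) in free.items():
            if cpu >= cpu_need and io >= io_need:
                # Smaller leftover capacity means a tighter, better fit.
                score = (cpu - cpu_need) + (io - io_need)
                if best_score is None or score < best_score:
                    best, best_score = node, score
        if best is not None:
            plan[name] = best
            del free[best]             # one task per node in this sketch
    return plan

tasks = {"sort": (4, 2), "filter": (1, 1), "train": (8, 4)}
nodes = {"small": (2, 2), "mid": (4, 3), "big": (8, 8)}
print(match(tasks, nodes))
```

The point of heterogeneity awareness is visible even at this scale: a capability-blind scheduler could park the heavy "train" task on "small", while the tight-fit rule reserves "big" for it.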
The focal contribution to knowledge is a scalable, heterogeneous-infrastructure, machine-learning-based spam filtering scheme, able to capitalize on collaborative accuracy improvements through RDF-based end-user feedback.
Gossip Learning with Linear Models on Fully Distributed Data
Machine learning over fully distributed data poses an important problem in peer-to-peer (P2P) applications. In this model we have one data record at each network node, but without the possibility to move raw data due to privacy considerations. For example, user profiles, ratings, history, or sensor readings can represent this case. This problem is difficult because there is no possibility to learn local models, the system model offers almost no guarantees for reliability, and yet the communication cost needs to be kept low. Here we propose gossip learning, a generic approach based on multiple models taking random walks over the network in parallel, improving themselves via an online learning algorithm, and being combined via ensemble learning methods. We present an instantiation of this approach for the case of classification with linear models. Our main contribution is an ensemble learning method which, through the continuous combination of the models in the network, implements a virtual weighted voting mechanism over an exponential number of models at practically no extra cost compared to independent random walks. We prove the convergence of the method theoretically, and perform extensive experiments on benchmark datasets. Our experimental analysis demonstrates the performance and robustness of the proposed approach.
Comment: The paper was published in the journal Concurrency and Computation: Practice and Experience, http://onlinelibrary.wiley.com/journal/10.1002/%28ISSN%291532-0634 (DOI: http://dx.doi.org/10.1002/cpe.2858).
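A heavily simplified sketch of the gossip learning loop described above, assuming one labelled record per node, perceptron-style online updates, and pairwise model averaging standing in for the paper's exact merge and voting scheme; step rule, merge schedule and data are all illustrative assumptions.

```python
# Sketch: several models random-walk over the nodes, learning online at
# each visited record and averaging their parameters when two walks meet.
import random

def update(model, x, y):
    """Perceptron-style online step on one record (y in {-1, +1})."""
    w, b = model
    if y * (b + sum(wi * xi for wi, xi in zip(w, x))) <= 0:
        w = [wi + y * xi for wi, xi in zip(w, x)]
        b = b + y
    return (w, b)

def merge(m1, m2):
    """Combine two walking models by averaging their parameters."""
    (w1, b1), (w2, b2) = m1, m2
    return ([(a + c) / 2 for a, c in zip(w1, w2)], (b1 + b2) / 2)

def gossip_learn(records, n_models=4, steps=600, seed=1):
    rng = random.Random(seed)
    dim = len(records[0][0])
    models = [([0.0] * dim, 0.0) for _ in range(n_models)]
    for _ in range(steps):
        i = rng.randrange(n_models)                  # which walk moves
        x, y = records[rng.randrange(len(records))]  # node it lands on
        models[i] = update(models[i], x, y)
        j = rng.randrange(n_models)                  # chance encounter
        if j != i:
            merged = merge(models[i], models[j])
            models[i] = models[j] = merged
    return models[0]

# Ten "nodes", each holding exactly one (features, label) record.
points = [(0.9, 0.8), (0.1, 0.2), (0.7, 0.9), (0.2, 0.1), (1.0, 0.6),
          (0.0, 0.3), (0.8, 0.7), (0.3, 0.0), (0.9, 0.9), (0.1, 0.1)]
records = [((x1, x2), 1 if x1 + x2 > 1.0 else -1) for x1, x2 in points]
w, b = gossip_learn(records)
score = lambda x: b + sum(wi * xi for wi, xi in zip(w, x))
print([1 if score(x) > 0 else -1 for x, _ in records])
```

No raw record ever leaves its node in this loop; only the model parameters travel, which is the property that makes the approach compatible with the privacy constraint stated above.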
A recent review on optimisation methods applied to credit scoring models
Purpose: This paper aims to present a literature review of the most recent optimisation methods applied to Credit Scoring Models (CSMs). Design/methodology/approach: The research methodology employed technical procedures based on bibliographic and exploratory analyses. A traditional investigation was carried out using the Scopus, ScienceDirect and Web of Science databases. The paper selection and classification took place in three steps, considering only studies in the English language published in electronic journals from 2008 to 2022. The investigation led to the selection of 46 publications (10 presenting literature reviews and 36 proposing CSMs). Findings: The findings showed that CSMs are usually formulated using Financial Analysis, Machine Learning, Statistical Techniques, Operational Research and Data Mining Algorithms. The main databases used by the researchers were banks and the University of California, Irvine. The analyses identified 48 methods used by CSMs, the main ones being Logistic Regression (13%), Naive Bayes (10%) and Artificial Neural Networks (7%). The authors conclude that advances in credit scoring studies will require new hybrid approaches capable of integrating Big Data and Deep Learning algorithms into CSMs. These algorithms should consider practical issues in order to improve the level of adaptation and performance demanded of CSMs. Practical implications: The results of this study have considerable practical implications for the application of CSMs. Since the aim was to demonstrate the application of optimisation methods, it is important that legal and ethical issues be better addressed within CSMs. Further studies focused on micro and small companies, on sales by instalment plans and on commercial credit, through the improvement of existing CSMs or the proposal of new ones, are also suggested.
Originality/value: The economic reality surrounding credit granting has made risk management a complex decision-making issue increasingly supported by CSMs. This paper therefore fills an important gap in the literature by presenting an analysis of recent advances in optimisation methods applied to CSMs. Its main contribution consists of presenting the evolution of the state of the art and future trends in studies aimed at proposing better CSMs.
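As a concrete instance of one of the frequently used CSM techniques the findings identify (Naive Bayes), the sketch below fits a Gaussian Naive Bayes classifier to a toy applicant dataset; the feature names and figures are invented for illustration, not drawn from the reviewed studies.

```python
# Sketch: Gaussian Naive Bayes credit scorer. Per-class feature means,
# variances and priors are estimated from data; prediction picks the
# class with the highest log-posterior under feature independence.
import math

def fit_gnb(X, y):
    """Estimate per-class feature means/variances and class priors."""
    params = {}
    for c in set(y):
        rows = [x for x, yi in zip(X, y) if yi == c]
        means = [sum(col) / len(rows) for col in zip(*rows)]
        vars_ = [max(sum((v - m) ** 2 for v in col) / len(rows), 1e-6)
                 for col, m in zip(zip(*rows), means)]
        params[c] = (means, vars_, len(rows) / len(y))
    return params

def predict_gnb(params, x):
    """Return the class with the highest log-posterior for x."""
    def logpost(c):
        means, vars_, prior = params[c]
        ll = math.log(prior)
        for v, m, s2 in zip(x, means, vars_):
            ll += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return ll
    return max(params, key=logpost)

# Toy features: (income in thousands, number of existing debts);
# class 1 = good payer, class 0 = defaulter.
X = [(50, 1), (60, 0), (55, 2), (20, 5), (25, 6), (18, 4)]
y = [1, 1, 1, 0, 0, 0]
model = fit_gnb(X, y)
print(predict_gnb(model, (58, 1)), predict_gnb(model, (22, 5)))
```

The independence assumption behind Naive Bayes rarely holds for correlated financial features, which is one reason the review's call for hybrid approaches is plausible: simple, interpretable scorers like this one are typically the baseline such hybrids must beat.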