55 research outputs found

    Data pre-processing for database marketing

    To increase the effectiveness of their marketing and Customer Relationship Management (CRM) activities, many organizations are adopting Database Marketing (DBM) strategies. Nowadays, DBM faces new challenges in business knowledge, since current strategies are mainly based on classical statistical inference, which may fail when the available data are complex, multi-dimensional and incomplete. An alternative is to use Knowledge Discovery from Databases (KDD), which aims at the automatic extraction of useful patterns by means of Data Mining (DM) techniques. When applied to DBM, the identified patterns can be used for the efficient characterization of customers. This paper focuses on several problems that arise in the data pre-processing step (e.g. data cleaning), which is necessary for the success of the DM approach to a DBM project.
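
    As a hedged illustration of the kind of pre-processing the paper discusses, the sketch below applies basic cleaning steps (duplicate removal, dropping mostly-empty columns, simple imputation) to a hypothetical customer table with pandas; the column handling and the 50% missing-value threshold are assumptions, not the paper's procedure.

```python
# Minimal data-cleaning sketch for a hypothetical customer table (pandas).
# Thresholds and imputation choices are illustrative assumptions.
import pandas as pd

def clean_customer_table(df: pd.DataFrame) -> pd.DataFrame:
    """Basic pre-processing before mining customer data."""
    df = df.drop_duplicates()                          # remove duplicate records
    df = df.dropna(axis=1, thresh=int(0.5 * len(df)))  # drop columns with >50% missing values
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].median())        # numeric: impute median
        else:
            df[col] = df[col].fillna(df[col].mode().iloc[0])  # categorical: impute mode
    return df

# Hypothetical usage:
# customers = clean_customer_table(pd.read_csv("customers.csv"))
```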

    Geospatial data pre-processing on watershed datasets: A GIS approach

    Spatial data mining helps to identify interesting patterns in spatial data sets. However, geospatial data require substantial pre-processing before they can be interrogated further using data mining techniques. Multi-dimensional spatial data have been used to explain the spatial analysis, and SOLAP for pre-processing the data. This paper examines some methods for pre-processing such data using ArcGIS 10.2 and Spatial Analyst, with a watershed data set as a case study.
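
    The paper itself works in ArcGIS 10.2; as a hedged open-source analogue, the sketch below performs typical vector pre-processing steps (reprojection, clipping to the watershed boundary, dropping incomplete records) with geopandas. The file and column names are hypothetical placeholders.

```python
# Illustrative watershed vector pre-processing with geopandas (not ArcGIS).
import geopandas as gpd

watershed = gpd.read_file("watershed_boundary.shp")   # hypothetical study-area polygon
stations = gpd.read_file("monitoring_stations.shp")   # hypothetical point observations

# Put both layers in the same coordinate reference system before analysis.
stations = stations.to_crs(watershed.crs)

# Keep only stations that fall inside the watershed boundary.
stations_in_basin = gpd.clip(stations, watershed)

# Simple attribute cleaning: drop records with missing measurements.
stations_in_basin = stations_in_basin.dropna(subset=["flow", "elevation"])
print(len(stations_in_basin), "stations retained for mining")
```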

    Dealing with missing data for prognostic purposes

    Centrifugal compressors are considered among the most critical components in the oil industry, making the minimization of their downtime and the maximization of their availability a major target. Maintenance is a key aspect of achieving this goal, and various maintenance schemes have been proposed over the years. Condition based maintenance and prognostics and health management (CBM/PHM), which relies on the concepts of diagnostics and prognostics, has been gaining ground in recent years due to its ability to plan the maintenance schedule in advance. The successful application of this policy is heavily dependent on the quality of the data used, and a major issue affecting it is missing data. The presence of missing data may compromise the information contained in a data set and thus significantly affect the conclusions that can be drawn from it, as results may be biased or misleading. Consequently, it is important to address this matter. A number of methodologies to recover the data, called imputation techniques, have been proposed. This paper reviews the most widely used techniques and presents a case study using actual industrial centrifugal compressor data, in order to identify the most suitable ones.
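
    For readers unfamiliar with imputation, the sketch below runs three commonly reviewed techniques (mean, k-nearest-neighbour and iterative regression-based imputation) on a synthetic sensor matrix using scikit-learn; it is not the paper's case study and the data are randomly generated.

```python
# Common imputation techniques applied to a synthetic sensor matrix with NaNs.
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables the import below)
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                 # stand-in for compressor sensor readings
X[rng.random(X.shape) < 0.1] = np.nan         # knock out ~10% of the values

imputers = {
    "mean": SimpleImputer(strategy="mean"),
    "knn": KNNImputer(n_neighbors=5),
    "iterative": IterativeImputer(random_state=0),  # MICE-style regression imputation
}
for name, imputer in imputers.items():
    X_filled = imputer.fit_transform(X)
    print(name, "-> remaining NaNs:", int(np.isnan(X_filled).sum()))
```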

    RecSys Challenge 2023: From data preparation to prediction, a simple, efficient, robust and scalable solution

    The RecSys Challenge 2023, presented by ShareChat, consists of predicting whether a user will install an application on their smartphone after seeing advertising impressions in the ShareChat and Moj apps. This paper presents the solution of 'Team UMONS' to this challenge, giving accurate results (our best score is 6.622686) with a relatively small model that can be easily implemented in different production configurations. Our solution scales well as the dataset size increases and can be used with datasets containing missing values.
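
    The sketch below is not the Team UMONS model; it is a hedged baseline for the same kind of task (binary install prediction on data with missing values), using scikit-learn's HistGradientBoostingClassifier, which handles NaN inputs natively, on synthetic stand-in data.

```python
# Compact install-prediction baseline that tolerates missing feature values.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 20))                  # stand-in for impression features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # synthetic "installed the app" label
X[rng.random(X.shape) < 0.05] = np.nan           # simulate missing feature values

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = HistGradientBoostingClassifier(max_iter=200, random_state=0)
model.fit(X_tr, y_tr)                            # NaNs are handled natively by the trees
print("held-out accuracy:", round(model.score(X_te, y_te), 3))
```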

    An FPGA-based network system with service-uninterrupted remote functional update

    The recent emergence of 5G networks enables the mass deployment of wireless sensors for internet-of-things (IoT) applications. In many cases, IoT sensors in monitoring and data collection applications are required to operate continuously (24/7) to ensure that all data are sampled without loss. Field-programmable gate array (FPGA)-based systems offer a balance of processing throughput and datapath flexibility. Specifically, datapath flexibility comes from system architectures that support dynamic partial reconfiguration. However, a device functional update can interrupt application servicing, especially in an FPGA-based system. This paper presents a standalone FPGA-based system architecture that allows remote functional updates without service interruption by adopting a redundancy mechanism in the application datapath. By utilizing dynamic partial reconfiguration, only the datapath being updated is temporarily inactive while the rest of the circuitry, including the redundant datapath, remains active. Hence, there is no service interruption or downtime when a remote functional update takes place, thanks to the redundant application datapath, which is critical for network and communication systems. The proposed architecture has significant impact for FPGA-based systems that have little or no tolerance for service interruption.
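
    The redundancy idea can be illustrated in software, although the paper's design is hardware (dynamic partial reconfiguration on an FPGA). The hedged Python sketch below only mimics the control flow: traffic is steered to a standby datapath while the primary one is being updated, so processing never pauses; all names are hypothetical.

```python
# Conceptual (software-only) illustration of service-uninterrupted update via redundancy.
class Datapath:
    def __init__(self, name: str, version: int):
        self.name, self.version, self.active = name, version, True

    def process(self, packet: str) -> str:
        return f"{self.name} v{self.version} handled {packet}"

class RedundantSystem:
    def __init__(self):
        self.primary = Datapath("primary", 1)
        self.standby = Datapath("standby", 1)

    def process(self, packet: str) -> str:
        path = self.primary if self.primary.active else self.standby
        return path.process(packet)

    def remote_update(self, new_version: int) -> None:
        self.primary.active = False          # traffic now flows through the standby path
        self.primary.version = new_version   # stands in for partial reconfiguration
        self.primary.active = True           # primary resumes with the new function

system = RedundantSystem()
print(system.process("pkt-1"))
system.remote_update(2)                      # service continues throughout the update
print(system.process("pkt-2"))
```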

    Internet traffic forecasting using neural networks

    Internet traffic forecasting is an important issue that has received little attention from the computer networks field. By improving this task, efficient traffic engineering and anomaly detection tools can be created, resulting in economic gains from better resource management. This paper presents a Neural Network Ensemble (NNE) for the prediction of TCP/IP traffic from a Time Series Forecasting (TSF) point of view. Several experiments were devised by considering real-world data from two large Internet Service Providers. In addition, different time scales (e.g. every five minutes and hourly) and forecasting horizons were analyzed. Overall, the NNE approach is competitive when compared with other TSF methods (e.g. Holt-Winters and ARIMA).
    Funding: Engineering and Physical Sciences Research Council (EP/522885 grant); Portuguese National Conference of Rectors (CRUP)/British Council Portugal (B-53/05 grant); Nuffield Foundation (NAL/001136/A grant).
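
    As a hedged illustration of the time-series-forecasting setup (not the paper's exact NNE), the sketch below builds lagged inputs from a synthetic traffic series and averages the predictions of several small multilayer perceptrons.

```python
# Small neural-network ensemble forecasting a synthetic 5-minute traffic series.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
t = np.arange(2000)
traffic = 10 + np.sin(2 * np.pi * t / 288) + 0.1 * rng.normal(size=t.size)  # daily cycle, 288 samples/day

lags = 12                                                   # previous hour as inputs
X = np.column_stack([traffic[i:len(traffic) - lags + i] for i in range(lags)])
y = traffic[lags:]

split = int(0.8 * len(y))                                   # time-ordered train/test split
X_tr, X_te, y_tr, y_te = X[:split], X[split:], y[:split], y[split:]

ensemble = [
    MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=seed).fit(X_tr, y_tr)
    for seed in range(5)
]
pred = np.mean([m.predict(X_te) for m in ensemble], axis=0)  # average the ensemble members
print("ensemble MAE on held-out traffic:", round(float(np.mean(np.abs(pred - y_te))), 3))
```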

    Predicting inpatient length of stay in a Portuguese hospital using the CRISP-DM methodology

    Using data on inpatient episodes collected from a Portuguese hospital for the period 2000 to 2013, and following the CRISP-DM data mining methodology, we obtained a model for predicting inpatient length of stay based on the random forest algorithm. The model achieved a high prediction quality, superior to that obtained with other data mining techniques, and identified the patients' clinical attributes as the most important factors for explaining length of stay.

    Using data mining for prediction of hospital length of stay: an application of the CRISP-DM Methodology

    Hospitals nowadays collect vast amounts of data related to patient records. These data hold valuable knowledge that can be used to improve hospital decision making. Data mining techniques aim precisely at the extraction of useful knowledge from raw data. This work describes the implementation of a medical data mining project based on the CRISP-DM methodology. Recent real-world data, from 2000 to 2013, were collected from a Portuguese hospital and relate to inpatient hospitalization. The goal was to predict generic hospital Length Of Stay based on indicators that are commonly available at the hospitalization process (e.g., gender, age, episode type, medical specialty). At the data preparation stage, the data were cleaned and variables were selected and transformed, leading to 14 inputs. Next, at the modeling stage, a regression approach was adopted in which six learning methods were compared: Average Prediction, Multiple Regression, Decision Tree, Artificial Neural Network ensemble, Support Vector Machine and Random Forest. The best learning model was obtained by the Random Forest method, which presented a high coefficient of determination (0.81). This model was then opened using a sensitivity analysis procedure that revealed three influential input attributes: the hospital episode type, the physical service where the patient is hospitalized and the associated medical specialty. Such extracted knowledge confirmed that the obtained predictive model is credible and has potential value for supporting the decisions of hospital managers.
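
    As a hedged sketch of the modelling stage described above, the code below fits a random forest regressor to synthetic admission-time indicators, reports the coefficient of determination and lists feature importances (a simple stand-in for the paper's sensitivity analysis); the data and column names are invented for illustration.

```python
# Random-forest regression of length of stay on synthetic admission indicators.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 3000
X = pd.DataFrame({
    "age": rng.integers(18, 95, n),
    "episode_type": rng.integers(0, 3, n),        # e.g. urgent vs. planned, label-encoded
    "medical_specialty": rng.integers(0, 10, n),  # label-encoded specialty
    "gender": rng.integers(0, 2, n),
})
los = 2 + 0.05 * X["age"] + 2 * X["episode_type"] + rng.normal(0, 1, n)  # synthetic target (days)

X_tr, X_te, y_tr, y_te = train_test_split(X, los, test_size=0.2, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("R^2 on held-out episodes:", round(r2_score(y_te, rf.predict(X_te)), 2))
for name, importance in zip(X.columns, rf.feature_importances_):
    print(f"{name}: {importance:.2f}")           # crude proxy for sensitivity analysis
```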

    Variable importance for sustaining macrophyte presence via random forests : data imputation and model settings

    Get PDF
    Data sets plagued with missing data and performance-affecting model parameters represent recurrent issues within the field of data mining. Via random forests, the influence of data reduction, outlier and correlated variable removal, and missing data imputation technique on the performance of habitat suitability models for three macrophytes (Lemna minor, Spirodela polyrhiza and Nuphar lutea) was assessed. Higher performances (Cohen’s kappa values around 0.2–0.3) were obtained for a high degree of data reduction, without outlier or correlated variable removal and with imputation of the median value. Moreover, the influence of model parameter settings on the performance of a random forest trained on this data set was investigated across a range of numbers of individual trees (ntree), while the number of variables considered at each split (mtry) was fixed at two. Altering the number of individual trees did not have a uniform effect on model performance, but clearly changed the required computation time. Combining both criteria provided an ntree value of 100, with the overall effect of ntree on performance being relatively limited. Temperature, pH and conductivity remained as variables and were shown to affect the likelihood of L. minor, S. polyrhiza and N. lutea being present. Generally, high likelihood values were obtained when temperature was high (>20 °C), conductivity was moderately low (50–200 mS m⁻¹) or pH was intermediate (6.9–8), thereby also highlighting that a multivariate management approach for supporting macrophyte presence remains recommended. Yet, as our conclusions are based on only a single freshwater data set, they should be further tested on other data sets.
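
    The hedged sketch below mirrors the workflow on synthetic stand-in data: median imputation of missing predictors, then a random forest with n_estimators=100 (ntree) and max_features=2 (mtry), evaluated with Cohen's kappa; it is illustrative only, not the study's data or model.

```python
# Median imputation + random forest (ntree=100, mtry=2) scored with Cohen's kappa.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.uniform(5, 30, n),     # temperature (degrees C)
    rng.uniform(5.5, 9, n),    # pH
    rng.uniform(10, 400, n),   # conductivity (mS/m)
])
y = ((X[:, 0] > 20) & (X[:, 1] > 6.9) & (X[:, 1] < 8)).astype(int)  # synthetic presence label
X[rng.random(X.shape) < 0.1] = np.nan                               # introduce missing values

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = make_pipeline(
    SimpleImputer(strategy="median"),                               # median imputation
    RandomForestClassifier(n_estimators=100, max_features=2, random_state=0),
)
model.fit(X_tr, y_tr)
print("Cohen's kappa:", round(cohen_kappa_score(y_te, model.predict(X_te)), 2))
```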