Customer churn prediction in telecom using machine learning and social network analysis in big data platform
Customer churn is a major problem and one of the most important concerns for large companies. Because of its direct effect on revenues, especially in the telecom field, companies are seeking to develop means of predicting which customers are likely to churn. Identifying the factors that increase customer churn is therefore important for taking the actions needed to reduce it. The main contribution of our work is a churn prediction model that helps telecom operators identify the customers most likely to churn. The model uses machine learning techniques on a big data platform and introduces a new approach to feature engineering and selection. Model performance is measured with the standard Area Under the Curve (AUC) metric, and the AUC value obtained is 93.3%. Another main contribution is the use of the customers' social network in the prediction model, by extracting Social Network Analysis (SNA) features. The use of SNA improved the AUC of the model from 84% to 93.3%. The model was prepared and tested in a Spark environment on a large dataset created by transforming big raw data provided by the SyriaTel telecom company. The dataset contained all customers' information over 9 months and was used to train, test, and evaluate the system at SyriaTel. Four algorithms were evaluated: Decision Tree, Random Forest, Gradient Boosted Machine Tree (GBM), and Extreme Gradient Boosting (XGBoost). The best results were obtained with XGBoost, which was therefore used for classification in this churn prediction model.
Comment: 24 pages, 14 figures. PDF https://rdcu.be/budK
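The AUC figures reported above (84% without SNA features, 93.3% with them) can be reproduced from any classifier's scores. A minimal sketch, assuming binary labels with 1 = churn; the function name `auc` and the pairwise Mann-Whitney formulation are ours, not the paper's:

```python
def auc(labels, scores):
    """Area Under the ROC Curve via the Mann-Whitney statistic:
    the fraction of (positive, negative) pairs that the scores rank
    correctly, counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

On real data, scikit-learn's `roc_auc_score` computes the same quantity.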
Radar-based Feature Design and Multiclass Classification for Road User Recognition
The classification of individual traffic participants is a complex task, especially in challenging scenarios with multiple road users or under bad weather conditions. Radar sensors provide a way of measuring such scenes that is orthogonal to well-established camera systems. To obtain accurate classification results, 50 different features are extracted from the measurement data and evaluated for their performance. From these features a suitable subset is chosen and passed to random forest and long short-term memory (LSTM) classifiers to obtain class predictions for the radar input. Moreover, it is shown why data imbalance is an inherent problem in automotive radar classification when the dataset is not sufficiently large. To overcome this issue, classifier binarization is used, among other techniques, to better account for underrepresented classes. A new method for coupling the resulting probabilities is proposed and compares favorably to existing ones. Final results show substantial improvements over ordinary multiclass classification.
Comment: 8 pages, 6 figures
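The coupling step mentioned above combines the outputs of the binarized classifiers into one multiclass prediction. As an illustration only, a generic baseline (not the paper's proposed method) couples one-vs-rest outputs by simple renormalization; the road-user class names here are invented for the example:

```python
def couple_one_vs_rest(binary_probs):
    # binary_probs maps each class to P(class | vs. rest) as reported
    # by that class's binary classifier; renormalize so the coupled
    # scores form a proper probability distribution over classes.
    total = sum(binary_probs.values())
    return {c: p / total for c, p in binary_probs.items()}
```

The predicted class is then simply the argmax of the coupled distribution.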
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbating those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability.
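Of the five challenges listed, missing data is the easiest to illustrate in isolation. A minimal sketch of mean imputation, the simplest baseline (the review surveys far more sophisticated ML-based approaches); here `None` marks a missing measurement:

```python
def mean_impute(rows):
    # Replace each missing entry (None) with the mean of the observed
    # values in the same column (feature).
    cols = list(zip(*rows))
    means = [sum(v for v in c if v is not None) /
             sum(v is not None for v in c) for c in cols]
    return [[m if v is None else v for v, m in zip(row, means)]
            for row in rows]
```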
A study of methods for constructing classifier ensembles, and applications
Artificial intelligence is devoted to building computer systems that behave intelligently. Within this area, machine learning studies the creation of systems that learn by themselves.
One type of machine learning is supervised learning, in which the system is given both the inputs and the expected output, and learns from these data. A system of this kind is called a classifier.
It sometimes happens that, in the set of examples the system uses to learn, the number of examples of one type is much larger than the number of examples of another type. When this occurs, we speak of imbalanced datasets.
A combination of several classifiers is called an "ensemble", and it often yields better results than any of its individual members. One of the keys to the good performance of ensembles is diversity.
This thesis focuses on the development of new ensemble-construction algorithms, centred on techniques for increasing diversity and on imbalanced problems. In addition, these techniques are applied to the solution of several industrial problems.
Ministerio de Economía y Competitividad, proyecto TIN-2011-2404
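The role of diversity described above can be illustrated with the classic bagging construction, where each ensemble member is fit on a different bootstrap resample of the training data. A toy sketch (not one of the algorithms developed in the thesis), using a 1-nearest-neighbour member on one-dimensional data:

```python
import random

def bagging_predict(train, x, n_members=5, seed=0):
    # train: list of (feature, label) pairs; x: query point.
    # Each ensemble member is a 1-nearest-neighbour rule fit on a
    # bootstrap resample -- the resampling is what injects diversity.
    rng = random.Random(seed)
    votes = []
    for _ in range(n_members):
        boot = [rng.choice(train) for _ in train]
        nearest = min(boot, key=lambda p: abs(p[0] - x))
        votes.append(nearest[1])
    # Majority vote over the members' predictions.
    return max(set(votes), key=votes.count)
```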
A plug-and-play synthetic data deep learning for undersampled magnetic resonance image reconstruction
Magnetic resonance imaging (MRI) plays an important role in modern medical diagnostics but suffers from prolonged scan times. Current deep learning methods for undersampled MRI reconstruction achieve good image de-aliasing performance when tailored to a specific k-space undersampling scenario, but reconfiguring different deep networks whenever the sampling setting changes is cumbersome. In this work, we propose a deep plug-and-play method for undersampled MRI reconstruction that adapts effectively to different sampling settings. Specifically, the image de-aliasing prior is first learned by a deep denoiser trained to remove general white Gaussian noise from synthetic data. The learned deep denoiser is then plugged into an iterative algorithm for image reconstruction. Results on in vivo data demonstrate that the proposed method provides robust accelerated image reconstruction under different undersampling patterns and sampling rates, both visually and quantitatively.
Comment: 5 pages, 3 figures
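The plug-and-play structure described above, a data-consistency step followed by the plugged-in denoiser, can be sketched on a toy 1-D problem. This illustrates the general PnP iteration under our own simplifications (a hand-crafted moving-average denoiser standing in for the learned deep denoiser, and a real-valued signal rather than k-space data):

```python
def pnp_reconstruct(measured, mask, denoise, n_iters=200, step=1.0):
    # measured: observed samples (zeros where unmeasured);
    # mask[i] == 1 where position i was actually sampled.
    x = list(measured)
    for _ in range(n_iters):
        # Gradient step on the data-fidelity term ||mask*(x - measured)||^2.
        x = [xi - step * m * (xi - yi)
             for xi, yi, m in zip(x, measured, mask)]
        x = denoise(x)  # plug in any denoiser here

    return x

def box_denoise(x):
    # Stand-in for the learned deep denoiser: a 3-tap moving average.
    padded = [x[0]] + x + [x[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3
            for i in range(len(x))]
```

On a constant signal sampled at every other position, the iteration fills in the missing samples; swapping the sampling mask requires no change to the denoiser, which is the adaptability the method targets.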
An Empirical Study on the Joint Impact of Feature Selection and Data Re-sampling on Imbalance Classification
In predictive tasks, real-world datasets often present different degrees of imbalanced (i.e., long-tailed or skewed) distributions. While the majority (the head, or most frequent) classes have sufficient samples, the minority (the tail, or less frequent or rare) classes can be under-represented by a rather limited number of samples. Data pre-processing has been shown to be very effective in dealing with such problems. On one hand, data re-sampling is a common approach to tackling class imbalance. On the other hand, dimension reduction, which reduces the feature space, is a conventional technique for reducing noise and inconsistencies in a dataset. However, the possible synergy between feature selection and data re-sampling for high-performance imbalance classification has rarely been investigated. To address this issue, we carry out a comprehensive empirical study of the joint influence of feature selection and re-sampling on two-class imbalance classification. Specifically, we study the performance of two opposite pipelines for imbalance classification, applying feature selection either before or after data re-sampling. We conduct a large number of experiments, with a total of 9225 tests, on 52 publicly available datasets, using 9 feature selection methods, 6 re-sampling approaches for class imbalance learning, and 3 well-known classification algorithms. Experimental results show that there is no constant winner between the two pipelines; both should therefore be considered when deriving the best-performing model for imbalance classification. We find that the performance of an imbalance classification model depends not only on the classifier adopted and the ratio between the number of majority and minority samples, but also on the ratio between the number of samples and features. Overall, this study provides a new reference for researchers and practitioners in imbalance learning.
TIN2017-89517-
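The two pipeline orders compared in the study can be sketched with the simplest representative of each step: a variance filter for feature selection and random duplication oversampling for re-sampling (the study itself uses 9 selection methods and 6 re-sampling approaches; these stand-ins are ours):

```python
import random

def select_features(X, k):
    # Filter-style selection: keep the k highest-variance features.
    def var(col):
        m = sum(col) / len(col)
        return sum((v - m) ** 2 for v in col) / len(col)
    cols = list(zip(*X))
    keep = sorted(sorted(range(len(cols)), key=lambda j: var(cols[j]),
                         reverse=True)[:k])
    return [[row[j] for j in keep] for row in X]

def oversample(X, y, seed=0):
    # Random oversampling: duplicate minority samples until balanced.
    rng = random.Random(seed)
    minority = min(set(y), key=y.count)
    majority = max(set(y), key=y.count)
    X, y = [list(r) for r in X], list(y)
    idx = [i for i, lab in enumerate(y) if lab == minority]
    while y.count(minority) < y.count(majority):
        i = rng.choice(idx)
        X.append(list(X[i]))
        y.append(minority)
    return X, y

X = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0], [5.0, 0.0]]
y = [0, 0, 0, 0, 1]
# Pipeline A: feature selection first, then re-sampling.
Xa, ya = oversample(select_features(X, 1), y)
# Pipeline B: re-sampling first, then feature selection.
Xr, yr = oversample(X, y)
Xb, yb = select_features(Xr, 1), yr
```

Both orders reach a balanced, reduced dataset here, but on real data the two pipelines can select different features and duplicate different samples, which is exactly the interaction the study measures.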
An Examination of the Smote and Other Smote-based Techniques That Use Synthetic Data to Oversample the Minority Class in the Context of Credit-Card Fraud Classification
This research project investigates sampling techniques that generate and use synthetic data to oversample the minority class, as a means of handling the imbalanced distribution between non-fraudulent (majority class) and fraudulent (minority class) transactions in a credit-card fraud dataset. The purpose of the project is to assess the effectiveness of these techniques in the context of fraud detection, which involves a highly imbalanced and cost-sensitive dataset. Machine learning tasks that require learning from highly unbalanced datasets are difficult because many traditional learning algorithms are not designed to cope with large differentials between classes. For that reason, various methods have been developed to help tackle this problem. Oversampling and undersampling are examples of techniques that address the class imbalance problem through sampling. This paper evaluates oversampling techniques that use synthetic data to balance the minority class. The idea of using synthetic data to compensate for the minority class was first proposed by Chawla et al. (2002) as the Synthetic Minority Over-sampling Technique (SMOTE). Following its development, other techniques were derived from it. This paper evaluates the SMOTE technique along with other popular SMOTE-based extensions of the original technique.
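A minimal sketch of the core SMOTE idea from Chawla et al. (2002): each synthetic sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours. This illustrative implementation is ours; libraries such as imbalanced-learn provide production versions:

```python
import random

def smote(minority, n_new, k=2, seed=0):
    # minority: list of minority-class feature vectors.
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        # k nearest minority neighbours of a (excluding a itself).
        neighbours = sorted(
            (p for p in minority if p is not a),
            key=lambda p: sum((u - v) ** 2 for u, v in zip(a, p)))[:k]
        b = rng.choice(neighbours)
        t = rng.random()  # random position along the segment a -> b
        synthetic.append([u + t * (v - u) for u, v in zip(a, b)])
    return synthetic
```

Because each synthetic point interpolates between existing minority samples, the new points stay inside the region the minority class already occupies, unlike plain duplication, which adds no new information.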