Gaining insight from large data volumes with ease
Efficient handling of large data volumes has become a necessity in today's
world. It is driven by the desire to get more insight from the data and to gain
a better understanding of user trends which can be transformed into economic
incentives (profits, cost reduction, and various optimizations of data
workflows and pipelines). In this paper, we discuss how modern technologies are
transforming well-established patterns in HEP communities. The new data insight
can be achieved by embracing Big Data tools for a variety of use-cases, from
analytics and monitoring to training Machine Learning models on a terabyte
scale. We provide concrete examples within the context of the CMS experiment,
where Big Data tools are already playing or would play a significant role in
daily operations.
Customer churn prediction in telecom using machine learning and social network analysis in big data platform
Customer churn is a major problem and one of the most important concerns for
large companies. Due to the direct effect on the revenues of the companies,
especially in the telecom field, companies are seeking to develop means to
predict which customers are likely to churn. Therefore, finding the factors that
increase customer churn is important for taking the actions needed to reduce it. The
main contribution of our work is a churn prediction model that helps telecom
operators identify the customers most likely to churn. The model developed in
this work uses machine learning techniques on a big data platform and
introduces a new approach to feature engineering and selection. In
order to measure the performance of the model, the standard Area Under Curve
(AUC) measure is adopted, and the AUC value obtained is 93.3%. Another main
contribution is the use of the customer social network in the prediction model,
by extracting Social Network Analysis (SNA) features. The use of SNA improved
the AUC of the model from 84% to 93.3%. The model was
prepared and tested in a Spark environment using a large dataset
created by transforming big raw data provided by SyriaTel telecom company. The
dataset contained all customers' information over 9 months, and was used to
train, test, and evaluate the system at SyriaTel. Four algorithms were tested:
Decision Tree, Random Forest, Gradient Boosted Machine Tree "GBM", and Extreme
Gradient Boosting "XGBOOST". The best results were obtained with the XGBOOST
algorithm, which was used for classification in this churn prediction model.
Comment: 24 pages, 14 figures. PDF https://rdcu.be/budK
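The abstract reports model quality as AUC. As a rough illustration of what that number measures, the sketch below computes AUC as the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner, with ties counted as half. The function name `roc_auc` and the toy labels and scores are assumptions made for this example; they are not taken from the paper and do not reproduce its pipeline or its data.

```python
def roc_auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1, where 1 marks a churner (hypothetical encoding).
    scores: iterable of model scores, higher meaning "more likely to churn".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # churner ranked above non-churner
            elif p == n:
                wins += 0.5   # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.6, 0.8]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of examples, so it is only suitable for small samples; on a dataset of the size described in the abstract, a rank-based or library implementation would be the practical choice.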