6 research outputs found

    Identification of top-K influential communities in big networks


    Customer churn prediction in telecom using machine learning and social network analysis in big data platform

    Customer churn is a major problem and one of the most important concerns for large companies. Because churn directly affects revenue, especially in the telecom field, companies are seeking ways to predict which customers are likely to churn. Identifying the factors that increase customer churn is therefore important for taking the actions needed to reduce it. The main contribution of our work is a churn prediction model that helps telecom operators predict which customers are most likely to churn. The model uses machine learning techniques on a big data platform and introduces a new approach to feature engineering and selection. Model performance is measured with the standard Area Under Curve (AUC) metric, and the AUC value obtained is 93.3%. Another main contribution is the use of the customer social network in the prediction model, by extracting Social Network Analysis (SNA) features. Using SNA improved the model's AUC from 84% to 93.3%. The model was prepared and tested in a Spark environment on a large dataset created by transforming big raw data provided by the SyriaTel telecom company. The dataset contained all customers' information over 9 months and was used to train, test, and evaluate the system at SyriaTel. Four algorithms were evaluated: Decision Tree, Random Forest, Gradient Boosted Machine Tree (GBM), and Extreme Gradient Boosting (XGBOOST). The best results were obtained with the XGBOOST algorithm, which was used for classification in this churn prediction model. Comment: 24 pages, 14 figures. PDF https://rdcu.be/budK
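    For readers unfamiliar with the AUC metric mentioned in this abstract, the following is a minimal sketch of how it can be computed from predicted churn scores. It is not the paper's Spark pipeline: the function name `auc` and the toy labels and scores are hypothetical, chosen only for illustration. AUC is computed here as the fraction of (churner, non-churner) pairs in which the churner receives the higher score, with ties counted as half.

```python
def auc(labels, scores):
    """Pairwise AUC: probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = churned, 0 = stayed.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
result = auc(labels, scores)
print(f"AUC = {result:.3f}")
```

    This quadratic pairwise form is easy to verify by hand; on large datasets a rank-based formulation is normally used instead.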

    Network communities of dynamical influence

    Fuelled by a desire for greater connectivity, networked systems now pervade our society at an unprecedented level that will affect it in ways we do not yet understand. In contrast, nature has already developed efficient networks that can instigate rapid response and consensus when key elements are stimulated. We present a technique for identifying these key elements by investigating the relationships between a system’s most dominant eigenvectors. This approach reveals the most effective vertices for leading a network to rapid consensus when stimulated, as well as the communities that form under their dynamical influence. In applying this technique, the effectiveness of starling flocks was found to be due, in part, to the low outdegree of every bird: increasing the number of outgoing connections can produce a less responsive flock. A larger outdegree also affects the location of the birds with the most influence, which become more centrally located and so are in a poorer position to observe a predator and, hence, instigate an evasion manoeuvre. Finally, the technique was found to be effective in large voxel-wise brain connectomes, where subjects can be identified from their influential communities.
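    The abstract does not describe how the dominant eigenvectors are compared, so the following is only a sketch of one common building block, power iteration, which estimates the single most dominant eigenvector of a matrix. It is not the authors' technique. The function name `power_iteration` and the 2x2 example matrix are hypothetical.

```python
import math


def power_iteration(matrix, iterations=100):
    """Estimate the dominant eigenvalue and eigenvector of a square matrix."""
    n = len(matrix)
    vector = [1.0] + [0.0] * (n - 1)   # arbitrary non-zero starting vector
    for _ in range(iterations):
        # Multiply by the matrix, then renormalise to unit length.
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    # Rayleigh quotient gives the eigenvalue estimate for the unit vector.
    product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v * p for v, p in zip(vector, product))
    return eigenvalue, vector


# Hypothetical symmetric example with eigenvalues 3 and 1.
example = [[2.0, 1.0],
           [1.0, 2.0]]
eigenvalue, eigenvector = power_iteration(example)
print(eigenvalue, eigenvector)
```

    Power iteration converges when one eigenvalue strictly dominates in magnitude and the starting vector is not orthogonal to its eigenvector; recovering several dominant eigenvectors, as the abstract describes, would need an additional step such as deflation or a library eigensolver.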

    Advanced Applications Of Big Data Analytics

    Human life is progressing with advancements in technology such as laptops, smart phones, and high-speed communication networks, which help us by reducing the effort of our daily activities. For instance, one can chat, talk, and make video calls with friends instantly using social networking platforms such as Facebook, Twitter, Google+, and WhatsApp, while LinkedIn, Indeed, and similar services connect employees with potential employers. The number of people using these applications is increasing day by day, and so is the amount of data they generate. Processing such vast amounts of data may require new techniques for gaining valuable insights. Network theory concepts form the core of techniques designed to uncover valuable insights from large social network datasets. Many interesting problems, such as ranking the top-K nodes and top-K communities that can effectively diffuse a given message into the network, restaurant recommendations, and friendship recommendations on social networking websites, can be addressed using the concepts of network centrality. Network centrality measures such as In-degree centrality, Out-degree centrality, Eigenvector centrality, Katz Broadcast centrality, Katz Receive centrality, and PageRank centrality come in handy in solving these problems. In this thesis, we propose different formulae for computing the strength used to identify the top-K nodes and communities that can spread viral marketing messages into the network. The strength formulae are based on the Katz Broadcast centrality, the Resolvent matrix measure, and the Personalized PageRank measure. Moreover, the effects of intercommunity and intracommunity connectivity on ranking top-K communities are studied. The top-K nodes for spreading a message effectively into the network are determined using the Katz Broadcast centrality measure. Results obtained through this technique are compared with the top-K nodes obtained using the Degree centrality measure.
We also studied the effects of varying α on the number of nodes in the search space. In Algorithms 2 and 3, top-K communities are obtained using the Resolvent matrix and Personalized PageRank measures. Algorithm 2 results were studied by varying the parameter α
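
    As an illustration of ranking top-K nodes by Katz Broadcast centrality, here is a minimal sketch that assumes the standard definition, the row sums of the resolvent (I - αA)^(-1), computed by fixed-point iteration. The thesis's own strength formulae and Algorithms 2 and 3 are not given in the abstract and are not reproduced here. The function names, the four-node graph, and the value α = 0.5 are hypothetical.

```python
def katz_broadcast(adjacency, alpha, max_iterations=1000, tolerance=1e-12):
    """Solve x = 1 + alpha * A x, i.e. x = (I - alpha*A)^(-1) * 1, by iteration.

    adjacency[i][j] == 1 means there is a directed edge i -> j.
    In general the iteration converges only if alpha < 1 / spectral_radius(A).
    """
    n = len(adjacency)
    scores = [1.0] * n
    for _ in range(max_iterations):
        updated = [
            1.0 + alpha * sum(adjacency[i][j] * scores[j] for j in range(n))
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(updated, scores)) < tolerance:
            return updated
        scores = updated
    return scores


def top_k(scores, k):
    """Return the indices of the k highest-scoring nodes, best first."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]


# Hypothetical directed graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 3.
graph = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
centrality = katz_broadcast(graph, alpha=0.5)
best = top_k(centrality, 2)
print(centrality, best)
```

    Because this example graph is acyclic, the iteration terminates exactly for any α; on graphs with cycles, α must stay below the reciprocal of the adjacency matrix's spectral radius. Replacing the adjacency matrix with its transpose gives the Katz Receive centrality under the same assumed definition.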