
    Churn Identification and Prediction from a Large-Scale Telecommunication Dataset Using NLP

    The identification of customer churn is a major issue for large telecom businesses. Managing the data of current customers, as well as acquiring and managing new customers, generates a substantial volume of data every day. It is therefore crucial to identify the causes of customer churn so that appropriate steps can be taken to reduce it. Numerous researchers have already discussed their efforts to combine static and dynamic approaches in order to reduce churn in big data sets, but these systems still have many issues when it comes to actually identifying churn. In this paper, we propose two methods: churn identification, and churn prediction on a large telecommunication data set using Natural Language Processing (NLP) methods and machine learning techniques. The NLP process involves data pre-processing, normalization, feature extraction, and feature selection. For feature extraction, we employ techniques such as TF-IDF, Stanford NLP, and occurrence correlation methods. Throughout, a machine learning classification algorithm is used for training and testing. Finally, the system employs a variety of cross-validation techniques to train and evaluate the machine learning algorithms. The experimental analysis shows the system's efficacy and accuracy.
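    The TF-IDF feature extraction step named in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting, so the smoothed formula below, the `tf_idf` helper, and the sample notes are all assumptions made for illustration, not the authors' pipeline.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Term frequency is the count normalised by document length; inverse
    document frequency is smoothed as log((1 + N) / (1 + df)) + 1.
    """
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return weights

# Hypothetical customer-support notes, invented for illustration only.
notes = [
    "network drops every evening",
    "billing error on my invoice",
    "network slow and billing error",
]
vectors = tf_idf(notes)
```

    In this sketch a term that occurs in only one note (such as "drops") receives a higher weight than a term shared across notes (such as "network"), which is the property that makes TF-IDF useful as input to a downstream classifier.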

    Boosting Ant Colony Optimization with Reptile Search Algorithm for Churn Prediction

    © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). The telecommunications industry is greatly concerned about customer churn due to dissatisfaction with service. The industry has started investing in the development of machine learning (ML) models for churn prediction to extract, examine, and visualize customers’ historical information from large volumes of data, which helps it further understand customer needs and take appropriate actions to control customer churn. However, the high dimensionality of the data has a large influence on the performance of the ML model, so feature selection (FS) is applied as a primary preprocessing step. It improves the ML model’s performance by selecting salient features while reducing the computational time, which can assist this sector in building effective prediction models. This paper proposes a new FS approach, ACO-RSA, that combines two metaheuristic algorithms (MAs), namely ant colony optimization (ACO) and the reptile search algorithm (RSA). In the developed ACO-RSA approach, ACO and RSA are integrated to choose an important subset of features for churn prediction. The ACO-RSA approach is evaluated on seven open-source customer churn prediction datasets and ten CEC 2019 test functions, and its performance is compared with particle swarm optimization (PSO), multi-verse optimizer (MVO), grey wolf optimizer (GWO), standard ACO, and standard RSA. According to the results and the accompanying statistical analysis, ACO-RSA is an effective approach that outperforms the competitor algorithms on most datasets. Peer reviewed. Final published version.

    Feature Selection with Integrated Gaussian Seahorse Optimization Data Mining for Cross-border Business Cooperation between the Malaysian Medical Industry and Tourism Industry

    The cross-border collaboration between the medical industry and the tourism industry has gained significant attention as a promising avenue for economic growth and development. Data mining techniques are employed to extract valuable patterns and insights from large-scale datasets, shedding light on the opportunities and challenges associated with this collaborative effort. This study proposes an integrated approach that combines feature selection with Gaussian Seahorse Optimization Data Mining (GSH-DM) to identify the most relevant features and optimize the data mining process. GSH-DM assembles comprehensive datasets encompassing information from both the Malaysian medical industry and tourism industry. The integrated GSH-DM model then applies the Gaussian Seahorse Optimization algorithm to optimize the data mining process, enhancing the accuracy and efficiency of pattern discovery. Using the GSH-DM model, this study aims to uncover hidden patterns, relationships, and predictive models that can guide decision-making and strategy development for cross-border business cooperation. The findings of this study contribute to a deeper understanding of the factors that influence cross-border business cooperation between the Malaysian medical industry and the tourism industry. The integrated GSH-DM approach showcases the potential of combining feature selection techniques with advanced optimization algorithms in data mining applications. The results of GSH-DM provide actionable insights for stakeholders, enabling them to make informed decisions and foster successful cross-border collaborations between the Malaysian medical industry and the tourism industry. The analysis of the results demonstrated that GSH-DM exhibits improved performance for feature selection and classification.

    A review of the use of artificial intelligence methods in infrastructure systems

    The artificial intelligence (AI) revolution offers significant opportunities to capitalise on the growth of digitalisation and has the potential to enable the ‘system of systems’ approach required in increasingly complex infrastructure systems. This paper reviews the extent to which research in economic infrastructure sectors has engaged with fields of AI, to investigate the specific AI methods chosen and the purposes to which they have been applied both within and across sectors. Machine learning is found to dominate the research in this field, with methods such as artificial neural networks, support vector machines, and random forests among the most popular. The automated reasoning technique of fuzzy logic has also seen widespread use, due to its ability to incorporate uncertainties in input variables. Across the infrastructure sectors of energy, water and wastewater, transport, and telecommunications, the main purposes to which AI has been applied are network provision, forecasting, routing, maintenance and security, and network quality management. The data-driven nature of AI offers significant flexibility, and work has been conducted across a range of network sizes and at different temporal and geographic scales. However, there remains a lack of integration of planning and policy concerns, such as stakeholder engagement and quantitative feasibility assessment, and the majority of research focuses on a specific type of infrastructure, with an absence of work beyond individual economic sectors. To enable solutions to be implemented into real-world infrastructure systems, research will need to move away from a siloed perspective and adopt a more interdisciplinary perspective that considers the increasing interconnectedness of these systems

    Personality Identification from Social Media Using Deep Learning: A Review

    Social media helps people scattered around the world share ideas and information, and thus helps create communities, groups, and virtual networks. Identification of personality is significant in many types of applications, such as detecting the mental state or character of a person, predicting job satisfaction and professional and personal relationship success, and in recommendation systems. Personality is also an important factor in determining individual variation in thoughts, feelings, and conduct. According to a 2018 global social media research survey, there were approximately 3.196 billion social media users worldwide. The numbers are estimated to grow rapidly with the use of mobile smart devices and advancement in technology. Support vector machine (SVM), Naive Bayes (NB), multilayer perceptron neural network, and convolutional neural network (CNN) are some of the machine learning techniques used for personality identification in the literature. This paper presents various studies on identifying the personality of social media users with machine learning approaches and reviews recent studies that aim to predict the personality of online social media (OSM) users.

    Supplier Selection and Relationship Management: An Application of Machine Learning Techniques

    Managing supply chains is an extremely challenging task due to globalization, short product life cycles, and recent advancements in information technology. These changes make managing the relationship with suppliers increasingly important. However, the supplier selection literature mainly focuses on selecting suppliers based on previous performance and on environmental and social criteria, and ignores supplier relationship management. Moreover, although the explosion of data and the capabilities of machine learning techniques in handling dynamic and fast-changing environments have shown promising results in customer relationship management, especially in customer lifetime value, this area has been untouched on the upstream side of supply chains. This research is an attempt to address this gap by proposing a framework to predict supplier future value by incorporating contract history data, relationship value, and supply network properties. The proposed model is empirically tested on suppliers of Public Works and Government Services Canada. Methodologically, this thesis demonstrates the application of machine learning techniques to supplier selection and to developing effective strategies for managing relationships. Practically, the proposed framework equips supply chain managers with a proactive and forward-looking approach to managing supplier relationships.

    A Survey on Evolutionary Computation Approaches to Feature Selection

    Feature selection is an important task in data mining and machine learning to reduce the dimensionality of the data and increase the performance of an algorithm, such as a classification algorithm. However, feature selection is a challenging task due mainly to the large search space. A variety of methods have been applied to solve feature selection problems, where evolutionary computation (EC) techniques have recently gained much attention and shown some success. However, there are no comprehensive guidelines on the strengths and weaknesses of alternative approaches. This leads to a disjointed and fragmented field with ultimately lost opportunities for improving performance and successful applications. This paper presents a comprehensive survey of the state-of-the-art work on EC for feature selection, which identifies the contributions of these different algorithms. In addition, current issues and challenges are also discussed to identify promising areas for future research.
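    As a rough illustration of evolutionary computation applied to feature selection, the sketch below runs a small genetic algorithm over bit masks, where each bit marks a feature as kept or dropped. It is one possible instance, not a method taken from the survey: the synthetic data, the nearest-centroid wrapper classifier, the size penalty of 0.01 per feature, and all names (`fitness`, `evolve`, `best_mask`) are assumptions chosen to keep the example self-contained.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic data (an assumption): 6 features, only features 0 and 2 set the label.
N_FEATURES = 6
rows = [[rng.uniform(-1, 1) for _ in range(N_FEATURES)] for _ in range(90)]
labels = [1 if r[0] + r[2] > 0 else 0 for r in rows]
train, test = list(range(60)), list(range(60, 90))

def centroid(indices, cols):
    """Mean of the selected columns over the given row indices."""
    return [sum(rows[i][c] for i in indices) / len(indices) for c in cols]

def fitness(mask):
    """Hold-out accuracy of a nearest-centroid classifier, minus a size penalty."""
    cols = [c for c, bit in enumerate(mask) if bit]
    groups = {k: [i for i in train if labels[i] == k] for k in (0, 1)}
    if not cols or not groups[0] or not groups[1]:
        return 0.0

    cents = {k: centroid(groups[k], cols) for k in (0, 1)}

    def dist(i, k):
        return sum((rows[i][c] - m) ** 2 for c, m in zip(cols, cents[k]))

    correct = sum((0 if dist(i, 0) <= dist(i, 1) else 1) == labels[i] for i in test)
    return correct / len(test) - 0.01 * len(cols)

def evolve(pop_size=12, generations=30, mutation_rate=0.1):
    # Start from random masks plus the full feature set as a baseline.
    pop = [[rng.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(pop_size - 1)]
    pop.append([1] * N_FEATURES)
    for _ in range(generations):
        elite = max(pop, key=fitness)
        children = [elite[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randint(1, N_FEATURES - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)

best_mask = evolve()
```

    Because the full feature set is seeded into the initial population and the best mask is carried over unchanged each generation, the returned mask never scores worse than using all features under this fitness function. The exponential search space mentioned in the abstract shows up here as the 2^6 possible masks; real problems have far more.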

    14th Conference on DATA ANALYSIS METHODS for Software Systems

    DAMSS-2023 is the 14th International Conference on Data Analysis Methods for Software Systems, held in Druskininkai, Lithuania. The conference takes place every year at the same venue and time. The exception was in 2020, when the world was gripped by the Covid-19 pandemic and the movement of people was severely restricted. After a year’s break, the conference was back on track, and the next conference was successful in achieving its primary goal of lively scientific communication. The conference focuses on live interaction among participants. For better efficiency of communication among participants, most of the presentations are poster presentations. This format has proven to be highly effective. However, we have several oral sections, too. The history of the conference dates back to 2009, when 16 papers were presented. It began as a workshop and has evolved into a well-known conference. The idea of such a workshop originated at the Institute of Mathematics and Informatics, now the Institute of Data Science and Digital Technologies of Vilnius University. The Lithuanian Academy of Sciences and the Lithuanian Computer Society supported this idea, which gained enthusiastic acceptance from both the Lithuanian and international scientific communities. This year’s conference features 84 presentations, with 137 registered participants from 11 countries. The conference serves as a gathering point for researchers from six Lithuanian universities, making it the main annual meeting for Lithuanian computer scientists. The primary aim of the conference is to showcase research conducted at Lithuanian and foreign universities in the fields of data science and software engineering. The annual organization of the conference facilitates the rapid exchange of new ideas within the scientific community. Seven IT companies supported the conference this year, indicating the relevance of the conference topics to the business sector. In addition, the conference is supported by the Lithuanian Research Council and the National Science and Technology Council (Taiwan, R.O.C.). The conference covers a wide range of topics, including Applied Mathematics, Artificial Intelligence, Big Data, Bioinformatics, Blockchain Technologies, Business Rules, Software Engineering, Cybersecurity, Data Science, Deep Learning, High-Performance Computing, Data Visualization, Machine Learning, Medical Informatics, Modelling Educational Data, Ontological Engineering, Optimization, Quantum Computing, and Signal Processing. This book provides an overview of all presentations from the DAMSS-2023 conference.

    A Comprehensive Survey on Enterprise Financial Risk Analysis: Problems, Methods, Spotlights and Applications

    Enterprise financial risk analysis aims to predict an enterprise's future financial risk. Owing to its wide application, enterprise financial risk analysis has always been a core research issue in finance. Although there are already some valuable and impressive surveys on risk management, these surveys introduce approaches in a relatively isolated way and lack the recent advances in enterprise financial risk analysis. Due to the rapid expansion of enterprise financial risk analysis, especially from the computer science and big data perspective, it is both necessary and challenging to comprehensively review the relevant studies. This survey attempts to connect and systematize the existing enterprise financial risk research, as well as to summarize and interpret the mechanisms and strategies of enterprise financial risk analysis in a comprehensive way, which may help readers better understand the current research status and ideas. This paper provides a systematic literature review of over 300 articles published on enterprise risk analysis modelling over a 50-year period, 1968 to 2022. We first introduce the formal definition of enterprise risk as well as the related concepts. Then, we categorize the representative works in terms of risk type and summarize the three aspects of risk analysis. Finally, we compare the analysis methods used to model enterprise financial risk. Our goal is to clarify current cutting-edge research and its possible future directions for modelling enterprise risk, aiming to fully understand the mechanisms of enterprise risk communication and influence and their application to corporate governance, financial institutions, and government regulation.

    Prediction of Member Resignation from a Cooperative Using Supervised Machine Learning Techniques

    This research aims to predict member resignations at a cooperative located in the city of Arequipa, using supervised machine learning techniques and following a customized methodology. The data were preprocessed, the techniques best suited to this case study were selected, and those techniques were applied using libraries of the Python programming language. Because the cooperative has little data and the techniques require a substantial amount of data for better accuracy, synthetically generated data correlated with the original data were used. The results of the techniques on the real data and the synthetic data were analyzed, and the best technique for this case was found to be gradient boosting, with 90% accuracy. Finally, to validate the techniques, a test was run on two real cases, the first a member who resigned from the cooperative and the second a member who remained in it; the technique that produced the correct result was the one trained on the synthetic data. Keywords: machine learning, member resignation, cooperative, supervised learning, synthetic data. Thesis.
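    The gradient boosting technique this abstract singles out can be sketched in a few lines. The version below is a deliberately simplified, pure-Python illustration that fits one-split decision stumps to the residuals of a squared-error loss; it is not the thesis's implementation, which relied on Python libraries the abstract does not name. The feature (`months_inactive`), the labels, the learning rate, and every function name are hypothetical.

```python
def fit_stump(xs, residuals):
    """Least-squares decision stump: pick the threshold with the lowest squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest value so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    return best[1:]

def boost(xs, ys, rounds=20, lr=0.3):
    """Fit an additive model: a constant plus `rounds` shrunken stumps."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        # For squared loss the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lm, rm = fit_stump(xs, residuals)
        stumps.append((t, lm, rm))
        preds = [p + lr * (lm if x <= t else rm) for x, p in zip(xs, preds)]
    return base, lr, stumps

def predict(model, x):
    base, lr, stumps = model
    return base + sum(lr * (lm if x <= t else rm) for t, lm, rm in stumps)

def mse(values, targets):
    return sum((v - y) ** 2 for v, y in zip(values, targets)) / len(targets)

# Hypothetical toy data: months of inactivity vs. whether the member resigned (1) or stayed (0).
months_inactive = list(range(10))
resigned = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

model = boost(months_inactive, resigned)
mean_label = sum(resigned) / len(resigned)
baseline_mse = mse([mean_label] * len(resigned), resigned)
boosted_mse = mse([predict(model, x) for x in months_inactive], resigned)
```

    Each round fits the stump to what the current model still gets wrong, so the training error of the boosted model falls below that of the constant baseline. A production classifier would normally use a log-loss objective and a held-out set rather than training error.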