16 research outputs found

    A SLR on Customer Dropout Prediction

    Get PDF
    Dropout prediction is increasingly addressed with machine learning algorithms, so appropriate approaches to modeling the dropout rate are needed. Selecting an algorithm to predict the dropout rate is only one part of the problem. Other aspects must also be considered, such as which features to select and how to measure accuracy while judging whether those features are appropriate for the business context in which they are employed. To address these questions, this paper develops a systematic literature review of existing studies on predicting the dropout rate in contractual settings with machine learning, in order to identify current trends and research opportunities. The results identify trends in the use of machine learning algorithms across different business areas, including which metrics are being adopted and which features are being applied. Finally, some research opportunities and gaps that could be explored in future research are presented.
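Since the review stresses that metric choice matters as much as algorithm choice for dropout prediction, a minimal sketch of evaluating a dropout classifier with imbalance-aware metrics may help; the labels, predictions, and function names below are illustrative assumptions, not taken from the paper:

```python
# Sketch: evaluating a dropout (churn) classifier with metrics beyond raw
# accuracy, since churn labels are typically imbalanced and accuracy alone
# can look deceptively high. All data here is illustrative.

def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def churn_metrics(y_true, y_pred):
    """Precision, recall, and F1 for the positive (churn) class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Imbalanced toy sample: only 3 of 8 customers actually churn (label 1).
y_true = [1, 0, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 0, 0, 1]
print(churn_metrics(y_true, y_pred))
```

Which of these metrics to optimize is exactly the business-context question the review raises: a retention campaign with cheap outreach may favor recall, while costly interventions favor precision.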


    Behavioral analysis in cybersecurity using machine learning: a study based on graph representation, class imbalance and temporal dissection

    Get PDF
    The main goal of this thesis is to improve behavioral cybersecurity analysis using machine learning by exploiting graph structures, applying temporal dissection, and addressing class imbalance problems. This main objective is divided into four specific goals.
    OBJ1: To study the influence of temporal resolution on highlighting micro-dynamics in the entity behavior classification problem. In real use cases, time-series information may not be enough to describe entity behavior. For this reason, we plan to exploit graph structures to integrate both structured and unstructured data into a representation of entities and their relationships. In this way, it becomes possible to appreciate not only single temporal communications but the whole behavior of these entities. Nevertheless, entity behaviors evolve over time, and a static graph may not be enough to describe all these changes. We therefore propose to use temporal dissection to create temporal subgraphs and to analyze the influence of the temporal resolution on graph creation and on the entity behaviors within. Furthermore, we propose to study how the temporal granularity should be chosen to highlight network micro-dynamics and short-term behavioral changes that can hint at suspicious activities.
    OBJ2: To develop novel sampling methods that work with disconnected graphs, addressing imbalance problems while avoiding changes to component topology. The graph imbalance problem is a common and challenging task, and traditional graph sampling techniques that work directly on these structures cannot be used without modifying the graph's intrinsic information or introducing bias. Furthermore, existing techniques have proven limited when disconnected graphs are used. Novel resampling methods are therefore needed that balance the number of nodes and can be applied directly to disconnected graphs without altering component topologies. In particular, we propose to take advantage of disconnected graphs to detect and replicate the most relevant graph components without changing their topology, while considering traditional data-level strategies for handling the entity behaviors within.
    OBJ3: To study the usefulness of generative adversarial networks (GANs) for addressing the class imbalance problem in cybersecurity applications. Although traditional data-level pre-processing techniques have proven effective for addressing class imbalance, they show downsides on highly variable datasets, as is common in cybersecurity. New techniques that exploit the overall data distribution to learn highly variable behaviors should therefore be investigated. GANs have shown promising results in the image and video domains; however, their extension to tabular data is not trivial. For this reason, we propose to adapt GANs to cybersecurity data and exploit their ability to learn and reproduce the input distribution as an oversampling technique for the class imbalance problem. Furthermore, since no single GAN solution works for every scenario, we propose to study several GAN architectures with several training configurations to determine the best option for a cybersecurity application.
    OBJ4: To analyze temporal data trends and performance drift to enhance cyber threat analysis. Temporal dynamics and newly arriving data can degrade prediction quality and compromise model reliability; this phenomenon makes models become outdated without notice. It is therefore important to extract more insightful information from the application domain by analyzing data trends, learning processes, and performance drift over time. For this reason, we propose to develop a systematic approach for analyzing how data quality and volume affect the learning process. Moreover, in the context of cyber threat intelligence (CTI), we propose to study the relationship between temporal performance drift and the input data distribution to detect possible model limitations, enhancing cyber threat analysis.
    Programa de Doctorado en Ciencias y Tecnologías Industriales (RD 99/2011). Industria Zientzietako eta Teknologietako Doktoretza Programa (ED 99/2011).
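The temporal-dissection idea in OBJ1 can be sketched in a few lines: timestamped communication events are split into fixed windows, and each window yields its own subgraph, making short-term behavioral changes visible. The window size and the event data below are illustrative assumptions, not values from the thesis:

```python
# Sketch of temporal dissection: group timestamped edges into per-window
# subgraphs so that bursts of activity stand out in a single window instead
# of being averaged away in one static graph.

from collections import defaultdict

def temporal_subgraphs(events, window):
    """Group (timestamp, src, dst) events into per-window edge lists."""
    graphs = defaultdict(list)
    for ts, src, dst in events:
        graphs[ts // window].append((src, dst))
    return dict(graphs)

events = [
    (0, "A", "B"), (5, "A", "C"),                     # window 0: normal traffic
    (12, "B", "C"),                                   # window 1
    (21, "A", "B"), (22, "A", "C"), (23, "A", "D"),   # window 2: burst from A
]
for win, edges in sorted(temporal_subgraphs(events, window=10).items()):
    print(win, edges)
```

With a coarse window the burst from "A" would blend into the overall graph; a finer granularity isolates it, which is the kind of micro-dynamic OBJ1 aims to surface.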

    5th International Conference on Advanced Research Methods and Analytics (CARMA 2023)

    Full text link
    Research methods in economics and social sciences are evolving with the increasing availability of Internet and Big Data sources of information. As these sources, methods, and applications become more interdisciplinary, the 5th International Conference on Advanced Research Methods and Analytics (CARMA) is a forum for researchers and practitioners to exchange ideas and advances on how emerging research methods and sources are applied to different fields of social sciences, as well as to discuss current and future challenges.
    Martínez Torres, MDR.; Toral Marín, S. (2023). 5th International Conference on Advanced Research Methods and Analytics (CARMA 2023). Editorial Universitat Politècnica de València. https://doi.org/10.4995/CARMA2023.2023.1700

    Multikonferenz Wirtschaftsinformatik (MKWI) 2016: Technische Universität Ilmenau, 09. - 11. März 2016; Band II

    Get PDF
    Overview of the conference tracks in Volume II:
    • eHealth as a Service – Innovationen für Prävention, Versorgung und Forschung
    • Einsatz von Unternehmenssoftware in der Lehre
    • Energieinformatik, Erneuerbare Energien und Neue Mobilität
    • Hedonische Informationssysteme
    • IKT-gestütztes betriebliches Umwelt- und Nachhaltigkeitsmanagement
    • Informationssysteme in der Finanzwirtschaft
    • IT- und Software-Produktmanagement in Internet-of-Things-basierten Infrastrukturen
    • IT-Beratung im Kontext digitaler Transformation
    • IT-Sicherheit für Kritische Infrastrukturen
    • Modellierung betrieblicher Informationssysteme – Konzeptuelle Modelle im Zeitalter der digitalisierten Wirtschaft (d!conomy)
    • Prescriptive Analytics in I

    Modeling Viewer and Influencer Behavior on Streaming Platforms

    Full text link
    The video streaming industry is growing rapidly, and consumers are increasingly using ad-supported streaming services. There are important questions related to the effect of ad schedules and video elements on viewer behavior that have not been adequately studied in the marketing literature. In my dissertation, I study these topics by applying causal and/or interpretable machine learning methods to behavioral data. In the first essay, “Finding the Sweet Spot: Ad Scheduling on Streaming Media”, I design an “optimal” ad schedule that balances the interest of the viewer (watching content) with that of the streaming platform (ad exposure). This is accomplished using a three-stage approach applied to a dataset of Hulu customers. In the first stage, I develop two metrics – Bingeability and Ad Tolerance – to capture the interplay between content consumption and ad exposure in a viewing session. Bingeability represents the number of completely viewed unique episodes of a show, while Ad Tolerance represents the willingness of a viewer to watch ads and subsequent content. In the second stage, I predict the value of the metrics for the next viewing session using the machine learning method Extreme Gradient Boosting, while controlling for the non-randomness in ad delivery to a focal viewer using “instrumental variables” based on ad delivery patterns to other viewers. Using “feature importance analyses” and “partial dependence plots”, I shed light on the importance and nature of the non-linear relationships with various feature sets, going beyond a purely black-box approach. Finally, in the third stage, I implement a novel constrained optimization procedure built around the causal predictions to provide an “optimal” ad schedule for a viewer, while ensuring the level of ad exposure does not exceed her predicted Ad Tolerance. Under the optimized schedule, I find that “win-win” schedules are possible, allowing an increase in both content consumption and ad exposure.
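The two session metrics described above can be illustrated with a toy event log. This is only a sketch under simplifying assumptions: the session format and the Ad Tolerance approximation are mine, not the dissertation's actual definitions:

```python
# Sketch of the Bingeability / Ad Tolerance idea on a toy session log.
# A session is a list of ("episode", id, completed) and ("ad", None, None)
# events, in viewing order. Illustrative only.

def bingeability(session):
    """Number of completely viewed unique episodes in the session."""
    return len({eid for kind, eid, done in session if kind == "episode" and done})

def ad_tolerance(session):
    """Approximate tolerance: ads after which the viewer kept watching."""
    return sum(1 for i, ev in enumerate(session)
               if ev[0] == "ad" and i + 1 < len(session))

session = [
    ("episode", "e1", True),
    ("ad", None, None),          # viewer stays: tolerated
    ("episode", "e2", True),
    ("ad", None, None),          # viewer leaves after this ad: not tolerated
]
print(bingeability(session), ad_tolerance(session))  # → 2 1
```

The essay's actual pipeline then predicts these metrics for the *next* session with gradient boosting and instrumental variables; the point here is only how session-level behavior can be reduced to two interpretable numbers.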
    In the second essay, “Video Influencers: Unboxing the Mystique”, I study the relationship between advertising content in YouTube influencer videos (across text, audio, and images) and marketing outcomes (views, interaction rates, and sentiment). This is accomplished with the help of novel interpretable deep-learning architectures that avoid making a trade-off between predictive ability and interpretability. Specifically, I achieve high predictive performance by avoiding ex-ante feature engineering, and better interpretability by eliminating spurious relationships confounded by factors unassociated with “attention” paid to video elements. The attention mechanism in the Text and Audio models, along with gradient maps in the Image model, allows identification of the video elements to which attention is paid while forming an association with an outcome. Such an ex-post analysis allows me to find statistically significant relationships between video elements and marketing outcomes that are accompanied by a significant increase in attention to those elements. By eliminating spurious relationships, I generate hypotheses that are more likely to have causal effects when tested in a field setting. For example, I find that mentioning a brand in the first 30 seconds of a video is on average associated with a significant increase in attention to the brand but a significant decrease in sentiment expressed towards the video. Overall, my dissertation provides solutions and identifies strategies that can improve the welfare of viewers, platform owners, influencers, and brand partners. Policy makers also stand to gain from understanding the power exerted by different stakeholders over viewer behavior.
    PhD, Business Administration. University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/169824/1/prajaram_1.pd

    Scalable RFM-Enriched representation learning for churn prediction

    No full text
    Most of the recent studies on churn prediction in telco utilize social networks built on top of call (and/or SMS) graphs to derive informative features. However, extracting features from large graphs, especially structural features, is an intricate process both methodologically and computationally. Because of the former, feature extraction in the current literature has mainly been addressed in an ad-hoc, handcrafted manner; because of the latter, the full potential of the structural information remains unexploited. In this work, we incorporate both interaction and structural information by devising two different ways of enriching the original graphs with interaction information, delineated by the well-known RFM (recency, frequency, monetary) model. We circumvent extensive manual feature engineering by enriching the networks and improving the scalability of the renowned node2vec approach to learn node representations. The obtained results demonstrate that our enriched network outperforms baseline RFM-based methods.
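The RFM-enrichment step can be sketched as summarizing each call-graph edge by recency, frequency, and monetary value (here, total call duration) before any representation learning. The scoring scheme and data are illustrative assumptions; the paper's actual enrichment strategies differ:

```python
# Sketch: annotate call-graph edges with RFM interaction summaries
# (recency of last call, number of calls, total duration) from raw
# call records, prior to learning node embeddings. Illustrative only.

def rfm_edge_weights(calls, now):
    """calls: iterable of (src, dst, timestamp, duration) records.
    Returns {(src, dst): (recency, frequency, monetary)} per directed edge."""
    rfm = {}
    for src, dst, ts, dur in calls:
        edge = (src, dst)
        rec, freq, mon = rfm.get(edge, (float("inf"), 0, 0.0))
        # recency = time since the most recent call on this edge
        rfm[edge] = (min(rec, now - ts), freq + 1, mon + dur)
    return rfm

calls = [
    ("u1", "u2", 90, 120.0),
    ("u1", "u2", 99, 30.0),
    ("u2", "u3", 50, 300.0),
]
print(rfm_edge_weights(calls, now=100))
```

A downstream embedding method such as node2vec could then bias its random walks by these edge summaries, which is the intuition behind letting interaction information shape the learned node representations.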