89 research outputs found

    Statistical modelling of clickstream behaviour to inform real-time advertising decisions

    Online user browsing generates vast quantities of typically unexploited data. Investigating this data and uncovering the valuable information it contains can be of substantial value to online businesses, and statistics plays a key role in this process. The data takes the form of an anonymous digital footprint associated with each unique visitor, resulting in 10^6 unique profiles across 10^7 individual page visits on a daily basis. Exploring, cleaning and transforming data of this scale and high dimensionality (2TB+ of memory) is particularly challenging, and requires cluster computing. We outline a variable selection method to summarise clickstream behaviour with a single value, and make comparisons to other dimension reduction techniques. We illustrate how to apply generalised linear models and zero-inflated models to predict sponsored search advert clicks based on keywords. We consider the problem of predicting customer purchases (known as conversions) from the customer’s journey or clickstream, which is the sequence of pages seen during a single visit to a website. We consider each page as a discrete state with probabilities of transitions between the pages, providing the basis for a simple Markov model. Further, hidden Markov models (HMMs) are applied to relate the observed clickstream to a sequence of hidden states, uncovering meta-states of user activity. We can also apply conventional logistic regression to model conversions in terms of summaries of the profile’s browsing behaviour, and incorporate both into a set of tools to solve a wide range of conversion types where we can directly compare the predictive capability of each model. Predicting, in real time, which profiles are likely to follow similar behaviour patterns to known conversions will have a critical impact on targeted advertising. We illustrate these analyses with results from real data collected by an Audience Management Platform (AMP) - Carbon
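The page-as-state view described above can be sketched as a first-order Markov model fitted to clickstream sessions. This is a minimal illustration, not the thesis's implementation; the session data, page names, "purchase" target state, and three-step horizon are all invented for the example:

```python
from collections import defaultdict

def fit_transitions(sessions):
    """Estimate first-order Markov transition probabilities from clickstreams."""
    counts = defaultdict(lambda: defaultdict(int))
    for pages in sessions:
        for a, b in zip(pages, pages[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
            for a, nxt in counts.items()}

def conversion_probability(trans, page, steps=3, target="purchase"):
    """Probability of first reaching `target` within `steps` transitions."""
    if page == target:
        return 1.0
    if steps == 0:
        return 0.0
    return sum(p * conversion_probability(trans, nxt, steps - 1, target)
               for nxt, p in trans.get(page, {}).items())
```

Scoring every live profile this way is what would let the real-time system flag visitors whose behaviour resembles known converters.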

    Distributed Load Testing by Modeling and Simulating User Behavior

    Modern human-machine systems such as microservices rely upon agile engineering practices which require changes to be tested and released more frequently than classically engineered systems. A critical step in the testing of such systems is the generation of realistic workloads, or load testing. Generated workload emulates the expected behaviors of users and machines within a system under test in order to find potentially unknown failure states. Typical testing tools rely on static testing artifacts to generate realistic workload conditions. Such artifacts can be cumbersome and costly to maintain; however, even model-based alternatives can fail to adapt to changes in a system or its usage. Lack of adaptation can prevent the integration of load testing into system quality assurance, leading to an incomplete evaluation of system quality. The goal of this research is to improve the state of software engineering by addressing open challenges in load testing of human-machine systems with a novel process that a) models and classifies user behavior from streaming and aggregated log data, b) adapts to changes in system and user behavior, and c) generates distributed workload by realistically simulating user behavior. This research contributes a Learning, Online, Distributed Engine for Simulation and Testing based on the Operational Norms of Entities within a system (LODESTONE): a novel process for distributed load testing by modeling and simulating user behavior. We specify LODESTONE within the context of a human-machine system to illustrate distributed adaptation and execution in load testing processes. LODESTONE uses log data to generate and update user behavior models, cluster them into similar behavior profiles, and instantiate distributed workload on software systems. We analyze user behavioral data having differing characteristics to replicate human-machine interactions in a modern microservice environment.
We discuss tools, algorithms, software design, and implementation in two different computational environments: client-server and cloud-based microservices. We illustrate the advantages of LODESTONE through a qualitative comparison of key feature parameters and through experimentation based on shared data and models. LODESTONE continuously adapts to changes in the system under test, which allows load testing to be integrated into the quality assurance process for cloud-based microservices
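A workload generator of the kind described can be sketched by walking a per-profile behavior model and emitting timed request events. The endpoints, transition probabilities, and think times below are invented placeholders for illustration, not LODESTONE's actual learned models:

```python
import random

# Hypothetical behavior profile: each state maps to
# (next-state probabilities, mean think time in seconds).
PROFILE = {
    "/login":    ({"/browse": 1.0}, 1.0),
    "/browse":   ({"/browse": 0.6, "/cart": 0.3, "/logout": 0.1}, 2.0),
    "/cart":     ({"/checkout": 0.7, "/browse": 0.3}, 3.0),
    "/checkout": ({"/logout": 1.0}, 5.0),
    "/logout":   ({}, 0.0),
}

def simulate_user(profile, start="/login", rng=None):
    """Walk the behavior model, yielding (endpoint, think_time) events
    until an absorbing state (no outgoing transitions) is reached."""
    rng = rng or random.Random()
    state = start
    while True:
        nxt, think = profile[state]
        # Exponentially distributed think time around the profile mean
        yield state, (rng.expovariate(1.0 / think) if think else 0.0)
        if not nxt:
            return
        state = rng.choices(list(nxt), weights=list(nxt.values()))[0]
```

In a distributed setting, many such simulated users would be spawned in parallel against the system under test, and the profile dictionary would be refreshed as new log data arrives.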

    Applications of Multi-Touch Attribution Modelling

    The digital landscape has evolved vastly since the early 2000s in terms of analytical tools and tracking software. With the rise of 4G and 5G, smartphones have become the norm for browsing the web. New problems arise in measuring business performance, such as cross-channel and multi-channel attribution. Companies are selling more products and services on their websites and marketplaces than ever before. Brands must become digital natives and translate all of their offline business onto the internet. When brands invest in multiple marketing channels and those channels mix in the customer journey, new measurement problems arise. Under the current standard methodology in web analytics, companies track their conversions (signups, subscriptions, orders) and assign each channel’s attribution using simple heuristics; in other words, simple decision models. It has been widely shown that single-touch attribution does not perform well under complex business scenarios like those observed nowadays. Attribution modeling has been a hot topic in the last decade due to the rise of machine learning and data mining. Nowadays, there are two main trends. The problem can be analyzed from a machine learning standpoint, treating it as a classification problem with a binary outcome (0/1). On the other hand, Shapley values and game theory also adapt efficiently to the question, where every player gets credit for contributing to conversions. Given that some state-of-the-art models perform better than others, and that multiple papers are trying to improve robustness, predictive accuracy, and interpretability, this thesis focuses primarily on applications and interpretability of the model. Most of today’s marketing managers and teams find it extremely hard to use and apply these types of models due to the complexity of the topic and black-box models, which have little to no interpretability.
This work aims to encourage more companies to enter the MTA landscape, test these models, and optimize them specifically for their industry. Additionally, to my knowledge, there is no research on Markov chains applied to subscription business models, whose customer journeys differ substantially from those of e-commerce.
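To make the attribution idea concrete, here is a toy path-based approximation of the Markov removal effect: a channel's credit is the relative drop in conversions when journeys touching that channel are treated as lost. This is a deliberate simplification of the thesis's models, and the journeys and channel names are invented:

```python
def removal_effects(journeys):
    """journeys: list of (channel_path, converted) pairs.
    Returns each channel's normalized share of attribution credit,
    a simple approximation of the Markov-chain removal effect."""
    base = sum(conv for _, conv in journeys) / len(journeys)
    channels = {ch for path, _ in journeys for ch in path}
    effects = {}
    for ch in channels:
        # Conversions that survive if every journey touching `ch` is lost
        conv = sum(c for path, c in journeys if ch not in path)
        effects[ch] = 1 - (conv / len(journeys)) / base
    total = sum(effects.values())
    # Normalize so credit shares sum to 1
    return {ch: e / total for ch, e in effects.items()}
```

Unlike last-touch or first-touch heuristics, every channel on a converting path can receive fractional credit, which is the property that makes multi-touch models attractive.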

    Knowledge Representation of Requirements Documents Using Natural Language Processing

    Complex systems such as automotive software systems are usually broken down into subsystems that are specified and developed in isolation and afterwards integrated to provide the functionality of the desired system. This results in a large number of requirements documents for each subsystem written by different people and in different departments. Requirements engineers are challenged by comprehending the concepts mentioned in a requirement because coherent information is spread over several requirements documents. In this paper, we describe a natural language processing pipeline that we developed to transform a set of heterogeneous natural language requirements into a knowledge representation graph. The graph provides an orthogonal view onto the concepts and relations written in the requirements. We provide a first validation of the approach by applying it to two requirements documents including more than 7,000 requirements from industrial systems. We conclude the paper by stating open challenges and potential application of the knowledge representation graph
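As a toy illustration of the transformation step (not the authors' pipeline, which uses a full NLP stack), the sketch below merges naive subject-verb-object triples extracted from "shall"-style requirement sentences into a single concept graph, so relations about the same concept from different documents land on one node:

```python
import re
from collections import defaultdict

def build_graph(requirements):
    """Merge naive (subject, verb, object) triples from requirement
    sentences into one graph: {concept: {(relation, concept), ...}}.
    A real pipeline would use POS tagging and dependency parsing
    instead of this toy 'The X shall Y the Z' pattern."""
    pattern = re.compile(r"^The (\w+(?: \w+)?) shall (\w+) the (\w+)")
    graph = defaultdict(set)
    for req in requirements:
        m = pattern.match(req)
        if m:
            subj, verb, obj = m.groups()
            graph[subj].add((verb, obj))
    return dict(graph)
```

Even this crude merge shows the orthogonal view the paper describes: all relations involving "door controller" become visible in one place, regardless of which requirements document they came from.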

    Combating Attacks and Abuse in Large Online Communities

    Internet users today are connected more widely and ubiquitously than ever before. As a result, various online communities have formed, ranging from online social networks (Facebook, Twitter), to mobile communities (Foursquare, Waze), to content/interest-based networks (Wikipedia, Yelp, Quora). While users benefit from the ease of access to information and social interactions, there is a growing concern for users' security and privacy against various attacks such as spam, phishing, malware infection and identity theft. Combating attacks and abuse in online communities is challenging. First, today’s online communities are increasingly dependent on users and user-generated content. Securing online systems demands a deep understanding of complex and often unpredictable human behaviors. Second, online communities can easily have millions or even billions of users, which requires the corresponding security mechanisms to be highly scalable. Finally, cybercriminals are constantly evolving to launch new types of attacks. This further demands high robustness of security defenses. In this thesis, we take concrete steps towards measuring, understanding, and defending against attacks and abuse in online communities. We begin with a series of empirical measurements to understand user behaviors in different online services and the unique security and privacy challenges that users face. This effort covers a broad set of popular online services including social networks for question and answering (Quora), anonymous social networks (Whisper), and crowdsourced mobile communities (Waze). Despite the differences between specific online communities, our study provides a first look at their user activity patterns based on empirical data, and reveals the need for reliable mechanisms to curate user content, protect privacy, and defend against emerging attacks. Next, we turn our attention to attacks targeting online communities, with a focus on spam campaigns.
While traditional spam is mostly generated by automated software, attackers today have started to introduce "human intelligence" to implement attacks. This is malicious crowdsourcing (or crowdturfing), where a large group of real users is organized to carry out malicious campaigns, such as writing fake reviews or spreading rumors on social media. Using collective human effort, attackers can easily bypass many existing defenses (e.g., CAPTCHA). To understand the ecosystem of crowdturfing, we first use measurements to examine its detailed campaign organization, workers and revenue. Based on insights from empirical data, we develop effective machine learning classifiers to detect crowdturfing activities. In the meantime, considering the adversarial nature of crowdturfing, we also build practical adversarial models to simulate how attackers can evade or disrupt machine learning based defenses. To aid in this effort, we next explore using user behavior models to detect a wider range of attacks. Instead of making assumptions about attacker behavior, our idea is to model normal user behaviors and capture (malicious) behaviors that deviate from the norm. In this way, we can detect previously unknown attacks. Our behavior model is based on detailed clickstream data, which are sequences of click events generated by users when using the service. We build a similarity graph where each user is a node and the edges are weighted by clickstream similarity. By partitioning this graph, we obtain "clusters" of users with similar behaviors. We then use a small set of known good users to "color" these clusters to differentiate the malicious ones. This technique has been adopted by real-world social networks (Renren and LinkedIn), and has already detected unexpected attacks. Finally, we extend the clickstream model to understand finer-grained behaviors of attackers (and real users), and to track how user behavior changes over time.
In summary, this thesis illustrates a data-driven approach to understanding and defending against attacks and abuse in online communities. Our measurements have revealed new insights about how attackers are evolving to bypass existing security defenses today. In addition, our data-driven systems provide new solutions for online services to gain a deep understanding of their users, and defend them from emerging attacks and abuse
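The similarity-graph coloring idea can be sketched as follows, using Jaccard similarity over click bigrams and connected components in place of the thesis's weighted graph partitioning. The user streams, threshold, and known-good seed are invented for the example:

```python
from itertools import combinations

def bigrams(stream):
    """Represent a clickstream by its set of consecutive click pairs."""
    return set(zip(stream, stream[1:]))

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def flag_users(clickstreams, known_good, threshold=0.5):
    """Link users whose click-bigram Jaccard similarity reaches `threshold`,
    take connected components as behavior clusters, and flag every cluster
    containing no known-good seed ('uncolored') as suspicious."""
    grams = {u: bigrams(s) for u, s in clickstreams.items()}
    parent = {u: u for u in grams}          # union-find forest
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    for u, v in combinations(grams, 2):
        if jaccard(grams[u], grams[v]) >= threshold:
            parent[find(u)] = find(v)       # merge similar users
    good_roots = {find(u) for u in known_good}
    return {u for u in grams if find(u) not in good_roots}
```

Because flagging keys off deviation from known-good clusters rather than attacker signatures, previously unseen attack behaviors can still be caught, which is the property the thesis emphasizes.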

    Towards A Better Design of Online Marketplaces

    Online markets are staggering in volume and variety. These online marketplaces are transforming lifestyles, expanding the boundaries of conventional businesses, and reshaping labor force structures. To fully realize their potential, online marketplaces must be designed carefully. However, this is a significant challenge. This dissertation studies individual behavior and interactions in online marketplaces, and examines how to enhance efficiency and outcomes of these online marketplaces by providing actionable operational policy recommendations. An important question in the context of open-ended innovative service marketplaces is how to manage information when specifying design problems to achieve better outcomes. Chapter 1 investigates this problem in the context of online crowdsourcing contests where innovation seekers source innovative products (designs) from a crowd of competing solvers (designers). We propose and empirically test a theoretical model featuring different types of information in the problem specification (conceptual objectives, execution guidelines), and the corresponding impact on design processes and submission qualities. We find that, to maximize the best solution quality in crowdsourced design problems, seekers should always provide more execution guidelines, and only a moderate number of conceptual objectives. Building on the same research setting, Chapter 2 looks into another important yet challenging problem---how the innovation seeker should provide interim performance feedback to the solvers in online service marketplaces where seekers and solvers can interact dynamically. In particular, we study whether and when the seeker should provide such interim performance feedback. We empirically examine these research questions using a dataset from a crowdsourcing platform. 
We develop and estimate a dynamic structural model to understand contestants’ behavior, compare alternative feedback policies using counterfactual simulations, and find that providing feedback throughout the contest may not be optimal. The late feedback policy, i.e., providing feedback only in the second half of the contest, leads to a better overall contest outcome. Moving to a wider application, Chapter 3 leverages consumer clickstream information in e-commerce marketplaces to help market organizers improve demand estimation and pricing decisions. These decisions can be challenging, as e-commerce marketplaces offer an astonishing variety of product choices and face extremely diversified consumer decision journeys. We provide a novel solution to these challenges by combining econometric and machine learning (Graphical Lasso) approaches, leveraging customer clickstream information to learn the product correlation network, and creating high-dimensional choice models that easily scale and allow for flexible substitution patterns. Our model offers better in- and out-of-sample demand forecasts and enhanced pricing recommendations in various synthetic datasets and in a real-world empirical setting. PhD, Business Administration, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/163283/1/jiangzh_1.pd
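As a crude stand-in for the Graphical Lasso step (which estimates a sparse precision matrix over products), the sketch below learns a product correlation network from clickstream sessions using simple co-view lift; the sessions, product names, and lift threshold are invented for illustration:

```python
from collections import Counter
from itertools import combinations

def coview_network(sessions, min_lift=1.5):
    """Link products whose co-view lift P(a,b) / (P(a) * P(b)) reaches
    `min_lift`, a toy substitute for the sparse correlation network the
    dissertation learns with Graphical Lasso. Returns {(a, b): lift}."""
    n = len(sessions)
    views = Counter()
    pairs = Counter()
    for s in sessions:
        items = set(s)                      # ignore repeat views in a session
        views.update(items)
        pairs.update(frozenset(p) for p in combinations(sorted(items), 2))
    edges = {}
    for pair, c in pairs.items():
        a, b = sorted(pair)
        lift = (c / n) / ((views[a] / n) * (views[b] / n))
        if lift >= min_lift:
            edges[(a, b)] = round(lift, 2)
    return edges
```

The resulting sparse network is the kind of structure that can then restrict which cross-product substitution terms enter a high-dimensional choice model, keeping estimation tractable.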

    Maximizing User Engagement In Short Marketing Campaigns Within An Online Living Lab: A Reinforcement Learning Perspective

    ABSTRACT by Aniekan Michael Ini-Abasi, August 2021. Advisor: Dr. Ratna Babu Chinnam. Major: Industrial & Systems Engineering. Degree: Doctor of Philosophy.
User engagement has emerged as the engine driving online business growth. Many firms have pay incentives tied to engagement and growth metrics. These corporations are turning to recommender systems as the tool of choice in the business of maximizing engagement. LinkedIn reported a 40% higher email response with the introduction of a new recommender system. At Amazon 35% of sales originate from recommendations, while Netflix reports that ‘75% of what people watch is from some sort of recommendation,’ with an estimated business value of $1 billion per year. While the leading companies have been quite successful at harnessing the power of recommenders to boost user engagement across the digital ecosystem, small and medium businesses (SMB) are struggling with declining engagement across many channels as competition for user attention intensifies. The SMBs often lack the technical expertise and big data infrastructure necessary to operationalize recommender systems. The purpose of this study is to explore the methods of building a learning agent that can be used to personalize a persuasive request to maximize user engagement in a data-efficient setting. We frame the task as a sequential decision-making problem, modelled as an MDP, and solved using a generalized reinforcement learning (RL) algorithm. We leverage an approach that eliminates or at least greatly reduces the need for massive amounts of training data, thus moving away from a purely data-driven approach. By incorporating domain knowledge from the literature on persuasion into the message composition, we are able to train the RL agent in a sample-efficient and operant manner. In our methodology, the RL agent nominates a candidate from a catalog of persuasion principles to drive higher user response and engagement. To enable the effective use of RL in our specific setting, we first build a reduced state space representation by compressing the data using an exponential moving average scheme.
A regularized DQN agent is deployed to learn an optimal policy, which is then applied in recommending one (or a combination) of six universal principles most likely to trigger responses from users during the next message cycle. In this study, email messaging is used as the vehicle to deliver persuasion principles to the user. At a time of declining click-through rates with marketing emails, business executives continue to show heightened interest in the email channel owing to higher-than-usual return on investment of $42 for every dollar spent when compared to other marketing channels such as social media. Coupled with the state space transformation, our novel regularized Deep Q-learning (DQN) agent was able to train and perform well based on a few observed users’ responses. First, we explored the average positive effect of using persuasion-based messages in a live email marketing campaign, without deploying a learning algorithm to recommend the influence principles. The selection of persuasion tactics was done heuristically, using only domain knowledge. Our results suggest that embedding certain principles of persuasion in campaign emails can significantly increase user engagement for an online business (and have a positive impact on revenues) without putting pressure on marketing or advertising budgets. During the study, the store had a customer retention rate of 76% and sales grew by a half-million dollars from the three field trials combined. The key assumption was that users are predisposed to respond to certain persuasion principles and learning the right principles to incorporate in the message header or body copy would lead to higher response and engagement. With the hypothesis validated, we set forth to build a DQN agent to recommend candidate actions from a catalog of persuasion principles most likely to drive higher engagement in the next messaging cycle. A simulation and a real live campaign are implemented to verify the proposed methodology.
The results demonstrate the agent’s superior performance compared to a human expert and a control baseline by a significant margin (up to ~300%). As the quest for effective methods and tools to maximize user engagement intensifies, our methodology could help boost user engagement for struggling SMBs without a prohibitive increase in costs, by enabling the targeting of messages (with the right persuasion principle) to the right user
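A much-simplified stand-in for the regularized DQN is a tabular epsilon-greedy agent choosing among six universal persuasion principles and updating toward the observed response rate of each. The reward model, hyperparameters, and arm labels below are invented for the sketch:

```python
import random

PRINCIPLES = ["reciprocity", "scarcity", "authority",
              "consistency", "liking", "social proof"]

class PersuasionBandit:
    """Epsilon-greedy learner over persuasion principles: a tabular,
    state-free simplification of the thesis's regularized DQN agent."""
    def __init__(self, arms=PRINCIPLES, eps=0.1, lr=0.2, rng=None):
        self.q = {a: 0.0 for a in arms}   # running response-rate estimates
        self.eps, self.lr = eps, lr
        self.rng = rng or random.Random()

    def choose(self):
        """Explore with probability eps, otherwise pick the best arm."""
        if self.rng.random() < self.eps:
            return self.rng.choice(list(self.q))
        return max(self.q, key=self.q.get)

    def update(self, arm, reward):
        # Move the arm's estimate toward the observed reward (1 = response)
        self.q[arm] += self.lr * (reward - self.q[arm])
```

Each message cycle, `choose()` would pick the principle to embed in the next email and `update()` would feed back whether the user responded, mirroring the nominate-then-observe loop the abstract describes.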

    Using clickstream data to analyze online purchase intentions

    Nowadays, traditional business techniques are almost deprecated due to the emergence of online virtual shopping, the so-called e-commerce.
This new and in many ways uncharted territory poses difficult challenges when it comes to applying marketing techniques, especially traditional methods, as these are not effective when dealing with online customers. In this context, it is imperative that companies have a complete, in-depth understanding of online behavior in order to succeed within the complex environment in which they compete. The server web logs of each customer are the main source of potentially useful information for online stores. These logs contain details on how each customer visited the online store; moreover, it is possible to reconstruct the sequence of accessed pages, the so-called clickstream data. This data is fundamental in depicting each customer's behavior, and analyzing and exploring this behavior is key to improving customer relationship management. The analysis of clickstream data allows for the understanding of customer intentions. One of the most studied measures concerns customer conversion, that is, the percentage of customers who will actually perform a purchase during a specific online session. In this dissertation we investigate other relevant intentions, namely customer purchasing engagement and real-time purchase likelihood. Actual data from a major European online grocery retail store is used to support and evaluate different data mining models
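The real-time purchase-likelihood task can be sketched as a logistic regression over simple session features. The features, toy sessions, labels, and training scheme below are invented placeholders, not the dissertation's data or models:

```python
import math

def features(session):
    """Toy features: bias, page count, distinct pages, cart-page visits."""
    return [1.0, len(session), len(set(session)), session.count("cart")]

def train_logreg(sessions, labels, lr=0.1, epochs=200):
    """Fit logistic regression by stochastic gradient descent to predict
    whether a session ends in a purchase."""
    w = [0.0] * 4
    for _ in range(epochs):
        for s, y in zip(sessions, labels):
            x = features(s)
            p = 1 / (1 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
            # Gradient step on the log-loss for one session
            w = [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]
    return w

def purchase_probability(w, session):
    """Score a (possibly still in-progress) session in real time."""
    x = features(session)
    return 1 / (1 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
```

Because `purchase_probability` only needs the pages seen so far, the same scorer can be re-evaluated after every click, which is what makes a real-time purchase-likelihood signal possible.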