The Cost of Sharing Information in a Social World
With the increasing prevalence of large-scale online social networks, the field has evolved from studying small-scale networks and interactions to massive ones that encompass huge fractions of the world’s population. While many methods focus on techniques at scale applied to a single domain, methods that apply techniques across multiple domains are becoming increasingly important. These methods rely on understanding the complex relationships in the data. In the context of social networks, the available big data allows us to better model and analyze the flow of information within the network.
The first part of this thesis discusses methods to learn and predict more effectively in a social network by leveraging information across multiple domains and types of data. We document a method to identify users from their access to content in a network and from their click behavior. Because click behavior is often hard to obtain, even at a macro level, we also describe a technique to predict it using other public information about the social network.
Communication within a network inevitably carries some bias, attributable to individual preferences and quality as well as to the underlying structure of the network. The second part of the thesis characterizes the structural bias in a network by modeling the underlying information flow as a commodity of trade.
Crowdsourcing with Sparsely Interacting Workers
We consider estimation of worker skills from worker-task interaction data
(with unknown labels) for the single-coin crowd-sourcing binary classification
model in symmetric noise. We define the (worker) interaction graph whose nodes
are workers and an edge between two nodes indicates that the two
workers participated in a common task. We show that skills are asymptotically
identifiable if and only if an appropriate limiting version of the interaction
graph is irreducible and has odd-cycles. We then formulate a weighted rank-one
optimization problem to estimate skills based on observations on an
irreducible, aperiodic interaction graph. We propose a gradient descent scheme
and show that for such interaction graphs estimates converge asymptotically to
the global minimum. We characterize noise robustness of the gradient scheme in
terms of spectral properties of signless Laplacians of the interaction graph.
We then demonstrate that a plug-in estimator based on the estimated skills
achieves state-of-the-art performance on a number of real-world datasets. Our
results have implications for the rank-one matrix completion problem, in that
gradient descent can provably recover rank-one matrices based on
off-diagonal observations of a connected graph with a single odd cycle.
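The identifiability condition above -- an irreducible interaction graph containing an odd cycle -- is equivalent to the graph being connected and non-bipartite, which a BFS two-coloring can check. A minimal sketch (worker ids and edges are illustrative, not from the thesis data):

```python
from collections import deque

def skills_identifiable(num_workers, edges):
    """Sketch of the identifiability check: skills are asymptotically
    identifiable iff the interaction graph is connected (irreducible)
    and contains an odd cycle, i.e. is non-bipartite."""
    adj = {w: set() for w in range(num_workers)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    color = {0: 0}                 # BFS two-coloring starting from worker 0
    queue = deque([0])
    bipartite = True
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in color:
                color[v] = 1 - color[u]
                queue.append(v)
            elif color[v] == color[u]:
                bipartite = False  # same color on both ends: odd cycle exists
    connected = len(color) == num_workers
    return connected and not bipartite

# A triangle has an odd cycle -> identifiable; a 4-cycle is bipartite -> not.
print(skills_identifiable(3, [(0, 1), (1, 2), (2, 0)]))          # True
print(skills_identifiable(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # False
```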
Efficient inference algorithms for network activities
The real social network and associated communities are often hidden beneath the declared friend or group lists in social networks. We usually observe only the manifestation of these hidden networks and communities, in the form of recurrent, time-stamped individual activities. Inferring the relationships between users/nodes or groups of users/nodes is further complicated when activities are interval-censored, that is, when one only observes the number of activities that occurred in certain time windows. The same phenomenon arises in online advertising, where advertisers deliver a set of advertisement impressions and observe a set of conversions (i.e., product/service adoptions). In this case, the advertisers want to know which advertisements best appeal to customers and, most importantly, their conversion rates.
Inspired by these challenges, we investigated inference algorithms that efficiently recover user relationships in both cases: time-stamped data and interval-censored data. For time-stamped data, we proposed a novel algorithm called NetCodec, which relies on a Hawkes process that models the intertwined relationship between group participation and between-user influence. Using the Bayesian variational principle and optimization techniques, NetCodec can infer both group participation and user influence simultaneously, with per-iteration complexity O((N+I)G), where N is the number of events, I is the number of users, and G is the number of groups. For interval-censored data, we proposed a Monte-Carlo EM inference algorithm in which we iteratively impute the time-stamped events using a Poisson process whose intensity function approximates the underlying intensity function. We show that the proposed simulation approach delivers better inference performance than baseline methods.
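To illustrate the Hawkes-process machinery that NetCodec builds on, here is a minimal univariate simulator based on Ogata's thinning algorithm -- a textbook sketch with illustrative parameters, not the NetCodec implementation (which handles the multivariate, group-structured case):

```python
import math
import random

def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate a univariate Hawkes process with intensity
    lambda(t) = mu + alpha * sum over past events t_i of exp(-beta*(t - t_i)),
    via Ogata's thinning algorithm. Requires alpha < beta for stationarity."""
    rng = random.Random(seed)
    events, t = [], 0.0
    while t < horizon:
        # The current intensity upper-bounds lambda until the next event,
        # because the exponential kernel only decays between events.
        lam_bar = mu + alpha * sum(math.exp(-beta * (t - ti)) for ti in events)
        t += rng.expovariate(lam_bar)            # candidate event time
        if t >= horizon:
            break
        lam_t = mu + alpha * sum(math.exp(-beta * (t - ti)) for ti in events)
        if rng.random() <= lam_t / lam_bar:      # thinning (accept/reject) step
            events.append(t)
    return events

events = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, horizon=50.0, seed=1)
print(len(events), "events; first at t =", round(events[0], 3))
```

Each accepted event raises the intensity, producing the self-exciting bursts that make Hawkes processes a natural model for cascading user activity.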
In the advertisement problem, we propose a Click-to-Conversion delay model that uses Hawkes processes to model the advertisement impressions and thinned Poisson processes to model the Click-to-Conversion mechanism. We then derive an efficient Maximum Likelihood Estimator that utilizes the Minorization-Maximization framework. We verify the model against real-life online advertisement logs in comparison with recent conversion-rate estimation methods.
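A rough sketch of the thinning idea behind the Click-to-Conversion mechanism: each click converts independently with some probability after a random delay, and only conversions falling inside the observation window are seen. All names and parameters below are illustrative, not the thesis notation:

```python
import random

def simulate_conversions(click_times, conv_prob, mean_delay, horizon, seed=0):
    """Thinned-process sketch of Click-to-Conversion: each click converts
    with probability conv_prob after an exponentially distributed delay;
    conversions after the horizon are censored (unobserved)."""
    rng = random.Random(seed)
    conversions = []
    for t in click_times:
        if rng.random() < conv_prob:                 # thinning: does it convert?
            delay = rng.expovariate(1.0 / mean_delay)
            if t + delay <= horizon:                 # observed inside the window
                conversions.append(t + delay)
    return sorted(conversions)

clicks = [0.5, 1.2, 3.0, 4.4, 7.1, 9.9]
print(simulate_conversions(clicks, conv_prob=0.4, mean_delay=2.0, horizon=12.0, seed=7))
```

The censoring at the horizon is what makes naive conversion-rate estimates biased, motivating a delay model in the likelihood.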
To facilitate reproducible research, we also developed an open-source software package that focuses on the various Hawkes processes proposed in the above-mentioned works and in prior works. We provide efficient parallel (multi-core) implementations of the inference algorithms using the Bayesian variational inference framework. To further speed up these inference algorithms, we also explored distributed optimization techniques for convex optimization in the distributed-data setting. We formulate this problem as a consensus-constrained optimization problem and solve it with the alternating direction method of multipliers (ADMM). It turns out that using a bipartite graph as the communication topology exhibits the fastest convergence.
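The consensus-constrained formulation can be sketched on a toy problem: each worker holds a private quadratic objective, and ADMM drives all local copies to agree on the global minimizer. This minimal sketch uses a star topology for simplicity (the thesis studies richer topologies such as bipartite graphs), and the objective is illustrative:

```python
def consensus_admm(local_targets, rho=1.0, iters=100):
    """Consensus ADMM sketch: worker i minimizes (x - a_i)^2 / 2 subject to
    all local copies x_i agreeing on a global variable z. The consensus
    solution is the mean of the a_i."""
    n = len(local_targets)
    x = [0.0] * n          # local primal variables
    u = [0.0] * n          # scaled dual variables
    z = 0.0                # global consensus variable
    for _ in range(iters):
        # local proximal step: argmin (x - a)^2/2 + rho/2 (x - z + u)^2
        x = [(a + rho * (z - ui)) / (1 + rho) for a, ui in zip(local_targets, u)]
        z = sum(xi + ui for xi, ui in zip(x, u)) / n   # consensus averaging
        u = [ui + xi - z for ui, xi in zip(u, x)]      # dual update
    return z

print(round(consensus_admm([1.0, 2.0, 6.0]), 6))  # converges to the mean, 3.0
```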
Estimating user interaction probability for non-guaranteed display advertising
Billions of advertisements are displayed to internet users every hour, a market worth approximately $110 billion in 2013. The process of displaying advertisements to internet users is managed
by advertising exchanges, automated systems which match advertisements to users while balancing
conflicting advertiser, publisher, and user objectives. Real-time bidding is a recent development in
the online advertising industry that allows more than one exchange (or demand-side platform) to
bid for the right to deliver an ad to a specific user while that user is loading a webpage, creating
a liquid market for ad impressions. Real-time bidding accounted for around 10% of the German
online advertising market in late 2013, a figure which is growing at an annual rate of around 40%.
In this competitive market, accurately calculating the expected value of displaying an ad to a user
is essential for profitability.
In this thesis, we develop a system that significantly improves the existing method for estimating
the value of displaying an ad to a user in a German advertising exchange and demand-side platform.
The most significant calculation in this system is estimating the probability of a user interacting
with an ad in a given context. We first implement a hierarchical main-effects and latent factor
model which is similar enough to the existing exchange system to allow a simple and robust upgrade
path, while improving performance substantially. We then use regularized generalized linear models
to estimate the probability of an ad interaction occurring following an individual user impression
event. We build a system capable of training thousands of campaign models daily, handling over 300
million events per day, 18 million recurrent users, and thousands of model dimensions. Together,
these systems improve on the log-likelihood of the existing method by over 10%.
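As a toy-scale illustration of the regularized generalized linear model family used for interaction-probability estimation (the production system trains thousands of campaign models on millions of events; feature names and data below are invented):

```python
import math

def predict_proba(w, x):
    """Predicted probability of an ad interaction for feature vector x."""
    return 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, x))))

def train_logistic(X, y, l2=0.1, lr=0.1, epochs=500):
    """L2-regularized logistic regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [l2 * wj for wj in w]             # gradient of the L2 penalty
        for xi, yi in zip(X, y):
            p = predict_proba(w, xi)
            for j in range(d):
                grad[j] += (p - yi) * xi[j] / n  # average log-loss gradient
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

# Illustrative data: feature[1] might encode, e.g., ad-user topic match.
X = [[1.0, 1.0], [1.0, 0.9], [1.0, 0.1], [1.0, 0.0]]   # [bias, feature]
y = [1, 1, 0, 0]                                        # interaction observed?
w = train_logistic(X, y)
print(round(predict_proba(w, [1.0, 1.0]), 3), round(predict_proba(w, [1.0, 0.0]), 3))
```

The L2 penalty is what keeps thousands of sparse campaign models stable when some features are rarely observed.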
We also provide an overview of the real-time bidding market microstructure in the German real-
time bidding market in September and November 2013, and indicate potential areas for exploiting
competitors’ behaviour, including building user features from real-time bid responses. Finally,
for personal interest, we experiment with scalable k-nearest neighbour search algorithms, nonlinear
dimension reduction, manifold regularization, graph clustering, and stochastic block model inference
using the large datasets from the linear model.
Quantum circuits with many photons on a programmable nanophotonic chip
Growing interest in quantum computing for practical applications has led to a
surge in the availability of programmable machines for executing quantum
algorithms. Present day photonic quantum computers have been limited either to
non-deterministic operation, low photon numbers and rates, or fixed random gate
sequences. Here we introduce a full-stack hardware-software system for
executing many-photon quantum circuits using integrated nanophotonics: a
programmable chip, operating at room temperature and interfaced with a fully
automated control system. It enables remote users to execute quantum algorithms
requiring up to eight modes of strongly squeezed vacuum initialized as two-mode
squeezed states in single temporal modes, a fully general and programmable
four-mode interferometer, and genuine photon number-resolving readout on all
outputs. Multi-photon detection events with photon numbers and rates exceeding
any previous quantum optical demonstration on a programmable device are made
possible by strong squeezing and high sampling rates. We verify the
non-classicality of the device output, and use the platform to carry out
proof-of-principle demonstrations of three quantum algorithms: Gaussian boson
sampling, molecular vibronic spectra, and graph similarity.
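The photon-number-resolving readout of a two-mode squeezed vacuum state sees perfectly correlated pairs: both modes carry the same photon number n, with a geometric distribution set by the squeezing parameter. This is a textbook illustration of the correlated statistics, not the device model:

```python
import math

def tms_photon_dist(r, n_max=60):
    """Photon-number distribution of a two-mode squeezed vacuum with
    squeezing parameter r: P(n, n) = (1 - lam) * lam**n, lam = tanh(r)**2;
    off-diagonal outcomes (different counts in the two modes) never occur."""
    lam = math.tanh(r) ** 2
    return [(1 - lam) * lam ** n for n in range(n_max + 1)]

r = 1.0
p = tms_photon_dist(r)
mean_n = sum(n * pn for n, pn in enumerate(p))
print(round(sum(p), 6))                               # ~1.0 (normalized)
print(round(mean_n, 3), round(math.sinh(r) ** 2, 3))  # mean photons per mode = sinh(r)^2
```

Stronger squeezing (larger r) fattens the tail of this distribution, which is why strong squeezing drives the high multi-photon detection rates reported above.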
Measuring Collective Attention in Online Content: Sampling, Engagement, and Network Effects
The production and consumption of online content have been increasing rapidly, whereas human attention is a scarce resource. Understanding how content captures collective attention has become a challenge of growing importance. In this thesis, we tackle this challenge on three fronts -- quantifying sampling effects in social media data; measuring engagement behaviors towards online content; and estimating network effects induced by recommender systems.
Data sampling is a fundamental problem. To obtain a list of items, one common method is sampling based on item prevalence in social media streams. However, social data is often noisy and incomplete, which may affect subsequent observations. For each item, user behaviors can be conceptualized in two steps -- the first step relates to content appeal, measured by the number of clicks; the second relates to content quality, measured by post-click metrics, e.g., dwell time, likes, or comments. We categorize online attention (behaviors) into two classes: popularity (clicking) and engagement (watching, liking, or commenting). Moreover, modern platforms use recommender systems to present users with a tailored content display that maximizes satisfaction. A recommendation alters the appeal of an item by changing its ranking, and consequently impacts its popularity.
Our research is enabled by data from the largest video hosting site, YouTube. We use YouTube URLs shared on Twitter as a sampling protocol to obtain a collection of videos, and we track their prevalence from 2015 to 2019. This method creates a longitudinal dataset consisting of more than 5 billion tweets. Although the volume is substantial, we find that Twitter still subsamples the data. Our dataset covers about 80% of all tweets with YouTube URLs. We present a comprehensive measurement study of the Twitter sampling effects across different timescales and different subjects. We find that the volume of missing tweets can be estimated from Twitter rate limit messages, that true entity rankings can be inferred from sampled observations, and that sampling compromises the quality of network and diffusion models.
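The rate-limit estimate can be sketched very simply. Twitter's streaming "limit" messages carry a cumulative count of tweets withheld since the connection started, so the last such count plus the tweets actually delivered approximates the true volume. This is a simplified sketch assuming a single uninterrupted connection:

```python
def estimate_total_tweets(received, limit_track_values):
    """Estimate true tweet volume from a rate-limited stream: 'limit'
    messages report a cumulative count of withheld tweets, so the maximum
    reported value is the total missing so far."""
    missing = max(limit_track_values) if limit_track_values else 0
    return received + missing

# e.g. 9,000 tweets delivered; limit messages reported 200, 750, then 1,000 withheld
print(estimate_total_tweets(9_000, [200, 750, 1_000]))  # 10000
```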
Next, we present the first large-scale measurement study of how users collectively engage with YouTube videos. We study the time and the percentage of each video being watched. We propose a duration-calibrated metric, called relative engagement, which is correlated with recognized notions of content quality, stable over time, and predictable even before a video's upload.
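The duration calibration can be sketched as a rank percentile: a video's average watch percentage is compared against peer videos of similar duration, so long and short videos become comparable. The duration bucketing and exact calibration are simplified here, and the numbers are illustrative:

```python
from bisect import bisect_right

def relative_engagement(watch_pct, peer_watch_pcts):
    """Sketch of relative engagement: the rank percentile of a video's
    average watch percentage among peer videos of similar duration."""
    peers = sorted(peer_watch_pcts)
    return bisect_right(peers, watch_pct) / len(peers)

# a video watched 60% on average, among six peers of similar length
print(relative_engagement(0.60, [0.20, 0.35, 0.50, 0.55, 0.70, 0.90]))  # 4/6 ≈ 0.667
```

Because the comparison is within a duration bucket, a 0.9 relative engagement means the same thing for a 1-minute clip and a 1-hour lecture.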
Lastly, we examine the network effects induced by the YouTube recommender system. We construct the recommendation network for 60,740 music videos from 4,435 professional artists. An edge indicates that the target video is recommended on the webpage of the source video. We discover a popularity bias -- videos are disproportionately recommended towards more popular videos. We use the bow-tie structure to characterize the network and find that the largest strongly connected component consists of 23.1% of the videos while occupying 82.6% of the attention. We also build models to estimate the latent influence between videos and artists. By taking the network structure into account, we can predict video popularity 9.7% better than other baselines.
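The popularity-bias measurement can be sketched as the fraction of recommendation edges that point from a less popular video to a more popular one (video ids and view counts below are illustrative, not the thesis data):

```python
def popularity_bias(edges, views):
    """Fraction of directed recommendation edges (src -> dst) whose
    destination is more popular than the source."""
    toward_popular = sum(1 for src, dst in edges if views[dst] > views[src])
    return toward_popular / len(edges)

views = {"a": 100, "b": 5_000, "c": 300, "d": 40_000}
edges = [("a", "b"), ("c", "b"), ("a", "d"), ("b", "d"), ("d", "a")]
print(popularity_bias(edges, views))  # 4 of 5 edges point upward -> 0.8
```

A fraction well above 0.5 indicates the recommender funnels attention toward already-popular videos, consistent with the bow-tie finding above.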
Altogether, we explore the collective consumption patterns of human attention towards online content. Methods and findings from this thesis can be used by content producers, hosting sites, and online users alike to improve content production, advertising strategies, and recommender systems. We expect our new metrics, methods, and observations to generalize to other multimedia platforms such as the music streaming service Spotify.