    Stochastic Blockmodeling for Online Advertising

    Online advertising is a large and important industry. Knowledge of website attributes can contribute greatly to business strategies for ad targeting, content display, inventory purchase, or revenue prediction. Classical inference on users and sites is challenging because the data are voluminous, sparse, high-dimensional, and noisy. In this paper, we introduce a stochastic blockmodel for the website relations induced by online user visits. We propose two clustering algorithms to discover the intrinsic structure of websites, and compare their performance with a goodness-of-fit method and a deterministic graph partitioning method. We demonstrate the effectiveness of our algorithms on both simulated data and the AOL website dataset.
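
As a rough illustration of the model class named in the abstract above (not the paper's own algorithms or data), a stochastic blockmodel assigns each vertex to a block and draws each edge independently, with a probability that depends only on the blocks of its two endpoints. The sketch below is a minimal pure-Python sampler. The function names `sample_sbm` and `block_densities`, the block sizes, and the probabilities are hypothetical choices made for illustration.

```python
import random


def sample_sbm(block_sizes, edge_probs, seed=0):
    """Sample an undirected graph from a stochastic blockmodel.

    block_sizes: number of vertices in each block.
    edge_probs:  symmetric matrix; edge_probs[a][b] is the probability of an
                 edge between a vertex in block a and a vertex in block b.
    Returns (labels, edges), where edges is a set of (i, j) pairs with i < j.
    """
    rng = random.Random(seed)
    # Vertex i gets the label of the block it belongs to.
    labels = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    n = len(labels)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # Each edge is drawn independently given the block labels.
            if rng.random() < edge_probs[labels[i]][labels[j]]:
                edges.add((i, j))
    return labels, edges


def block_densities(labels, edges, num_blocks):
    """Empirical edge density for every pair of blocks."""
    counts = [[0] * num_blocks for _ in range(num_blocks)]
    for i, j in edges:
        a, b = sorted((labels[i], labels[j]))
        counts[a][b] += 1
    sizes = [labels.count(b) for b in range(num_blocks)]
    dens = [[0.0] * num_blocks for _ in range(num_blocks)]
    for a in range(num_blocks):
        for b in range(a, num_blocks):
            # Number of vertex pairs that could carry an edge.
            if a == b:
                pairs = sizes[a] * (sizes[a] - 1) // 2
            else:
                pairs = sizes[a] * sizes[b]
            d = counts[a][b] / pairs if pairs else 0.0
            dens[a][b] = dens[b][a] = d
    return dens


if __name__ == "__main__":
    # Hypothetical two-block example: dense within blocks, sparse between.
    labels, edges = sample_sbm([30, 20], [[0.5, 0.05], [0.05, 0.4]], seed=1)
    print(block_densities(labels, edges, 2))
```

With clearly separated within-block and between-block probabilities, the empirical densities should land close to the generating matrix. That gap is the kind of structure a blockmodel-based clustering method tries to recover when the labels are hidden.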

    Statistical learning for predictive targeting in online advertising

    Visual network storytelling

    We love networks! Networks are powerful conceptual tools, encapsulating in a single item multiple affordances for computation (networks as graphs), visualization (networks as maps) and manipulation of data (networks as interfaces). In the field of mathematics, graph theory has been around since Euler’s walk on Königsberg’s bridges (Euler 1736). But it was not until the end of the last century that networks acquired multidisciplinary popularity. Graph computation is certainly powerful, but it is also very demanding, and for many years its advantages remained the privilege of scholars with solid mathematical fundamentals. In the last few decades, however, networks acquired a new set of affordances and reached a larger audience, thanks to the growing availability of tools to design them. Drawn on paper or screen, networks became easier to handle and obtained properties that calculation could not express. Far from being merely aesthetic, the graphical representation of networks has an intrinsic hermeneutic value. Networks can become maps and be read as such. Combining the computational power of graphs with the visual expressivity of maps and the interactivity of computer interfaces, networks can be used in Exploratory Data Analysis (Tukey, 1977). Navigating through data becomes so fluid that zooming in on a single data point and out to a landscape of a million traces is just a click away. Increasingly specialized software has been designed to support the exploration of network data. Tools like Pajek (vlado.fmf.uni-lj.si/pub/networks/pajek), NetDraw (sites.google.com/site/netdrawsoftware), Ucinet (www.analytictech.com/ucinet), Guess (graphexploration.cond.org) and more recently Gephi (gephi.org) have progressively smoothed out the difficulties of graph mathematics, turning a complex mathematical formalism into a more user-friendly point-and-click interface (1).
If the visual exploration of networks can lead to confirmatory statistics, what about sharing a network exploration with others? We developed Manylines (https://github.com/medialab/manylines), a tool that lets you share the visual analysis of a network with a wide audience by publishing it on the web. With Manylines, you can not only easily publish a network on the web but also share its exploration by describing the network’s key visual findings. Through a set of examples, we will illustrate how the narrative opportunities of Manylines can contribute to the enunciation of a visual grammar of networks.
(1) A simple look at the URLs of the subsequent tools reveals the efforts deployed to make network-manipulation tools user-friendly and thereby available to a larger public.

    Pattern Recognition on Random Graphs

    The field of pattern recognition developed significantly in the 1960s, and the field of random graph inference has enjoyed much recent progress in both theory and application. This dissertation focuses on pattern recognition in the context of a particular family of random graphs, namely the stochastic blockmodels, from the two main perspectives of single graph inference and joint graph inference. Single graph inference is the performance of statistical inference on one single observed graph. Given a single graph realized from a stochastic blockmodel, we here consider the specific exploitation tasks of vertex classification, clustering, and nomination. Given an observed graph, vertex classification is the identification of the block labels of test vertices after learning from the training vertices. We propose a robust vertex classifier, which utilizes a representation of a test vertex as a sparse combination of the training vertices. Our proposed classifier is demonstrated to be robust against data contamination, and has superior performance over classical spectral-embedding classifiers in simulation and real data experiments. Vertex clustering groups vertices based on their similarities. We present a model-based clustering algorithm for graphs drawn from a stochastic blockmodel, and illustrate its usefulness on a case study in online advertising. We demonstrate the practical value of our vertex clustering method for efficiently delivering internet advertisements. Under the stochastic blockmodel framework, suppose one block is of particular interest. The task of vertex nomination is to create a nomination list so that vertices from the group of interest are concentrated abundantly near the top of the list. We present several vertex nomination schemes, and propose a vertex nomination scheme that is scalable for large graphs. We demonstrate the effectiveness of our methodology on simulation and real datasets. 
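
The vertex nomination task described above can be sketched with a deliberately simple heuristic. This is not one of the dissertation's nomination schemes: it assumes the block of interest is assortative (its members connect to each other more often than to outsiders) and ranks each unlabeled vertex by how many of its neighbors are already known members. The function name `nominate` and the toy graph in the usage lines are hypothetical.

```python
def nominate(n, edges, known_members, known_nonmembers=()):
    """Return unlabeled vertices ranked by likely membership in the block of interest.

    Assumption (not from the source): the block of interest is assortative,
    so a vertex with many known-member neighbors is a strong candidate.
    Ties are broken by vertex index so the ranking is deterministic.
    """
    # Build an adjacency structure for the undirected graph.
    neighbors = {v: set() for v in range(n)}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    known = set(known_members)
    excluded = known | set(known_nonmembers)
    candidates = [v for v in range(n) if v not in excluded]

    # Score: number of neighbors already known to be in the block of interest.
    score = {v: len(neighbors[v] & known) for v in candidates}
    return sorted(candidates, key=lambda v: (-score[v], v))


if __name__ == "__main__":
    # Hypothetical toy graph: a triangle {0, 1, 2} attached to a path 3-4-5.
    toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5)]
    print(nominate(6, toy_edges, known_members=[0, 1]))
```

A nomination list is judged by how early the true members appear, so any scoring rule can be swapped in for the neighbor count without changing the surrounding code.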
Next, we consider joint graph inference, which involves the joint space of multiple graphs; in this dissertation, we specifically consider joint graph inference on two graphs. Given two graphs, we consider the tasks of seeded graph matching for large graphs and joint vertex classification. Graph matching is the task of aligning two graphs so as to minimize the number of edge disagreements between them. We propose a scalable graph matching algorithm, which uses a divide-and-conquer approach to scale the state-of-the-art seeded graph matching algorithm to big graph data. We prove theoretical performance guarantees, and demonstrate desired properties such as scalability, robustness, accuracy and runtime in both simulated data and human brain connectome data. Within the joint graph inference framework, we present a case study on the paired chemical and electrical Caenorhabditis elegans neural connectomes. Motivated by the success of seeded graph matching on the paired connectomes, we propose joint vertex classification on the paired connectomes. We demonstrate that joint inference on the paired connectomes yields more accurate results than single inference on either connectome. This serves as a first step for providing a methodological and quantitative approach for understanding the coexistent significance of the chemical and electrical connectomes.
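
To make the graph matching objective above concrete, the following sketch counts edge disagreements under a given vertex alignment and searches exhaustively over the alignments consistent with a set of seeds. It is a brute-force illustration that only works for tiny graphs, not the scalable divide-and-conquer algorithm the abstract refers to. The function names and the example graphs are hypothetical.

```python
from itertools import permutations


def edge_disagreements(edges_a, edges_b, mapping):
    """Count edges present in exactly one graph after relabeling A by `mapping`."""
    mapped = {frozenset((mapping[u], mapping[v])) for u, v in edges_a}
    target = {frozenset(e) for e in edges_b}
    # Symmetric difference: edges of A missing in B plus edges of B missing in A.
    return len(mapped ^ target)


def seeded_match_bruteforce(n, edges_a, edges_b, seeds):
    """Find the alignment A -> B that extends `seeds` with the fewest disagreements.

    `seeds` maps some vertices of A to their known counterparts in B.
    The search is factorial in the number of unseeded vertices.
    """
    free_a = [v for v in range(n) if v not in seeds]
    used_b = set(seeds.values())
    free_b = [v for v in range(n) if v not in used_b]

    best_mapping, best_cost = None, None
    for perm in permutations(free_b):
        mapping = dict(seeds)
        mapping.update(zip(free_a, perm))
        cost = edge_disagreements(edges_a, edges_b, mapping)
        if best_cost is None or cost < best_cost:
            best_mapping, best_cost = mapping, cost
    return best_mapping, best_cost


if __name__ == "__main__":
    # Graph A is a path 0-1-2-3; graph B is the same path with shuffled labels.
    edges_a = [(0, 1), (1, 2), (2, 3)]
    edges_b = [(2, 0), (0, 3), (3, 1)]
    print(seeded_match_bruteforce(4, edges_a, edges_b, seeds={0: 2}))
```

Each seed removes one vertex from the search and rules out symmetric alignments, which suggests why seeded variants are attractive when some correspondences are known in advance.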

    Rethinking the digital transformation using technology space analysis

    The world is in the midst of a digital transformation. The intensified prevalence and use of digital technologies is fundamentally changing organizations and economies. However, the notion of 'digital transformation' is both theoretically and empirically underspecified. This paper rethinks the digital transformation narrative theoretically by embedding the concept in concurrent debates about technological revolutions and neo-Schumpeterian innovation theory. Empirically, the paper specifies the digital transformation by analysing the technological composition of key start-up and scale-up companies in the knowledge-intensive services sector. Undertaking a technology space analysis of 40,754 start-up and scale-up companies derived from the near real-time Dealroom.co database, we analyse which technologies and application domains are currently converging, distilling key elements of the digital transformation. The paper concludes that the transmission of digital technologies is often indirect through ‘key enabling technology clusters’ that connect the technological vanguard to application domains.