    You, the Web and Your Device: Longitudinal Characterization of Browsing Habits

    Understanding how people interact with the web is key for a variety of applications, from the design of effective web pages to the definition of successful online marketing campaigns. Browsing behavior has traditionally been represented and studied by means of clickstreams, i.e., graphs whose vertices are web pages and whose edges are the paths followed by users. Obtaining large and representative data to extract clickstreams is, however, challenging. The evolution of the web raises the question of whether browsing behavior is changing and, as a consequence, whether the properties of clickstreams are changing. This paper presents a longitudinal study of clickstreams from 2013 to 2016. We evaluate an anonymized dataset of HTTP traces captured in a large ISP, where thousands of households are connected. We first propose a methodology to identify the actual URLs requested by users from the massive set of requests automatically fired by browsers when rendering web pages. Then, we characterize web usage patterns and clickstreams, taking into account both the temporal evolution and the impact of the device used to explore the web. Our analyses quantify various aspects of clickstreams and uncover interesting patterns, such as the typically short paths followed by people while navigating the web, the fast-increasing trend in browsing from mobile devices, and the different roles of search engines and social networks in promoting content. Finally, we contribute a dataset of anonymized clickstreams to the community to foster new studies (anonymized clickstreams are available to the public at http://bigdata.polito.it/clickstream). Comment: 30 pages
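
As a rough illustration of the clickstream definition in this abstract (vertices are pages, edges are the user's steps from one page to the next), the following sketch builds such a graph from (referrer, URL) pairs. The input format and the function name `build_clickstream` are assumptions made for this example; the paper's own methodology for separating user-requested URLs from automatic browser requests is not reproduced here.

```python
def build_clickstream(requests):
    """Build a clickstream graph from (referrer, url) pairs.

    Hypothetical input format: each pair is one user-requested page,
    with referrer set to None when the page starts a new path.
    Returns a dict mapping each page to the set of pages reached from it.
    """
    graph = {}
    for referrer, url in requests:
        graph.setdefault(url, set())          # every visited page is a vertex
        if referrer is not None:
            graph.setdefault(referrer, set()).add(url)  # edge: referrer -> url
    return graph


if __name__ == "__main__":
    visits = [(None, "a"), ("a", "b"), ("b", "c"), ("a", "d")]
    for page, successors in sorted(build_clickstream(visits).items()):
        print(page, "->", sorted(successors))
```

Storing successors in a set collapses repeated visits along the same edge; a counter per edge would be the natural change if visit frequencies matter.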

    The Interaction Between Spatial Working and Reference Memory in Rats on a Radial Maze

    The present study investigated the interaction between working and reference spatial memory in an effort to develop an animal model of this interaction. Twelve male Long-Evans rats were tested on an eight-arm radial maze in a two-phase procedure. In the study phase, a rat was allowed to enter four randomly selected arms for a food reward placed at the end of each arm. The test phase allowed the rat access to all eight arms, but only the previously unentered arms contained food. Two of the correct test arms were defined as reference memory arms because they were always correct. The other two correct test arms were defined as working memory arms because they varied randomly among trials. The percentages of correct working memory and reference memory arm entries made in the first four choices of the test phase were recorded to determine whether rats showed better working memory or reference memory, and in what order they chose to visit working and reference memory arms under a variety of conditions. Further research will use this model to analyze which interventions can reduce the type of confusion between working and reference memory seen in human memory impairments.

    Generalized Low Rank Models

    Principal components analysis (PCA) is a well-known technique for approximating a tabular data set by a low rank matrix. Here, we extend the idea of PCA to handle arbitrary data sets consisting of numerical, Boolean, categorical, ordinal, and other data types. This framework encompasses many well-known techniques in data analysis, such as nonnegative matrix factorization, matrix completion, sparse and robust PCA, k-means, k-SVD, and maximum margin matrix factorization. The method handles heterogeneous data sets, and leads to coherent schemes for compressing, denoising, and imputing missing entries across all data types simultaneously. It also admits a number of interesting interpretations of the low rank factors, which allow clustering of examples or of features. We propose several parallel algorithms for fitting generalized low rank models, and describe implementations and numerical results. Comment: 84 pages, 19 figures
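
To make the idea of imputing missing entries with a low rank model concrete, here is a minimal sketch of the simplest special case: a rank-1 model with quadratic loss, fitted by alternating least squares over the observed entries only. The function name `fit_rank1`, the use of `None` for missing entries, and the choice of alternating minimization are assumptions for this example; the abstract does not specify the fitting algorithms, and the general framework covers other losses and regularizers.

```python
def fit_rank1(A, iters=200):
    """Fit A[i][j] ~ x[i] * y[j] by alternating least squares.

    Entries equal to None are treated as missing and are excluded
    from the squared-error loss (an assumption of this sketch).
    """
    m, n = len(A), len(A[0])
    x = [1.0] * m
    y = [1.0] * n
    for _ in range(iters):
        # Update each row factor with the column factors held fixed.
        for i in range(m):
            obs = [j for j in range(n) if A[i][j] is not None]
            den = sum(y[j] ** 2 for j in obs)
            if den > 0:
                x[i] = sum(A[i][j] * y[j] for j in obs) / den
        # Update each column factor with the row factors held fixed.
        for j in range(n):
            obs = [i for i in range(m) if A[i][j] is not None]
            den = sum(x[i] ** 2 for i in obs)
            if den > 0:
                y[j] = sum(A[i][j] * x[i] for i in obs) / den
    return x, y


if __name__ == "__main__":
    # Outer product of [1, 2, 3] and [1, 2, 4] with one entry hidden.
    A = [[1, 2, 4], [2, 4, None], [3, 6, 12]]
    x, y = fit_rank1(A)
    print("imputed value:", x[1] * y[2])  # should be close to the hidden 8
```

Each row update and each column update is independent of the others within its half-step, which is the property that makes this family of methods straightforward to parallelize.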

    State Management for Efficient Event Pattern Detection

    Event stream processing systems continuously evaluate queries over event streams to detect user-specified patterns with low latency. The challenge is that query processing is stateful: it maintains partial matches whose number grows exponentially in the size of the processed events. State management is complicated by the dynamicity of streams and the need to integrate remote data. First, heterogeneous event sources yield dynamic streams with unpredictable input rates, data distributions, and query selectivities. During peak times, exhaustive processing is infeasible, and systems must resort to best-effort processing. Second, queries may require remote data to select a specific event for a pattern. Such dependencies are problematic: fetching the remote data interrupts the stream processing. Yet, without event selection based on remote data, the growth of partial matches is amplified. In this dissertation, I present strategies for optimised state management in event pattern detection. First, I enable best-effort processing with load shedding that discards both input events and partial matches. I carefully select the elements to shed so as to satisfy a latency bound while striving for a minimal loss in result quality. Second, to efficiently integrate remote data, I decouple the fetching of remote data from its use in query evaluation by means of a caching mechanism. To this end, I hide the transmission latency by prefetching remote data based on anticipated use and by lazy evaluation that postpones the event selection based on remote data to avoid interruptions. A cost model is used to determine when to fetch which remote data items and how long to keep them in the cache. I evaluated the above techniques with queries over synthetic and real-world data. I show that the load shedding technique significantly improves the recall of pattern detection over baseline approaches, while the technique for remote data integration significantly reduces the pattern detection latency.

    Scalable visual analytics over voluminous spatiotemporal data

    2018 Fall. Includes bibliographical references. Visualization is a critical part of modern data analytics. This is especially true of interactive and exploratory visual analytics, which encourages speedy discovery of trends, patterns, and connections in data by allowing analysts to rapidly change what data is displayed and how it is displayed. Unfortunately, the explosion of data production in recent years has led to problems of scale, as storage, processing, querying, and visualization have struggled to keep pace with data volumes. Visualization of spatiotemporal data poses unique challenges, thanks in part to high dimensionality in the input feature space, interactions between features, and the production of voluminous, high-resolution outputs. In this dissertation, we address challenges associated with supporting interactive, exploratory visualization of voluminous spatiotemporal datasets and underlying phenomena. This requires the visualization of millions of entities and of changes to these entities as the spatiotemporal phenomena unfold. The rendering and propagation of spatiotemporal phenomena must be both accurate and timely. Key contributions of this dissertation include: 1) the temporal and spatial coupling of spatially localized models to enable the visualization of phenomena at far greater geospatial scales; 2) the ability to directly compare and contrast diverging spatiotemporal outcomes that arise from multiple exploratory "what-if" queries; and 3) the computational framework required to support an interactive user experience in a heavily resource-constrained environment. We additionally provide support for collaborative and competitive exploration with multiple synchronized clients.

    A Partially Randomized Approach to Trajectory Planning and Optimization for Mobile Robots with Flat Dynamics

    Motion planning problems are characterized by huge search spaces and complex obstacle structures with no concise mathematical expression. The fixed-wing airplane application considered in this thesis adds differential constraints and point-wise bounds, i.e., an infinite number of equality and inequality constraints. An optimal trajectory planning approach is presented, based on the randomized Rapidly-exploring Random Trees framework (RRT*). The local planner relies on differential flatness of the equations of motion to obtain tree branch candidates that automatically satisfy the differential constraints. Flat output trajectories, in this case equivalent to the airplane's flight path, are designed using Bézier curves. Segment feasibility in terms of point-wise inequality constraints is tested by an indicator integral, which is evaluated alongside the segment cost functional. Although RRT* guarantees optimality in the limit of infinite planning time, it is argued by intuition and experimentation that convergence is not approached at a practically useful rate. Therefore, the randomized planner is augmented by a deterministic variational optimization technique. To this end, the optimal planning task is formulated as a semi-infinite optimization problem, using the intermediate result of the RRT(*) as an initial guess. The proposed optimization algorithm follows the feasible flavor of the primal-dual interior point paradigm. Discretization of functional (infinite) constraints is deferred to the linear subproblems, where it is realized implicitly by numeric quadrature. An inherent numerical ill-conditioning of the method is circumvented by a reduction-like approach, which tracks active constraint locations by introducing new problem variables. Obstacle avoidance is achieved by extending the line search procedure and dynamically adding obstacle-awareness constraints to the problem formulation.
Experimental evaluation confirms that the hybrid approach is practically feasible and does indeed outperform RRT*'s built-in optimization mechanism, but the computational burden is still significant.
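
The abstract above states that flat output trajectories are designed using Bézier curves. As a small, generic illustration of how a point on such a curve is evaluated, the sketch below uses de Casteljau's algorithm. The function name `bezier_point` and the 2D control points are assumptions for this example; the thesis's actual curve degree, parameterization, and constraint handling are not given in the abstract.

```python
def bezier_point(control_points, t):
    """Evaluate a Bezier curve at parameter t in [0, 1] (de Casteljau).

    control_points is a list of equal-length coordinate tuples; the
    curve degree is len(control_points) - 1.
    """
    points = [tuple(float(c) for c in p) for p in control_points]
    # Repeatedly interpolate between neighbours until one point remains.
    while len(points) > 1:
        points = [
            tuple((1.0 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(points, points[1:])
        ]
    return points[0]


if __name__ == "__main__":
    # Hypothetical cubic segment in the plane.
    segment = [(0, 0), (0, 1), (1, 1), (1, 0)]
    for t in (0.0, 0.5, 1.0):
        print(t, bezier_point(segment, t))
```

The curve starts at the first control point and ends at the last one, which is one reason Bézier segments are convenient for joining tree branches end to end.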

    Searching for patterns in Conway's Game of Life

    Conway’s Game of Life (Life) is a simple cellular automaton, discovered by John Conway in 1970, that exhibits complex emergent behavior. Life enthusiasts have been looking for building blocks with specific properties (patterns) to answer unsolved problems in Life for the past five decades. Finding patterns in Life is difficult due to the large search space. Current search algorithms use an explorative approach based on the rules of the game, but this can only sample a small fraction of the search space. More recently, people have used SAT solvers to search for patterns. These solvers are not specifically tuned to this problem and thus waste a lot of time processing Life’s rules in an engine that does not understand them. We propose a novel SAT-based approach that replaces the binary tree used by traditional SAT solvers with a grid-based approach, complemented by an injection of Game of Life-specific knowledge. This leads to a significant speedup in searching. As a fortunate side effect, our solver can be generalized to solve general SAT problems. Because it is grid-based, all manipulations are embarrassingly parallel, allowing implementation on massively parallel hardware.
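
For readers unfamiliar with the rules that any Life pattern search has to respect, the following sketch computes one generation of the standard rules (a live cell survives with two or three live neighbours; a dead cell becomes live with exactly three). It only illustrates the automaton itself, not the SAT-based search proposed in the abstract; the set-of-coordinates representation and the name `life_step` are choices made for this example.

```python
from collections import Counter


def life_step(live):
    """Return the next generation for a set of live (x, y) cells."""
    # Count, for every cell, how many live neighbours it has.
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth on exactly 3 neighbours; survival on 2 or 3.
    return {
        cell
        for cell, count in neighbour_counts.items()
        if count == 3 or (count == 2 and cell in live)
    }


if __name__ == "__main__":
    blinker = {(0, 1), (1, 1), (2, 1)}
    print(sorted(life_step(blinker)))             # vertical phase of the blinker
    print(life_step(life_step(blinker)) == blinker)  # period-2 oscillator
```

A pattern search can be read as running this rule backwards or under constraints: finding a set of cells whose successors satisfy a desired property, which is why the search space grows so quickly with grid size.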

    Policy-Based Planning for Robust Robot Navigation

    This thesis proposes techniques for constructing and implementing an extensible navigation framework suitable for operating alongside or in place of traditional navigation systems. Robot navigation is only possible when many subsystems work in tandem such as localization and mapping, motion planning, control, and object tracking. Errors in any one of these subsystems can result in the robot failing to accomplish its task, oftentimes requiring human interventions that diminish the benefits theoretically provided by autonomous robotic systems. Our first contribution is Direction Approximation through Random Trials (DART), a method for generating human-followable navigation instructions optimized for followability instead of traditional metrics such as path length. We show how this strategy can be extended to robot navigation planning, allowing the robot to compute the sequence of control policies and switching conditions maximizing the likelihood with which the robot will reach its goal. This technique allows robots to select plans based on reliability in addition to efficiency, avoiding error-prone actions or areas of the environment. We also show how DART can be used to build compact, topological maps of its environments, offering opportunities to scale to larger environments. DART depends on the existence of a set of behaviors and switching conditions describing ways the robot can move through an environment. In the remainder of this thesis, we present methods for learning these behaviors and conditions in indoor environments. To support landmark-based navigation, we show how to train a Convolutional Neural Network (CNN) to distinguish between semantically labeled 2D occupancy grids generated from LIDAR data. By providing the robot the ability to recognize specific classes of places based on human labels, not only do we support transitioning between control laws, but also provide hooks for human-aided instruction and direction. 
Additionally, we suggest a subset of behaviors that provide DART with a sufficient set of actions to navigate in most indoor environments and introduce a method to learn these behaviors from teleoperated demonstrations. Our method learns a cost function suitable for integration into gradient-based control schemes. This enables the robot to execute behaviors in the absence of global knowledge. We present results demonstrating these behaviors working in several environments with varied structure, indicating that they generalize well to new environments. This work was motivated by the weaknesses and brittleness of many state-of-the-art navigation systems. Reliable navigation is the foundation of any mobile robotic system. It provides access to larger work spaces and enables a wide variety of tasks. Even though navigation systems have continued to improve, catastrophic failures can still occur (e.g., due to an incorrect loop closure) that limit their reliability. Furthermore, as work areas approach the scale of kilometers, constructing and operating on precise localization maps becomes expensive. These limitations prevent large-scale deployments of robots outside of controlled settings and laboratory environments. The work presented in this thesis is intended to augment or replace traditional navigation systems to mitigate concerns about scalability and reliability by considering the effects of navigation failures for particular actions. By considering these effects when evaluating the actions to take, our framework can adapt navigation strategies to best take advantage of the capabilities of the robot in a given environment. A natural output of our framework is a topological network of actions and switching conditions, providing compact representations of work areas suitable for fast, scalable planning.
PHD. Computer Science & Engineering. University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/144073/1/rgoeddel_1.pd
