1,026 research outputs found

    Event detection from click-through data via query clustering

    Get PDF
    The web is an index of real-world events and lot of knowledge can be mined from the web resources and their derivatives. Event detection is one recent research topic triggered from the domain of web data mining with the increasing popularity of search engines. In the visitor-centric approach, the click-through data generated by the web search engines is the start up resource with the intuition: often such data is event-driven. In this thesis, a retrospective algorithm is proposed to detect such real-world events from the click-through data. This approach differs from the existing work as it: (i) considers the click-through data as collaborative query sessions instead of mere web logs and try to understand user behavior (ii) tries to integrate the semantics, structure, and content of queries and pages (iii) aims to achieve the overall objective via Query Clustering. The problem of event detection is transformed into query clustering by generating clusters - hybrid cover graphs; each hybrid cover graph corresponds to a real-world event. The evolutionary pattern for the co-occurrences of query-page pairs in a hybrid cover graph is imposed for the quality purpose over a moving window period. Also, the approach is experimentally evaluated on a commercial search engine\u27s data collected over 3 months with about 20 million web queries and page clicks from 650000 users. The results outperform the most recent work in this domain in terms of number of events detected, F-measures, entropy, recall etc. --Abstract, page iv

    Multiscale Snapshots: Visual Analysis of Temporal Summaries in Dynamic Graphs

    Full text link
    The overview-driven visual analysis of large-scale dynamic graphs poses a major challenge. We propose Multiscale Snapshots, a visual analytics approach to analyze temporal summaries of dynamic graphs at multiple temporal scales. First, we recursively generate temporal summaries to abstract overlapping sequences of graphs into compact snapshots. Second, we apply graph embeddings to the snapshots to learn low-dimensional representations of each sequence of graphs to speed up specific analytical tasks (e.g., similarity search). Third, we visualize the evolving data from a coarse to fine-granular snapshots to semi-automatically analyze temporal states, trends, and outliers. The approach enables to discover similar temporal summaries (e.g., recurring states), reduces the temporal data to speed up automatic analysis, and to explore both structural and temporal properties of a dynamic graph. We demonstrate the usefulness of our approach by a quantitative evaluation and the application to a real-world dataset.Comment: IEEE Transactions on Visualization and Computer Graphics (TVCG), to appea

    A primer on provenance

    Get PDF
    Better understanding data requires tracking its history and context.</jats:p

    Shortest Path and Distance Queries on Road Networks: An Experimental Evaluation

    Full text link
    Computing the shortest path between two given locations in a road network is an important problem that finds applications in various map services and commercial navigation products. The state-of-the-art solutions for the problem can be divided into two categories: spatial-coherence-based methods and vertex-importance-based approaches. The two categories of techniques, however, have not been compared systematically under the same experimental framework, as they were developed from two independent lines of research that do not refer to each other. This renders it difficult for a practitioner to decide which technique should be adopted for a specific application. Furthermore, the experimental evaluation of the existing techniques, as presented in previous work, falls short in several aspects. Some methods were tested only on small road networks with up to one hundred thousand vertices; some approaches were evaluated using distance queries (instead of shortest path queries), namely, queries that ask only for the length of the shortest path; a state-of-the-art technique was examined based on a faulty implementation that led to incorrect query results. To address the above issues, this paper presents a comprehensive comparison of the most advanced spatial-coherence-based and vertex-importance-based approaches. Using a variety of real road networks with up to twenty million vertices, we evaluated each technique in terms of its preprocessing time, space consumption, and query efficiency (for both shortest path and distance queries). Our experimental results reveal the characteristics of different techniques, based on which we provide guidelines on selecting appropriate methods for various scenarios.Comment: VLDB201
    corecore