Event detection from click-through data via query clustering
The web is an index of real-world events, and a great deal of knowledge can be mined from web resources and their derivatives. With the growing popularity of search engines, event detection has emerged as a recent research topic in web data mining. In the visitor-centric approach, the click-through data generated by web search engines is the starting resource, based on the intuition that such data is often event-driven. In this thesis, a retrospective algorithm is proposed to detect such real-world events from click-through data. This approach differs from existing work in that it: (i) treats the click-through data as collaborative query sessions rather than mere web logs, and tries to understand user behavior; (ii) tries to integrate the semantics, structure, and content of queries and pages; and (iii) aims to achieve the overall objective via query clustering. The problem of event detection is transformed into query clustering by generating clusters, called hybrid cover graphs, each of which corresponds to a real-world event. To ensure cluster quality, an evolutionary pattern is imposed on the co-occurrences of query-page pairs in each hybrid cover graph over a moving time window. The approach is evaluated experimentally on a commercial search engine's data, collected over 3 months and comprising about 20 million web queries and page clicks from 650,000 users. The results outperform the most recent work in this domain in terms of the number of events detected, F-measure, entropy, recall, and other metrics. --Abstract, page iv
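The abstract reduces event detection to query clustering over click-through data within a moving window. Because the construction of hybrid cover graphs is not described here, the Python sketch below is only a simplified, hypothetical stand-in: it keeps query-page pairs clicked by at least `min_support` distinct users inside a time window and merges queries that lead to the same page using union-find. The names `Click`, `cluster_queries`, and `min_support`, as well as the sample records, are illustrative assumptions and not part of the thesis.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Click:
    """Hypothetical click-through record: `user` issued `query` and clicked `page` on `day`."""
    user: str
    query: str
    page: str
    day: int


def cluster_queries(clicks, window_start, window_end, min_support=2):
    """Cluster queries that lead to the same clicked page within [window_start, window_end).

    Simplified stand-in, not the thesis's hybrid cover graph algorithm:
    a (query, page) pair is kept only if at least `min_support` distinct
    users produced it inside the window.
    """
    # Collect the distinct users behind each (query, page) pair in the window.
    users_per_pair = defaultdict(set)
    for c in clicks:
        if window_start <= c.day < window_end:
            users_per_pair[(c.query, c.page)].add(c.user)
    pairs = [pair for pair, users in users_per_pair.items() if len(users) >= min_support]

    parent = {}  # union-find forest over ("q", query) and ("p", page) nodes

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Joining each query with its clicked page makes pages act as bridges between queries.
    for query, page in pairs:
        union(("q", query), ("p", page))

    clusters = defaultdict(set)
    for query, _ in pairs:
        clusters[find(("q", query))].add(query)
    # Deterministic output: sorted members, clusters ordered by their first member.
    return sorted((sorted(members) for members in clusters.values()), key=lambda c: c[0])


example_clicks = [
    Click("u1", "earthquake", "news/quake", 0),
    Click("u2", "earthquake", "news/quake", 1),
    Click("u3", "quake magnitude", "news/quake", 1),
    Click("u4", "quake magnitude", "news/quake", 2),
    Click("u5", "cheap flights", "travel/deals", 1),
    Click("u6", "cheap flights", "travel/deals", 2),
    Click("u7", "typo query", "misc/page", 1),
]
print(cluster_queries(example_clicks, window_start=0, window_end=3))
# [['cheap flights'], ['earthquake', 'quake magnitude']]
```

Sliding `window_start` and `window_end` forward and comparing the clusters found in consecutive windows is one plausible way to follow a cluster over time; the thesis's actual evolutionary criterion may differ.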
Multiscale Snapshots: Visual Analysis of Temporal Summaries in Dynamic Graphs
The overview-driven visual analysis of large-scale dynamic graphs poses a
major challenge. We propose Multiscale Snapshots, a visual analytics approach
to analyze temporal summaries of dynamic graphs at multiple temporal scales.
First, we recursively generate temporal summaries to abstract overlapping
sequences of graphs into compact snapshots. Second, we apply graph embeddings
to the snapshots to learn low-dimensional representations of each sequence of
graphs to speed up specific analytical tasks (e.g., similarity search). Third,
we visualize the evolving data from coarse to fine-grained snapshots to
semi-automatically analyze temporal states, trends, and outliers. The approach
makes it possible to discover similar temporal summaries (e.g., recurring
states), to reduce the temporal data to speed up automatic analysis, and to
explore both structural and temporal properties of a dynamic graph. We
demonstrate the usefulness of our approach through a quantitative evaluation
and an application to a real-world dataset.
Comment: IEEE Transactions on Visualization and Computer Graphics (TVCG), to
appear
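The first two steps described above (recursive, overlapping temporal summaries, then low-dimensional vectors for similarity search) can be illustrated with a small Python sketch. It assumes each snapshot is a dict of weighted undirected edges, summarizes two adjacent windows by averaging their edge weights, and uses a weighted-degree vector as a deliberately simple stand-in for the learned graph embeddings the paper applies. The names `merge`, `multiscale_summaries`, `degree_vector`, and `most_similar` are hypothetical.

```python
import math
from collections import Counter


def merge(graphs):
    """Summarize graphs (dicts mapping an edge (u, v) to a weight) by averaging edge weights."""
    total = Counter()
    for g in graphs:
        total.update(g)  # adds weights edge by edge
    return {edge: weight / len(graphs) for edge, weight in total.items()}


def multiscale_summaries(snapshots, levels):
    """Level 0 holds the raw snapshots; level k+1 merges each pair of adjacent,
    overlapping level-k summaries, so summary i at level k covers snapshots i..i+k."""
    pyramid = [list(snapshots)]
    for _ in range(levels):
        prev = pyramid[-1]
        pyramid.append([merge([prev[i], prev[i + 1]]) for i in range(len(prev) - 1)])
    return pyramid


def degree_vector(graph, nodes):
    """Weighted degree per node: a crude stand-in for a learned graph embedding."""
    degree = dict.fromkeys(nodes, 0.0)
    for (u, v), weight in graph.items():
        degree[u] += weight
        degree[v] += weight
    return [degree[n] for n in nodes]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def most_similar(pyramid, level, query_index, nodes):
    """Index of the summary at `level` whose vector is closest (cosine) to the query summary."""
    vectors = [degree_vector(g, nodes) for g in pyramid[level]]
    query = vectors[query_index]
    scored = [(cosine(query, v), i) for i, v in enumerate(vectors) if i != query_index]
    return max(scored)[1]


snapshots = [
    {("a", "b"): 1.0},
    {("b", "c"): 1.0},
    {("a", "b"): 1.0},
    {("b", "c"): 1.0},
]
pyramid = multiscale_summaries(snapshots, levels=2)
print([len(level) for level in pyramid])  # [4, 3, 2]
print(most_similar(pyramid, level=0, query_index=0, nodes=["a", "b", "c"]))  # 2
```

In the example, snapshots 0 and 2 have identical structure, so the search flags snapshot 2 as a recurring state of snapshot 0; comparing fixed-length vectors instead of whole graphs is what makes such searches cheap.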
A primer on provenance
Better understanding data requires tracking its history and context.
Shortest Path and Distance Queries on Road Networks: An Experimental Evaluation
Computing the shortest path between two given locations in a road network is
an important problem that finds applications in various map services and
commercial navigation products. The state-of-the-art solutions for the problem
can be divided into two categories: spatial-coherence-based methods and
vertex-importance-based approaches. The two categories of techniques, however,
have not been compared systematically under the same experimental framework, as
they were developed from two independent lines of research that do not refer to
each other. This makes it difficult for practitioners to decide which
technique to adopt for a given application. Furthermore, the
experimental evaluation of the existing techniques, as presented in previous
work, falls short in several aspects. Some methods were tested only on small
road networks with up to one hundred thousand vertices; some approaches were
evaluated using distance queries (instead of shortest path queries), namely,
queries that ask only for the length of the shortest path; a state-of-the-art
technique was examined based on a faulty implementation that led to incorrect
query results. To address the above issues, this paper presents a comprehensive
comparison of the most advanced spatial-coherence-based and
vertex-importance-based approaches. Using a variety of real road networks with
up to twenty million vertices, we evaluated each technique in terms of its
preprocessing time, space consumption, and query efficiency (for both shortest
path and distance queries). Our experimental results reveal the characteristics
of different techniques, based on which we provide guidelines on selecting
appropriate methods for various scenarios.
Comment: VLDB201
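The abstract separates distance queries, which return only the length of the shortest path, from shortest path queries, which must also return the route. As a minimal illustration (a plain Dijkstra baseline, not one of the spatial-coherence-based or vertex-importance-based techniques the paper evaluates), the Python sketch below answers both from the same search; a path query additionally relies on the predecessor map and a walk back from the target. The graph format (a directed adjacency dict with non-negative lengths) and the function names are assumptions for illustration.

```python
import heapq
import math


def dijkstra(graph, source, target):
    """Return (distance, path) from source to target, or (math.inf, []) if unreachable.

    `graph` maps each vertex to a list of (neighbor, length) pairs with
    non-negative lengths; edges are directed, so one-way roads are allowed.
    """
    dist = {source: 0.0}
    prev = {}  # predecessor map: only needed to answer shortest path queries
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break  # the first time the target is popped, its distance is final
        if d > dist[u]:
            continue  # stale heap entry
        for v, length in graph.get(u, ()):
            nd = d + length
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return math.inf, []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path


def distance_query(graph, s, t):
    """Length of the shortest s-t path only."""
    return dijkstra(graph, s, t)[0]


def shortest_path_query(graph, s, t):
    """The vertex sequence of a shortest s-t path."""
    return dijkstra(graph, s, t)[1]


roads = {
    "A": [("B", 4.0), ("C", 1.0)],
    "B": [("D", 1.0)],
    "C": [("B", 2.0), ("D", 7.0)],
    "D": [],
}
print(distance_query(roads, "A", "D"))       # 4.0
print(shortest_path_query(roads, "A", "D"))  # ['A', 'C', 'B', 'D']
```

Plain Dijkstra needs no preprocessing but explores many vertices per query on large networks; the techniques compared in the paper trade preprocessing time and space for faster queries, which is exactly the trade-off the evaluation measures.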
- …