5,185 research outputs found
Visualising the structure of document search results: A comparison of graph theoretic approaches
This is the post-print of the article - Copyright @ 2010 Sage PublicationsPrevious work has shown that distance-similarity visualisation or ‘spatialisation’ can provide a potentially useful context in which to browse the results of a query search, enabling the user to adopt a simple local foraging or ‘cluster growing’ strategy to navigate through the retrieved document set. However, faithfully mapping feature-space models to visual space can be problematic owing to their inherent high dimensionality and non-linearity. Conventional linear approaches to dimension reduction tend to fail at this kind of task, sacrificing local structural in order to preserve a globally optimal mapping. In this paper the clustering performance of a recently proposed algorithm called isometric feature mapping (Isomap), which deals with non-linearity by transforming dissimilarities into geodesic distances, is compared to that of non-metric multidimensional scaling (MDS). Various graph pruning methods, for geodesic distance estimation, are also compared. Results show that Isomap is significantly better at preserving local structural detail than MDS, suggesting it is better suited to cluster growing and other semantic navigation tasks. Moreover, it is shown that applying a minimum-cost graph pruning criterion can provide a parameter-free alternative to the traditional K-neighbour method, resulting in spatial clustering that is equivalent to or better than that achieved using an optimal-K criterion
Representing Dataset Quality Metadata using Multi-Dimensional Views
Data quality is commonly defined as fitness for use. The problem of
identifying quality of data is faced by many data consumers. Data publishers
often do not have the means to identify quality problems in their data. To make
the task for both stakeholders easier, we have developed the Dataset Quality
Ontology (daQ). daQ is a core vocabulary for representing the results of
quality benchmarking of a linked dataset. It represents quality metadata as
multi-dimensional and statistical observations using the Data Cube vocabulary.
Quality metadata are organised as a self-contained graph, which can, e.g., be
embedded into linked open datasets. We discuss the design considerations, give
examples for extending daQ by custom quality metrics, and present use cases
such as analysing data versions, browsing datasets by quality, and link
identification. We finally discuss how data cube visualisation tools enable
data publishers and consumers to analyse better the quality of their data.Comment: Preprint of a paper submitted to the forthcoming SEMANTiCS 2014, 4-5
September 2014, Leipzig, German
Constrained set-up of the tGAP structure for progressive vector data transfer
A promising approach to submit a vector map from a server to a mobile client is to send a coarse representation first, which then is incrementally refined. We consider the problem of defining a sequence of such increments for areas of different land-cover classes in a planar partition. In order to submit well-generalised datasets, we propose a method of two stages: First, we create a generalised representation from a detailed dataset, using an optimisation approach that satisfies certain cartographic constraints. Second, we define a sequence of basic merge and simplification operations that transforms the most detailed dataset gradually into the generalised dataset. The obtained sequence of gradual transformations is stored without geometrical redundancy in a structure that builds up on the previously developed tGAP (topological Generalised Area Partitioning) structure. This structure and the algorithm for intermediate levels of detail (LoD) have been implemented in an object-relational database and tested for land-cover data from the official German topographic dataset ATKIS at scale 1:50 000 to the target scale 1:250 000. Results of these tests allow us to conclude that the data at lowest LoD and at intermediate LoDs is well generalised. Applying specialised heuristics the applied optimisation method copes with large datasets; the tGAP structure allows users to efficiently query and retrieve a dataset at a specified LoD. Data are sent progressively from the server to the client: First a coarse representation is sent, which is refined until the requested LoD is reached
Exploring user and system requirements of linked data visualization through a visual dashboard approach
One of the open problems in SemanticWeb research is which tools should be provided to users to explore linked data. This is even more urgent now that massive amount of linked data is being released by governments worldwide. The development of single dedicated visualization applications is increasing, but the problem of exploring unknown linked data to gain a good understanding of what is contained is still open. An effective generic solution must take into account the user’s point of view, their tasks and interaction, as well as the system’s capabilities and the technical constraints the technology imposes. This paper is a first step in understanding the implications of both, user and system by evaluating our dashboard-based approach. Though we observe a high user acceptance of the dashboard approach, our paper also highlights technical challenges arising out of complexities involving current infrastructure that need to be addressed while visualising linked data. In light of the findings, guidelines for the development of linked data visualization (and manipulation) are provided
On-the-fly Historical Handwritten Text Annotation
The performance of information retrieval algorithms depends upon the
availability of ground truth labels annotated by experts. This is an important
prerequisite, and difficulties arise when the annotated ground truth labels are
incorrect or incomplete due to high levels of degradation. To address this
problem, this paper presents a simple method to perform on-the-fly annotation
of degraded historical handwritten text in ancient manuscripts. The proposed
method aims at quick generation of ground truth and correction of inaccurate
annotations such that the bounding box perfectly encapsulates the word, and
contains no added noise from the background or surroundings. This method will
potentially be of help to historians and researchers in generating and
correcting word labels in a document dynamically. The effectiveness of the
annotation method is empirically evaluated on an archival manuscript collection
from well-known publicly available datasets
Big Data Visualization Tools
Data visualization is the presentation of data in a pictorial or graphical
format, and a data visualization tool is the software that generates this
presentation. Data visualization provides users with intuitive means to
interactively explore and analyze data, enabling them to effectively identify
interesting patterns, infer correlations and causalities, and supports
sense-making activities.Comment: This article appears in Encyclopedia of Big Data Technologies,
Springer, 201
- …