13,545 research outputs found
Adaptive development and maintenance of user-centric software systems
A software system cannot be developed without considering the various facets of its environment. Stakeholders – including the users that play a central role – have their needs, expectations, and perceptions of a system. Organisational and technical aspects of the environment are constantly changing. The ability to adapt a software system and its requirements to its environment throughout its
full lifecycle is of paramount importance in a constantly changing environment. The continuous involvement of users is as important as the constant evaluation of the system and the observation of evolving environments. We present a methodology for adaptive software systems development and
maintenance. We draw upon a diverse range of accepted methods including participatory design, software architecture, and evolutionary design. Our focus is on user-centred software systems
TAPER: query-aware, partition-enhancement for large, heterogenous, graphs
Graph partitioning has long been seen as a viable approach to address Graph
DBMS scalability. A partitioning, however, may introduce extra query processing
latency unless it is sensitive to a specific query workload, and optimised to
minimise inter-partition traversals for that workload. Additionally, it should
also be possible to incrementally adjust the partitioning in reaction to
changes in the graph topology, the query workload, or both. Because of their
complexity, current partitioning algorithms fall short of one or both of these
requirements, as they are designed for offline use and as one-off operations.
The TAPER system aims to address both requirements, whilst leveraging existing
partitioning algorithms. TAPER takes any given initial partitioning as a
starting point, and iteratively adjusts it by swapping chosen vertices across
partitions, heuristically reducing the probability of inter-partition
traversals for a given pattern matching queries workload. Iterations are
inexpensive thanks to time and space optimisations in the underlying support
data structures. We evaluate TAPER on two different large test graphs and over
realistic query workloads. Our results indicate that, given a hash-based
partitioning, TAPER reduces the number of inter-partition traversals by around
80%; given an unweighted METIS partitioning, by around 30%. These reductions
are achieved within 8 iterations and with the additional advantage of being
workload-aware and usable online.Comment: 12 pages, 11 figures, unpublishe
Observation Centric Sensor Data Model
Management of sensor data requires metadata to understand the semantics of observations. While e-science researchers have high demands on metadata, they are selective in entering metadata. The claim in this paper is to focus on the essentials, i.e., the actual observations being described by location, time, owner, instrument, and measurement. The applicability of this approach is demonstrated in two very different case studies
DocTag2Vec: An Embedding Based Multi-label Learning Approach for Document Tagging
Tagging news articles or blog posts with relevant tags from a collection of
predefined ones is coined as document tagging in this work. Accurate tagging of
articles can benefit several downstream applications such as recommendation and
search. In this work, we propose a novel yet simple approach called DocTag2Vec
to accomplish this task. We substantially extend Word2Vec and Doc2Vec---two
popular models for learning distributed representation of words and documents.
In DocTag2Vec, we simultaneously learn the representation of words, documents,
and tags in a joint vector space during training, and employ the simple
-nearest neighbor search to predict tags for unseen documents. In contrast
to previous multi-label learning methods, DocTag2Vec directly deals with raw
text instead of provided feature vector, and in addition, enjoys advantages
like the learning of tag representation, and the ability of handling newly
created tags. To demonstrate the effectiveness of our approach, we conduct
experiments on several datasets and show promising results against
state-of-the-art methods.Comment: 10 page
Declarative Cleaning, Analysis, and Querying of Graph-structured Data
Much of today's data including social, biological, sensor, computer, and transportation network data is naturally modeled and represented by graphs. Typically, data describing these networks is observational, and thus noisy and incomplete. Therefore, methods for efficiently managing graph-structured data of this nature are needed, especially with the abundance and increasing sizes of such data. In my dissertation, I develop declarative methods to perform cleaning, analysis and querying of graph-structured data efficiently. For declarative cleaning of graph-structured data, I identify a set of primitives to support the extraction and inference of the underlying true network from observational data, and describe a framework that enables a network analyst to easily implement and combine new extraction and cleaning techniques. The task specification language is based on Datalog with a set of extensions designed to enable different graph cleaning primitives. For declarative analysis, I introduce 'ego-centric pattern census queries', a new type of graph analysis query that supports searching for structural patterns in every node's neighborhood and reporting their counts for further analysis. I define an SQL-based declarative language to support this class of queries, and develop a series of efficient query evaluation algorithms for it. Finally, I present an approach for querying large uncertain graphs that supports reasoning about uncertainty of node attributes, uncertainty of edge existence, and a new type of uncertainty, called identity linkage uncertainty, where a group of nodes can potentially refer to the same real-world entity. I define a probabilistic graph model to capture all these types of uncertainties, and to resolve identity linkage merges. I propose 'context-aware path indexing' and 'join-candidate reduction' methods to efficiently enable subgraph matching queries over large uncertain graphs of this type
- …