Context Trees: Augmenting Geospatial Trajectories with Context
Exposing latent knowledge in geospatial trajectories has the potential to
provide a better understanding of the movements of individuals and groups.
Motivated by such a desire, this work presents the context tree, a new
hierarchical data structure that summarises the context behind user actions in
a single model. We propose a method for context tree construction that augments
geospatial trajectories with land usage data to identify such contexts. Through
evaluation of the construction method and analysis of the properties of
generated context trees, we demonstrate the foundation they afford for
understanding and modelling behaviour. Summarising user contexts into a single data
structure gives easy access to information that would otherwise remain latent,
providing the basis for better understanding and predicting the actions and
behaviours of individuals and groups. Finally, we also present a method for
pruning context trees, for use in applications where it is desirable to reduce
the size of the tree while retaining useful information.
SQL Query Completion for Data Exploration
Within the big data tsunami, relational databases and SQL are still there and
remain mandatory in most cases for accessing data. On the one hand, SQL is
easy for non-specialists to use and makes it possible to identify pertinent initial data at
the very beginning of the data exploration process. On the other hand, it is
not always so easy to formulate SQL queries: nowadays, it is increasingly
common to have several databases available for one application domain, some
of them with hundreds of tables and/or attributes. Identifying the pertinent
conditions to select the desired data, or even identifying relevant attributes
is far from trivial. To make it easier to write SQL queries, we propose the
notion of SQL query completion: given a query, it suggests additional
conditions to be added to its WHERE clause. This completion is semantic, as it
relies on the data from the database, unlike current completion tools that are
mostly syntactic. Since the process can be repeated over and over again,
until the data analyst reaches her data of interest, SQL query completion
facilitates the exploration of databases. SQL query completion has been
implemented in a SQL editor on top of a database management system. For the
evaluation, two questions need to be studied: first, does the completion speed
up the writing of SQL queries? Second, is the completion easily adopted by
users? A thorough experiment has been conducted on a group of 70 computer
science students divided into two groups (one with the completion and the other
one without) to answer those questions. The results are positive and very
promising.
Natural hybridization between Populus nigra L. and P. x canadensis Moench. Hybrid offspring competes for niches along the Rhine river in the Netherlands
Black poplar (Populus nigra L.) is a major species for European riparian forests but its abundance has decreased over the decades due to human influences. For restoration of floodplain woodlands, the remaining black poplar stands may act as a source population. A potential problem is that P. nigra and Populus deltoides have contributed to many interspecific hybrids, which have been planted in large numbers. As these Populus x canadensis clones have the possibility to intercross with wild P. nigra trees, their offspring could establish themselves along European rivers. In this study, we have sampled 44 poplar seedlings and young trees that occurred spontaneously along the Rhine river and its tributaries in the Netherlands. Along these rivers, only a few native P. nigra L. populations exist in combination with many planted cultivated P. x canadensis trees. By comparison to reference material from P. nigra, P. deltoides and P. x canadensis, species-specific AFLP bands and microsatellite alleles indicated that nearly half of the sampled trees were not pure P. nigra but progeny of natural hybridisation that had colonised the Rhine river banks. The posterior probability method as implemented in NewHybrids using microsatellite data was the superior method in establishing the most likely parentage. The results of this study indicate that offspring of hybrid cultivated poplars compete for the same ecological niche as native black poplars.
Anytime Hierarchical Clustering
We propose a new anytime hierarchical clustering method that iteratively
transforms an arbitrary initial hierarchy on the configuration of measurements
along a sequence of trees that, we prove, must for a fixed data set terminate in a
chain of nested partitions that satisfies a natural homogeneity requirement.
Each recursive step re-edits the tree so as to improve a local measure of
cluster homogeneity that is compatible with a number of commonly used (e.g.,
single, average, complete) linkage functions. As an alternative to the standard
batch algorithms, we present numerical evidence to suggest that appropriate
adaptations of this method can yield decentralized, scalable algorithms
suitable for distributed/parallel computation of clustering hierarchies and
online tracking of clustering trees applicable to large, dynamically changing
databases and anomaly detection.
Comment: 13 pages, 6 figures, 5 tables, in preparation for submission to a conference
XML Schema Clustering with Semantic and Hierarchical Similarity Measures
With the growing popularity of XML as a data representation language, collections of XML data have exploded in number. Methods are required to manage these collections and discover useful information in them for improved document handling. We present a schema clustering process that organises heterogeneous XML schemas into groups. The methodology considers not only the linguistic and contextual aspects of the elements but also their hierarchical structural similarity. We support our findings with experiments and analysis.