Context Trees: Augmenting Geospatial Trajectories with Context

Author: Griffiths Nathan
Sanchez Victor
Thomason Alasdair
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 14/06/2016

Exposing latent knowledge in geospatial trajectories has the potential to provide a better understanding of the movements of individuals and groups. Motivated by such a desire, this work presents the context tree, a new hierarchical data structure that summarises the context behind user actions in a single model. We propose a method for context tree construction that augments geospatial trajectories with land usage data to identify such contexts. Through evaluation of the construction method and analysis of the properties of generated context trees, we demonstrate the foundation they afford for understanding and modelling behaviour. Summarising user contexts into a single data structure gives easy access to information that would otherwise remain latent, providing the basis for better understanding and predicting the actions and behaviours of individuals and groups. Finally, we also present a method for pruning context trees, for use in applications where it is desirable to reduce the size of the tree while retaining useful information.

arXiv.org e-Print Archive

Warwick Research Archives Portal Repository

SQL Query Completion for Data Exploration

Author: Guilly Marie Le
Petit Jean-Marc
Scuturici Vasile-Marian
Publication date: 07/02/2018

Within the big data tsunami, relational databases and SQL are still there and remain mandatory in most cases for accessing data. On the one hand, SQL is easy to use for non-specialists and allows pertinent initial data to be identified at the very beginning of the data exploration process. On the other hand, it is not always so easy to formulate SQL queries: nowadays, it is more and more frequent to have several databases available for one application domain, some of them with hundreds of tables and/or attributes. Identifying the pertinent conditions to select the desired data, or even identifying relevant attributes, is far from trivial. To make it easier to write SQL queries, we propose the notion of SQL query completion: given a query, it suggests additional conditions to be added to its WHERE clause. This completion is semantic, as it relies on the data from the database, unlike current completion tools that are mostly syntactic. Since the process can be repeated over and over again (until the data analyst reaches her data of interest), SQL query completion facilitates the exploration of databases. SQL query completion has been implemented in a SQL editor on top of a database management system. For the evaluation, two questions need to be studied: first, does the completion speed up the writing of SQL queries? Second, is the completion easily adopted by users? To answer those questions, a thorough experiment has been conducted on 70 computer science students divided into two groups (one with the completion and the other without). The results are positive and very promising.

arXiv.org e-Print Archive

HAL

Hal-Diderot

Natural hybridization between Populus nigra L. and P. x canadensis Moench. Hybrid offspring competes for niches along the Rhine river in the Netherlands

Author: Arens P.F.P.
Beringen R.
Schoot J., van der
Smulders M.J.M.
Vanden Broeck A.
Volosyanchuk R.
Vosman B.
Publication date: 01/01/2008

Black poplar (Populus nigra L.) is a major species for European riparian forests but its abundance has decreased over the decades due to human influences. For restoration of floodplain woodlands, the remaining black poplar stands may act as a source population. A potential problem is that P. nigra and Populus deltoides have contributed to many interspecific hybrids, which have been planted in large numbers. As these Populus x canadensis clones have the possibility to intercross with wild P. nigra trees, their offspring could establish themselves along European rivers. In this study, we have sampled 44 poplar seedlings and young trees that occurred spontaneously along the Rhine river and its tributaries in the Netherlands. Along these rivers, only a few native P. nigra L. populations exist in combination with many planted cultivated P. x canadensis trees. By comparison to reference material from P. nigra, P. deltoides and P. x canadensis, species-specific AFLP bands and microsatellite alleles indicated that nearly half of the sampled trees were not pure P. nigra but progeny of natural hybridisation that had colonised the Rhine river banks. The posterior probability method as implemented in NewHybrids using microsatellite data was the superior method in establishing the most likely parentage. The results of this study indicate that offspring of hybrid cultivated poplars compete for the same ecological niche as native black poplars.

Wageningen University & Research Publications

Anytime Hierarchical Clustering

Author: Arslan Omur
Koditschek Daniel E.
Publication date: 13/04/2014

We propose a new anytime hierarchical clustering method that iteratively transforms an arbitrary initial hierarchy on the configuration of measurements along a sequence of trees which, we prove, must terminate for a fixed data set in a chain of nested partitions that satisfies a natural homogeneity requirement. Each recursive step re-edits the tree so as to improve a local measure of cluster homogeneity that is compatible with a number of commonly used (e.g., single, average, complete) linkage functions. As an alternative to the standard batch algorithms, we present numerical evidence to suggest that appropriate adaptations of this method can yield decentralized, scalable algorithms suitable for distributed/parallel computation of clustering hierarchies and online tracking of clustering trees applicable to large, dynamically changing databases and anomaly detection.

Comment: 13 pages, 6 figures, 5 tables, in preparation for submission to a conference

arXiv.org e-Print Archive

CiteSeerX

ScholarlyCommons@Penn

Data mining based cyber-attack detection

Author: Tianfield Huaglory
Publication date: 31/05/2017

ResearchOnline@GCU

XML Schema Clustering with Semantic and Hierarchical Similarity Measures

Author: Iryadi Wina
Nayak Richi
Publication venue: 'Elsevier BV'
Publication date: 01/01/2007

With the growing popularity of XML as a data representation language, collections of XML data have exploded in number. Methods are required to manage these collections and to discover useful information in them for improved document handling. We present a schema clustering process that organises heterogeneous XML schemas into groups. The methodology considers not only the linguistic and contextual properties of the elements but also their hierarchical structural similarity. We support our findings with experiments and analysis.

Crossref

Queensland University of Technology ePrints Archive