15,915 research outputs found
Interactive Constrained Association Rule Mining
We investigate ways to support interactive mining sessions, in the setting of
association rule mining. In such sessions, users specify conditions (queries)
on the associations to be generated. Our approach is a combination of the
integration of querying conditions inside the mining phase, and the incremental
querying of already generated associations. We present several concrete
algorithms and compare their performance.
Comment: A preliminary report on this work was presented at the Second International Conference on Data Warehousing and Knowledge Discovery (DaWaK 2000).
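The paper's concrete algorithms are not reproduced in the abstract, but the core idea it names, pushing the user's query conditions into the mining phase rather than post-filtering generated associations, can be sketched. A minimal illustration in Python, assuming an anti-monotone constraint so that pruning during mining is sound (all names and data invented):

```python
from itertools import combinations

def constrained_frequent_itemsets(transactions, min_support, constraint):
    """Levelwise (Apriori-style) search. `constraint` must be anti-monotone:
    if an itemset violates it, every superset does too, so violating
    candidates can be pruned during mining instead of filtered afterwards."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    frequent = []
    while level:
        survivors = []
        for cand in level:
            if not constraint(cand):
                continue  # querying condition integrated into the mining phase
            support = sum(1 for t in transactions if cand <= t) / len(transactions)
            if support >= min_support:
                survivors.append(cand)
                frequent.append((cand, support))
        if not survivors:
            break
        k = len(survivors[0]) + 1  # join surviving k-itemsets into (k+1)-candidates
        level = list({a | b for a, b in combinations(survivors, 2) if len(a | b) == k})
    return frequent

# Example constraint: the total price of the itemset stays under a budget.
prices = {"bread": 2, "milk": 1, "caviar": 30}
txns = [frozenset(t) for t in (["bread", "milk"], ["bread"], ["bread", "milk", "caviar"])]
print(constrained_frequent_itemsets(txns, 0.5, lambda s: sum(prices[i] for i in s) <= 10))
```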
Distributed System Contract Monitoring
The use of behavioural contracts, to specify, regulate and verify systems, is
particularly relevant to runtime monitoring of distributed systems. System
distribution poses major challenges to contract monitoring, from
monitoring-induced information leaks to computation load balancing,
communication overheads and fault-tolerance. We present mDPi, a location-aware
process calculus, for reasoning about monitoring of distributed systems. We
define a family of Labelled Transition Systems for this calculus, which allow
formal reasoning about different monitoring strategies at different levels of
abstraction. We also illustrate the expressivity of the calculus by showing
how contracts in a simple contract language can be synthesised into different
mDPi monitors.
Comment: In Proceedings FLACOS 2011, arXiv:1109.239
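mDPi is a process calculus for reasoning about monitoring, not an implementation, but the checking it formalises can be illustrated concretely. A minimal sketch, assuming a contract expressed as a finite-state automaton over observable events (all names invented); distributing such monitors changes where the checks run, which is what the paper's transition systems make precise, not the checking logic itself:

```python
class ContractMonitor:
    """Runtime monitor: advances an automaton on each observed event."""
    def __init__(self, transitions, start, violation):
        self.transitions = transitions  # (state, event) -> next state
        self.state = start
        self.violation = violation

    def observe(self, event):
        self.state = self.transitions.get((self.state, event), self.violation)
        return self.state != self.violation

# Contract: a "pay" event must be preceded by a "login".
monitor = ContractMonitor(
    transitions={("idle", "login"): "authed", ("authed", "pay"): "authed",
                 ("authed", "logout"): "idle"},
    start="idle", violation="VIOLATED")

for ev in ["login", "pay", "logout", "pay"]:
    if not monitor.observe(ev):
        print(f"contract violated at event {ev!r}")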
A Data Transformation System for Biological Data Sources
Scientific data of importance to biologists in the Human Genome Project resides not only in conventional databases, but in structured files maintained in a number of different formats (e.g. ASN.1 and ACE) as well as sequence analysis packages (e.g. BLAST and FASTA). These formats and packages contain a number of data types not found in conventional databases, such as lists and variants, and may be deeply nested. We present in this paper techniques for querying and transforming such data, and illustrate their use in a prototype system developed in conjunction with the Human Genome Center for Chromosome 22. We also describe optimizations performed by the system, a crucial issue for bulk data.
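The prototype itself is not shown in the abstract, but the kind of transformation it targets, restructuring deeply nested data with lists and variants into flat relational form, can be sketched. A hypothetical example in Python (the record layout is invented, loosely ASN.1-flavoured):

```python
record = {  # hypothetical nested entry with a list and a variant
    "locus": "D22S1",
    "citations": [{"title": "Chromosome 22 mapping", "year": 1994}],
    "seq": {"kind": "dna", "data": "ACGT"},  # variant: dna | protein
}

def flatten(entry):
    """Yield one flat row per citation; dispatch the variant on its tag."""
    kind = entry["seq"]["kind"]
    for cit in entry["citations"]:
        yield {"locus": entry["locus"], "year": cit["year"], "seq_kind": kind}

print(list(flatten(record)))
```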
A platform for discovering and sharing confidential ballistic crime data
Criminal investigations generate large volumes of complex data that detectives have to analyse and understand. This data tends to be "siloed" within individual jurisdictions, and re-using it in other investigations can be difficult. Investigations into trans-national crimes are hampered by the problem of discovering relevant data held by agencies in other countries and of sharing those data. Gun crimes are one major type of incident that showcases this: guns are easily moved across borders and used in multiple crimes, but finding that a weapon was used elsewhere in Europe is difficult. In this paper we report on the Odyssey Project, an EU-funded initiative to mine, manipulate and share data about weapons and crimes. The project demonstrates the automatic combining of data from disparate repositories for cross-correlation and automated analysis. The data arrive from different cultural domains with multiple reference models, via real-time data feeds and historical databases.
Dealing with temporal inconsistency in automated computer forensic profiling
Computer profiling is the automated forensic examination of a computer system in order to provide a human investigator with a characterisation of the activities that have taken place on that system. As part of this process, the logical components of the computer system (components such as users, files and applications) are enumerated and the relationships between them discovered and reported. This information is enriched with traces of historical activity drawn from system logs and from evidence of events found in the computer file system. A potential problem with the use of such information is that some of it may be inconsistent and contradictory, compromising its value. This work examines the impact of temporal inconsistency in such information and discusses two types of temporal inconsistency that may arise: inconsistency arising out of the normal errant behaviour of a computer system, and inconsistency arising out of deliberate tampering by a suspect. It then presents techniques for dealing with inconsistencies of the latter kind. We examine the impact of deliberate tampering through experiments conducted with prototype computer profiling software. Based on the results of these experiments, we discuss techniques which can be employed in computer profiling to deal with such temporal inconsistencies.
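As a concrete illustration (not the paper's prototype), one simple check for tampering-induced temporal inconsistency: log records are written sequentially, so a record whose timestamp falls well before an earlier record's timestamp suggests backdating or clock manipulation. A minimal sketch in Python:

```python
from datetime import datetime, timedelta

def flag_inconsistencies(records, tolerance=timedelta(seconds=1)):
    """records: list of (sequence_number, timestamp) in write order.
    Flags records whose timestamp contradicts the write order."""
    flagged, high_water = [], None
    for seq, ts in records:
        if high_water is not None and ts < high_water - tolerance:
            flagged.append(seq)  # timestamp precedes an earlier record's
        high_water = max(high_water or ts, ts)
    return flagged

log = [(1, datetime(2023, 5, 1, 9, 0)),
       (2, datetime(2023, 5, 1, 9, 5)),
       (3, datetime(2023, 5, 1, 8, 0)),  # backdated entry
       (4, datetime(2023, 5, 1, 9, 6))]
print(flag_inconsistencies(log))  # -> [3]
```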
A Requirement-centric Approach to Web Service Modeling, Discovery, and Selection
Service-Oriented Computing (SOC) has gained considerable popularity for implementing Service-Based Applications (SBAs) in a flexible and effective manner. The basic idea of SOC is to understand users' requirements for SBAs first, and then discover and select relevant services (i.e., services that closely fit the functional requirements) offering a high Quality of Service (QoS). Understanding users' requirements is already achieved by existing requirements engineering approaches (e.g., TROPOS, KAOS, and MAP) which model SBAs in a requirement-driven manner. However, discovering and selecting relevant, high-QoS services are still challenging tasks that require time and effort due to the increasing number of available Web services. In this paper, we propose a requirement-centric approach which allows: (i) modeling users' requirements for SBAs with the MAP formalism and specifying required services using an Intentional Service Model (ISM); (ii) discovering services by querying the Web service search engine Service-Finder, using keywords extracted from the specifications provided by the ISM; and (iii) selecting relevant, high-QoS services automatically by applying Formal Concept Analysis (FCA). We validate our approach by performing experiments on an e-books application. The experimental results show that our approach selects relevant, high-QoS services with high precision (89.41% on average) and high recall (95.43% on average).
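Formal Concept Analysis, the selection technique named in step (iii), can be sketched in miniature: services are FCA objects, satisfied requirements and QoS levels are attributes, and selection keeps services whose attribute set covers the user's requirements. All service and attribute names below are invented:

```python
context = {  # FCA formal context: service -> attributes it provides
    "BookSearchA": {"search", "high_availability"},
    "BookSearchB": {"search"},
    "PaymentX":    {"payment", "high_availability", "low_latency"},
}

def derive(services):
    """Attributes shared by all given services (the derivation operator of FCA)."""
    sets = [context[s] for s in services]
    return set.intersection(*sets) if sets else set()

required = {"search", "high_availability"}
selected = [s for s in context if required <= context[s]]
print(selected)                             # -> ['BookSearchA']
print(derive(["BookSearchA", "PaymentX"]))  # shared attributes of a concept's extent
```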
Knowledge Rich Natural Language Queries over Structured Biological Databases
Increasingly, keyword, natural language and NoSQL queries are being used for
information retrieval from traditional as well as non-traditional databases
such as web, document, image, GIS, legal, and health databases. While their
popularity is undeniable for obvious reasons, their engineering is far from
simple. For the most part, the semantics- and intent-preserving mapping of a
well-understood natural language query expressed over a structured database
schema to a structured query language remains a difficult task, and research
to tame the complexity is intense. In this paper, we propose a multi-level
knowledge-based middleware to facilitate such mappings that separates the
conceptual level from the physical level. We augment these multi-level
abstractions with a concept reasoner and a query strategy engine to dynamically
link arbitrary natural language querying to well-defined structured queries. We
demonstrate the feasibility of our approach by presenting a Datalog-based
prototype system, called BioSmart, that can compute responses to arbitrary
natural language queries over arbitrary databases once a syntactic
classification of the natural language query is made.
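A toy illustration of the mapping step the abstract describes (far simpler than BioSmart, with an invented schema): once the natural language query is syntactically classified, a structured-query template for that class is instantiated with concepts resolved against schema-level knowledge:

```python
SCHEMA = {"gene": ("genes", "symbol"), "disease": ("diseases", "name")}

TEMPLATES = {
    # query class -> parameterised structured query
    # (a real system would use bound parameters, not string substitution)
    "list_X_related_to_Y": "SELECT g.{xcol} FROM {xtab} g "
                           "JOIN links l ON l.gene_id = g.id "
                           "JOIN {ytab} d ON d.id = l.disease_id "
                           "WHERE d.{ycol} = '{value}'",
}

def to_structured(query_class, x_concept, y_concept, value):
    """Instantiate the template for a classified query with schema concepts."""
    xtab, xcol = SCHEMA[x_concept]
    ytab, ycol = SCHEMA[y_concept]
    return TEMPLATES[query_class].format(xtab=xtab, xcol=xcol,
                                         ytab=ytab, ycol=ycol, value=value)

# "Which genes are related to asthma?" classified as list_X_related_to_Y:
print(to_structured("list_X_related_to_Y", "gene", "disease", "asthma"))
```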
vSPARQL: A View Definition Language for the Semantic Web
Translational medicine applications would like to leverage the biological and biomedical ontologies, vocabularies, and data sets available on the semantic web. We present a general solution for RDF information set reuse inspired by database views. Our view definition language, vSPARQL, allows applications to specify the exact content that they are interested in and how that content should be restructured or modified. Applications can access relevant content by querying against these view definitions. We evaluate the expressivity of our approach by defining views for practical use cases and comparing our view definition language to existing query languages.
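vSPARQL extends SPARQL itself, so the following is only a rough Python analogy using rdflib with an invented mini-ontology: the "view" is a CONSTRUCT query whose result is materialised as a graph, and applications query that graph rather than the full source:

```python
from rdflib import Graph

source = Graph().parse(data="""
@prefix ex: <http://example.org/> .
ex:heart ex:partOf ex:cardiovascularSystem ; ex:label "heart" .
ex:aorta ex:partOf ex:heart ; ex:label "aorta" .
""", format="turtle")

# View definition: keep only the part-of structure, restructured as ex:within.
view_def = """
PREFIX ex: <http://example.org/>
CONSTRUCT { ?part ex:within ?whole }
WHERE     { ?part ex:partOf ?whole }
"""
view = Graph()
for triple in source.query(view_def):
    view.add(triple)

# Applications query the view, not the full source graph.
for row in view.query(
        "PREFIX ex: <http://example.org/> SELECT ?p WHERE { ?p ex:within ex:heart }"):
    print(row.p)
```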
ExoData: A python package to handle large exoplanet catalogue data
Exoplanet science often involves using the system parameters of real
exoplanets for tasks such as simulations, fitting routines, and target
selection for proposals. Several exoplanet catalogues are already well
established but often lack a version history and code-friendly interfaces.
Software that bridges the gap between the catalogues and code enables users
to improve the repeatability of results by facilitating the retrieval of the
exact system parameters used in an article's results, along with unifying the
equations and software used. As exoplanet science moves towards large data,
gone are the days when researchers could recall the current population from
memory. An interface able to query the population thus becomes invaluable for
target selection and population analysis. ExoData is a Python interface and
exploratory analysis tool for the Open Exoplanet Catalogue. It allows
exoplanet systems to be loaded into Python as objects (Planet, Star, Binary,
etc.) from which common orbital and system equations can be calculated and
measured parameters retrieved. This allows researchers to use tested code for
the common equations they require (with units) and provides a large science
input catalogue of planets for easy plotting and use in research. Advanced
querying of targets is possible using the database and the Python programming
language. ExoData is also able to parse spectral types and fill in missing
parameters according to programmable specifications and equations. Example
use cases are integration of equations into data reduction pipelines,
selecting planets for observing proposals, and serving as an input catalogue
for large-scale simulation and analysis of planets.
Comment: 22 pages, 3 figures, 9 tables. Accepted by Computer Physics Communications.
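A hedged usage sketch: the loader and attribute names below follow the package's description but have not been verified against its documentation, so treat them as assumptions rather than an API reference:

```python
import exodata

# Load the Open Exoplanet Catalogue over the network (assumed loader name;
# the package also describes loading from a local checkout of the catalogue).
exocat = exodata.load_db_from_url()

# Systems arrive as Python objects (Planet, Star, Binary), per the abstract.
# Attribute names such as .name and .R (radius, with units) are assumptions.
for planet in list(exocat.planets)[:5]:
    print(planet.name, planet.R)
```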