2,472 research outputs found
An Integrated Framework for Treebanks and Multilayer Annotations
Treebank formats and associated software tools are proliferating rapidly,
with little consideration for interoperability. We survey a wide variety of
treebank structures and operations, and show how they can be mapped onto the
annotation graph model, and leading to an integrated framework encompassing
tree and non-tree annotations alike. This development opens up new
possibilities for managing and exploiting multilayer annotations.Comment: 8 page
Algorithms that Remember: Model Inversion Attacks and Data Protection Law
Many individuals are concerned about the governance of machine learning
systems and the prevention of algorithmic harms. The EU's recent General Data
Protection Regulation (GDPR) has been seen as a core tool for achieving better
governance of this area. While the GDPR does apply to the use of models in some
limited situations, most of its provisions relate to the governance of personal
data, while models have traditionally been seen as intellectual property. We
present recent work from the information security literature around `model
inversion' and `membership inference' attacks, which indicate that the process
of turning training data into machine learned systems is not one-way, and
demonstrate how this could lead some models to be legally classified as
personal data. Taking this as a probing experiment, we explore the different
rights and obligations this would trigger and their utility, and posit future
directions for algorithmic governance and regulation.Comment: 15 pages, 1 figur
The INCF Digital Atlasing Program: Report on Digital Atlasing Standards in the Rodent Brain
The goal of the INCF Digital Atlasing Program is to provide the vision and direction necessary to make the rapidly growing collection of multidimensional data of the rodent brain (images, gene expression, etc.) widely accessible and usable to the international research community. This Digital Brain Atlasing Standards Task Force was formed in May 2008 to investigate the state of rodent brain digital atlasing, and formulate standards, guidelines, and policy recommendations.

Our first objective has been the preparation of a detailed document that includes the vision and specific description of an infrastructure, systems and methods capable of serving the scientific goals of the community, as well as practical issues for achieving
the goals. This report builds on the 1st INCF Workshop on Mouse and Rat Brain Digital Atlasing Systems (Boline et al., 2007, _Nature Preceedings_, doi:10.1038/npre.2007.1046.1) and includes a more detailed analysis of both the current state and desired state of digital atlasing along with specific recommendations for achieving these goals
Survey of time series database technology
This report has been prepared by Epimorphics Ltd. as part of the ENTRAIN project (NERC grant number NE/S016244/1) which is a feasibility project within the “NERC Constructing a Digital Environment Strategic Priorities Fund Programme”. The Centre for Ecology and Hydrology(CEH) is a research organisation focusing on land and freshwater ecosystems and their interaction with the atmosphere. The organization manages a number of sensor networks to monitor the environment, and also handles large databases of 3rd party data (e.g. river flows measured by the Environment Agency and equivalents in Scotland and Wales). Data from these networks is stored and made available to users, both internally (through direct query of databases, and externally via web-services). The ENTRAIN project aims to address a number of issues in relation to sensor data storage and integration, using a number of hydrological datasets to help define use cases: COSMOS-UK (a network of ~50 sites measuring soil moisture and meteorological variables at 1-30 minute resolutions); the CEH Greenhouse Gas (GHG) network (~15 sites measuring sub-second fluxes of gases and moisture, subsequently processed up to 30-minute aggregations); the Thames Initiative (a database of weekly and hourly water quality samples from sites around the Thames basin). In addition this report considers the UK National River Flow Archive, a database of daily river flows and catchment rainfall derived by regional environmental agencies from 15-minute measurements of river levels and flows. CEH commissioned this report to survey alternative technologies for storing sensor data that scale better, could manage larger data volumes more easily and less expensively, and that might be readily deployed on different infrastructures
Towards Analytics Aware Ontology Based Access to Static and Streaming Data (Extended Version)
Real-time analytics that requires integration and aggregation of
heterogeneous and distributed streaming and static data is a typical task in
many industrial scenarios such as diagnostics of turbines in Siemens. OBDA
approach has a great potential to facilitate such tasks; however, it has a
number of limitations in dealing with analytics that restrict its use in
important industrial applications. Based on our experience with Siemens, we
argue that in order to overcome those limitations OBDA should be extended and
become analytics, source, and cost aware. In this work we propose such an
extension. In particular, we propose an ontology, mapping, and query language
for OBDA, where aggregate and other analytical functions are first class
citizens. Moreover, we develop query optimisation techniques that allow to
efficiently process analytical tasks over static and streaming data. We
implement our approach in a system and evaluate our system with Siemens turbine
data
The Family of MapReduce and Large Scale Data Processing Systems
In the last two decades, the continuous increase of computational power has
produced an overwhelming flow of data which has called for a paradigm shift in
the computing architecture and large scale data processing mechanisms.
MapReduce is a simple and powerful programming model that enables easy
development of scalable parallel applications to process vast amounts of data
on large clusters of commodity machines. It isolates the application from the
details of running a distributed program such as issues on data distribution,
scheduling and fault tolerance. However, the original implementation of the
MapReduce framework had some limitations that have been tackled by many
research efforts in several followup works after its introduction. This article
provides a comprehensive survey for a family of approaches and mechanisms of
large scale data processing mechanisms that have been implemented based on the
original idea of the MapReduce framework and are currently gaining a lot of
momentum in both research and industrial communities. We also cover a set of
introduced systems that have been implemented to provide declarative
programming interfaces on top of the MapReduce framework. In addition, we
review several large scale data processing systems that resemble some of the
ideas of the MapReduce framework for different purposes and application
scenarios. Finally, we discuss some of the future research directions for
implementing the next generation of MapReduce-like solutions.Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author
- …