322 research outputs found
DAS: a data management system for instrument tests and operations
The Data Access System (DAS) is a metadata and data management software
system, providing a reusable solution for the storage of data acquired both
from telescopes and auxiliary data sources during the instrument development
phases and operations. It is part of the Customizable Instrument WorkStation
system (CIWS-FW), a framework for the storage, processing and quick-look at the
data acquired from scientific instruments. The DAS provides a data access layer
mainly targeted to software applications: quick-look displays, pre-processing
pipelines and scientific workflows. It is logically organized in three main
components: an intuitive and compact Data Definition Language (DAS DDL) in XML
format, aimed for user-defined data types; an Application Programming Interface
(DAS API), automatically adding classes and methods supporting the DDL data
types, and providing an object-oriented query language; a data management
component, which maps the metadata of the DDL data types in a relational Data
Base Management System (DBMS), and stores the data in a shared (network) file
system. With the DAS DDL, developers define the data model for a particular
project, specifying for each data type the metadata attributes, the data format
and layout (if applicable), and named references to related or aggregated data
types. Together with the DDL user-defined data types, the DAS API acts as the
only interface to store, query and retrieve the metadata and data in the DAS
system, providing both an abstract interface and a data model specific one in
C, C++ and Python. The mapping of metadata in the back-end database is
automatic and supports several relational DBMSs, including MySQL, Oracle and
PostgreSQL.
Comment: Accepted for publication in the ADASS Conference Series
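The central idea of the DAS DDL is that user-defined data types declared in XML are automatically turned into classes and methods in the API. A minimal sketch of that mapping, using Python's standard library (the element names and attributes here are illustrative, not the actual DAS DDL schema):

```python
import xml.etree.ElementTree as ET

# A toy DDL fragment in the spirit of the DAS DDL: one data type with
# its metadata attributes. Names are illustrative, not the real schema.
DDL = """
<ddl>
  <type name="RawFrame">
    <metadata name="obs_id"   type="int32"/>
    <metadata name="exp_time" type="float64"/>
  </type>
</ddl>
"""

def build_types(ddl_text):
    """Generate one Python class per DDL type, with the declared
    metadata attributes settable via keyword arguments."""
    classes = {}
    for t in ET.fromstring(ddl_text).findall("type"):
        attrs = [m.get("name") for m in t.findall("metadata")]
        def _init(self, _attrs=tuple(attrs), **kw):
            for a in _attrs:
                setattr(self, a, kw.get(a))
        classes[t.get("name")] = type(t.get("name"), (object,),
                                      {"__init__": _init})
    return classes

types = build_types(DDL)
frame = types["RawFrame"](obs_id=42, exp_time=1.5)
print(frame.obs_id, frame.exp_time)   # 42 1.5
```

A real system would also generate the relational mapping for these attributes; the sketch only shows the DDL-to-class step.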
Web-based interface for environmental niche modelling
Marine species are subject to anthropogenic impacts, such as noise pollution,
marine litter, and direct impact collisions. While there are efforts in the marine
community and crowd-sourcing to report the occurrence of marine species, not
enough projects explore the prediction of where such animals may be.
This dissertation analyzes the state of the art in species distribution modeling (SDM) systems capable of reporting and predicting marine biodiversity. The proposal implements algorithms for predicting species through publicly available repositories of data, and provides means to ease the upload and management of
occurrence points as well as methods for prediction analysis. A web-based user
interface is proposed using Ecological Niche Modelling (ENM) as an automated
alerting mechanism towards ecological awareness.
The user studies performed evaluate marine biodiversity concerns among fishermen and whale-watching sea-vessel crews, assessing the attitudes, threats, values, and motivation of both samples. Further, biologists and experts on ENMs will evaluate the
workflow and interface, reviewing the proposal’s potential to enable ecologists
to create solutions for their custom problems using simple protocols without the
need for any third-party entities or extensive knowledge of programming.
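Ecological niche models of the kind the dissertation builds on can be illustrated with a minimal environmental-envelope (BIOCLIM-style) sketch: a location is predicted suitable when every environmental variable lies within the range observed at the species' occurrence points. The data below are synthetic; a real system would pull occurrences from public repositories such as GBIF or OBIS.

```python
# Minimal environmental-envelope niche model, in the spirit of BIOCLIM.
# Variables and values are made up for illustration.

def fit_envelope(occurrences):
    """occurrences: list of dicts mapping variable name -> value."""
    variables = occurrences[0].keys()
    return {v: (min(o[v] for o in occurrences),
                max(o[v] for o in occurrences)) for v in variables}

def predict(envelope, cell):
    """A cell is suitable when every variable falls inside the envelope."""
    return all(lo <= cell[v] <= hi for v, (lo, hi) in envelope.items())

# Sea-surface temperature (degC) and depth (m) at known sightings:
sightings = [{"sst": 14.0, "depth": 200}, {"sst": 17.5, "depth": 800},
             {"sst": 16.0, "depth": 450}]
model = fit_envelope(sightings)
print(predict(model, {"sst": 15.2, "depth": 300}))   # True  (inside envelope)
print(predict(model, {"sst": 22.0, "depth": 300}))   # False (too warm)
```

Production SDM systems use richer algorithms (e.g. MaxEnt), but the fit/predict pattern over occurrence points is the same.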
The conceptual design of SeamFrame
This project deliverable provides the underlying architecture of a concept for linking models and databases, and the design of SeamFrame: an integration framework for models and simulation algorithms, supported by procedures for data handling, spatial representation, quality control, output visualization and documentation.
Algebra of core concept transformations: Procedural meta-data for geographic information
Transformations are essential for dealing with geographic information. They are involved not only in converting between geodata formats and reference systems, but also in turning geodata into useful information according to some purpose. However, since a transformation can be implemented in various formats and tools, its function and purpose usually remain hidden underneath the technicalities of a workflow. To automate geographic information procedures, we therefore need to model the transformations implemented by workflows on a conceptual level, as a form of procedural knowledge. Although core concepts of spatial information provide a useful level of description in this respect, we currently lack a model for the space of possible transformations between such concepts. In this article, we present the algebra of core concept transformations (CCT). It consists of a type hierarchy which models core concepts as relations, and a set of basic transformations described in terms of function signatures that use such types. Type inference allows us to enrich GIS workflows with abstract machine-readable metadata, by compiling algebraic tool descriptions. This allows us to automatically infer goal concepts across workflows and to query over such concepts across raster and vector implementations. We evaluate the algebra over a set of expert GIS workflows taken from online tutorials.
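The core mechanism, inferring the concept a workflow produces by chaining tool signatures, can be sketched in a few lines. The concept and tool names below echo core-concept vocabulary but are illustrative, not the actual CCT algebra:

```python
# Toy signature-based inference over concept transformations.
# Each tool maps an input concept type to an output concept type.
SIGNATURES = {
    "interpolate": ("PointMeasures", "Field"),
    "zonal_mean":  ("Field", "ObjectQuality"),
    "classify":    ("ObjectQuality", "ObjectClass"),
}

def infer(workflow, start):
    """Chain tool signatures to infer the concept type a workflow yields,
    rejecting workflows whose steps do not compose."""
    t = start
    for tool in workflow:
        t_in, t_out = SIGNATURES[tool]
        if t_in != t:
            raise TypeError(f"{tool} expects {t_in}, got {t}")
        t = t_out
    return t

goal = infer(["interpolate", "zonal_mean", "classify"], "PointMeasures")
print(goal)   # ObjectClass
```

The real algebra works over a type hierarchy with subtyping and relations rather than flat labels, but the compositional inference step is the same shape.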
Machine Learning with Time Series: A Taxonomy of Learning Tasks, Development of a Unified Framework, and Comparative Benchmarking of Algorithms
Time series data is ubiquitous in real-world applications. Such data gives rise to distinct but closely related learning tasks (e.g. time series classification, regression or forecasting). In contrast to the more traditional cross-sectional setting, these tasks are often not fully formalized. As a result, different tasks can become conflated under the same name, algorithms are often applied to the wrong task, and performance estimates are potentially unreliable. In practice, software frameworks such as scikit-learn have become essential tools for data science. However, most existing frameworks focus on cross-sectional data. To our knowledge, no comparable frameworks exist for temporal data. Moreover, despite the importance of these frameworks, their design principles have never been fully understood. Instead, discussions often concentrate on usage and features, while almost completely ignoring design. To address these issues, we develop in this thesis (i) a formal taxonomy of learning tasks, (ii) novel design principles for ML toolboxes and (iii) a new unified framework for ML with time series. The framework has been implemented in an open-source Python package called sktime. The design principles are derived from existing state-of-the-art toolboxes and classical software design practices, using a domain-driven approach and a novel scientific type system. We show that these principles not only explain key aspects of existing frameworks, but also guide the development of new ones like sktime. Finally, we use sktime to reproduce and extend the M4 competition, one of the major comparative benchmarking studies for forecasting. Reproducing the competition allows us to verify the published results and illustrate sktime's effectiveness. Extending the competition enables us to explore the potential of previously unstudied ML models.
We find that, on a subset of the M4 data, simple ML models implemented in sktime can match the state-of-the-art performance of the hand-crafted M4 winner models.
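The unified interface the thesis argues for can be sketched with a naive (last-value) forecaster in plain Python. sktime's actual forecasters follow the same fit/predict shape but operate on pandas-based data containers and richer horizon specifications; this dependency-free version only illustrates the interface idea:

```python
# Sketch of a uniform fit/predict forecasting interface, mirroring the
# design sktime formalizes. The forecaster itself is the simplest
# possible baseline: repeat the last observed value.

class NaiveForecaster:
    """Predicts that the last observed value repeats into the future."""

    def fit(self, y):
        # Learn nothing beyond remembering the final observation.
        self.last_ = y[-1]
        return self

    def predict(self, horizon):
        # Emit the remembered value for each future step.
        return [self.last_] * horizon

y_train = [112, 118, 132, 129, 121]
forecaster = NaiveForecaster().fit(y_train)
print(forecaster.predict(3))   # [121, 121, 121]
```

Because every forecaster exposes the same two methods, benchmarking studies like the M4 reproduction can swap models in and out behind one interface.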
The SPOSAD Architectural Style for Multi-tenant Software Applications
Keywords: Software architecture; Software quality; Software performance; Software maintenance
Abstract: A multi-tenant software application is a special type of highly scalable, hosted software, in which the application and its infrastructure are shared among multiple tenants to save development and maintenance costs. The limited understanding of the underlying architectural concepts still prevents many software architects from designing such a system. Existing documentation on multi-tenant software architectures is either technology-specific or database-centric. A more technology-independent perspective is required to enable widespread adoption of multi-tenant architectures. We propose the SPOSAD architectural style, which describes the components, connectors, and data elements of a multi-tenant architecture, as well as constraints imposed on these elements. This paper describes the benefits of such an architecture and the trade-offs for the related design decisions. To evaluate our proposal, we illustrate how concepts of the style help to make current Platform-as-a-Service (PaaS) environments, such as Force.com, Windows Azure, and Google App Engine, scalable and customizable.
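One common data-layer pattern in such shared architectures is a single schema in which every row carries a tenant discriminator and all queries are scoped by it. A minimal sketch with Python's standard sqlite3 module (the schema and names are illustrative, not prescribed by SPOSAD):

```python
import sqlite3

# Shared-schema multi-tenancy: all tenants share one table, and every
# query is scoped by tenant_id so tenants never see each other's rows.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE invoices (tenant_id TEXT, amount REAL)")
db.executemany("INSERT INTO invoices VALUES (?, ?)",
               [("acme", 10.0), ("acme", 25.0), ("globex", 99.0)])

def total_for(tenant):
    # The tenant_id predicate is the isolation boundary here; the
    # application and database instance are shared among all tenants.
    row = db.execute("SELECT SUM(amount) FROM invoices WHERE tenant_id = ?",
                     (tenant,)).fetchone()
    return row[0]

print(total_for("acme"))     # 35.0
print(total_for("globex"))   # 99.0
```

The trade-off the paper discusses is visible even here: sharing one schema saves operational cost, but correctness now depends on every query being tenant-scoped.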
The Family of MapReduce and Large Scale Data Processing Systems
In the last two decades, the continuous increase of computational power has
produced an overwhelming flow of data which has called for a paradigm shift in
the computing architecture and large scale data processing mechanisms.
MapReduce is a simple and powerful programming model that enables easy
development of scalable parallel applications to process vast amounts of data
on large clusters of commodity machines. It isolates the application from the
details of running a distributed program such as issues on data distribution,
scheduling and fault tolerance. However, the original implementation of the
MapReduce framework had some limitations that have been tackled by many
research efforts in several followup works after its introduction. This article
provides a comprehensive survey for a family of approaches and mechanisms of
large scale data processing mechanisms that have been implemented based on the
original idea of the MapReduce framework and are currently gaining a lot of
momentum in both research and industrial communities. We also cover a set of
introduced systems that have been implemented to provide declarative
programming interfaces on top of the MapReduce framework. In addition, we
review several large scale data processing systems that resemble some of the
ideas of the MapReduce framework for different purposes and application
scenarios. Finally, we discuss some of the future research directions for
implementing the next generation of MapReduce-like solutions.
Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other authors
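The programming model the survey builds on can be shown with its canonical example, word count, sketched in-process: map emits (key, value) pairs, a shuffle groups them by key, and reduce folds each group. A real framework distributes these same phases across a cluster and handles data distribution, scheduling and fault tolerance:

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    # Map: emit a (word, 1) pair for every word in the document.
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: fold each group of values into a single count.
    return {key: sum(values) for key, values in groups.items()}

docs = ["the quick brown fox", "the lazy dog", "the fox"]
counts = reduce_phase(shuffle(chain.from_iterable(map(map_phase, docs))))
print(counts["the"], counts["fox"])   # 3 2
```

Everything the survey covers, from declarative layers to MapReduce-like systems, is ultimately a variation on these three phases.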