
    DAS: a data management system for instrument tests and operations

    The Data Access System (DAS) is a metadata and data management software system that provides a reusable solution for storing data acquired from telescopes and auxiliary data sources during instrument development and operations. It is part of the Customizable Instrument WorkStation system (CIWS-FW), a framework for the storage, processing and quick-look analysis of data acquired from scientific instruments. The DAS provides a data access layer mainly targeted at software applications: quick-look displays, pre-processing pipelines and scientific workflows. It is logically organized in three main components: an intuitive and compact Data Definition Language (DAS DDL) in XML format, for specifying user-defined data types; an Application Programming Interface (DAS API), which automatically adds classes and methods supporting the DDL data types and provides an object-oriented query language; and a data management component, which maps the metadata of the DDL data types onto a relational Database Management System (DBMS) and stores the data in a shared (network) file system. With the DAS DDL, developers define the data model for a particular project, specifying for each data type the metadata attributes, the data format and layout (if applicable), and named references to related or aggregated data types. Together with the DDL user-defined data types, the DAS API acts as the only interface to store, query and retrieve metadata and data in the DAS system, providing both an abstract interface and a data-model-specific one in C, C++ and Python. The mapping of metadata onto the back-end database is automatic and supports several relational DBMSs, including MySQL, Oracle and PostgreSQL. Comment: Accepted for publication in the ADASS Conference Series.
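    The core idea — metadata of a user-defined data type mapped onto a relational table, with the payload left on a shared file system — can be sketched in a few lines. This is a toy illustration using Python's sqlite3 as a stand-in back end; the type name, attributes, and table layout below are invented, not the actual DAS schema.

```python
import sqlite3

# Hypothetical user-defined data type, as a DDL might declare it:
# a type "Image" with two metadata attributes and a reference to the
# data file stored on a shared file system.
ddl_type = {
    "name": "Image",
    "metadata": {"run_id": "INTEGER", "exposure_ms": "REAL"},
}

def create_table(conn, t):
    # Map the metadata attributes of the DDL type onto a relational table;
    # the payload itself stays on disk, referenced by file_path.
    cols = ", ".join(f"{k} {v}" for k, v in t["metadata"].items())
    conn.execute(
        f"CREATE TABLE {t['name']} (id INTEGER PRIMARY KEY, {cols}, file_path TEXT)"
    )

conn = sqlite3.connect(":memory:")
create_table(conn, ddl_type)
conn.execute(
    "INSERT INTO Image (run_id, exposure_ms, file_path) VALUES (?, ?, ?)",
    (42, 12.5, "/data/run42/img_001.fits"),
)
row = conn.execute(
    "SELECT run_id, file_path FROM Image WHERE exposure_ms > 10"
).fetchone()
print(row)  # (42, '/data/run42/img_001.fits')
```

    In the real system, such table creation and querying would be generated automatically from the DDL rather than written by hand.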

    Web-based interface for environmental niche modelling

    Marine species are subject to anthropogenic impacts such as noise pollution, marine litter, and collisions with marine traffic. While there are efforts in the marine community and crowd-sourcing to report occurrences of marine species, few projects explore predicting where such animals may be. This dissertation analyzes the state of the art in species distribution modelling (SDM) systems capable of reporting and predicting marine biodiversity. The proposal implements algorithms for predicting species from publicly available data repositories, provides means to ease the upload and management of occurrence points, and offers methods for prediction analysis. A web-based user interface is proposed, using Ecological Niche Modelling (ENM) as an automated alerting mechanism to raise ecological awareness. User studies evaluate marine biodiversity concerns among fishermen and whale-watching sea vessels, assessing the attitudes, threats, values, and motivation of both samples. Further, biologists and experts on ENMs evaluate the workflow and interface, reviewing the proposal's potential to enable ecologists to create solutions for their own problems using simple protocols, without the need for third-party entities or extensive programming knowledge.
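    One of the simplest niche-modelling approaches such a system could build on is a climate-envelope (BIOCLIM-style) model: a site is predicted suitable when each environmental variable falls within the range observed at known occurrence points. A minimal sketch, with invented occurrence data:

```python
# Toy environmental-envelope model. All occurrence values are invented:
# (sea_surface_temp_C, depth_m) at reported sightings.
occurrences = [
    (18.0, 120.0), (21.5, 300.0), (19.2, 250.0),
]

def fit_envelope(points):
    # Per-variable min and max across all occurrence points.
    dims = range(len(points[0]))
    lo = tuple(min(p[i] for p in points) for i in dims)
    hi = tuple(max(p[i] for p in points) for i in dims)
    return lo, hi

def predict(envelope, site):
    # Suitable only if every variable lies inside the observed range.
    lo, hi = envelope
    return all(l <= v <= h for l, v, h in zip(lo, site, hi))

env = fit_envelope(occurrences)
print(predict(env, (20.0, 200.0)))  # inside the envelope -> True
print(predict(env, (25.0, 50.0)))   # outside -> False
```

    Production SDM systems use richer algorithms (e.g. MaxEnt) and real environmental layers, but the input/output shape — occurrence points in, a suitability surface out — is the same.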

    The conceptual design of SeamFrame

    This project deliverable provides the underlying architecture of a concept for linking models and databases, and it presents the design of SeamFrame, an integration framework for models and simulation algorithms, supported by procedures for data handling and spatial representation, quality control, output visualization, and documentation.

    Algebra of core concept transformations: Procedural meta-data for geographic information

    Transformations are essential for dealing with geographic information. They are involved not only in converting between geodata formats and reference systems, but also in turning geodata into useful information according to some purpose. However, since a transformation can be implemented in various formats and tools, its function and purpose usually remain hidden underneath the technicalities of a workflow. To automate geographic information procedures, we therefore need to model the transformations implemented by workflows on a conceptual level, as a form of procedural knowledge. Although core concepts of spatial information provide a useful level of description in this respect, we currently lack a model for the space of possible transformations between such concepts. In this article, we present the algebra of core concept transformations (CCT). It consists of a type hierarchy which models core concepts as relations, and a set of basic transformations described in terms of function signatures that use such types. Type inference allows us to enrich GIS workflows with abstract machine-readable metadata, by compiling algebraic tool descriptions. This allows us to automatically infer goal concepts across workflows and to query over such concepts across raster and vector implementations. We evaluate the algebra over a set of expert GIS workflows taken from online tutorials.
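    The idea of describing tools by function signatures over a concept-type hierarchy, then inferring a workflow's goal concept, can be sketched with a tiny checker. The type names and subtyping relation below are invented for illustration and are not the actual CCT hierarchy.

```python
# Minimal signature-based workflow checking: each tool is described by an
# input and output concept type, and a workflow type-checks when each
# step's output is a subtype of the next step's expected input.
subtype = {  # toy hierarchy: child -> parent (invented, not CCT's)
    "Field": "CoreConcept", "Object": "CoreConcept", "Region": "Object",
}

def is_subtype(t, expected):
    while t is not None:
        if t == expected:
            return True
        t = subtype.get(t)  # walk up the hierarchy
    return False

def check_workflow(tools):
    # tools: list of (name, input_type, output_type), applied in order.
    for (_, _, out_t), (name, in_t, _) in zip(tools, tools[1:]):
        if not is_subtype(out_t, in_t):
            raise TypeError(f"step {name}: expected {in_t}, got {out_t}")
    return tools[-1][2]  # inferred goal concept of the whole workflow

goal = check_workflow([
    ("interpolate", "Object", "Field"),
    ("threshold", "Field", "Region"),
    ("buffer", "Object", "Region"),  # Region is a subtype of Object: OK
])
print(goal)  # Region
```

    The CCT algebra itself works over relations rather than atomic types, but the mechanism — compiling tool signatures and propagating types through the workflow graph — is the same in spirit.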

    Machine Learning with Time Series: A Taxonomy of Learning Tasks, Development of a Unified Framework, and Comparative Benchmarking of Algorithms

    Time series data is ubiquitous in real-world applications. Such data gives rise to distinct but closely related learning tasks (e.g. time series classification, regression or forecasting). In contrast to the more traditional cross-sectional setting, these tasks are often not fully formalized. As a result, different tasks can become conflated under the same name, algorithms are often applied to the wrong task, and performance estimates are potentially unreliable. In practice, software frameworks such as scikit-learn have become essential tools for data science. However, most existing frameworks focus on cross-sectional data; to our knowledge, no comparable frameworks exist for temporal data. Moreover, despite the importance of these frameworks, their design principles have never been fully understood. Instead, discussions often concentrate on usage and features, while almost completely ignoring design. To address these issues, this thesis develops (i) a formal taxonomy of learning tasks, (ii) novel design principles for ML toolboxes and (iii) a new unified framework for ML with time series. The framework has been implemented in an open-source Python package called sktime. The design principles are derived from existing state-of-the-art toolboxes and classical software design practices, using a domain-driven approach and a novel scientific type system. We show that these principles not only explain key aspects of existing frameworks, but also guide the development of new ones like sktime. Finally, we use sktime to reproduce and extend the M4 competition, one of the major comparative benchmarking studies for forecasting. Reproducing the competition allows us to verify the published results and illustrate sktime's effectiveness. Extending the competition enables us to explore the potential of previously unstudied ML models. We find that, on a subset of the M4 data, simple ML models implemented in sktime can match the state-of-the-art performance of the hand-crafted M4 winner models.
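    A central idea behind unifying such tasks is reduction: for example, forecasting can be solved with any tabular regressor via sliding windows. A stdlib-only sketch of the recursive reduction strategy (the window length and the trivial "mean regressor" are illustrative choices, not sktime's API):

```python
# Reduce forecasting to regression: turn the series into (window, next-value)
# pairs, fit a tabular regressor, then forecast recursively.
def make_windows(y, w):
    return [(y[i:i + w], y[i + w]) for i in range(len(y) - w)]

class MeanRegressor:
    # Trivial stand-in for a real regressor: predicts the window mean.
    def fit(self, X, y):
        return self
    def predict(self, x):
        return sum(x) / len(x)

def forecast(y, w, steps, reg):
    X, targets = zip(*make_windows(y, w))
    reg.fit(list(X), list(targets))
    history = list(y)
    out = []
    for _ in range(steps):
        yhat = reg.predict(history[-w:])
        out.append(yhat)
        history.append(yhat)  # recursive strategy: feed prediction back in
    return out

series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(forecast(series, w=2, steps=2, reg=MeanRegressor()))  # [5.5, 5.75]
```

    Swapping in a gradient-boosted regressor instead of the mean gives exactly the kind of "simple ML model" the thesis benchmarks against the M4 winners.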

    The SPOSAD Architectural Style for Multi-tenant Software Applications

    Keywords: Software architecture; Software quality; Software performance; Software maintenance. Abstract: A multi-tenant software application is a special type of highly scalable, hosted software in which the application and its infrastructure are shared among multiple tenants to save development and maintenance costs. Limited understanding of the underlying architectural concepts still prevents many software architects from designing such a system. Existing documentation on multi-tenant software architectures is either technology-specific or database-centric; a more technology-independent perspective is required to enable widespread adoption of multi-tenant architectures. We propose the SPOSAD architectural style, which describes the components, connectors, and data elements of a multi-tenant architecture, as well as the constraints imposed on these elements. This paper describes the benefits of such an architecture and the trade-offs for the related design decisions. To evaluate our proposal, we illustrate how concepts of the style help make current Platform-as-a-Service (PaaS) environments, such as Force.com, Windows Azure, and Google App Engine, scalable and customizable.
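    The defining constraint of a shared-schema multi-tenant data tier — every query scoped by a tenant identifier, enforced in the data access layer rather than by each caller — can be sketched in a few lines. The table and column names below are invented for illustration; they are not part of the SPOSAD style definition.

```python
import sqlite3

# Shared-schema multi-tenancy: all tenants share one table, and the data
# access layer appends the tenant_id predicate to every query, so tenants
# cannot see each other's rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (tenant_id TEXT, item TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("acme", "widget"), ("acme", "gadget"), ("globex", "sprocket")],
)

def orders_for(tenant):
    # Tenant scoping enforced centrally, not trusted to each caller.
    rows = conn.execute(
        "SELECT item FROM orders WHERE tenant_id = ? ORDER BY rowid", (tenant,)
    )
    return [item for (item,) in rows]

print(orders_for("acme"))    # ['widget', 'gadget']
print(orders_for("globex"))  # ['sprocket']
```

    Sharing one schema across tenants is what makes the architecture cheap to operate, and the centrally enforced tenant predicate is what keeps it safe.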

    The Family of MapReduce and Large Scale Data Processing Systems

    In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data, which has called for a paradigm shift in computing architecture and large-scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables the easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program, such as data distribution, scheduling and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in follow-up work since its introduction. This article provides a comprehensive survey of a family of approaches and mechanisms for large-scale data processing that have been implemented based on the original idea of the MapReduce framework and are currently gaining momentum in both the research and industrial communities. We also cover systems that have been implemented to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large-scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some future research directions for implementing the next generation of MapReduce-like solutions. Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other authors.
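    The programming model itself reduces to two user-supplied functions. A single-process word-count sketch of the map, shuffle, and reduce phases (for illustration only; a real framework distributes each phase across a cluster):

```python
from collections import defaultdict

# Word count in the MapReduce model: map emits (word, 1) pairs, the
# shuffle phase groups intermediate values by key, and reduce sums
# each group.
def map_fn(line):
    return [(word, 1) for word in line.split()]

def reduce_fn(word, counts):
    return word, sum(counts)

def mapreduce(lines):
    groups = defaultdict(list)  # shuffle: group intermediate pairs by key
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())

print(mapreduce(["the cat sat", "the cat ran"]))
# {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}
```

    Everything the abstract lists as hidden from the application — data distribution, scheduling, fault tolerance — lives between these two functions in a production framework.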