
    SciQL, a query language for science applications

    Scientific applications are still poorly served by contemporary relational database systems. At best, the system provides a bridge towards an external library using user-defined functions, explicit import/export facilities, or linked-in Java/C# interpreters. The time has come to rectify this with SciQL, a SQL-based query language for science applications with arrays as first-class citizens. It provides a seamless symbiosis of array, set, and sequence interpretation using a clear separation of the mathematical object from its underlying storage representation. The language extends value-based grouping in SQL with structural grouping, i.e., fixed-sized and unbounded groups based on explicit relationships between index attributes. This leads to a generalization of window-based query processing. The SciQL architecture benefits from a column-store system with an adaptive storage scheme, including keeping multiple representations around for reduced impedance mismatch. This paper focuses on the language features, their architectural consequences, and extensive examples of its intended use.
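    The contrast between SQL's value-based grouping and the structural grouping SciQL adds can be sketched outside the language itself. The Python below is an illustrative analogue only (it does not reproduce SciQL syntax): fixed-size windows over the index domain stand in for SciQL's fixed-sized structural groups.

```python
# Illustrative analogue of SciQL's structural grouping: instead of bucketing
# cells by equal values (SQL GROUP BY), cells are grouped by an explicit
# relationship on their index attributes, e.g. fixed-size tiles of an array.

def value_grouping(values):
    """SQL-style GROUP BY: buckets keyed by cell value."""
    groups = {}
    for i, v in enumerate(values):
        groups.setdefault(v, []).append(i)
    return groups

def structural_grouping(values, width):
    """Array-style grouping: fixed-size windows over the index domain,
    a simple instance of a fixed-sized structural group."""
    return [values[i:i + width] for i in range(0, len(values), width)]

data = [3, 1, 3, 2, 2, 3]
print(value_grouping(data))          # indices bucketed by value
print(structural_grouping(data, 2))  # [[3, 1], [3, 2], [2, 3]]
```

    Generalizing `width` to overlapping or unbounded windows is what makes this a superset of SQL's window-based query processing.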

    Design and development of the African Plasmodium falciparum database (afriPFdb)

    The detailed investigation of mantle structure from the dispersion of surface waves is a young but vigorous field of study. Observations have been accumulating rapidly in the past few years because of the widespread installation of long-period instruments. Modern methods of data analysis used in conjunction with high-speed digital computers have made it possible to determine dispersion with greater precision and over a broader spectrum than was previously possible. Observations now extend out to the fundamental periods of free oscillations of the whole earth. Interpretation has lagged behind observation because of the difficulties inherent in the problem of dispersion over realistic models of a spherical earth. This problem is now well in hand, and dispersion appropriate to the standard earth models suggested by earlier body-wave studies has been calculated. Even with digital computers, however, the computations are so formidable that until recently only the most tentative efforts have been made to modify the standard earth structures to give a more satisfactory fit to the data. A review as recent as the one by Bolt in the preceding volume of this series was, of necessity, limited to a discussion of the various standard earth models, with no attempt made to use the full power of surface waves as an independent technique. Recent developments have made detailed surface wave interpretations possible, and new information, rather than generalized verification of old information, should be rapidly forthcoming. Project Mohole and the International Upper Mantle Project have focused the attention of many earth scientists on the upper mantle. Because of this renewed emphasis, present information and speculation on the properties of the mantle based on a variety of sources is summarized and re-examined in some detail. This provides guidelines for potentially fruitful further research and points out the nature of some of the discrepancies and limitations in our present knowledge that may be resolved by the surface wave method.

    Chemical information matters: an e-Research perspective on information and data sharing in the chemical sciences

    Recently, a number of organisations have called for open access to scientific information, and especially to the data obtained from publicly funded research; the Royal Society report and the European Commission press release are particularly notable among these. It has long been accepted that building research on the foundations laid by other scientists is both effective and efficient. Regrettably, some disciplines, chemistry being one, have been slow to recognise the value of sharing and have thus been reluctant to curate their data and information in preparation for exchanging it. The very significant increases in both the volume and the complexity of the datasets produced have encouraged the expansion of e-Research and stimulated the development of methodologies for managing, organising, and analysing "big data". We review the evolution of cheminformatics, the amalgam of chemistry, computer science, and information technology, and assess the wider e-Science and e-Research perspective. Chemical information does matter, as do matters of communicating data and collaborating with data. For chemistry, unique identifiers, structure representations, and property descriptors are essential to the activities of sharing and exchange. Open science entails the sharing of more than mere facts: for example, the publication of negative outcomes can facilitate better understanding of which synthetic routes to choose, an aspiration of the Dial-a-Molecule Grand Challenge. The protagonists of open notebook science go even further and exchange their thoughts and plans. We consider the concepts of preservation, curation, provenance, discovery, and access in the context of the research lifecycle, and then focus on the role of metadata, particularly the ontologies on which the emerging chemical Semantic Web will depend. Among our conclusions, we present our choice of the "grand challenges" for the preservation and sharing of chemical information.
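    As a toy illustration of the property descriptors the abstract calls essential for sharing and exchange, the sketch below derives one simple descriptor, molecular weight, from a molecular formula string. The minimal parser and the small rounded mass table are assumptions for illustration, not drawn from any cheminformatics toolkit.

```python
import re

# Rounded masses for a handful of elements; illustrative subset only.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}

def molecular_weight(formula):
    """Sum atomic masses for a simple formula like 'C6H12O6'
    (no brackets, isotopes, or charges; intentionally minimal)."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total

print(round(molecular_weight("C6H12O6"), 2))  # glucose, ~180.16
```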

    Modeling contaminant transport and fate and subsequent impacts on ecosystems

    Assessing risks associated with the release of metals into the environment and managing remedial activities requires simulation tools that depict speciation and risk with accurate mechanistic models and well-defined transport parameters. Such tools need to address the following processes: (1) aqueous speciation, (2) distribution mechanisms, (3) transport, and (4) ecological risk. The primary objective of this research is to develop a simulation tool that accounts for these processes. Speciation in the aqueous phase can be assessed with geochemical equilibrium models, such as MINEQL+. Furthermore, metal distribution can be addressed mechanistically. Studies with Pb sorption to amorphous aluminum (HAG), iron (HFO), and manganese (HMO) oxides, as well as oxide coatings, demonstrated that intraparticle diffusion is the rate-limiting mechanism in the sorption process, with best-fit surface diffusivities ranging from 10⁻¹⁸ to 10⁻¹⁵ cm² s⁻¹. Intraparticle surface diffusion was incorporated into the Groundwater Modeling System (GMS) to accurately simulate metal contaminant mobility where oxides are present. In the model development, the parabolic concentration layer approximation and the operator-split technique were used to solve the microscopic diffusion equation coupled with macroscopic advection and dispersion. The resulting model was employed to simulate Sr-90 mobility at the U.S. Department of Energy (DOE) Hanford Site. The Sr-90 plume is observed to be migrating out of the 100-N Area into other areas of the Hanford Site and beyond. Once bioavailability is understood, static or dynamic ecological risk assessments can be conducted. Employing the ERA model, a static ecological risk assessment for exposure to depleted uranium (DU) at the Aberdeen and Yuma Proving Grounds (APG and YPG) revealed that a reduction in plant root weight is likely to occur.
    For most terrestrial animals at YPG, the predicted DU dose is less than that which would result in a decrease in offspring. However, for the lesser long-nosed bat, reproductive effects are expected to occur through a reduction in the size and weight of offspring. At APG, based on very limited data, it is predicted that uranium uptake will not likely affect the survival of terrestrial animals and aquatic species. In model validation, sampling of pocket mice, kangaroo rats, white-throated woodrats, deer, and milfoil showed that body burden concentrations fall within the distributions simulated at both sites. This static risk assessment provides a solid background for applying the dynamic approach. Overall, this research contributes to a holistic approach to developing accurate mechanistic models for simulating metal contaminant mobility and bioavailability in subsurface environments.
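    The operator-split technique mentioned above can be sketched in a few lines: each time step advances a one-dimensional advection sub-step and then a dispersion sub-step separately. The grid spacing, velocity, and dispersion coefficient below are illustrative placeholders chosen to satisfy explicit stability limits, not parameters from the GMS model or the Hanford simulations.

```python
# Minimal operator-split sketch for 1-D advection-dispersion: advection is
# handled by first-order upwind differencing, dispersion by an explicit
# central-difference sub-step. Courant number v*dt/dx = 0.5 and diffusion
# number d*dt/dx^2 = 0.1 keep both sub-steps stable and non-negative.

def step(c, v=0.5, d=0.1, dx=1.0, dt=1.0):
    n = len(c)
    # Sub-step 1: advection, upwind for v > 0 (zero inflow at the left end).
    a = [c[i] - v * dt / dx * (c[i] - c[i - 1]) if i > 0
         else c[0] * (1 - v * dt / dx)
         for i in range(n)]
    # Sub-step 2: dispersion, explicit central differences (copied end cells).
    out = a[:]
    for i in range(1, n - 1):
        out[i] = a[i] + d * dt / dx ** 2 * (a[i + 1] - 2 * a[i] + a[i - 1])
    return out

c = [0.0] * 20
c[2] = 1.0                      # initial contaminant pulse at cell 2
for _ in range(10):
    c = step(c)
peak = max(range(len(c)), key=c.__getitem__)
print(peak)                     # pulse has advected downstream of cell 2
```

    Splitting lets each process use its own best-suited scheme, which is the same rationale as coupling the microscopic diffusion equation with macroscopic advection and dispersion in the model above.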

    Uncertainty analysis in ecological risk assessment modeling

    A probabilistic approach employing Monte Carlo simulations, which assesses parameters and risks as probabilistic distributions, was used in an ecological risk assessment (ERA) model to characterize risk and address uncertainty. This study addresses the following sources of uncertainty: parameter inputs to the ERA models, risk algorithms, and uncertain input concentrations. To achieve this objective, both sensitivity and uncertainty analyses were conducted. Monte Carlo simulations were used to generate probabilistic distributions of parameter and model uncertainty. All sensitivity, uncertainty, and variability analyses were coded in Visual Basic as part of the ERA model software version 2001, which was developed under the Sustainable Green Manufacturing (SGM) program. This simulation tool includes a Windows-based interface, an interactive and modifiable database management system (DBMS) that addresses the food web at trophic levels, and a comprehensive evaluation of exposure pathways. To verify the model, ecological risks from Cr, Ta, Mo, and DU exposure at the U.S. Army Yuma Proving Ground (YPG) and Aberdeen Proving Ground (APG) were assessed and characterized. For the case of DU exposure to YPG terrestrial plants, the overall distributions of DU uptake suggest a 90% likelihood of a reduction in root weight. For most terrestrial animals at YPG, the dose is less than that resulting in a decrease in offspring. At APG, DU exposure potentially poses little risk to terrestrial animals, with no observable impact on receptors' reproduction or development. DU potentially poses lower risks to aquatic species at APG as well. The overall risk posed by the metals followed the order Mo > Cr > Ta for both the YPG and APG sites. Black-tailed jackrabbits, lesser long-nosed bats, mule deer, and cactus mice at the YPG site are expected to have a reduction in the size and weight of offspring. Terrestrial plants are likely to exhibit a reduction in root weight.
    For the APG site, the vulnerable receptors are white-footed mice, white-tailed deer, and cottontail rabbits. For terrestrial plants, the risk results suggest a reduction in root weight. Aquatic species did not show any observable risk from Mo, Cr, and Ta in terms of survival, growth, and mortality.
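    The Monte Carlo propagation described above can be sketched as follows. The hazard-quotient formulation (exposure dose divided by a toxicity reference value) and the lognormal parameters are illustrative assumptions, not values from the ERA software or the YPG/APG assessments.

```python
import random

# Monte Carlo uncertainty propagation: sample uncertain inputs, push each
# sample through the risk algorithm, and report the resulting risk as a
# distribution (here summarised by its median and 95th percentile).

def simulate_hq(n=10000, seed=42):
    rng = random.Random(seed)          # fixed seed for reproducibility
    hqs = []
    for _ in range(n):
        dose = rng.lognormvariate(0.0, 0.5)  # uncertain daily dose (illustrative)
        trv = rng.lognormvariate(1.0, 0.3)   # uncertain toxicity threshold (illustrative)
        hqs.append(dose / trv)
    hqs.sort()
    return hqs

hqs = simulate_hq()
p50 = hqs[len(hqs) // 2]
p95 = hqs[int(0.95 * len(hqs))]
print(round(p50, 2), round(p95, 2))  # median and 95th-percentile hazard quotient
```

    Reporting the full distribution, rather than a single point estimate, is what lets an assessment state a "90% likelihood" of an effect as in the plant-uptake result above.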

    Antares: a scalable, efficient platform for stream, historic, combined and geospatial querying

    PhD Thesis. Traditional methods for storing and analysing data are proving inadequate for processing "Big Data". This is due to its volume and the rate at which it is being generated. The limitations of current technologies are further exacerbated by the increased demand for applications which allow users to access and interact with data as soon as it is generated. Near real-time analysis such as this can be partially supported by stream processing systems; however, they currently lack the ability to store data for efficient historic processing: many applications require a combination of near real-time and historic data analysis. This thesis investigates this problem, and describes and evaluates a novel approach for addressing it. Antares is a layered framework that has been designed to exploit and extend the scalability of NoSQL databases to support low-latency querying and high throughput rates for both stream and historic data analysis simultaneously. Antares began as a company-funded project sponsored by Red Hat; the motivation was to identify a new technology which could provide scalable analysis of both stream and historic data, and to explore new methods for supporting scale and efficiency, for example a layered approach that would exploit the scale of historic stores and the speed of in-memory processing. New technologies were investigated to identify current mechanisms and suggest a means of improvement. Antares supports a layered approach to analysis; the motivation for the platform was to provide scalable, low-latency querying of Twitter data to help other researchers automate analysis. Antares needed to provide temporal and spatial analysis of Twitter data using the timestamp and geotag. The approach used Twitter as a use case and derived requirements from social scientists involved in a broader research project called Tweet My Street.
    Many data streaming applications have a location-based aspect, using geospatial data to enhance the functionality they provide. However, geospatial data is inherently difficult to process at scale due to its multidimensional nature. To address these difficulties, this thesis proposes Antares as a new solution providing scalable and efficient mechanisms for querying geospatial data. The thesis describes the design of Antares and evaluates its performance on a range of scenarios taken from a real social media analytics application. The results show significant performance gains when compared to existing approaches, for particular types of analysis. The approach is evaluated by executing experiments across Antares and similar systems to show the improved results. Antares demonstrates that a layered approach can be used to improve performance for inserts and searches as well as increasing the ingestion rate of the system.

    LinkedScales: multiscale databases

    Advisor: André Santanchè. Doctoral thesis, Universidade Estadual de Campinas, Instituto de Computação.
    Abstract (translated from the Portuguese resumo): The biological and medical sciences increasingly need unified approaches to data analysis that allow the exploration of the network of relationships and interactions among elements. Essential data, however, is frequently scattered across an ever-growing set of sources with multiple levels of heterogeneity, making integration increasingly complex. Existing integration approaches usually adopt specialized, costly strategies, requiring monolithic solutions to handle specific formats and schemas. To cope with this complexity, they resort to one-off solutions that combine tools and algorithms and require manual adaptation. Such non-systematic approaches hinder the reuse of common tasks and intermediate results, even when these could be useful in future analyses, and make it difficult to track transformations and other provenance information, which is often neglected. This work proposes LinkedScales, a multi-level dataspace designed to support the progressive construction of unified views over heterogeneous sources. LinkedScales systematizes the integration steps into scales, departing from raw representations (lower scales) and moving gradually towards ontology-like structures (higher scales). It defines a data model and a systematic, on-demand integration process based on transformations over a graph database. Intermediate results are encapsulated in reusable scales, and transformations between scales are tracked in an orthogonal provenance graph that connects objects across scales; queries over the dataspace can then consider both the objects at each scale and the orthogonal provenance graph. Practical applications of LinkedScales are addressed through two case studies, one in the biology domain, covering an organism-centric analysis scenario, and one in the medical domain, focusing on evidence-based medicine data.
    Abstract: Biological and medical sciences increasingly need a unified, network-driven approach for exploring relationships and interactions among data elements. Nevertheless, essential data is frequently scattered across sources with multiple levels of heterogeneity. Existing data integration approaches usually adopt specialized, heavyweight strategies, requiring a costly upfront effort to produce monolithic solutions for handling specific formats and schemas. Furthermore, such ad-hoc strategies hamper the reuse of intermediary integration tasks and outcomes. This work proposes LinkedScales, a multiscale-based dataspace designed to support the progressive construction of a unified view of heterogeneous sources. It departs from raw representations (lower scales) and goes towards ontology-like structures (higher scales). LinkedScales defines a data model and a systematic, gradual integration process via operations over a graph database. Intermediary outcomes are encapsulated as reusable scales, tracking the provenance of inter-scale operations. Later, queries can combine both scale data and orthogonal provenance information. Practical applications of LinkedScales are discussed through two case studies in the biology domain, addressing an organism-centric analysis scenario, and the medical domain, focusing on evidence-based medicine data.
    Doutorado, Ciência da Computação (Doutor em Ciência da Computação). Grant 141353/2015-5; funded by CAPES and CNPq.
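    The scale-by-scale integration with orthogonal provenance described above can be sketched as follows. The class and field names are illustrative assumptions, not the thesis's actual data model.

```python
# Each integration step maps objects from one scale to the next, and every
# mapping is recorded as an orthogonal provenance edge, so later queries can
# combine scale content with lineage.

class Dataspace:
    def __init__(self):
        self.scales = {}       # scale name -> {object id: content}
        self.provenance = []   # (source scale, src id, target scale, tgt id)

    def add_scale(self, name, objects):
        self.scales[name] = dict(objects)

    def transform(self, src, dst, fn):
        """Derive scale `dst` from `src`, tracking provenance per object."""
        derived = {}
        for oid, content in self.scales[src].items():
            new_id, new_content = fn(oid, content)
            derived[new_id] = new_content
            self.provenance.append((src, oid, dst, new_id))
        self.scales[dst] = derived

    def lineage(self, scale, oid):
        """Trace an object back one scale via the provenance graph."""
        return [(s, i) for (s, i, d, j) in self.provenance
                if d == scale and j == oid]

ds = Dataspace()
ds.add_scale("raw", {"row1": "Homo sapiens;9606"})
ds.transform("raw", "structured",
             lambda oid, c: (oid + "/parsed",
                             dict(zip(["name", "taxon"], c.split(";")))))
print(ds.scales["structured"])
print(ds.lineage("structured", "row1/parsed"))  # [('raw', 'row1')]
```

    Because intermediate scales persist, a later analysis can reuse "structured" directly instead of re-parsing the raw source, which is the reuse the abstracts argue ad-hoc pipelines lose.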