OpenAIREplus
OpenAIREplus builds on the outcomes of the OpenAIRE project, which
implements the EC Open Access (OA) pilot. Capitalizing on the OpenAIRE
infrastructure, built for managing FP7 and ERC funded articles, and the
associated supporting mechanism of the European Helpdesk System,
OpenAIREplus will “develop an open access, participatory infrastructure for
scientific information”. It will significantly expand its base of harvested
publications to also include all OA publications indexed by the DRIVER
infrastructure (more than 270 validated institutional repositories) and any other
repository containing “peer-reviewed literature” that complies with certain
standards. It will also generically harvest and index the metadata of scientific
datasets in selected diverse OA thematic data repositories. It will support the
concept of linked publications by deploying novel services for “linking peer-
reviewed literature and associated data sets and collections”, from link
discovery based on diverse forms of mining (textual, usage, etc.), to storage,
visual representation, and on-line exploration. It will offer both user-level
services to experts and “non-scientists” alike as well as programming interfaces
for “providers of value-added services” to build applications on its content.
Deposited articles and data will be openly accessible through an enhanced
version of the OpenAIRE portal, together with any available relevant
information on associated project funding and usage statistics. OpenAIREplus
will retain its European footprint, engaging people and scientific repositories in
almost all 27 EU member states and beyond. The technical work will be
complemented by a suite of studies and associated research efforts that will
partly proceed in collaboration with “different European initiatives” and
investigate issues of “intellectual property rights, efficient financing models,
and standards".

Acknowledgments. This work was supported in part by the Open Access Infrastructure
for Research in Europe (OpenAIRE) EU project and by the Bulgarian National Science Fund
under Project D002-308, "Automated Metadata Generating for e-Documents
Specifications and Standards".
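Repository metadata of the kind OpenAIREplus aggregates is typically harvested over the OAI-PMH protocol as Dublin Core records. The following is a minimal, illustrative sketch of parsing such a response with Python's standard library; the XML here is an inlined hypothetical example, whereas in practice it would be fetched from a repository's OAI-PMH endpoint:

```python
import xml.etree.ElementTree as ET

# A hypothetical OAI-PMH ListRecords response (Dublin Core); in practice this
# XML would be retrieved from a repository endpoint (e.g. .../oai?verb=ListRecords)
SAMPLE_RESPONSE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example article</dc:title>
          <dc:identifier>oai:example.org:1234</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

NS = {"dc": "http://purl.org/dc/elements/1.1/"}

def harvest_titles(xml_text):
    """Extract Dublin Core titles from an OAI-PMH ListRecords response."""
    root = ET.fromstring(xml_text)
    return [t.text for t in root.findall(".//dc:title", NS)]

print(harvest_titles(SAMPLE_RESPONSE))  # -> ['Example article']
```

A real harvester would additionally handle OAI-PMH resumption tokens, deleted records, and per-repository validation, which is the role the OpenAIRE validation infrastructure plays for its network of repositories.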
Hydrological modelling in a "big data" era: a proof of concept of hydrological models as web services
Dealing with the massive increase in global data availability of all sorts is increasingly known as big data science. Indeed, largely leveraged by the internet, new data sets are emerging that are so large and heterogeneous that they become awkward to work with. New algorithms, methods and models are needed to filter such data to find trends, test hypotheses, make predictions and quantify uncertainties. As a considerable share of the data relate to environmental processes (e.g., satellite images, distributed sensor networks), this evolution provides exciting challenges for environmental sciences, and hydrology in particular. Web-enabled models are a promising approach to processing large and distributed data sets and to providing tailored products for a variety of end-users. They will also allow hydrological models to be used as building blocks in larger earth system simulation systems. However, in order to do so we need to reconsider the ways that hydrological models are built, results are made available, and uncertainties are quantified. We present the results of an experimental proof of concept of a hydrological modelling web service that processes heterogeneous hydrological data sets. The hydrological model itself consists of a set of conceptual model routines implemented on a common platform. This framework is linked to global and local data sets through web standards provided by the Open Geospatial Consortium, as well as to a web interface that enables an end-user to request stream flow simulations for a self-defined location. In essence, the proof of concept can be seen as an implementation of the Models of Everywhere concept introduced by Beven in 2007. Although the setup is operational and effectively simulates stream flow, we identify several bottlenecks for optimal hydrological simulation in a web context.
The major challenges we identify are related to (1) model selection; (2) uncertainty quantification; and (3) user interaction and scenario analysis. Model selection is inherent to hydrological modelling because of the large spatial and temporal variability of processes, which inhibits the use of one optimal model structure. However, in a web context it becomes paramount that such selection is automatic, yet objective and transparent. Similarly, uncertainty quantification is a mainstream practice in hydrological modelling, but in a web context uncertainty analysis faces unprecedented challenges in terms of tracking uncertainties throughout a possibly geographically distributed workflow, as well as dealing with an extreme heterogeneity of data availability. Lastly, the ability of end-users to interact directly with hydrological models poses specific challenges in terms of mapping user scenarios (e.g., a scenario of land-use change) into the model parameter space for prediction and uncertainty quantification. The setup has been used in several scientific experiments, including the large-scale UK consortium project on an Environmental Virtual Observatory pilot.
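The "conceptual model routines" mentioned above can be illustrated with a single linear reservoir, one of the simplest rainfall-runoff building blocks. This is a generic textbook sketch, not the project's actual implementation:

```python
def linear_reservoir(rainfall, k=0.5, storage=0.0):
    """Simulate discharge from a single linear reservoir.

    rainfall : rainfall input per time step (mm)
    k        : outflow coefficient (fraction of storage released per step)
    storage  : initial storage (mm)
    Returns the discharge per time step (mm).
    """
    discharge = []
    for p in rainfall:
        storage += p      # rainfall enters storage
        q = k * storage   # outflow is proportional to storage
        storage -= q      # storage is depleted by the outflow
        discharge.append(q)
    return discharge

# A single storm followed by dry steps produces the classic recession curve:
print(linear_reservoir([10.0, 0.0, 0.0, 0.0], k=0.5))  # -> [5.0, 2.5, 1.25, 0.625]
```

In a web-service setting, a routine like this would sit behind an OGC-standard interface, with the catchment data and parameter values resolved from the user's requested location.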
NASA GeneLab Concept of Operations
NASA's GeneLab aims to greatly increase the number of scientists using data from space biology investigations on board the ISS, emphasizing a systems biology approach to the science. When completed, GeneLab will provide the integrated software and hardware infrastructure, analytical tools and reference datasets for an assortment of model organisms. GeneLab will also provide an environment for scientists to collaborate, thereby increasing the possibility for data to be reused in future experimentation. To maximize the value of data from life science experiments performed in space and to make the most advantageous use of the remaining ISS research window, GeneLab will apply an open access approach to conducting spaceflight experiments by generating and sharing the datasets derived from these biological studies in space. Onboard the ISS, a wide variety of model organisms will be studied and returned to Earth for analysis. Laboratories on the ground will analyze these samples and provide genomic, transcriptomic, metabolomic and proteomic data. Upon receipt, NASA will conduct data quality control tasks and format the raw data returned from the omics centers into standardized, annotated information sets that can be readily searched and linked to spaceflight metadata. Once prepared, the biological datasets, as well as any completed analyses, will be made public through the GeneLab Space Bioinformatics System web-based portal. These efforts will support a collaborative research environment for spaceflight studies that will closely resemble environments created by the Department of Energy (DOE), the National Center for Biotechnology Information (NCBI), and other institutions in additional areas of study, such as cancer and environmental biology. The results will allow for comparative analyses that will help scientists around the world take a major leap forward in understanding the effect of microgravity, radiation, and other aspects of the space environment on model organisms.
These efforts will speed the process of scientific sharing, iteration, and discovery.
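The idea of omics datasets annotated with spaceflight metadata so they can be "readily searched and linked" can be sketched as follows. The record structure and values here are hypothetical illustrations, not GeneLab's actual schema:

```python
# Hypothetical annotated dataset records: omics data linked to spaceflight
# metadata (illustrative field names; not GeneLab's real data model)
datasets = [
    {"id": "GLDS-1", "organism": "Mus musculus",
     "assay": "transcriptomics", "mission": "STS-135"},
    {"id": "GLDS-2", "organism": "Arabidopsis thaliana",
     "assay": "proteomics", "mission": "ISS Expedition 39"},
]

def search(records, **criteria):
    """Return records whose metadata match all given key/value criteria."""
    return [r for r in records
            if all(r.get(k) == v for k, v in criteria.items())]

print([r["id"] for r in search(datasets, assay="transcriptomics")])  # -> ['GLDS-1']
```

Standardizing such annotations across experiments is what makes the comparative analyses described above possible, since datasets from different missions can then be filtered on the same metadata fields.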
Viewpoints: A high-performance high-dimensional exploratory data analysis tool
Scientific data sets continue to increase in both size and complexity. In the
past, dedicated graphics systems at supercomputing centers were required to
visualize large data sets, but as the price of commodity graphics hardware has
dropped and its capability has increased, it is now possible, in principle, to
view large complex data sets on a single workstation. To do this in practice,
an investigator will need software that is written to take advantage of the
relevant graphics hardware. The Viewpoints visualization package described
herein is an example of such software. Viewpoints is an interactive tool for
exploratory visual analysis of large, high-dimensional (multivariate) data. It
leverages the capabilities of modern graphics boards (GPUs) to run on a single
workstation or laptop. Viewpoints is minimalist: it attempts to do a small set
of useful things very well (or at least very quickly) in comparison with
similar packages today. Its basic feature set includes linked scatter plots
with brushing, dynamic histograms, normalization and outlier detection/removal.
Viewpoints was originally designed for astrophysicists, but it has since been
used in a variety of fields that range from astronomy, quantum chemistry, fluid
dynamics, machine learning, bioinformatics, and finance to information
technology server log mining. In this article, we describe the Viewpoints
package and show examples of its usage.

Comment: 18 pages, 3 figures, PASP in press; this version corresponds more
closely to that to be published.
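Among the basic features listed above is outlier detection/removal after normalization. A minimal sketch of one common approach, z-score screening, is shown below as an illustration; it is not necessarily the algorithm Viewpoints itself uses:

```python
def zscore_outliers(values, threshold=2.0):
    """Flag values whose z-score (distance from the mean in standard
    deviations, i.e. after normalization) exceeds `threshold`."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [abs(v - mean) / std > threshold for v in values]

# The single extreme value is flagged; the rest are not.
print(zscore_outliers([1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 100.0]))
```

In an interactive tool, points flagged this way would typically be highlighted across all linked scatter plots (brushing) rather than silently deleted, so the analyst can inspect them in every dimension before removal.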
LAGOVirtual: A Collaborative Environment for the Large Aperture GRB Observatory
We present the LAGOVirtual Project: an ongoing effort to develop a platform for
collaboration within the Large Aperture GRB Observatory (LAGO). This continental-scale
observatory is designed to detect the high-energy (around 100 GeV) component of
Gamma Ray Bursts by using the single-particle technique in arrays of Water
Cherenkov Detectors (WCD) at high mountain sites (Chacaltaya, Bolivia, 5300 m
a.s.l., Pico Espejo, Venezuela, 4750 m a.s.l., Sierra Negra, Mexico, 4650 m
a.s.l.). This platform will allow the LAGO collaboration to share data and computing
resources across its different sites. The environment can also
generate synthetic data by simulating showers with the AIRES application and
to store/preserve distributed data files collected by the WCD at the LAGO
sites. The present article concerns the implementation of a prototype of
LAGO-DR adapting DSpace, with a hierarchical structure (i.e. country,
institution, followed by collections that contain the metadata and data files),
for the captured/simulated data. This structure was generated using the
community, sub-community, collection, item model available in the DSpace
software. Each member institution/country of the project has the appropriate
permissions on the system to publish information (descriptive metadata and
associated data files). The platform can also associate multiple files to each
item of data (data from the instruments, graphics, post-processed data, etc.).

Comment: Second EELA-2 Conference, Choroni, Venezuela, November 25th to 27th,
200
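The hierarchical country > institution > collection > item structure described above maps directly onto DSpace's community/sub-community/collection/item model. A small sketch of walking such a hierarchy, with hypothetical example names rather than the project's actual communities:

```python
# Illustrative hierarchy in the spirit of the DSpace community model:
# country (community) > institution (sub-community) > collection > items.
# The names below are hypothetical examples.
repository = {
    "Bolivia": {                       # country-level community
        "UMSA": {                      # institution-level sub-community
            "Chacaltaya WCD data": [   # collection
                {"item": "run-042", "files": ["raw.dat", "plot.png"]},
            ],
        },
    },
}

def list_items(repo):
    """Walk the hierarchy, yielding (country, institution, collection, item)."""
    for country, institutions in repo.items():
        for institution, collections in institutions.items():
            for collection, items in collections.items():
                for it in items:
                    yield country, institution, collection, it["item"]

print(list(list_items(repository)))
```

Note that each item carries multiple associated files, mirroring the abstract's point that instrument data, graphics, and post-processed data can all be attached to one item.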
Linked Data - the story so far
The term “Linked Data” refers to a set of best practices for publishing and connecting structured data on the Web. These best practices have been adopted by an increasing number of data providers over the last three years, leading to the creation of a global data space containing billions of assertions: the Web of Data. In this article, the authors present the concept and technical principles of Linked Data and situate these within the broader context of related technological developments. They describe progress to date in publishing Linked Data on the Web, review applications that have been developed to exploit the Web of Data, and map out a research agenda for the Linked Data community as it moves forward.
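The "billions of assertions" that make up the Web of Data are RDF triples: (subject, predicate, object) statements whose terms are identified by URIs. A minimal in-memory sketch of this model and of pattern matching over it, using hypothetical example URIs:

```python
# A tiny set of RDF-style assertions; subjects, predicates and objects are
# URIs (or literals), so statements from different providers can interlink.
# The example.org URIs are hypothetical.
triples = {
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/knows",
     "http://example.org/bob"),
    ("http://example.org/bob", "http://xmlns.com/foaf/0.1/name", "Bob"),
}

def match(store, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [(ts, tp, to) for ts, tp, to in store
            if (s is None or ts == s)
            and (p is None or tp == p)
            and (o is None or to == o)]

# Whom does alice know?
print(match(triples, s="http://example.org/alice",
            p="http://xmlns.com/foaf/0.1/knows"))
```

Because every term is a dereferenceable URI, following an object URI can lead into another provider's data: this is the "connecting" half of the Linked Data best practices.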
Creating information delivery specifications using linked data
The use of Building Information Management (BIM) has become mainstream in many countries. Exchanging data in open standards like the Industry Foundation Classes (IFC) is seen as the only workable solution for collaboration. To define information needs for collaboration, many organizations are now documenting what kind of data they need for their purposes. Currently, practitioners often define their requirements (a) in a format that cannot be read by a computer, and (b) by creating their own definitions that are not shared. This paper proposes a bottom-up solution for the definition of new building concepts and properties. The authors have created a prototype implementation and will elaborate on the capturing of information specifications in the future.
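The difference between a human-readable requirement and a machine-readable one can be sketched as follows. The concept and property names below are hypothetical illustrations; a real specification would reference shared IFC or linked-data definitions rather than local strings:

```python
# A machine-readable information delivery requirement, sketched as structured
# data (hypothetical names; real specs would point at shared definitions).
specification = {
    "concept": "Door",
    "required_properties": ["FireRating", "Width"],
}

def check(element, spec):
    """Return the required properties missing from a building element,
    or None if the specification does not apply to this concept."""
    if element.get("concept") != spec["concept"]:
        return None
    return [p for p in spec["required_properties"]
            if p not in element.get("properties", {})]

door = {"concept": "Door", "properties": {"Width": 0.9}}
print(check(door, specification))  # -> ['FireRating']: the door lacks a fire rating
```

Once requirements are structured like this, conformance checking of an exchanged model becomes automatable, which is precisely what hand-written requirement documents prevent.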
A-posteriori provenance-enabled linking of publications and datasets via crowdsourcing
This paper aims to share with the digital library community different opportunities to leverage crowdsourcing for a-posteriori capturing of dataset citation graphs. We describe a practical approach, which exploits one possible crowdsourcing technique to collect these graphs from domain experts and proposes their publication as Linked Data using the W3C PROV standard. Based on our findings from a study we ran during the USEWOD 2014 workshop, we propose a semi-automatic approach that generates metadata by leveraging information extraction as an additional step to crowdsourcing, to generate high-quality data citation graphs. Furthermore, we consider the design implications for our crowdsourcing approach when non-expert participants are involved in the process.
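A crowdsourced publication-to-dataset link annotated with its provenance can be sketched as a record loosely modelled on W3C PROV vocabulary terms. The identifiers and field choices below are hypothetical illustrations, not the paper's actual data model:

```python
# Sketch of a provenance-annotated citation link, loosely borrowing PROV
# terminology: the link records not just publication -> dataset, but also
# which crowd contributor asserted it. All identifiers are hypothetical.
def cite(publication, dataset, contributor):
    """Build a PROV-style record linking a publication to a dataset."""
    return {
        "entity": publication,          # the citing publication
        "used": dataset,                # the dataset it draws on
        "wasAttributedTo": contributor, # who asserted this link
    }

link = cite("doi:10.1000/paper-1", "doi:10.5000/dataset-9", "expert:42")
print(link["used"])  # -> doi:10.5000/dataset-9
```

Keeping the contributor in each assertion is what allows links from domain experts and non-expert participants to be weighted differently when assembling the final citation graph.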