
    User Applications Driven by the Community Contribution Framework MPContribs in the Materials Project

    This work discusses how the MPContribs framework in the Materials Project (MP) allows user-contributed data to be shown and analyzed alongside the core MP database. The Materials Project is a searchable database of electronic structure properties of over 65,000 bulk solid materials that is accessible through a web-based science gateway. We describe the motivation for enabling user contributions to the materials data and present the framework's features and challenges in the context of two real applications. These use cases illustrate how scientific collaborations can build applications with their own "user-contributed" data using MPContribs. The Nanoporous Materials Explorer application provides a unique search interface to a novel dataset of hundreds of thousands of materials, each with tables of user-contributed values related to material adsorption and density at varying temperature and pressure. The Unified Theoretical and Experimental x-ray Spectroscopy application demonstrates a full workflow for the association, dissemination and combined analysis of experimental data from the Advanced Light Source with MP's theoretical core data, using MPContribs tools for data formatting, management and exploration. The capabilities being developed for these collaborations are serving as the model for how new materials data can be incorporated into the Materials Project website with minimal staff overhead while giving powerful tools for data search and display to the user community.
    Comment: 12 pages, 5 figures; Proceedings of the 10th Gateway Computing Environments Workshop (2015); to be published in "Concurrency and Computation: Practice and Experience".
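    As a rough illustration of what a contribution like the Nanoporous Materials Explorer dataset could look like in code, the sketch below assembles an adsorption table keyed to a Materials Project entry and hands it to a submission step. The project slug, material identifier, payload fields and submit function are assumptions made for illustration only, not the documented MPContribs API.

```python
import pandas as pd

# Hypothetical adsorption measurements for one nanoporous material at several
# temperatures (K) and pressures (bar); the values are made up for illustration.
adsorption = pd.DataFrame({
    "temperature_K": [273, 273, 298, 298],
    "pressure_bar": [1.0, 5.0, 1.0, 5.0],
    "co2_uptake_mmol_g": [2.1, 4.8, 1.6, 3.9],
})

# A contribution ties user data to a Materials Project entry via its identifier.
contribution = {
    "project": "nanoporous_materials_explorer",  # assumed project slug
    "identifier": "mp-1234",                     # placeholder MP material id
    "data": {"density_g_cm3": 0.71},             # scalar user-contributed values
    "tables": [adsorption],                      # tabular values vs. T and p
}

# The submission step below is a stand-in for whatever MPContribs client tooling
# a collaboration actually uses; it only reports what would be sent.
def submit_contribution(payload):
    rows = len(payload["tables"][0])
    print(f"Would submit contribution for {payload['identifier']} with {rows} table rows.")

submit_contribution(contribution)
```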

    Development of computational tools for the analysis of 2D-nuclear magnetic resonance data

    Master's dissertation in Bioinformatics. Metabolomics is one of the omics sciences that has been gaining a lot of interest due to its potential for correlating an organism's biochemical activity with its phenotype. The applications of metabolomics are being extended as new techniques reveal new information on metabolic profiles and molecules, thus elucidating biological, chemical and functional knowledge. The main data-collection techniques are based on mass spectrometry and nuclear magnetic resonance (NMR) spectroscopy. The latter has the advantage of analyzing a sample in vivo without damaging it, and while its sensitivity is pointed out as a disadvantage, multidimensional NMR delivers a solution to this issue. It adds layers of information, generating new data that require advanced bioinformatics methods in order to extract biological meaning. Since multidimensional NMR comprises several distinct approaches, the need to establish an integrated framework that allows a researcher to load data and extract relevant knowledge has become more pressing over the years. In addition, establishing common data analysis pipelines for one-dimensional and multidimensional NMR remains a challenge in current scientific research, hindering reproducibility across research groups. In recent work from the host group, specmine, an R package for metabolomics and spectral data analysis/mining, was developed to wrap and deliver key metabolomics methods that allow a researcher to perform a complete analysis. In this dissertation, tools integrated in specmine were developed to read, visualize and analyze two-dimensional (2D) NMR data. A new specmine structure was created for this type of data, easing interpretation and data visualization. In terms of visualization, a novel approach based on three-dimensional environments enables users to interact with their data, allowing peak hovering and identification of resonance-rich regions. When the user does not specify an input, the selection of which samples to plot is based on a signal-to-noise ratio scale that plots samples with opposite signal-to-noise ratios. A method to perform peak detection on 2D NMR spectra based on local maximum search was implemented to obtain a data structure that best benefits from specmine's functionalities; these include preprocessing, univariate and multivariate analysis, as well as machine learning and feature selection methods. The 2D NMR functions were validated using experimental data from two scientific papers, available in metabolomics databases, applying the necessary preprocessing steps to compare spectra and results. These data originated two case studies from different NMR sources, Bruker and Varian, which reinforces specmine's flexibility. The case studies were carried out using mainly specmine, with other packages used for specific processing steps, such as probabilistic quotient normalization. A pipeline to analyze 2D NMR data was added to specmine, in the form of a vignette, to provide a guideline for the newly developed functionalities.
    Metabolomics is one of the omics sciences that has been gaining much interest due to its potential to correlate an organism's biochemical activity with its phenotype. The applications of metabolomics keep growing as new techniques reveal new information about metabolic and molecular profiles, elucidating biological, chemical and functional knowledge. The main techniques for collecting this type of data are based on mass spectrometry and nuclear magnetic resonance (NMR). The latter has the advantage of analyzing a sample in vivo without damaging it, and while its sensitivity has been pointed out as a disadvantage, multidimensional NMR improves on the traditional approach. By measuring additional nuclei it adds layers of information, generating a new type of data that requires advanced bioinformatics methods to extract biological meaning. The existence of several approaches to multidimensional NMR leads to a growing need for a tool that integrates this type of data, allowing researchers to carry out their analyses effectively. In addition, consolidating common pipelines for analyzing one- and multidimensional NMR data remains a challenge for scientific research, hindering the reproducibility of results across research groups. In recent work by the host group, an R package focused on metabolomics and data analysis/mining was developed. This package, specmine, has been improved since its development and works as a tool that brings together different methods, allowing a complete analysis of a given dataset. Based on this package, an integrated web platform, WebSpecmine, was more recently developed with the same purpose, providing an easier and friendlier user interface. In this dissertation, tools for reading, visualizing and analyzing two-dimensional (2D) NMR were developed with their integration into specmine in mind. A new structure was added to the package, easing data interpretation and organization. Regarding visualization, a novel approach based on three-dimensional environments allows users to interact with their data by identifying spectral regions of interest or recognizing peaks. When the user does not specify which 2D spectra to plot, visualization is based on a signal-to-noise scale that first displays the samples with the largest and smallest differences between signal and noise. A method to perform peak detection in 2D NMR based on searching for local maxima was also implemented; this operation aims to obtain a simplified data structure that best benefits from specmine's functionalities, which include preprocessing, univariate and multivariate analyses, feature selection and machine learning methods. The functions developed for 2D NMR were validated with experimental data collected from two scientific papers, available in metabolomics databases, to which the preprocessing steps needed to compare results were applied. These data gave rise to two case studies covering different NMR instruments, Bruker and Varian, reinforcing specmine's flexibility regarding the data types it can read. These cases were carried out mainly with specmine, although external packages were needed for specific processing steps, such as probabilistic quotient normalization. A pipeline for analyzing 2D NMR data was added to specmine in the form of a vignette, a long-form documentation format suited to packages implemented in R, providing the user with a set of procedures oriented towards the correct use of the implemented functionalities.
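    The local-maximum peak picking described above can be sketched in a few lines. The snippet below is a minimal, generic illustration of the technique on a 2D intensity matrix; it is not the specmine implementation (which is written in R), and the window size and signal-to-noise threshold are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def detect_peaks_2d(spectrum, window=5, snr_threshold=8.0):
    """Pick peaks in a 2D NMR intensity matrix by local-maximum search.

    A point counts as a peak if it equals the maximum of its window-sized
    neighbourhood and stands well above a simple noise estimate.
    """
    # Rough noise level estimated from the median absolute intensity.
    noise = np.median(np.abs(spectrum))
    # True where a point is the maximum of its local neighbourhood.
    local_max = spectrum == maximum_filter(spectrum, size=window)
    # Keep only local maxima that exceed the noise-based threshold.
    peaks = local_max & (spectrum > snr_threshold * noise)
    rows, cols = np.nonzero(peaks)
    return list(zip(rows.tolist(), cols.tolist()))

# Example: a synthetic 256 x 256 spectrum with two injected peaks.
spec = np.random.default_rng(0).normal(0.0, 1.0, (256, 256))
spec[64, 100] += 50.0
spec[180, 30] += 40.0
print(detect_peaks_2d(spec))  # reports the two injected peak positions
```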

    The dotTHz Project: A Standard Data Format for Terahertz Time-Domain Data and Elementary Data Processing Tools

    From investigating molecular vibrations to observing galaxies, terahertz technology has found extensive applications in research and development over the past three decades. Terahertz time-domain spectroscopy and imaging have experienced significant growth and now dominate spectral observations ranging from 0.1 to 10 THz. However, the lack of standardised protocols for data processing, dissemination, and archiving poses challenges in collaborating and sharing terahertz data between research groups. To tackle these challenges, we present the dotTHz project, which introduces a standardised terahertz data format and the associated open-source tools for processing and interpretation of dotTHz files. The dotTHz project aims to facilitate seamless data processing and analysis by providing a common framework. All software components are released under the MIT licence through GitHub repositories to encourage widespread adoption, modification, and collaboration. We invite the terahertz community to actively contribute to the dotTHz project, fostering the development of additional tools that encompass a greater breadth and depth of functionality. By working together, we can establish a comprehensive suite of resources that benefit the entire terahertz community.
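    As a concrete sketch of the kind of self-describing container a standard terahertz time-domain format implies, the snippet below writes one synthetic time-domain trace plus metadata into an HDF5 file using h5py. The group, dataset, and attribute names are illustrative assumptions for this sketch, not the published dotTHz schema.

```python
import numpy as np
import h5py

# Synthetic THz time-domain trace: time axis in picoseconds and field amplitude.
time_ps = np.linspace(0.0, 100.0, 4096)
field = np.exp(-((time_ps - 10.0) ** 2)) * np.sin(2.0 * np.pi * 1.0 * time_ps)

# Write one measurement into an HDF5 container.
with h5py.File("sample_measurement.thz", "w") as f:
    meas = f.create_group("Measurement_001")
    meas.create_dataset("time_ps", data=time_ps)
    meas.create_dataset("field", data=field)
    # Metadata stored as attributes keeps the trace self-describing.
    meas.attrs["sample"] = "alpha-lactose pellet (example)"
    meas.attrs["instrument"] = "THz-TDS spectrometer (example)"
    meas.attrs["temperature_K"] = 295.0

# Read it back to confirm the round trip.
with h5py.File("sample_measurement.thz", "r") as f:
    grp = f["Measurement_001"]
    print(grp.attrs["sample"], grp["time_ps"].shape, grp["field"].shape)
```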

    A global soil spectral calibration library and estimation service

    There is growing global interest in the potential for soil reflectance spectroscopy to fill an urgent need for more data on soil properties for improved decision-making on soil security at local to global scales. This is driven by the capability of soil spectroscopy to estimate a wide range of soil properties from a rapid, inexpensive, and highly reproducible measurement using only light. However, several obstacles are preventing wider adoption of soil spectroscopy. The biggest obstacles are the large variation in the soil analytical methods and operating procedures used in different laboratories, poor reproducibility of analyses within and amongst laboratories, and a lack of soil physical archives. In addition, adoption is hindered by the expense and complexity of building soil spectral libraries and calibration models. The Global Soil Spectral Calibration Library and Estimation Service is proposed to overcome these obstacles by providing a freely available estimation service based on an open, high-quality and diverse spectral calibration library and the extensive soil archives of the Kellogg Soil Survey Laboratory (KSSL) of the Natural Resources Conservation Service of the United States Department of Agriculture (USDA). The initiative is supported by the Global Soil Laboratory Network (GLOSOLAN) of the Global Soil Partnership and the Soil Spectroscopy for Global Good network, which provide additional support through dissemination of standards, capacity development and research. This service is a global public good which stands to benefit soil assessments globally, but especially developing countries where soil data and resources for conventional soil analyses are most limited.
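    To illustrate the kind of calibration model such a spectral library supports, the sketch below fits a partial least squares regression mapping reflectance spectra to a soil property. The data are synthetic, and the model choice, band count, and number of components are assumptions made for illustration, not the service's actual estimation pipeline.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in for a spectral library: 500 soils x 200 reflectance bands.
n_samples, n_bands = 500, 200
spectra = rng.normal(0.4, 0.05, (n_samples, n_bands))
# Pretend soil organic carbon depends on two broad absorption features plus noise.
soc = 2.0 * spectra[:, 50] - 1.5 * spectra[:, 120] + rng.normal(0.0, 0.01, n_samples)

X_train, X_test, y_train, y_test = train_test_split(spectra, soc, random_state=0)

# Partial least squares regression, a common calibration model in soil spectroscopy.
pls = PLSRegression(n_components=10)
pls.fit(X_train, y_train)
print("R^2 on held-out spectra:", round(pls.score(X_test, y_test), 3))
```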

    NFDI-Neuro: building a community for neuroscience research data management in Germany

    Increasing complexity and volume of research data pose increasing challenges for scientists to manage their data efficiently. At the same time, availability and reuse of research data are becoming more and more important in modern science. The German government has established an initiative to develop research data management (RDM) and to increase accessibility and reusability of research data at the national level, the Nationale Forschungsdateninfrastruktur (NFDI). The NFDI Neuroscience (NFDI-Neuro) consortium aims to represent the neuroscience community in this initiative. Here, we review the needs and challenges in RDM faced by researchers as well as existing and emerging solutions and benefits, and how the NFDI in general and NFDI-Neuro specifically can support a process for making these solutions better available to researchers. To ensure development of sustainable research data management practices, both technical solutions and engagement of the scientific community are essential. NFDI-Neuro is therefore focusing on community building just as much as on improving the accessibility of technical solutions.

    2011 Strategic roadmap for Australian research infrastructure

    The 2011 Roadmap articulates the priority research infrastructure areas of a national scale (capability areas) to further develop Australia’s research capacity and improve innovation and research outcomes over the next five to ten years. The capability areas have been identified through considered analysis of input provided by stakeholders, in conjunction with specialist advice from Expert Working Groups. It is intended that the Strategic Framework will provide a high-level policy framework, which will include principles to guide the development of policy advice and the design of programs related to the funding of research infrastructure by the Australian Government. Roadmapping has been identified in the Strategic Framework Discussion Paper as the most appropriate prioritisation mechanism for national, collaborative research infrastructure. The strategic identification of capability areas through a consultative roadmapping process was also validated in the report of the 2010 NCRIS Evaluation. The 2011 Roadmap is primarily concerned with medium to large-scale research infrastructure. However, any landmark infrastructure requirements (typically involving an investment in excess of $100 million over five years from the Australian Government) identified in this process will be noted. NRIC has also developed a ‘Process to identify and prioritise Australian Government landmark research infrastructure investments’, which is currently under consideration by the government as part of broader deliberations relating to research infrastructure. NRIC will have strategic oversight of the development of the 2011 Roadmap as part of its overall policy view of research infrastructure.

    Chemical information matters: an e-Research perspective on information and data sharing in the chemical sciences

    Recently, a number of organisations have called for open access to scientific information and especially to the data obtained from publicly funded research, among which the Royal Society report and the European Commission press release are particularly notable. It has long been accepted that building research on the foundations laid by other scientists is both effective and efficient. Regrettably, some disciplines, chemistry being one, have been slow to recognise the value of sharing and have thus been reluctant to curate their data and information in preparation for exchanging it. The very significant increases in both the volume and the complexity of the datasets produced have encouraged the expansion of e-Research and stimulated the development of methodologies for managing, organising, and analysing "big data". We review the evolution of cheminformatics, the amalgam of chemistry, computer science, and information technology, and assess the wider e-Science and e-Research perspective. Chemical information does matter, as do matters of communicating data and collaborating with data. For chemistry, unique identifiers, structure representations, and property descriptors are essential to the activities of sharing and exchange. Open science entails the sharing of more than mere facts: for example, the publication of negative outcomes can facilitate better understanding of which synthetic routes to choose, an aspiration of the Dial-a-Molecule Grand Challenge. The protagonists of open notebook science go even further and exchange their thoughts and plans. We consider the concepts of preservation, curation, provenance, discovery, and access in the context of the research lifecycle, and then focus on the role of metadata, particularly the ontologies on which the emerging chemical Semantic Web will depend. Among our conclusions, we present our choice of the "grand challenges" for the preservation and sharing of chemical information.
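    To make the point about identifiers, structure representations, and property descriptors concrete, here is a minimal sketch that derives a canonical SMILES, an InChI, an InChIKey, and two simple descriptors for one molecule. It uses RDKit as one possible open-source toolkit; the abstract itself does not prescribe a specific tool.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

# Aspirin, given as a SMILES string.
mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")

# Unique identifiers and structure representations used when sharing chemical data.
print("Canonical SMILES:", Chem.MolToSmiles(mol))
print("InChI:", Chem.MolToInchi(mol))
print("InChIKey:", Chem.MolToInchiKey(mol))

# Simple property descriptors that can accompany the shared record.
print("Molecular weight:", round(Descriptors.MolWt(mol), 2))
print("logP (Crippen):", round(Descriptors.MolLogP(mol), 2))
```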

    Review: New sensors and data-driven approaches—A path to next generation phenomics

    At the 4th International Plant Phenotyping Symposium meeting of the International Plant Phenotyping Network (IPPN) in 2016 at CIMMYT in Mexico, a workshop was convened to consider ways forward with sensors for phenotyping. The increasing number of field applications provides new challenges and requires specialised solutions. There are many traits vital to plant growth and development that demand phenotyping approaches that are still at early stages of development or elude current capabilities. Further, there is growing interest in low-cost sensor solutions and mobile platforms that can be transported to the experiments, rather than the experiment coming to the platform. Various types of sensors are required to address diverse needs with respect to targets, precision and ease of operation and readout. Converting data into knowledge, and ensuring that those data (and the appropriate metadata) are stored in such a way that they will be sensible and available to others now and for future analysis, is also vital. Here we propose mechanisms for “next generation phenomics” based on our learning in the past decade, current practice and discussions at the IPPN Symposium, to encourage further thinking and collaboration by plant scientists, physicists and engineering experts.