6,185 research outputs found
Search and Result Presentation in Scientific Workflow Repositories
We study the problem of searching a repository of complex hierarchical
workflows whose component modules, both composite and atomic, have been
annotated with keywords. Since keyword search does not use the graph structure
of a workflow, we develop a model of workflows using context-free bag grammars.
We then give efficient polynomial-time algorithms that, given a workflow and a
keyword query, determine whether some execution of the workflow matches the
query. Based on these algorithms we develop a search and ranking solution that
efficiently retrieves the top-k grammars from a repository. Finally, we propose
a novel result presentation method for grammars matching a keyword query, based
on representative parse-trees. The effectiveness of our approach is validated
through an extensive experimental evaluation.
Cosmological Simulations on a Grid of Computers
The work presented in this paper aims at restricting the input parameter
values of the semi-analytical model used in GALICS and MOMAF, so as to derive
which parameters most influence the results, e.g., star formation, feedback
and halo recycling efficiencies, etc. Our approach is to proceed empirically:
we run many simulations and derive the correct ranges of values. The
computation time needed is so large that we need to run on a grid of
computers. Hence, we model GALICS and MOMAF execution time and output file
sizes, and run the simulation using a grid middleware: DIET. All the complexity
of accessing resources, scheduling simulations and managing data is harnessed
by DIET and hidden behind a web portal accessible to the users. Comment: Accepted and Published in AIP Conference Proceedings 1241, 2010,
pages 816-82
Taxonomies for Development
{Excerpt} Organizations spend millions of dollars on management systems without commensurate investments in the categorization needed to organize the information they rest on. Taxonomy work is strategic work: it enables efficient and interoperable retrieval and sharing of data, information, and knowledge by building needs and natural workflows in intuitive structures.
Bible readers think that taxonomy is the world's oldest profession. Whatever the case, the word is now synonymous with any hierarchical system of classification that orders domains of inquiry into groups and signifies natural relationships among these. (A taxonomic scheme is often depicted as a "tree" and individual taxonomic units as "branches" in the tree.) Almost anything can be classified according to some taxonomic scheme. Resulting catalogs provide conceptual frameworks for miscellaneous purposes including knowledge identification, creation, storage, sharing, and use, including related decision making.
Reading Lists in Cambridge: A Standard System?
In June 2008 a committee of librarians from across the University convened to investigate ways of improving library services, with particular regard to elearning and the provision of services to undergraduates. Reading lists quickly emerged as the major factor in undergraduate library use, as influential on the types of resources used by undergraduates, and as an area where there was potential for an improvement to the student experience. One of the committee's recommendations was that an application be made for an Arcadia Fellowship to investigate issues surrounding the adoption of a standard system for dealing with reading lists. The proposal was felt to map well onto the core issues highlighted by the Arcadia Programme, particularly Changes in Higher Education, New generations of library users, Technology and Changing academic workflows. This report is the result of that Fellowship. The Arcadia Programme has been generously funded by a Grant from the Arcadia Fund http://www.arcadiafund.org.uk
Bioinformatics process management: information flow via a computational journal
This paper presents the Bioinformatics Computational Journal (BCJ), a framework for conducting and managing computational experiments in bioinformatics and computational biology. These experiments often involve series of computations, data searches, filters, and annotations which can benefit from a structured environment. Systems to manage computational experiments exist, ranging from libraries with standard data models to elaborate schemes to chain together input and output between applications. Yet, although such frameworks are available, their use is not widespread; ad hoc scripts are often required to bind applications together. The BCJ explores another solution to this problem through a computer based environment suitable for on-site use, which builds on the traditional laboratory notebook paradigm. It provides an intuitive, extensible paradigm designed for expressive composition of applications. Extensive features facilitate sharing data, computational methods, and entire experiments. By focusing on the bioinformatics and computational biology domain, the scope of the computational framework was narrowed, permitting us to implement a capable set of features for this domain. This report discusses the features determined critical by our system and other projects, along with design issues. We illustrate the use of our implementation of the BCJ on two domain-specific examples.
Towards Exascale Scientific Metadata Management
Advances in technology and computing hardware are enabling scientists from
all areas of science to produce massive amounts of data using large-scale
simulations or observational facilities. In this era of data deluge, effective
coordination between the data production and the analysis phases hinges on the
availability of metadata that describe the scientific datasets. Existing
workflow engines have been capturing a limited form of metadata to provide
provenance information about the identity and lineage of the data. However,
much of the data produced by simulations, experiments, and analyses still need
to be annotated manually in an ad hoc manner by domain scientists. Systematic
and transparent acquisition of rich metadata becomes a crucial prerequisite to
sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and
domain-agnostic metadata management infrastructure that can meet the demands of
extreme-scale science is notable by its absence.
To address this gap in scientific data management research and practice, we
present our vision for an integrated approach that (1) automatically captures
and manipulates information-rich metadata while the data is being produced or
analyzed and (2) stores metadata within each dataset to permeate
metadata-oblivious processes and to query metadata through established and
standardized data access interfaces. We motivate the need for the proposed
integrated approach using applications from plasma physics, climate modeling
and neuroscience, and then discuss research challenges and possible solutions.
WikiPathways: building research communities on biological pathways.
Here, we describe the development of WikiPathways (http://www.wikipathways.org), a public wiki for pathway curation, since it was first published in 2008. New features are discussed, as well as developments in the community of contributors. New features include a zoomable pathway viewer, support for pathway ontology annotations, the ability to mark pathways as private for a limited time and the availability of stable hyperlinks to pathways and the elements therein. WikiPathways content is freely available in a variety of formats such as the BioPAX standard, and the content is increasingly adopted by external databases and tools, including Wikipedia. A recent development is the use of WikiPathways as a staging ground for centrally curated databases such as Reactome. WikiPathways is seeing steady growth in the number of users, page views and edits for each pathway. To assess whether the community curation experiment can be considered successful, here we analyze the relation between use and contribution, which gives results in line with other wiki projects. The novel use of pathway pages as supplementary material to publications, as well as the addition of tailored content for research domains, is expected to stimulate growth further.
Updates in metabolomics tools and resources: 2014-2015
Data processing and interpretation represent the most challenging and time-consuming steps in high-throughput metabolomic experiments, regardless of the analytical platforms (MS or NMR spectroscopy based) used for data acquisition. Improved machinery in metabolomics generates increasingly complex datasets that create the need for more and better processing and analysis software and in silico approaches to understand the resulting data. However, a comprehensive source of information describing the utility of the most recently developed and released metabolomics resources (in the form of tools, software, and databases) is currently lacking. Thus, here we provide an overview of freely available and open-source tools, algorithms, and frameworks to make both upcoming and established metabolomics researchers aware of the recent developments in an attempt to advance and facilitate data processing workflows in their metabolomics research. The major topics include tools and resources for data processing, data annotation, and data visualization in MS and NMR-based metabolomics. Most tools described in this review are dedicated to untargeted metabolomics workflows; however, some more specialist tools are described as well. All tools and resources described, including their analytical and computational platform dependencies, are summarized in an overview Table.
A Consumers Guide to Grants Management Systems 2016
This report has been released by Grants Managers Network (GMN) and Technology Affinity Group (TAG), with research conducted by Idealware. The report compares 29 grants management systems across 174 requirements criteria, looks at what each system does, and compares the strengths and weaknesses of each system available to grantmakers. The report looks at how they stack up against high-level categories and details the functionality of each system against specific criteria important to the grant-making community.