1,243 research outputs found
High-performance integrated virtual environment (HIVE) tools and applications for big data analysis
The High-performance Integrated Virtual Environment (HIVE) is a high-throughput cloud-based infrastructure developed for the storage and analysis of genomic and associated biological data. HIVE consists of a web-accessible interface for authorized users to deposit, retrieve, share, annotate, compute and visualize Next-generation Sequencing (NGS) data in a scalable and highly efficient fashion. The platform contains a distributed storage library and a distributed computational powerhouse linked seamlessly. Resources available through the interface include algorithms, tools and applications developed exclusively for the HIVE platform, as well as commonly used external tools adapted to operate within the parallel architecture of the system. HIVE is built on a flexible infrastructure, which allows for simple implementation of new algorithms and tools. Currently available HIVE tools include sequence alignment and nucleotide variation profiling tools, metagenomic analyzers, phylogenetic tree-building tools using NGS data, clone discovery algorithms, and recombination analysis algorithms. In addition to tools, HIVE also provides knowledgebases that can be used in conjunction with the tools for NGS sequence and metadata analysis.
Adding Phylogenies to QGIS and Lifemapper for Evolutionary Studies of Species Diversity
Phylogenetic data from the “Tree of Life” have explicit spatial and temporal components when paired with species distribution and ecological data for testing contributions to biological community assembly at different geographic scales of species interaction. Important questions in biology about the degree of niche suitability and whether the history of a community’s assembly for an area can affect whether the species in a community are more or less phylogenetically related can be answered using several different spatially-filtered measures of phylogenetic diversity. Phylogenetic analyses which support the description of ecological processes are usually achieved in a handful of software libraries that are narrowly focused on a single set of tasks. Very few applications scale to large datasets and most do not have an explicit spatial component without relying on external visualization packages. This prompted us to explore bringing phylogenetic data into an open-source GIS environment. The Lifemapper Macroecology/Range & Diversity QGIS plug-in is a custom plug-in which we use to calculate and map biodiversity indices that describe range-diversity relationships derived from large multi-species datasets. We describe extensions to that plug-in which expand the Lifemapper set of ecological tools to link phylogenies to spatially derived ‘diversity field’ statistics that describe the phylogenetic composition of natural communities.
The distributed ASCI supercomputer project
The Distributed ASCI Supercomputer (DAS) is a homogeneous wide-area distributed system consisting of four cluster computers at different locations. DAS has been used for research on communication software, parallel languages and programming systems, schedulers, parallel applications, and distributed applications. The paper gives a preview of the most interesting research results obtained so far in the DAS project
MAGDA: A Mobile Agent based Grid Architecture
Mobile agents are both a technology and a programming paradigm. They allow for a flexible approach which can alleviate a number of issues present in distributed and Grid-based systems, by means of features such as migration, cloning, messaging and other provided mechanisms. In this paper we describe an architecture (MAGDA – Mobile Agent based Grid Architecture) that we have designed and are currently developing to support the programming and execution of mobile-agent-based applications on Grid systems.
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues
Performance formula-based optimal deployments of multilevel indices for service retrieval.
There are many different index structures for service repositories, such as the sequential index, the inverted index, and multilevel indices that include three deployments. Different service sets may have different characteristics that affect performance in different ways. For a given service set, which index structure is optimal? To address this question, this paper analyses five indexing models and proposes the expectation of the traversed service count as an estimate of service retrieval performance. Based on these expectation formulas, an optimal deployment method can be identified to maximize the efficiency of service retrieval. Our experiments first validate the correctness of the proposed formulas and then validate the effectiveness of the optimal method. UK-China Knowledge Economy Education Partnershi
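As a rough illustration of the "traversed service count" idea in the abstract above, the following sketch compares how many services a sequential index and an inverted index examine for one query. It is not the paper's formulas or its multilevel deployments. The service names, the parameter sets, the matching rule (a service matches when it contains every query parameter) and all function names are assumptions made for this example.

```python
from collections import defaultdict


def sequential_traversed(services, query):
    """Check every service in turn; the traversed count is the repository size."""
    matches = sorted(name for name, params in services.items() if query <= params)
    return matches, len(services)


def build_inverted_index(services):
    """Map each parameter to the set of services that contain it."""
    index = defaultdict(set)
    for name, params in services.items():
        for param in params:
            index[param].add(name)
    return index


def inverted_traversed(services, index, query):
    """Check only the services listed under the rarest query parameter."""
    if not query:
        # An empty query matches everything, so every service is traversed.
        return sorted(services), len(services)
    rarest = min(query, key=lambda p: len(index.get(p, ())))
    candidates = index.get(rarest, set())
    matches = sorted(n for n in candidates if query <= services[n])
    return matches, len(candidates)


# Hypothetical repository: service name -> set of parameters (illustrative only).
services = {
    "s1": {"a", "b"},
    "s2": {"a", "c"},
    "s3": {"b", "c"},
    "s4": {"a", "b", "c"},
    "s5": {"d"},
}
query = {"a", "b"}
index = build_inverted_index(services)

seq_result = sequential_traversed(services, query)
inv_result = inverted_traversed(services, index, query)
print(seq_result)
print(inv_result)
```

Both calls return the same matches, but the inverted index examines fewer services on this toy data. Averaging such counts over a query distribution gives an empirical counterpart to an expectation of the traversed service count.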