Computational Strategies for Scalable Genomics Analysis.
The revolution in next-generation DNA sequencing technologies is leading to explosive data growth in genomics, posing a significant challenge to the computing infrastructure and software algorithms for genomics analysis. Various big data technologies have been explored to scale up/out current bioinformatics solutions to mine the big genomics data. In this review, we survey some of these exciting developments in the applications of parallel distributed computing and special hardware to genomics. We comment on the pros and cons of each strategy in the context of ease of development, robustness, scalability, and efficiency. Although this review is written for an audience from the genomics and bioinformatics fields, it may also be informative for a computer science audience with interests in genomics applications.
Counting absolute number of molecules using unique molecular identifiers
Advances in molecular biology have made it easy to identify different DNA or RNA species and to copy them. Identification of nucleic acid species can be accomplished by reading the DNA sequence; currently, millions of molecules can be sequenced in a single day using massively parallel sequencing. Efficient copying of DNA molecules of arbitrary sequence was made possible by molecular cloning and the polymerase chain reaction. Differences in the relative abundance of a large number of different sequences between two or more samples can in turn be measured using microarray hybridization and/or tag sequencing. However, determining the relative abundance of two different species and/or the absolute number of molecules present in a single sample has proven much more challenging. This is because it is hard to detect individual molecules without copying them, and even harder to make a defined number of copies of molecules. We show here that this limitation can be overcome by using unique molecular identifiers (UMIs), which make each molecule in the sample distinct.
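The counting idea in this abstract reduces to a simple rule: reads that share the same (species, UMI) pair are amplification copies of one original molecule, so the absolute molecule count is the number of distinct UMIs per species. A minimal sketch, assuming reads arrive as (species, UMI) pairs; the function name and toy data are illustrative, not from the paper:

```python
from collections import defaultdict

def count_molecules(reads):
    """Count absolute molecule numbers per species by collapsing
    PCR duplicates: reads sharing a (species, UMI) pair are taken
    to derive from the same original molecule."""
    umis_per_species = defaultdict(set)
    for species, umi in reads:
        umis_per_species[species].add(umi)
    return {species: len(umis) for species, umis in umis_per_species.items()}

# Toy data: geneA was amplified unevenly, but only two distinct
# UMIs were attached before amplification, so its true count is 2.
reads = [("geneA", "ACGT"), ("geneA", "ACGT"), ("geneA", "TTGC"),
         ("geneB", "GGAA")]
print(count_molecules(reads))  # {'geneA': 2, 'geneB': 1}
```

Real pipelines additionally tolerate sequencing errors in the UMI itself (e.g. by merging UMIs within one edit distance), which this sketch omits.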
Large Eddy Simulations of gaseous flames in gas turbine combustion chambers
Recent developments in numerical schemes and turbulent combustion models, together with the regular increase of computing power, allow Large Eddy Simulation (LES) to be applied to real industrial burners. In this paper, two types of LES in complex-geometry combustors of specific interest for aeronautical gas turbine burners are reviewed: (1) laboratory-scale combustors, without compressor or turbine, in which advanced measurements are possible, and (2) combustion chambers of existing engines operated in realistic operating conditions. Laboratory-scale burners are designed to assess modeling and fundamental flow aspects in controlled configurations. They are necessary to gauge LES strategies and identify potential limitations. In specific circumstances, they even offer near model-free or DNS-like LES computations. LES in real engines illustrate the potential of the approach in the context of industrial burners but are more difficult to validate due to the limited set of available measurements. Usual approaches for turbulence and combustion sub-grid models, including chemistry modeling, are first recalled. Limiting cases and the range of validity of the models are specifically recalled before a discussion on the numerical breakthroughs which have allowed LES to be applied to these complex cases. Specific issues linked to real gas turbine chambers are discussed: multi-perforation, complex acoustic impedances at inlet and outlet, annular chambers, etc. Examples are provided for mean flow predictions (velocity, temperature and species) as well as unsteady mechanisms (quenching, ignition, combustion instabilities). Finally, potential perspectives are proposed to further improve the use of LES for real gas turbine combustor designs.
ToPoliNano: Nanoarchitectures Design Made Real
Many facts about emerging nanotechnologies are yet to be assessed. There are still major concerns, for instance, about the maximum achievable device density, or about which architecture is best fit for a specific application. Growing complexity requires taking into account many aspects of technology, application and architecture at the same time. Researchers face problems that are not new per se, but are now subject to very different constraints that need to be captured by design tools. Among the emerging nanotechnologies, two-dimensional nanowire-based arrays represent promising nanostructures, especially for massively parallel computing architectures. Few attempts have been made to enable the exploration of architectural solutions by deriving information from extensive and reliable nanoarray characterization. Moreover, in the nanotechnology arena there is still no clear winner, so it is important to be able to target different technologies, so as not to miss the next big thing. We present a tool, ToPoliNano, that enables such a multi-technological characterization in terms of logic behavior, power and timing performance, area and layout constraints, on the basis of specific technological and topological descriptions. This tool can aid the design process, besides providing a comprehensive simulation framework for DC and timing simulations and detailed power analysis. Design and simulation results are shown for nanoarray-based circuits. ToPoliNano is the first real design tool that tackles the top-down design of a circuit based on emerging technologies.
Accurate Profiling of Microbial Communities from Massively Parallel Sequencing using Convex Optimization
We describe the Microbial Community Reconstruction (MCR) Problem, which is fundamental for microbiome analysis. In this problem, the goal is to reconstruct the identity and frequency of species comprising a microbial community, using short sequence reads from Massively Parallel Sequencing (MPS) data obtained for specified genomic regions. We formulate the problem mathematically as a convex optimization problem and provide sufficient conditions for identifiability, namely the ability to reconstruct species identity and frequency correctly when the data size (number of reads) grows to infinity. We discuss different metrics for assessing the quality of the reconstructed solution, including a novel phylogenetically-aware metric based on the Mahalanobis distance, and give upper bounds on the reconstruction error for a finite number of reads under different metrics. We propose a scalable divide-and-conquer algorithm for the problem using convex optimization, which enables us to handle large problems (with species). We show using numerical simulations that for realistic scenarios, where the microbial communities are sparse, our algorithm gives solutions with high accuracy, both in terms of obtaining accurate frequency, and in terms of species phylogenetic resolution. Comment: To appear in SPIRE 1
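The core reconstruction step described above is a constrained least-squares fit of species frequencies to observed read counts. A minimal sketch of that convex problem, minimizing ||Af - y||^2 over the probability simplex: the projected-gradient solver, function names, and toy mixing matrix below are illustrative stand-ins, not the paper's actual algorithm or data:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto the probability simplex
    {f : f >= 0, sum(f) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

def reconstruct_frequencies(A, y, steps=2000, lr=0.1):
    """Minimize ||A f - y||^2 subject to f on the simplex,
    by projected gradient descent. A's columns are per-species
    read signatures; y is the observed read distribution."""
    f = np.full(A.shape[1], 1.0 / A.shape[1])
    for _ in range(steps):
        grad = A.T @ (A @ f - y)          # gradient of the quadratic loss
        f = project_simplex(f - lr * grad)  # keep f a valid frequency vector
    return f

# Toy identifiable mixing matrix (columns = species signatures).
A = np.array([[1.0, 0.0, 0.2],
              [0.0, 1.0, 0.3],
              [0.0, 0.0, 0.5]])
true_f = np.array([0.6, 0.3, 0.1])
y = A @ true_f                # noiseless "infinite reads" case
f_hat = reconstruct_frequencies(A, y)
```

When the columns of A are linearly independent, as here, the minimizer is unique and `f_hat` recovers `true_f`; the paper's identifiability conditions formalize when this holds at genomic scale.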
Experimental Progress in Computation by Self-Assembly of DNA Tilings
Approaches to DNA-based computing by self-assembly require the use of DNA nanostructures, called tiles, that have efficient chemistries, expressive computational power, and convenient input and output (I/O) mechanisms. We have designed two new classes of DNA tiles, TAO and TAE, both of which contain three double helices linked by strand exchange. Structural analysis of a TAO molecule has shown that the molecule assembles efficiently from its four component strands. Here we demonstrate a novel method for I/O whereby multiple tiles assemble around a single-stranded (input) scaffold strand. Computation by tiling theoretically results in the formation of structures that contain single-stranded (output) reporter strands, which can then be isolated for subsequent steps of computation if necessary. We illustrate the advantages of the TAO and TAE designs by detailing two examples of massively parallel arithmetic: construction of complete XOR and addition tables by linear assemblies of DNA tiles. The three-helix structures provide flexibility for topological routing of strands in the computation, allowing the implementation of string tile models.
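The XOR example mentioned above can be emulated in software: in a linear tile assembly, each tile binds to its neighbour's running value and one input bit, reporting the cumulative XOR on its output strand. A toy sketch of that logical behaviour; `xor_tile_assembly` is a hypothetical name for this software analogue, not part of the molecular protocol:

```python
def xor_tile_assembly(input_bits):
    """Emulate the cumulative-XOR computation of a linear DNA tile
    assembly: tile i reads input bit x[i] and its neighbour's value
    y[i-1], and reports y[i] = y[i-1] XOR x[i]."""
    outputs = []
    running = 0  # the value carried along the assembly (y[-1] = 0)
    for bit in input_bits:
        running ^= bit       # one tile-binding event encodes this XOR
        outputs.append(running)
    return outputs

print(xor_tile_assembly([1, 0, 1, 1]))  # [1, 1, 0, 1]
```

The massive parallelism in the molecular setting comes from every possible input scaffold assembling simultaneously in one tube, which a sequential loop like this cannot capture.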
A new tool for the performance analysis of massively parallel computer systems
We present a new tool, GPA, that can generate key performance measures for very large systems. Based on solving systems of ordinary differential equations (ODEs), this method of performance analysis is far more scalable than stochastic simulation. The GPA tool is the first to produce higher-moment analysis from differential equation approximation, which is essential, in many cases, to obtain an accurate performance prediction. We identify so-called switch points as the source of error in the ODE approximation. We investigate the switch point behaviour in several large models and observe that as the scale of the model is increased, in general the ODE performance prediction improves in accuracy. In the case of the variance measure, we are able to justify theoretically that in the limit of model scale, the ODE approximation can be expected to tend to the actual variance of the model.
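The fluid idea behind this kind of ODE-based analysis can be shown on a minimal one-transition example: a population of identical clients leaving state A at rate r has mean-field ODE dA/dt = -rA, which is integrated instead of simulating every client. The model, function name, and parameters below are illustrative assumptions, not GPA's actual interface, and this sketch covers only the first moment (the mean), not the higher moments GPA produces:

```python
import math

def ode_mean(n0, rate, t_end, dt=1e-3):
    """Forward-Euler integration of the fluid ODE dA/dt = -rate * A,
    approximating the mean number of clients still in state A at
    time t_end, starting from n0 clients."""
    a = float(n0)
    for _ in range(round(t_end / dt)):
        a += dt * (-rate * a)  # one Euler step of the fluid equation
    return a

# For this linear model the exact mean is n0 * exp(-rate * t),
# so the fluid approximation can be checked directly.
approx = ode_mean(1000, 0.5, 2.0)
exact = 1000 * math.exp(-0.5 * 2.0)
```

The cost of this integration is independent of the population size n0, which is exactly why the ODE approach scales where discrete-event simulation of each client does not.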