Computational Methods for Gene Expression and Genomic Sequence Analysis
Advances in technology now produce increasingly cost-effective, high-throughput, and large-scale biological data. As a result, there is an urgent need for efficient computational methods to analyze these massive data sets. In this dissertation, we introduce methods that address several important issues in gene expression and genomic sequence analysis, two of the most important areas in bioinformatics. Firstly, we introduce a novel approach to predicting patterns of gene response to multiple treatments when the sample size is small. Researchers are increasingly interested in experiments with many treatments, such as chemical compounds or drug doses. However, due to cost, many experiments do not have large enough samples, which makes it difficult for conventional methods to predict patterns of gene response. Here we introduce an approach that exploits dependencies among the outcomes of pairwise comparisons, together with resampling techniques, to predict true patterns of gene response when samples are insufficient. This approach deduced more, and better functionally enriched, gene clusters than conventional methods. It is therefore useful for multiple-treatment studies that have small sample sizes or contain genes with highly variable expression. Secondly, we introduce a novel method for aligning short reads, which are DNA fragments extracted across the genomes of individuals, to reference genomes. Results from short read alignment can be used in many studies, such as measuring gene expression or detecting genetic variants. Here we introduce a method that employs an iterated randomized algorithm based on the FM-index, an efficient data structure for full-text indexing, to align reads to the reference. This method improved alignment performance across a wide range of read lengths and error rates compared with several popular methods, making it a good choice for the community to perform short read alignment. Finally, we introduce a novel approach to detecting genetic variants such as SNPs (single nucleotide polymorphisms) and INDELs (insertions/deletions). This problem is significant in a wide range of areas, from bioinformatics and genetic research to the medical field. For example, one can predict how genomic changes are related to phenotype in an organism of interest, or associate genetic changes with disease risk or the efficacy of medical treatment. Here we introduce a method that leverages known genetic variants from well-established databases to improve the accuracy of variant detection. This method had higher accuracy than several state-of-the-art methods in many cases, especially for detecting INDELs. Our method therefore has the potential to be useful in research and clinical applications that rely on accurate identification of genetic variants.
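As a rough illustration of the FM-index mentioned in the abstract above, the following minimal sketch builds a toy index and performs exact-match backward search. It is not the dissertation's iterated randomized aligner: the function names `build_fm_index` and `backward_search` are hypothetical, the suffix array is built naively, and mismatches and indels are not handled.

```python
from collections import Counter


def build_fm_index(text):
    """Build a toy FM-index: suffix array, C table, and Occ table.

    Assumption: the sentinel "$" does not occur in text and sorts before
    every other symbol (true for A, C, G, T in ASCII).
    """
    text += "$"
    # Naive suffix array construction; fine for a demo, too slow for genomes.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the symbol preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    # c_table[c]: number of symbols in text that are strictly smaller than c.
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for sym in occ:
            occ[sym][i + 1] = occ[sym][i] + (1 if sym == ch else 0)
    return sa, c_table, occ


def backward_search(pattern, sa, c_table, occ):
    """Return the sorted start positions of exact occurrences of pattern."""
    lo, hi = 0, len(sa)  # half-open range of suffix-array rows
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


reference = "ACGTACGTTACG"
sa, c_table, occ = build_fm_index(reference)
print(backward_search("ACG", sa, c_table, occ))
```

The search consumes the read from its last symbol to its first, narrowing a range of suffix-array rows at each step, so its cost depends on the read length and not on the reference length.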
Parallel computations based on domain decompositions and integrated radial basis functions for fluid flow problems
The thesis reports a contribution to the development of parallel algorithms based on the Domain Decomposition (DD) method and the Compact Local Integrated Radial Basis Function (CLIRBF) method. This development aims to solve large-scale fluid flow problems more efficiently by using parallel high performance computing (HPC). With the help of the DD method, one big problem can be separated into sub-problems and solved on parallel machines. In terms of numerical analysis, for each sub-problem, the overall condition number of the system matrix is significantly reduced. This is one of the main reasons for the stability, high accuracy and efficiency of the parallel algorithms. The developed methods have been successfully applied to solve several benchmark problems with both rectangular and non-rectangular boundaries.
In parallel computation, there is a challenge called the Distributed Termination Detection (DTD) problem. DTD concerns determining whether all processes in a distributed system have finished their job. In a distributed system, this problem is not trivial because there is neither a globally synchronised clock nor a shared memory. Taking into account the specific requirements of parallel algorithms, a new algorithm, called the Bitmap DTD, is proposed. This algorithm is designed to work with the DD method for solving Partial Differential Equations (PDEs). The Bitmap DTD algorithm is inspired by the Credit/Recovery (or weight-throwing) class of DTD algorithms. The distinguishing feature of this algorithm is the use of a bitmap to carry the snapshot of the system from process to process. The proposed algorithm has the following characteristics: (i) it allows any process to detect termination (symmetry); (ii) it does not require any central control agent (decentralisation); (iii) the termination detection delay is of the order of the diameter of the network; and (iv) the message complexity of the proposed algorithm is optimal.
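To make the claim about detection delay concrete, here is a minimal sketch of how a bitmap snapshot could spread through a network. It is not the Bitmap DTD algorithm of the thesis, whose details the abstract does not give. It rests on simplifying assumptions: every process is already permanently idle, exchanges happen in synchronous rounds, and each process merges its neighbours' bitmaps with a bitwise OR. The function name `rounds_until_all_detect` and the adjacency-list input are hypothetical.

```python
def rounds_until_all_detect(neighbours):
    """Count synchronous rounds until every process holds a full bitmap.

    neighbours[p] lists the processes that p exchanges messages with.
    Bit q of a process's bitmap means "process q has been seen idle".
    Assumption: all processes are idle from the start and never reactivate.
    """
    n = len(neighbours)
    full = (1 << n) - 1
    bitmaps = [1 << p for p in range(n)]  # each process knows only itself
    rounds = 0
    while any(b != full for b in bitmaps):
        if rounds >= n:
            raise ValueError("network is not connected")
        merged = []
        for p in range(n):
            snapshot = bitmaps[p]
            for q in neighbours[p]:
                snapshot |= bitmaps[q]  # merge the neighbour's snapshot
            merged.append(snapshot)
        bitmaps = merged
        rounds += 1
    return rounds


line = [[1], [0, 2], [1, 3], [2, 4], [3]]                  # diameter 4
ring = [[5, 1], [0, 2], [1, 3], [2, 4], [3, 5], [4, 0]]    # diameter 3
print(rounds_until_all_detect(line), rounds_until_all_detect(ring))
```

In this toy model any process can declare termination once its own bitmap is full, with no central agent, and the number of rounds equals the network diameter. A real detector must also handle processes that become active again after reporting idle, which this sketch deliberately ignores.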
In the first attempt, the combination of the DD method and the CLIRBF-based collocation approach yields an effective parallel algorithm for solving PDEs. This approach allows not only the problem to be solved separately in each sub-domain by a Central Processing Unit (CPU), but also the compact local stencils to be treated independently. The present algorithm has achieved high throughput in solving large-scale problems. The procedure is illustrated by several numerical examples, including the benchmark lid-driven cavity flow problem.
A new parallel algorithm is developed using the Control Volume Method (CVM) for the solution of PDEs. The goal is to develop an efficient parallel algorithm, especially for problems with non-rectangular domains. When combined with the CLIRBF approach, the resultant method can produce high-order accurate and economical solutions for problems with complex boundaries. The algorithm is verified by solving two benchmark problems: the square lid-driven cavity flow and the triangular lid-driven cavity flow. In both cases, the results are in good agreement with benchmark values. In terms of efficiency, the results show that the method is highly efficient, and for some specific cases a super-linear speed-up is achieved.
Although the overlapping method yields a straightforward implementation and stable convergence, the overlapping of sub-domains makes it less applicable to complex domains. The method also generates more computing overhead for each sub-domain as the overlapping area grows. Hence, a parallel algorithm based on non-overlapping DD and CLIRBF has been developed for solving the Navier-Stokes equations, where a CLIRBF scheme is used to solve the problem in each sub-domain. A relaxation factor is employed for the transmission conditions at the interface of sub-domains to ensure the convergence of the iterative method, while the Bitmap DTD algorithm is used to achieve global termination. The parallel algorithm is demonstrated through two fluid flow problems, namely natural convection in concentric annuli (Boussinesq fluids) and the lid-driven cavity flow (viscous fluids). The results confirm the high efficiency of the present method in comparison with a sequential algorithm. Super-linear efficiency is also observed for a range of numbers of CPUs.
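The role of the relaxation factor at a sub-domain interface can be illustrated on a much simpler problem than the Navier-Stokes equations. The sketch below applies a relaxed Dirichlet-Neumann iteration to u'' = 0 on [0, 1] with u(0) = 0 and u(1) = 1, split at x = a. Closed-form sub-domain solutions stand in for the CLIRBF solves of the thesis, and the function name `dirichlet_neumann`, the 1D model problem, and the parameter values are all illustrative assumptions.

```python
def dirichlet_neumann(a, theta, tol=1e-10, max_iter=200):
    """Relaxed Dirichlet-Neumann iteration for u'' = 0, u(0) = 0, u(1) = 1.

    a     : interface position, 0 < a < 1
    theta : relaxation factor applied to the interface value
    Returns (interface_value, iterations), or (None, max_iter) if the
    iteration has not converged within max_iter steps.
    """
    lam = 0.0  # initial guess for u at the interface x = a
    for k in range(1, max_iter + 1):
        # Left sub-domain (Dirichlet at x = a): u1(x) = lam * x / a.
        flux = lam / a
        # Right sub-domain (Neumann at x = a): u2(x) = 1 + flux * (x - 1).
        trace = 1.0 - flux * (1.0 - a)
        # Relaxed transmission condition for the next interface value.
        new_lam = theta * trace + (1.0 - theta) * lam
        if abs(new_lam - lam) < tol:
            return new_lam, k
        lam = new_lam
    return None, max_iter


print(dirichlet_neumann(a=0.3, theta=1.0))  # no relaxation
print(dirichlet_neumann(a=0.3, theta=0.5))  # under-relaxed
```

For this model problem the interface error is multiplied by (1 - theta / a) at every step, so the iteration converges only when 0 < theta < 2a. With a = 0.3 the unrelaxed choice theta = 1 diverges, while theta = 0.5 converges to the exact interface value u(a) = 0.3.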
Finally, when comparing the overlapping and non-overlapping parallel algorithms, it is found that the non-overlapping one is less stable. The numerical results show that the non-overlapping method is not able to converge at high Reynolds numbers, while the overlapping method reaches the same convergence profile as the sequential CLIRBF method. Thus, in this research, the overlapping method is preferred to the non-overlapping one when dealing with non-Newtonian fluids and large-scale problems. The flow of an Oldroyd-B fluid through a planar contraction was considered as a benchmark problem. In this problem, the stress singularity at the re-entrant corners always makes it difficult for numerical methods to obtain stable solutions at high Weissenberg numbers. In this work, a high-resolution simulation of the flow is obtained, and the streamline contours are shown to be in good agreement with other results.
A High-Throughput Hardware Implementation of NAT Traversal For IPSEC VPN
In this paper, we present a high-throughput FPGA implementation of an IPSec core. The core supports both NAT and non-NAT modes and can be used in high-speed security gateway devices. Although IPSec ESP is computationally intensive because of its cryptographic processing, our implementation shows that it can achieve high throughput and low latency. The system is realized on the Zynq XC7Z045 from Xilinx and was verified and tested in practice. Results show that the design gives a peak throughput of 5.721 Gbps for the IPSec ESP tunnel mode in NAT mode and 7.753 Gbps in non-NAT mode using a single AES encryption core. We also compare the performance of the core when running in other encryption modes.
Modal decomposition technique for multimode fibers
We propose a new solution for modal decomposition in multimode fibers, based on a spectral and spatial imaging technique. The appearance of spurious modes in the spectral and spatial processing of the images at the output of the fiber under test when it has more than two modes is demonstrated theoretically. The new method, which allows us to identify spurious modes, is more accurate, simpler, and faster than previously reported methods. For demonstration, measurements in a standard step-index multimode fiber and a small-core microstructured fiber are carried out successfully.
Toward production of jet fuel functionality in oilseeds: identification of FatB acyl-acyl carrier protein thioesterases and evaluation of combinatorial expression strategies in Camelina seeds
Seeds of members of the genus Cuphea accumulate medium-chain fatty acids (MCFAs; 8:0–14:0). MCFA- and palmitic acid- (16:0) rich vegetable oils have received attention for jet fuel production, given their similarity in chain length to Jet A fuel hydrocarbons. Studies were conducted to test genes, including those from Cuphea, for their ability to confer jet fuel-type fatty acid accumulation in seed oil of the emerging biofuel crop Camelina sativa. Transcriptomes from Cuphea viscosissima and Cuphea pulcherrima developing seeds that accumulate >90% of C8 and C10 fatty acids revealed three FatB cDNAs (CpuFatB3, CvFatB1, and CpuFatB4) expressed predominantly in seeds and structurally divergent from typical FatB thioesterases that release 16:0 from acyl carrier protein (ACP). Expression of CpuFatB3 and CvFatB1 resulted in Camelina oil with capric acid (10:0), and CpuFatB4 expression conferred myristic acid (14:0) production and increased 16:0. Co-expression of combinations of previously characterized Cuphea and California bay FatBs produced Camelina oils with mixtures of C8–C16 fatty acids, but amounts of each fatty acid were less than obtained by expression of individual FatB cDNAs. Increases in lauric acid (12:0) and 14:0, but not 10:0, in Camelina oil and at the sn-2 position of triacylglycerols resulted from inclusion of a coconut lysophosphatidic acid acyltransferase specialized for MCFAs. RNA interference (RNAi) suppression of Camelina β-ketoacyl-ACP synthase II, however, reduced 12:0 in seeds expressing a 12:0-ACP-specific FatB. Camelina lines presented here provide platforms for additional metabolic engineering targeting fatty acid synthase and specialized acyltransferases for achieving oils with high levels of jet fuel-type fatty acids.