
    Compressed Genotyping

    Significant volumes of knowledge have been accumulated in recent years linking subtle genetic variations to a wide variety of medical disorders from Cystic Fibrosis to mental retardation. Nevertheless, there are still great challenges in applying this knowledge routinely in the clinic, largely due to the relatively tedious and expensive process of DNA sequencing. Since the genetic polymorphisms that underlie these disorders are relatively rare in the human population, the presence or absence of a disease-linked polymorphism can be thought of as a sparse signal. Using methods and ideas from compressed sensing and group testing, we have developed a cost-effective genotyping protocol. In particular, we have adapted our scheme to a recently developed class of high-throughput DNA sequencing technologies, and assembled a mathematical framework that has some important distinctions from 'traditional' compressed sensing ideas in order to address different biological and technical constraints. Comment: Submitted to IEEE Transactions on Information Theory - Special Issue on Molecular Biology and Neuroscience
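    To make the sparse-signal framing concrete, here is a minimal group-testing sketch in the spirit of the abstract: rare carriers are recovered from far fewer pooled tests than individuals, using a simple COMP-style decoder (anyone appearing in a negative pool is ruled out). The random pooling design, pool counts and decoder are illustrative assumptions, not the authors' sequencing-adapted scheme.

```python
# Minimal group-testing sketch: rare carriers form a sparse signal, so
# testing pools (rather than individuals) and decoding recovers a small
# candidate set with far fewer assays. This is a generic COMP-style
# decoder, not the sequencing-specific protocol of the paper.
import numpy as np

rng = np.random.default_rng(0)
n_individuals, n_pools, n_carriers = 1000, 100, 5

# Random pooling design: each individual joins roughly 20 of the 100 pools.
design = rng.random((n_pools, n_individuals)) < 0.2
carriers = rng.choice(n_individuals, size=n_carriers, replace=False)
x = np.zeros(n_individuals, dtype=bool)
x[carriers] = True

# A pool reads "positive" if it contains at least one carrier.
pool_positive = (design.astype(int) @ x.astype(int)) > 0

# COMP decoding: anyone who appears in a negative pool cannot be a carrier.
candidate = np.ones(n_individuals, dtype=bool)
for pool, positive in enumerate(pool_positive):
    if not positive:
        candidate[design[pool]] = False

print("true carriers:     ", sorted(carriers))
print("candidate carriers:", np.flatnonzero(candidate).tolist())
```

    The individuals not ruled out form a small candidate superset of the true carriers; any remaining ambiguity can be resolved by a handful of confirmatory tests.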

    Metabolomics: a tool for studying plant biology

    In recent years new technologies have allowed gene expression, protein and metabolite profiles in different tissues and developmental stages to be monitored. This is an emerging field in plant science and is applied to diverse plant systems in order to elucidate the regulation of growth and development. The goal in plant metabolomics is to analyze, identify and quantify all low molecular weight molecules of plant organisms. The plant metabolites are extracted and analyzed using various sensitive analytical techniques, usually mass spectrometry (MS) in combination with chromatography. In order to compare the metabolome of different plants in a high-throughput manner, a number of biological, analytical and data processing steps have to be performed. In the work underlying this thesis we developed a fast and robust method for routine analysis of plant metabolite patterns using Gas Chromatography-Mass Spectrometry (GC/MS). The method was developed according to Design of Experiments (DoE) to investigate factors affecting the extraction and derivatization of the metabolites from leaves of the plant Arabidopsis thaliana. The outcome of metabolic analysis by GC/MS is a complex mixture of approximately 400 overlapping peaks. Resolving (deconvoluting) overlapping peaks is time-consuming, difficult to automate, and additional processing is needed in order to compare samples. To avoid deconvolution becoming a major bottleneck in high-throughput analyses, we developed a new semi-automated strategy using hierarchical methods for processing GC/MS data that can be applied to all samples simultaneously. The methods include baseline correction of the non-processed MS data files, alignment, time-window determination, Alternating Regression and multivariate analysis in order to detect metabolites that differ in relative concentration between samples. The developed methodology was applied to study the effects of the plant hormone gibberellin (GA) on the metabolome, with specific emphasis on auxin levels in Arabidopsis thaliana mutants defective in GA biosynthesis and signalling. A large series of plant samples was analysed and the resulting data were processed in less than one week with minimal labour, similar to the time required for the GC/MS analyses of the samples.
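    As a concrete illustration of two of the pre-processing steps mentioned above (baseline correction and retention-time alignment of GC/MS traces), here is a minimal sketch using deliberately simple stand-ins: a rolling-minimum baseline and a brute-force integer shift against a reference chromatogram. It is not the hierarchical / Alternating Regression strategy developed in the thesis; function names, window sizes and the synthetic traces are illustrative.

```python
# Simple stand-ins for baseline correction and retention-time alignment
# of GC/MS traces; not the thesis' hierarchical processing strategy.
import numpy as np

def baseline_correct(trace: np.ndarray, window: int = 51) -> np.ndarray:
    """Subtract a rolling-minimum estimate of the baseline."""
    pad = window // 2
    padded = np.pad(trace, pad, mode="edge")
    baseline = np.array([padded[i:i + window].min() for i in range(len(trace))])
    return trace - baseline

def align_to_reference(trace: np.ndarray, reference: np.ndarray,
                       max_shift: int = 50) -> np.ndarray:
    """Find the integer shift that best matches the reference (brute force)."""
    shifts = range(-max_shift, max_shift + 1)
    best = max(shifts, key=lambda s: np.dot(np.roll(trace, s), reference))
    return np.roll(trace, best)

# Synthetic example: a shifted, offset, noisy peak re-aligned to a reference.
rng = np.random.default_rng(1)
t = np.linspace(0, 10, 500)
reference = np.exp(-(t - 5.0) ** 2 / 0.05)
sample = np.exp(-(t - 5.4) ** 2 / 0.05) + 0.2 + 0.01 * rng.standard_normal(500)
aligned = align_to_reference(baseline_correct(sample), reference)
```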

    A Cost Impact Assessment Tool for PFS Logistics Consulting

    Response surface methodology (RSM) is used for optimality analysis of the cost parameters in mixed integer linear programming. This optimality analysis goes beyond traditional sensitivity and parametric analysis in allowing investigation of the optimal objective function value response over pre-specified ranges on multiple problem parameters. Design of experiments and least squares regression are used to indicate which cost parameters have the greatest impact on the optimal objective function value (total cost) and to approximate the optimal total cost surface over the specified ranges on the parameters. The mixed integer linear programming problems of interest are the large-scale problems in supply chain optimization, also known as facility location and allocation problems. Furthermore, this optimality analysis technique applies to optimality analysis of costs or right-hand-side elements in continuous linear programs and optimality analysis of costs in mixed or pure integer linear programs. A system which automates this process for supply chain optimization at PFS Logistics Consulting is also detailed, along with a description of its application and impact in their daily operations.
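    The following sketch illustrates the core idea as described: evaluate the optimal objective value of a facility location problem at designed settings of two cost parameters, then fit a quadratic response surface by least squares. The tiny brute-forced two-facility problem, the parameter ranges and the 3x3 factorial design are assumptions for illustration, not PFS data or the production tool.

```python
# Response-surface sketch: treat the MILP's optimal cost as a black-box
# response, evaluate it over a designed grid of two cost parameters, and
# fit a quadratic surface by least squares. The toy brute-force "solver"
# stands in for a real facility location MILP.
import itertools
import numpy as np

def optimal_cost(fixed_cost: float, transport_rate: float) -> float:
    """Brute-force a toy 2-facility / 3-customer location-allocation problem."""
    demand = np.array([10.0, 20.0, 15.0])
    dist = np.array([[1.0, 4.0, 3.0],    # facility 0 to customers
                     [5.0, 1.0, 2.0]])   # facility 1 to customers
    best = np.inf
    for open_set in itertools.product([0, 1], repeat=2):
        if not any(open_set):
            continue
        served = dist[np.array(open_set, dtype=bool)].min(axis=0)
        cost = fixed_cost * sum(open_set) + transport_rate * (served * demand).sum()
        best = min(best, cost)
    return best

# 3x3 factorial design over the specified ranges of the two parameters.
levels = [-1.0, 0.0, 1.0]
centre, spread = np.array([100.0, 2.0]), np.array([50.0, 1.0])
runs = np.array(list(itertools.product(levels, levels)))
y = np.array([optimal_cost(*(centre + r * spread)) for r in runs])

# Quadratic response surface: 1, x1, x2, x1*x2, x1^2, x2^2.
X = np.column_stack([np.ones(len(runs)), runs[:, 0], runs[:, 1],
                     runs[:, 0] * runs[:, 1], runs[:, 0] ** 2, runs[:, 1] ** 2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("fitted surface coefficients:", np.round(coef, 2))
```

    The fitted coefficients indicate which cost parameter (or interaction) moves the optimal total cost the most over the studied ranges, which is the screening information the tool is meant to provide.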

    A new pooling strategy for high-throughput screening: the Shifted Transversal Design

    BACKGROUND: In binary high-throughput screening projects where the goal is the identification of low-frequency events, beyond the obvious issue of efficiency, false positives and false negatives are a major concern. Pooling constitutes a natural solution: it reduces the number of tests, while providing critical duplication of the individual experiments, thereby correcting for experimental noise. The main difficulty consists in designing the pools in a manner that is both efficient and robust: few pools should be necessary to correct the errors and identify the positives, yet the experiment should not be too vulnerable to biological variability. For example, some information should still be obtained even if there are slightly more positives or errors than expected. This is known as the group testing problem, or pooling problem. RESULTS: In this paper, we present a new non-adaptive combinatorial pooling design: the "shifted transversal design" (STD). It relies on arithmetic, and rests on two intuitive ideas: minimizing the co-occurrence of objects, and constructing pools of constant-sized intersections. We prove that it allows unambiguous decoding of noisy experimental observations. This design is highly flexible, and can be tailored to function robustly in a wide range of experimental settings (i.e., numbers of objects, fractions of positives, and expected error rates). Furthermore, we show that our design compares favorably, in terms of efficiency, to the previously described non-adaptive combinatorial pooling designs. CONCLUSION: This method is currently being validated by field-testing in the context of yeast two-hybrid interactome mapping, in collaboration with Marc Vidal's lab at the Dana-Farber Cancer Institute. Many similar projects could benefit from using the Shifted Transversal Design.
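    For concreteness, here is a small sketch of the STD pool construction as I read it: with q prime, write each object in base q and, in layer j, send the object with digits (x_0, ..., x_G) to pool (sum_c x_c * j^c) mod q; an extra layer indexed by the top digit can be appended. The parameter choices below (n = 100 objects, q = 7, 4 layers) are purely illustrative; the paper specifies how q and the number of layers are chosen to guarantee decoding for given numbers of positives and errors.

```python
# Sketch of the Shifted Transversal Design construction (illustrative
# parameters, see the paper for how q and the layer count are chosen).
import math

def std_pools(n: int, q: int, layers: int) -> dict:
    """Return {(layer, pool): [object ids]} for an STD-style pooling design."""
    assert layers <= q + 1, "at most q + 1 layers are available"
    gamma = math.ceil(math.log(n, q)) - 1            # smallest G with q**(G+1) >= n
    pools = {(j, s): [] for j in range(layers) for s in range(q)}
    for x in range(n):
        digits = [(x // q ** c) % q for c in range(gamma + 1)]
        for j in range(layers):
            if j < q:
                s = sum(d * pow(j, c, q) for c, d in enumerate(digits)) % q
            else:                                    # the special (q+1)-th layer
                s = digits[gamma]
            pools[(j, s)].append(x)
    return pools

pools = std_pools(n=100, q=7, layers=4)              # 4 layers of 7 pools = 28 tests
print({k: v for k, v in list(pools.items())[:3]})
```

    Each layer partitions the 100 objects into 7 pools, and the polynomial assignment keeps the pairwise co-occurrence of objects low across layers, which is what makes noisy decoding possible.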

    Statistical Mutation Calling from Sequenced Overlapping DNA Pools in TILLING Experiments

    Background: TILLING (Targeting Induced Local Lesions IN Genomes) is an efficient reverse genetics approach for detecting induced mutations in pools of individuals. Combined with the high throughput of next-generation sequencing technologies and the resolving power of overlapping pool designs, TILLING provides an efficient and economical platform for functional genomics across thousands of organisms.

    Results: We propose a probabilistic method for calling TILLING-induced mutations, and their carriers, from high-throughput sequencing data of overlapping population pools, where each individual occurs in two pools. We assign a probability score to each sequence position by applying Bayes' theorem to a simplified binomial model of sequencing error and expected mutations, taking into account the coverage level. We test the performance of our method on variable-quality, high-throughput sequences from wheat and rice mutagenized populations.

    Conclusions: We show that our method effectively discovers mutations in large populations with a sensitivity of 92.5% and a specificity of 99.8%. It also outperforms existing SNP detection methods in detecting real mutations, especially at higher levels of coverage variability across sequenced pools and in lower-quality short-read sequence data. The implementation of our method is available from http://www.cs.ucdavis.edu/filkov/CAMBa/.
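    A stripped-down version of the scoring idea reads as follows: compare the likelihood of the observed non-reference read count under "sequencing error only" against "one heterozygous mutant diluted in the pool", weight by a prior mutation rate, and do so for both pools containing the individual. The error rate, pool size, prior and read counts below are illustrative assumptions; this is not the CAMBa implementation itself.

```python
# Simplified Bayes-rule scoring for one position in one pooled library:
# likelihood of the non-reference read count under error-only vs a single
# heterozygous carrier, weighted by a prior mutation rate.
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def mutation_posterior(k: int, n: int, pool_size: int,
                       error_rate: float = 0.01, prior: float = 1e-4) -> float:
    """P(mutation | k non-reference reads out of n) for one pooled library."""
    p_mut = 1.0 / (2 * pool_size)          # allele fraction of one heterozygote
    like_error = binom_pmf(k, n, error_rate)
    like_mut = binom_pmf(k, n, p_mut)
    return prior * like_mut / (prior * like_mut + (1 - prior) * like_error)

# Each individual occurs in two pools; a real call is supported by both.
row = mutation_posterior(k=12, n=200, pool_size=8)
col = mutation_posterior(k=11, n=180, pool_size=8)
print(f"row pool: {row:.3f}  column pool: {col:.3f}")
```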

    Process development using oscillatory baffled mesoreactors

    PhD Thesis

    The mesoscale oscillatory baffled reactor (meso-OBR) is a flow chemistry platform whose niche is the ability to convert long-residence-time batch processes to continuous processes. This reactor can rapidly screen reaction kinetics or optimise a reaction in flow with minimal waste. In this work, several areas were identified that could be addressed to broaden the applicability of this platform. Four main research themes were subsequently formulated and explored: (I) development of a deeper understanding of the fluid mechanics in meso-OBRs, (II) development of a new hybrid heat pipe meso-OBR for improved thermal management, (III) further improvement of continuous screening using meso-OBRs by removing the solvent and employing better experiment design methodologies, and (IV) exploration of 3D printing for rapid reactor development.

    I. The flow structures in a meso-OBR containing different helical baffle geometries were studied using computational fluid dynamics simulations, validated by particle image velocimetry (PIV) experiments for the first time. It was demonstrated, using new quantification methods for the meso-OBR, that when using helical baffles swirling is responsible for providing a wider operating window for plug flow than other baffle designs. Further, a new flow regime resembling a Taylor-Couette flow was discovered that further improved the plug flow response. This new double vortex regime could conceivably improve multiphase mixing and enable flow measurements (e.g. using thermocouples inside the reactor) to be conducted without degrading the mixing condition. This work also provides a new framework for validating simulated OBR flows using PIV, by quantitatively comparing turbulent flow features instead of qualitatively comparing average velocity fields.

    II. A new hybrid heat pipe meso-OBR (HPOBR) was prototyped to provide better thermal control of the meso-OBR by exploiting the rapid and isothermal properties of the heat pipe. This new HPOBR was compared with a jacketed meso-OBR (JOBR) for the thermal control of an exothermic imination reaction conducted without a solvent. Without a solvent or thermal control scheme, this reaction exceeded the boiling point of one of the reactants. A central composite experiment design explored the effects of reactant net flow rate, oscillation intensity and cooling capacity on the thermal and chemical response of the reaction. The HPOBR was able to passively control the temperature below the boiling point of the reactant at all conditions through heat spreading. Overall, a combined 260-fold improvement in throughput was demonstrated compared to a reactor requiring the use of a solvent. Thus, this wholly new reactor design provides a new approach to achieving green chemistry that could, in principle, be adapted to other reactions.

    III. Analysis of in situ Fourier transform infrared (FTIR) spectroscopic data also suggested that the reaction kinetics of this solventless imination case study could be screened for the first time using the HPOBR and JOBR. This was tested by applying flow-screening protocols that adjusted the reactant molar ratio, residence time and temperature in a single flow experiment. Both reactor configurations were able to screen the Arrhenius kinetics parameters (pre-exponential factors, activation energies, and equilibrium constants) of both steps of the imination reaction. By defining experiment conditions using design of experiments (DoE) methodologies, a theoretical reduction of more than 70% in material usage and time requirement for screening was achieved compared to the previous state-of-the-art screening using meso-OBRs in the literature. Additionally, it was discovered that thermal effects on the reaction could be inferred by changing other operating conditions such as molar ratio and residence time. This further simplifies the screening protocols by eliminating the need for active temperature control strategies (such as a jacket).

    IV. Finally, potential application areas for further development of the meso-OBR platform using 3D printing were devised. These areas conformed to different "hierarchies" of complexity, from new baffle structures (simplest) to entirely new methods for achieving mixing (most complex). This latter option was adopted as a case study, where the passively generated pulsatile flows of fluidic oscillators were tested for the first time as a means for improving plug flow. Improved plug flow behaviour was indeed demonstrated in three different standard reactor geometries (plain, baffled and coiled tubes), where it could be inferred that axial dispersion was decoupled from the secondary flows in an analogous manner to the OBR. The results indicate that these devices could be the basis for a new flow chemistry platform that requires no moving parts, which would be appealing for various industrial applications. It is concluded that, for the meso-OBR platform to remain relevant in the next era of tailor-made reactors (with rapid uptake of 3D printing), the identified areas where 3D printing could benefit the meso-OBR should be further explored.
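    As a small illustration of the kinetics screening in theme III, the sketch below fits Arrhenius parameters by linear least squares on ln k versus 1/T. The rate constants are made-up numbers purely to show the fit, not data from the thesis, and the real screening extracted rate constants from in situ FTIR profiles at varying flow conditions.

```python
# Linearised Arrhenius fit: ln k = ln A - Ea / (R * T).
import numpy as np

R = 8.314                                      # J / (mol K)
T = np.array([298.0, 313.0, 328.0, 343.0])     # steady-state temperatures, K
k = np.array([2.1e-4, 6.0e-4, 1.5e-3, 3.4e-3]) # illustrative rate constants

slope, intercept = np.polyfit(1.0 / T, np.log(k), 1)
Ea = -slope * R                                # activation energy, J/mol
A = np.exp(intercept)                          # pre-exponential factor
print(f"Ea = {Ea / 1000:.1f} kJ/mol, A = {A:.2e} s^-1")
```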

    Logical analysis of sample pooling for qualitative analytical testing

    When the prevalence of positive samples in a whole population is low, the pooling of samples to detect them has been widely used for epidemic control. However, its usefulness for applying analytical screening procedures in food safety (microbiological or allergen control), fraud detection or environmental monitoring is also evident. The expected number of tests per individual sample that is necessary to identify all 'positives' is a measure of the efficiency of a sample pooling strategy. Reducing this figure is key to an effective use of available resources in environmental control and food safety. This reduction becomes critical when the availability of analytical tests is limited, as the SARS-CoV-2 pandemic showed. The outcome of the qualitative analytical test is binary. Therefore, the operation governing the outcome of the pooled samples is not an algebraic sum of the individual results but the logical disjunction ('or' in natural language). Consequently, the problem of using pooled samples to identify positive samples naturally leads to a system of logical equations. This work therefore suggests a new sample pooling strategy based on: i) a half-fraction of a Plackett-Burman design to form the pooled samples, and ii) logical, rather than numerical, resolution to identify the positive samples from the outcomes of the analysis of the pooled samples. For a prevalence of positives equal to 0.05 and 10 original samples to be pooled, the algorithm presented here results in an expected number of tests per individual sample equal to 0.37, meaning a 63% reduction in the expected number of tests per individual sample. With sensitivities and specificities of the analytical test ranging from 0.90 to 0.99, the expected number of tests per individual ranges from 0.332 to 0.416, always higher than other pooled testing algorithms. In addition, the accuracy of the algorithm proposed is better than or similar to that of other published algorithms, with an expected percentage of hits ranging from 99.16% to 99.90%. The procedure is applied to the detection of food samples contaminated with a pathogen (Listeria monocytogenes) and others contaminated with an allergen (pistachio) by means of a polymerase chain reaction (PCR) test. This work was supported by Consejería de Educación de la Junta de Castilla y León through project BU052P20, co-financed with European Regional Development Funds. The authors thank Dr. Laura Rubio for applying the double-blind protocol to dope the samples and AGROLAB S.L.U., Burgos (Spain) for the careful preparation of the pooled samples.
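    The decoding step can be pictured with the following sketch of logical resolution for OR-combined pools: membership in a negative pool proves a sample negative, and a positive pool whose other members are all proven negative forces its last unresolved member positive. The small 0/1 design matrix is a placeholder, not the half-fraction Plackett-Burman design of the paper, and the rules shown ignore test sensitivity and specificity.

```python
# Logical decoding of OR-combined pool outcomes (simplified, error-free rules).
import numpy as np

def logical_decode(design: np.ndarray, outcomes: np.ndarray):
    """Return (proven_negative, proven_positive, undetermined) index lists."""
    n = design.shape[1]
    negative = np.zeros(n, dtype=bool)
    positive = np.zeros(n, dtype=bool)
    # Rule 1: every member of a negative pool is negative.
    for pool in np.flatnonzero(~outcomes):
        negative[design[pool].astype(bool)] = True
    # Rule 2 (iterated): a positive pool with one unresolved member resolves it.
    changed = True
    while changed:
        changed = False
        for pool in np.flatnonzero(outcomes):
            members = np.flatnonzero(design[pool])
            unresolved = [m for m in members if not negative[m] and not positive[m]]
            if len(unresolved) == 1 and not positive[members].any():
                positive[unresolved[0]] = True
                changed = True
    undetermined = [i for i in range(n) if not negative[i] and not positive[i]]
    return (np.flatnonzero(negative).tolist(),
            np.flatnonzero(positive).tolist(), undetermined)

design = np.array([[1, 1, 0, 0, 1],      # pool 0 contains samples 0, 1, 4
                   [0, 1, 1, 0, 0],
                   [1, 0, 1, 1, 0],
                   [0, 0, 0, 1, 1]])
truth = np.array([False, False, False, True, False])
outcomes = (design @ truth.astype(int)) > 0   # each pool is the OR of its members
print(logical_decode(design, outcomes))       # sample 3 is identified as positive
```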

    Identification of crop cultivars with consistently high lignocellulosic sugar release requires the use of appropriate statistical design and modelling

    Background: In this study, a multi-parent population of barley cultivars was grown in the field for two consecutive years and straw saccharification (sugar release by enzymes) was subsequently analysed in the laboratory to identify the cultivars with the highest consistent sugar yield. This experiment was used to assess the benefit of accounting for both the multi-phase and multi-environment aspects of large-scale phenotyping experiments with field-grown germplasm through sound statistical design and analysis.

    Results: Complementary designs at both the field and laboratory phases of the experiment ensured that non-genetic sources of variation could be separated from the genetic variation of cultivars, which was the main target of the study. The field phase included biological replication and plot randomisation. The laboratory phase employed re-randomisation and technical replication of samples within a batch, with a subset of cultivars chosen as duplicates that were randomly allocated across batches. The resulting data were analysed using a linear mixed model that incorporated field and laboratory variation and a cultivar-by-trial interaction, and ensured that the cultivar means were estimated more accurately than if the non-genetic variation were ignored. Accounting for the non-genetic variation in the analysis more than doubled the heritability detected in each year of the trial, clearly showing the benefit of this design and approach.

    Conclusions: The importance of accounting for both field and laboratory variation, as well as the cultivar-by-trial interaction, by fitting a single statistical model (a multi-environment trial, MET, model) was evidenced by the changes in the list of the top 40 cultivars showing the highest sugar yields. Failure to account for this interaction resulted in only eight cultivars that were consistently in the top 40 in different years, whereas the MET model raised this correspondence to 25 cultivars. This approach is suited to any multi-phase and multi-environment population-based genetic experiment.
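    A minimal sketch of the kind of linear mixed model described (fixed cultivar and year effects with random terms absorbing field and laboratory structure) is shown below using statsmodels. The column names and the data file are hypothetical, and the published MET analysis is richer than this two-term random structure.

```python
# Sketch of a mixed model separating genetic from non-genetic variation;
# column names and file are assumptions for illustration.
import pandas as pd
import statsmodels.formula.api as smf

data = pd.read_csv("saccharification.csv")   # hypothetical tidy data file

model = smf.mixedlm(
    "sugar ~ C(cultivar) * C(year)",         # cultivar effect and its year interaction
    data,
    groups=data["field_block"],              # random intercept per field block
    vc_formula={"lab_batch": "0 + C(lab_batch)"},  # variance component for lab batches
)
fit = model.fit()
print(fit.summary())
```

    The point of the random terms is that cultivar means are estimated after field-block and laboratory-batch effects have been absorbed, which is what raised the heritability in the study.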

    Locating Arrays: Construction, Analysis, and Robustness

    Modern computer systems are complex engineered systems involving a large collection of individual parts, each with many parameters, or factors, affecting system performance. One way to understand these complex systems and their performance is through experimentation. However, most modern computer systems involve such a large number of factors that thorough experimentation on all of them is impossible. An initial screening step is thus necessary to determine which factors are relevant to the system's performance and which factors can be eliminated from experimentation. Factors may impact system performance in different ways. A factor at a specific level may significantly affect performance as a main effect, or in combination with other main effects as an interaction. For screening, it is necessary both to identify the presence of these effects and to locate the factors responsible for them. A locating array is a relatively new experimental design that causes every main effect and interaction to occur and distinguishes all sets of d main effects and interactions from each other in the tests where they occur. This design is therefore helpful in screening complex systems. The process of screening using locating arrays involves multiple steps. First, a locating array is constructed for all possibly significant factors. Next, the system is executed for all tests indicated by the locating array and a response is observed. Finally, the response is analyzed to identify the significant system factors for future experimentation. However, simply constructing a reasonably sized locating array for a large system is no easy task, and analyzing the response of the tests presents additional difficulties due to the large number of possible predictors and the inherent imbalance in the experimental design itself. Further complications can arise from noise in the system or errors in testing. This thesis has three contributions. First, it provides an algorithm to construct locating arrays using the Lovász Local Lemma with Moser-Tardos resampling. Second, it gives an algorithm to analyze the system response efficiently. Finally, it studies the robustness of the analysis to the heavy-hitters assumption underlying the approach as well as to varying amounts of system noise. (Masters Thesis, Computer Engineering)
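    To make the locating property concrete, the sketch below checks the simplest case (single main effects): an array is locating in this restricted sense only if every factor-level pair is covered by a distinct set of test rows, so that an observed response pattern points to a unique candidate effect. The toy array is illustrative and was not generated by the Lovász Local Lemma / Moser-Tardos construction described in the thesis.

```python
# Check the locating property for single main effects (d = 1, t = 1):
# every (factor, level) pair must appear in a distinct set of test rows.
def is_locating_d1(array: list[list[int]]) -> bool:
    n_factors = len(array[0])
    rows_for = {}
    for f in range(n_factors):
        for level in {row[f] for row in array}:
            rows_for[(f, level)] = frozenset(
                i for i, row in enumerate(array) if row[f] == level)
    # Distinct covering sets mean single main effects are distinguishable.
    return len(set(rows_for.values())) == len(rows_for)

tests = [[0, 0, 0],
         [0, 1, 1],
         [1, 0, 1],
         [1, 1, 0],
         [1, 1, 1]]
print(is_locating_d1(tests))   # True for this toy 5-test, 3-factor array
```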