
    Alignment uncertainty, regressive alignment and large scale deployment

    A multiple sequence alignment (MSA) provides a description of the relationship between biological sequences where columns represent a shared ancestry through an implied set of evolutionary events. The majority of research in the field has focused on improving the accuracy of alignments within the progressive alignment framework and has allowed for powerful inferences including phylogenetic reconstruction, homology modelling and disease prediction. Notwithstanding this, when applied to modern genomics datasets, often comprising tens of thousands of sequences, new challenges arise in the construction of accurate MSAs. These issues can be generalised to form three basic problems. Foremost, as the number of sequences increases, progressive alignment methodologies exhibit a dramatic decrease in alignment accuracy. Additionally, for any given dataset many possible MSA solutions exist, a problem of alignment uncertainty that is exacerbated as the number of sequences increases. Finally, technical difficulties hamper the deployment of such genomic analysis workflows, especially in a reproducible manner, often presenting a high barrier for even skilled practitioners. This work aims to address this trifecta of problems through a web server for fast homology-extension-based MSA; two new methods for improved phylogenetic bootstrap supports incorporating alignment uncertainty; a novel alignment procedure, termed regressive MSA, that improves large-scale alignments; and finally a workflow framework, titled Nextflow, that enables the deployment of large-scale reproducible analyses across clusters and clouds.
Together, this work provides both conceptual and technical advances that deliver substantial improvements to existing MSA methods and the resulting inferences.

    An investigation of the airflow in mushroom growing structures, the development of an improved, three-dimensional solution technique for fluid flow and its evaluation for the modelling of mushroom growing structures

    This thesis is an examination of the airflows in mushroom growing rooms. An experimental investigation of the nature of the flows in Irish tunnels showed them to be of low magnitude at the crop but controllable in principle for single-layer growing. It was found that stratification of the airflow in growing tunnels could cause severe reductions in cropping surface airspeed, and the operation of the heating system was identified as the main source of this. An alternative air distribution system was shown to have the potential to overcome the effects of heating. Airflow for three-level growing systems in tunnels was found to be non-uniform, and the use of wall-mounted deflecting plates was shown to have the potential to correct this. The provision of airflow solutions for the wide range of new growing systems would be difficult using empirical methods alone, and therefore a modelling approach was sought to complement and aid the experimental work. The initial modelling work was carried out in two dimensions with the TEACH-T code (SIMPLE flow solver) to calculate the turbulent flow. The code was extended to three dimensions because it was not possible to model usefully in a two-dimensional approximation. Convergence times for the SIMPLE solver were found to be excessively long. Trial applications of multi-level acceleration produced approximately 15% savings in computational effort, so a new solver was investigated. The CELS (Coupled Equation Line Solver) method had been reported as superior to SIMPLE in two dimensions and already had a multi-level technique to accelerate convergence, i.e. Additive Correction Multigrid (ACM). CELS was first applied in two dimensions in order to test its usefulness with the turbulence model in the equation set. Improvements in the time to convergence, relative to SIMPLE, justified its extension to three dimensions.
The Additive Correction Multigrid technique also produced significant improvements and this was extended to three dimensions. CELS3D is essentially a plane solver applied to a three-dimensional grid and a number of procedures for its application were investigated. All produced savings relative to the SIMPLE solver. The QUICK differencing scheme was incorporated in the TEACH-based code and CELS3D was tested with various geometries and values of the Reynolds number. The best results gave a 79% reduction in the time to convergence of the solver. The ACM technique in three dimensions was investigated but no useful savings in computational effort were made. In the application to mushroom growing structures, the principles of the application of CELS3D to flows around obstructions in the flow domain were examined and the difficulties identified. A solution was found but its implementation proved impractical for all but the simplest cases.