46 research outputs found

    cDNA2Genome: A tool for mapping and annotating cDNAs

    BACKGROUND: In recent years, several high-throughput cDNA sequencing projects have been funded worldwide with the aim of identifying and characterizing the structure of complete novel human transcripts. However, some of these cDNAs are error-prone due to frameshifts and stop codon errors caused by low sequence quality, or to cloning of truncated inserts, among other reasons. Therefore, accurate CDS prediction from these sequences first requires the identification of potentially problematic cDNAs in order to speed up the subsequent annotation process. RESULTS: cDNA2Genome is an application for the automatic high-throughput mapping and characterization of cDNAs. It utilizes current annotation data and the most up-to-date databases, especially in the case of ESTs and mRNAs, in conjunction with a vast number of approaches to gene prediction in order to perform a comprehensive assessment of the cDNA exon-intron structure. The final result of cDNA2Genome is an XML file containing all relevant information obtained in the process. This XML output can easily be used for further analysis, such as program pipelines, or for the integration of results into databases. The web interface to cDNA2Genome also presents this data in HTML, where the annotation is additionally shown in graphical form. cDNA2Genome has been implemented under the W3H task framework, which allows the combination of bioinformatics tools in tailor-made analysis task flows as well as the sequential or parallel computation of many sequences for large-scale analysis. CONCLUSIONS: cDNA2Genome represents a new, versatile and easily extensible approach to the automated mapping and annotation of human cDNAs. The underlying approach allows sequential or parallel computation of sequences for high-throughput analysis of cDNAs.

    Bioinformatic workflows : G-PIPE as an implementation

    We present G-PIPE, a graphic pipeline generator for PISE that allows the definition of pipelines, parameterization of their component methods, and storage of metadata in XML formats. Our implementation goes beyond the macro capacities currently in PISE. As the entire analysis protocol is defined in XML, a complete bioinformatic experiment (linked sets of methods, parameters and results) can be reproduced or shared among users. We also discuss the role of ontologies as guidance systems that give users the possibility to define abstract workflows and execute them. A relevant baseline ontology is presented. Availability: http://if-web.imb.uq.edu.a

    GOPET: A tool for automated predictions of Gene Ontology terms

    BACKGROUND: Vast progress in sequencing projects has called for annotation on a large scale. A number of methods have been developed to address this challenging task. These methods, however, either apply only to specific subsets, or their predictions are not formalised, or they do not provide precise confidence values for their predictions. DESCRIPTION: We recently established a learning system for automated annotation, trained with a broad variety of different organisms to predict the standardised annotation terms from Gene Ontology (GO). This method has now been made available to the public via our web service GOPET (Gene Ontology term Prediction and Evaluation Tool). It supplies annotation for sequences of any organism. For each predicted term an appropriate confidence value is provided. The basic method was developed for predicting molecular function GO terms. It has now been expanded to predict biological process terms. This web service is available via CONCLUSION: Our web service gives experimental researchers as well as the bioinformatics community a valuable sequence annotation device. Additionally, GOPET also provides less significant annotation data, which may serve as an extended discovery platform for the user.

    Workflows in bioinformatics: meta-analysis and prototype implementation of a workflow generator

    BACKGROUND: Computational methods for problem solving need to interleave information access and algorithm execution in a problem-specific workflow. The structures of these workflows are defined by a scaffold of syntactic, semantic and algebraic objects capable of representing them. Despite the proliferation of GUIs (Graphic User Interfaces) in bioinformatics, only some of them provide workflow capabilities; surprisingly, no meta-analysis of workflow operators and components in bioinformatics has been reported. RESULTS: We present a set of syntactic components and algebraic operators capable of representing analytical workflows in bioinformatics. Iteration, recursion, the use of conditional statements, and management of suspend/resume tasks have traditionally been implemented on an ad hoc basis and hard-coded; by having these operators properly defined it is possible to use and parameterize them as generic re-usable components. To illustrate how these operations can be orchestrated, we present GPIPE, a prototype graphic pipeline generator for PISE that allows the definition of a pipeline, parameterization of its component methods, and storage of metadata in XML formats. This implementation goes beyond the macro capacities currently in PISE. As the entire analysis protocol is defined in XML, a complete bioinformatic experiment (linked sets of methods, parameters and results) can be reproduced or shared among users. Availability: (interactive), (download). CONCLUSION: From our meta-analysis we have identified syntactic structures and algebraic operators common to many workflows in bioinformatics. The workflow components and algebraic operators can be assimilated into re-usable software components. GPIPE, a prototype implementation of this framework, provides a GUI builder to facilitate the generation of workflows and integration of heterogeneous analytical tools
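The idea of defining workflow operators once and reusing them as generic components can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `sequence`, `conditional` and `iterate`, and the toy sequence methods, are hypothetical and are not taken from GPIPE or PISE, which store their protocols in XML.

```python
from typing import Any, Callable, Iterable, List

Step = Callable[[Any], Any]


def sequence(*steps: Step) -> Step:
    """Sequential composition: feed the output of each step to the next."""
    def run(value: Any) -> Any:
        for step in steps:
            value = step(value)
        return value
    return run


def conditional(predicate: Callable[[Any], bool], if_true: Step, if_false: Step) -> Step:
    """Conditional operator: choose a branch based on the current value."""
    def run(value: Any) -> Any:
        return if_true(value) if predicate(value) else if_false(value)
    return run


def iterate(step: Step) -> Callable[[Iterable[Any]], List[Any]]:
    """Iteration operator: apply the same step to every input."""
    def run(values: Iterable[Any]) -> List[Any]:
        return [step(v) for v in values]
    return run


# Toy "methods" standing in for real analysis tools (hypothetical).
def clean(seq: str) -> str:
    return seq.strip().upper()


def reverse_complement(seq: str) -> str:
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def keep(seq: str) -> str:
    return seq


# Clean each sequence, then reverse-complement only those of length >= 4.
workflow = iterate(
    sequence(clean, conditional(lambda s: len(s) >= 4, reverse_complement, keep))
)
results = workflow([" aacg ", "ac"])
print(results)
```

Because each operator returns an ordinary callable, operators nest freely, which is the property that lets a workflow be parameterized and reused rather than hard-coded.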

    ProtSweep, 2Dsweep and DomainSweep: protein analysis suite at DKFZ

    The wealth of transcript information that has been made publicly available in recent years has led to large pools of individual web sites offering access to bioinformatics software. However, finding out which services exist, what they can or cannot do, how to use them and how to feed results from one service to the next one in the right format can be very time and resource consuming, especially for non-experts

    Learning to Map Natural Language to Executable Programs Over Databases

    Natural language is a fundamental form of information and communication and is becoming the next frontier in computer interfaces. As the amount of data available online has increased exponentially, so has the need for Natural Language Interfaces (NLIs; in this thesis the abbreviation does not refer to natural language inference) that connect the data and the user through natural language, significantly improving the possibility and efficiency of information access for many users besides data experts. All consumer-facing software will one day have a dialogue interface, and this is the next vital leap in the evolution of search engines. Such intelligent dialogue systems should understand the meaning of language grounded in various contexts and generate effective language responses in different forms for information requests and human-computer communication. Developing these intelligent systems is challenging due to (1) limited benchmarks to drive advancements, (2) alignment mismatches between natural language and formal programs, (3) lack of trustworthiness and interpretability, (4) context dependencies in both human conversational interactions and the target programs, and (5) joint language understanding between dialog questions and NLI environments (e.g., databases and knowledge graphs). This dissertation presents several datasets, neural algorithms, and language models to address these challenges for developing deep learning technologies for conversational natural language interfaces (more specifically, NLIs to Databases, or NLIDB). First, to drive advancements towards neural-based conversational NLIs, we design and propose several complex and cross-domain NLI benchmarks, along with introducing several datasets. These datasets enable training large, deep learning models. Evaluation is done on unseen databases (e.g., about course arrangement). Systems must generalize well not only to new SQL queries but also to unseen database schemas to perform well on these tasks.
Furthermore, in real-world applications, users often access information in a multi-turn interaction with the system by asking a sequence of related questions. The users may explicitly refer to or omit previously mentioned entities and constraints and may introduce refinements, additions, or substitutions to what has already been said. Therefore, some of these datasets require systems to model dialog dynamics and generate natural language explanations for user verification. The full dialogue interaction with the system's responses is also important, as this supports clarifying ambiguous questions, verifying returned results, and notifying users of unanswerable or unrelated questions. A robust dialogue-based NLI system that can engage with users by forming its own responses has thus become an increasingly necessary component of the query process. Moreover, this thesis presents the development of scalable algorithms designed to parse complex and sequential questions into formal programs (e.g., mapping questions to SQL queries that can execute against databases). We propose a novel neural model that utilizes type information from knowledge graphs to better understand rare entities and numbers in natural language questions. We also introduce a neural model based on syntax tree neural networks, which was the first methodology proposed for generating complex programs from language. Finally, language modeling creates contextualized vector representations of words by training a model to predict the next word given context words; these representations are the basis of deep learning for NLP. Recently, pre-trained language models such as BERT and RoBERTa have achieved tremendous success in many natural language processing tasks such as text understanding and reading comprehension. However, most language models are pre-trained only on free text such as Wikipedia articles and books.
Given that language in semantic parsing is usually related to some formal representations such as logic forms and SQL queries and has to be grounded in structural environments (e.g., databases), we propose better language models for NLIs by enforcing such compositional interpolation in them. To show that they can better jointly understand dialog questions and NLI environments (e.g., databases and knowledge graphs), we show that these language models achieve new state-of-the-art results on seven representative tasks in semantic parsing, dialogue state tracking, and question answering. Also, our proposed pre-training method is much more effective than prior work.
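To make "mapping questions to SQL queries that can execute against databases" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. Everything specific in it is an assumption for illustration: the `course` table, the question, and the hand-written "predicted" queries stand in for the output of a parser and are not taken from the thesis or its datasets. Comparing execution results is one common way to judge such a mapping, not necessarily the metric the thesis uses.

```python
import sqlite3


def execute(conn: sqlite3.Connection, sql: str) -> list:
    """Run a query and return its rows in a canonical (sorted) order."""
    return sorted(conn.execute(sql).fetchall())


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """True if both queries run and return the same rows."""
    try:
        return execute(conn, predicted_sql) == execute(conn, gold_sql)
    except sqlite3.Error:
        # A query that fails to execute cannot match the reference.
        return False


# Hypothetical toy database about course arrangement.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE course (id INTEGER PRIMARY KEY, title TEXT, credits INTEGER);
    INSERT INTO course VALUES (1, 'Databases', 4);
    INSERT INTO course VALUES (2, 'Compilers', 3);
    INSERT INTO course VALUES (3, 'Algorithms', 4);
    """
)

question = "Which courses are worth more than 3 credits?"
gold_sql = "SELECT title FROM course WHERE credits > 3"

# Stand-ins for parser output: different text, same meaning / wrong / malformed.
equivalent_sql = "SELECT title FROM course WHERE credits >= 4"
wrong_sql = "SELECT title FROM course WHERE credits > 4"
malformed_sql = "SELECT titel FROM course"

match_equivalent = execution_match(conn, equivalent_sql, gold_sql)
match_wrong = execution_match(conn, wrong_sql, gold_sql)
match_malformed = execution_match(conn, malformed_sql, gold_sql)
print(question, match_equivalent, match_wrong, match_malformed)
```

The first candidate differs textually from the reference query yet returns the same rows, which is why execution-based comparison can be more forgiving than string matching.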

    Motivational techniques that aid drivers to choose unselfish routes

    Supervisor: 角

    Interaction of multiphase fluids and solids: theory, algorithms and applications

    Programa Oficial de Doutoramento en Enxeñaría Civil (Official Doctoral Programme in Civil Engineering), 5011V01. Thesis by compendium of publications. [Abstract] The work presented in this thesis is devoted to the study and numerical simulation of Fluid-Structure Interaction (FSI) problems involving complex fluids. The nonlinear and time-dependent nature of FSI problems makes the analytical solution very difficult or even impossible to obtain, requiring the use of experimental analysis and/or numerical simulations. This fact has prompted the development of a great variety of numerical models for the interaction of fluids and solid structures. However, most of the efforts have been focused on classical fluids governed by the Navier-Stokes equations, which cannot capture the physical mechanisms behind complex fluids. Here, we try to fill this gap by proposing several models for the interplay of solids and multi-phase or multi-component flows. The proposed models are then applied to particular problems that spark interest in fields such as engineering, microfabrication and chemistry. In this work, the behavior of the structure is described by the nonlinear equations of elastodynamics and the structure is treated as a hyperelastic solid. Two different constitutive theories are employed: a Neo-Hookean model with dilatational penalty and a Saint Venant-Kirchhoff model. The description of complex fluids is based on the diffuse-interface or phase-field method. In particular, two approaches are adopted. The first one is based on the Navier-Stokes-Korteweg equations, which describe compressible fluids composed of two phases of the same component that may undergo phase transformation, such as water vapor and liquid water. We use this model to study the influence of surface-active agents on droplet coalescence and show that droplet motion may be driven by strain gradients (tensotaxis) of the underlying substrate. We also show several problems of phase-change-driven implosion, in which a thin structure collapses due to the condensation of a fluid.
The second approach is based on the Cahn-Hilliard model, which we couple with the incompressible Navier-Stokes equations. We adopt a stabilization based on the residual-based variational multiscale formulation. This results in a model that describes two-component immiscible flows with surface tension. The potential of this model is illustrated by solving several elastocapillary problems in two and three dimensions, including capillary origami, the static wetting of soft substrates and the deformation of micropillars. As the FSI technique, we adopt a moving-mesh or boundary-fitted approach with matching discretization at the fluid-structure interface. This choice allows the kinematic compatibility conditions to be imposed strongly and results in more accurate solutions at the fluid-solid interface. In particular, we use the Lagrangian description to derive the semi-discrete form of the solid equations and the Arbitrary Lagrangian-Eulerian (ALE) description for the fluid domain. This means that the fluid mesh needs to be updated to accommodate the motion of the structure. For this purpose, we solve an additional linear elasticity problem subject to displacement boundary conditions coming from the motion of the solid. For the spatial discretization of the solid and fluid domains, we adopt Isogeometric Analysis (IGA) based on Non-Uniform Rational B-Splines (NURBS), a generalization of the finite-element method that possesses higher-order global continuity and allows for a more precise geometric representation of complex objects. Regarding the time integration, we use a generalized-α scheme. The nonlinear system of equations is solved using a Newton-Raphson iteration procedure, which leads to a two-stage predictor-multicorrector algorithm. The resulting linear system is solved using a preconditioned GMRES method.
A quasi-direct monolithic formulation is adopted for the solution of the FSI problem; that is, the fluid and solid equations are solved in a coupled fashion, while the mesh motion is solved separately, using data from the fluid-solid solve as input.
[Resumen] The work presented in this thesis is devoted to the study and numerical simulation of fluid-structure interaction (FSI) problems involving complex fluids. The nonlinear and time-dependent nature of FSI problems makes their analytical solution very difficult or even impossible to obtain, requiring the use of experimental analysis and/or numerical simulations. This fact has driven the development of a great variety of numerical models for the interaction of fluids and solid structures. However, most efforts have focused on classical fluids governed by the Navier-Stokes equations, which cannot capture the physical mechanisms behind complex fluids. In this work, we try to fill that gap by proposing several models for the interaction of solids and multiphase and multicomponent fluids. The proposed models are applied to particular problems of great interest in fields such as engineering, microfabrication and chemistry. In this work, the behavior of the structure is described by the equations of nonlinear elastodynamics and the structure is treated as a hyperelastic solid. Two different constitutive theories are employed: a Neo-Hookean model and a Saint Venant model. The description of complex fluids is based on the phase-field or diffuse-interface method. Specifically, two different techniques are adopted. The first is based on the Navier-Stokes-Korteweg equations, which describe compressible fluids composed of two phases of the same component, such as liquid water and water vapor. We use this model to study the role of surfactants in droplet coalescence and to show that droplet motion can be triggered by strain gradients (tensotaxis) of the substrate on which the droplets rest. We also show several examples of phase-change-driven implosion, in which a thin structure collapses due to the condensation of a fluid. The second technique is based on the Cahn-Hilliard model, which we couple with the incompressible Navier-Stokes equations. In this model we adopt a stabilization based on the variational multiscale formulation. This results in a model that describes immiscible two-component flows with surface tension. We illustrate the potential of this model by solving several elastocapillarity problems in two and three dimensions, including capillary origami, the static deformation of soft substrates by droplets, and the deformation of micropillars. As the FSI technique, we adopt a moving-mesh method with matching discretization at the solid-fluid interface. This choice allows the kinematic compatibility conditions to be imposed strongly and yields more accurate results near the solid-fluid interface. Specifically, we use a Lagrangian description to derive the semi-discrete form of the solid equations and an Arbitrary Lagrangian-Eulerian (ALE) description for the fluid domain. This means that the fluid mesh has to be updated to accommodate the motion of the structure. For this purpose, we solve an additional linear elasticity problem in which the boundary conditions are the displacements coming from the motion of the solid. For the spatial discretization of both the solid and fluid domains, we adopt Isogeometric Analysis (IGA) based on Non-Uniform Rational B-Splines (NURBS), a generalization of the finite element method that has higher-order global continuity and allows a more precise geometric representation of complex objects. Regarding time integration, we use a generalized-alpha scheme. The nonlinear system of equations is solved using an iterative Newton-Raphson method, which leads to a two-stage predictor-multicorrector algorithm. The resulting linear system is solved by a preconditioned GMRES method. In addition, a monolithic formulation is adopted for the solution of the FSI problem; that is, the fluid and solid equations are solved in a coupled manner while the mesh motion is solved separately, using the data from the solid-fluid solver as input.
[Resumo] The work presented in this thesis is devoted to the study and numerical simulation of fluid-structure interaction (FSI) problems involving complex fluids. The nonlinear and time-dependent nature of this type of problem makes its analytical solution very difficult or even impossible to obtain, requiring the use of experimental analyses and/or numerical simulations. This fact has led to the development of a great variety of numerical models for the interaction of fluids and solid structures. However, most efforts have concentrated on classical fluids governed by the Navier-Stokes equations, which cannot capture the physical mechanisms behind complex fluids. In this thesis we try to fill this gap by proposing models for the interaction of solids and multiphase and multicomponent liquids. The proposed models are applied to specific problems of great interest in fields such as engineering, microfabrication and chemistry. In this work, the behavior of the structure is described by the equations of nonlinear elastodynamics and the structure is treated as a hyperelastic solid. We use two different constitutive theories: a Neo-Hookean model and a Saint Venant model. The description of the complex fluid is based on the phase-field or diffuse-interface method. Specifically, two different techniques are adopted. The first is based on the Navier-Stokes-Korteweg equations, which describe compressible fluids composed of two phases of a single component, for example liquid water and water vapor. We use this model to study the role of surfactants in droplet coalescence and to demonstrate that droplet motion can be triggered by strain gradients of the substrate on which the droplets rest (tensotaxis). Several examples of phase-change-induced implosion are also shown, in which a thin structure collapses due to the condensation of a fluid. The second technique is based on the Cahn-Hilliard model, which is coupled with the Navier-Stokes equations. In this model we adopt a stabilization based on the variational multiscale formulation. This results in a model that describes immiscible two-component flows with surface tension. We illustrate the potential of this model by solving several elastocapillarity problems in two and three dimensions, including capillary origami, the static deformation of soft substrates by droplets, and the deformation of micropillars. As the FSI technique, we adopt a moving-mesh method with matching discretization at the solid-liquid interface. This choice allows the kinematic compatibility conditions to be imposed strongly and yields more accurate results near the solid-fluid interface. Specifically, we use a Lagrangian description to derive the semi-discrete form of the solid equations and an Arbitrary Lagrangian-Eulerian (ALE) description for the fluid domain. This means that the fluid mesh has to be updated to accommodate the motion of the structure. To do so, we solve an additional linear elasticity problem in which the boundary conditions are the displacements coming from the motion of the solid. For the spatial discretization of both the solid and fluid domains, we adopt Isogeometric Analysis (IGA) based on non-uniform B-splines (NURBS), a generalization of the finite element method that has higher-order global continuity and allows a more precise representation of complex objects. Regarding time integration, we use a generalized-alpha scheme. The system of nonlinear equations is solved by an iterative Newton-Raphson method, which gives rise to a predictor-multicorrector algorithm. The resulting linear system is handled by a preconditioned GMRES method. In addition, we adopt a monolithic formulation for the FSI problem; that is, the fluid and solid equations are solved in a coupled manner while the mesh motion is solved separately, using the data from the solid-fluid solver as input.
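The time-integration strategy named in the abstract above (a generalized-alpha scheme whose nonlinear system is solved by Newton-Raphson in predictor-multicorrector form) can be illustrated on a scalar ODE. This is a minimal sketch under stated assumptions, not the thesis's implementation: it uses the parameter formulas commonly used for first-order systems, written in terms of a spectral radius `rho_inf`, and a scalar division replaces the preconditioned GMRES solve. The function name and arguments are hypothetical.

```python
def generalized_alpha(f, dfdy, y0, dt, n_steps, rho_inf=0.5, n_corr=4, tol=1e-12):
    """Integrate y' = f(y) with a generalized-alpha predictor-multicorrector."""
    # Parameters for a first-order system (assumed choice, second-order accurate).
    alpha_m = 0.5 * (3.0 - rho_inf) / (1.0 + rho_inf)
    alpha_f = 1.0 / (1.0 + rho_inf)
    gamma = 0.5 + alpha_m - alpha_f

    y, ydot = y0, f(y0)
    for _ in range(n_steps):
        # Predictor stage: keep the solution, adjust the rate consistently.
        y_new = y
        ydot_new = (gamma - 1.0) / gamma * ydot

        # Multicorrector stage: Newton-Raphson iterations on the residual.
        for _ in range(n_corr):
            y_af = y + alpha_f * (y_new - y)           # solution at n + alpha_f
            ydot_am = ydot + alpha_m * (ydot_new - ydot)  # rate at n + alpha_m
            residual = ydot_am - f(y_af)
            if abs(residual) < tol:
                break
            # Derivative of the residual with respect to ydot_new.
            tangent = alpha_m - dfdy(y_af) * alpha_f * gamma * dt
            delta = -residual / tangent                # scalar "linear solve"
            ydot_new += delta
            y_new += gamma * dt * delta

        y, ydot = y_new, ydot_new
    return y


# Nonlinear test problem y' = -y^2, y(0) = 1, with exact solution 1 / (1 + t).
y_end = generalized_alpha(lambda y: -y * y, lambda y: -2.0 * y, 1.0, 0.01, 100)
print(y_end)
```

In a full FSI solver the unknowns are vectors and `tangent` becomes a large sparse matrix, which is where an iterative linear solver such as GMRES would replace the scalar division.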