cDNA2Genome: A tool for mapping and annotating cDNAs
BACKGROUND: In recent years several high-throughput cDNA sequencing projects have been funded worldwide with the aim of identifying and characterizing the structure of complete novel human transcripts. However, some of these cDNAs are error prone due to frameshifts and stop codon errors caused by low sequence quality, or to cloning of truncated inserts, among other reasons. Therefore, accurate CDS prediction from these sequences first requires the identification of potentially problematic cDNAs in order to speed up the subsequent annotation process. RESULTS: cDNA2Genome is an application for the automatic high-throughput mapping and characterization of cDNAs. It utilizes current annotation data and the most up-to-date databases, especially in the case of ESTs and mRNAs, in conjunction with a vast number of approaches to gene prediction in order to perform a comprehensive assessment of the cDNA exon-intron structure. The final result of cDNA2Genome is an XML file containing all relevant information obtained in the process. This XML output can easily be used for further analysis, such as program pipelines or the integration of results into databases. The web interface to cDNA2Genome also presents this data in HTML, where the annotation is additionally shown in graphical form. cDNA2Genome has been implemented under the W3H task framework, which allows the combination of bioinformatics tools in tailor-made analysis task flows as well as the sequential or parallel computation of many sequences for large-scale analysis. CONCLUSIONS: cDNA2Genome represents a new, versatile and easily extensible approach to the automated mapping and annotation of human cDNAs. The underlying approach allows sequential or parallel computation of sequences for high-throughput analysis of cDNAs.
Bioinformatic workflows : G-PIPE as an implementation
We present G-PIPE, a graphic pipeline generator for PISE that allows the definition of pipelines, parameterization of their component methods, and storage of metadata in XML formats. Our implementation goes beyond the macro capacities currently in PISE. As the entire analysis protocol is defined in XML, a complete bioinformatic experiment (linked sets of methods, parameters and results) can be reproduced or shared among users. We also discuss the role of ontologies as guidance systems that give users the possibility to define abstract workflows and execute them. A relevant baseline ontology is presented. Availability: http://if-web.imb.uq.edu.a
GOPET: A tool for automated predictions of Gene Ontology terms
BACKGROUND: Vast progress in sequencing projects has called for annotation on a large scale. A number of methods have been developed to address this challenging task. These methods, however, either apply to specific subsets, or their predictions are not formalised, or they do not provide precise confidence values for their predictions. DESCRIPTION: We recently established a learning system for automated annotation, trained with a broad variety of different organisms to predict the standardised annotation terms from Gene Ontology (GO). Now, this method has been made available to the public via our web service GOPET (Gene Ontology term Prediction and Evaluation Tool). It supplies annotation for sequences of any organism. For each predicted term an appropriate confidence value is provided. The basic method had been developed for predicting molecular function GO terms. It is now expanded to predict biological process terms. This web service is available via CONCLUSION: Our web service gives experimental researchers as well as the bioinformatics community a valuable sequence annotation device. Additionally, GOPET also provides less significant annotation data, which may serve as an extended discovery platform for the user.
Workflows in bioinformatics: meta-analysis and prototype implementation of a workflow generator
BACKGROUND: Computational methods for problem solving need to interleave information access and algorithm execution in a problem-specific workflow. The structures of these workflows are defined by a scaffold of syntactic, semantic and algebraic objects capable of representing them. Despite the proliferation of GUIs (Graphic User Interfaces) in bioinformatics, only some of them provide workflow capabilities; surprisingly, no meta-analysis of workflow operators and components in bioinformatics has been reported. RESULTS: We present a set of syntactic components and algebraic operators capable of representing analytical workflows in bioinformatics. Iteration, recursion, the use of conditional statements, and management of suspend/resume tasks have traditionally been implemented on an ad hoc basis and hard-coded; by having these operators properly defined it is possible to use and parameterize them as generic re-usable components. To illustrate how these operations can be orchestrated, we present GPIPE, a prototype graphic pipeline generator for PISE that allows the definition of a pipeline, parameterization of its component methods, and storage of metadata in XML formats. This implementation goes beyond the macro capacities currently in PISE. As the entire analysis protocol is defined in XML, a complete bioinformatic experiment (linked sets of methods, parameters and results) can be reproduced or shared among users. Availability: (interactive), (download). CONCLUSION: From our meta-analysis we have identified syntactic structures and algebraic operators common to many workflows in bioinformatics. The workflow components and algebraic operators can be assimilated into re-usable software components. GPIPE, a prototype implementation of this framework, provides a GUI builder to facilitate the generation of workflows and integration of heterogeneous analytical tools
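The idea of treating iteration and conditionals as generic, parameterizable workflow components can be illustrated with a small sketch. The code below is not GPIPE or PISE code; the operator names (`sequence`, `conditional`, `iterate`) and the toy sequence-cleaning steps are hypothetical and chosen only to show how such operators might compose.

```python
from typing import Callable, TypeVar

T = TypeVar("T")
Step = Callable[[T], T]


def sequence(*steps: Step) -> Step:
    """Run steps one after another, feeding each output to the next."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run


def conditional(predicate: Callable[[T], bool], if_true: Step, if_false: Step) -> Step:
    """Choose one of two branches based on the current value."""
    def run(value):
        return if_true(value) if predicate(value) else if_false(value)
    return run


def iterate(step: Step, until: Callable[[T], bool], max_rounds: int = 100) -> Step:
    """Repeat a step until a stop condition holds or the round limit is hit."""
    def run(value):
        for _ in range(max_rounds):
            if until(value):
                break
            value = step(value)
        return value
    return run


# Hypothetical toy steps operating on a DNA string.
def strip_gaps(seq: str) -> str:
    return seq.replace("-", "")


def to_upper(seq: str) -> str:
    return seq.upper()


def trim_last_base(seq: str) -> str:
    return seq[:-1]


def reverse(seq: str) -> str:
    return seq[::-1]


def keep(seq: str) -> str:
    return seq


pipeline = sequence(
    strip_gaps,
    to_upper,
    # Trim until the length is a multiple of three (whole codons only).
    iterate(trim_last_base, until=lambda s: len(s) % 3 == 0),
    # Keep sequences that start with ATG, reverse the others.
    conditional(lambda s: s.startswith("ATG"), keep, reverse),
)

result = pipeline("at-gc-cgta")
print(result)
```

Because every operator returns an ordinary step, operators nest freely, which is one plausible reading of "generic re-usable components" in the abstract.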
ProtSweep, 2Dsweep and DomainSweep: protein analysis suite at DKFZ
The wealth of transcript information that has been made publicly available in recent years has led to large pools of individual web sites offering access to bioinformatics software. However, finding out which services exist, what they can or cannot do, how to use them, and how to feed results from one service to the next in the right format can be very time- and resource-consuming, especially for non-experts.
Learning to Map Natural Language to Executable Programs Over Databases
Natural language is a fundamental form of information and communication and is becoming the next frontier in computer interfaces. As the amount of data available online has increased exponentially, so has the need for Natural Language Interfaces (NLIs; in this thesis the abbreviation does not refer to natural language inference) that connect users to data through natural language, significantly improving the accessibility and efficiency of information access for many users besides data experts. All consumer-facing software will one day have a dialogue interface, and this is the next vital leap in the evolution of search engines. Such intelligent dialogue systems should understand the meaning of language grounded in various contexts and generate effective language responses in different forms for information requests and human-computer communication. Developing these intelligent systems is challenging due to (1) limited benchmarks to drive advancements, (2) alignment mismatches between natural language and formal programs, (3) lack of trustworthiness and interpretability, (4) context dependencies in both human conversational interactions and the target programs, and (5) joint language understanding between dialog questions and NLI environments (e.g. databases and knowledge graphs). This dissertation presents several datasets, neural algorithms, and language models to address these challenges for developing deep learning technologies for conversational natural language interfaces (more specifically, NLIs to databases, or NLIDBs). First, to drive advancements towards neural-based conversational NLIs, we design and propose several complex and cross-domain NLI benchmarks, along with introducing several datasets. These datasets enable training large, deep learning models. Evaluation is done on unseen databases (e.g., one about course arrangement). Systems must generalize well not only to new SQL queries but also to unseen database schemas to perform well on these tasks.
Furthermore, in real-world applications, users often access information in a multi-turn interaction with the system by asking a sequence of related questions. The users may explicitly refer to or omit previously mentioned entities and constraints and may introduce refinements, additions, or substitutions to what has already been said. Therefore, some of these datasets require systems to model dialog dynamics and generate natural language explanations for user verification. The full dialogue interaction with the system’s responses is also important, as this supports clarifying ambiguous questions, verifying returned results, and notifying users of unanswerable or unrelated questions. A robust dialogue-based NLI system that can engage with users by forming its own responses has thus become an increasingly necessary component of the query process. Moreover, this thesis presents the development of scalable algorithms designed to parse complex and sequential questions to formal programs (e.g., mapping questions to SQL queries that can execute against databases). We propose a novel neural model that utilizes type information from knowledge graphs to better understand rare entities and numbers in natural language questions. We also introduce a neural model based on syntax tree neural networks, which was the first methodology proposed for generating complex programs from language. Finally, language modeling creates contextualized vector representations of words by training a model to predict the next word given context words, which are the basis of deep learning for NLP. Recently, pre-trained language models such as BERT and RoBERTa have achieved tremendous success in many natural language processing tasks such as text understanding and reading comprehension. However, most language models are pre-trained only on free text such as Wikipedia articles and books.
Given that language in semantic parsing is usually related to some formal representations such as logic forms and SQL queries and has to be grounded in structural environments (e.g., databases), we propose better language models for NLIs by enforcing such compositional interpolation in them. To show they could better jointly understand dialog questions and NLI environments (e.g. databases and knowledge graphs), we show that these language models achieve new state-of-the-art results for seven representative tasks on semantic parsing, dialogue state tracking, and question answering. Also, our proposed pre-training method is much more effective than other prior work
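To make the task concrete, here is a minimal sketch of what "mapping questions to SQL queries that can execute against databases" looks like for a two-turn interaction. The schema, the questions and the SQL are invented for illustration, and the hand-written question/SQL pairs stand in for the output of a trained parser; none of this is taken from the thesis datasets or models.

```python
import sqlite3

# Hypothetical course database, built in memory.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE course (
        id INTEGER PRIMARY KEY,
        title TEXT,
        credits INTEGER,
        term TEXT
    );
    INSERT INTO course VALUES
        (1, 'Databases', 3, 'fall'),
        (2, 'Compilers', 3, 'spring'),
        (3, 'NLP', 4, 'fall');
    """
)

# Each turn pairs a user question with the SQL a parser would have to produce.
# The second question omits the "fall" constraint, so the parser must carry
# it over from the first turn (context dependency).
dialogue = [
    (
        "Which courses are offered in the fall?",
        "SELECT title FROM course WHERE term = 'fall' ORDER BY title",
    ),
    (
        "Of those, which are worth 4 credits?",
        "SELECT title FROM course WHERE term = 'fall' AND credits = 4 ORDER BY title",
    ),
]


def execute(sql: str) -> list:
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]


answers = [execute(sql) for _question, sql in dialogue]
for (question, _sql), answer in zip(dialogue, answers):
    print(question, "->", answer)
```

Executing the predicted SQL, rather than only comparing query strings, is one common way to check that a generated program actually answers the question.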
Interaction of multiphase fluids and solids: theory, algorithms and applications
Official Doctoral Programme in Civil Engineering (Programa Oficial de Doutoramento en Enxeñaría Civil). 5011V01. Thesis by compendium of publications.
[Abstract]
The work presented in this thesis is devoted to the study and numerical simulation of Fluid-Structure Interaction (FSI) problems involving complex fluids. The nonlinear and time-dependent nature of FSI problems makes the analytical solution very difficult or even impossible to obtain, requiring the use of experimental analysis and/or numerical simulations. This fact has prompted the development of a great variety of numerical models for the interaction of fluids and solid structures. However, most of the efforts have been focused on classical fluids governed by the Navier-Stokes equations, which cannot capture the physical mechanisms behind complex fluids. Here, we try to fill this gap by proposing several models for the interplay of solids and multi-phase or multi-component flows. The proposed models are then applied to particular problems that spark interest in fields such as engineering, microfabrication and chemistry.
In this work, the behavior of the structure is described by the nonlinear equations of elastodynamics, and the structure is treated as a hyperelastic solid. Two different constitutive theories are employed: a Neo-Hookean model with dilatational penalty and a Saint Venant-Kirchhoff model. The description of complex fluids is based on the diffuse-interface or phase-field method. In particular, two approaches are adopted. The first one is based on the Navier-Stokes-Korteweg equations, which describe compressible fluids composed of two phases of the same component that may undergo phase transformation, such as water vapor and liquid water. We use this model to study the influence of surface active agents in droplet coalescence and show that droplet motion may be driven by strain gradients (tensotaxis) of the underlying substrate. We also show several problems of phase-change-driven implosion, in which a thin structure collapses due to the condensation of a fluid.
The second approach is based on the Cahn-Hilliard model, which we couple with the incompressible Navier-Stokes equations. We adopt a stabilization based on the residual-based variational multiscale formulation. This results in a model that describes two-component immiscible flows with surface tension. The potential of this model is illustrated by solving several elastocapillary problems in two and three dimensions, including capillary origami, the static wetting of soft substrates and the deformation of micropillars.
As FSI technique, we adopt a moving-mesh or boundary-fitted approach with matching discretization at the fluid-structure interface. This choice makes it possible to impose the kinematic compatibility conditions strongly and results in more accurate solutions at the fluid-solid interface. In particular, we use the Lagrangian description to derive the semi-discrete form of the solid equations and the Arbitrary Lagrangian-Eulerian (ALE) description for the fluid domain. This means that the fluid mesh needs to be updated to accommodate the motion of the structure. For this purpose, we solve an additional linear elasticity problem subject to displacement boundary conditions coming from the motion of the solid.
For the spatial discretization of the solid and fluid domains, we adopt Isogeometric Analysis (IGA) based on Non-Uniform Rational B-Splines (NURBS), a generalization of the finite-element method that possesses higher-order global continuity and allows for a more precise geometric representation of complex objects. Regarding the time integration, we use a generalized-α scheme. The nonlinear system of equations is solved using a Newton-Raphson iteration procedure, which leads to a two-stage predictor-multicorrector algorithm. The resulting linear system is solved using a preconditioned GMRES method. A quasi-direct monolithic formulation is adopted for the solution of the FSI problem; that is, the fluid and solid equations are solved in a coupled fashion, while the mesh motion is solved separately, using as input data from the fluid-solid solve.
[Resumen]
The work presented in this thesis is devoted to the study and numerical simulation of fluid-structure interaction (FSI) problems involving complex fluids. The nonlinear and time-dependent nature of FSI problems makes their analytical solution very difficult or even impossible to obtain, requiring the use of experimental analysis and/or numerical simulations. This fact has driven the development of a great variety of numerical models for the interaction of fluids and solid structures. However, most efforts have focused on classical fluids governed by the Navier-Stokes equations, which are unable to capture the physical mechanisms behind complex fluids. In this work, we try to fill that gap by proposing several models for the interaction of solids with multiphase and multicomponent fluids. The proposed models are applied to particular problems that arouse great interest in fields such as engineering, microfabrication and chemistry.
In this work, the behavior of the structure is described by the equations of nonlinear elastodynamics, and the structure is treated as a hyperelastic solid. Two different constitutive theories are employed: a Neo-Hookean model and a Saint Venant model. The description of complex fluids is based on the phase-field or diffuse-interface method. Specifically, two different techniques are adopted. The first is based on the Navier-Stokes-Korteweg equations, which describe compressible fluids composed of two phases of a single component, for example liquid water and water vapor. We use this model to study the role of surfactants in droplet coalescence and to show that droplet motion can be triggered by strain gradients (tensotaxis) of the substrate on which the droplets rest. We also show several examples of implosion driven by phase change, in which a thin structure collapses due to the condensation of a fluid. The second technique is based on the Cahn-Hilliard model, which we couple with the incompressible Navier-Stokes equations. In this model we adopt a stabilization based on the variational multiscale formulation. This results in a model that describes immiscible two-component flows with surface tension. We illustrate the potential of this model by solving several elastocapillary problems in two and three dimensions, including capillary origami, the static deformation of soft substrates by droplets, and the deformation of micropillars.
As FSI technique, we adopt a moving-mesh method with matching discretization at the solid-fluid interface. This choice makes it possible to impose the kinematic compatibility conditions strongly and yields more accurate results near the solid-fluid interface. Specifically, we use a Lagrangian description to derive the semi-discrete form of the solid equations and an Arbitrary Lagrangian-Eulerian (ALE) description for the fluid domain. This means that the fluid mesh has to be updated to accommodate the motion of the structure. To this end, we solve an additional linear elasticity problem in which the boundary conditions are the displacements coming from the motion of the solid.
For the spatial discretization of both the solid and fluid domains, we adopt Isogeometric Analysis (IGA) based on non-uniform rational B-splines (NURBS), a generalization of the finite element method that possesses high-order global continuity and allows a more precise geometric representation of complex objects. Regarding time integration, we use a generalized-alpha scheme. The nonlinear system of equations is solved using an iterative Newton-Raphson method, which leads to a two-stage predictor-multicorrector algorithm. The resulting linear system is solved by a preconditioned GMRES method. In addition, a monolithic formulation is adopted for the solution of the FSI problem; that is, the fluid and solid equations are solved in a coupled manner, while the mesh motion is solved separately, using as input the data from the solid-fluid solver.
[Resumo]
The work presented in this thesis is devoted to the study and numerical simulation of fluid-structure interaction (FSI) problems involving complex fluids. The nonlinear and time-dependent nature of this type of problem makes its analytical solution very difficult or even impossible to obtain, requiring the use of experimental analyses and/or numerical simulations. This fact has led to the development of a great variety of numerical models for the interaction of fluids and solid structures. However, most efforts have concentrated on classical fluids governed by the Navier-Stokes equations, which are unable to capture the physical mechanisms behind complex fluids. In this thesis we try to fill this gap by proposing models for the interaction of solids with multiphase and multicomponent liquids. The proposed models are applied to specific problems that arouse great interest in fields such as engineering, microfabrication and chemistry.
In this work, the behavior of the structure is described by the equations of nonlinear elastodynamics, and the structure is treated as a hyperelastic solid. We use two different constitutive theories: a Neo-Hookean model and a Saint Venant model. The description of the complex fluid is based on the phase-field or diffuse-interface method. Specifically, two different techniques are adopted. The first is based on the Navier-Stokes-Korteweg equations, which describe compressible fluids composed of two phases of a single component, for example liquid water and water vapor. We use this model to study the role of surfactants in droplet coalescence and to demonstrate that droplet motion can be triggered by strain gradients of the substrate on which the droplets rest (tensotaxis). Several examples of implosion induced by phase change are also shown, in which a thin structure collapses due to the condensation of a fluid. The second technique is based on the Cahn-Hilliard model, which is coupled with the Navier-Stokes equations. In this model we adopt a stabilization based on the variational multiscale formulation. This results in a model that describes immiscible two-component flows with surface tension. We illustrate the potential of this model by solving several elastocapillary problems in two and three dimensions, including capillary origami, the static deformation of soft substrates by droplets, and the deformation of micropillars.
As FSI technique, we adopt a moving-mesh method with matching discretization at the solid-liquid interface. This choice makes it possible to impose the kinematic compatibility conditions strongly and yields more accurate results near the solid-fluid interface. Specifically, we use a Lagrangian description to derive the semi-discrete form of the solid equations and an Arbitrary Lagrangian-Eulerian (ALE) description for the fluid domain. This means that the fluid mesh has to be updated to accommodate the motion of the structure. To this end, we solve an additional linear elasticity problem in which the boundary conditions are the displacements coming from the motion of the solid.
For the spatial discretization of both the solid and fluid domains, we adopt Isogeometric Analysis (IGA) based on non-uniform rational B-splines (NURBS), a generalization of the finite element method that possesses high-order global continuity and allows a more precise representation of complex objects. Regarding time integration, we use a generalized-alpha scheme. The system of nonlinear equations is solved through an iterative Newton-Raphson method, which gives rise to a predictor-multicorrector algorithm. The resulting linear system is treated with a preconditioned GMRES method. In addition, we adopt a monolithic formulation for the FSI problem; that is, the fluid and solid equations are solved in a coupled manner, while the mesh motion is solved separately, using as input the data from the solid-fluid solver.
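The nonlinear solution strategy named in this abstract (an initial prediction followed by repeated Newton-Raphson corrections, each requiring a linear solve) can be sketched on a toy problem. The two-equation system, the starting guess and the tolerance below are assumptions made for illustration, and a direct 2x2 solve by Cramer's rule stands in for the preconditioned GMRES solver used in the thesis; this is not the thesis code.

```python
import math


def residual(x: float, y: float) -> tuple:
    """Toy nonlinear system R(x, y) = 0: a circle of radius 2 and the line x = y."""
    return (x * x + y * y - 4.0, x - y)


def jacobian(x: float, y: float) -> tuple:
    """Analytical Jacobian dR/d(x, y), stored row by row."""
    return ((2.0 * x, 2.0 * y), (1.0, -1.0))


def solve_2x2(mat: tuple, rhs: tuple) -> tuple:
    """Direct solve by Cramer's rule (stand-in for a preconditioned GMRES solve)."""
    (a, b), (c, d) = mat
    det = a * d - b * c
    return ((rhs[0] * d - b * rhs[1]) / det, (a * rhs[1] - c * rhs[0]) / det)


def newton_raphson(x: float, y: float, tol: float = 1e-12, max_corrections: int = 20):
    """Predictor: the initial guess (x, y). Correctors: Newton updates until converged."""
    for corrections in range(max_corrections):
        r = residual(x, y)
        if math.hypot(r[0], r[1]) < tol:
            return x, y, corrections
        # Solve J * delta = -R for the Newton increment, then update the unknowns.
        dx, dy = solve_2x2(jacobian(x, y), (-r[0], -r[1]))
        x, y = x + dx, y + dy
    raise RuntimeError("Newton-Raphson did not converge")


x, y, corrections = newton_raphson(1.0, 0.5)
print(x, y, corrections)
```

In a time-dependent solver the predictor would typically come from the previous time step, and the Jacobian system would be large and sparse, which is why an iterative method such as GMRES is preferred there over a direct solve.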