Learning Word Importance with the Neural Bag-of-Words Model
The Neural Bag-of-Words (NBOW) model performs classification with an average of the input word vectors and achieves impressive performance. While the NBOW model learns word vectors targeted for the classification task, it does not explicitly model which words are important for a given task. In this paper we propose an improved NBOW model with the ability to learn task-specific word importance weights. The word importance weights are learned by introducing a new weighted-sum composition of the word vectors. With experiments on standard topic and sentiment classification tasks, we show that (a) our proposed model learns meaningful word importance for a given task, and (b) our model gives the best accuracies among the BOW approaches. We also show that the learned word importance weights are comparable to tf-idf based word weights when used as features in a BOW SVM classifier.
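The weighted-sum composition described in the abstract can be sketched as follows; the vocabulary, vectors, and importance values below are random stand-ins for learned parameters, not the paper's trained model.

```python
import random

random.seed(0)
DIM = 4
# Hypothetical learned parameters: one vector v_i and one scalar
# importance weight a_i per vocabulary word (random stand-ins, not
# actually trained values).
vocab = ["the", "movie", "was", "great", "boring"]
word_vectors = {w: [random.uniform(-1.0, 1.0) for _ in range(DIM)] for w in vocab}
importance = {w: random.uniform(0.1, 1.0) for w in vocab}

def compose(words):
    """Weighted-sum composition: z = (sum_i a_i * v_i) / (sum_i a_i)."""
    total = sum(importance[w] for w in words)
    return [sum(importance[w] * word_vectors[w][d] for w in words) / total
            for d in range(DIM)]

z = compose(["the", "movie", "was", "great"])
print(len(z))  # 4
```

The plain NBOW model is the special case where every importance weight a_i is equal, reducing the composition to an unweighted average.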
Recent Advances in Fully Dynamic Graph Algorithms
In recent years, significant advances have been made in the design and
analysis of fully dynamic algorithms. However, these theoretical results have
received very little attention from the practical perspective. Few of the
algorithms are implemented and tested on real datasets, and their practical
potential is far from understood. Here, we present a quick reference guide to
recent engineering and theory results in the area of fully dynamic graph
algorithms.
Development of Computer-aided Concepts for the Optimization of Single-Molecules and their Integration for High-Throughput Screenings
In the field of synthetic biology, highly interdisciplinary approaches for the
design and modelling of functional molecules using computer-assisted methods
have become established in recent decades. These computer-assisted methods are
mainly used when experimental approaches reach their limits, as computer models
are able, for example, to elucidate the temporal behaviour of nucleic acid
polymers or proteins by single-molecule simulations, and to illustrate the
functional relationship of amino acid residues or nucleotides to each other.
The knowledge gained by computer modelling can be used continuously to influence
the further experimental process (screening), as well as the shape or function
(rational design) of the molecule under consideration. Such human-guided
optimization of biomolecules is often necessary, since the substrates of
interest for the biocatalysts and enzymes are usually synthetic ("man-made
materials", such as PET) and evolution has had no time to provide efficient
biocatalysts.
With regard to the computer-aided design of single molecules, two fundamental
paradigms dominate the field of synthetic biology: on the one hand,
probabilistic experimental methods (e.g., evolutionary design processes such as
directed evolution) in combination with High-Throughput Screening (HTS); on the
other hand, rational, computer-aided single-molecule design methods.
For both topics, computer models/concepts were developed, evaluated and
published.
The first contribution in this thesis describes a computer-aided design approach
for the Fusarium solani cutinase (FsC). The activity loss of the enzyme during
longer incubation with PET was investigated in molecular detail. For this
purpose, Molecular Dynamics (MD) simulations of the spatial structure of FsC and
of a water-soluble degradation product of the synthetic substrate PET (ethylene
glycol) were computed. The existing model was extended by combining it with
Reduced Models. This simulation study identified certain areas of FsC which
interact very strongly with PET (ethylene glycol) and thus have a significant
influence on the flexibility and structure of the enzyme.
The subsequent original publication establishes a new method for the selection
of High-Throughput assays for use in protein chemistry. The selection is made
via a meta-optimization of the assays to be analyzed. For this purpose, control
reactions are carried out for the respective assay. The distance between the
control distributions is evaluated using classical statistical methods such as
the Kolmogorov-Smirnov test. A performance score is then assigned to each assay.
The described control experiments are performed before the actual experiment
(screening), and the assay with the highest performance is used for further
screening. By applying this generic method, high success rates can be achieved,
as we were able to demonstrate experimentally using lipases and esterases as
examples.
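The assay-selection idea can be illustrated with a small sketch; the assay names, the control readouts, and the use of the two-sample Kolmogorov-Smirnov statistic as the performance score are illustrative assumptions, not the published pipeline.

```python
from bisect import bisect_right
import random

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum distance
    between the empirical CDFs of the two samples."""
    a, b = sorted(a), sorted(b)
    return max(abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
               for x in a + b)

random.seed(1)
# Hypothetical assays, each with a negative and a positive control readout.
assays = {
    "assay_A": ([random.gauss(0.0, 1.0) for _ in range(200)],
                [random.gauss(0.5, 1.0) for _ in range(200)]),  # weak separation
    "assay_B": ([random.gauss(0.0, 1.0) for _ in range(200)],
                [random.gauss(3.0, 1.0) for _ in range(200)]),  # strong separation
}

# Performance = KS distance between the two control distributions;
# keep the assay whose controls separate best before screening starts.
best = max(assays, key=lambda name: ks_statistic(*assays[name]))
print(best)  # the assay whose controls separate best (here: assay_B)
```

The point of the meta-optimization is that this ranking is computed from control reactions alone, before any material is spent on the actual screening.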
In the area of green chemistry, the above-mentioned processes can help to find
enzymes for the degradation of synthetic materials more quickly, or to modify
naturally occurring enzymes so that they can efficiently convert synthetic
substrates after successful optimization. For this purpose, the experimental
effort (consumption of materials) is kept to a minimum during the practical
implementation. Especially for large-scale screenings, a prior consideration or
restriction of the possible sequence space can contribute significantly to
maximizing the success rate of screenings and minimizing the total time they
require.
In addition to classical methods such as MD simulations in combination with
reduced models, new graph-based methods for the representation and analysis of MD
simulations have been developed. For this purpose, simulations were converted
into distance-dependent dynamic graphs. Based on this reduced representation,
efficient algorithms for analysis were developed and tested. In particular,
network motifs were investigated to determine whether this type of
semantics is more suitable for describing molecular structures and interactions
within MD simulations than spatial coordinates. This concept was evaluated for
various MD simulations of molecules, such as water, synthetic pores, proteins,
peptides and RNA structures. It has been shown that this novel form of semantics
is an excellent way to describe (bio)molecular structures and their dynamics.
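A minimal sketch of the distance-dependent dynamic-graph construction described above, with invented toy coordinates standing in for a real MD trajectory and an arbitrary distance cutoff:

```python
import math

CUTOFF = 1.5  # distance threshold (arbitrary units, chosen for illustration)

def frame_to_edges(coords):
    """Return the edge set of one frame's distance graph: two particles
    are connected when they are closer than the cutoff."""
    edges = set()
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) < CUTOFF:
                edges.add((i, j))
    return edges

# Two toy frames of a 4-particle "trajectory" (not real MD data).
trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.5, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.5, 0.0, 0.0)],
]
# The dynamic graph is the per-frame sequence of edge sets; motif analysis
# then operates on this reduced representation instead of raw coordinates.
dynamic_graph = [frame_to_edges(f) for f in trajectory]
print(dynamic_graph[0])  # frame 0 connects particles 0-1 and 2-3
```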
Furthermore, an algorithm (StreAM-Tg) has been developed for the creation of
motif-based Markov models, especially for the analysis of single molecule
simulations of nucleic acids. This algorithm is used for the design of RNAs. The
insights obtained from the analysis with StreAM-Tg (Markov models) can
provide useful design recommendations for the (re)design of functional RNA.
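The motif-based Markov-model idea can be sketched as a plain transition-count estimate over a sequence of motif states; the motif labels and the observed sequence below are invented, and this is not the StreAM-Tg algorithm itself.

```python
from collections import Counter, defaultdict

# Invented toy sequence of motif states observed along a trajectory.
motif_series = ["A", "A", "B", "A", "B", "B", "A", "A"]

# Count observed transitions between consecutive motif states.
counts = defaultdict(Counter)
for s, t in zip(motif_series, motif_series[1:]):
    counts[s][t] += 1

# Normalize each row of counts into transition probabilities.
transition = {s: {t: c / sum(row.values()) for t, c in row.items()}
              for s, row in counts.items()}
print(transition["A"])  # {'A': 0.5, 'B': 0.5}
```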
In this context, a new method was developed to quantify the environment (i.e.
water; solvent context) and its influence on biomolecules in MD simulations. For
this purpose, three-vertex motifs were used to describe the structure of the
individual water molecules. This new method offers many advantages. With this
method, the structure and dynamics of water can be accurately described. For
example, we were able to reproduce the thermodynamic entropy of water in the
liquid and vapor phase along the vapor-liquid equilibrium curve from the
triple point to the critical point.
Another major field covered in this thesis is the development of new
computer-aided approaches for HTS for the design of
functional RNA. For the production of functional RNA (e.g., aptamers and riboswitches), an experimental,
round-based HTS (like SELEX) is typically used. By using
Next Generation Sequencing (NGS) in combination with the SELEX process,
this design process can be studied at the nucleotide and secondary structure
levels for the first time. The special feature of small RNA molecules compared
to proteins is that the minimum-free-energy secondary structure (topology) can
be determined directly from the nucleotide sequence with a high degree of
certainty.
Using a combination of M. Zuker's algorithm, NGS and the SELEX method, it was
possible to quantify the structural diversity of individual RNA molecules while
taking the genetic context into consideration. This combination of methods
allowed the prediction of the rounds in which the first ciprofloxacin riboswitch
emerged. In this example, only a simple structural comparison (Levenshtein
distance) was used to quantify the diversity of each round.
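The simple structural comparison mentioned above can be sketched as a Levenshtein distance between secondary structures in dot-bracket notation; the two example structures are invented.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# Invented dot-bracket structures standing in for two selection rounds.
s1 = "((((....))))"
s2 = "(((......)))"
print(levenshtein(s1, s2))  # 2
```

Averaging such pairwise distances within a selection round gives a single diversity score per round, which is the quantity tracked across SELEX rounds.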
To improve this, a new representation of the RNA structure as a directed graph
was modeled, which was then compared with a probabilistic subgraph isomorphism.
Finally, the NGS dataset (ciprofloxacin riboswitch) was modeled as a dynamic
graph and analyzed for the occurrence of defined seven-vertex motifs. For this
purpose, motif-based semantics were integrated into HTS
for RNA molecules for the first time. The identified motifs could be assigned to
secondary structural elements that were identified experimentally in the
ciprofloxacin aptamer R10k6.
Finally, all the algorithms presented were integrated into an R library,
published, and made available to scientists all over the world.
Opportunities and obstacles for deep learning in biology and medicine
Deep learning describes a class of machine learning algorithms that are capable of combining raw inputs into layers of intermediate features. These algorithms have recently shown impressive results across a variety of domains. Biology and medicine are data-rich disciplines, but the data are complex and often ill-understood. Hence, deep learning techniques may be particularly well suited to solve problems of these fields. We examine applications of deep learning to a variety of biomedical problems-patient classification, fundamental biological processes and treatment of patients-and discuss whether deep learning will be able to transform these tasks or if the biomedical sphere poses unique challenges. Following from an extensive literature review, we find that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art. Even though improvements over previous baselines have been modest in general, the recent progress indicates that deep learning methods will provide valuable means for speeding up or aiding human investigation. Though progress has been made linking a specific neural network's prediction to input features, understanding how users should interpret these models to make testable hypotheses about the system under study remains an open challenge. Furthermore, the limited amount of labelled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning enabling changes at both bench and bedside with the potential to transform several areas of biology and medicine.
LIPIcs, Volume 251, ITCS 2023, Complete Volume
LIPIcs, Volume 261, ICALP 2023, Complete Volume