Molecular similarity for machine learning in drug development : poster presentation
In pharmaceutical research and drug development, machine learning methods play an important role in virtual screening and ADME/Tox prediction. For the application of such methods, a formal measure of similarity between molecules is essential. Such a measure, in turn, depends on the underlying molecular representation. Input samples have traditionally been modeled as vectors. Consequently, molecules are represented to machine learning algorithms in a vectorized form using molecular descriptors. While this approach is straightforward, it has its shortcomings. Among other things, the interpretation of the learned model can be difficult, e.g. when using fingerprints or hashing. Structured representations of the input are an alternative to vector-based representations and have been a trend in machine learning in recent years. For molecules, there is a rich choice of such representations. Popular examples include the molecular graph, molecular shape and the electrostatic field. We have developed a molecular similarity measure defined directly on the (annotated) molecular graph, a long-established topological model for molecules. It is based on the concepts of optimal atom assignments and iterative graph similarity. In the latter, two atoms are considered similar if their neighbors are similar. This recursive definition leads to a non-linear system of equations. We show how to iteratively solve these equations and give bounds on the computational complexity of the procedure. Advantages of our similarity measure include interpretability (atoms of two molecules are assigned to each other, each pair with a score expressing local similarity; this can be visualized to show similar regions of two molecules and the degree of their similarity) and the possibility to introduce knowledge about the target where available.
We retrospectively tested our similarity measure using support vector machines for virtual screening on several pharmaceutical and toxicological datasets, with encouraging results. Prospective studies are under way.
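As a rough illustration of the recursive idea in the abstract above (two atoms are similar if their neighbors are similar), the following Python sketch iterates a simplified update. It is not the authors' measure: the label kernel, the damping weight `alpha`, the fixed iteration count, the greedy best-match neighbor score (used here in place of an optimal atom assignment) and all function names are assumptions made for this example.

```python
def iterative_atom_similarity(labels_a, adj_a, labels_b, adj_b,
                              alpha=0.5, iters=20):
    """Return a matrix S where S[i][j] scores atom i of A against atom j of B.

    labels_*: list of element symbols; adj_*: list of neighbor index lists.
    Simplified sketch, not the published measure.
    """
    # Assumed base kernel: 1.0 if the element labels match, else 0.0.
    base = [[1.0 if la == lb else 0.0 for lb in labels_b] for la in labels_a]
    sim = [row[:] for row in base]
    for _ in range(iters):
        new = []
        for i, base_row in enumerate(base):
            new_row = []
            for j, b in enumerate(base_row):
                ni, nj = adj_a[i], adj_b[j]
                if ni and nj:
                    # Greedy stand-in for an optimal neighbor assignment:
                    # each neighbor is scored against its best counterpart.
                    fwd = sum(max(sim[p][q] for q in nj) for p in ni) / len(ni)
                    bwd = sum(max(sim[p][q] for p in ni) for q in nj) / len(nj)
                    neigh = min(fwd, bwd)
                else:
                    neigh = 0.0
                # The max/min terms make the fixed-point equations non-linear.
                new_row.append((1 - alpha) * b + alpha * neigh)
            new.append(new_row)
        sim = new
    return sim


# Toy molecular graphs (heavy atoms only): C-C-O and C-O.
ethanol = (["C", "C", "O"], [[1], [0, 2], [1]])
methanol = (["C", "O"], [[1], [0]])
scores = iterative_atom_similarity(*ethanol, *methanol)
```

Each entry of `scores` pairs one atom of the first graph with one atom of the second, so the matrix can be inspected pair by pair, which is the kind of local interpretability the abstract describes.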
Extracting Flow Features Using Bag-of-Features and Supervised Learning Techniques
Measuring the similarity between two streamlines is fundamental to many important flow data analysis and visualization tasks such as feature detection, pattern querying and streamline clustering. This dissertation presents a novel streamline similarity measure inspired by the bag-of-features concept from computer vision. Unlike other streamline similarity measures, it considers both the distribution of and the distances among features along a streamline. The proposed measure is tested in two common tasks in vector field exploration: streamline similarity query and streamline clustering. Compared with a recent streamline similarity measure, the proposed measure allows users to see the interesting features more clearly in a complicated vector field.
In addition to focusing on similar streamlines through streamline similarity query or clustering, users sometimes want to group and see similar features from different streamlines. For example, it is useful to find all the spirals contained in different streamlines and present them to users. To this end, this dissertation proposes to segment each streamline into different features. This problem has not been studied extensively in flow visualization. For instance, many flow feature extraction techniques segment streamlines based on simple heuristics such as accumulative curvature or arc length, and, as a result, the segments they find usually do not directly correspond to complete flow features. This dissertation proposes a machine learning-based streamline segmentation algorithm to segment each streamline into distinct features.
It is shown that the proposed method can locate interesting features (e.g., a spiral in a streamline) more accurately than some other flow feature extraction methods. Since streamlines are space curves, the proposed method also serves as a general curve segmentation method and may be applied in other fields such as computer vision.
Besides flow visualization, a pedagogical visualization tool, DTEvisual, for teaching access control is also discussed in this dissertation. Domain Type Enforcement (DTE) is a powerful abstraction for teaching students about modern models of access control in operating systems. With DTEvisual, students have an environment for visualizing a DTE-based policy using graphs, visually modifying the policy, and animating the common DTE queries in real time. A user study of DTEvisual suggests that the tool helps students understand DTE.
Deformable Registration through Learning of Context-Specific Metric Aggregation
We propose a novel weakly supervised discriminative algorithm for learning
context-specific registration metrics as a linear combination of conventional
similarity measures. Conventional metrics have been extensively used over the
past two decades and therefore both their strengths and limitations are known.
The challenge is to find the optimal relative weighting (or parameters) of
different metrics forming the similarity measure of the registration algorithm.
Hand-tuning these parameters would result in suboptimal solutions and quickly
become infeasible as the number of metrics increases. Furthermore, such a
hand-crafted combination can only happen at a global scale (entire volume) and
therefore cannot account for the different tissue properties. We
propose a learning algorithm for estimating these parameters locally,
conditioned on the semantic classes of the data. The objective function of our
formulation is a special case of a non-convex function, a difference of convex
functions, which we optimize using the concave-convex procedure. As a proof of
concept, we show the impact of our approach on three challenging datasets for
different anatomical structures and modalities.
Comment: Accepted for publication in the 8th International Workshop on Machine
Learning in Medical Imaging (MLMI 2017), in conjunction with MICCAI 201
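To make the idea of a class-conditioned linear combination of conventional metrics concrete, here is a minimal Python sketch. It only evaluates such a combination; it does not learn the weights, and it does not implement the paper's optimization. The two metrics chosen (sum of squared differences and sum of absolute differences), the class names and the weight values are all hypothetical placeholders.

```python
def ssd(a, b):
    """Sum of squared differences between two intensity patches."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sad(a, b):
    """Sum of absolute differences between two intensity patches."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Conventional metrics that are combined (assumed choice for this sketch).
METRICS = (ssd, sad)


def combined_cost(patch_a, patch_b, label, weights):
    """Weighted sum of the metrics, with weights selected by semantic class.

    weights maps a class label to one weight per entry in METRICS.
    """
    return sum(w * metric(patch_a, patch_b)
               for w, metric in zip(weights[label], METRICS))


# Hypothetical per-class weights; in the paper these would be learned.
example_weights = {"bone": (1.0, 0.0), "soft_tissue": (0.25, 0.75)}
cost_bone = combined_cost([1, 2, 3], [1, 4, 0], "bone", example_weights)
cost_soft = combined_cost([1, 2, 3], [1, 4, 0], "soft_tissue", example_weights)
```

Because the weights are looked up per class, the same pair of patches can receive a different cost in different tissue types, which a single global weighting cannot express.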
From Frequency to Meaning: Vector Space Models of Semantics
Computers understand very little of the meaning of human language. This
profoundly limits our ability to give instructions to computers, the ability of
computers to explain their actions to us, and the ability of computers to
analyse and process text. Vector space models (VSMs) of semantics are beginning
to address these limits. This paper surveys the use of VSMs for semantic
processing of text. We organize the literature on VSMs according to the
structure of the matrix in a VSM. There are currently three broad classes of
VSMs, based on term-document, word-context, and pair-pattern matrices, yielding
three classes of applications. We survey a broad range of applications in these
three categories and we take a detailed look at a specific open source project
in each category. Our goal in this survey is to show the breadth of
applications of VSMs for semantics, to provide a new perspective on VSMs for
those who are already familiar with the area, and to provide pointers into the
literature for those who are less familiar with the field.
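The term-document class of VSMs mentioned above can be sketched in a few lines of Python. This is a toy illustration under simple assumptions (whitespace tokenization, raw counts with no weighting such as tf-idf, cosine as the similarity measure); the function names and example documents are invented for this sketch and do not come from the survey.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Build a term-document count matrix: rows are terms, columns are documents."""
    counts = [Counter(doc.split()) for doc in docs]
    vocab = sorted({term for c in counts for term in c})
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


def column(matrix, j):
    """Extract the vector for document j."""
    return [row[j] for row in matrix]


def cosine(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = ["cat sat mat", "cat sat hat", "dog ran far"]
vocab, matrix = term_document_matrix(docs)
sim_01 = cosine(column(matrix, 0), column(matrix, 1))  # two shared terms
sim_02 = cosine(column(matrix, 0), column(matrix, 2))  # no shared terms
```

Comparing columns measures document similarity; comparing rows of the same matrix would instead compare terms by the documents they occur in.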
Fast Predictive Multimodal Image Registration
We introduce a deep encoder-decoder architecture for image deformation
prediction from multimodal images. Specifically, we design an image-patch-based
deep network that jointly learns (i) an image similarity measure and (ii) the
relationship between image patches and deformation parameters. While our method
can be applied to general image registration formulations, we focus on the
Large Deformation Diffeomorphic Metric Mapping (LDDMM) registration model. By
predicting the initial momentum of the shooting formulation of LDDMM, we
preserve its mathematical properties and drastically reduce the computation
time, compared to optimization-based approaches. Furthermore, we create a
Bayesian probabilistic version of the network that allows evaluation of
registration uncertainty via sampling of the network at test time. We evaluate
our method on a 3D brain MRI dataset using both T1- and T2-weighted images. Our
experiments show that our method generates accurate predictions and that
learning the similarity measure leads to more consistent registrations than
relying on generic multimodal image similarity measures, such as mutual
information. Our approach is an order of magnitude faster than
optimization-based LDDMM.
Comment: Accepted as a conference paper for ISBI 201
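The idea of estimating uncertainty by sampling a stochastic predictor at test time can be illustrated with a toy Python sketch. The abstract does not say how the network is made stochastic; this example assumes dropout-style random masking and uses a small linear predictor in place of the encoder-decoder network. All names, weights and parameters here are hypothetical.

```python
import random
import statistics


def stochastic_predict(features, weights, rng, drop_prob):
    """One forward pass of a toy linear model with random input dropout."""
    total = 0.0
    for x, w in zip(features, weights):
        if rng.random() >= drop_prob:
            # Rescale kept terms so the expected output matches the full model.
            total += w * x / (1.0 - drop_prob)
    return total


def predict_with_uncertainty(features, weights, n_samples=200,
                             drop_prob=0.5, seed=0):
    """Return (mean, standard deviation) over repeated stochastic passes."""
    rng = random.Random(seed)
    samples = [stochastic_predict(features, weights, rng, drop_prob)
               for _ in range(n_samples)]
    return statistics.mean(samples), statistics.stdev(samples)


mean, spread = predict_with_uncertainty([1.0, 2.0, 3.0], [0.5, -0.2, 0.1])
```

The mean of the samples serves as the prediction and their spread as a rough uncertainty estimate; with `drop_prob=0.0` every pass is identical and the spread collapses to zero.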