Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening

Authors: Cang Zixuan, Mu Lin, Wei Guowei
Publication venue: Public Library of Science (PLoS)
Publication date: 27/08/2017

This work introduces a number of algebraic topology approaches, such as multicomponent persistent homology, multi-level persistent homology and electrostatic persistence, for the representation, characterization, and description of small molecules and biomolecular complexes. Multicomponent persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with the Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensembles of trees, and deep convolutional neural networks, to demonstrate their descriptive and predictive power for chemical and biological problems. Extensive numerical experiments, involving more than 4,000 protein-ligand complexes from the PDBBind database and nearly 100,000 ligands and decoys from the DUD database, are performed to test the scoring power and the virtual screening power of the proposed topological approaches, respectively. The results show that the present approaches outperform modern machine-learning-based methods in protein-ligand binding affinity prediction and ligand-decoy discrimination.

arXiv.org e-Print Archive

Directory of Open Access Journals

FigShare
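As a rough illustration of the kind of topological invariant described above, the following minimal Python sketch computes the dimension-0 persistence barcode of a Vietoris-Rips filtration over 3D atom coordinates, using a minimum spanning tree built with union-find. It is not the authors' implementation: the function names, the convention that an edge enters at its full Euclidean length, and the element filter (a crude, hypothetical stand-in for the multicomponent idea of restricting to chosen element types) are all assumptions. Higher-dimensional homology, electrostatic persistence and the Wasserstein comparison are omitted.

```python
import itertools
import math


def h0_barcode(points):
    """Death times of the finite H0 bars in a Vietoris-Rips filtration.

    Convention assumed here: an edge enters the filtration at the full
    Euclidean distance between its endpoints. Every point is born at 0, and
    a bar dies whenever two connected components merge, which happens
    exactly at the edge lengths of a minimum spanning tree (Kruskal).
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # this edge merges two components
            parent[root_i] = root_j
            deaths.append(length)
    return deaths  # n - 1 finite bars; the one infinite bar is left implicit


def element_specific_h0(atoms, elements):
    """Hypothetical helper: H0 barcode of only the atoms whose element
    symbol is in `elements` (a crude stand-in for a multicomponent split)."""
    selected = [xyz for symbol, xyz in atoms if symbol in elements]
    return h0_barcode(selected)


if __name__ == "__main__":
    atoms = [("C", (0.0, 0.0, 0.0)), ("N", (0.5, 0.0, 0.0)), ("C", (2.0, 0.0, 0.0))]
    print(h0_barcode([xyz for _, xyz in atoms]))  # all atoms
    print(element_specific_h0(atoms, {"C"}))      # carbon atoms only
```

Enumerating all n(n-1)/2 atom pairs is fine for a small ligand; a full protein-ligand complex would call for a distance cutoff or a spatial index.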

A Multi-Armed Bandit to Smartly Select a Training Set from Big Medical Data

Author: A Martino Di, A Mayer, B Fischl, C Wachinger, D Bouneffouf, David C. Van Essen, DS Marcus, H Shimodaira, K Ellis, K Franke, K Marek, Katja Franke, MP Milham, N Japkowicz, RL Gollub, S Pan, S Valizadeh, WR Thompson, Y Zhu
Publication date: 29/05/2017

With the availability of big medical image data, the selection of an adequate training set is becoming increasingly important to address the heterogeneity of different datasets. Simply including all the data not only incurs high processing costs but can even harm the prediction. We formulate the smart and efficient selection of a training dataset from big medical image data as a multi-armed bandit problem, solved by Thompson sampling. Our method assumes that image features are not available at the time the samples are selected, and therefore relies only on meta information associated with the images. Our strategy simultaneously exploits data sources with high chances of yielding useful samples and explores new data regions. For our evaluation, we focus on the application of estimating age from brain MRI. Our results on 7,250 subjects from 10 datasets show that our approach leads to higher accuracy while requiring only a fraction of the training data. Comment: MICCAI 2017 Proceedings

arXiv.org e-Print Archive

Crossref
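The abstract above does not spell out its reward model or how meta information enters the bandit, so the following is only a generic Beta-Bernoulli Thompson sampling sketch in Python: each data source is an arm, and a drawn sample counts as a success when a hypothetical `is_useful` callback returns True. The function name, the binary reward, the uniform Beta(1, 1) prior and the toy data are assumptions for illustration, not the paper's formulation.

```python
import random


def thompson_select(sources, is_useful, budget, seed=0):
    """Choose up to `budget` samples from several data sources with
    Beta-Bernoulli Thompson sampling.

    sources    -- dict: source name -> list of candidate samples
    is_useful  -- hypothetical callback; True counts as a reward of 1
    Returns the selected samples and the final Beta parameters per source.
    """
    rng = random.Random(seed)
    pools = {name: list(items) for name, items in sources.items()}  # copies
    alpha = {name: 1.0 for name in pools}  # Beta(1, 1) = uniform prior
    beta = {name: 1.0 for name in pools}
    selected = []
    for _ in range(budget):
        live = [name for name, pool in pools.items() if pool]
        if not live:
            break  # every source is exhausted
        # Draw one plausible success rate per source and play the best draw;
        # rarely tried sources still get high draws sometimes (exploration).
        arm = max(live, key=lambda name: rng.betavariate(alpha[name], beta[name]))
        sample = pools[arm].pop(rng.randrange(len(pools[arm])))
        selected.append(sample)
        # Conjugate posterior update for a Bernoulli reward.
        if is_useful(sample):
            alpha[arm] += 1.0
        else:
            beta[arm] += 1.0
    return selected, alpha, beta


if __name__ == "__main__":
    toy = {"site_a": [1] * 40, "site_b": [0] * 40}  # made-up toy data
    chosen, a, b = thompson_select(toy, lambda s: s == 1, budget=20)
    print(sum(chosen), "useful samples out of", len(chosen))
```

Sampling without replacement from each pool keeps a sample from being chosen twice; in a real setting the reward would have to come from some proxy signal, since the abstract notes that image features are unavailable at selection time.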

A convolutional autoencoder approach for mining features in cellular electron cryo-tomograms and weakly supervised coarse segmentation

Author: Aggarwal, Bartesaghi, Beck, Chen, Collado, Delgado, Frazier, Goodfellow, Grünewald, Jasnin, Kemmerling, LeCun, Luengo, Martinez-Sanchez, Maulik, Miguel Ricardo Leung, Min, Min Xu, Pedregosa, Pei, Pettersen, Ramachandran, Rigort, Scheres, Tang, Tibshirani, Tosic, Tzviya Zeev-Ben-Mordehai, Wold, Xiangrui Zeng, Xu
Publication venue: Elsevier BV
Publication date: 28/12/2017

Cellular electron cryo-tomography enables the 3D visualization of cellular organization in the near-native state and at submolecular resolution. However, the contents of cellular tomograms are often complex, making it difficult to automatically isolate different in situ cellular components. In this paper, we propose an unsupervised approach based on a convolutional autoencoder to provide a coarse grouping of small 3D subvolumes extracted from tomograms. We demonstrate that the autoencoder can be used for efficient, coarse characterization of features of macromolecular complexes and surfaces, such as membranes. In addition, the autoencoder can be used to detect non-cellular features related to sample preparation and data collection, such as carbon edges from the grid and tomogram boundaries. The autoencoder is also able to detect patterns that may indicate spatial interactions between cellular components. Furthermore, we demonstrate that our autoencoder can be used for weakly supervised semantic segmentation of cellular components, requiring only a very small amount of manual annotation. Comment: Accepted by Journal of Structural Biology

arXiv.org e-Print Archive

Crossref

Utrecht University Repository

Methods for Analysing Endothelial Cell Shape and Behaviour in Relation to the Focal Nature of Atherosclerosis

Author: Iftikhar Saadia
Publication venue: Bioengineering, Imperial College London
Publication date: 01/06/2011

The aim of this thesis is to develop automated methods for analysing the spatial patterns and functional behaviour of endothelial cells viewed under microscopy, with applications to the understanding of atherosclerosis. Initially, a radial search approach to segmentation was attempted in order to trace the cell and nuclear boundaries using a maximum likelihood algorithm; it was found inadequate for detecting the weak cell boundaries present in the available data. A parametric cell shape model was then introduced to fit an equivalent ellipse to the cell boundary by matching phase-invariant orientation fields of the image and a candidate cell shape. This approach succeeded on good-quality images but failed on images with weak cell boundaries. Finally, a method based on support vector machines, relying on a rich set of visual features and a small but high-quality training dataset, was found to work well on large numbers of cells, even in the presence of strong intensity variations and imaging noise. Using the segmentation results, several standard shear-stress-dependent parameters of cell morphology were studied, and evidence of similar behaviour in some shape parameters was obtained for in vivo cells and their nuclei. Nuclear and cell orientations around immature and mature aortas were broadly similar, suggesting that the pattern of flow direction near the wall stayed approximately constant with age. The relation was weaker for the cell and nuclear length-to-width ratios. Two novel shape analysis approaches were attempted to find other properties of cell shape that could be used to annotate or characterise patterns, since the wide variability observed in cell and nuclear shapes did not appear to fit the standard parameterisations. Although no firm conclusions can yet be drawn, this work lays the foundation for future studies of cell morphology.
To draw inferences about patterns in the functional response of cells to flow, which may play a role in the progression of disease, single-cell analysis was performed using calcium-sensitive fluorescence probes. Calcium transient rates were found to change with flow; more importantly, local patterns of synchronisation in multicellular groups were discernible and appeared to change with flow. These patterns suggest a new functional mechanism in the flow mediation of cell-to-cell calcium signalling.

Spiral - Imperial College Digital Repository

Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs

Author: Korenblum Daniel
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2018

Laplacian mixture models identify overlapping regions of influence in unlabeled graph and network data in a scalable and computationally efficient way, yielding useful low-dimensional representations. By combining Laplacian eigenspace and finite mixture modeling methods, they provide probabilistic or fuzzy dimensionality reductions or domain decompositions for a variety of input data types, including mixture distributions, feature vectors, and graphs or networks. Provably optimal recovery using the algorithm is shown analytically for a nontrivial class of cluster graphs. Heuristic approximations for scalable high-performance implementations are described and tested empirically. Connections to PageRank and community detection in network analysis demonstrate the wide applicability of this approach. The origins of fuzzy spectral methods, beginning with generalized heat or diffusion equations in physics, are reviewed and summarized. Comparisons to other dimensionality reduction and clustering methods for challenging unsupervised machine learning problems are also discussed. Comment: 13 figures, 35 references

arXiv.org e-Print Archive

Directory of Open Access Journals
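The Laplacian eigenspace ingredient mentioned in the abstract above can be illustrated with a small, dependency-free Python sketch that approximates the eigenvector for the second-smallest eigenvalue (the Fiedler vector) of the unnormalized graph Laplacian L = D - A by power iteration and splits the nodes by its sign. This is plain spectral bisection with hard labels, a simpler swapped-in technique rather than the Laplacian mixture model itself, which adds finite mixture modeling to obtain soft, overlapping memberships. The helper names, the iteration count and the toy graph are illustrative assumptions.

```python
import math


def laplacian(adj):
    """Unnormalized graph Laplacian L = D - A of a symmetric adjacency matrix."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def fiedler_vector(adj, iters=2000):
    """Approximate eigenvector for the second-smallest eigenvalue of L.

    By Gershgorin, every eigenvalue of L lies in [0, c] with c = 2 * max degree,
    so M = c*I - L has non-negative eigenvalues, and its largest one (c) belongs
    to the constant vector. Projecting the constant vector out at every step
    makes power iteration on M converge to the Fiedler vector, assuming the
    graph is connected and the second-smallest eigenvalue is simple.
    """
    n = len(adj)
    lap = laplacian(adj)
    c = 2.0 * max(sum(row) for row in adj)
    v = [math.sin(i + 1.0) for i in range(n)]  # arbitrary non-constant start
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant-vector component
        w = [c * v[i] - sum(lap[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def spectral_bisection(adj):
    """Hard two-way split of the nodes by the sign of the Fiedler vector."""
    return [1 if x > 0 else 0 for x in fiedler_vector(adj)]


if __name__ == "__main__":
    # Two 4-cliques (nodes 0-3 and 4-7) joined by the single edge 3-4.
    adj = [[0.0] * 8 for _ in range(8)]
    for block in (range(0, 4), range(4, 8)):
        for i in block:
            for j in block:
                if i != j:
                    adj[i][j] = 1.0
    adj[3][4] = adj[4][3] = 1.0
    print(spectral_bisection(adj))
```

Power iteration on the shifted matrix avoids any linear-algebra dependency but scales poorly; for real networks a sparse eigensolver would be the usual choice, and a mixture model fitted in the resulting eigenspace would replace the hard sign split with probabilistic memberships.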