749 research outputs found
OctaSOM - An octagonal based SOM lattice structure for biomedical problems
In this study, an octagonal-based lattice structure for self-organizing networks is proposed to allow more exploration and exploitation when updating the weights, for better mapping and classification performance. The neighborhood of the octagonal-based lattice structure provides more nodes for weight updating than the standard hexagonal-based lattice structure. In our experiments, the octagonal-based lattice structure performs better than the standard hexagonal lattice structure on biomedical datasets for classification problems. This indicates that the proposed algorithm is an alternative lattice structure for self-organizing networks that gives more insight into classification problems, especially in the biomedical domain.
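The claim that an octagonal neighborhood touches more nodes per update than a hexagonal one can be illustrated with a small sketch. The abstract does not specify the lattice geometry, so the code below makes assumptions: the octagonal neighborhood is modelled as the eight cells adjacent to a node on a rectangular grid, the hexagonal one uses an "odd rows shifted right" offset layout, and the names `hex_neighbors`, `oct_neighbors`, and `update` are hypothetical.

```python
# Illustration only: the 8-neighbour set below is an assumed stand-in for the
# octagonal lattice, which the abstract does not define in detail.

def hex_neighbors(row, col):
    """Six neighbours on a hexagonal lattice (odd rows shifted right)."""
    if row % 2 == 0:
        offsets = [(-1, -1), (-1, 0), (0, -1), (0, 1), (1, -1), (1, 0)]
    else:
        offsets = [(-1, 0), (-1, 1), (0, -1), (0, 1), (1, 0), (1, 1)]
    return [(row + dr, col + dc) for dr, dc in offsets]

def oct_neighbors(row, col):
    """Eight neighbours: every adjacent cell, diagonals included (assumption)."""
    return [(row + dr, col + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)]

def update(weights, bmu, x, neighbors_fn, lr=0.5):
    """Move the best-matching unit and its in-grid neighbours toward input x.

    Returns the number of nodes whose weights were updated.
    """
    touched = [bmu] + [n for n in neighbors_fn(*bmu) if n in weights]
    for node in touched:
        weights[node] = [w + lr * (xi - w) for w, xi in zip(weights[node], x)]
    return len(touched)

def make_grid(rows, cols, dim):
    """Zero-initialised weight vectors keyed by (row, col)."""
    return {(r, c): [0.0] * dim for r in range(rows) for c in range(cols)}

if __name__ == "__main__":
    x = [1.0, 1.0]
    print(update(make_grid(5, 5, 2), (2, 2), x, hex_neighbors))
    print(update(make_grid(5, 5, 2), (2, 2), x, oct_neighbors))
```

For an interior node, the hexagonal update touches seven nodes (the winner plus six neighbours) and the assumed octagonal update touches nine.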
Augmenting Latent Dirichlet Allocation and Rank Threshold Detection with Ontologies
In an increasingly data-rich environment, actionable information must be extracted, filtered, and correlated from massive amounts of disparate, often free-text sources. The usefulness of the retrieved information depends on how we accomplish these steps and present the most relevant information to the analyst. One method for extracting information from free text is Latent Dirichlet Allocation (LDA), a document categorization technique that classifies documents into cohesive topics. Although LDA accounts for some implicit relationships such as synonymy (same meaning), it often ignores other semantic relationships such as polysemy (different meanings), hyponymy (subordinate), meronymy (part of), and troponymy (manner). To compensate for this deficiency, we incorporate explicit word ontologies, such as WordNet, into the LDA algorithm to account for various semantic relationships. Experiments over the 20 Newsgroups, NIPS, OHSUMED, and IED document collections demonstrate that incorporating such knowledge improves the perplexity measure over LDA alone for given parameters. In addition, the same ontology augmentation improves recall and precision results for user queries.
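The perplexity measure mentioned above is commonly defined as the exponential of the negative mean log-likelihood of held-out tokens. The abstract does not state the exact formula the authors used, so the sketch below shows only that standard definition; the function name `perplexity` and its input format (one model probability per held-out token) are assumptions.

```python
import math

def perplexity(token_probs):
    """Exponential of the negative mean log-likelihood of held-out tokens.

    token_probs: the probability a model assigns to each held-out token.
    Lower perplexity means the model predicts the held-out text better.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    log_likelihood = sum(math.log(p) for p in token_probs)
    return math.exp(-log_likelihood / len(token_probs))

if __name__ == "__main__":
    baseline = perplexity([0.25] * 8)   # hypothetical baseline model
    augmented = perplexity([0.5] * 8)   # hypothetical improved model
    print(baseline, augmented)
```

A model that assigns higher probability to the held-out tokens gets a lower perplexity, which is the sense in which the ontology-augmented model is said to improve on LDA alone.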
Improvements in Molecular Mechanics Sampling and Energy Models
The process of bringing drugs to market continues to be a slow and expensive affair. And despite recent advances in technology, the cost both in monetary terms and in terms of time between target identification and arrival of a new drug on the market continues to increase. High throughput screening is a first step towards testing a large number of possible bioactive compounds very quickly. However, the space of possible small molecules is limitless, and high throughput screening is limited both by the size of available libraries and the cost of running such a large number of experiments. Therefore, advancements in computational drug screening are necessary in order to maintain the current rate of progress in modern medicine.
Computational drug design, or computer-assisted drug design, offers a possible way of addressing some of the shortfalls of conventional high throughput screening. Using computational methods, it is possible to estimate parameters such as the binding affinity of any small molecule, even those not currently present in any small molecule library, without having to first invest in the often slow and expensive process of finding a synthetic pathway. Computational methods can be used to screen similar molecules, or mutations in small molecule space, seeking to increase binding affinity to the protein target, and thereby efficacy, while simultaneously minimizing binding affinity to other proteins, decreasing cross reactivity, and reducing toxicity and harmful side effects.
Computational biology methods of drug research can be broadly classified in a number of different ways. However, one of the most common classifications is according to the methods used to identify possible drug compounds and later optimize those leads. The first broad category is informatics or artificial intelligence based approaches. In these approaches, artificial intelligence methods such as neural networks, support vector machines, and quantitative structure-activity relationships (QSAR) are used to identify chemical or structural properties that contribute heavily to binding affinity.
The next category, ligand based approaches, is very useful when there are a large number of known binders for a specific family of proteins. In this approach, the ligands are clustered using a metric of chemical similarity and new compounds which occupy a similar chemical space are likely to also bind strongly with the protein of interest.
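The idea of scoring new compounds by chemical similarity to known binders can be sketched as follows. The abstract does not name the similarity metric, so the Tanimoto coefficient over fingerprint bit sets is used here as an assumed, commonly used choice; the compound names, fingerprints, and function names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def max_similarity(candidate, known_binders):
    """Similarity of a candidate to its closest known binder."""
    return max(tanimoto(candidate, binder) for binder in known_binders)

def rank_candidates(candidates, known_binders):
    """Order candidate names from most to least similar to the known binders."""
    scores = {name: max_similarity(fp, known_binders)
              for name, fp in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    # Hypothetical fingerprints: each integer marks a structural feature.
    binders = [{1, 2, 3, 4}, {2, 3, 4, 5}]
    candidates = {
        "cand_a": {1, 2, 3, 9},
        "cand_b": {7, 8, 9},
        "cand_c": {2, 3, 4, 5, 6},
    }
    print(rank_candidates(candidates, binders))
```

Candidates that share more features with any known binder rank higher, which mirrors the assumption that compounds occupying a similar chemical space are likely to bind the same protein.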
The final class of methods of computational drug design, and the method explored in this thesis, is the diverse class known as structural methods. In the most general sense, these approaches use a sampling method to sample a number of protein or protein-small-molecule interaction conformations, and an energy model or scoring function to measure dimensions that would be very difficult and/or expensive to measure experimentally. In this thesis, a number of different sampling methods that are applicable to different questions in computational biology are presented. Additionally, an improved algorithm for evaluating implicit solvent effects is presented, and a number of improvements in performance, reliability and utility of the molecular mechanics program used are discussed.
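The pairing of a sampling method with an energy model can be illustrated with a toy Metropolis Monte Carlo sampler. This is not the thesis's method, which the abstract does not describe; it is a generic textbook technique applied to an assumed one-dimensional energy function, and all names and parameter values are hypothetical.

```python
import math
import random

def energy(x):
    """Toy energy model (assumption): a one-dimensional well E(x) = x^2."""
    return x * x

def metropolis_sample(n_steps, kT=1.0, step=1.0, seed=0):
    """Sample conformations x with probability proportional to exp(-E(x)/kT)."""
    rng = random.Random(seed)
    x = 0.0
    e = energy(x)
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)   # propose a small random move
        e_new = energy(x_new)
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
        samples.append(x)                      # a rejected move repeats x
    return samples

if __name__ == "__main__":
    samples = metropolis_sample(20000)
    mean = sum(samples) / len(samples)
    print(round(mean, 3))
```

The sampler proposes conformations and the energy model decides how often each is kept; in a real molecular mechanics setting both parts are far more elaborate.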
Structural aspects of molecular recognition
This thesis describes the design, implementation and application of a novel docking algorithm. Chapter 1 reviews some important facts about proteins and protein structure. Several molecular recognition systems are examined in detail. This chapter also reviews a representative set of recent protein/protein docking methods and discusses their relative merits. Chapter 2 sets out the aims of the new docking algorithm, called DAPMatch, and gives full details of its implementation on a parallel architecture computer. The testing of the algorithm is also discussed. Subsequent chapters describe the application of the DAPMatch algorithm to a number of docking problems. DAPMatch is used to reconstruct the known structures of three antibody/lysozyme complexes, using the unbound structure of lysozyme. For the first time, a model of the D1.3 antibody is used as a target molecule for a docking algorithm. These results are presented in Chapter 3 and analysed in detail to demonstrate their significance; non-native solutions are also examined. Chapter 4 describes the practical use of the DAPMatch algorithm in a modelling situation, to construct a hypothetical structure for the high molecular weight epidermal growth factor complex. Chapter 5 describes the adaptation of the DAPMatch algorithm to investigate α-helix/α-helix docking, and presents the results obtained. Chapter 6 explains the conclusions that were derived from this work, and suggests possible future enhancements to the algorithm.
Path finding on a spherical self-organizing map using distance transformations
Spatialization methods create visualizations that allow users to analyze high-dimensional data in an intuitive manner and facilitate the extraction of meaningful information. Just as geographic maps are simplified representations of geographic spaces, these visualizations are essentially maps of abstract data spaces that are created through dimensionality reduction. While we are familiar with geographic maps for path planning/finding applications, research into using maps of high-dimensional spaces for such purposes has been largely ignored. However, the literature has shown that it is possible to use these maps to track temporal and state changes within a high-dimensional space. A popular dimensionality reduction method that produces a mapping for these purposes is the Self-Organizing Map. By using its topology-preserving capabilities with a colour-based visualization method known as the U-Matrix, state transitions can be visualized as trajectories on the resulting mapping. Through these trajectories, one can gather information on the transition path between two points in the original high-dimensional state space. This raises the interesting question of whether or not the Self-Organizing Map can be used to discover the transition path between two points in an n-dimensional space. In this thesis, we use a spherically structured Self-Organizing Map called the Geodesic Self-Organizing Map for dimensionality reduction and the creation of a topological mapping that approximates the n-dimensional space. We first present an intuitive method for a user to navigate the surface of the Geodesic SOM. A new application of the distance transformation algorithm is then proposed to compute the path between two points on the surface of the SOM, which correspond to two points in the data space. A discussion then follows on how this application could be improved using some form of surface shape analysis.
The new approach presented in this thesis is then evaluated by analyzing the results of using the Geodesic SOM for manifold embedding and by carrying out data analyses using carbon dioxide emissions data.
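Path finding with a distance transformation can be sketched on a simple grid. The thesis applies the idea to the surface of a spherical Geodesic SOM, which the abstract does not detail, so the code below makes assumptions: a flat rectangular grid with 4-connected cells stands in for the map surface, cells marked 1 are impassable, and all function names are hypothetical.

```python
from collections import deque

STEPS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connected moves (assumption)

def distance_transform(grid, goal):
    """Breadth-first distance from the goal to every free cell.

    grid: list of rows, 0 = free cell, 1 = blocked; goal must be a free cell.
    Unreachable or blocked cells keep the value None.
    """
    rows, cols = len(grid), len(grid[0])
    dist = [[None] * cols for _ in range(rows)]
    dist[goal[0]][goal[1]] = 0
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in STEPS:
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and dist[nr][nc] is None):
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist

def trace_path(dist, start):
    """Follow strictly decreasing distances from start down to the goal."""
    if dist[start[0]][start[1]] is None:
        return []  # start is blocked or cannot reach the goal
    rows, cols = len(dist), len(dist[0])
    r, c = start
    path = [start]
    while dist[r][c] > 0:
        for dr, dc in STEPS:
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and dist[nr][nc] == dist[r][c] - 1):
                r, c = nr, nc
                break
        path.append((r, c))
    return path

if __name__ == "__main__":
    grid = [[0, 1, 0],
            [0, 1, 0],
            [0, 0, 0]]
    dist = distance_transform(grid, goal=(0, 2))
    print(trace_path(dist, start=(0, 0)))
```

The distance transform is computed once per goal, after which a path from any start cell is recovered by descending the distance values.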