Chemogenomic Analysis of the Druggable Kinome and Its Application to Repositioning and Lead Identification Studies
Owing to the intrinsic polypharmacology of most small-molecule kinase inhibitors, there is a need for computational models that enable systematic exploration of the chemogenomic landscape underlying the druggable kinome, toward more efficient kinome-profiling strategies. We implemented Virtual-KinomeProfiler, an efficient computational platform that captures distinct representations of the chemical similarity space of the druggable kinome for various drug discovery endeavors. Using this platform, we profiled approximately 37 million compound-kinase pairs and made predictions for 151,708 compounds in terms of their repositioning and lead-molecule potential against 248 kinases simultaneously. Experimental testing with biochemical assays validated 51 of the predicted interactions, identifying 19 small-molecule inhibitors of the EGFR, HCK, FLT1, and MSK1 protein kinases. The prediction model yielded a 1.5-fold increase in precision and a 2.8-fold decrease in false-discovery rate compared with traditional single-dose biochemical screening, demonstrating its potential to drastically expedite the kinome-specific drug discovery process.
Eye detection using discriminatory features and an efficient support vector machine
Accurate and efficient eye detection has broad applications in computer vision, machine learning, and pattern recognition. This dissertation presents a number of accurate and efficient eye detection methods using various discriminatory features and a new efficient Support Vector Machine (eSVM).
This dissertation first introduces five popular image representation methods - the gray-scale image representation, the color image representation, the 2D Haar wavelet image representation, the Histograms of Oriented Gradients (HOG) image representation, and the Local Binary Patterns (LBP) image representation - and then applies these methods to derive five types of discriminatory features. Comparative assessments are then presented to evaluate the performance of these discriminatory features on the problem of eye detection.
This dissertation further proposes two discriminatory feature extraction (DFE) methods for eye detection. The first DFE method, discriminant component analysis (DCA), improves upon the popular principal component analysis (PCA) method. The PCA method can derive the optimal features for data representation but not for classification. In contrast, the DCA method, which applies a new criterion vector defined on two novel measure vectors, derives the optimal discriminatory features in the whitened PCA space for two-class classification problems. The second DFE method, clustering-based discriminant analysis (CDA), improves upon the popular Fisher linear discriminant (FLD) method. A major disadvantage of the FLD is that it may not be able to extract adequate features to achieve satisfactory performance, especially for two-class problems. To address this problem, three CDA models (CDA-1, -2, and -3) are proposed by taking advantage of clustering techniques. For every CDA model, a new between-cluster scatter matrix is defined. The CDA method thus can derive adequate features to achieve satisfactory performance for eye detection. Furthermore, the clustering nature of the three CDA models and the nonparametric nature of the CDA-2 and -3 models further improve detection performance over the conventional FLD method.
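The overall pipeline behind DCA, projecting into the whitened PCA space and then extracting a discriminant direction for a two-class problem, can be sketched as follows. This is a minimal illustration with made-up data; a standard Fisher direction stands in for DCA's actual criterion vector, which is not reproduced here:

```python
import numpy as np

def whitened_pca(X, eps=1e-8):
    """Project data into the whitened PCA space (unit variance per component)."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    keep = vals > eps                        # drop near-null directions
    W = vecs[:, keep] / np.sqrt(vals[keep])  # whitening transform
    return Xc @ W, W

def fisher_direction(Z, y):
    """Fisher discriminant direction for a two-class problem in whitened space."""
    Z0, Z1 = Z[y == 0], Z[y == 1]
    Sw = np.cov(Z0, rowvar=False) + np.cov(Z1, rowvar=False)  # within-class scatter
    w = np.linalg.solve(Sw + 1e-6 * np.eye(Sw.shape[1]),
                        Z1.mean(axis=0) - Z0.mean(axis=0))
    return w / np.linalg.norm(w)

# Toy two-class data standing in for eye vs. non-eye feature vectors.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(1.5, 1, (50, 5))])
y = np.repeat([0, 1], 50)
Z, W = whitened_pca(X)
w = fisher_direction(Z, y)
scores = Z @ w  # one discriminatory feature per sample
```

Whitening first means the discriminant criterion operates on decorrelated, unit-variance components, which is the setting the DCA method assumes.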
This dissertation finally presents a new efficient Support Vector Machine (eSVM) for eye detection that improves the computational efficiency of the conventional Support Vector Machine (SVM). The eSVM first defines the set of training samples that lie on the wrong side of their margin, as derived from the conventional soft-margin SVM. This set plays an important role in controlling the generalization performance of the eSVM. The eSVM then introduces only a single slack variable for all the training samples in this set, and as a result, only a very small number of those samples become support vectors. The eSVM hence significantly reduces the number of support vectors and improves computational efficiency without sacrificing generalization performance. A modified Sequential Minimal Optimization (SMO) algorithm is then presented to solve the large Quadratic Programming (QP) problem that arises in the optimization of the eSVM.
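The set the eSVM builds on, the training samples on the wrong side of their soft margin, can be made concrete with a conventional linear SVM trained by primal subgradient descent. This is a simplified sketch on synthetic data; the eSVM's modified SMO solver and single shared slack variable are not shown:

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Conventional soft-margin linear SVM via subgradient descent on the primal."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1  # hinge-loss subgradient is active for these samples
        w -= lr * (w - C * (y[viol, None] * X[viol]).sum(axis=0))
        b += lr * C * y[viol].sum()
    return w, b

# Two well-separated synthetic classes with labels in {-1, +1}.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (40, 2)), rng.normal(2, 1, (40, 2))])
y = np.repeat([-1.0, 1.0], 40)
w, b = train_linear_svm(X, y)

# The samples on the wrong side of their margin satisfy y * f(x) < 1;
# in the eSVM these would all share one slack variable.
wrong_side = np.where(y * (X @ w + b) < 1)[0]
```

In the conventional SVM every sample in this set carries its own slack variable and can become a support vector; collapsing them to a single slack variable is what shrinks the support-vector count.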
Three large-scale face databases, the Face Recognition Grand Challenge (FRGC) version 2 database, the BioID database, and the FERET database, are applied to evaluate the proposed eye detection methods. Experimental results show the effectiveness of the proposed methods, which improve upon some state-of-the-art eye detection methods.
Kernel Square-Loss Exemplar Machines for Image Retrieval
Zepeda and Pérez have recently demonstrated the promise of the exemplar SVM (ESVM) as a feature encoder for image retrieval. This paper extends this approach in several directions: we first show that replacing the hinge loss by the square loss in the ESVM cost function significantly reduces encoding time with negligible effect on accuracy. We call this model the square-loss exemplar machine, or SLEM. We then introduce a kernelized SLEM, which can be implemented efficiently through low-rank matrix decomposition and displays improved performance. Both SLEM variants exploit the fact that the negative examples are fixed, so most of the SLEM computational complexity is relegated to an offline process independent of the positive examples. Our experiments establish the performance and computational advantages of our approach using a large array of base features and standard image retrieval datasets.
Optimized complex power quality classifier using one vs. rest support vector machine
Nowadays, power quality issues are becoming a significant research topic because of the increasing inclusion of very sensitive devices and considerable renewable energy sources. In general, most previous power quality classification techniques focused on single power quality events and did not include an optimal feature selection process. This paper presents a classification system that employs the Wavelet Transform and the RMS profile to extract the main features of the measured waveforms containing either single or complex disturbances. A data mining process is designed to select the optimal set of features that best describes each disturbance present in the waveform. Support Vector Machine binary classifiers organized in a "One vs. Rest" architecture are individually optimized to classify single and complex disturbances. The parameters that rule the performance of each binary classifier are also individually adjusted using a grid search algorithm that helps them achieve optimal performance. This specialized process significantly improves the total classification accuracy. Several single and complex disturbances were simulated in order to train and test the algorithm. The results show that the classifier is capable of identifying >99% of single disturbances and >97% of complex disturbances. Authors: de Yong, David Marcelo (Universidad Nacional de Río Cuarto; CONICET, Argentina); Bhowmik, Sudipto (Nexant Inc., United States); Magnago, Fernando (Universidad Nacional de Río Cuarto; CONICET, Argentina).
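The "One vs. Rest" decomposition trains one binary classifier per disturbance class and labels a waveform with the class whose classifier is most confident. A bare-bones sketch, with logistic regression standing in for the paper's individually tuned SVMs and synthetic points in place of wavelet/RMS features:

```python
import numpy as np

def train_logistic(X, y, lr=0.1, epochs=300):
    """Simple binary logistic regression by gradient descent (SVM stand-in)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def one_vs_rest_fit(X, y, n_classes):
    """Train one binary classifier per class: class k vs. everything else."""
    return [train_logistic(X, (y == k).astype(float)) for k in range(n_classes)]

def one_vs_rest_predict(models, X):
    """Pick the class whose binary classifier gives the highest score."""
    scores = np.column_stack([X @ w for w in models])
    return scores.argmax(axis=1)

# Synthetic 3-class data standing in for disturbance feature vectors.
rng = np.random.default_rng(3)
centers = np.array([[0.0, 4.0], [4.0, 0.0], [-4.0, -4.0]])
X = np.vstack([rng.normal(c, 1.0, (60, 2)) for c in centers])
X = np.hstack([X, np.ones((len(X), 1))])  # bias column
y = np.repeat([0, 1, 2], 60)
models = one_vs_rest_fit(X, y, 3)
pred = one_vs_rest_predict(models, X)
```

Because each binary classifier is trained independently, its hyperparameters can be tuned per class, which is exactly what the paper's per-classifier grid search exploits.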
Optimization of airport terminal-area air traffic operations under uncertain weather conditions
Thesis (Ph.D.)--Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2011. Includes bibliographical references (p. 153-158).

Convective weather is responsible for large delays and widespread disruptions in the U.S. National Airspace System, especially during summer. Although Air Traffic Flow Management algorithms exist to schedule and route traffic in the face of disruptions, they require reliable forecasts of airspace capacity; however, there is a gap between the spatial and temporal accuracy of aviation weather forecasts (and existing capacity models) and what these algorithms assume. In this thesis we consider the problem of integrating currently available convective weather forecasts with air traffic management in terminal airspace (near airports). We first demonstrate how raw convective weather forecasts, which provide deterministic predictions of Vertically Integrated Liquid (the precipitation content in a column of airspace), can be translated into reliable and accurate probabilistic forecasts of whether or not a terminal-area route will be blocked. Given a flight route through the terminal area, we apply techniques from machine learning to determine the probability that the route will be open in actual weather. This probabilistic route-blockage predictor is then used to optimize terminal-area operations. We develop an integer programming formulation for a 2-dimensional model of terminal airspace that dynamically moves arrival and departure routes to maximize expected capacity. Experiments using real weather scenarios on stormy days show that our algorithms recommend that a terminal-area route be modified 30% of the time, opening up 13% more available routes during these scenarios.
The error rate is low, with only 5% of cases corresponding to a modified route being blocked while the original route is in fact open. In addition, for routes predicted to be open with probability 0.95 or greater by our method, 96% of these routes are indeed open (on average) in the weather that materializes. In the final part of the thesis we consider more realistic models of terminal airspace routing and structure. We develop an A*-based routing algorithm that identifies 3-D routes through airspace that adhere to physical aircraft constraints during climb and descent, are conflict-free, and are likely to avoid convective weather hazards. The proposed approach is aimed at improving traffic-manager decision-making in today's operational environment. By Diana Michalek Pfeil, Ph.D.
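The A* routing idea, finding a shortest path that avoids cells predicted to be blocked by convective weather, can be sketched on a 2-D grid. This is a toy stand-in for the thesis's 3-D, aircraft-constrained version; the grid and blockage pattern are made up:

```python
import heapq

def a_star(grid, start, goal):
    """A* on a 4-connected 2-D grid; cells with value 1 are weather-blocked.
    Heuristic: Manhattan distance, which is admissible for unit-cost moves."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_set = [(h(start), 0, start, [start])]  # (f = g + h, g, node, path)
    seen = set()
    while open_set:
        _, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        r, c = node
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                heapq.heappush(open_set, (g + 1 + h((nr, nc)), g + 1,
                                          (nr, nc), path + [(nr, nc)]))
    return None  # terminal-area route fully blocked

# 0 = open airspace, 1 = cell predicted blocked by convective weather.
grid = [[0, 0, 0, 0, 0],
        [1, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 1],
        [0, 0, 0, 0, 0]]
route = a_star(grid, (0, 0), (4, 4))
```

In the probabilistic setting described above, the binary blocked/open grid would be replaced by per-cell blockage probabilities folded into the edge costs.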
A Simulation Study Comparing the Use of Supervised Machine Learning Variable Selection Methods in the Psychological Sciences
When specifying a predictive model for classification, variable selection (or subset selection) is one of the most important steps for researchers to consider. Reducing the number of variables in a prediction model is vital for many reasons, including reducing the burden of data collection and increasing model efficiency and generalizability. The pool of variable selection methods from which to choose is large, and researchers often struggle to identify which method they should use given the specific features of their data set. Yet there is a scarcity of literature available to guide researchers in their choice; the literature centers on comparing different implementations of a given method rather than comparing different methodologies under varying data features. Through a large-scale Monte Carlo simulation and application to three psychological datasets, we evaluated the prediction error rates, area under the receiver operating characteristic curve, number of variables selected, computation times, and true positive rates of five different variable selection methods in R under varying parameterizations (i.e., default vs. grid tuning): the genetic algorithm (ga), LASSO (glmnet), Elastic Net (glmnet), Support Vector Machines (svmfs), and random forest (Boruta). Performance measures did not converge upon a single best method; as such, researchers should guide their method selection based on the measure of performance they deem most important.
Results do, however, indicate that the genetic algorithm is the most widely applicable method, exhibiting minimal error rates in hold-out samples when compared to the other variable selection methods. Thus, if the researcher knows little about the structure of the data, implementing the genetic algorithm will provide strong results.
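A genetic algorithm for variable selection evolves a population of binary masks (1 = keep the variable), scoring each mask by how well a model built on the kept variables predicts. A compact sketch with a toy fitness function; real use would wrap cross-validated model error, as in the simulation described above:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 200, 10                      # 200 cases, 10 candidate variables
X = rng.normal(size=(n, p))
y = (X[:, 0] + X[:, 3] - X[:, 7] > 0).astype(float)  # only 3 variables matter

def fitness(mask):
    """Score a subset by in-sample accuracy of a least-squares classifier,
    minus a small penalty per variable to encourage sparse subsets."""
    if not mask.any():
        return 0.0
    Xs = X[:, mask.astype(bool)]
    w, *_ = np.linalg.lstsq(Xs, 2 * y - 1, rcond=None)
    acc = ((Xs @ w > 0) == (y > 0.5)).mean()
    return acc - 0.01 * mask.sum()

def ga_select(pop_size=30, gens=40, p_mut=0.1):
    pop = rng.integers(0, 2, (pop_size, p))
    for _ in range(gens):
        scores = np.array([fitness(m) for m in pop])
        parents = pop[np.argsort(scores)[-pop_size // 2:]]  # keep fittest half
        cut = rng.integers(1, p, pop_size)                  # one-point crossover
        kids = np.array([np.concatenate([parents[rng.integers(len(parents))][:c],
                                         parents[rng.integers(len(parents))][c:]])
                         for c in cut])
        flip = rng.random(kids.shape) < p_mut               # bit-flip mutation
        pop = np.where(flip, 1 - kids, kids)
    scores = np.array([fitness(m) for m in pop])
    return pop[scores.argmax()]

best = ga_select()
```

Because the fitness function is a black box, the same loop works unchanged whether the inner model is a regression, an SVM, or a random forest, which is one reason the method is so widely applicable.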
Three-Dimensional Object Search, Understanding, and Pose Estimation with Low-Cost Sensors
With the recent development of low-cost depth sensors, an entirely new type of 3D data is being generated rapidly by regular consumers. Traditionally, 3D data has been produced by a small number of professional designers (i.e., the Computer Aided Design (CAD) model); however, 3D data from massive consumer-level sensors has the potential of introducing many new applications, such as user-captured 3D warehouses and search engines, robots with 3D sensing capability, and customized 3D printing. Nevertheless, the low-cost sensors used by general consumers also pose new technological challenges. First, they have relatively high levels of sensor noise. Second, such consumer devices are often used in uncontrolled settings, resulting in challenging conditions such as poor lighting, cluttered scenes, and object occlusion. To address such emerging opportunities and associated challenges, this dissertation is dedicated to the development of novel algorithms and systems for 3D data understanding and processing, using input from a consumer-level 3D sensor.
In particular, the key problems of 3D shape retrieval, scene understanding, and pose recognition are explored in order to present a comprehensive coverage of the key aspects of content-based 3D shape analysis. To resolve the aforementioned challenges, we propose a flexible Markov Random Field (MRF) framework that uses local information to allow partial matching, and thus address the model incompleteness problem; the framework also uses higher-order correlation to provide additional robustness against sensor noise. With the MRF framework, these 3D analysis problems can be transformed into a unified potential energy minimization problem, while preserving the flexibility to adapt to different settings and resolve the unique challenges of each problem. The contributions of the dissertation include:
a. Cross-Domain 3D Retrieval: First we tackle the problem of searching 3D noise-free models using noisy data captured by low-cost 3D sensors, a unique cross-domain setting. To manage the challenges of sensor noise and model incompleteness from consumer-level sensors, we propose a novel MRF formulation for the retrieval problem. The potential function of the random field is designed to capture both the local shape and global spatial consistency in order to preserve the local matching capability, while offering robustness against the sensor noise. The specific form of the potential functions is determined efficiently by a series of weak classifiers, thus forming a variant of the Regression Tree Field (RTF). We achieve better retrieval precision and recall in the cross-domain setting with a consumer-level depth sensor compared with state-of-the-art approaches.
b. 3D Scene Understanding: We develop a scene understanding system based on input from consumer-level depth sensors. To resolve the key challenge of the lack of annotated 3D training data, we construct an MRF that connects the input 3D point cloud and the associated 2D reference images, based on which the 3D point cloud is stitched. A series of weak classifiers are trained to obtain an approximate semantic segmentation result from the reference images. The potential function of the field is designed to integrate the results from the classifiers, while taking advantage of the 3D spatial consistency in order to output a comprehensive scene understanding result. We achieve comparable accuracy and much faster speed compared with state-of-the-art 3D scene understanding systems, with the difference that we do not require annotated 3D training data.
c. Pose Recognition of Deformable Objects: We develop a method for supporting a robotics system to recognize pose and manipulate deformable objects. More specifically, garment pose is recognized with the help of an offline simulated database and the proposed retrieval approach. We use a novel binary feature representation extracted from the reconstructed 3D surfaces in order to allow efficient matching, thus achieving real-time performance. A spatial weight is further learned in order to integrate the local matching result. The system shows superior recognition accuracy and faster speed than the state-of-the-art approaches.
d. Application with 2D Data: In addition to the traditional 3D applications, we explore the possibility of extending the MRF formulation to 2D data, especially data used in classical low-level 2D vision problems such as image deblurring and denoising. One well-known technique that uses an image prior, the probabilistic patch-based prior, is known to have a bottleneck in finding the most similar model from a model set, which can be posed as a retrieval problem. Therefore, we apply the MRF formulation originally developed for 3D shape retrieval and extend it to this 2D problem by introducing a grid-like random field structure. We achieve a 40x acceleration compared with the state-of-the-art algorithm, while preserving quality.
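The unifying machinery behind the contributions above, casting each task as minimizing a potential energy over a random field, can be illustrated with a tiny pairwise grid MRF solved by iterated conditional modes. This is a generic denoising sketch on synthetic data; the dissertation's actual potentials are learned Regression Tree Fields, not the hand-set ones used here:

```python
import numpy as np

def icm_denoise(obs, n_labels=2, beta=1.0, iters=5):
    """Minimize the pairwise MRF energy E = sum(unary) + beta * sum(disagreements)
    by iterated conditional modes: greedily relabel each node given its neighbors."""
    labels = obs.copy()
    rows, cols = obs.shape
    for _ in range(iters):
        for r in range(rows):
            for c in range(cols):
                best, best_e = labels[r, c], np.inf
                for k in range(n_labels):
                    e = float(k != obs[r, c])  # unary: agree with the observation
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nr, nc = r + dr, c + dc
                        if 0 <= nr < rows and 0 <= nc < cols:
                            e += beta * (k != labels[nr, nc])  # pairwise smoothness
                    if e < best_e:
                        best, best_e = k, e
                labels[r, c] = best
    return labels

# Noisy binary image: a clean square corrupted by randomly flipped pixels.
rng = np.random.default_rng(5)
clean = np.zeros((12, 12), dtype=int)
clean[3:9, 3:9] = 1
noise = rng.random(clean.shape) < 0.1
obs = np.where(noise, 1 - clean, clean)
restored = icm_denoise(obs, beta=1.0, iters=5)
```

Swapping in different unary and pairwise potentials turns the same minimization into retrieval, segmentation, or pose scoring, which is the flexibility the MRF framework above relies on.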
We organize the dissertation as follows. First, the core problems of 3D shape retrieval, scene understanding, and pose recognition, along with the proposed solutions that use MRF and RTF, are explored in Part I. In Part II, the extension to 2D data is discussed. Extensive evaluation is performed on each specific task in order to compare the proposed approaches with state-of-the-art algorithms and systems, and also to justify the components of the proposed methods. Finally, in Part III, we include concluding remarks and a discussion of open issues and future work.