
    Chemogenomic Analysis of the Druggable Kinome and Its Application to Repositioning and Lead Identification Studies

    Owing to the intrinsic polypharmacological nature of most small-molecule kinase inhibitors, there is a need for computational models that enable systematic exploration of the chemogenomic landscape underlying the druggable kinome, toward more efficient kinome-profiling strategies. We implemented Virtual-KinomeProfiler, an efficient computational platform that captures distinct representations of the chemical similarity space of the druggable kinome for various drug discovery endeavors. Using this platform, we profiled approximately 37 million compound-kinase pairs and made predictions for 151,708 compounds in terms of their repositioning and lead-molecule potential against 248 kinases simultaneously. Experimental testing with biochemical assays validated 51 of the predicted interactions, identifying 19 small-molecule inhibitors of the EGFR, HCK, FLT1, and MSK1 protein kinases. The prediction model led to a 1.5-fold increase in precision and a 2.8-fold decrease in false-discovery rate compared with traditional single-dose biochemical screening, which demonstrates its potential to drastically expedite kinome-specific drug discovery.
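    The reported gains are framed in terms of precision and false-discovery rate relative to single-dose biochemical screening. As a minimal illustration of how these two hit-calling metrics relate, the sketch below computes both from hypothetical true-positive and false-positive counts; the numbers are placeholders, not the study's data.

```python
# Minimal sketch: precision and false-discovery rate for a set of screening hit calls.
# The counts below are hypothetical placeholders, not data from the study.

def precision_and_fdr(true_positives: int, false_positives: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); FDR = FP / (TP + FP) = 1 - precision."""
    called = true_positives + false_positives
    precision = true_positives / called
    return precision, 1.0 - precision

# Hypothetical example: 30 of 50 called hits confirmed by a follow-up assay.
prec, fdr = precision_and_fdr(true_positives=30, false_positives=20)
print(f"precision={prec:.2f}, FDR={fdr:.2f}")   # precision=0.60, FDR=0.40
```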

    Eye detection using discriminatory features and an efficient support vector machine

    Accurate and efficient eye detection has broad applications in computer vision, machine learning, and pattern recognition. This dissertation presents a number of accurate and efficient eye detection methods using various discriminatory features and a new efficient Support Vector Machine (eSVM). The dissertation first introduces five popular image representation methods - the gray-scale image representation, the color image representation, the 2D Haar wavelet image representation, the Histograms of Oriented Gradients (HOG) image representation, and the Local Binary Patterns (LBP) image representation - and then applies these methods to derive five types of discriminatory features. Comparative assessments are then presented to evaluate the performance of these discriminatory features on the problem of eye detection. The dissertation further proposes two discriminatory feature extraction (DFE) methods for eye detection. The first DFE method, discriminant component analysis (DCA), improves upon the popular principal component analysis (PCA) method. The PCA method can derive the optimal features for data representation but not for classification. In contrast, the DCA method, which applies a new criterion vector defined on two novel measure vectors, derives the optimal discriminatory features in the whitened PCA space for two-class classification problems. The second DFE method, clustering-based discriminant analysis (CDA), improves upon the popular Fisher linear discriminant (FLD) method. A major disadvantage of the FLD is that it may not be able to extract adequate features to achieve satisfactory performance, especially for two-class problems. To address this problem, three CDA models (CDA-1, -2, and -3) are proposed by taking advantage of the clustering technique. For every CDA model a new between-cluster scatter matrix is defined. The CDA method can thus derive adequate features to achieve satisfactory performance for eye detection. Furthermore, the clustering nature of the three CDA models and the nonparametric nature of the CDA-2 and -3 models further improve detection performance over the conventional FLD method. The dissertation finally presents a new efficient Support Vector Machine (eSVM) for eye detection that improves the computational efficiency of the conventional Support Vector Machine (SVM). The eSVM first defines a Θ set consisting of the training samples on the wrong side of their margin, derived from the conventional soft-margin SVM. The Θ set plays an important role in controlling the generalization performance of the eSVM. The eSVM then introduces only a single slack variable for all the training samples in the Θ set, and as a result, only a very small number of those samples become support vectors. The eSVM hence significantly reduces the number of support vectors and improves computational efficiency without sacrificing generalization performance. A modified Sequential Minimal Optimization (SMO) algorithm is then presented to solve the large Quadratic Programming (QP) problem defined in the optimization of the eSVM. Three large-scale face databases - the Face Recognition Grand Challenge (FRGC) version 2 database, the BioID database, and the FERET database - are used to evaluate the proposed eye detection methods. Experimental results show the effectiveness of the proposed methods, which improve upon some state-of-the-art eye detection methods.
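    The dissertation's DCA, CDA, and eSVM formulations are its own contributions and are not reproduced here. As a rough baseline sketch of the underlying pipeline (discriminatory features fed to an SVM), the following uses HOG descriptors from scikit-image with a conventional linear SVM from scikit-learn on hypothetical eye/non-eye patches.

```python
# Hedged baseline sketch: HOG descriptors + a conventional linear SVM for eye vs. non-eye
# patches. The dissertation's DCA, CDA, and eSVM methods are not reproduced here; this
# only illustrates the generic "discriminatory features + SVM" pipeline they build on.
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_features(patches: np.ndarray) -> np.ndarray:
    """Compute a HOG descriptor for each grayscale patch in a stack of shape (N, H, W)."""
    return np.array([
        hog(p, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
        for p in patches
    ])

# Hypothetical data: 200 random 36x36 grayscale patches with binary eye/non-eye labels.
rng = np.random.default_rng(0)
patches = rng.random((200, 36, 36))
labels = rng.integers(0, 2, size=200)

clf = LinearSVC(C=1.0, max_iter=10000).fit(hog_features(patches), labels)
print("training accuracy:", clf.score(hog_features(patches), labels))
```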

    Kernel Square-Loss Exemplar Machines for Image Retrieval

    Zepeda and Pérez have recently demonstrated the promise of the exemplar SVM (ESVM) as a feature encoder for image retrieval. This paper extends this approach in several directions: we first show that replacing the hinge loss by the square loss in the ESVM cost function significantly reduces encoding time with a negligible effect on accuracy. We call this model the square-loss exemplar machine, or SLEM. We then introduce a kernelized SLEM, which can be implemented efficiently through low-rank matrix decomposition and displays improved performance. Both SLEM variants exploit the fact that the negative examples are fixed, so most of the SLEM computational complexity is relegated to an offline process independent of the positive examples. Our experiments establish the performance and computational advantages of our approach using a large array of base features and standard image retrieval datasets.
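    The computational argument is that with a square loss the exemplar classifier has a ridge-regression-style closed form, and because the negative pool is fixed, the heavy linear algebra can be done once offline and reused for every positive exemplar. The sketch below illustrates that idea with a rank-1 (Sherman-Morrison) update; it omits the bias term and the kernelized variant, and is an assumption-laden illustration rather than the authors' exact formulation.

```python
# Hedged sketch of a square-loss exemplar machine: with a squared loss, the exemplar
# classifier reduces to a ridge-regression-style closed form, and because the negative
# set is fixed, the expensive part can be precomputed offline and reused for every
# positive exemplar via a rank-1 (Sherman-Morrison) update. The bias term is omitted
# for brevity; this illustrates the idea, not the authors' exact formulation.
import numpy as np

rng = np.random.default_rng(0)
d, n_neg = 128, 5000                       # hypothetical feature dim and negative-pool size
X_neg = rng.standard_normal((n_neg, d))    # fixed negative features (e.g., generic images)
lam = 10.0                                 # ridge regularization strength

# Offline, exemplar-independent precomputation.
A_inv = np.linalg.inv(X_neg.T @ X_neg + lam * np.eye(d))
neg_sum = X_neg.sum(axis=0)

def slem_encode(x0: np.ndarray) -> np.ndarray:
    """Closed-form weights for one positive exemplar x0 against the fixed negatives."""
    Ax = A_inv @ x0
    M_inv = A_inv - np.outer(Ax, Ax) / (1.0 + x0 @ Ax)   # rank-1 update of the inverse
    return M_inv @ (x0 - neg_sum)                        # (X^T X + lam I)^{-1} X^T y

w = slem_encode(rng.standard_normal(d))    # per-image encoding used for retrieval
print(w.shape)
```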

    Optimized complex power quality classifier using one vs. rest support vector machine

    Nowadays, power quality issues are becoming a significant research topic because of the increasing inclusion of very sensitive devices and considerable renewable energy sources. In general, most previous power quality classification techniques focused on single power quality events and did not include an optimal feature selection process. This paper presents a classification system that employs the Wavelet Transform and the RMS profile to extract the main features of the measured waveforms containing either single or complex disturbances. A data mining process is designed to select the optimal set of features that best describes each disturbance present in the waveform. Support Vector Machine binary classifiers organized in a "One vs. Rest" architecture are individually optimized to classify single and complex disturbances. The parameters that rule the performance of each binary classifier are also individually adjusted using a grid search algorithm that helps them achieve optimal performance. This specialized process significantly improves the total classification accuracy. Several single and complex disturbances were simulated in order to train and test the algorithm. The results show that the classifier is capable of identifying >99% of single disturbances and >97% of complex disturbances.
    Fil: de Yong, David Marcelo. Universidad Nacional de Río Cuarto. Facultad de Ingeniería. Departamento de Electricidad y Electrónica; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba; Argentina
    Fil: Bhowmik, Sudipto. Nexant Inc; Estados Unidos
    Fil: Magnago, Fernando. Universidad Nacional de Río Cuarto. Facultad de Ingeniería. Departamento de Electricidad y Electrónica; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba; Argentina
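    As a hedged illustration of the "One vs. Rest" architecture with per-classifier grid search described above, the sketch below tunes one binary SVM per disturbance class using scikit-learn; the random feature matrix stands in for the wavelet/RMS features, and the parameter grid is a guess rather than the authors' settings.

```python
# Hedged sketch: one-vs-rest SVM classification with each binary classifier tuned by its
# own grid search, mirroring the per-classifier optimization described in the abstract.
# Features here are random placeholders standing in for the wavelet/RMS features.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 12))          # hypothetical feature vectors per waveform
y = rng.integers(0, 4, size=300)            # hypothetical disturbance class labels

param_grid = {"C": [0.1, 1, 10, 100], "gamma": ["scale", 0.01, 0.1]}
binary_models = {}
for cls in np.unique(y):
    grid = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=3)
    grid.fit(X, (y == cls).astype(int))     # one-vs-rest target for this class
    binary_models[cls] = grid.best_estimator_

# Predict by taking the class whose binary classifier gives the largest decision value.
classes = sorted(binary_models)
scores = np.column_stack([binary_models[c].decision_function(X) for c in classes])
pred = np.array(classes)[scores.argmax(axis=1)]
print("training accuracy:", (pred == y).mean())
```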

    Optimization of airport terminal-area air traffic operations under uncertain weather conditions

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2011. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (p. 153-158). Convective weather is responsible for large delays and widespread disruptions in the U.S. National Airspace System, especially during summer. Although Air Traffic Flow Management algorithms exist to schedule and route traffic in the face of disruptions, they require reliable forecasts of airspace capacity. However, there exists a gap between the spatial and temporal accuracy of aviation weather forecasts (and existing capacity models) and what these algorithms assume. In this thesis we consider the problem of integrating currently available convective weather forecasts with air traffic management in terminal airspace (near airports). We first demonstrate how raw convective weather forecasts, which provide deterministic predictions of the Vertically Integrated Liquid (the precipitation content in a column of airspace), can be translated into reliable and accurate probabilistic forecasts of whether or not a terminal-area route will be blocked. Given a flight route through the terminal area, we apply techniques from machine learning to determine the probability that the route will be open in actual weather. This probabilistic route blockage predictor is then used to optimize terminal-area operations. We develop an integer programming formulation for a 2-dimensional model of terminal airspace that dynamically moves arrival and departure routes to maximize expected capacity. Experiments using real weather scenarios on stormy days show that our algorithms recommend that a terminal-area route be modified 30% of the time, opening up 13% more available routes during these scenarios. The error rate is low, with only 5% of cases corresponding to a modified route being blocked while the original route is in fact open. In addition, for routes predicted to be open with probability 0.95 or greater by our method, 96% of these routes are indeed open (on average) in the weather that materializes. In the final part of the thesis we consider more realistic models of terminal airspace routing and structure. We develop an A*-based routing algorithm that identifies 3-D routes through airspace that adhere to physical aircraft constraints during climb and descent, are conflict-free, and are likely to avoid convective weather hazards. The proposed approach is aimed at improving traffic manager decision-making in today's operational environment. by Diana Michalek Pfeil. Ph.D.
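    The thesis's router works on 3-D routes subject to climb/descent and conflict constraints; as a much simpler illustration of the A*-with-weather-avoidance idea, the sketch below runs A* on a 2-D grid in which cells covered by a hypothetical storm mask are treated as impassable. It is not the thesis's algorithm.

```python
# Hedged sketch: A* search on a 2-D grid where cells flagged as weather-blocked are
# impassable. The thesis's algorithm handles 3-D routes with aircraft climb/descent and
# conflict constraints; this only illustrates basic A* mechanics with a blockage mask.
import heapq

def astar(blocked: set, start: tuple, goal: tuple, size: int):
    """Return a shortest 4-connected path from start to goal avoiding blocked cells."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])   # Manhattan heuristic
    frontier = [(h(start), 0, start, [start])]
    seen = set()
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nxt[0] < size and 0 <= nxt[1] < size
                    and nxt not in blocked and nxt not in seen):
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
    return None   # no open route through the weather

# Hypothetical weather mask: a storm cell blocking part of the terminal area.
storm = {(x, y) for x in range(3, 7) for y in range(2, 8)}
print(astar(storm, start=(0, 0), goal=(9, 9), size=10))
```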

    A Simulation Study Comparing the Use of Supervised Machine Learning Variable Selection Methods in the Psychological Sciences

    When specifying a predictive model for classification, variable selection (or subset selection) is one of the most important steps for researchers to consider. Reducing the necessary number of variables in a prediction model is vital for many reasons, including reducing the burden of data collection and increasing model efficiency and generalizability. The pool of variable selection methods from which to choose is large, and researchers often struggle to identify which method they should use given the specific features of their data set. Yet there is a scarcity of literature available to guide researchers in their choice; the literature centers on comparing different implementations of a given method rather than comparing different methodologies under varying data features. Through a large-scale Monte Carlo simulation and application to three psychological datasets, we evaluated the prediction error rates, area under the receiver operating characteristic curve, number of variables selected, computation times, and true positive rates of five different variable selection methods in R under varying parameterizations (i.e., default vs. grid tuning): the genetic algorithm (ga), LASSO (glmnet), Elastic Net (glmnet), Support Vector Machines (svmfs), and random forest (Boruta). Performance measures did not converge upon a single best method; as such, researchers should guide their method selection by the measure of performance they deem most important. Results do, however, indicate that the genetic algorithm is the most widely applicable method, exhibiting minimum error rates in hold-out samples when compared to the other variable selection methods. Thus, if little is known about the format of the data, implementing the genetic algorithm will provide strong results.
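    The study itself compares R packages (ga, glmnet, svmfs, Boruta). As a loose Python analog of the LASSO arm only, the sketch below selects variables via L1-penalized logistic regression with cross-validated regularization in scikit-learn on synthetic data; it is not the paper's code or design.

```python
# Hedged Python analog of one of the compared approaches: LASSO-style variable selection
# via L1-penalized logistic regression (scikit-learn), standing in for the paper's R
# glmnet workflow. Data are synthetic; only 5 of 50 predictors actually carry signal.
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(0)
X = rng.standard_normal((400, 50))
true_coef = np.zeros(50)
true_coef[:5] = [2.0, -1.5, 1.0, -1.0, 0.8]                 # informative predictors
y = (X @ true_coef + rng.standard_normal(400) > 0).astype(int)

model = LogisticRegressionCV(penalty="l1", solver="saga", Cs=10, cv=5, max_iter=5000)
model.fit(X, y)
selected = np.flatnonzero(model.coef_[0] != 0)              # non-zero coefficients = kept variables
print("selected predictors:", selected)
```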