474,144 research outputs found

Quantitative surface field analysis: learning causal models to predict ligand binding affinity and pose.

Author: Cleves Ann E
Jain Ajay N
Publication venue: eScholarship, University of California
Publication date: 01/07/2018

We introduce the QuanSA method for inducing physically meaningful field-based models of ligand binding pockets based on structure-activity data alone. The method is closely related to the QMOD approach, substituting a learned scoring field for a pocket constructed of molecular fragments. The problem of mutual ligand alignment is addressed in a general way, and optimal model parameters and ligand poses are identified through multiple-instance machine learning. We provide algorithmic details along with performance results on sixteen structure-activity data sets covering many pharmaceutically relevant targets. In particular, we show how models initially induced from small data sets can extrapolatively identify potent new ligands with novel underlying scaffolds with very high specificity. Further, we show that combining predictions from QuanSA models with those from physics-based simulation approaches is synergistic. QuanSA predictions yield binding affinities, explicit estimates of ligand strain, associated ligand pose families, and estimates of structural novelty and confidence. The method is applicable for fine-grained lead optimization as well as potent new lead identification.

eScholarship - University of California

Integrate Model and Instance Based Machine Learning for Network Intrusion Detection

Author: Ara Lena
Publication date: 01/01/2018

Indiana University-Purdue University Indianapolis (IUPUI)

In computer networks, convenient internet access facilitates internet services, but it also augments the spread of malicious software, which can represent an attack or unauthorized access. This makes intrusion detection an important area to explore for detecting these unwanted activities. This thesis concentrates on combining model-based and instance-based machine learning to detect intrusions through a series of algorithms, starting with the clustering of similar hosts. Similar hosts are found with supervised machine learning techniques such as Support Vector Machines, Decision Trees and K Nearest Neighbors, using our proposed Data Fusion algorithm. Maximal cliques from graph theory are explored to find the clusters. A recursive approach is proposed to merge the decision areas of the best features. The idea is to implement a combination of model-based and instance-based machine learning and to analyze how it performs compared with a conventional machine learning algorithm such as Random Forest for intrusion detection. The system has been evaluated on three datasets from CTU-13. The results show that our proposed method gives a better detection rate than traditional methods, which might overfit the data. Prior research on model merging, instance-based learning, random forests, data mining and ensemble learning with regard to intrusion detection has been studied and taken as reference.
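The abstract mentions using maximal cliques to find clusters of similar hosts but gives no algorithmic detail. As a rough illustration only, the sketch below enumerates maximal cliques with the basic Bron-Kerbosch recursion. The similarity graph, the host names, and the function name `maximal_cliques` are all hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Set


def maximal_cliques(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        # r: current clique, p: candidates to extend it, x: already explored.
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


# Hypothetical similarity graph: an edge means two hosts were judged similar.
similar_hosts = {
    "h1": {"h2", "h3"},
    "h2": {"h1", "h3"},
    "h3": {"h1", "h2", "h4"},
    "h4": {"h3"},
}
clusters = maximal_cliques(similar_hosts)
```

Here each maximal clique is read as one candidate cluster, so a host such as `h3` may belong to more than one cluster.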

IUPUIScholarWorks

Purdue E-Pubs

FigShare

Towards a framework for designing full model selection and optimization systems

Author: Mayo Michael
Pfahringer Bernhard
Sun Quan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

People from a variety of industrial domains are beginning to realise that appropriate use of machine learning techniques for their data mining projects could bring great benefits. End-users now face the new problem of how to choose a combination of data processing tools and algorithms for a given dataset. This problem is usually termed the Full Model Selection (FMS) problem. Extending our previous work [10], this paper introduces a framework for designing FMS algorithms. Under this framework, we propose a novel algorithm combining genetic algorithms (GA) and particle swarm optimization (PSO), named GPS (which stands for GA-PSO-FMS), in which a GA is used to search for the optimal structure of a data mining solution, and PSO is used to search for optimal parameters for a particular structure instance. Given a classification dataset, GPS outputs an FMS solution as a directed acyclic graph consisting of diverse data mining operators that are available to the problem. Experimental results demonstrate the benefit of the algorithm. We also present, with detailed analysis, two model-tree-based variants for speeding up the GPS algorithm.
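The two-level search described in the abstract (an outer search over structures, an inner PSO over parameters) can be sketched roughly as follows. This is not the GPS algorithm: for brevity the outer GA is replaced by plain enumeration over two hypothetical structures, the losses are toy functions standing in for validation error, and the inertia and acceleration constants (0.7, 1.5, 1.5) are assumed values.

```python
import random
from typing import Callable, List, Tuple


def pso_minimize(
    loss: Callable[[List[float]], float],
    bounds: List[Tuple[float, float]],
    n_particles: int = 12,
    n_iters: int = 40,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimal particle swarm optimizer over a box-bounded parameter space."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * len(bounds) for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]           # per-particle best positions
    best_val = [loss(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (best_pos[i][d] - pos[i][d])
                             + 1.5 * r2 * (g_pos[d] - pos[i][d]))
                # Clamp the new position to the allowed range.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = loss(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


# Hypothetical "structures": each has its own parameter space and toy loss.
structures = {
    "pipeline_a": (lambda p: (p[0] - 1.0) ** 2 + 0.5, [(-5.0, 5.0)]),
    "pipeline_b": (lambda p: (p[0] + 2.0) ** 2 + (p[1] - 3.0) ** 2,
                   [(-5.0, 5.0), (-5.0, 5.0)]),
}
# Outer loop: plain enumeration stands in for the GA over structures.
results = {name: pso_minimize(loss, bounds)
           for name, (loss, bounds) in structures.items()}
best_structure = min(results, key=lambda name: results[name][1])
```

The point of the split is that each structure can have a parameter space of different dimensionality, so the inner optimizer is run once per candidate structure.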

Research Commons@Waikato

Making Fair ML Software using Trustworthy Explanation

Author: Aggarwal Aniya
Calmon Flavio
Galhotra Sainyam
Guy
Kamishima Toshihiro
Lundberg Scott M
Pleiss Geoff
Ribeiro Marco Tulio
Udeshi Sakshi
Zemel Rich
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 18/08/2020

Machine learning software is being used in many applications (finance, hiring, admissions, criminal justice) that have a huge social impact. But sometimes the behavior of this software is biased, and it discriminates based on sensitive attributes such as sex or race. Prior works concentrated on finding and mitigating bias in ML models. A recent trend is to use instance-based, model-agnostic explanation methods such as LIME to find bias in model predictions. Our work concentrates on finding shortcomings of current bias measures and explanation methods. We show how our proposed method based on K nearest neighbors can overcome those shortcomings and find the underlying bias of black-box models. Our results are more trustworthy and helpful for practitioners. Finally, we describe our future framework combining explanation and planning to build fair software.

Comment: New Ideas and Emerging Results (NIER) track; The 35th IEEE/ACM International Conference on Automated Software Engineering; Melbourne, Australia
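The abstract does not say how K nearest neighbors is used to expose bias, so the following is only one plausible reading, not the authors' method. The sketch compares each record's black-box prediction with the majority prediction of its nearest neighbours from the other sensitive group, with distance measured on non-sensitive features only. The function name `knn_disparity`, the record layout, and the two toy models are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# (non-sensitive features, sensitive attribute encoded as 0 or 1)
Record = Tuple[Sequence[float], int]


def knn_disparity(
    records: List[Record],
    predict: Callable[[Sequence[float], int], int],
    k: int = 1,
) -> float:
    """Fraction of records whose prediction differs from the majority
    prediction of their k nearest neighbours in the other group."""
    if not records:
        return 0.0

    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    flagged = 0
    for feats, group in records:
        others = [r for r in records if r[1] != group]
        nearest = sorted(others, key=lambda r: sq_dist(feats, r[0]))[:k]
        if not nearest:
            continue  # no one in the other group to compare against
        votes = sum(predict(f, g) for f, g in nearest)
        majority = 1 if 2 * votes > len(nearest) else 0
        if predict(feats, group) != majority:
            flagged += 1
    return flagged / len(records)


# Toy data: pairs of near-identical individuals from two groups.
records = [([1.0], 0), ([1.1], 1), ([2.0], 0), ([2.1], 1)]


def biased_model(feats: Sequence[float], group: int) -> int:
    return 1 if group == 0 else 0  # decides on the sensitive attribute alone


def feature_model(feats: Sequence[float], group: int) -> int:
    return 1 if feats[0] > 1.5 else 0  # ignores the sensitive attribute


biased_score = knn_disparity(records, biased_model, k=1)
feature_score = knn_disparity(records, feature_model, k=1)
```

A score near 1 means similar individuals from different groups are routinely treated differently; a score near 0 means they are not, at least on this sample.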

arXiv.org e-Print Archive

Crossref

DBBRBF- Convalesce optimization for software defect prediction problem using hybrid distribution base balance instance selection and radial basis Function classifier

Author: Panda Mrutyunjaya
Publication date: 08/06/2018

Software is becoming an integral part of human life, and the rapid development of software engineering demands that software be highly reliable. Reliability can be checked with efficient software testing methods that use historical software prediction data to develop a quality software system. Machine learning plays a vital role in optimizing the prediction of defect-prone modules in real-life software. Software defect prediction data suffers from a class imbalance problem, with a low ratio of defective to non-defective instances, which degrades classification performance unless an efficient machine learning classification technique is used. To alleviate this problem, this paper introduces a novel hybrid instance-based classification approach that combines distribution base balance based instance selection with a radial basis function neural network classifier model (DBBRBF) to obtain the best prediction in comparison with existing research. Class-imbalanced data sets from NASA, Promise and Softlab were used for the experimental analysis. The experimental results in terms of Accuracy, F-measure, AUC, Recall, Precision, and Balance show the effectiveness of the proposed approach. Finally, statistical significance tests are carried out to understand the suitability of the proposed model.

Comment: 32 pages, 24 Tables, 8 Figures
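To make the listed evaluation measures concrete, here is a small sketch that computes them from confusion-matrix counts (AUC is left out because it needs ranked scores). The abstract does not define Balance; the formula below is the definition commonly used in defect-prediction studies and is assumed here. The function name and the example counts are hypothetical, not results from the paper.

```python
import math
from typing import Dict


def defect_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Metrics for a binary defect predictor; 'positive' means defective."""
    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0  # guard against empty denominators

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)   # probability of detection (pd)
    pf = ratio(fp, fp + tn)       # probability of false alarm
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f_measure": ratio(2 * precision * recall, precision + recall),
        # Assumed definition: 1 minus the normalized distance from the
        # ideal point (pf = 0, pd = 1).
        "balance": 1 - math.sqrt(pf ** 2 + (1 - recall) ** 2) / math.sqrt(2),
    }


# Hypothetical imbalanced test set: 10 defective, 90 non-defective modules.
useful = defect_metrics(tp=8, fp=2, tn=88, fn=2)
always_clean = defect_metrics(tp=0, fp=0, tn=90, fn=10)
```

The second case shows why accuracy alone is misleading on imbalanced data: a predictor that never flags a defect still reaches 90% accuracy, while its recall is 0 and its balance is low.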

arXiv.org e-Print Archive

Part-Based Models Improve Adversarial Robustness

Author: Carlini Nicholas
Chen Yizheng
Pongmala Kornrapat
Sitawarin Chawin
Wagner David
Publication date: 15/09/2022

We show that combining human prior knowledge with end-to-end learning can improve the robustness of deep neural networks by introducing a part-based model for object classification. We believe that the richer form of annotation helps guide neural networks to learn more robust features without requiring more samples or larger models. Our model combines a part segmentation model with a tiny classifier and is trained end-to-end to simultaneously segment objects into parts and then classify the segmented object. Empirically, our part-based models achieve both higher accuracy and higher adversarial robustness than a ResNet-50 baseline on all three datasets. For instance, the clean accuracy of our part models is up to 15 percentage points higher than the baseline's, given the same level of robustness. Our experiments indicate that these models also reduce texture bias and yield better robustness against common corruptions and spurious correlations. The code is publicly available at https://github.com/chawins/adv-part-model.

Comment: Code can be found at https://github.com/chawins/adv-part-mode

arXiv.org e-Print Archive