Quantitative surface field analysis: learning causal models to predict ligand binding affinity and pose.
We introduce the QuanSA method for inducing physically meaningful field-based models of ligand binding pockets based on structure-activity data alone. The method is closely related to the QMOD approach, substituting a learned scoring field for a pocket constructed of molecular fragments. The problem of mutual ligand alignment is addressed in a general way, and optimal model parameters and ligand poses are identified through multiple-instance machine learning. We provide algorithmic details along with performance results on sixteen structure-activity data sets covering many pharmaceutically relevant targets. In particular, we show how models initially induced from small data sets can extrapolatively identify potent new ligands with novel underlying scaffolds with very high specificity. Further, we show that combining predictions from QuanSA models with those from physics-based simulation approaches is synergistic. QuanSA predictions yield binding affinities, explicit estimates of ligand strain, associated ligand pose families, and estimates of structural novelty and confidence. The method is applicable to fine-grained lead optimization as well as to the identification of potent new leads.
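In multiple-instance learning as described above, each ligand is a "bag" of candidate poses, and the measured affinity is attributed to whichever pose the model scores best. The following is a minimal sketch of that max-over-poses prediction rule only. It assumes a hypothetical linear scoring function over made-up per-pose features and a hypothetical per-pose strain penalty; it is not QuanSA's actual field model, and the names `Pose`, `pose_score` and `predict_affinity` are illustrative.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Pose:
    features: Sequence[float]  # hypothetical per-pose descriptors (assumption)
    strain: float              # hypothetical strain penalty for adopting this pose (assumption)


def pose_score(weights: Sequence[float], pose: Pose) -> float:
    # Linear response to the pose features, minus the cost of adopting the pose.
    return sum(w * f for w, f in zip(weights, pose.features)) - pose.strain


def predict_affinity(weights: Sequence[float], poses: List[Pose]) -> Tuple[float, int]:
    # Multiple-instance rule: the molecule (the "bag") is scored by its best pose,
    # and that pose's index is returned as the predicted binding pose.
    best_idx = max(range(len(poses)), key=lambda i: pose_score(weights, poses[i]))
    return pose_score(weights, poses[best_idx]), best_idx


score, idx = predict_affinity([1.0, 0.5], [Pose([2.0, 0.0], 0.5), Pose([1.0, 4.0], 1.0)])
print(score, idx)  # 2.0 1
```

In a full learner, an outer loop would fit `weights` so that these bag-level scores match measured affinities; that fitting step is omitted here.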
Integrate Model and Instance Based Machine Learning for Network Intrusion Detection
Indiana University-Purdue University Indianapolis (IUPUI)
In computer networks, convenient internet access facilitates internet services, but it also aids the spread of malicious software that may carry out an attack or gain unauthorized access. This makes intrusion detection an important area to explore for detecting such unwanted activity. This thesis concentrates on combining model-based and instance-based machine learning to detect intrusions through a series of algorithms, starting with clustering similar hosts.
Similar hosts are identified with supervised machine learning techniques such as Support Vector Machines, Decision Trees and K-Nearest Neighbors, using our proposed Data Fusion algorithm. Maximal cliques from graph theory are used to find the clusters, and a recursive procedure is proposed to merge the decision areas of the best features. The idea is to implement a combination of model-based and instance-based machine learning and to analyze how it performs compared with a conventional machine learning algorithm such as Random Forest for intrusion detection. The system has been evaluated on three datasets from CTU-13. The results show that our proposed method achieves a better detection rate than traditional methods, which might overfit the data.
Prior research on model merging, instance-based learning, random forests, data mining and ensemble learning with regard to intrusion detection has been studied and taken as a reference.
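The clustering step above treats hosts as graph vertices and uses maximal cliques as clusters. The thesis does not say which clique algorithm it uses; the sketch below uses the standard Bron-Kerbosch algorithm with pivoting. It assumes an edge already joins two hosts whenever the classifiers judged them similar; how those edges are produced is not modelled, and the host names and the helpers `build_graph` and `maximal_cliques` are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_graph(hosts: Iterable[str], edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    # Undirected similarity graph: each host maps to the set of hosts judged similar to it.
    graph: Dict[str, Set[str]] = {h: set() for h in hosts}
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def maximal_cliques(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Bron-Kerbosch with pivoting: every maximal clique is reported exactly once."""
    cliques: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            cliques.append(r)  # r cannot be extended any further, so it is maximal
            return
        # Pivot on the vertex with most neighbours in p to skip redundant branches.
        u = max(p | x, key=lambda v: len(p & graph[v]))
        for v in list(p - graph[u]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}  # v has been fully explored
            x = x | {v}  # ...so exclude it from later cliques

    expand(set(), set(graph), set())
    return cliques


g = build_graph(["h1", "h2", "h3", "h4", "h5"],
                [("h1", "h2"), ("h1", "h3"), ("h2", "h3"), ("h3", "h4")])
print(sorted(sorted(c) for c in maximal_cliques(g)))
# [['h1', 'h2', 'h3'], ['h3', 'h4'], ['h5']]
```

Note that maximal cliques may overlap (host `h3` appears in two of them), so a clustering built on them needs a rule for shared hosts; the abstract does not describe one.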
Towards a framework for designing full model selection and optimization systems
People from a variety of industrial domains are beginning to realise that appropriate use of machine learning techniques for their data mining projects could bring great benefits. End-users now face the new problem of how to choose a combination of data processing tools and algorithms for a given dataset. This problem is usually termed the Full Model Selection (FMS) problem. Extending our previous work [10], in this paper we introduce a framework for designing FMS algorithms. Under this framework, we propose a novel algorithm combining genetic algorithms (GA) and particle swarm optimization (PSO), named GPS (GA-PSO-FMS), in which a GA searches for the optimal structure of a data mining solution and PSO searches for the optimal parameters of a particular structure instance. Given a classification dataset, GPS outputs an FMS solution as a directed acyclic graph consisting of the diverse data mining operators available to the problem. Experimental results demonstrate the benefit of the algorithm. We also present, with detailed analysis, two model-tree-based variants for speeding up the GPS algorithm.
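In GPS, the GA fixes a pipeline structure and PSO then tunes that structure's numeric parameters. The following sketches only the PSO half: a standard global-best PSO using widely cited constriction coefficients (w = 0.7298, c1 = c2 = 1.49618). The objective is a made-up quadratic standing in for, say, the cross-validated error of a fixed pipeline; it is an assumption, as is every name in the block. This is not the GPS implementation.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(objective: Callable[[List[float]], float],
                 bounds: Sequence[Tuple[float, float]],
                 n_particles: int = 30, iterations: int = 200,
                 w: float = 0.7298, c1: float = 1.49618, c2: float = 1.49618,
                 seed: int = 0) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # each particle's best position so far
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical stand-in for "validation error as a function of two hyperparameters".
best, err = pso_minimize(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2, [(-5, 5), (-5, 5)])
print(best, err)
```

In a GA-PSO hybrid of the kind described, a call like this would run once per candidate structure, and its best error would serve as that structure's fitness in the GA.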
Making Fair ML Software using Trustworthy Explanation
Machine learning software is being used in many applications with a huge social impact (finance, hiring, admissions, criminal justice). But sometimes the behavior of this software is biased, showing discrimination based on sensitive attributes such as sex or race. Prior work has concentrated on finding and mitigating bias in ML models. A recent trend is to use instance-based, model-agnostic explanation methods such as LIME to uncover bias in model predictions. Our work concentrates on finding shortcomings of current bias measures and explanation methods. We show how our proposed method, based on K nearest neighbors, can overcome those shortcomings and find the underlying bias of black-box models. Our results are more trustworthy and helpful for practitioners. Finally, we describe our future framework combining explanation and planning to build fair software.
Comment: New Ideas and Emerging Results (NIER) track; The 35th IEEE/ACM International Conference on Automated Software Engineering; Melbourne, Australia
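The abstract does not spell out its K-nearest-neighbor method. One plausible reading, given here purely as an illustrative sketch and not as the authors' algorithm, is a consistency probe: compare each instance's black-box prediction with the majority prediction of its k most similar instances from the other sensitive group, measuring similarity on non-sensitive features only. A high disagreement rate suggests the model leans on the sensitive attribute. The function name, the `predict(x, s)` interface and the toy data are all hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence


def knn_group_disagreement(X: List[Sequence[float]], sensitive: List[int],
                           predict: Callable[[Sequence[float], int], int],
                           k: int = 3) -> float:
    """Fraction of instances whose prediction differs from the majority prediction
    of their k nearest neighbours in the other sensitive group.
    Distances use only the non-sensitive features in X; both groups must be present."""
    preds = [predict(x, s) for x, s in zip(X, sensitive)]
    disagreements = 0
    for i, xi in enumerate(X):
        # Candidate neighbours: instances from the other sensitive group, nearest first.
        others = [j for j in range(len(X)) if sensitive[j] != sensitive[i]]
        others.sort(key=lambda j: math.dist(xi, X[j]))
        majority = Counter(preds[j] for j in others[:k]).most_common(1)[0][0]
        disagreements += majority != preds[i]
    return disagreements / len(X)


X = [[0.1], [0.2], [0.9], [0.8]]
s = [0, 1, 0, 1]
fair = lambda x, group: int(x[0] > 0.5)   # ignores the sensitive attribute
biased = lambda x, group: group           # decides on the sensitive attribute alone
print(knn_group_disagreement(X, s, fair, k=1))    # 0.0
print(knn_group_disagreement(X, s, biased, k=1))  # 1.0
```

An odd k avoids ties for binary labels when at least k neighbors exist in the other group; with ties, `Counter.most_common` falls back to first-seen order.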
DBBRBF- Convalesce optimization for software defect prediction problem using hybrid distribution base balance instance selection and radial basis Function classifier
Software is becoming an integral part of human life, and the rapid development of software engineering demands that software be highly reliable. Reliability can be checked through efficient software testing methods that use historical software prediction data to develop a quality software system. Machine learning plays a vital role in effectively predicting defect-prone modules in real-life software. Software defect prediction data suffers from a class imbalance problem, with a low ratio of defective to non-defective modules; this calls for an efficient machine learning classification technique, since the imbalance otherwise degrades classification performance. To alleviate this problem, this paper introduces a novel hybrid instance-based classification approach that combines distribution-based balance instance selection with a radial basis function neural network classifier (DBBRBF) to obtain the best prediction compared with existing research. Class-imbalanced data sets from NASA, Promise and Softlab were used for the experimental analysis. The experimental results in terms of Accuracy, F-measure, AUC, Recall, Precision, and Balance show the effectiveness of the proposed approach. Finally, statistical significance tests are carried out to assess the suitability of the proposed model.
Comment: 32 pages, 24 Tables, 8 Figures
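DBBRBF pairs a class-balancing instance-selection step with an RBF classifier. The abstract does not define its "distribution base balance" selection, so the sketch below substitutes plain random undersampling of the majority class. Instead of a trained RBF network with learned output weights, it uses a simpler Gaussian-kernel vote over the retained instances, in the style of a probabilistic neural network. Both substitutions, the `gamma` value and the toy data are assumptions, and all names are hypothetical.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def balance_by_undersampling(X: List[Sequence[float]], y: List[int],
                             seed: int = 0) -> Tuple[List[Sequence[float]], List[int]]:
    """Randomly undersample every class to the size of the smallest class
    (a stand-in for DBBRBF's distribution-based balancing, which is not specified here)."""
    rng = random.Random(seed)
    by_class: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    n_min = min(len(rows) for rows in by_class.values())
    Xb, yb = [], []
    for label, rows in by_class.items():
        for xi in rng.sample(rows, n_min):
            Xb.append(xi)
            yb.append(label)
    return Xb, yb


def rbf_predict(x: Sequence[float], centers: List[Sequence[float]],
                labels: List[int], gamma: float = 1.0) -> int:
    """Score each class by the summed Gaussian RBF activation of its centers."""
    scores: Dict[int, float] = defaultdict(float)
    for c, label in zip(centers, labels):
        scores[label] += math.exp(-gamma * math.dist(x, c) ** 2)
    return max(scores, key=scores.get)


# Toy imbalanced data: 8 non-defective modules (label 0) vs 2 defective ones (label 1).
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.2, 0.0], [0.0, 0.2],
     [0.2, 0.1], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]]
y = [0] * 8 + [1] * 2
Xb, yb = balance_by_undersampling(X, y)
print(len(Xb), yb.count(0), yb.count(1))  # 4 2 2
print(rbf_predict([5.0, 5.0], Xb, yb))    # 1
```

Balancing is applied to training data only; evaluation sets keep their natural class ratio so that metrics such as Recall and AUC reflect real-world imbalance.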
Part-Based Models Improve Adversarial Robustness
We show that combining human prior knowledge with end-to-end learning can improve the robustness of deep neural networks by introducing a part-based model for object classification. We believe that the richer form of annotation helps guide neural networks to learn more robust features without requiring more samples or larger models. Our model combines a part segmentation model with a tiny classifier and is trained end-to-end to simultaneously segment objects into parts and then classify the segmented object. Empirically, our part-based models achieve both higher accuracy and higher adversarial robustness than a ResNet-50 baseline on all three datasets. For instance, the clean accuracy of our part models is up to 15 percentage points higher than the baseline's, given the same level of robustness. Our experiments indicate that these models also reduce texture bias and yield better robustness against common corruptions and spurious correlations. The code is publicly available at https://github.com/chawins/adv-part-model.
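The abstract describes a two-stage pipeline in which a part segmentation model feeds a tiny classifier. The sketch below illustrates only that inference-time data flow in plain Python. It assumes hypothetical per-part probability maps standing in for the segmentation network's output, pools each map into a single part-presence score, and applies a linear layer. It involves no neural network and no training, and the architecture in the linked repository may differ; all names are hypothetical.

```python
from typing import List, Sequence

PartMaps = List[List[List[float]]]  # [part][row][col] probabilities (assumed layout)


def pool_part_features(part_maps: PartMaps) -> List[float]:
    """Average each part's probability map into one presence score per part."""
    feats = []
    for pm in part_maps:
        total = sum(sum(row) for row in pm)
        n = sum(len(row) for row in pm)
        feats.append(total / n)
    return feats


def tiny_classifier(features: Sequence[float], weights: List[List[float]],
                    biases: List[float]) -> int:
    """Linear layer over part features; returns the index of the highest logit."""
    logits = [sum(w * f for w, f in zip(ws, features)) + b
              for ws, b in zip(weights, biases)]
    return max(range(len(logits)), key=logits.__getitem__)


# Two hypothetical parts on a 2x2 image: part 0 fully present, part 1 absent.
maps = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
feats = pool_part_features(maps)
print(feats)                                                 # [1.0, 0.0]
print(tiny_classifier(feats, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))  # 0
```

Because the classifier sees only part-level evidence rather than raw pixels, an adversary must alter the predicted part layout to change the label. This is one intuition for the robustness gains the abstract reports, not a claim the abstract makes.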