Active Sampling-based Binary Verification of Dynamical Systems
Nonlinear, adaptive, or otherwise complex control techniques are increasingly
relied upon to ensure the safety of systems operating in uncertain
environments. However, the nonlinearity of the resulting closed-loop system
complicates verification that the system does in fact satisfy those
requirements at all possible operating conditions. While analytical proof-based
techniques and finite abstractions can be used to provably verify the
closed-loop system's response at different operating conditions, they often
produce conservative approximations due to restrictive assumptions and are
difficult to construct in many applications. In contrast, popular statistical
verification techniques relax the restrictions and instead rely upon
simulations to construct statistical or probabilistic guarantees. This work
presents a data-driven statistical verification procedure that instead
constructs statistical learning models from simulated training data to separate
the set of possible perturbations into "safe" and "unsafe" subsets. Binary
evaluations of closed-loop system requirement satisfaction at various
realizations of the uncertainties are obtained through temporal logic
robustness metrics, which are then used to construct predictive models of
requirement satisfaction over the full set of possible uncertainties. As the
accuracy of these predictive statistical models is inherently coupled to the
quality of the training data, an active learning algorithm selects additional
sample points in order to maximize the expected change in the data-driven model
and thus, indirectly, minimize the prediction error. Various case studies
demonstrate the closed-loop verification procedure and highlight improvements
in prediction error over both existing analytical and statistical verification
techniques.
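The following is a minimal, illustrative sketch of such a sample-based verification loop. It is not the paper's algorithm: the oracle `simulate_and_label`, the RBF-SVM classifier, and the uncertainty-sampling query rule (smallest margin) are assumptions standing in for the temporal-logic robustness evaluation and the expected-model-change criterion described above.

```python
import numpy as np
from sklearn.svm import SVC

def active_verification(candidates, simulate_and_label, n_init=20, n_queries=100, seed=0):
    """candidates: (N, d) array of perturbation samples.
    simulate_and_label: hypothetical oracle that runs a closed-loop simulation at a
    sample and returns +1 (requirement satisfied) or -1 (violated)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(candidates), size=n_init, replace=False)
    X = candidates[idx]
    y = np.array([simulate_and_label(x) for x in X])
    model = SVC(kernel="rbf").fit(X, y)
    for _ in range(n_queries):
        # Query the candidate the current model is least certain about (smallest
        # margin), used here as a simple stand-in for expected model change.
        margins = np.abs(model.decision_function(candidates))
        x_next = candidates[int(np.argmin(margins))]
        X = np.vstack([X, x_next])
        y = np.append(y, simulate_and_label(x_next))
        model = SVC(kernel="rbf").fit(X, y)
    return model  # classifies the uncertainty set into "safe" / "unsafe"
```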
Supervised inference of gene-regulatory networks
Background: Inference of protein interaction networks from various sources of data has become an important topic of both systems and computational biology. Here we present a supervised approach to the identification of gene expression regulatory networks. Results: The method is based on a kernel approach accompanied by genetic programming. As a data source, the method utilizes gene expression time series for the prediction of interactions among regulatory proteins and their target genes. The performance of the method was verified using Saccharomyces cerevisiae cell cycle and DNA/RNA/protein biosynthesis gene expression data. The results were compared with independent data sources. Finally, a prediction of novel interactions within yeast gene expression circuits has been performed. Conclusion: The results show that, in most cases, our algorithm gives results identical to the independent experiments when compared with the YEASTRACT database. In several cases our algorithm predicts novel interactions that have not been reported.
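As a rough illustration of supervised interaction prediction from expression time series, the sketch below trains a classifier on known regulator-target pairs. The time-lagged correlation feature and the SVM are illustrative stand-ins; the paper's combination of a kernel approach with genetic programming is not reproduced here.

```python
import numpy as np
from sklearn.svm import SVC

def lagged_corr(reg_profile, tgt_profile, lag=1):
    """Correlation between a regulator's expression profile and the target's
    profile shifted forward by `lag` time points (illustrative feature)."""
    return np.corrcoef(reg_profile[:-lag], tgt_profile[lag:])[0, 1]

def train_interaction_classifier(profiles, known_pairs, known_labels):
    """profiles: dict gene -> expression time series (1-D arrays);
    known_pairs: list of (regulator, target) gene names;
    known_labels: 1 if the pair is a known interaction, 0 otherwise."""
    X = np.array([[lagged_corr(profiles[r], profiles[t])] for r, t in known_pairs])
    return SVC(kernel="rbf", probability=True).fit(X, known_labels)
```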
Premise Selection for Mathematics by Corpus Analysis and Kernel Methods
Smart premise selection is essential when using automated reasoning as a tool
for large-theory formal proof development. A good method for premise selection
in complex mathematical libraries is the application of machine learning to
large corpora of proofs. This work develops learning-based premise selection in
two ways. First, a newly available minimal dependency analysis of existing
high-level formal mathematical proofs is used to build a large knowledge base
of proof dependencies, providing precise data for ATP-based re-verification and
for training premise selection algorithms. Second, a new machine learning
algorithm for premise selection based on kernel methods is proposed and
implemented. To evaluate the impact of both techniques, a benchmark consisting
of 2078 large-theory mathematical problems is constructed, extending the older
MPTP Challenge benchmark. The combined effect of the techniques results in a
50% improvement on the benchmark over the Vampire/SInE state-of-the-art system
for automated reasoning in large theories.
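A minimal sketch of learning-based premise selection is given below. It ranks candidate premises for a new conjecture by a similarity-weighted vote over premises used in proofs of symbolically similar theorems; the Jaccard similarity over symbol sets and the k-NN-style aggregation are assumptions, not the kernel algorithm proposed in the paper.

```python
from collections import Counter

def similarity(sym_a, sym_b):
    """Jaccard similarity between two sets of symbols."""
    return len(sym_a & sym_b) / max(1, len(sym_a | sym_b))

def rank_premises(conjecture_syms, proved, top_k=50):
    """conjecture_syms: set of symbols in the conjecture.
    proved: list of (theorem_symbols, premises_used_in_its_proof) from the
    dependency knowledge base. Returns the top_k highest-scoring premises."""
    scores = Counter()
    for thm_syms, premises in proved:
        w = similarity(conjecture_syms, thm_syms)
        for p in premises:
            scores[p] += w
    return [p for p, _ in scores.most_common(top_k)]
```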
Accelerated Particle Swarm Optimization and Support Vector Machine for Business Optimization and Applications
Business optimization is becoming increasingly important because all business
activities aim to maximize the profit and performance of products and services,
under limited resources and appropriate constraints. Recent developments in
support vector machine and metaheuristics show many advantages of these
techniques. In particular, particle swarm optimization is now widely used in
solving tough optimization problems. In this paper, we use a combination of a
recently developed Accelerated PSO and a nonlinear support vector machine to
form a framework for solving business optimization problems. We first apply the
proposed APSO-SVM to production optimization, and then use it for income
prediction and project scheduling. We also carry out some parametric studies
and discuss the advantages of the proposed metaheuristic SVM.
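To illustrate how a PSO-style search can drive an SVM, the sketch below tunes the SVM hyperparameters (C, gamma) by cross-validation using the simplified accelerated-PSO update that pulls each particle toward the global best plus random noise. The bounds, parameter values, and objective are illustrative assumptions, not the paper's business applications.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def fitness(log_params, X, y):
    C, gamma = np.exp(log_params)            # search in log-space to keep values positive
    return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5).mean()

def apso_tune(X, y, n_particles=15, n_iter=30, alpha=0.2, beta=0.5, seed=0):
    rng = np.random.default_rng(seed)
    swarm = rng.uniform(-3.0, 3.0, size=(n_particles, 2))   # columns: [log C, log gamma]
    scores = np.array([fitness(p, X, y) for p in swarm])
    best, best_score = swarm[np.argmax(scores)].copy(), scores.max()
    for _ in range(n_iter):
        # Simplified accelerated-PSO update: move toward the global best plus noise.
        swarm = (1 - beta) * swarm + beta * best + alpha * rng.standard_normal(swarm.shape)
        scores = np.array([fitness(p, X, y) for p in swarm])
        if scores.max() > best_score:
            best, best_score = swarm[np.argmax(scores)].copy(), scores.max()
    return np.exp(best)                       # tuned (C, gamma)
```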
Intelligent OS X malware threat detection with code inspection
With the increasing market share of the Mac OS X operating system, there is a corresponding increase in the number of malicious programs (malware) designed to exploit vulnerabilities on Mac OS X platforms. However, existing manual and heuristic OS X malware detection techniques are not capable of coping with such a high rate of malware. While machine learning techniques offer promising results in automated detection of Windows and Android malware, there have been limited efforts to extend them to OS X malware detection. In this paper, we propose a supervised machine learning model that applies a kernel-based Support Vector Machine (SVM) and a novel weighting measure based on application library calls to detect OS X malware. For training and evaluating the model, a dataset combining 152 malware and 450 benign samples is created. Using common supervised machine learning algorithms on this dataset, we obtain over 91% detection accuracy with a 3.9% false alarm rate. We also use the Synthetic Minority Over-sampling Technique (SMOTE) to create three synthetic datasets with different distributions, based on a refined version of the collected dataset, to investigate the impact of sample size on malware detection accuracy. Using the SMOTE datasets, we achieve over 96% detection accuracy and a false alarm rate of less than 4%. All malware classification experiments are validated using cross-validation. Our results show that increasing the sample size in the synthetic datasets has a direct positive effect on detection accuracy, while it increases the false alarm rate compared to the original dataset.
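A minimal sketch of this kind of pipeline, assuming a per-application feature matrix of library-call counts, is shown below. SMOTE oversampling of the minority (malware) class is followed by an RBF-kernel SVM and evaluated with cross-validation; the paper's specific call-weighting measure is not reproduced.

```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def evaluate(X, y):
    """X: per-application library-call feature matrix; y: 1 = malware, 0 = benign."""
    pipe = Pipeline([
        ("smote", SMOTE(random_state=0)),               # oversample the minority (malware) class
        ("svm", SVC(kernel="rbf", C=1.0, gamma="scale")),
    ])
    # SMOTE is applied only to the training folds inside each CV split.
    return cross_val_score(pipe, X, y, cv=10, scoring="accuracy")
```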
Prediction of drug–target interaction networks from the integration of chemical and genomic spaces
Motivation: The identification of interactions between drugs and target proteins is a key area in genomic drug discovery. Therefore, there is a strong incentive to develop new methods capable of detecting these potential drug–target interactions efficiently.
Locality-Convolution Kernel and Its Application to Dependency Parse Ranking
We propose a Locality-Convolution (LC) kernel in application to dependency parse ranking. The LC kernel measures parse similarities locally, within a small window constructed around each matching feature. Inside the window it makes use of a position-sensitive function to take into account the order of feature appearance. The similarity between two windows is calculated by computing the product of their common attributes, and the kernel value is the sum of the window similarities. We applied the introduced kernel together with the Regularized Least-Squares (RLS) algorithm to a dataset containing dependency parses obtained from a manually annotated biomedical corpus of 1100 sentences. Our experiments show that RLS with the LC kernel performs better than the baseline method. The results outline the importance of local correlations and the order of feature appearance within the parse. Final validation demonstrates a statistically significant increase in parse ranking performance.
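A simplified sketch of a locality-convolution-style kernel over feature sequences follows: for every pair of positions holding the same feature, small windows around them are compared with a position-sensitive weight. The window size and Gaussian weighting are illustrative choices, not the paper's exact definition.

```python
import math

def lc_kernel(a, b, half_window=2, sigma=1.0):
    """a, b: lists of hashable features (e.g., attributes of a dependency parse)."""
    total = 0.0
    for i, fa in enumerate(a):
        for j, fb in enumerate(b):
            if fa != fb:
                continue  # only matching features anchor a window comparison
            window_sim = 0.0
            for da in range(-half_window, half_window + 1):
                for db in range(-half_window, half_window + 1):
                    ia, jb = i + da, j + db
                    if 0 <= ia < len(a) and 0 <= jb < len(b) and a[ia] == b[jb]:
                        # Position-sensitive weight: common attributes at similar
                        # offsets from the anchor contribute more.
                        window_sim += math.exp(-((da - db) ** 2) / (2 * sigma ** 2))
            total += window_sim
    return total
```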
On consensus biomarker selection
Background: Recent development of mass spectrometry technology enabled the analysis of complex peptide mixtures. A lot of effort is currently devoted to the identification of biomarkers in human body fluids like serum or plasma, based on which new diagnostic tests for different diseases could be constructed. Various biomarker selection procedures have been exploited in recent studies. It has been noted that they often lead to different biomarker lists and, as a consequence, the patient classification may also vary. Results: Here we propose a new approach to the biomarker selection problem: to apply several competing feature ranking procedures and compute a consensus list of features based on their outcomes. We validate our methods on two proteomic datasets for the diagnosis of ovarian and prostate cancer. Conclusion: The proposed methodology can improve the classification results and at the same time provide a unified biomarker list for further biological examinations and interpretation.
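A minimal sketch of consensus feature ranking is given below: several competing ranking procedures are run and their outputs merged by mean rank (a Borda-style aggregation). The particular rankers and the aggregation rule are illustrative assumptions, not the paper's specific procedures.

```python
import numpy as np
from sklearn.feature_selection import f_classif, mutual_info_classif

def consensus_ranking(X, y, top_k=20):
    """X: (n_samples, n_features) spectra-derived feature matrix; y: class labels."""
    scores = [
        f_classif(X, y)[0],                     # ANOVA F-score per feature
        mutual_info_classif(X, y),              # mutual information per feature
        np.abs(np.corrcoef(X.T, y)[-1, :-1]),   # absolute correlation with the label
    ]
    # Convert each score vector to ranks (0 = best), then average across rankers.
    ranks = [np.argsort(np.argsort(-s)) for s in scores]
    mean_rank = np.mean(ranks, axis=0)
    return np.argsort(mean_rank)[:top_k]        # indices of consensus biomarkers
```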