Classification of protein interaction sentences via gaussian processes
The increase in the availability of protein interaction studies in textual format, coupled with the demand for easier access to the key results, has led to a need for text mining solutions. In the text processing pipeline, classification is a key step for extraction of small sections of relevant text. Consequently, for the task of locating protein-protein interaction sentences, we examine the use of a classifier that has rarely been applied to text: Gaussian processes (GPs). GPs are a non-parametric probabilistic analogue to the more popular support vector machines (SVMs). We find that GPs outperform the SVM and naïve Bayes classifiers on binary sentence data, whilst showing equivalent performance on abstract and multiclass sentence corpora. In addition, the lack of the margin parameter, which requires costly tuning, along with the principled multiclass extensions enabled by the probabilistic framework, make GPs an appealing alternative worthy of further adoption.
Kernel methods in machine learning
We review machine learning methods employing positive definite kernels. These
methods formulate learning and estimation problems in a reproducing kernel
Hilbert space (RKHS) of functions defined on the data domain, expanded in terms
of a kernel. Working in linear spaces of functions has the benefit of
facilitating the construction and analysis of learning algorithms while at the
same time allowing large classes of functions. The latter include nonlinear
functions as well as functions defined on nonvectorial data. We cover a wide
range of methods, from binary classifiers to sophisticated methods for
estimation with structured data.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by
the Institute of Mathematical Statistics (http://www.imstat.org) at
http://dx.doi.org/10.1214/009053607000000677
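The idea of a function "expanded in terms of a kernel" on nonvectorial data can be illustrated with a small sketch. The example below is not taken from the paper: it assumes a 2-gram spectrum kernel on strings and a kernel perceptron as the learner, and the names `spectrum_kernel`, `train_kernel_perceptron`, and `predict`, as well as the toy strings, are hypothetical.

```python
from collections import Counter

def spectrum_kernel(s, t, k=2):
    """k-spectrum kernel: inner product of k-gram count vectors.

    It is positive definite because it is an explicit inner product
    in the (implicit) space indexed by all k-grams.
    """
    fs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ft = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(count * ft[gram] for gram, count in fs.items())

def train_kernel_perceptron(data, kernel, epochs=10):
    """Learn coefficients alpha_i so that f(x) = sum_i alpha_i * y_i * k(x_i, x)."""
    alpha = [0] * len(data)
    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(data):
            f = sum(a * yj * kernel(xj, x) for a, (xj, yj) in zip(alpha, data))
            if y * f <= 0:  # misclassified (or undecided): add x to the expansion
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alpha

def predict(alpha, data, kernel, x):
    """Evaluate the kernel expansion at x and return its sign as a label."""
    f = sum(a * yj * kernel(xj, x) for a, (xj, yj) in zip(alpha, data))
    return 1 if f > 0 else -1

# Toy, made-up training set: strings over {a, b} versus strings over {c, d}.
data = [("abab", 1), ("baba", 1), ("ababab", 1),
        ("cdcd", -1), ("dcdc", -1), ("cdcdcd", -1)]
alpha = train_kernel_perceptron(data, spectrum_kernel)
print(predict(alpha, data, spectrum_kernel, "abba"))
print(predict(alpha, data, spectrum_kernel, "ddcc"))
```

The learner never sees a feature vector, only kernel evaluations between strings, so swapping in a different positive definite kernel changes the function class without touching the algorithm.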
Fuzzy Least Squares Twin Support Vector Machines
Least Squares Twin Support Vector Machine (LST-SVM) has been shown to be an
efficient and fast algorithm for binary classification. It combines the
operating principles of Least Squares SVM (LS-SVM) and Twin SVM (T-SVM); it
constructs two non-parallel hyperplanes (as in T-SVM) by solving two systems of
linear equations (as in LS-SVM). Despite its efficiency, LST-SVM is still
unable to cope with two features of real-world problems. First, in many
real-world applications, labels of samples are not deterministic; they come
naturally with their associated membership degrees. Second, samples in
real-world applications may not be equally important and their importance
degrees affect the classification. In this paper, we propose Fuzzy LST-SVM
(FLST-SVM) to deal with these two characteristics of real-world data. Two
models are introduced for FLST-SVM: the first model builds up crisp hyperplanes
using training samples and their corresponding membership degrees. The second
model, on the other hand, constructs fuzzy hyperplanes using training samples
and their membership degrees. Numerical evaluation of the proposed method with
synthetic and real datasets demonstrates significant improvement in the
classification accuracy of FLST-SVM when compared to well-known existing
versions of SVM.
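The description of two non-parallel hyperplanes obtained from two systems of linear equations can be sketched as follows. This is a minimal illustration of the standard linear LST-SVM closed form, not the authors' FLST-SVM models: the optional per-sample weights `s_A` and `s_B` are one assumed way of letting membership degrees scale the squared slack terms, and all function names and the toy data are hypothetical.

```python
import math

def solve(M, v):
    """Solve the small dense system M x = v by Gauss-Jordan elimination."""
    n = len(v)
    aug = [list(M[i]) + [v[i]] for i in range(n)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def normal_terms(X, s):
    """Return (X^T S X, X^T S e) for rows X and per-row weights s."""
    d = len(X[0])
    G = [[sum(w * x[i] * x[j] for x, w in zip(X, s)) for j in range(d)]
         for i in range(d)]
    g = [sum(w * x[i] for x, w in zip(X, s)) for i in range(d)]
    return G, g

def fit(A, B, c1=1.0, c2=1.0, s_A=None, s_B=None):
    """Fit one plane close to class A and one close to class B.

    s_A, s_B are assumed membership weights in (0, 1]; all ones gives plain LST-SVM.
    """
    s_A = s_A or [1.0] * len(A)
    s_B = s_B or [1.0] * len(B)
    E = [list(x) + [1.0] for x in A]  # augmented matrix [A e]
    F = [list(x) + [1.0] for x in B]  # augmented matrix [B e]
    d = len(E[0])
    EtE, _ = normal_terms(E, [1.0] * len(E))
    FtF, _ = normal_terms(F, [1.0] * len(F))
    EtSE, EtSe = normal_terms(E, s_A)
    FtSF, FtSe = normal_terms(F, s_B)
    # Plane near A: (E^T E / c1 + F^T S_B F) u = -F^T S_B e
    M1 = [[EtE[i][j] / c1 + FtSF[i][j] for j in range(d)] for i in range(d)]
    # Plane near B: (F^T F / c2 + E^T S_A E) v = E^T S_A e
    M2 = [[FtF[i][j] / c2 + EtSE[i][j] for j in range(d)] for i in range(d)]
    return solve(M1, [-t for t in FtSe]), solve(M2, EtSe)

def distance(plane, x):
    """Perpendicular distance from x to the plane (w, b) stored as [w..., b]."""
    *w, b = plane
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def predict(planes, x):
    """Label +1 (class A) or -1 (class B) by the nearer hyperplane."""
    plane_A, plane_B = planes
    return 1 if distance(plane_A, x) <= distance(plane_B, x) else -1

# Toy, made-up data: two well-separated clusters in the plane.
A = [(0, 0), (0, 1), (1, 0), (1, 1)]
B = [(4, 4), (4, 5), (5, 4), (5, 5)]
planes = fit(A, B)
print(predict(planes, (0.5, 0.5)), predict(planes, (4.5, 4.5)))
```

Each plane comes from a single linear solve rather than a quadratic program, which is the source of the speed the abstract refers to; lowering a weight in `s_A` or `s_B` reduces that sample's influence on the opposite class's plane.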