Prediction of protein-protein interactions using one-class classification methods and integrating diverse data
This research addresses the problem of prediction of protein-protein interactions (PPI)
when integrating diverse kinds of biological information. This task has been commonly
viewed as a binary classification problem (whether any two proteins do or do not interact)
and several different machine learning techniques have been employed to solve this
task. However, the nature of the data creates two major problems that can affect results.
First, the classes are imbalanced: the number of positive examples (pairs of proteins
that really interact) is much smaller than the number of negative ones. Second, the
selection of negative examples can rest on unreliable assumptions, which could bias
the classification results.
Here we propose the use of one-class classification (OCC) methods to deal with the task of
prediction of PPI. OCC methods utilise examples of just one class to generate a predictive
model which consequently is independent of the kind of negative examples selected; additionally
these approaches are known to cope with imbalanced class problems. We have
designed and carried out a performance evaluation study of several OCC methods for this
task, and have found that the Parzen density estimation approach outperforms the rest. We
also undertook a comparative performance evaluation between the Parzen OCC method
and several conventional learning techniques, considering different scenarios, for example
varying the number of negative examples used for training purposes. We found that the
Parzen OCC method in general performs competitively with traditional approaches and in
many situations outperforms them. Finally we evaluated the ability of the Parzen OCC
approach to predict new potential PPI targets, and validated these results by searching for
biological evidence in the literature.
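The Parzen one-class idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Gaussian kernel, the bandwidth `h`, the `reject_fraction` rule for choosing the threshold, and all function names are assumptions made for the example. The model is fitted on positive examples only, and a new example is accepted as a positive when its estimated density reaches the threshold.

```python
import math


def parzen_density(x, train, h):
    """Parzen estimate at x: the mean of isotropic Gaussian kernels of
    width h centred on each training (positive) example."""
    d = len(x)
    norm = (2.0 * math.pi * h * h) ** (d / 2.0)
    total = 0.0
    for t in train:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, t))
        total += math.exp(-sq_dist / (2.0 * h * h))
    return total / (len(train) * norm)


def fit_threshold(train, h, reject_fraction=0.1):
    """Choose a density threshold that rejects roughly `reject_fraction`
    of the training positives (assumed rule; densities here include each
    point's own kernel, i.e. no leave-one-out correction)."""
    scores = sorted(parzen_density(x, train, h) for x in train)
    k = int(reject_fraction * len(scores))
    return scores[k]


def predict(x, train, h, threshold):
    """Return True when x looks like the positive class."""
    return parzen_density(x, train, h) >= threshold


# Hypothetical 2-D feature vectors for known interacting pairs.
positives = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (-0.1, 0.0), (0.0, -0.2),
    (0.15, -0.05), (-0.2, 0.1), (0.05, 0.05), (-0.05, -0.1), (0.1, -0.1),
]
bandwidth = 0.3
threshold = fit_threshold(positives, bandwidth)
near = predict((0.05, 0.05), positives, bandwidth, threshold)
far = predict((5.0, 5.0), positives, bandwidth, threshold)
```

No negative examples appear anywhere in the fitting step, which is why the resulting model does not depend on how negatives would have been selected.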
Reliable microsatellite genotyping of the Eurasian badger (Meles meles) using faecal DNA
The potential link between badgers and bovine tuberculosis has made it vital to develop
accurate techniques to census badgers. Here we investigate the potential of using genetic
profiles obtained from faecal DNA as a basis for population size estimation. After trialling
several methods we obtained a high amplification success rate (89%) by storing faeces in
70% ethanol and using the guanidine thiocyanate/silica method for extraction. Storing faeces in 70%
ethanol had the added advantage that ethanol is an antiseptic. In order to obtain reliable
genotypes with fewer amplification reactions than the standard multiple-tubes
approach, we devised a comparative approach in which genetic profiles were compared
and replication directed at similar, but not identical, genotypes. This modified method
achieved a reduction in polymerase chain reactions comparable with the maximum-likelihood
model when just using reliability criteria, and was slightly better when using
reliability criteria with the additional proviso that alleles must be observed twice to be considered
reliable. Our comparative approach would be best suited for studies that include
multiple faeces from each individual. We utilized our approach in a well-studied population
of badgers from which individuals had been sampled and reliable genotypes obtained.
In a study of 53 faeces sampled from three social groups over 10 days, we found that direct
enumeration could not be used to estimate population size, but that the application of
mark–recapture models has the potential to provide more accurate results.
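To illustrate why mark–recapture can differ from direct enumeration, here is a minimal sketch using Chapman's bias-corrected form of the two-session Lincoln-Petersen estimator. The abstract does not say which mark–recapture models were applied, so the choice of estimator is an assumption, and the genotype labels and counts below are hypothetical, not data from the study. A genotype seen in both sessions plays the role of a recaptured individual.

```python
def chapman_estimate(n1, n2, m2):
    """Chapman's bias-corrected Lincoln-Petersen population estimate.

    n1: individuals identified in session 1
    n2: individuals identified in session 2
    m2: individuals identified in both sessions (recaptures)
    """
    return (n1 + 1) * (n2 + 1) / (m2 + 1) - 1


def estimate_from_genotypes(session1, session2):
    """Treat each distinct faecal genotype as one individual and each
    genotype shared between sessions as a recapture."""
    first, second = set(session1), set(session2)
    return chapman_estimate(len(first), len(second), len(first & second))


# Hypothetical genotype labels from two sampling sessions.
session_a = ["A", "B", "C", "D"]
session_b = ["C", "D", "E"]
estimate = estimate_from_genotypes(session_a, session_b)
enumerated = len(set(session_a) | set(session_b))
```

In this toy example, direct enumeration counts five distinct genotypes, while the estimator returns a larger figure because it allows for individuals that were never sampled.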
Evaluating probabilistic forecasts with scoringRules
Probabilistic forecasts in the form of probability distributions over future
events have become popular in several fields including meteorology, hydrology,
economics, and demography. In typical applications, many alternative
statistical models and data sources can be used to produce probabilistic
forecasts. Hence, evaluating and selecting among competing methods is an
important task. The scoringRules package for R provides functionality for
comparative evaluation of probabilistic models based on proper scoring rules,
covering a wide range of situations in applied work. This paper discusses
implementation and usage details, presents case studies from meteorology and
economics, and points to the relevant background literature.
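As an illustration of what a proper scoring rule computes, the sketch below evaluates two standard scores for a Gaussian forecast: the continuous ranked probability score (CRPS), using its well-known closed form for the normal distribution, and the logarithmic score. It is written in plain Python and is not the scoringRules R interface; the function names are assumptions made for the example. Both scores are negatively oriented, so lower is better.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def crps_normal(y, mu, sigma):
    """Closed-form CRPS of a N(mu, sigma^2) forecast at observation y."""
    z = (y - mu) / sigma
    return sigma * (
        z * (2.0 * normal_cdf(z) - 1.0)
        + 2.0 * normal_pdf(z)
        - 1.0 / math.sqrt(math.pi)
    )


def log_score_normal(y, mu, sigma):
    """Negative log predictive density of a N(mu, sigma^2) forecast at y."""
    z = (y - mu) / sigma
    return 0.5 * math.log(2.0 * math.pi) + math.log(sigma) + 0.5 * z * z


# Two hypothetical competing forecasts for the same observation.
observation = 0.0
sharp_and_centred = crps_normal(observation, mu=0.0, sigma=1.0)
biased = crps_normal(observation, mu=2.0, sigma=1.0)
```

Averaging such scores over many forecast cases gives one number per method, which is the basis for the comparative evaluation the abstract describes.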