Classifying sequences by the optimized dissimilarity space embedding approach: a case study on the solubility analysis of the E. coli proteome

Authors: Livi Lorenzo, Rizzi Antonello, Sadeghian Alireza
Publication venue: IOS Press
Publication date: 01/01/2015

We evaluate a version of the recently proposed classification system named Optimized Dissimilarity Space Embedding (ODSE) that operates in the input space of sequences of generic objects. ODSE was originally presented as a classification system for patterns represented as labeled graphs. However, since ODSE is founded on the dissimilarity space representation of the input data, the classifier can be easily adapted to any input domain where a meaningful dissimilarity measure can be defined. Here we demonstrate the effectiveness of the ODSE classifier for sequences by considering an application dealing with the recognition of the solubility degree of the Escherichia coli proteome. Solubility, or analogously aggregation propensity, is an important property of protein molecules, which is intimately related to the mechanisms underlying the chemico-physical process of folding. Each protein in our dataset is initially associated with a solubility degree and is represented as a sequence of symbols denoting the 20 amino acid residues. The computational results obtained here, which we stress were achieved with no context-dependent tuning of the ODSE system, confirm the validity and generality of the ODSE-based approach for structured data classification. Comment: 10 pages, 49 references
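
The dissimilarity space representation mentioned in the abstract above can be illustrated with a minimal sketch. This is not the ODSE system itself. It only shows the basic idea of describing each sequence by its dissimilarities to a set of prototypes. Several choices here are assumptions made for illustration: edit distance as the dissimilarity measure, fixed hand-picked prototypes, a 1-nearest-neighbor rule in the embedded space, and the toy sequences and labels, which are invented.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two sequences (assumed dissimilarity measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def embed(seq: str, prototypes: list[str]) -> list[int]:
    """Map a sequence to a vector of dissimilarities to each prototype."""
    return [edit_distance(seq, p) for p in prototypes]


def classify(seq: str, train: list[tuple[str, str]], prototypes: list[str]) -> str:
    """1-nearest-neighbor rule applied in the embedded (dissimilarity) space."""
    target = embed(seq, prototypes)

    def sq_dist(vec: list[int]) -> int:
        return sum((x - y) ** 2 for x, y in zip(vec, target))

    nearest = min(train, key=lambda item: sq_dist(embed(item[0], prototypes)))
    return nearest[1]


# Hypothetical toy data: short amino-acid-like strings with invented labels.
train = [
    ("MKKLLA", "soluble"),
    ("MKKLIA", "soluble"),
    ("GGVVGG", "insoluble"),
    ("GGVIGG", "insoluble"),
]
prototypes = ["MKKLLA", "GGVVGG"]  # fixed here; ODSE optimizes its representation

print(embed("MKKLLV", prototypes))
print(classify("MKKLLV", train, prototypes))
```

Once sequences are embedded this way, any vector-based classifier can be applied, which is why the approach carries over to input domains where a meaningful dissimilarity measure exists.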

arXiv.org e-Print Archive

Archivio della ricerca- Università di Roma La Sapienza

Learning to Predict with Highly Granular Temporal Data: Estimating individual behavioral profiles with smart meter data

Authors: Mikhaylov Slava J., Ushakova Anastasia
Publication date: 15/11/2017

Big spatio-temporal datasets, available through both open and administrative data sources, offer significant potential for social science research. The magnitude of the data allows for increased resolution and analysis at the individual level. While there have been recent advances in forecasting techniques for highly granular temporal data, little attention has been given to segmenting the time series and finding homogeneous patterns. This paper proposes estimating behavioral profiles of individuals' activities over time using Gaussian Process-based models. In particular, the aim is to investigate how individuals or groups may be clustered according to the model parameters. This Bayesian non-parametric method is then tested by looking at the predictability of the segments, using a combination of models to fit different parts of the temporal profiles. Model validity is then tested on a set of holdout data. The dataset consists of half-hourly energy consumption records from smart meters in more than 100,000 households in the UK and covers the period from 2015 to 2016. The methodological approach developed in the paper may be easily applied to datasets of similar structure and granularity, for example social media data, and may lead to improved accuracy in the prediction of social dynamics and behavior.

arXiv.org e-Print Archive

University of Birmingham Research Portal

Quality, Frequency and Similarity Based Fuzzy Nearest Neighbor Classification

Authors: Cornelis Chris, Jensen Richard, Verbiest Nele
Publication venue: IEEE Press
Publication date: 01/01/2013

This paper proposes an approach based on fuzzy rough set theory to improve nearest-neighbor-based classification. Six measures are introduced to evaluate the quality of the nearest neighbors. This quality is combined with the frequency at which classes occur among the nearest neighbors and the similarity with respect to the nearest neighbor to decide which class to pick among the neighbors' classes. The importance of each aspect is weighted using optimized weights. An experimental study shows that our method, Quality, Frequency and Similarity based Fuzzy Nearest Neighbor (QFSNN), outperforms state-of-the-art nearest neighbor classifiers.

Crossref

Aberystwyth Research Portal

Ghent University Academic Bibliography

Chronic liver disease staging classification based on ultrasound, clinical and laboratorial data

Authors: Marinho Rui, Ramalho Fernando, Ribeiro Ricardo, Sanches João, Velosa José
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2011

This work addresses the identification and diagnosis of the various stages of chronic liver disease. The classification results of a support vector machine, a decision tree and a k-nearest neighbor classifier are compared. Ultrasound image intensity and textural features are used jointly with clinical and laboratory data in the staging process. The classifiers are trained on a population of 97 patients at six different stages of chronic liver disease, using a leave-one-out cross-validation strategy. The best results are obtained using the support vector machine with a radial-basis kernel, with an overall accuracy of 73.20%. The good performance of the method is a promising indicator that it can be used, in a noninvasive way, to provide reliable information about chronic liver disease staging.
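
The leave-one-out strategy named in the abstract above can be sketched as follows. This is an illustrative sketch only, not the authors' pipeline: it uses a plain k-nearest neighbor classifier, and the two-feature toy samples and labels are invented. With 97 patients the loop would run 97 times, each time holding out one patient for testing. The reported 73.20% is arithmetically consistent with 71 of 97 patients classified correctly, although the abstract does not state that count.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote among the k training samples closest to the query."""
    def sq_dist(features):
        return sum((a - b) ** 2 for a, b in zip(features, query))

    neighbors = sorted(train, key=lambda item: sq_dist(item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(samples, k=3):
    """Hold out each sample once, train on the rest, and count correct predictions."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]  # every sample except the i-th
        if knn_predict(train, features, k) == label:
            correct += 1
    return correct / len(samples)


# Hypothetical toy data: (feature vector, class label).
samples = [
    ((0.0, 0.1), "A"), ((0.1, 0.0), "A"), ((0.2, 0.1), "A"),
    ((1.0, 1.1), "B"), ((1.1, 1.0), "B"), ((0.9, 1.0), "B"),
]

print(leave_one_out_accuracy(samples, k=3))

# Arithmetic check on the reported figure (the count 71 is an inference, not from the source).
print(round(71 / 97 * 100, 2))
```

Leave-one-out is a common choice for small cohorts like this one because every sample serves as test data exactly once while the training set stays as large as possible.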

Repositório Científico do Instituto Politécnico de Lisboa

Crossref