6 research outputs found
Feature Relevance Bounds for Ordinal Regression
The increasing occurrence of ordinal data, mainly sociodemographic, led to a renewed research interest in ordinal regression, i.e. the prediction of ordered classes. Besides model accuracy, the interpretation of these models itself is of high relevance, and existing approaches therefore enforce e.g. model sparsity. For high dimensional or highly correlated data, however, this might be misleading due to strong variable dependencies. In this contribution, we aim for an identification of feature relevance bounds which, besides identifying all relevant features, explicitly differentiates between strongly and weakly relevant features.
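The point about sparsity being misleading under strong variable dependencies can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data matrix `X`, the three hand-picked weight vectors in `candidates`, and the reading of "weakly relevant" as "the weight can drop to zero in some equally good model". The ranges are computed over these three candidates only, not by the optimization that a relevance-bounds method would solve.

```python
# Toy illustration (assumed setup, not the paper's method): with two identical
# features, several linear models of equal L1 norm give identical scores.

def predict(weights, row):
    """Linear score for a single sample."""
    return sum(w * x for w, x in zip(weights, row))

# Feature 0 and feature 1 are exact copies; feature 2 is independent.
X = [
    [1.0, 1.0, 0.5],
    [2.0, 2.0, -1.0],
    [-1.0, -1.0, 2.0],
    [0.5, 0.5, 0.0],
]

# Three hypothetical models, all with L1 norm |w0| + |w1| + |w2| = 3.
candidates = {
    "only feature 0": [2.0, 0.0, 1.0],
    "only feature 1": [0.0, 2.0, 1.0],
    "shared": [1.0, 1.0, 1.0],
}

scores = {name: [predict(w, row) for row in X] for name, w in candidates.items()}

# True if every candidate produces exactly the same scores on X.
all_equal = len({tuple(s) for s in scores.values()}) == 1

# Range of absolute weights per feature across the equally good candidates.
bounds = [
    (min(abs(w[j]) for w in candidates.values()),
     max(abs(w[j]) for w in candidates.values()))
    for j in range(3)
]

for j, (low, high) in enumerate(bounds):
    kind = "strongly" if low > 0 else ("weakly" if high > 0 else "not")
    print(f"feature {j}: |w| in [{low}, {high}] -> {kind} relevant")
```

In this sketch a sparse model would keep either feature 0 or feature 1 and drop the other, although both carry the same information. The weight ranges make that ambiguity visible, while feature 2 keeps a nonzero weight in every candidate.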
FRI -- Feature Relevance Intervals for Interpretable and Interactive Data Exploration
Most existing feature selection methods are insufficient for analytic
purposes as soon as high dimensional data or redundant sensor signals are dealt
with since features can be selected due to spurious effects or correlations
rather than causal effects. To support the finding of causal features in
biomedical experiments, we hereby present FRI, an open source Python library
that can be used to identify all-relevant variables in linear classification
and (ordinal) regression problems. Using the recently proposed feature
relevance method, FRI provides a basis for further general
experimentation and, more specifically, can facilitate the search for alternative
biomarkers. It can be used in an interactive context, by providing model
manipulation and visualization methods, or in a batch process as a filter
method.
Comment: Addition of IEEE copyright notice. Accepted for CIBCB 2019
(https://cibcb2019.icas.xyz/)
Time-series representation framework based on multi-instance similarity measures
Time series analysis plays an essential role in today's society due to the ease of access to information. It is present in most applications that involve sensors, but in recent years, thanks to technological advances, it has been directed towards complex signals that lack periodicity or even exhibit non-stationary dynamics, such as brain activity signals or magnetic resonance and satellite images. The main challenge in time series analysis is the representation of the series, for which methodologies based on similarity measures have been proposed. However, these approaches are oriented towards measuring local, point-to-point patterns in the signals using shape-based metrics. In addition, selecting relevant information from the representations is important in order to eliminate noise and train classifiers with discriminant information; this selection is usually made at the feature level, however, leaving aside the global information of the signal. Likewise, applications have recently emerged that require analyzing time series from different or multimodal sources of information, for which existing methods achieve acceptable performance but lack interpretability. In this regard, we propose a framework based on similarity representations and multiple-instance learning that allows selecting relevant information for classification tasks in order to improve the performance and interpretability of the models.
Maestría (Master's degree)
Feature Relevance Bounds for Ordinal Regression
Pfannschmidt L, Jakob J, Biehl M, Tino P, Hammer B. Feature Relevance Bounds for Ordinal Regression. In: Verleysen M, ed. Proceedings of the 27th European Symposium on Artificial Neural Networks (ESANN 2019). Louvain-la-Neuve: i6doc; 2019.
The increasing occurrence of ordinal data, mainly sociodemographic, led to a
renewed research interest in ordinal regression, i.e. the prediction of ordered
classes. Besides model accuracy, the interpretation of these models itself is
of high relevance, and existing approaches therefore enforce e.g. model
sparsity. For high dimensional or highly correlated data, however, this might
be misleading due to strong variable dependencies. In this contribution, we aim
for an identification of feature relevance bounds which - besides identifying
all relevant features - explicitly differentiates between strongly and weakly
relevant features.