
14 research outputs found

Optimal Recovery of Local Truth

Author: Rodriguez Carlos C.
Publication venue: 'AIP Publishing'
Publication date: 01/01/2000

Probability mass curves the data space with horizons. Let f be a multivariate probability density function with continuous second order partial derivatives. Consider the problem of estimating the true value of f(z) > 0 at a single point z, from n independent observations. It is shown that the fastest possible estimators (like the k-nearest neighbor and kernel) have minimum asymptotic mean square errors when the space of observations is thought of as conformally curved. The optimal metric is shown to be generated by the Hessian of f in the regions where the Hessian is definite. Thus, the peaks and valleys of f are surrounded by singular horizons when the Hessian changes signature from Riemannian to pseudo-Riemannian. Adaptive estimators based on the optimal variable metric show considerable theoretical and practical improvements over traditional methods. The formulas simplify dramatically when the dimension of the data space is 4. The similarities with General Relativity are striking but possibly illusory at this point. However, these results suggest that nonparametric density estimation may have something new to say about current physical theory.
Comment: To appear in Proceedings of Maximum Entropy and Bayesian Methods 1999. Check also: http://omega.albany.edu:8008
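
For orientation, the traditional k-nearest neighbor estimator that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration using the ordinary Euclidean metric, not the Hessian-based variable metric the paper proposes; the function names `unit_ball_volume` and `knn_density` are hypothetical and chosen only for this example.

```python
import math


def unit_ball_volume(d):
    """Volume of the unit ball in d dimensions."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def knn_density(z, samples, k):
    """Classical k-NN density estimate at point z (Euclidean metric).

    f_hat(z) = k / (n * V_d * r_k^d), where r_k is the distance from z
    to its k-th nearest sample and V_d is the unit ball volume.
    """
    n = len(samples)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= number of samples")
    d = len(z)
    distances = sorted(math.dist(z, x) for x in samples)
    r_k = distances[k - 1]
    if r_k == 0.0:
        raise ValueError("k-th neighbor coincides with z; estimate is unbounded")
    return k / (n * unit_ball_volume(d) * r_k ** d)


if __name__ == "__main__":
    # Four one-dimensional observations, estimate evaluated at z = 1.5.
    data = [(0.0,), (1.0,), (2.0,), (3.0,)]
    print(knn_density((1.5,), data, k=2))
```

An adaptive estimator in the spirit of the abstract would replace the Euclidean distance with one derived from a local metric; that step is not shown here because the abstract does not give its form.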

arXiv.org e-Print Archive

CiteSeerX

Crossref

CERN Document Server

A Data Quality-Driven View of MLOps

Author: Gürel Nezihe Merve
Karlaš Bojan
Renggli Cedric
Rimanic Luka
Wu Wentao
Zhang Ce
Publication date: 01/01/2021

Developing machine learning models can be seen as a process similar to the one established for traditional software development. A key difference between the two lies in the strong dependency between the quality of a machine learning model and the quality of the data used to train or perform evaluations. In this work, we demonstrate how different aspects of data quality propagate through various stages of machine learning development. By performing a joint analysis of the impact of well-known data quality dimensions and the downstream machine learning process, we show that different components of a typical MLOps pipeline can be efficiently designed, providing both a technical and theoretical perspective.

arXiv.org e-Print Archive

Repository for Publications and Research Data

The non-parametric Parzen's window in stereo vision matching

Author: Cruz García Jesús Manuel de la
Pajares Martinsanz Gonzalo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/04/2002

This paper presents an approach to the local stereovision matching problem using edge segments as features with four attributes. From these attributes we compute a matching probability between pairs of features of the stereo images. A correspondence is said to be true when this probability is maximum. We introduce a nonparametric strategy based on Parzen's window to estimate a probability density function (PDF), which is used to obtain the matching probability. This is the main finding of the paper. A comparative analysis of other recent matching methods is included to show that this finding can be justified theoretically. The proposed method is also generalized to give guidelines about its use with the similarity constraint and in different environments where other features and attributes are more suitable.
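
As a rough illustration of the Parzen window idea mentioned in this abstract, the sketch below estimates a density at a query vector from a set of training vectors. The Gaussian kernel, the single bandwidth `h`, and the function name `parzen_density` are assumptions made for this example; the abstract does not state which kernel or bandwidth the authors use.

```python
import math


def parzen_density(x, samples, h):
    """Parzen window density estimate at x with an isotropic Gaussian kernel.

    f_hat(x) = 1 / (n * h^d) * sum_i K((x - x_i) / h),
    with K(u) = (2*pi)^(-d/2) * exp(-|u|^2 / 2).
    The bandwidth h is a free parameter (an assumption of this sketch).
    """
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n = len(samples)
    d = len(x)
    norm = (2 * math.pi) ** (-d / 2)
    total = 0.0
    for s in samples:
        sq = sum(((a - b) / h) ** 2 for a, b in zip(x, s))
        total += norm * math.exp(-0.5 * sq)
    return total / (n * h ** d)


if __name__ == "__main__":
    # Hypothetical four-attribute difference vectors for known true matches.
    true_matches = [(0.1, 0.0, 0.2, 0.1), (0.0, 0.1, 0.1, 0.0), (0.2, 0.1, 0.0, 0.1)]
    candidate = (0.1, 0.1, 0.1, 0.1)
    print(parzen_density(candidate, true_matches, h=0.5))
```

In a matching setting, one would evaluate such an estimate for each candidate pair and keep the pair with the highest value, which mirrors the maximum-probability rule described in the abstract.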

Docta Complutense

Why Is My Classifier Discriminatory?

Author: Chen Irene
Johansson Fredrik D.
Sontag David
Publication date: 01/01/2018

Recent attempts to achieve fairness in predictive models focus on the balance between fairness and accuracy. In sensitive applications such as healthcare or criminal justice, this trade-off is often undesirable as any increase in prediction error could have devastating consequences. In this work, we argue that the fairness of predictions should be evaluated in the context of the data, and that unfairness induced by inadequate sample sizes or unmeasured predictive variables should be addressed through data collection, rather than by constraining the model. We decompose cost-based metrics of discrimination into bias, variance, and noise, and propose actions aimed at estimating and reducing each term. Finally, we perform case studies on prediction of income, mortality, and review ratings, confirming the value of this analysis. We find that data collection is often a means to reduce discrimination without sacrificing accuracy.
Comment: Appeared in Advances in Neural Information Processing Systems (NeurIPS 2018); 3 figures, 8 pages, 6 page supplementary

arXiv.org e-Print Archive

Chalmers Research

Segmentación de imágenes ecográficas mediante máquinas de aprendizaje para la medición del grosor de arterias (Segmentation of ultrasound images using learning machines for measuring artery thickness)

Author: Esteban Sánchez Carolina
Publication date: 02/12/2014

Having reviewed the most widely used techniques in the field of image segmentation, it is worth establishing the objectives to pursue before selecting one of them. This project aims to: 1. Put into practice the knowledge of pattern recognition techniques acquired during the degree. 2. Apply that knowledge to data volumes larger than those handled over the course of the degree. 3. Acquire knowledge of the theory and use of digital signal processing techniques. 4. Create a system capable of: - Removing noise from images that contain it. - Segmenting the media line in both the near wall and the far wall. - Measuring the diameter of the arteries supplied to it and validating the results.
Escuela Técnica Superior de Ingeniería de Telecomunicación, Universidad Politécnica de Cartagena

Repositorio Digital de la Universidad Politécnica de Cartagena

Statistical Classifier Design and Evaluation

Author: Fukunaga Keinosuke
Hayes Raymond Reynolds
Publication venue: 'Purdue University (bepress)'
Publication date: 01/05/1988

This thesis is concerned with the design and evaluation of statistical classifiers. This problem has an optimal solution with a priori knowledge of the underlying probability distributions. Here, we examine the expected performance of parametric classifiers designed from a finite set of training samples and tested under various conditions. By investigating the statistical properties of the performance bias when tested on the true distributions, we have isolated the effects of the individual design components (i.e., the number of training samples, the dimensionality, and the parameters of the underlying distributions). These results have allowed us to establish a firm theoretical foundation for new design guidelines and to develop an empirical approach for estimating the asymptotic performance. Investigation of the statistical properties of the performance bias when tested on finite sample sets has allowed us to pinpoint the effects of individual design samples, the relationship between the sizes of the design and test sets, and the effects of a dependency between these sets. This, in turn, leads to a better understanding of how a single training set can be used most efficiently. In addition, we have developed a theoretical framework for the analysis and comparison of various performance evaluation procedures. Nonparametric and one-class classifiers are also considered. The reduced Parzen classifier, a nonparametric classifier which combines the error estimation capabilities of the Parzen density estimate with the computational feasibility of parametric classifiers, is presented. Also, the effect of the distance-space mapping in a one-class classifier is discussed through the approximation of the performance of a distance-ranking procedure.

Purdue E-Pubs

Dimension-reduction and discrimination of neuronal multi-channel signals

Author: Kremper Helmut Alexander
Publication venue: Philipps-Universität Marburg
Publication date: 01/01/2006

Dimensionsreduktion und Trennung neuronaler Multikanal-Signale

Publikations- und Dokumentenserver der Universitätsbibliothek Marburg

Adaptive sequential feature selection in visual perception and pattern recognition

Author: Avdiyenko Liliya
Publication date: 15/09/2014

In the human visual system, one of the most prominent functions of the extensive feedback from the higher brain areas within and outside of the visual cortex is attentional modulation. The feedback helps the brain to concentrate its resources on visual features that are relevant for recognition, i.e., it iteratively selects certain aspects of the visual scene for refined processing by the lower areas until the inference process in the higher areas converges to a single hypothesis about this scene. In order to minimize the number of required selection-refinement iterations, one has to find a short sequence of maximally informative portions of the visual input. Since the feedback is not static, the selection process is adapted to the scene that should be recognized. To find a scene-specific subset of informative features, the adaptive selection process on every iteration utilizes results of previous processing in order to reduce the remaining uncertainty about the visual scene. This phenomenon inspired us to develop a computational algorithm for solving a visual classification task that incorporates this principle of adaptive feature selection. It is especially interesting because feature selection methods are usually not adaptive: they define a unique set of informative features for a task and use them for classifying all objects. An adaptive algorithm, however, selects the features that are most informative for the particular input. Thus, the selection process should be driven by statistics of the environment concerning the current task and the object to be classified. Applied to a classification task, our adaptive feature selection algorithm favors features that maximally reduce the current class uncertainty, which is iteratively updated with values of the previously selected features that are observed on the testing sample.
In information-theoretical terms, the selection criterion is the mutual information of a class variable and a feature-candidate conditioned on the already selected features, which take values observed on the current testing sample. The main question investigated in this thesis is then whether the proposed adaptive way of selecting features is advantageous over conventional feature selection, and in which situations. Further, we studied whether the proposed adaptive information-theoretical selection scheme, which is a computationally complex algorithm, is utilized by humans while they perform a visual classification task. For this, we constructed a psychophysical experiment where people had to select image parts that they think are relevant for classifying these images. We present the analysis of behavioral data where we investigate whether human strategies of task-dependent selective attention can be explained by a simple ranker based on the mutual information, a more complex feature selection algorithm based on the conventional static mutual information, or the adaptive feature selector proposed here, which mimics a mechanism of iterative hypothesis refinement. The main contribution of this work is thus the adaptive feature selection criterion based on the conditional mutual information. It is also shown that such an adaptive selection strategy is indeed used by people while performing visual classification.
Contents: 1. Introduction 2. Conventional feature selection 3. Adaptive feature selection 4. Experimental investigations of ACMIFS 5. Information-theoretical strategies of selective attention 6. Discussion Appendix Bibliography
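
The selection criterion described in this abstract can be sketched for discrete features as follows. This is only an illustrative plug-in estimate under simplifying assumptions, not the thesis's ACMIFS algorithm: conditioning on the observed values is done by filtering the training rows, and the names `mutual_information` and `next_feature` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Empirical mutual information (in bits) of a list of (a, b) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum(
        (c / n) * math.log2(c * n / (left[a] * right[b]))
        for (a, b), c in joint.items()
    )


def next_feature(rows, labels, observed):
    """Pick the unobserved feature with the highest I(class; feature | observed).

    `observed` maps feature index -> value seen on the current test sample.
    Conditioning is approximated by keeping only training rows that agree
    with those values (an assumption that needs enough matching rows).
    """
    subset = [
        (r, y) for r, y in zip(rows, labels)
        if all(r[j] == v for j, v in observed.items())
    ]
    candidates = [j for j in range(len(rows[0])) if j not in observed]
    if not subset or not candidates:
        return None
    return max(
        candidates,
        key=lambda j: mutual_information([(r[j], y) for r, y in subset]),
    )


if __name__ == "__main__":
    rows = [(0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 1, 0)]
    labels = [0, 1, 0, 1]
    # The most informative next feature depends on what was already observed.
    print(next_feature(rows, labels, {0: 0}))
    print(next_feature(rows, labels, {0: 1}))
```

In this toy data the choice of the next feature changes with the value observed for feature 0, which is the sense in which the selection is adaptive to the current test sample rather than fixed for the whole task.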

Qucosa - Publikationsserver der Universität Leipzig