14 research outputs found

    Optimal Recovery of Local Truth

    Get PDF
    Probability mass curves the data space with horizons. Let f be a multivariate probability density function with continuous second order partial derivatives. Consider the problem of estimating the true value of f(z) > 0 at a single point z, from n independent observations. It is shown that, the fastest possible estimators (like the k-nearest neighbor and kernel) have minimum asymptotic mean square errors when the space of observations is thought as conformally curved. The optimal metric is shown to be generated by the Hessian of f in the regions where the Hessian is definite. Thus, the peaks and valleys of f are surrounded by singular horizons when the Hessian changes signature from Riemannian to pseudo-Riemannian. Adaptive estimators based on the optimal variable metric show considerable theoretical and practical improvements over traditional methods. The formulas simplify dramatically when the dimension of the data space is 4. The similarities with General Relativity are striking but possibly illusory at this point. However, these results suggest that nonparametric density estimation may have something new to say about current physical theory.Comment: To appear in Proceedings of Maximum Entropy and Bayesian Methods 1999. Check also: http://omega.albany.edu:8008

    A Data Quality-Driven View of MLOps

    Full text link
    Developing machine learning models can be seen as a process similar to the one established for traditional software development. A key difference between the two lies in the strong dependency between the quality of a machine learning model and the quality of the data used to train or perform evaluations. In this work, we demonstrate how different aspects of data quality propagate through various stages of machine learning development. By performing a joint analysis of the impact of well-known data quality dimensions and the downstream machine learning process, we show that different components of a typical MLOps pipeline can be efficiently designed, providing both a technical and theoretical perspective

    The non-parametric Parzen's window in stereo vision matching

    Get PDF
    This paper presents an approach to the local stereovision matching problem using edge segments as features with four attributes. From these attributes we compute a matching probability between pairs of features of the stereo images. A correspondence is said true when such a probability is maximum. We introduce a nonparametric strategy based on Parzen's window to estimate a probability density function (PDF) which is used to obtain the matching probability. This is the main finding of the paper. A comparative analysis of other recent matching methods is included to show that this finding can be justified theoretically. A generalization of the proposed method is made in order to give guidelines about its use with the similarity constraint and also in different environments where other features and attributes are more suitable

    Why Is My Classifier Discriminatory?

    Full text link
    Recent attempts to achieve fairness in predictive models focus on the balance between fairness and accuracy. In sensitive applications such as healthcare or criminal justice, this trade-off is often undesirable as any increase in prediction error could have devastating consequences. In this work, we argue that the fairness of predictions should be evaluated in context of the data, and that unfairness induced by inadequate samples sizes or unmeasured predictive variables should be addressed through data collection, rather than by constraining the model. We decompose cost-based metrics of discrimination into bias, variance, and noise, and propose actions aimed at estimating and reducing each term. Finally, we perform case-studies on prediction of income, mortality, and review ratings, confirming the value of this analysis. We find that data collection is often a means to reduce discrimination without sacrificing accuracy.Comment: Appeared in Advances in Neural Information Processing Systems (NeurIPS 2018); 3 figures, 8 pages, 6 page supplementar

    Segmentación de imágenes ecográficas mediante máquinas de aprendizaje para la medición del grosor de arterias

    Get PDF
    Una vez revisadas las técnicas más usadas en el campo de la segmentación de imagen, resulta interesante establecer los objetivos a perseguir antes de seleccionar una de ellas. En el caso de este proyecto se desea: 1.Poner en práctica los conocimientos adquiridos durante la carrera acerca de técnicas de reconocimiento de patrones. 2.Aplicar los conocimientos mencionados en el punto anterior sobre volúmenes de datos superiores a los trabajados a lo largo de la titulación. 3.Adquirir conocimiento acerca de la teoría y manejo de técnicas de procesado digital de la señal. 4.Crear un sistema capaz de: -Eliminar el ruido en aquellas imágenes que lo presenten. -Segmentar la línea de media tanto en la near wall como en la far wall. -Medir el diámetro de las arterias que se le pasen y validar los resultados.Escuela Técnica Superior de Ingeniería de TelecomunicaciónUniversidad Politécnica de Cartagen

    Statistical Classifier Design and Evaluation

    Get PDF
    This thesis is concerned with the design and evaluation of statistical classifiers. This problem has an optimal solution with a priori knowledge of the underlying probability distributions. Here, we examine the expected performance of parametric classifiers designed from a finite set of training samples and tested under various conditions. By investigating the statistical properties of the performance bias when tested on the true distributions, we have isolated the effects of the individual design components (i.e., the number of training samples, the dimensionality, and the parameters of the underlying distributions). These results have allowed us to establish a firm theoretical foundation for new design guidelines and to develop an empirical approach for estimating the asymptotic performance. Investigation of the statistical properties of the performance bias when tested on finite sample sets has allowed us to pinpoint the effects of individual design samples, the relationship between the sizes of the design and test sets, and the effects of a dependency between these sets. This, in turn, leads to a better understanding of how a single training set can be used most efficiently. In addition, we have developed a theoretical framework for the analysis and comparison of various performance evaluation procedures. Nonparametric and one-class classifiers are also considered. The reduced Parzen classifier, a nonparametric classifier which combines the error estimation capabilities of the Parzen density estimate with the computational feasibility of parametric classifiers, is presented. Also, the effect of the distance-space mapping in a one-class classifier is discussed through the approximation of the performance of a distance-ranking procedure

    Dimension-reduction and discrimination of neuronal multi-channel signals

    Get PDF
    Dimensionsreduktion und Trennung neuronaler Multikanal-Signale

    Adaptive sequential feature selection in visual perception and pattern recognition

    Get PDF
    In the human visual system, one of the most prominent functions of the extensive feedback from the higher brain areas within and outside of the visual cortex is attentional modulation. The feedback helps the brain to concentrate its resources on visual features that are relevant for recognition, i. e. it iteratively selects certain aspects of the visual scene for refined processing by the lower areas until the inference process in the higher areas converges to a single hypothesis about this scene. In order to minimize a number of required selection-refinement iterations, one has to find a short sequence of maximally informative portions of the visual input. Since the feedback is not static, the selection process is adapted to a scene that should be recognized. To find a scene-specific subset of informative features, the adaptive selection process on every iteration utilizes results of previous processing in order to reduce the remaining uncertainty about the visual scene. This phenomenon inspired us to develop a computational algorithm solving a visual classification task that would incorporate such principle, adaptive feature selection. It is especially interesting because usually feature selection methods are not adaptive as they define a unique set of informative features for a task and use them for classifying all objects. However, an adaptive algorithm selects features that are the most informative for the particular input. Thus, the selection process should be driven by statistics of the environment concerning the current task and the object to be classified. Applied to a classification task, our adaptive feature selection algorithm favors features that maximally reduce the current class uncertainty, which is iteratively updated with values of the previously selected features that are observed on the testing sample. In information-theoretical terms, the selection criterion is the mutual information of a class variable and a feature-candidate conditioned on the already selected features, which take values observed on the current testing sample. Then, the main question investigated in this thesis is whether the proposed adaptive way of selecting features is advantageous over the conventional feature selection and in which situations. Further, we studied whether the proposed adaptive information-theoretical selection scheme, which is a computationally complex algorithm, is utilized by humans while they perform a visual classification task. For this, we constructed a psychophysical experiment where people had to select image parts that as they think are relevant for classification of these images. We present the analysis of behavioral data where we investigate whether human strategies of task-dependent selective attention can be explained by a simple ranker based on the mutual information, a more complex feature selection algorithm based on the conventional static mutual information and the proposed here adaptive feature selector that mimics a mechanism of the iterative hypothesis refinement. Hereby, the main contribution of this work is the adaptive feature selection criterion based on the conditional mutual information. Also it is shown that such adaptive selection strategy is indeed used by people while performing visual classification.:1. Introduction 2. Conventional feature selection 3. Adaptive feature selection 4. Experimental investigations of ACMIFS 5. Information-theoretical strategies of selective attention 6. Discussion Appendix Bibliograph
    corecore