Towards solving computer vision problems: datasets, labels, algorithms, and applications
The solution to a supervised computer vision problem consists of an application, an algorithm, input data, and a set of human-generated labels. Solving these kinds of tasks involves collecting large quantities of data, collecting appropriate labels, and developing machine vision algorithms tailored to the application. Progress on these problems has often benefited from large-scale datasets with high-fidelity labels. Successful algorithms display a synergy between application goals and the size and quality of the dataset. This thesis presents work highlighting the importance of each component of a supervised vision task. First, the problem of automatically classifying groups of people into social categories is introduced. This problem is called Urban Tribe Classification. To tackle this problem, each individual and the entire group of individuals are modeled. Since this was a newly introduced computer vision problem, a dataset for this task was created. On this dataset, the combined representation of group and individuals outperforms using only the person representations. This model showed promising results for automatic subculture classification. Second, the problem of creating perceptual embeddings based on human similarity judgements is tackled. This work focuses on triplet similarity comparisons, which ask whether a reference object is more similar to one candidate object or to another, and which have been useful for computer vision and machine learning applications. Unfortunately, triplet similarity comparisons, like many human labeling efforts, can be prohibitively expensive. This work proposes two techniques for dealing with this obstacle. First, an alternative display for collecting triplets is designed. This display shows a probe image and a grid of query images, allowing multiple triplets to be collected from the user simultaneously. The display is shown to reduce the cost and time of triplet collection. In addition, higher quality embeddings are created with the improved triplet collection UI. A dataset of human taste similarity covering 10,000 food items was created using this UI. Second, "SNaCK," a low-dimensional perceptual embedding algorithm that combines human expertise with automatic machine kernels, is introduced. Both parts are complementary: human insight can capture relationships that are not apparent from the object's visual similarity, and the machine can help relieve the human from having to exhaustively specify many constraints. Finally, the precise localization of key frames of an action is explored. This work focuses on detecting the exact starting frame of a behavior, an important task for neuroscience research. To address this problem, a loss is designed that penalizes extra and missed action start detections more heavily than small misalignments. Recurrent neural networks (RNNs) are trained to optimize this loss. The model is shown to reduce the number of false positives, an important criterion defined by the neuroscientists. The performance of the model is evaluated on a new dataset, the Mouse Reach Dataset, a large, annotated video dataset of mice performing a sequence of actions. The dataset was created for neuroscience research. On this dataset, the proposed model outperforms related approaches and baseline methods that use an unstructured loss.
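As a rough illustration of how a single grid answer can yield several triplets at once, the sketch below assumes the annotator marks the grid images they consider most similar to the probe, and that every marked image is then treated as closer to the probe than every unmarked one. The function name `grid_to_triplets` and the food labels are hypothetical and not taken from the thesis; the actual UI and expansion rule may differ.

```python
from itertools import product


def grid_to_triplets(probe, grid, selected):
    """Expand one grid answer into (probe, closer, farther) triplets.

    Assumption for this sketch: every selected grid item is judged closer
    to the probe than every unselected grid item.
    """
    chosen = [item for item in grid if item in selected]
    rejected = [item for item in grid if item not in selected]
    # One triplet per (selected, unselected) pair.
    return [(probe, near, far) for near, far in product(chosen, rejected)]


# Hypothetical example: one probe, a grid of four images, two of them selected.
triplets = grid_to_triplets(
    probe="pizza",
    grid=["calzone", "salad", "flatbread", "sushi"],
    selected={"calzone", "flatbread"},
)
for t in triplets:
    print(t)
```

Under this assumption, a grid of n images with k selections produces k × (n − k) triplets from one screen, which is the kind of saving over asking one triplet question at a time that the abstract describes.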
Gender and Ethnicity Classification Using Partial Face in Biometric Applications
As the number of biometric applications increases, the use of non-ideal information such as images which are not strictly controlled, images taken covertly, or images where the main interest is partially occluded, also increases. Face images are a specific example of this. In these non-ideal instances, other information, such as gender and ethnicity, can be determined to narrow the search space and/or improve the recognition results. Some research exists for gender classification using partial-face images, but there is little research involving ethnicity classification on such images. Few datasets have had the ethnic diversity needed and sufficient subjects for each ethnicity to perform this evaluation. Research is also lacking on how gender and ethnicity classifications on partial-face images are impacted by age. If the extracted gender and ethnicity information is to be integrated into a larger system, some measure of the reliability of the extracted information is needed. This study will provide an analysis of gender and ethnicity classification on large datasets captured by non-researchers under day-to-day operations using texture, color, and shape features extracted from partial-face regions. This analysis will allow for a greater understanding of the limitations of various facial regions for gender and ethnicity classifications. These limitations will guide the integration of automatically extracted partial-face gender and ethnicity information with a biometric face application in order to improve recognition under non-ideal circumstances. Overall, the results from this work showed that reliable gender and ethnicity classification can be achieved from partial-face images. Different regions of the face hold varying amounts of gender and ethnicity information. For machine classification, the upper face regions hold more ethnicity information while the lower face regions hold more gender information. All regions were impacted by age, but the eyes were impacted the most in texture and color. The shape of the nose changed more with respect to age than that of any of the other regions.
Greybox XAI: a Neural-Symbolic learning framework to produce interpretable predictions for image classification
Although Deep Neural Networks (DNNs) have great generalization and prediction capabilities, their
functioning does not allow a detailed explanation of their behavior. Opaque deep learning models are
increasingly used to make important predictions in critical environments, and the danger is that they make
and use predictions that cannot be justified or legitimized. Several eXplainable Artificial Intelligence (XAI)
methods that separate explanations from machine learning models have emerged, but have shortcomings
in faithfulness to the model's actual functioning and robustness. As a result, there is widespread agreement
on the importance of endowing Deep Learning models with explanatory capabilities so that they can
themselves provide an answer to why a particular prediction was made. First, we address the problem
of the lack of universal criteria for XAI by formalizing what an explanation is. We also introduce a
set of axioms and definitions to clarify XAI from a mathematical perspective. Finally, we present the
Greybox XAI, a framework that composes a DNN and a transparent model thanks to the use of a symbolic
Knowledge Base (KB). We extract a KB from the dataset and use it to train a transparent model (i.e., a
logistic regression). An encoder-decoder architecture is trained on RGB images to produce an output
similar to the KB used by the transparent model. Once the two models are trained independently, they
are used compositionally to form an explainable predictive model. We show how this new architecture is
accurate and explainable on several datasets.
Funding: French ANRT (Association Nationale Recherche Technologie); SEGULA Technologies; Juan de la Cierva Incorporacion grant IJC2019-039152-I (MCIN/AEI, "ESF Investing in your future"); Google Research Scholar Program; Department of Education of the Basque Government (Consolidated Research Group MATHMODE, IT1456-2).
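The composition described in the Greybox XAI abstract can be sketched roughly as follows: an opaque model outputs scores for knowledge-base attributes, and a transparent logistic regression maps those scores to a class, so that each prediction can be broken down into per-attribute contributions. Everything here is an illustrative assumption: the attribute names, weights, and the function `predict_with_explanation` are hypothetical, and the attribute scores are hard-coded stand-ins for the output of the encoder-decoder network rather than the authors' implementation.

```python
import math


def predict_with_explanation(attributes, weights, biases):
    """Classify from attribute scores with a (multinomial) logistic regression.

    attributes: dict attribute -> score in [0, 1]; a stand-in for the
                KB-like output of the opaque model (assumed, not computed here).
    weights:    dict class -> dict attribute -> weight (hypothetical values).
    biases:     dict class -> float.
    Returns the predicted class, its probability, and the contribution of
    each attribute to the predicted class's logit.
    """
    logits = {
        cls: biases[cls] + sum(w.get(a, 0.0) * v for a, v in attributes.items())
        for cls, w in weights.items()
    }
    # Numerically stable softmax over the class logits.
    top = max(logits.values())
    exps = {cls: math.exp(z - top) for cls, z in logits.items()}
    total = sum(exps.values())
    probs = {cls: e / total for cls, e in exps.items()}
    best = max(probs, key=probs.get)
    # The explanation: how much each detected attribute pushed the decision.
    contributions = {a: weights[best].get(a, 0.0) * v for a, v in attributes.items()}
    return best, probs[best], contributions


# Hypothetical attribute scores, as if produced by the opaque model.
attributes = {"wing": 0.9, "wheel": 0.1, "beak": 0.8}
weights = {
    "bird": {"wing": 2.0, "beak": 2.0, "wheel": -1.0},
    "plane": {"wing": 2.0, "wheel": 1.5, "beak": -1.0},
}
biases = {"bird": 0.0, "plane": 0.0}

label, prob, contribs = predict_with_explanation(attributes, weights, biases)
print(label, round(prob, 3), contribs)
```

Because the final decision is a weighted sum over named attributes, the explanation is read directly off the transparent model instead of being estimated after the fact, which is the motivation the abstract gives for composing the two models.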