158 research outputs found

    Adversarial content manipulation for analyzing and improving model robustness

    The recent rapid progress in machine learning systems has opened up many real-world applications, from recommendation engines on web platforms to safety-critical systems like autonomous vehicles. A model deployed in the real world will often encounter inputs far from its training distribution. For example, a self-driving car might come across a black stop sign in the wild. To ensure safe operation, it is vital to quantify the robustness of machine learning models to such out-of-distribution data before releasing them into the real world. However, the standard paradigm of benchmarking machine learning models with fixed-size test sets drawn from the same distribution as the training data is insufficient to identify these corner cases efficiently. In principle, if we could generate all valid variations of an input and measure the model response, we could quantify and guarantee model robustness locally. Yet doing this with real-world data is not scalable. In this thesis, we propose an alternative: using generative models to create synthetic data variations at scale and testing the robustness of target models to these variations. We explore methods to generate semantic data variations in a controlled fashion across visual and text modalities. We build generative models capable of performing controlled manipulation of data, such as changing the visual context, editing the appearance of an object in an image, or changing the writing style of text. Leveraging these generative models, we propose tools to study the robustness of computer vision systems to input variations and to systematically identify failure modes. In the text domain, we deploy these generative models to improve the diversity of image captioning systems and to perform writing-style manipulation that obfuscates private attributes of the user. Our studies quantifying model robustness explore two kinds of input manipulations: model-agnostic and model-targeted. The model-agnostic manipulations leverage human knowledge to choose the kinds of changes without considering the target model being tested. This includes automatically editing images to remove objects not directly relevant to the task and to create variations in visual context. Alternatively, in the model-targeted approach, the input variations are adversarially guided by the target model. For example, we adversarially manipulate the appearance of an object in the image to fool an object detector, guided by the gradients of the detector. Using these methods, we measure and improve the robustness of various computer vision systems, specifically image classification, segmentation, object detection, and visual question answering systems, to semantic input variations.
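    The model-targeted manipulations mentioned above are driven by the gradients of the model under test. As a minimal illustrative sketch of that gradient-guided idea (not the thesis's actual appearance-editing pipeline), an FGSM-style perturbation against a pretrained classifier shows the mechanics; the model choice, loss, and step size are assumptions, and a recent torchvision is assumed.

```python
# Sketch: gradient-guided (model-targeted) input manipulation.
# Perturbs an image in the direction that most increases the classifier's loss.
import torch
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

def fool_classifier(image, true_label, eps=0.01):
    """image: (1, 3, H, W) normalized tensor; true_label: tensor([class_idx])."""
    image = image.clone().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(image), true_label)
    loss.backward()
    # Step along the sign of the gradient to push the prediction away from true_label.
    return (image + eps * image.grad.sign()).detach()
```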

    Taking Politics at Face Value: How Features Expose Ideology

    Previous studies using computer vision neural networks to analyze facial images have uncovered patterns in the extracted features that are indicative of individual dispositions. For example, Wang and Kosinski (2018) were able to predict the sexual orientation of a target from his or her facial image with surprising accuracy, while Kosinski (2021) was able to do the same for political orientation. These studies suggest that computer vision neural networks can be used to classify people into categories using only their facial images. However, there is some ambiguity regarding the degree to which the features extracted from facial images incorporate facial morphology when used to make predictions. Critics have suggested that a subject's transient facial features, such as wearing makeup, having a tan, donning a beard, or wearing glasses, might be subtly indicative of group belonging (Agüera y Arcas et al., 2018). Further, previous research in this domain has found that accurate image categorization can occur without utilizing facial morphology at all, relying instead on image brightness, color dominance, or the background of the image to make successful classifications (Leuner, 2019; Wang, 2022). This dissertation seeks to bring some clarity to this domain. Using an application programming interface (API) for the popular social networking site Twitter, a sample of nearly a quarter million images of followers of ideological organizations was created. These organizations were supportive of, or opposed to, the polarizing political issues of gun control and immigration. Through a series of strong comparisons, this research tests for the influence of facial morphology in image categorization. Facial images were converted into point and mesh coordinate representations of the subjects' faces, thus eliminating the influence of transient facial features. Images could be classified from facial morphology alone at rates well above chance (64% accuracy across all models using only facial points, 62% using the facial mesh). These results provide the strongest evidence to date that images can be categorized into social categories by facial morphology alone
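    As a rough sketch of the morphology-only setup described above (the landmark extractor, dataset, and classifier are illustrative assumptions, not the dissertation's exact pipeline), flattened facial landmark coordinates can be fed directly to a standard classifier:

```python
# Sketch: classify group membership from facial landmark coordinates only.
# `landmarks` is assumed to be an (n_samples, n_points, 2) array of face-point
# coordinates produced by some landmark detector; `labels` are 0/1 group labels.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def classify_from_landmarks(landmarks: np.ndarray, labels: np.ndarray) -> float:
    X = landmarks.reshape(len(landmarks), -1)  # flatten (x, y) points per face
    X_train, X_test, y_train, y_test = train_test_split(
        X, labels, test_size=0.2, random_state=0, stratify=labels)
    clf = RandomForestClassifier(n_estimators=300, random_state=0)
    clf.fit(X_train, y_train)
    return accuracy_score(y_test, clf.predict(X_test))
```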

    A machine learning application in wine quality prediction

    The wine business relies heavily on wine quality certification. The excellence of New Zealand Pinot noir wines is well known worldwide. Our major goal in this research is to predict wine quality by generating synthetic data and constructing a machine learning model based on this synthetic data and on available experimental data collected from diverse regions across New Zealand. We utilised 18 Pinot noir wine samples with 54 different characteristics (7 physicochemical and 47 chemical features). We generated 1381 samples from 12 original samples using the SMOTE method, and six samples were preserved for model testing. The findings were compared using four distinct feature selection approaches. Important attributes (referred to as essential variables) that were shown to be relevant by at least three feature selection methods were utilised to predict wine quality. Seven machine learning algorithms were trained and tested on the held-out original samples. The Adaptive Boosting (AdaBoost) classifier showed 100% accuracy when trained and evaluated without feature selection, with XGBoost-based feature selection, and with the essential variables (features found important by at least three feature selection methods). With the essential variables, the performance of the Random Forest (RF) classifier also improved
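    A minimal sketch of the oversample-then-classify pipeline described above, assuming a feature matrix X (54 features) and quality labels y; the imbalanced-learn SMOTE and scikit-learn AdaBoost APIs stand in for the paper's exact configuration, and the parameters shown are assumptions. Note that SMOTE as used here only balances class counts, a simplification of the paper's generation of 1381 synthetic samples.

```python
# Sketch: generate synthetic samples with SMOTE, then train an AdaBoost
# classifier and score it on held-out original samples.
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score

def train_on_synthetic(X_train, y_train, X_holdout, y_holdout):
    # k_neighbors must be smaller than the smallest class count (the dataset is tiny).
    smote = SMOTE(k_neighbors=2, random_state=42)
    X_syn, y_syn = smote.fit_resample(X_train, y_train)
    clf = AdaBoostClassifier(n_estimators=100, random_state=42)
    clf.fit(X_syn, y_syn)
    return accuracy_score(y_holdout, clf.predict(X_holdout))
```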

    Explainable contextual data driven fusion

    Numerous applications require the intelligent combining of disparate sensor data streams to create a more complete and enhanced observation in support of underlying tasks like classification, regression, or decision making. This work focuses on two underappreciated and often overlooked parts of information fusion: explainability and context. Due to the rapidly increasing deployment and complexity of machine learning solutions, it is critical that the humans who deploy these algorithms can understand why and how a given algorithm works, and can determine when an algorithm is suitable for use in a particular instance of the problem. The first half of this paper outlines a new similarity measure for capacities and integrals. This measure is used to compare machine-learned fusion solutions and to explain what a single fusion solution has learned. The second half of the paper is focused on contextual fusion with respect to incomplete (limited-knowledge) models and metadata for unmanned aerial vehicles (UAVs). Example UAV metadata includes platform data (e.g., GPS, IMU) and environmental data (e.g., weather, solar position). Incomplete models herein result from limitations of machine learning related to under-sampling of the training data. To address these challenges, a new contextually adaptive online Choquet integral is outlined
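    The discrete Choquet integral with respect to a capacity (fuzzy measure) is the fusion operator underlying the approach above. A minimal sketch, with the capacity supplied as a dict over frozensets of source indices (an assumption for illustration, not the paper's adaptive online variant):

```python
# Sketch: discrete Choquet integral of source values h with respect to a capacity g.
from itertools import combinations

def choquet_integral(h, g):
    """h: list of source values; g: dict mapping frozenset(indices) -> [0, 1],
    with g[frozenset()] == 0 and g[frozenset(range(len(h)))] == 1 (monotone capacity)."""
    order = sorted(range(len(h)), key=lambda i: h[i])  # sources sorted by ascending value
    total, previous = 0.0, 0.0
    for k, idx in enumerate(order):
        subset = frozenset(order[k:])  # sources whose value is at least h[idx]
        total += (h[idx] - previous) * g[subset]
        previous = h[idx]
    return total

# Toy example with a cardinality-based capacity (an assumption for illustration):
h = [0.7, 0.2, 0.9]
g = {frozenset(s): len(s) / 3 for r in range(4) for s in combinations(range(3), r)}
print(choquet_integral(h, g))  # 0.6 -- equals the mean for this additive-like capacity
```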

    Conditional Invertible Generative Models for Supervised Problems

    Invertible neural networks (INNs), in the setting of normalizing flows, are a type of unconditional generative likelihood model. Despite various attractive properties compared to other common generative model types, they are rarely useful for supervised tasks or real applications due to their unguided outputs. In this work, we therefore present three new methods that extend the standard INN setting, falling under a broader category we term generative invertible models. These new methods allow leveraging the theoretical and practical benefits of INNs to solve supervised problems in new ways, including real-world applications from different branches of science. The key finding is that our approaches enhance many aspects of trustworthiness in comparison to conventional feed-forward networks, such as uncertainty estimation and quantification, explainability, and proper handling of outlier data
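    A conditional affine coupling layer is one standard way to let a normalizing-flow INN condition on a label or measurement y while staying invertible in x. A minimal PyTorch sketch follows; the architecture, dimensions, and subnetwork are assumptions for illustration, not the models used in this work.

```python
# Sketch: conditional affine coupling block for a normalizing flow.
# The conditioning vector y is fed to the subnetwork alongside one half of the
# input, so the mapping remains invertible in x for any fixed y.
import torch
import torch.nn as nn

class ConditionalCoupling(nn.Module):
    def __init__(self, dim, cond_dim, hidden=128):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)))

    def forward(self, x, y):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(torch.cat([x1, y], dim=1)).chunk(2, dim=1)
        s = torch.tanh(s)                    # keep scales bounded for stability
        z2 = x2 * torch.exp(s) + t
        log_det = s.sum(dim=1)               # contribution to the flow log-likelihood
        return torch.cat([x1, z2], dim=1), log_det

    def inverse(self, z, y):
        z1, z2 = z[:, :self.half], z[:, self.half:]
        s, t = self.net(torch.cat([z1, y], dim=1)).chunk(2, dim=1)
        s = torch.tanh(s)
        x2 = (z2 - t) * torch.exp(-s)
        return torch.cat([z1, x2], dim=1)
```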

    Early detection of health changes in the elderly using in-home multi-sensor data streams

    The rapid aging of the population worldwide requires increased attention from health care providers and the entire society. For the elderly to live independently, many health issues related to old age, such as frailty and risk of falling, need increased attention and monitoring. When monitoring daily routines of older adults, it is desirable to detect the early signs of health changes before serious health events, such as hospitalizations, happen, so that timely and adequate preventive care can be provided. By deploying multi-sensor systems in the homes of the elderly, we can track trajectories of daily behaviors in a feature space defined using the sensor data. In this work, we investigate a methodology for learning the data distribution from streaming data, tracking the evolution of the behavior trajectories over long periods (years) using high-dimensional streaming clustering, and providing very early indicators of changes in health. If we assume that habitual behaviors correspond to clusters in feature space and that disease produces a change in behavior, albeit not a highly specific one, then tracking trajectory deviations can provide hints of early illness. Retrospectively, we visualize the streaming clustering results and track how the behavior clusters evolve in feature space with the help of two dimension-reduction algorithms, Principal Component Analysis (PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE). Moreover, our tracking algorithm in the original high-dimensional feature space generates early health-warning alerts if a negative trend is detected in the behavior trajectory. We validated our algorithm on synthetic and real-world data and tested it on a pilot dataset of four TigerPlace residents monitored with a collection of motion, bed, and depth sensors over ten years. We used the TigerPlace electronic health records (EHR) to understand the residents' behavior patterns and to evaluate and explain the health warnings generated by our algorithm. The results obtained on the TigerPlace dataset show that most of the warnings produced by our algorithm can be linked to health events documented in the EHR, providing strong support for a prospective deployment of the approach.
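    As an illustrative sketch of the alerting idea (the actual streaming-clustering algorithm, features, window size, and thresholds are not specified here and are assumptions): measure each day's distance from its sensor-feature vector to the nearest learned behavior cluster, and raise a warning when the recent deviations show a sustained upward trend.

```python
# Sketch: flag a health warning when daily deviations from learned behavior
# clusters trend upward. Cluster centers, window, and threshold are illustrative.
import numpy as np

def daily_deviation(day_features: np.ndarray, cluster_centers: np.ndarray) -> float:
    """Distance from today's behavior feature vector to the nearest cluster center."""
    return float(np.min(np.linalg.norm(cluster_centers - day_features, axis=1)))

def health_warning(deviations, window=14, slope_threshold=0.05) -> bool:
    """Fit a line to the last `window` daily deviations; alert on a rising trend."""
    recent = np.asarray(deviations[-window:])
    if len(recent) < window:
        return False
    slope = np.polyfit(np.arange(window), recent, 1)[0]
    return slope > slope_threshold
```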

    Cognitive lexicon

