
    Reliably Capture Local Clusters in Noisy Domains From Parallel Universes

    When searching for small local patterns, it is difficult to distinguish incidental agglomerations of noisy points from true local patterns. We propose a new approach that addresses this problem by exploiting the temporal information contained in most business data sets. The algorithm detects local patterns in noisy data sets more reliably than when the temporal information is ignored. This is achieved by exploiting the fact that noise does not reproduce its incidental structure over time, whereas even small true patterns do. In particular, we developed a method to track clusters over time based on an optimal match of data partitions between time periods.
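The cluster-tracking idea sketched in this abstract, matching the partitions of consecutive time periods, can be illustrated as follows. This is a minimal sketch under assumptions of my own: the Jaccard overlap score and the brute-force search over matchings are illustrative choices, not the paper's actual algorithm.

```python
from itertools import permutations

def jaccard(a, b):
    """Jaccard similarity between two clusters given as sets of object ids."""
    return len(a & b) / len(a | b) if a | b else 0.0

def match_partitions(clusters_t, clusters_t1):
    """Find the one-to-one match between the clusters of two consecutive
    time periods that maximizes total Jaccard overlap.  Brute force over
    permutations, which is fine for small cluster counts; when the counts
    differ, only the first clusters of period t are matched."""
    k = min(len(clusters_t), len(clusters_t1))
    best_score, best_map = -1.0, {}
    for perm in permutations(range(len(clusters_t1)), k):
        score = sum(jaccard(clusters_t[i], clusters_t1[j])
                    for i, j in enumerate(perm))
        if score > best_score:
            best_score, best_map = score, dict(enumerate(perm))
    return best_map, best_score
```

A cluster that persists across periods keeps a high-overlap partner, while an incidental noise agglomeration finds no stable match.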

    FoodAuthent – Developing a System for Food Authenticity by Collecting, Analyzing and Utilizing Product Data

    The research project FoodAuthent aims to provide the technical framework and incentives for the routine use of “fingerprinting” to secure and monitor food quality. The planned system captures, analyses and processes data on the chemical fingerprint of food and can prove its authenticity. For this purpose, cloud-based fingerprinting databases are combined with data analysis methods and batch-specific product information. The project focuses on proof of origin and fraud detection for food, as well as on analytical methods for the product categories cheese, oil and spirits.

    Dagstuhl News January - December 2007

    "Dagstuhl News" is a publication edited especially for the members of the Foundation "Informatikzentrum Schloss Dagstuhl" to thank them for their support. The News gives a summary of the scientific work being done at Dagstuhl. Each Dagstuhl Seminar is presented in a short abstract describing the contents and scientific highlights of the seminar as well as the perspectives or challenges of the research topic.

    07181 Abstracts Collection -- Parallel Universes and Local Patterns

    From 1 May 2007 to 4 May 2007 the Dagstuhl Seminar 07181 "Parallel Universes and Local Patterns" was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar, as well as abstracts of seminar results and ideas, are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided where available.

    A Matrix Factorization Approach for Integrating Multiple Data Views


    Association Discovery in Two-View Data

    Two-view datasets are datasets whose attributes are naturally split into two sets, each providing a different view on the same set of objects. We introduce the task of finding small and non-redundant sets of associations that describe how the two views are related. To achieve this, we propose a novel approach in which sets of rules are used to translate one view to the other and vice versa. Our models, dubbed translation tables, contain both unidirectional and bidirectional rules that span both views and provide lossless translation from either of the views to the opposite view. To be able to evaluate different translation tables and perform model selection, we present a score based on the Minimum Description Length (MDL) principle. Next, we introduce three TRANSLATOR algorithms to find good models according to this score. The first algorithm is parameter-free and iteratively adds the rule that improves compression most. The other two algorithms use heuristics to achieve better trade-offs between runtime and compression. The empirical evaluation on real-world data demonstrates that only modest numbers of associations are needed to characterize the two-view structure present in the data, while the obtained translation rules are easily interpretable and provide insight into the data.
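The parameter-free TRANSLATOR variant described here, iteratively adding the rule that improves compression most, can be sketched as a greedy loop. The two-part description-length score below is a crude stand-in for the paper's actual MDL encoding, and all names and representations (rules as pairs of attribute sets) are illustrative assumptions.

```python
def description_length(rules, rows):
    """Toy two-part MDL score: cost of the rule table plus the view-2
    attributes left unexplained by any firing rule.  Rows are pairs
    (view1_attrs, view2_attrs); rules are pairs (lhs, rhs) of sets."""
    model_cost = sum(len(lhs) + len(rhs) for lhs, rhs in rules)
    explained = [set() for _ in rows]
    for lhs, rhs in rules:
        for i, (v1, v2) in enumerate(rows):
            if lhs <= v1:                  # rule fires on this row
                explained[i] |= rhs & v2   # it explains these attributes
    data_cost = sum(len(v2 - explained[i]) for i, (_, v2) in enumerate(rows))
    return model_cost + data_cost

def greedy_translator(candidates, rows):
    """Parameter-free greedy loop: repeatedly add the candidate rule that
    lowers the total description length most; stop when nothing helps."""
    rules = []
    while True:
        current = description_length(rules, rows)
        best, best_dl = None, current
        for cand in candidates:
            if cand in rules:
                continue
            dl = description_length(rules + [cand], rows)
            if dl < best_dl:
                best, best_dl = cand, dl
        if best is None:
            return rules
        rules.append(best)
```

Only rules whose coverage outweighs their own encoding cost are kept, which is what keeps the resulting translation table small and non-redundant.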

    Deep Active Learning Explored Across Diverse Label Spaces

    Deep learning architectures have been widely explored in computer vision and have depicted commendable performance in a variety of applications. A fundamental challenge in training deep networks is the requirement of large amounts of labeled training data. While gathering large quantities of unlabeled data is cheap and easy, annotating the data is an expensive process in terms of time, labor and human expertise. Thus, developing algorithms that minimize the human effort in training deep models is of immense practical importance. Active learning algorithms automatically identify salient and exemplar samples from large amounts of unlabeled data and can augment maximal information to supervised learning models, thereby reducing the human annotation effort in training machine learning models. The goal of this dissertation is to fuse ideas from deep learning and active learning and design novel deep active learning algorithms. The proposed learning methodologies explore diverse label spaces to solve different computer vision applications. Three major contributions have emerged from this work: (i) a deep active framework for multi-class image classification, (ii) a deep active model with and without label correlation for multi-label image classification and (iii) a deep active paradigm for regression. Extensive empirical studies on a variety of multi-class, multi-label and regression vision datasets corroborate the potential of the proposed methods for real-world applications. Additional contributions include: (i) a multimodal emotion database consisting of recordings of facial expressions, body gestures, vocal expressions and physiological signals of actors enacting various emotions, (ii) four multimodal deep belief network models and (iii) an in-depth analysis of the effect of transfer of multimodal emotion features between source and target networks on classification accuracy and training time. These related contributions help in understanding the challenges involved in training deep learning models and motivate the main goal of this dissertation.
    Doctoral Dissertation, Electrical Engineering, 201
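The core active-learning principle this abstract builds on, automatically picking the most informative unlabeled samples for annotation, can be illustrated with maximum-entropy acquisition. This is a generic sketch of the principle, not the dissertation's specific deep active learning framework; all names are illustrative.

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_batch(unlabeled_probs, batch_size):
    """Rank unlabeled samples by predictive entropy and return the
    indices of the most uncertain ones -- the classic acquisition rule
    that deep active learning methods refine for multi-class,
    multi-label and regression label spaces."""
    ranked = sorted(range(len(unlabeled_probs)),
                    key=lambda i: entropy(unlabeled_probs[i]),
                    reverse=True)
    return ranked[:batch_size]
```

Samples the current model is confident about contribute little new information, so the annotation budget is spent where the predictions are closest to uniform.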

    Learning in Parallel Universes

    Classical data mining methods almost always rely on a single object representation, often in the form of one high-dimensional attribute vector per object. In many application areas, however, the objects to be analyzed (molecules, 3D models, processes) can be described in many different ways, leading to a multitude of descriptor spaces, so-called parallel universes. This thesis introduces learning in parallel universes as a new learning concept that comprises the simultaneous analysis of the different universes. The goal is to produce an interpretable model that represents interesting structures in the data, distributed across the universes. The resulting model thus consists of partial models, each valid in a different universe, which together describe the structure underlying the objects as a whole. To this end, two methods implementing this new learning concept are developed and studied. The first method addresses supervised learning problems. It is based on the construction of local neighborhood diagrams, called neighborgrams, for the objects of one or more target classes in all universes. Based on measures such as the class distribution or class density in these diagrams, neighborgrams from different universes are selected to build a classification model. Of practical interest is also the neighborgram visualization, which both enables the comparison of neighborhood relations across the universes and allows interactive model construction. The second method extends several unsupervised fuzzy clustering algorithms. The core idea is to model membership in parallel universes through new variables in the classical objective functions, with the optimization then also carried out with respect to these new variables. The output of the presented methods includes, among other things, the cluster prototypes as well as membership values of the clusters (or of the objects) to the universes, so that each cluster or object can be partially assigned to the individual universes. The described methods are evaluated on several data sets, including a molecular data set in which the activity of molecules is to be predicted based on four universes, and a data set of 3D objects described in 16 universes.
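The extended fuzzy clustering objective described in this abstract, classical fuzzy memberships augmented with per-cluster universe weights as additional optimization variables, can be sketched as the following evaluation function. Variable names and fuzzifier exponents are illustrative assumptions, not the thesis's exact notation.

```python
def objective(memberships, universe_weights, distances, m=2.0, n=2.0):
    """Evaluate an extended fuzzy c-means objective: fuzzy cluster
    memberships v[i][c] are combined with per-cluster universe weights
    z[c][u] and the squared distances d[u][i][c] between object i and
    the prototype of cluster c in universe u.  Alternating optimization
    would minimize this over memberships, weights, and prototypes."""
    total = 0.0
    for u, d_u in enumerate(distances):          # universes
        for i, d_ui in enumerate(d_u):           # objects
            for c, d_uic in enumerate(d_ui):     # clusters
                total += (universe_weights[c][u] ** n) * \
                         (memberships[i][c] ** m) * d_uic
    return total
```

Because the universe weights enter the objective alongside the memberships, minimizing it assigns each cluster (or object) a partial, soft membership in the individual universes, which is exactly the kind of output the abstract describes.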