345 research outputs found

    Wissenschaftlich-Technischer Jahresbericht 1993

    Get PDF

    Pioniergeist, Ausdauer, Leidenschaft. Festschrift zu Ehren von Prof. Dr. Jürgen Handke

    Get PDF
    This Festschrift is dedicated to the scholarly work of Prof. Dr. Jürgen Handke and was compiled on the occasion of his retirement in March 2020. The volume covers a wide range of academic fields, including linguistics, digital teaching, the inverted classroom, curricular design, and educational robotics. The contributors include renowned representatives from academia and industry as well as early-career researchers, with articles in both German and English. In addition, a group of artists engaged in artistic terms with the "research subject" Jürgen Handke.

    Measurement uncertainty in machine learning - uncertainty propagation and influence on performance

    Get PDF
    Industry 4.0 is based on the intelligent networking of machines and processes in industry and contributes decisively to competitiveness. Reliable measurements from the sensors and sensor systems in use are essential for this. Metrology deals with the definition of internationally accepted measurement units and standards. To make measurement results internationally comparable, the Guide to the Expression of Uncertainty in Measurement (GUM) provides the basis for evaluating and interpreting measurement uncertainty. At the same time, measurement uncertainty provides information about data quality, which is important when machine learning is applied in the digitalized factory. However, measurement uncertainty in line with the GUM has mostly been neglected in machine learning, or estimated only by cross-validation. This dissertation therefore aims to combine measurement uncertainty based on the principles of the GUM with machine learning. For performing machine learning, a data pipeline is presented that fuses raw data from different measurement systems and determines measurement uncertainties from dynamic calibration information. Furthermore, a previously published automated toolbox for machine learning is extended to include uncertainty propagation based on the GUM and its supplements. Using this uncertainty-aware toolbox, the influence of measurement uncertainty on machine learning results is investigated, and approaches to improve these results are discussed.
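
    The dissertation's toolbox itself is not shown in the abstract; purely as an illustration of the idea, the following hypothetical Python sketch propagates a Gaussian input uncertainty through a trained model by Monte Carlo sampling, in the spirit of GUM Supplement 1. The model, data, and uncertainty values are invented for the example.

    # Hypothetical sketch, not the dissertation's toolbox: Monte Carlo
    # uncertainty propagation (in the spirit of GUM Supplement 1) through
    # a trained regression model.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)

    # Toy training data: a sensor reading x mapped to a target y.
    X_train = rng.uniform(0, 10, size=(200, 1))
    y_train = 3.0 * X_train[:, 0] + rng.normal(0, 0.5, size=200)
    model = LinearRegression().fit(X_train, y_train)

    def propagate_uncertainty(model, x_mean, x_std, n_draws=10_000):
        """Propagate a Gaussian input uncertainty through the model by
        sampling; returns the mean prediction and its standard uncertainty."""
        draws = rng.normal(x_mean, x_std, size=(n_draws, 1))
        preds = model.predict(draws)
        return preds.mean(), preds.std(ddof=1)

    # A measured value of 5.0 with an assumed standard uncertainty of 0.1.
    y_hat, u_y = propagate_uncertainty(model, x_mean=5.0, x_std=0.1)
    print(f"prediction: {y_hat:.3f} +/- {u_y:.3f}")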

    Clustering Approaches for Multi-source Entity Resolution

    Get PDF
    Entity Resolution (ER), or deduplication, aims at identifying entities, such as specific customer or product descriptions, in one or several data sources that refer to the same real-world entity. ER is of key importance for improving data quality and has a crucial role in data integration and querying. The previous generation of ER approaches focused on integrating records from two relational databases or performing deduplication within a single database. In the era of Big Data, however, the number of available data sources is increasing rapidly, and large-scale data mining or querying systems need to integrate data obtained from numerous sources. For example, online digital libraries or e-shops incorporate publications or products from a large number of archives or suppliers across the world, or within a specified region or country, to provide a unified view for the user. This process requires data consolidation from numerous heterogeneous, mostly evolving data sources. As the number of sources grows, data heterogeneity and velocity as well as the variance in data quality increase. Multi-source ER, i.e. finding matching entities in an arbitrary number of sources, is therefore a challenging task. Previous efforts for matching and clustering entities between multiple sources (> 2) mostly treated all sources as a single source. This excludes the use of metadata or provenance information for enhancing integration quality and leads to poor results because quality differences between sources are ignored. The conventional ER pipeline consists of blocking, pair-wise matching of entities, and classification. To meet the new requirements, holistic clustering approaches are needed that scale to many data sources. Holistic clustering-based ER should further overcome the restriction of pairwise linking by grouping entities from multiple sources into clusters. The clustering step aims at removing false links while adding missing true links across sources. Additionally, incremental clustering and repairing approaches are needed to cope with the ever-increasing number of sources and new incoming entities. To this end, we developed novel clustering and repairing schemes for multi-source entity resolution. The approaches can group entities from multiple clean (duplicate-free) sources as well as handle data from an arbitrary combination of clean and dirty sources. The clustering schemes developed specifically for multi-source ER obtain superior results compared to general-purpose clustering algorithms. We also developed incremental clustering and repairing methods to handle evolving sources. The proposed incremental approaches can incorporate new sources as well as new entities from existing sources. The more sophisticated approach can repair previously determined clusters, yielding improved quality and a reduced dependency on the insertion order of new entities. To ensure scalability, parallel variants of all approaches are implemented on top of Apache Flink, a distributed processing engine. The proposed methods have been integrated into a new end-to-end ER tool named FAMER (FAst Multi-source Entity Resolution system). The FAMER framework comprises Linking and Clustering components encompassing both batch and incremental ER functionality. The output of the Linking component is a similarity graph in which each vertex represents an entity and each edge carries the similarity between two entities; this graph is the input of the Clustering component. Comprehensive comparative evaluations show that the proposed clustering and repairing approaches for both batch and incremental ER achieve high quality while maintaining scalability.
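
    FAMER itself runs on Apache Flink, and its clustering schemes are more elaborate than anything shown here; as a minimal stand-in for the overall idea, the hypothetical Python sketch below builds a similarity graph with networkx and clusters it by connected components over edges above a similarity threshold. Entity names, similarities, and the threshold are invented for the example.

    # Hypothetical sketch: clustering a similarity graph for entity
    # resolution. Vertices are entities (here from sources a, b, c); edges
    # carry the similarity computed by a Linking step.
    import networkx as nx

    edges = [
        ("a1", "b1", 0.92),  # likely the same real-world entity
        ("b1", "c1", 0.88),
        ("a2", "b2", 0.35),  # likely a false link
        ("b2", "c2", 0.90),
    ]

    G = nx.Graph()
    for u, v, sim in edges:
        G.add_edge(u, v, sim=sim)

    def cluster(graph, threshold=0.8):
        """Drop edges below the threshold, then group the remaining
        entities by connected components."""
        kept = [(u, v) for u, v, d in graph.edges(data=True)
                if d["sim"] >= threshold]
        H = nx.Graph(kept)
        H.add_nodes_from(graph.nodes)  # singletons stay as own clusters
        return list(nx.connected_components(H))

    print(cluster(G))  # e.g. [{'a1', 'b1', 'c1'}, {'a2'}, {'b2', 'c2'}]

    A thresholded connected-components pass can remove false links but cannot add missing true ones; the repairing schemes described in the abstract go beyond this baseline.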

    Proceedings. 19. Workshop Computational Intelligence, Dortmund, 2. - 4. Dezember 2009

    Get PDF
    This proceedings volume contains the contributions to the 19th workshop "Computational Intelligence" of Fachausschuss 5.14 of the VDI/VDE-Gesellschaft für Mess- und Automatisierungstechnik (GMA) and of the Fachgruppe "Fuzzy-Systeme und Soft-Computing" of the Gesellschaft für Informatik (GI), held from 2 to 4 December 2009 at Haus Bommerholz near Dortmund.

    Approximating the schema of a set of documents by means of resemblance

    Get PDF
    The WWW contains a huge number of documents. Some share the same subject but are generated by different people or even different organizations. A semi-structured model makes it possible to share documents that do not have exactly the same structure; however, it does not facilitate the understanding of such heterogeneous documents. In this paper, we offer a characterization and an algorithm to obtain a representative (in terms of a resemblance function) of a set of heterogeneous semi-structured documents. We approximate the representative so that the resemblance function is maximized, and then generalize the algorithm to deal with repetitions and different classes of documents. Although an exact representative could always be found using an unlimited number of optional elements, it would cause an overfitting problem: the size of an exact representative for a set of heterogeneous documents may even make it useless. Our experiments show that, for users, it is easier and faster to deal with smaller representatives, even compensating for the loss introduced by the approximation.
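
    The paper's resemblance function is not reproduced in the abstract; as an assumed stand-in, the hypothetical Python sketch below scores candidate representatives against a set of semi-structured documents using Jaccard similarity over element names, and illustrates why an exact representative built from every optional element can score worse than a compact approximation. Documents and element names are invented for the example.

    # Hypothetical sketch: Jaccard similarity over element names stands in
    # for the paper's resemblance function; documents are modeled as sets
    # of element names.
    documents = [
        {"title", "author", "year"},
        {"title", "author", "publisher"},
        {"title", "editor", "year", "isbn"},
    ]

    def resemblance(representative, document):
        """Jaccard similarity between the representative and a document."""
        union = representative | document
        return len(representative & document) / len(union) if union else 1.0

    def score(representative):
        """Average resemblance over all documents: the quantity that the
        approximation tries to maximize."""
        return sum(resemblance(representative, d)
                   for d in documents) / len(documents)

    # The compact representative (~0.63) beats the exact union of all
    # elements (~0.56), which overfits with optional elements that most
    # documents lack.
    print(score({"title", "author", "year"}))
    print(score({"title", "author", "year", "publisher", "editor", "isbn"}))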