106 research outputs found

    MicroConceptBERT: concept-relation based document information extraction framework.

    Get PDF
    Extracting information from documents is a crucial task in natural language processing research. Existing information extraction methodologies often focus on specific domains, such as medicine, education, or finance, and are limited by language constraints. However, recent research suggests that more comprehensive approaches, ones that transcend document types, languages, contexts, and structures, would significantly advance the field. This study addresses that challenge by introducing microConceptBERT: a concept-relation-based framework for document information extraction that offers flexibility across document processing tasks while accounting for hierarchical, semantic, and heuristic features. The proposed framework is applied to a question-answering task on two benchmark datasets, SQuAD 2.0 and DocVQA. Notably, it attains an F1 score of 87.01 on SQuAD 2.0, outperforming the BERT-base and BERT-large baselines.
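    For context, the F1 reported for SQuAD-style QA is a token-overlap F1 between the predicted and gold answer spans. A minimal Python sketch with illustrative names (the official evaluation script additionally normalizes case, articles, and punctuation):

        from collections import Counter

        def qa_f1(prediction: str, gold: str) -> float:
            # Token-overlap F1: harmonic mean of precision and recall over shared tokens.
            pred_tokens = prediction.lower().split()
            gold_tokens = gold.lower().split()
            num_same = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
            if num_same == 0:
                return 0.0
            precision = num_same / len(pred_tokens)
            recall = num_same / len(gold_tokens)
            return 2 * precision * recall / (precision + recall)

        print(qa_f1("the Eiffel Tower", "Eiffel Tower"))  # 0.8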

    Understanding the Distillation Process from Deep Generative Models to Tractable Probabilistic Circuits

    Full text link
    Probabilistic Circuits (PCs) are a general and unified computational framework for tractable probabilistic models that support efficient computation of various inference tasks (e.g., computing marginal probabilities). Towards enabling such reasoning capabilities in complex real-world tasks, Liu et al. (2022) propose to distill knowledge (through latent variable assignments) from less tractable but more expressive deep generative models. However, it is still unclear what factors make this distillation work well. In this paper, we theoretically and empirically discover that the performance of a PC can exceed that of its teacher model. Therefore, instead of performing distillation from the most expressive deep generative model, we study what properties the teacher model and the PC should have in order to achieve good distillation performance. This leads to a generic algorithmic improvement, as well as other data-type-specific ones, over the existing latent variable distillation pipeline. Empirically, we outperform SoTA TPMs by a large margin on challenging image modeling benchmarks. In particular, on ImageNet32, PCs achieve 4.06 bits-per-dimension, only 0.34 behind variational diffusion models (Kingma et al., 2021).
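    For reference, bits-per-dimension is a model's negative log-likelihood converted from nats to bits and normalized by the data dimensionality. A minimal sketch, assuming the model reports a total NLL in nats per image (the numbers below are illustrative):

        import math

        def bits_per_dimension(nll_nats: float, num_dims: int) -> float:
            # Convert nats to bits, then normalize by dimensionality.
            return nll_nats / (num_dims * math.log(2))

        # ImageNet32 images have 32 * 32 * 3 = 3072 dimensions.
        print(bits_per_dimension(nll_nats=8630.0, num_dims=32 * 32 * 3))  # ~4.05 bpd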

    A Quadratic Synchronization Rule for Distributed Deep Learning

    Full text link
    In distributed deep learning with data parallelism, synchronizing gradients at each training step can cause a huge communication overhead, especially when many nodes work together to train large models. Local gradient methods, such as Local SGD, address this issue by allowing workers to compute locally for H steps without synchronizing with others, hence reducing communication frequency. While H has been viewed as a hyperparameter to trade optimization efficiency for communication cost, recent research indicates that setting a proper H value can lead to generalization improvement. Yet, selecting a proper H is elusive. This work proposes a theory-grounded method for determining H, named the Quadratic Synchronization Rule (QSR), which recommends dynamically setting H in proportion to 1/η² as the learning rate η decays over time. Extensive ImageNet experiments on ResNet and ViT show that local gradient methods with QSR consistently improve the test accuracy over other synchronization strategies. Compared with standard data-parallel training, QSR enables Local AdamW on ViT-B to cut the training time on 16 or 64 GPUs down from 26.7 to 20.2 hours or from 8.6 to 5.5 hours and, at the same time, achieves 1.16% or 0.84% higher top-1 validation accuracy.
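    A hedged sketch of the rule as stated, H ∝ 1/η²: the proportionality constant and the clipping bounds below are illustrative placeholders, not the paper's tuned values.

        def qsr_local_steps(eta: float, alpha: float = 0.1, h_min: int = 1, h_max: int = 512) -> int:
            # QSR: synchronize every H local steps, with H growing as the learning rate decays.
            h = int(alpha / (eta ** 2))
            return max(h_min, min(h, h_max))

        for eta in (0.1, 0.03, 0.01):
            print(eta, qsr_local_steps(eta))  # H grows quadratically as eta decays: 10, 111, 512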

    Improving Demand Forecasting: The Challenge of Forecasting Studies Comparability and a Novel Approach to Hierarchical Time Series Forecasting

    Get PDF
    Demand forecasts are essential in business. Based on expected customer demand, companies decide, for example, which products to develop, how many factories to build, how much staff to hire, or how much raw material to order. Errors in demand forecasts can have severe consequences, lead to poor decisions, and in the worst case drive a company into bankruptcy. Yet in many cases it is difficult to anticipate actual future demand. The influencing factors can be manifold, for example macroeconomic developments, the behavior of competitors, or technological change, and even when all influencing factors are known, their relationships and interactions are often hard to quantify. This dissertation contributes to improving the accuracy of demand forecasts. In the first part, within a comprehensive survey of the full spectrum of demand forecasting applications, a novel approach for systematically comparing demand forecasting studies is introduced and applied to 116 recent studies. Improving the comparability of studies is a substantial contribution to current research: unlike, for example, medical research, demand forecasting has no substantial comparative quantitative meta-studies, because empirical demand forecasting studies use no unified scheme to describe their data, methods, and results. If studies can instead be compared directly through systematic description, other researchers can better analyze how variations in approach affect forecast accuracy, without the costly need to re-run empirical experiments that have already been described. This work is the first to introduce such a descriptive scheme. The remainder of the work addresses forecasting methods for intermittent time series, i.e., time series with a substantial share of zero demands. Such series violate the continuity assumptions of most forecasting methods, which is why common methods often achieve insufficient accuracy. Nevertheless, intermittent time series are highly relevant; spare parts in particular typically exhibit this demand pattern. In three studies, this work first shows that even the tested state-of-the-art machine learning approaches yield no general improvement on several well-known datasets. As a key research contribution, the work then introduces a novel method: the Similarity-based Time Series Forecasting (STSF) approach, which uses an aggregation-disaggregation scheme based on a self-constructed hierarchy of statistical properties of the time series. Any available forecasting algorithm can be used within the STSF approach, since aggregation satisfies the continuity requirement. In experiments on seven public datasets and one proprietary dataset, the work shows that forecast accuracy (measured by the root mean square error, RMSE) improves by a statistically significant 1-5% on average over the same method without STSF. The method thus yields a substantial improvement in forecast accuracy. In summary, this dissertation makes substantial contributions to the current state of research through the aforementioned methods: the proposed scheme for standardizing empirical studies accelerates research progress by enabling comparative studies, and the STSF method provides an approach that reliably improves forecast accuracy while remaining flexible across different kinds of forecasting algorithms. To the best of the knowledge gained from the comprehensive literature review, no comparable approaches have been described to date.
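    A hedged sketch of the aggregation-disaggregation mechanism behind STSF: intermittent series are summed into a smoother aggregate, the aggregate is forecast with any standard method, and the forecast is split back by historical demand shares. The hierarchy of statistical properties that STSF builds is abstracted away here, and all names are illustrative.

        import numpy as np

        def stsf_like_forecast(series: np.ndarray, forecast_aggregate) -> np.ndarray:
            # series: shape (n_series, n_periods), pre-grouped by similar statistical properties.
            aggregate = series.sum(axis=0)                # smooth enough for standard methods
            agg_forecast = forecast_aggregate(aggregate)  # any forecasting algorithm
            shares = series.sum(axis=1) / series.sum()    # historical demand shares
            return np.outer(shares, agg_forecast)         # per-series disaggregated forecasts

        naive = lambda y: np.repeat(y[-3:].mean(), 4)     # e.g. a 3-period moving average
        demand = np.array([[0, 2, 0, 0, 3, 0], [1, 0, 0, 4, 0, 2]])
        print(stsf_like_forecast(demand, naive))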

    Exploration and adaptation of large language models for specialized domains

    Get PDF
    Large language models have transformed the field of natural language processing (NLP). Their improved performance on various NLP benchmarks makes them a promising tool, also for application in specialized domains. Such domains are characterized by highly trained professionals with particular domain expertise. Since these experts are rare, improving the efficiency of their work with automated systems is especially desirable. However, domain-specific text resources pose various challenges for NLP systems, including distinct language, noisy and scarce data, and a high level of variation. Further, specialized domains present an increased need for transparent systems, since they are often applied in high-stakes settings. In this dissertation, we examine whether large language models (LLMs) can overcome some of these challenges and propose methods to effectively adapt them to domain-specific requirements. We first investigate the inner workings and abilities of LLMs and show how they can fill the gaps left by previous NLP algorithms for specialized domains. To this end, we explore the sources of errors produced by earlier systems to identify which of them can be addressed by LLMs. Following this, we take a closer look at how information is processed within Transformer-based LLMs to better understand their capabilities. We find that their layers encode different dimensions of the input text; here, the contextual vector representation and the general language knowledge learned during pre-training are especially beneficial for solving the complex, multi-step tasks common in specialized domains. Following this exploration, we propose solutions for further adapting LLMs to the requirements of domain-specific tasks. We focus on the clinical domain, which incorporates many of the typical challenges found in specialized domains. We show how to improve generalization by integrating different domain-specific resources into our models. We further analyze the behavior of the resulting models and propose a behavioral testing framework that can serve as a tool for communication with domain experts. Finally, we present an approach for incorporating the benefits of LLMs while fulfilling requirements such as interpretability and modularity. The presented solutions show improvements in performance on benchmark datasets and in manually conducted analyses with medical professionals. Our work provides both new insights into the inner workings of pre-trained language models and multiple adaptation methods, showing that LLMs can be an effective tool for NLP in specialized domains.

    A Comprehensive Study on Knowledge Graph Embedding over Relational Patterns Based on Rule Learning

    Full text link
    Knowledge Graph Embedding (KGE) has proven to be an effective approach to the Knowledge Graph Completion (KGC) task. Relational patterns, i.e., relations whose specific semantics exhibit graph patterns, are an important factor in the performance of KGE models. Although KGE models' capabilities over different relational patterns have been analyzed in theory, and a rough connection has been drawn between better relational-pattern modeling and better KGC performance, a comprehensive quantitative analysis of KGE models over relational patterns remains absent, so it is unclear how a model's theoretical support for a relational pattern contributes to the performance on triples associated with that pattern. To address this challenge, we evaluate the performance of 7 KGE models over 4 common relational patterns on 2 benchmarks, then conduct an analysis from three aspects (theory, entity frequency, and part-to-whole) and reach some counterintuitive conclusions. Finally, we introduce a training-free method, Score-based Patterns Adaptation (SPA), to enhance KGE models' performance over various relational patterns. This approach is simple yet effective and can be applied to KGE models without additional training. Our experimental results demonstrate that our method generally enhances performance over specific relational patterns. Our source code is available on GitHub at https://github.com/zjukg/Comprehensive-Study-over-Relational-Patterns. Comment: this paper is accepted by ISWC 2023.
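    The abstract does not spell out SPA's formulation, so the sketch below only illustrates the general idea of training-free, score-level adaptation: a standard TransE scorer plus a hypothetical inference-time adjustment for relations known to be symmetric.

        import numpy as np

        def transe_score(h: np.ndarray, r: np.ndarray, t: np.ndarray) -> float:
            # TransE plausibility score: higher means the triple (h, r, t) is more plausible.
            return -float(np.linalg.norm(h + r - t))

        def adapted_score(h, r, t, relation_is_symmetric: bool) -> float:
            score = transe_score(h, r, t)
            if relation_is_symmetric:
                # Fold in the inverse triple (t, r, h) at inference time, with no retraining.
                score = 0.5 * (score + transe_score(t, r, h))
            return score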

    A General Framework for Robust G-Invariance in G-Equivariant Networks

    Full text link
    We introduce a general method for achieving robust group invariance in group-equivariant convolutional neural networks (G-CNNs), which we call the G-triple-correlation (G-TC) layer. The approach leverages the theory of the triple correlation on groups, which is the unique lowest-degree polynomial invariant map that is also complete. Many commonly used invariant maps, such as the max, are incomplete: they remove both group and signal structure. A complete invariant, by contrast, removes only the variation due to the actions of the group, while preserving all information about the structure of the signal. The completeness of the triple correlation endows the G-TC layer with strong robustness, which can be observed in its resistance to invariance-based adversarial attacks. In addition, we observe that it yields measurable improvements in classification accuracy over standard max G-pooling in G-CNN architectures. We provide a general and efficient implementation of the method for any discretized group, which requires only a table defining the group's product structure. We demonstrate the benefits of this method for G-CNNs defined on both commutative and non-commutative groups: SO(2), O(2), SO(3), and O(3) (discretized as the cyclic C8, dihedral D16, chiral octahedral O, and full octahedral O_h groups), acting on ℝ² and ℝ³ on both G-MNIST and G-ModelNet10 datasets.
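    A minimal sketch of the triple correlation on a finite group, computed from exactly such a product table: T_f(g1, g2) = Σ_g f(g) f(g·g1) f(g·g2). Names are illustrative, and a real G-TC layer would apply this per feature channel.

        import numpy as np

        def triple_correlation(f: np.ndarray, prod: np.ndarray) -> np.ndarray:
            # f: signal on the group, shape (|G|,); prod[i, j] = index of g_i * g_j.
            n = len(f)
            T = np.zeros((n, n))
            for g1 in range(n):
                for g2 in range(n):
                    T[g1, g2] = np.sum(f * f[prod[:, g1]] * f[prod[:, g2]])
            return T

        # Example on the cyclic group C4, where g_i * g_j = g_{(i+j) mod 4}.
        prod_c4 = (np.arange(4)[:, None] + np.arange(4)[None, :]) % 4
        f = np.array([1.0, 2.0, 0.5, -1.0])
        f_shifted = f[(np.arange(4) + 1) % 4]  # act on f by a group element
        # Invariance: the triple correlation is unchanged under the group action.
        print(np.allclose(triple_correlation(f, prod_c4), triple_correlation(f_shifted, prod_c4)))  # True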

    LIPIcs, Volume 277, GIScience 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 277, GIScience 2023, Complete Volume

    12th International Conference on Geographic Information Science: GIScience 2023, September 12–15, 2023, Leeds, UK

    Get PDF
    No abstract available
