106 research outputs found
MicroConceptBERT: concept-relation based document information extraction framework.
Extracting information from documents is a crucial task in natural language processing research. Existing information extraction methodologies often focus on specific domains, such as medicine, education, or finance, and are limited by language constraints. More comprehensive approaches that transcend document types, languages, contexts, and structures would significantly advance the field, as proposed in recent research. This study addresses this challenge by introducing microConceptBERT, a concept-relation-based framework for document information extraction that offers flexibility for various document processing tasks while accounting for hierarchical, semantic, and heuristic features. The proposed framework has been applied to a question-answering task on the benchmark datasets SQuAD 2.0 and DocVQA. Notably, it attains an F1 score of 87.01 on SQuAD 2.0, outperforming the BERT-base and BERT-large baseline models.
Understanding the Distillation Process from Deep Generative Models to Tractable Probabilistic Circuits
Probabilistic Circuits (PCs) are a general and unified computational
framework for tractable probabilistic models that support efficient computation
of various inference tasks (e.g., computing marginal probabilities). Towards
enabling such reasoning capabilities in complex real-world tasks, Liu et al.
(2022) propose to distill knowledge (through latent variable assignments) from
less tractable but more expressive deep generative models. However, it is still
unclear what factors make this distillation work well. In this paper, we
theoretically and empirically discover that the performance of a PC can exceed
that of its teacher model. Therefore, instead of performing distillation from
the most expressive deep generative model, we study what properties the teacher
model and the PC should have in order to achieve good distillation performance.
This leads to a generic algorithmic improvement as well as other
data-type-specific ones over the existing latent variable distillation
pipeline. Empirically, we outperform SoTA TPMs by a large margin on challenging
image modeling benchmarks. In particular, on ImageNet32, PCs achieve 4.06
bits-per-dimension, which is only 0.34 behind variational diffusion models
(Kingma et al., 2021).
A Quadratic Synchronization Rule for Distributed Deep Learning
In distributed deep learning with data parallelism, synchronizing gradients
at each training step can cause a huge communication overhead, especially when
many nodes work together to train large models. Local gradient methods, such as
Local SGD, address this issue by allowing workers to compute locally for H
steps without synchronizing with others, hence reducing communication
frequency. While H has been viewed as a hyperparameter to trade optimization
efficiency for communication cost, recent research indicates that setting a
proper H value can lead to generalization improvement. Yet, selecting a
proper H is elusive. This work proposes a theory-grounded method for
determining H, named the Quadratic Synchronization Rule (QSR), which
recommends dynamically setting H in proportion to 1/η² as the
learning rate η decays over time. Extensive ImageNet experiments on ResNet
and ViT show that local gradient methods with QSR consistently improve the test
accuracy over other synchronization strategies. Compared with the standard data
parallel training, QSR enables Local AdamW on ViT-B to cut the training time on
16 or 64 GPUs down from 26.7 to 20.2 hours or from 8.6 to 5.5 hours and, at the
same time, achieves equal or higher top-1 validation accuracy.
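The schedule itself is simple enough to sketch. Below is a minimal, hypothetical implementation of the QSR idea, where the growth coefficient alpha and the base period h_base are placeholder knobs, not values from the paper:

```python
# Hypothetical sketch of the Quadratic Synchronization Rule (QSR): the
# synchronization period H grows as the learning rate eta decays, with
# H proportional to 1/eta^2. alpha and h_base are assumed knobs.

def qsr_sync_period(eta: float, alpha: float = 1e-4, h_base: int = 2) -> int:
    """Return the number of local steps H to run before the next
    gradient synchronization, given the current learning rate eta."""
    return max(h_base, round(alpha / eta**2))

# As eta decays, workers synchronize less and less often.
for eta in [1e-2, 5e-3, 1e-3]:
    print(eta, qsr_sync_period(eta))
```

The point of the rule is that communication savings arrive exactly when the decayed learning rate makes long local runs harmless, or even beneficial, for generalization.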
Improving Demand Forecasting: The Challenge of Forecasting Studies Comparability and a Novel Approach to Hierarchical Time Series Forecasting
Demand forecasts are essential in business. Based on expected customer demand, companies determine, for example, which products to develop, how many factories to build, how much staff to hire, or how much raw material to order. Errors in demand forecasting can have severe consequences, lead to poor decisions, and in the worst case drive a company into bankruptcy.
Yet in many cases it is complex to anticipate actual future demand. The influencing factors can be manifold, for example macroeconomic developments, the behavior of competitors, or technological change. Even when all influencing factors are known, their relationships and interactions are often difficult to quantify.
This dissertation contributes to improving the accuracy of demand forecasts.
In the first part of this work, within a comprehensive survey spanning the entire spectrum of application fields of demand forecasting, a novel approach for systematically comparing demand forecasting studies is introduced and applied to 116 recent studies. Improving the comparability of studies is an essential contribution to current research, because, unlike in medical research for example, there are no substantial comparative quantitative meta-studies for demand forecasting. The reason is that empirical demand forecasting studies do not use a unified scheme to describe their data, methods, and results. If, by contrast, studies can be compared directly through systematic description, other researchers can better analyze how variations in approach affect forecast quality, without the costly need to rerun empirical experiments that have already been described in published studies. This work is the first to introduce such a systematic description scheme.
The remainder of this work addresses forecasting methods for intermittent time series, i.e., time series with a substantial share of zero demands. Such time series do not satisfy the continuity requirements of most forecasting methods, which is why common methods often achieve insufficient forecast quality. Nevertheless, the relevance of intermittent time series is high; spare parts in particular typically exhibit this demand pattern. First, this work shows in three studies that even the tested state-of-the-art machine learning approaches yield no general improvement on several well-known datasets. As an essential contribution to research, this work then presents a novel method: the Similarity-based Time Series Forecasting (STSF) approach uses an aggregation-disaggregation procedure based on a self-generated hierarchy of statistical properties of the time series. In combination with the STSF approach, all available forecasting algorithms can be used, since the aggregation satisfies the continuity condition. In experiments on a total of seven publicly available datasets and one proprietary dataset, the work shows that forecast quality (measured by the root mean square error, RMSE) improves by a statistically significant 1-5% on average over the same method without STSF. The method thus yields a substantial improvement in forecast quality.
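The aggregation-disaggregation idea behind STSF can be sketched in a few lines. The forecaster and the disaggregation by historical shares below are placeholders, not the dissertation's actual hierarchy of statistical properties:

```python
import numpy as np

# Hedged sketch of aggregation-disaggregation for intermittent demand:
# series in a group are summed into one smoother aggregate, any
# forecaster is applied to the aggregate, and the aggregate forecast is
# split back using each series' historical share of the group total.

def stsf_forecast(series_group: np.ndarray, forecaster) -> np.ndarray:
    """series_group: shape (n_series, n_periods) of intermittent demand.
    Returns one next-period forecast per series."""
    aggregate = series_group.sum(axis=0)      # smoother, mostly non-zero
    agg_forecast = forecaster(aggregate)      # any method can be plugged in
    totals = series_group.sum(axis=1)
    shares = totals / totals.sum()            # historical proportions
    return agg_forecast * shares              # disaggregate

# Simple placeholder forecaster: mean of the aggregate history.
demand = np.array([[0, 3, 0, 0, 3],
                   [0, 0, 2, 0, 0]])
print(stsf_forecast(demand, lambda x: x.mean()))
```

Aggregation is what restores the continuity the base forecasters require; the disaggregation step is where a method-specific hierarchy, such as the one STSF derives from statistical properties, would replace the naive proportional split.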
Zusammengefasst trĂ€gt diese Dissertation zum aktuellen Stand der Forschung durch die zuvor genannten Verfahren wesentlich bei. Das vorgeschlagene Verfahren zur Standardi-sierung empirischer Studien beschleunigt den Fortschritt der Forschung, da sie verglei-chende Studien ermöglicht. Und mit dem STSF Verfahren steht ein Ansatz bereit, der zuverlĂ€ssig die PrognosegĂŒte verbessert, und dabei flexibel mit verschiedenen Arten von Prognosealgorithmen einsetzbar ist. Nach dem Erkenntnisstand der umfassenden Literatur-recherche sind keine vergleichbaren AnsĂ€tze bislang beschrieben worden
Exploration and adaptation of large language models for specialized domains
Large language models have transformed the field of natural language processing (NLP). Their improved performance on various NLP benchmarks makes them a promising tool, also for application in specialized domains. Such domains are characterized by highly trained professionals with particular domain expertise. Since these experts are rare, improving the efficiency of their work with automated systems is especially desirable. However, domain-specific text resources pose various challenges for NLP systems, including distinct language, noisy and scarce data, and a high level of variation. Further, specialized domains present an increased need for transparent systems, since they are often applied in high-stakes settings. In this dissertation, we examine whether large language models (LLMs) can overcome some of these challenges and propose methods to effectively adapt them to domain-specific requirements.
We first investigate the inner workings and abilities of LLMs and show how they can fill the gaps that are present in previous NLP algorithms for specialized domains. To this end, we explore the sources of errors produced by earlier systems to identify which of them can be addressed by using LLMs. Following this, we take a closer look at how information is processed within Transformer-based LLMs to better understand their capabilities. We find that their layers encode different dimensions of the input text. Here, the contextual vector representation, and the general language knowledge learned during pre-training are especially beneficial for solving complex and multi-step tasks common in specialized domains.
Following this exploration, we propose solutions for further adapting LLMs to the requirements of domain-specific tasks. We focus on the clinical domain, which incorporates many typical challenges found in specialized domains. We show how to improve generalization by integrating different domain-specific resources into our models. We further analyze the behavior of the produced models and propose a behavioral testing framework that can serve as a tool for communication with domain experts. Finally, we present an approach for incorporating the benefits of LLMs while fulfilling requirements such as interpretability and modularity. The presented solutions show improvements in performance on benchmark datasets and in manually conducted analyses with medical professionals.
Our work provides both new insights into the inner workings of pre-trained language models as well as multiple adaptation methods showing that LLMs can be an effective tool for NLP in specialized domains
A Comprehensive Study on Knowledge Graph Embedding over Relational Patterns Based on Rule Learning
Knowledge Graph Embedding (KGE) has proven to be an effective approach to
solving the Knowledge Graph Completion (KGC) task. Relational patterns which
refer to relations with specific semantics exhibiting graph patterns are an
important factor in the performance of KGE models. Although KGE models'
capabilities have been analyzed over different relational patterns in theory,
and a rough connection between better relational pattern modeling and better
KGC performance has been established, a comprehensive quantitative analysis of
KGE models over relational patterns remains absent, so it is uncertain how a
model's theoretical support for a relational pattern contributes to the
performance on triples associated with that pattern. To address this
challenge, we evaluate the performance of 7 KGE models over 4 common relational
patterns on 2 benchmarks, then conduct an analysis from three aspects, namely
theory, entity frequency, and part-to-whole, and arrive at some
counterintuitive conclusions.
Finally, we introduce a training-free method Score-based Patterns Adaptation
(SPA) to enhance KGE models' performance over various relational patterns. This
approach is simple yet effective and can be applied to KGE models without
additional training. Our experimental results demonstrate that our method
generally enhances performance over specific relational patterns. Our source
code is available on GitHub at
https://github.com/zjukg/Comprehensive-Study-over-Relational-Patterns.
Comment: This paper is accepted by ISWC 2023.
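The abstract does not spell out SPA's formula; as one illustrative, not authoritative, realization of training-free score-based pattern adaptation, the sketch below blends a triple's score with the score of the triple implied by a symmetric relation. The blending weight and the toy scoring function are assumptions:

```python
# Illustrative sketch (not the paper's exact method) of adapting KGE
# scores for a symmetric relational pattern at inference time: the score
# of (h, r, t) is blended with the score of the pattern-implied reverse
# triple (t, r, h). No additional training is needed.

def adapted_score(score_fn, h, r, t, lmbda=0.5):
    """Blend the direct score with the symmetry-implied reverse score."""
    return (1 - lmbda) * score_fn(h, r, t) + lmbda * score_fn(t, r, h)

# Toy scoring table standing in for a trained KGE model.
scores = {("a", "marriedTo", "b"): 0.9, ("b", "marriedTo", "a"): 0.3}
toy_score = lambda h, r, t: scores.get((h, r, t), 0.0)

print(adapted_score(toy_score, "b", "marriedTo", "a"))  # -> 0.6
```

The appeal of such a scheme is exactly what the abstract claims: it wraps any existing scoring function and can be switched on per relational pattern without retraining.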
A General Framework for Robust G-Invariance in G-Equivariant Networks
We introduce a general method for achieving robust group-invariance in
group-equivariant convolutional neural networks (G-CNNs), which we call the
G-triple-correlation (G-TC) layer. The approach leverages the theory of the
triple-correlation on groups, which is the unique, lowest-degree polynomial
invariant map that is also complete. Many commonly used invariant maps - such
as the max - are incomplete: they remove both group and signal structure. A
complete invariant, by contrast, removes only the variation due to the actions
of the group, while preserving all information about the structure of the
signal. The completeness of the triple correlation endows the G-TC layer with
strong robustness, which can be observed in its resistance to invariance-based
adversarial attacks. In addition, we observe that it yields measurable
improvements in classification accuracy over standard Max G-Pooling in
G-CNN architectures. We provide a general and efficient implementation of the
method for any discretized group, which requires only a table defining the
group's product structure. We demonstrate the benefits of this method for
G-CNNs defined on both commutative and non-commutative groups, SO(2),
O(2), SO(3), and O(3) (discretized as the cyclic C_n, dihedral D_n,
chiral octahedral O, and full octahedral O_h groups), acting on R^2 and
R^3, on both G-MNIST and G-ModelNet10 datasets.
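For the commutative case, the triple correlation is easy to state concretely. A minimal sketch on the cyclic group Z_n, with an invariance check under the group action (cyclic shifts); the signal values are arbitrary:

```python
import numpy as np

# Triple correlation on the cyclic group Z_n:
#   T(g1, g2) = sum_g f(g) * f(g + g1) * f(g + g2)   (indices mod n)
# It is invariant to cyclic shifts of f but, unlike max pooling,
# preserves the structure of the signal up to that group action.

def triple_correlation(f: np.ndarray) -> np.ndarray:
    n = len(f)
    T = np.empty((n, n))
    for g1 in range(n):
        for g2 in range(n):
            T[g1, g2] = sum(f[g] * f[(g + g1) % n] * f[(g + g2) % n]
                            for g in range(n))
    return T

f = np.array([1.0, 2.0, 0.0, 3.0])
shifted = np.roll(f, 1)  # group action: cyclic shift
assert np.allclose(triple_correlation(f), triple_correlation(shifted))
```

Max pooling over the group would collapse both signals to the single value 3.0, discarding everything but one extremum; the triple correlation instead produces an n-by-n invariant that distinguishes signals differing by more than a shift.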
LIPIcs, Volume 277, GIScience 2023, Complete Volume
12th International Conference on Geographic Information Science: GIScience 2023, September 12â15, 2023, Leeds, UK
No abstract available
- …