Hybridization expansion Monte Carlo simulation of multi-orbital quantum impurity problems: matrix product formalism and improved sampling
We explore two complementary modifications of the hybridization-expansion continuous-time Monte Carlo method, aiming at large multi-orbital quantum impurity problems. One idea is to compute the imaginary-time propagation using a matrix product state representation. We show that bond dimensions considerably smaller than the dimension of the Hilbert space are sufficient to obtain accurate results and that this approach scales polynomially, rather than exponentially, with the number of orbitals. Based on scaling analyses, we conclude that a matrix product state implementation will outperform the exact-diagonalization-based method for quantum impurity problems with more than 12 orbitals. The second idea is an improved Monte Carlo sampling scheme which is applicable to all variants of the hybridization expansion method. We show that this so-called sliding window sampling scheme speeds up the simulation by at least an order of magnitude for a broad range of model parameters, with the largest improvements at low temperature.
Matrix Product State applications for the ALPS project
The density-matrix renormalization group method has become a standard
computational approach to the low-energy physics as well as dynamics of
low-dimensional quantum systems. In this paper, we present a new set of
applications, available as part of the ALPS package, that provide an efficient
and flexible implementation of these methods based on a matrix-product state
(MPS) representation. Our applications implement, within the same framework,
algorithms to variationally find the ground state and low-lying excited states
as well as simulate the time evolution of arbitrary one-dimensional and
two-dimensional models. Implementing the conservation of quantum numbers for
generic Abelian symmetries, we achieve performance competitive with the best
codes in the community. Example results are provided for (i) a model of
itinerant fermions in one dimension and (ii) a model of quantum magnetism.
Comment: 11+5 pages, 8 figures, 2 example
Corpus Conversion Service: A Machine Learning Platform to Ingest Documents at Scale
Over the past few decades, the volume of scientific articles and technical
literature has grown exponentially. Consequently, there is a great
need for systems that can ingest these documents at scale and make the
contained knowledge discoverable. Unfortunately, both the format of these
documents (e.g. the PDF format or bitmap images) and the presentation of
the data (e.g. complex tables) make the extraction of qualitative and
quantitative data extremely challenging. In this paper, we present a modular,
cloud-based platform to ingest documents at scale. This platform, called the
Corpus Conversion Service (CCS), implements a pipeline which allows users to
parse and annotate documents (i.e. collect ground-truth), train
machine-learning classification algorithms and ultimately convert any type of
PDF or bitmap-documents to a structured content representation format. We will
show that each of the modules is scalable due to an asynchronous microservice
architecture and can therefore handle massive amounts of documents.
Furthermore, we will show that our capability to gather ground-truth is
accelerated by machine-learning algorithms by at least one order of magnitude.
This allows us to both gather large amounts of ground-truth in very little time
and obtain very good precision/recall metrics in the range of 99% with regard
to content conversion to structured output. The CCS platform is currently
deployed on IBM internal infrastructure and serving more than 250 active users
for knowledge-engineering project engagements.
Comment: Accepted paper at KDD 2018 conference
ICDAR 2023 Competition on Robust Layout Segmentation in Corporate Documents
Transforming documents into machine-processable representations is a
challenging task due to their complex structures and variability in formats.
Recovering the layout structure and content from PDF files or scanned material
has remained a key problem for decades. ICDAR has a long tradition in hosting
competitions to benchmark the state-of-the-art and encourage the development of
novel solutions to document layout understanding. In this report, we present
the results of our ICDAR 2023 Competition on Robust Layout Segmentation
in Corporate Documents, which posed the challenge to accurately segment the
page layout in a broad range of document styles and domains, including
corporate reports, technical literature and patents. To raise the bar over
previous competitions, we engineered a hard competition dataset and proposed
the recent DocLayNet dataset for training. We recorded 45 team registrations
and received official submissions from 21 teams. In the presented solutions, we
recognize interesting combinations of recent computer vision models, data
augmentation strategies and ensemble methods to achieve remarkable accuracy in
the task we posed. A clear trend towards adoption of vision-transformer based
methods is evident. The results demonstrate substantial progress towards
achieving robust and highly generalizing methods for document layout
understanding.
Comment: ICDAR 2023, 10 pages, 4 figures