Document Automation Architectures: Updated Survey in Light of Large Language Models
This paper surveys the current state of the art in document automation (DA).
The objective of DA is to reduce the manual effort during the generation of
documents by automatically creating and integrating input from different
sources and assembling documents conforming to defined templates. There have
been reviews of commercial solutions of DA, particularly in the legal domain,
but to date there has been no comprehensive review of the academic research on
DA architectures and technologies. The current survey of DA reviews the
academic literature and provides a clearer definition and characterization of
DA and its features, identifies state-of-the-art DA architectures and
technologies in academic research, and provides ideas that can lead to new
research opportunities within the DA field in light of recent advances in
generative AI and large language models.
Comment: The current paper is the updated version of an earlier survey on
document automation [Ahmadi Achachlouei et al. 2021]. Updates in the current
paper are as follows: We shortened almost all sections to reduce the size of
the main paper (without references) from 28 pages to 10 pages, added a review
of selected papers on large language models, and removed certain sections and
most of the diagrams. arXiv admin note: substantial text overlap with
arXiv:2109.1160
Notebook-as-a-VRE (NaaVRE): From private notebooks to a collaborative cloud virtual research environment
Virtual Research Environments (VREs) provide user-centric support in the
lifecycle of research activities, e.g., discovering and accessing research
assets, or composing and executing application workflows. A typical VRE is
often implemented as an integrated environment, which includes a catalog of
research assets, a workflow management system, a data management framework, and
tools for enabling collaboration among users. Notebook environments, such as
Jupyter, allow researchers to rapidly prototype scientific code and share their
experiments as online accessible notebooks. Jupyter can support several popular
languages that are used by data scientists, such as Python, R, and Julia.
However, such notebook environments do not have seamless support for running
heavy computations on remote infrastructure or finding and accessing software
code inside notebooks. This paper investigates the gap between a notebook
environment and a VRE and proposes an embedded VRE solution for the Jupyter
environment called Notebook-as-a-VRE (NaaVRE). The NaaVRE solution provides
functional components via a component marketplace and allows users to create a
customized VRE on top of the Jupyter environment. From the VRE, a user can
search research assets (data, software, and algorithms), compose workflows,
manage the lifecycle of an experiment, and share the results among users in the
community. We demonstrate how such a solution can enhance a legacy workflow
that uses Light Detection and Ranging (LiDAR) data from country-wide airborne
laser scanning surveys for deriving geospatial data products of ecosystem
structure at high resolution over broad spatial extents. This enables users to
scale out the processing of multi-terabyte LiDAR point clouds for ecological
applications to more data sources in a distributed cloud environment.
Comment: A revised version has been published in the journal Software: Practice
and Experience