Jaeger: A Concatenation-Based Multi-Transformer VQA Model
Document-based Visual Question Answering poses a challenging task at the
intersection of linguistic sense disambiguation and fine-grained multimodal retrieval. Although
there has been encouraging progress in document-based question answering due to
the utilization of large language and open-world prior models\cite{1}, several
challenges persist, including prolonged response times, extended inference
durations, and imprecision in matching. In order to overcome these challenges,
we propose Jaeger, a concatenation-based multi-transformer VQA model. To derive
question features, we leverage the exceptional capabilities of RoBERTa
large\cite{2} and GPT2-xl\cite{3} as feature extractors. Subsequently, we
subject the outputs from both models to a concatenation process. This operation
allows the model to consider information from diverse sources concurrently,
strengthening its representational capability. By leveraging pre-trained models
for feature extraction, our approach has the potential to amplify the
performance of these models through concatenation. After concatenation, we
apply dimensionality reduction to the output features, reducing the model's
computational cost and inference time. Empirical results demonstrate
that our proposed model achieves competitive performance on Task C of the
PDF-VQA Dataset.
Comment: This paper is the technical research paper of the CIKM 2023 DocIU
challenges. The authors received the CIKM 2023 DocIU Winner Award, sponsored
by Google, Microsoft, and the Centre for data-driven geoscience.
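The fuse-then-reduce idea above can be sketched in a few lines. The snippet below concatenates two feature vectors sized to the hidden states of RoBERTa-large (1024) and GPT2-XL (1600) and applies a linear projection; the random projection weights and the 64-dimensional output are illustrative assumptions, not the paper's actual parameters.

```python
import random

def concat_and_project(feat_a, feat_b, proj):
    """Concatenate two feature vectors and reduce dimensionality with a
    linear projection (sketch of the fuse-then-reduce idea)."""
    fused = feat_a + feat_b  # list concatenation: length d_a + d_b
    return [sum(w * x for w, x in zip(row, fused)) for row in proj]

# RoBERTa-large and GPT2-XL hidden sizes; the output size of 64 is an
# illustrative choice, not the paper's value
d_a, d_b, d_out = 1024, 1600, 64
rng = random.Random(0)
feat_a = [rng.gauss(0, 1) for _ in range(d_a)]
feat_b = [rng.gauss(0, 1) for _ in range(d_b)]
proj = [[rng.gauss(0, 1) / (d_a + d_b) ** 0.5 for _ in range(d_a + d_b)]
        for _ in range(d_out)]

reduced = concat_and_project(feat_a, feat_b, proj)
print(len(reduced))  # 64
```

In practice the projection would be a learned layer rather than a fixed random map, but the shape bookkeeping is the same.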
Empirical Analysis of Wind Power Potential at Multiple Heights for North Dakota Wind Observation Sites
Wind speed is the most critical factor that determines wind power potential and generation. In this paper, wind speed data from multiple years at various observation sites in North Dakota, U.S. were analyzed to assess wind power potential. The study first applied probability density functions (PDFs) to characterize the wind speed data and fit distributions at various heights for each observation site. The fitted distributions were then used to estimate the wind power potential based on the theoretical cubic relationship between energy potential and wind speed. Due to the complexity of these functions, a numerical integration approach was employed. The major findings of this empirical study are: (1) the Weibull distribution is not always the best function for fitting wind speed data; gamma and lognormal distributions produce better fits on many occasions; (2) for different height levels at one observation site, the best-performing distributions may differ; (3) the estimation accuracies of wind energy potential based on the fitted wind speed distributions range from -4% to 3.8%; (4) the ranking of energy potential estimation accuracies is not always consistent with that of goodness-of-fit for the wind speed distributions. In addition, a simplified approach that relies only on the hourly mean wind speed to estimate wind power potential was evaluated. Based on the theoretical cubic relationship for wind power estimation, it was found that the simplified approach may underestimate wind power potential by 42-54%. As such, this approach becomes more practical only if this difference is compensated for.
Key words: Wind speed; Distribution; Goodness-of-fit; Wind power potential; North Dakota
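The cubic-law estimation and the mean-speed shortcut can be sketched as follows, using an illustrative Weibull fit (the shape and scale values are made up, not the paper's site parameters). Because the cube is a convex function, the cube-of-mean shortcut necessarily underestimates the integrated power density, consistent in spirit with the 42-54% deficit the paper reports.

```python
import math

def weibull_pdf(v, k, c):
    """Weibull density for wind speed v (shape k, scale c)."""
    if v <= 0:
        return 0.0
    return (k / c) * (v / c) ** (k - 1) * math.exp(-((v / c) ** k))

def trapezoid(f, a, b, n=4000):
    """Simple trapezoidal numerical integration of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

RHO = 1.225  # air density, kg/m^3

def power_density(pdf, v_max=40.0):
    """Mean wind power density (W/m^2): integral of 0.5*rho*v^3*pdf(v)."""
    return trapezoid(lambda v: 0.5 * RHO * v ** 3 * pdf(v), 0.0, v_max)

def mean_speed(pdf, v_max=40.0):
    return trapezoid(lambda v: v * pdf(v), 0.0, v_max)

# Illustrative Weibull fit; shape/scale are not the paper's site values
pdf = lambda v: weibull_pdf(v, k=2.0, c=8.0)
pd_full = power_density(pdf)                      # distribution-based estimate
pd_simplified = 0.5 * RHO * mean_speed(pdf) ** 3  # cube-of-mean shortcut
deficit = 1 - pd_simplified / pd_full
print(pd_full, pd_simplified, deficit)  # shortcut underestimates (~48% here)
```

For this particular Weibull shape the shortcut loses roughly half the power density, which is why compensating for the gap matters before using hourly means in practice.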
MOPRD: A multidisciplinary open peer review dataset
Open peer review is a growing trend in academic publications. Public access
to peer review data can benefit both the academic and publishing communities.
It also serves as a great support to studies on review comment generation and
further to the realization of automated scholarly paper review. However, most
of the existing peer review datasets do not provide data that cover the whole
peer review process. Apart from this, their data are not diversified enough as
they are mainly collected from the field of computer science. These two
drawbacks of the currently available peer review datasets need to be addressed
to unlock more opportunities for related studies. In response to this problem,
we construct MOPRD, a multidisciplinary open peer review dataset. This dataset
consists of paper metadata, multiple manuscript versions, review comments,
meta-reviews, authors' rebuttal letters, and editorial decisions. Moreover, we
design a modular guided review comment generation method based on MOPRD.
Experiments show that our method delivers better performance, as indicated by
both automatic metrics and human evaluation. We also explore other potential
applications of MOPRD, including meta-review generation, editorial decision
prediction, author rebuttal generation, and scientometric analysis. MOPRD
provides strong support for further studies in peer review and related
applications.
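A record covering the whole peer review process might look like the sketch below; the field names are hypothetical and the actual MOPRD schema may differ.

```python
# Hypothetical field names illustrating the record types the abstract
# lists; the dataset's real JSON schema may differ.
record = {
    "paper_id": "example-0001",
    "metadata": {"title": "An Example Paper", "discipline": "biology"},
    "manuscript_versions": ["v1", "v2"],           # multiple versions
    "review_comments": [{"round": 1, "reviewer": "R1", "text": "..."}],
    "meta_review": "...",
    "rebuttal_letters": [{"round": 1, "text": "..."}],
    "editorial_decision": "accept",
}

# Whole-process coverage: every stage of peer review is represented
stages = {"review_comments", "meta_review", "rebuttal_letters",
          "editorial_decision"}
assert stages <= set(record)
```

Keeping every stage in one record is what enables the downstream tasks the abstract mentions, such as editorial decision prediction from reviews plus rebuttals.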
A new method of reconstructing Galactic three-dimensional structures using ultralong-wavelength radio observations
The free-free absorption of low frequency radio waves by thermal electrons in
the warm ionized medium of our Galaxy becomes very significant at
MHz frequencies (ultralong wavelengths), and the absorption strength depends on the radio
frequency. Upcoming space experiments such as the Discovering Sky at the
Longest wavelength (DSL) and Farside Array for Radio Science Investigations of
the Dark ages and Exoplanets (FARSIDE) will produce high-resolution
multi-frequency sky maps at the ultralong-wavelength, providing a new window to
observe the Universe. In this paper we propose that from these
ultralong-wavelength multi-frequency maps, the three-dimensional distribution
of the Galactic electrons can be reconstructed. This novel and robust
reconstruction of the Galactic electron distribution will be a key science case
of those space missions. Ultralong-wavelength observations will be a powerful
tool for studying the astrophysics relevant to the Galactic electron
distribution, for example, the impacts of supernova explosions on electron
distribution, and the interaction between interstellar atoms and ionizing
photons escaped from the HII regions around massive stars. An animation shows
the reconstruction results using the {\tt NE2001} model as the input test; on
arXiv it is provided in the ancillary files, and in the paper it is linked to
Fig. 5.
Comment: 16 pages, 8 figures (including one animation; on arXiv see the
ancillary files). Accepted for publication in Ap
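A minimal two-frequency sketch of the reconstruction idea is shown below, assuming the free-free optical depth scales as nu^-2.1 and, as a strong simplification, a flat unabsorbed background spectrum; the frequencies, temperatures, and optical depth are illustrative values, not from the paper.

```python
import math

BETA = -2.1  # free-free optical depth index: tau(nu) = tau0 * (nu/nu0)**BETA

def observed_temp(t_bg, tau0, nu, nu0=1.0):
    """Brightness temperature of background emission attenuated by
    free-free absorption along the line of sight."""
    return t_bg * math.exp(-tau0 * (nu / nu0) ** BETA)

def solve_tau0(t1, t2, nu1, nu2, nu0=1.0):
    """Recover tau0 for one sky pixel from measurements at two
    frequencies, assuming a flat unabsorbed background spectrum
    (a strong simplification of the paper's method)."""
    x1, x2 = (nu1 / nu0) ** BETA, (nu2 / nu0) ** BETA
    return math.log(t2 / t1) / (x1 - x2)

# Illustrative pixel: background 1e4 K, true optical depth 0.5 at nu0 = 1 MHz
t1 = observed_temp(1e4, 0.5, nu=3.0)   # map at 3 MHz
t2 = observed_temp(1e4, 0.5, nu=10.0)  # map at 10 MHz
tau0_rec = solve_tau0(t1, t2, 3.0, 10.0)
print(tau0_rec)  # recovers 0.5 up to floating-point error
```

With many frequencies per pixel, as DSL or FARSIDE would provide, the same relation can be fit jointly for both the background spectrum and the optical depth, and the optical depths then constrain the line-of-sight electron distribution.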
Automated scholarly paper review: Technologies and challenges
Peer review is a widely accepted mechanism for research evaluation, playing a
pivotal role in scholarly publishing. However, criticisms have long been
leveled at this mechanism, mostly because of its inefficiency and subjectivity.
Recent years have seen the application of artificial intelligence (AI) in
assisting the peer review process. Nonetheless, with the involvement of humans,
such limitations remain inevitable. In this review paper, we propose the
concept and pipeline of automated scholarly paper review (ASPR) and review the
relevant literature and technologies of achieving a full-scale computerized
review process. On the basis of the review and discussion, we conclude that
there is already corresponding research and implementation at each stage of
ASPR. We further look into the challenges in ASPR with the existing
technologies. The major difficulties lie in imperfect document parsing and
representation, inadequate data, defective human-computer interaction and
flawed deep logical reasoning. Moreover, we discuss possible moral and
ethical issues and point out future directions for ASPR. In the foreseeable
future, ASPR and peer review will coexist in a reinforcing manner before ASPR
is able to fully take over the reviewing workload from humans.
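The staged pipeline concept can be sketched as a composition of toy stages; the stage names and the trivial screening rule below are hypothetical placeholders, not the paper's actual ASPR components.

```python
# Hypothetical stage decomposition; this only illustrates the staged
# concept, not the models or rules any real ASPR system would use.
def parse(raw_text):
    return {"text": raw_text}                            # document parsing

def represent(doc):
    return {**doc, "n_words": len(doc["text"].split())}  # representation

def assess(doc):
    # toy screening rule standing in for learned quality assessment
    return {**doc, "substantial": doc["n_words"] >= 5}

def write_review(doc):
    verdict = "accept" if doc["substantial"] else "major revision"
    return f"Recommendation: {verdict} ({doc['n_words']} words parsed)"

def aspr(raw_text):
    doc = parse(raw_text)
    for stage in (represent, assess):
        doc = stage(doc)
    return write_review(doc)

print(aspr("A complete manuscript body with several sections ..."))
```

The point of the staged structure is that each stage (parsing, representation, assessment, comment generation) can be researched and replaced independently, which matches the paper's observation that research already exists at each stage.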
Adapting Offline Speech Translation Models for Streaming with Future-Aware Distillation and Inference
A popular approach to streaming speech translation is to employ a single
offline model with a \textit{wait-$k$} policy to support different latency
requirements, which is simpler than training multiple online models with
different latency constraints. However, there is a mismatch problem in using a
model trained with complete utterances for streaming inference with partial
input. We demonstrate that speech representations extracted at the end of a
streaming input are significantly different from those extracted from a
complete utterance. To address this issue, we propose a new approach called
Future-Aware Streaming Translation (FAST) that adapts an offline ST model for
streaming input. FAST includes a Future-Aware Inference (FAI) strategy that
incorporates future context through a trainable masked embedding, and a
Future-Aware Distillation (FAD) framework that transfers future context from an
approximation of full speech to streaming input. Our experiments on the MuST-C
En$\to$De, En$\to$Es, and En$\to$Fr benchmarks show that FAST achieves better trade-offs
between translation quality and latency than strong baselines. Extensive
analyses suggest that our methods effectively alleviate the aforementioned
mismatch problem between offline training and online inference.
Comment: work in progress
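The generic wait-$k$ schedule that such offline-for-streaming setups rely on (read $k$ source units first, then alternate writes and reads) can be sketched as follows; this shows only the policy itself, not FAST's future-aware inference or distillation.

```python
def wait_k_policy(k, num_src, num_tgt):
    """Generic wait-k read/write schedule for streaming translation:
    read k source units first, then alternate writing one target unit
    and reading one source unit until the source is exhausted."""
    actions, read, written = [], 0, 0
    while written < num_tgt:
        if read < min(k + written, num_src):
            actions.append("READ")
            read += 1
        else:
            actions.append("WRITE")
            written += 1
    return actions

print(wait_k_policy(2, num_src=4, num_tgt=4))
```

The mismatch the abstract describes arises exactly at the READ boundary: every WRITE is conditioned on a truncated source prefix, whereas training saw complete utterances.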
Learning to Compose Representations of Different Encoder Layers towards Improving Compositional Generalization
Recent studies have shown that sequence-to-sequence (seq2seq) models struggle
with compositional generalization (CG), i.e., the ability to systematically
generalize to unseen compositions of seen components. There is mounting
evidence that one of the reasons hindering CG is that the representation of
the encoder's uppermost layer is entangled, i.e., the syntactic and semantic
representations of sequences are entangled. However, we consider that this
previously identified representation entanglement problem is not comprehensive
enough. We further hypothesize that the source key and value representations
passed into different decoder layers are also entangled.
Starting from this intuition, we propose \textsc{CompoSition} (\textbf{Compo}se
\textbf{S}yntactic and Semant\textbf{i}c Representa\textbf{tion}s), an
extension to seq2seq models which learns to compose representations of
different encoder layers dynamically for different tasks, since recent studies
reveal that the bottom layers of the Transformer encoder contain more syntactic
information and the top ones contain more semantic information. Specifically,
we introduce a \textit{composed layer} between the encoder and decoder to
compose different encoder layers' representations to generate specific keys and
values passing into different decoder layers. \textsc{CompoSition} achieves
competitive results on two comprehensive and realistic benchmarks, which
empirically demonstrates the effectiveness of our proposal. Code is available
at~\url{https://github.com/thinkaboutzero/COMPOSITION}.
Comment: Accepted by Findings of EMNLP 202
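A toy version of such a composed layer can be written as a softmax-weighted sum over the encoder layers' hidden vectors; the dimensions and weights below are illustrative, and the paper's learned parameterization may differ.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def compose_layers(encoder_layers, logits):
    """Softmax-weighted sum over encoder layers' hidden vectors; one
    learned weight vector per decoder layer would yield layer-specific
    keys and values for that decoder layer."""
    w = softmax(logits)
    d = len(encoder_layers[0])
    return [sum(w[l] * encoder_layers[l][i] for l in range(len(w)))
            for i in range(d)]

# Toy setting: 3 encoder layers, hidden size 4; logits are illustrative
layers = [[1.0] * 4, [2.0] * 4, [3.0] * 4]
kv = compose_layers(layers, logits=[0.0, 0.0, 0.0])  # uniform weights
print(kv)  # averages the layers: all entries ~2.0
```

With learned, per-decoder-layer logits, a decoder layer that needs more syntactic signal can weight the lower encoder layers more heavily, which is the intuition behind composing layers instead of always reading the topmost one.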