147 research outputs found

    Jaeger: A Concatenation-Based Multi-Transformer VQA Model

    Document-based Visual Question Answering poses a challenging task at the intersection of linguistic sense disambiguation and fine-grained multimodal retrieval. Although there has been encouraging progress in document-based question answering due to the use of large language models and open-world prior models, several challenges persist, including prolonged response times, extended inference durations, and imprecise matching. To overcome these challenges, we propose Jaeger, a concatenation-based multi-transformer VQA model. To derive question features, we leverage the exceptional capabilities of RoBERTa-large and GPT2-xl as feature extractors. We then concatenate the outputs of both models, which allows the model to consider information from diverse sources concurrently and strengthens its representational capability. By leveraging pre-trained models for feature extraction, our approach has the potential to amplify their performance through concatenation. After concatenation, we apply dimensionality reduction to the output features, reducing the model's computational cost and inference time. Empirical results demonstrate that our proposed model achieves competitive performance on Task C of the PDF-VQA Dataset.
    Comment: This paper is the technical research paper of the CIKM 2023 DocIU challenge. The authors received the CIKM 2023 DocIU Winner Award, sponsored by Google, Microsoft, and the Centre for data-driven geoscience
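The fusion step described above can be sketched as a plain concatenation followed by a learned linear projection. This is a minimal NumPy illustration, with hypothetical feature dimensions (1024 for RoBERTa-large, 1600 for GPT2-xl) and a random projection standing in for the trained reduction layer; the paper's actual head is not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)

def concat_reduce(f_a, f_b, W):
    """Concatenate two feature batches, then project to a smaller dimension."""
    fused = np.concatenate([f_a, f_b], axis=-1)  # (batch, dim_a + dim_b)
    return fused @ W                             # (batch, dim_out)

f_roberta = rng.normal(size=(4, 1024))   # hypothetical RoBERTa-large features
f_gpt2 = rng.normal(size=(4, 1600))      # hypothetical GPT2-xl features
W = rng.normal(size=(1024 + 1600, 512)) / np.sqrt(2624)  # stand-in for a trained Linear
q = concat_reduce(f_roberta, f_gpt2, W)
print(q.shape)  # (4, 512)
```

The dimensionality reduction is what keeps downstream attention and matching cheap: the fused 2624-dimensional vector is compressed before any further computation.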

    Empirical Analysis of Wind Power Potential at Multiple Heights for North Dakota Wind Observation Sites

    Wind speed is the most critical factor determining wind power potential and generation. In this paper, multi-year wind speed data from various observation sites in North Dakota, U.S., were analyzed to assess wind power potential. The study first applied probability density functions (PDFs) to characterize the wind speed data and fit distributions at various heights for each observation site. The fitted distributions were then used to estimate wind power potential based on the theoretical cubic relationship between energy potential and wind speed. Due to the complexity of the functions, a numerical integration approach was employed. The major findings of this empirical study are: (1) the Weibull distribution is not always the best function for fitting wind speed data; gamma and lognormal distributions produce better fits on many occasions; (2) for different height levels at one observation site, the best-performing distributions may differ; (3) the estimation accuracies of wind energy potential based on the fitted wind speed distributions range from -4% to 3.8%; (4) the ranking of energy-potential estimation accuracies is not always consistent with that of goodness-of-fit for the wind speed distributions. In addition, a simplified approach that relies only on the hourly mean wind speed to estimate wind power potential was evaluated. Based on the theoretical cubic relationship for wind power estimation, the simplified approach was found to underestimate wind power potential significantly, by 42-54%. As such, this approach will become practical only if this difference is compensated for.
    Key words: Wind speed; Distribution; Goodness-of-fit; Wind power potential; North Dakota
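The workflow above — fit a candidate distribution to wind speeds, then integrate the cubic power law against the fitted PDF — can be sketched as follows. This uses synthetic Weibull-distributed speeds as a stand-in for the North Dakota observations; distribution parameters and the sample are illustrative, not the study's data.

```python
import numpy as np
from scipy import stats, integrate

rng = np.random.default_rng(1)
# Synthetic hourly wind speeds (m/s); the study used multi-year observed data.
v = stats.weibull_min.rvs(c=2.0, scale=7.0, size=5000, random_state=rng)

# Fit a candidate distribution with the location fixed at zero.
shape, loc, scale = stats.weibull_min.fit(v, floc=0)
pdf = lambda x: stats.weibull_min.pdf(x, shape, loc, scale)

# Wind power density (W/m^2): P = 0.5 * rho * E[v^3], with E[v^3] obtained
# by numerical integration over the fitted PDF.
rho = 1.225  # air density, kg/m^3
ev3, _ = integrate.quad(lambda x: x**3 * pdf(x), 0, np.inf)
p_fitted = 0.5 * rho * ev3

# Simplified approach: cube the mean speed. Since E[v^3] > (E[v])^3 for any
# non-degenerate distribution (Jensen's inequality), this underestimates power.
p_simplified = 0.5 * rho * v.mean() ** 3
print(p_simplified < p_fitted)  # True
```

The gap between `p_simplified` and `p_fitted` is the same effect the abstract quantifies as a 42-54% underestimate for the observed data.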

    MOPRD: A multidisciplinary open peer review dataset

    Open peer review is a growing trend in academic publishing. Public access to peer review data can benefit both the academic and publishing communities, and it strongly supports studies on review comment generation and, further, the realization of automated scholarly paper review. However, most existing peer review datasets do not cover the whole peer review process, and their data are not diversified enough, as they are mainly collected from the field of computer science. These two drawbacks of the currently available peer review datasets need to be addressed to unlock more opportunities for related studies. In response, we construct MOPRD, a multidisciplinary open peer review dataset. The dataset consists of paper metadata, multi-version manuscripts, review comments, meta-reviews, authors' rebuttal letters, and editorial decisions. Moreover, we design a modular guided review comment generation method based on MOPRD. Experiments show that our method delivers better performance, as indicated by both automatic metrics and human evaluation. We also explore other potential applications of MOPRD, including meta-review generation, editorial decision prediction, author rebuttal generation, and scientometric analysis. MOPRD is a valuable resource for further studies in peer review-related research and other applications.
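A whole-process record of the kind MOPRD describes could look like the following sketch. The field names here are hypothetical, chosen to mirror the components listed in the abstract; the released dataset's actual schema may differ.

```python
# Hypothetical layout of one MOPRD-style entry (field names are illustrative).
record = {
    "paper_id": "example-0001",
    "metadata": {"title": "...", "discipline": "biology"},
    "manuscripts": ["v1.pdf", "v2.pdf"],                       # multiple versions
    "reviews": [{"round": 1, "reviewer": "R1", "comment": "..."}],
    "meta_review": "...",
    "rebuttal_letters": ["response_round1.txt"],               # authors' rebuttals
    "editorial_decision": "major revision",
}

# The point of the dataset: every stage of the review process is present,
# not just the final review comments.
covers_full_process = all(
    key in record
    for key in ("manuscripts", "reviews", "meta_review",
                "rebuttal_letters", "editorial_decision")
)
print(covers_full_process)  # True
```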

    A new method of reconstructing Galactic three-dimensional structures using ultralong-wavelength radio observations

    The free-free absorption of low-frequency radio waves by thermal electrons in the warm ionized medium of our Galaxy becomes very significant at ≲10 MHz (ultralong wavelengths), and the absorption strength depends on the radio frequency. Upcoming space experiments such as the Discovering the Sky at the Longest wavelengths (DSL) mission and the Farside Array for Radio Science Investigations of the Dark ages and Exoplanets (FARSIDE) will produce high-resolution multi-frequency sky maps at ultralong wavelengths, providing a new window on the Universe. In this paper we propose that the three-dimensional distribution of Galactic electrons can be reconstructed from these ultralong-wavelength multi-frequency maps. This novel and robust reconstruction of the Galactic electron distribution will be a key science case for those space missions. Ultralong-wavelength observations will be a powerful tool for studying the astrophysics relevant to the Galactic electron distribution, for example the impact of supernova explosions on the electron distribution and the interaction between interstellar atoms and ionizing photons escaping from the HII regions around massive stars. An animation shows the reconstruction results using the NE2001 model as an input test; on arXiv it is provided in the ancillary files, and in the paper it is linked to Fig. 5.
    Comment: 16 pages, 8 figures (including one animation; on arXiv see the ancillary files). Accepted for publication in Ap
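The frequency dependence that makes this reconstruction possible can be written down with the standard free-free optical depth approximation. This is the common textbook parametrization, not necessarily the exact form used in the paper:

```latex
% Free-free optical depth of the warm ionized medium (standard approximation),
% where T_e is the electron temperature and EM the emission measure:
\tau_\nu \approx 3.28 \times 10^{-7}
  \left(\frac{T_e}{10^4\,\mathrm{K}}\right)^{-1.35}
  \left(\frac{\nu}{\mathrm{GHz}}\right)^{-2.1}
  \left(\frac{\mathrm{EM}}{\mathrm{pc\,cm^{-6}}}\right)

% Observed brightness temperature toward a background source seen through
% an absorbing slab:
T_b(\nu) = T_{\mathrm{src}}\, e^{-\tau_\nu} + T_e \left(1 - e^{-\tau_\nu}\right)
```

Because the optical depth scales roughly as $\nu^{-2.1}$, maps at several ultralong-wavelength frequencies probe different absorption depths along each line of sight, which is what allows the electron distribution to be inverted in three dimensions.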

    Automated scholarly paper review: Technologies and challenges

    Peer review is a widely accepted mechanism for research evaluation, playing a pivotal role in scholarly publishing. However, criticisms have long been leveled at this mechanism, mostly because of its inefficiency and subjectivity. Recent years have seen the application of artificial intelligence (AI) to assisting the peer review process. Nonetheless, as long as humans are involved, such limitations remain inevitable. In this review paper, we propose the concept and pipeline of automated scholarly paper review (ASPR) and survey the relevant literature and technologies for achieving a full-scale computerized review process. On the basis of this review and discussion, we conclude that there is already corresponding research and implementation at each stage of ASPR. We further examine the challenges ASPR faces with existing technologies. The major difficulties lie in imperfect document parsing and representation, inadequate data, defective human-computer interaction, and flawed deep logical reasoning. Moreover, we discuss possible moral and ethical issues and point out future directions for ASPR. In the foreseeable future, ASPR and peer review will coexist in a reinforcing manner before ASPR is able to fully take over the reviewing workload from humans.

    Adapting Offline Speech Translation Models for Streaming with Future-Aware Distillation and Inference

    A popular approach to streaming speech translation is to employ a single offline model with a wait-k policy to support different latency requirements, which is simpler than training multiple online models with different latency constraints. However, a mismatch problem arises when a model trained on complete utterances is used for streaming inference on partial input. We demonstrate that speech representations extracted at the end of a streaming input differ significantly from those extracted from a complete utterance. To address this issue, we propose a new approach called Future-Aware Streaming Translation (FAST) that adapts an offline ST model to streaming input. FAST includes a Future-Aware Inference (FAI) strategy that incorporates future context through a trainable masked embedding, and a Future-Aware Distillation (FAD) framework that transfers future context from an approximation of the full speech to the streaming input. Our experiments on the MuST-C En→De, En→Es, and En→Fr benchmarks show that FAST achieves better trade-offs between translation quality and latency than strong baselines. Extensive analyses suggest that our methods effectively alleviate the aforementioned mismatch between offline training and online inference.
    Comment: work in progress
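The wait-k policy the abstract builds on has a simple read/write schedule: read k source chunks before emitting the first target token, then alternate one write per read. The helper below is a hypothetical illustration of that schedule, not the paper's implementation.

```python
def wait_k_policy(k, num_source_read, num_target_written):
    """Return the next action under a wait-k schedule:
    READ until the source is k chunks ahead of the target, then WRITE."""
    if num_source_read < num_target_written + k:
        return "READ"
    return "WRITE"

# With k = 3, the model reads 3 chunks before the first target token,
# then alternates write/read.
actions = []
read, written = 0, 0
for _ in range(8):
    action = wait_k_policy(3, read, written)
    actions.append(action)
    if action == "READ":
        read += 1
    else:
        written += 1
print(actions)
# ['READ', 'READ', 'READ', 'WRITE', 'READ', 'WRITE', 'READ', 'WRITE']
```

The mismatch FAST targets occurs at every WRITE step: the encoder sees only the `read` chunks consumed so far, whereas during offline training it always saw the complete utterance.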

    Learning to Compose Representations of Different Encoder Layers towards Improving Compositional Generalization

    Recent studies have shown that sequence-to-sequence (seq2seq) models struggle with compositional generalization (CG), i.e., the ability to systematically generalize to unseen compositions of seen components. There is mounting evidence that one reason hindering CG is that the representation of the encoder's uppermost layer is entangled, i.e., the syntactic and semantic representations of sequences are entangled. However, we consider this previously identified entanglement problem not comprehensive enough, and we further hypothesize that the source key and value representations passed into different decoder layers are also entangled. Starting from this intuition, we propose CompoSition (Compose Syntactic and Semantic Representations), an extension to seq2seq models which learns to compose representations of different encoder layers dynamically for different tasks, motivated by recent findings that the bottom layers of the Transformer encoder contain more syntactic information while the top layers contain more semantic information. Specifically, we introduce a composed layer between the encoder and decoder that combines different encoder layers' representations to generate the specific keys and values passed into different decoder layers. CompoSition achieves competitive results on two comprehensive and realistic benchmarks, which empirically demonstrates the effectiveness of our proposal. Code is available at https://github.com/thinkaboutzero/COMPOSITION.
    Comment: Accepted by Findings of EMNLP 202
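The composed layer described above amounts to a learned, normalized weighted sum over all encoder layers, with a separate weight vector per decoder layer. The sketch below illustrates that combination with NumPy and random stand-ins for the learned logits; it is a minimal illustration of the mechanism, not the repository's code.

```python
import numpy as np

def compose_layers(layer_states, logits):
    """Weighted sum over encoder-layer representations.
    layer_states: (num_enc_layers, seq_len, d_model)
    logits: (num_enc_layers,) learned scores, softmax-normalized here."""
    w = np.exp(logits - logits.max())
    w = w / w.sum()
    # Contract the layer axis: result is (seq_len, d_model).
    return np.tensordot(w, layer_states, axes=(0, 0))

rng = np.random.default_rng(0)
enc_states = rng.normal(size=(6, 10, 16))   # 6 encoder layers, 10 tokens, d_model=16

# One weight vector per decoder layer lets, e.g., lower decoder layers
# draw more on syntax-rich bottom encoder layers.
keys_for_decoder_layer_3 = compose_layers(enc_states, rng.normal(size=6))
print(keys_for_decoder_layer_3.shape)  # (10, 16)
```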