129 research outputs found
Jaeger: A Concatenation-Based Multi-Transformer VQA Model
Document-based Visual Question Answering poses a challenging task between
linguistic sense disambiguation and fine-grained multimodal retrieval. Although
there has been encouraging progress in document-based question answering due to
the utilization of large language and open-world prior models\cite{1}, several
challenges persist, including prolonged response times, extended inference
durations, and imprecision in matching. In order to overcome these challenges,
we propose Jaeger, a concatenation-based multi-transformer VQA model. To derive
question features, we leverage the exceptional capabilities of RoBERTa
large\cite{2} and GPT2-xl\cite{3} as feature extractors. Subsequently, we
subject the outputs from both models to a concatenation process. This operation
allows the model to consider information from diverse sources concurrently,
strengthening its representational capability. By leveraging pre-trained models
for feature extraction, our approach has the potential to amplify the
performance of these models through concatenation. After concatenation, we
apply dimensionality reduction to the output features, reducing the model's
computational cost and inference time. Empirical results demonstrate
that our proposed model achieves competitive performance on Task C of the
PDF-VQA Dataset.
Comment: This paper is the technical research paper of CIKM 2023 DocIU
challenges. The authors received the CIKM 2023 DocIU Winner Award, sponsored
by Google, Microsoft, and the Centre for data-driven geoscienc
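The fusion step described in the Jaeger abstract (concatenate question features from two encoders, then reduce their dimensionality) can be illustrated with a minimal sketch. The abstract does not say which reduction is used, so the fixed random linear projection below is only an assumed stand-in for what would normally be a learned layer, and the function names and the output size of 256 are hypothetical. The input sizes 1024 and 1600 are the hidden sizes of RoBERTa-large and GPT2-xl; the constant vectors are placeholders for real embeddings.

```python
import random


def concat_features(a, b):
    """Concatenate two feature vectors end to end."""
    return list(a) + list(b)


def make_projection(in_dim, out_dim, seed=0):
    """Build an out_dim x in_dim matrix (assumption: a random stand-in
    for a learned dimensionality-reduction layer)."""
    rng = random.Random(seed)
    scale = 1.0 / in_dim ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(weights, vec):
    """Apply the linear map: one dot product per output dimension."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


# Placeholder question embeddings (real ones would come from the encoders).
roberta_feat = [0.1] * 1024   # RoBERTa-large hidden size
gpt2_feat = [0.2] * 1600      # GPT2-xl hidden size

fused = concat_features(roberta_feat, gpt2_feat)    # 2624 values
weights = make_projection(len(fused), 256)          # hypothetical target size
reduced = project(weights, fused)                   # 256 values
```

Concatenation keeps both encoders' information side by side, while the projection shrinks the vector that later layers must process, which is where the savings in computation and inference time would come from.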
A robust and efficient statistical method for genetic association study using case and control samples from multiple cohorts
BACKGROUND: The theoretical basis of genome-wide association studies (GWAS) is statistical inference of linkage disequilibrium (LD) between any polymorphic marker and a putative disease locus. Most methods widely implemented for such analyses are vulnerable to several key demographic factors and deliver a poor statistical power for detecting genuine associations and also a high false positive rate. Here, we present a likelihood-based statistical approach that accounts properly for non-random nature of case-control samples in regard of genotypic distribution at the loci in populations under study and confers flexibility to test for genetic association in presence of different confounding factors such as population structure, non-randomness of samples etc. RESULTS: We implemented this novel method together with several popular methods in the literature of GWAS, to re-analyze recently published Parkinson's disease (PD) case-control samples. The real data analysis and computer simulation show that the new method confers not only significantly improved statistical power for detecting the associations but also robustness to the difficulties stemmed from non-randomly sampling and genetic structures when compared to its rivals. In particular, the new method detected 44 significant SNPs within 25 chromosomal regions of size < 1 Mb but only 6 SNPs in two of these regions were previously detected by the trend test based methods. It discovered two SNPs located 1.18 Mb and 0.18 Mb from the PD candidates, FGF20 and PARK8, without invoking false positive risk. CONCLUSIONS: We developed a novel likelihood-based method which provides adequate estimation of LD and other population model parameters by using case and control samples, the ease in integration of these samples from multiple genetically divergent populations and thus confers statistically robust and powerful analyses of GWAS. 
On basis of simulation studies and analysis of real datasets, we demonstrated significant improvement of the new method over the non-parametric trend test, which is the most popularly implemented in the literature of GWAS
Zero-shot Domain Adaptation for Neural Machine Translation with Retrieved Phrase-level Prompts
Domain adaptation is an important challenge for neural machine translation.
However, the traditional fine-tuning solution requires multiple extra training
runs and incurs a high cost. In this paper, we propose a non-tuning paradigm,
resolving domain adaptation with a prompt-based method. Specifically, we
construct a bilingual phrase-level database and retrieve relevant pairs from it
as a prompt for the input sentences. By utilizing Retrieved Phrase-level
Prompts (RePP), we effectively boost the translation quality. Experiments show
that our method improves domain-specific machine translation by 6.2 BLEU
points and improves translation-constraint accuracy by 11.5% without
additional training
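The retrieve-then-prompt idea in the RePP abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not describe the retrieval method or the prompt layout, so the substring lookup, the `" ||| "` separator, the `src = tgt` pair format, and all function names are hypothetical, and the tiny English-German phrase table is made-up example data.

```python
def retrieve_phrase_pairs(sentence, phrase_db, max_pairs=3):
    """Return (source, target) pairs whose source phrase occurs in the
    sentence. Assumption: plain substring matching stands in for the
    retrieval step."""
    lowered = sentence.lower()
    hits = [(src, tgt) for src, tgt in phrase_db.items()
            if src.lower() in lowered]
    # Prefer longer source phrases, which tend to be more specific.
    hits.sort(key=lambda pair: len(pair[0]), reverse=True)
    return hits[:max_pairs]


def build_prompted_input(sentence, pairs, sep=" ||| "):
    """Prepend the retrieved pairs to the sentence as a prompt
    (hypothetical layout)."""
    if not pairs:
        return sentence
    prompt = " ; ".join(f"{src} = {tgt}" for src, tgt in pairs)
    return f"{prompt}{sep}{sentence}"


# Toy bilingual phrase-level database (illustrative data only).
phrase_db = {
    "blood pressure": "Blutdruck",
    "side effects": "Nebenwirkungen",
    "heart": "Herz",
}

sentence = "The drug lowers blood pressure but has side effects."
pairs = retrieve_phrase_pairs(sentence, phrase_db)
prompted = build_prompted_input(sentence, pairs)
```

Because the domain knowledge enters only through the prompt, the translation model itself needs no extra training; changing domains means swapping the phrase database.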
Development of DNDC-BC model to estimate greenhouse gas emissions from rice paddy fields under combination of biochar and controlled irrigation management.
Acknowledgments: This work was supported by the National Natural Science Foundation of China (51879076), SuperG (Nr: 774124; funded under EU Horizon 2020 programme), the Fundamental Research Funds for the Central Universities (B220203009), the Postgraduate Research & Practice Program of Jiangsu Province (KYCX22_0669), the Water Conservancy Science and Technology Project of Jiangxi Province (202124ZDKT09). Thanks to the late Professor Changsheng Li who provided the source code of DNDC and corresponding support. We thank the China Scholarship Council (CSC) for providing a scholarship to Zewei Jiang. Peer reviewed. Publisher PD
Dual-Segment Three-Phase PMSM With Dual Inverters for Leakage Current and Common-Mode EMI Reduction
In a motor drive system, the inverter working in discrete and impulse states generates a common-mode voltage (CMV) at the terminal of the stator winding neutral point. The high-frequency CMV can induce a leakage current and a common-mode (CM) electromagnetic interference (EMI), which are potential threats to personal safety and system stability. The conventional single three-phase inverter is found to be powerless in eliminating the CMV, while the two paralleled inverters can effectively eliminate the CMV theoretically, but the three coupled inductors (CIs) should be added to the motor drive system which reduces the power density and efficiency of the system. A novel method, which associates a specially designed dual-segment three-phase motor with the CMV elimination modulation algorithm, can be utilized to cancel the extra CIs without degrading the function of the leakage current and the CM EMI suppression. The design of the dual-segment three-phase permanent magnet synchronous machine is introduced, with identical back electromotive forces for two groups of windings but with little magnetic coupling between them. Simulation and experimental results are provided to verify the validity of the proposed method in CM-related reduction and CI cancellation. Compared with the zero-CM pulsewidth modulation for paralleled inverters proposed in a previous work, the proposed dual-segment three-phase motor drive can achieve a better power density by removing the CIs
Simulating soil salinity dynamics, cotton yield and evapotranspiration under drip irrigation by ensemble machine learning
We thank the China Scholarship Council (CSC) for providing a scholarship (202206710073) to Zewei Jiang. This work was supported by the Fundamental Research Funds for the Central Universities (B220203009), the Postgraduate Research & Practice Program of Jiangsu Province (KYCX22_0669), the Water Conservancy Science and Technology Project of Jiangxi Province (201921ZDKT06, 202124ZDKT09), the National Natural Science Foundation of China (51879076), the Fundamental Research Funds for the Central Universities (B210204016), Science & Technology Specific Projects in Agricultural High-tech Industrial Demonstration Area of the Yellow River Delta, Grant No: 2022SZX01. Peer reviewed. Publisher PD
A highly robust and optimized sequence-based approach for genetic polymorphism discovery and genotyping in large plant populations
KEY MESSAGE: This optimized approach provides both a computational tool and a library construction protocol, which can maximize the number of genomic sequence reads that uniformly cover a plant genome and minimize the number of sequence reads representing chloroplast DNA and rRNA genes. One can implement the developed computational tool to feasibly design their own RAD-seq experiment to achieve expected coverage of sequence variant markers for large plant populations using information of the genome sequence and ideally, though not necessarily, information of the sequence polymorphism distribution in the genome. ABSTRACT: Advent of the next generation sequencing techniques motivates recent interest in developing sequence-based identification and genotyping of genome-wide genetic variants in large populations, with RAD-seq being a typical example. Without taking proper account for the fact that chloroplast and rRNA genes may occupy up to 60% of the resulting sequence reads, the current RAD-seq design could be very inefficient for plant and crop species. We presented here a generic computational tool to optimize RAD-seq design in any plant species and experimentally tested the optimized design by implementing it to screen for and genotype sequence variants in four plant populations of diploid and autotetraploid Arabidopsis and potato Solanum tuberosum. Sequence data from the optimized RAD-seq experiments shows that the undesirable chloroplast and rRNA contributed sequence reads can be controlled at 3-10%. Additionally, the optimized RAD-seq method enables pre-design of the required uniformity and density in coverage of the high quality sequence polymorphic markers over the genome of interest and genotyping of large plant or crop populations at a competitive cost in comparison to other mainstream rivals in the literature. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s00122-016-2736-9) contains supplementary material, which is available to authorized users
Methods for evaluating gene expression from Affymetrix microarray datasets
BACKGROUND: Affymetrix high density oligonucleotide expression arrays are widely used across all fields of biological research for measuring genome-wide gene expression. An important step in processing oligonucleotide microarray data is to produce a single value for the gene expression level of an RNA transcript using one of a growing number of statistical methods. The challenge for the researcher is to decide on the most appropriate method to use to address a specific biological question with a given dataset. Although several research efforts have focused on assessing performance of a few methods in evaluating gene expression from RNA hybridization experiments with different datasets, the relative merits of the methods currently available in the literature for evaluating genome-wide gene expression from Affymetrix microarray data collected from real biological experiments remain actively debated. RESULTS: The present study reports a comprehensive survey of the performance of all seven commonly used methods in evaluating genome-wide gene expression from a well-designed experiment using Affymetrix microarrays. The experiment profiled eight genetically divergent barley cultivars each with three biological replicates. The dataset so obtained confers a balanced and idealized structure for the present analysis. The methods were evaluated on their sensitivity for detecting differentially expressed genes, reproducibility of expression values across replicates, and consistency in calling differentially expressed genes. The number of genes detected as differentially expressed among methods differed by a factor of two or more at a given false discovery rate (FDR) level. Moreover, we propose the use of genes containing single feature polymorphisms (SFPs) as an empirical test for comparison among methods for the ability to detect true differential gene expression on the basis that SFPs largely correspond to cis-acting expression regulators. The PDNN method demonstrated superiority over all other methods in every comparison, whilst the default Affymetrix MAS5.0 method was clearly inferior. CONCLUSION: A comprehensive assessment of seven commonly used data extraction methods based on an extensive barley Affymetrix gene expression dataset has shown that the PDNN method has superior performance for the detection of differentially expressed genes.