Quasiphoton at the Subcycle Level in Strong-Field Ionization
A photon is an energy quantum of light, a concept that does not exist at the
sub-optical-cycle level. Exploiting the dynamical rotational symmetry of
circularly or elliptically polarized light pulses, however, we demonstrate the
existence of quasiphotons down to the subcycle level. We illustrate the concept
of quasiphotons in strong-field ionization through the correlated spectrum of
angular momentum and energy (SAME) of photoelectrons, both at the tunnel exit
and in the asymptotic region. Moreover, we propose a protocol based on electron
vortices to directly visualize the existence of quasiphotons. Our work paves
the pathway towards a deeper understanding of fundamental light-matter
interactions with photonic characteristics on the subcycle scale.
Comment: 6 pages, 4 figures
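As a rough guide to what the SAME is expected to show, standard above-threshold-ionization bookkeeping suffices, under the hedged assumption that each absorbed circular photon carries energy ℏω and one unit of angular momentum; this is not an equation quoted from the paper:

```latex
% Hedged sketch, not taken from the paper: energy and angular-momentum
% bookkeeping for absorbing n photons of a circularly polarized field,
% with I_p the ionization potential and U_p the ponderomotive energy.
\[
  E_n = n\hbar\omega - I_p - U_p , \qquad m_n = m_0 + n ,
\]
% so photon-like (quasiphoton) behaviour should trace the line
% E \approx m\hbar\omega + \mathrm{const} in the angular-momentum--energy plane.
```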
M2C: Towards Automatic Multimodal Manga Complement
Multimodal manga analysis focuses on enhancing manga understanding with
visual and textual features, which has attracted considerable attention from
both natural language processing and computer vision communities. Currently,
most comics are hand-drawn and prone to problems such as missing pages, text
contamination, and aging, resulting in missing comic text content and seriously
hindering human comprehension. However, the Multimodal Manga Complement (M2C)
task, which aims to handle these issues by providing a shared semantic space
for vision and language understanding, has not yet been investigated. To this
end, we propose the Multimodal Manga Complement task and establish a new M2C
benchmark dataset covering two languages. We first design a manga augmentation
method called MCoT to mine event knowledge in
comics with large language models. Then, an effective baseline FVP-M
using fine-grained visual prompts is proposed to support manga complement.
Extensive experimental results show the effectiveness of the FVP-M method for
Multimodal Manga Complement.
Comment: EMNLP2023. arXiv admin note: text overlap with arXiv:2210.1546
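As a rough illustration of what a fine-grained visual prompt can look like (all names and shapes here are hypothetical; this is not the FVP-M implementation): region features cropped from a manga panel are projected into the text decoder's embedding space and prepended as prefix tokens.

```python
# Illustrative sketch only, not the FVP-M code: panel-region features become
# prefix tokens for the text model that fills in the missing text.
import torch
import torch.nn as nn

class FineGrainedVisualPrompt(nn.Module):
    """Project cropped panel-region features and prepend them as prefix tokens."""
    def __init__(self, vision_dim=768, text_dim=1024):
        super().__init__()
        self.proj = nn.Linear(vision_dim, text_dim)  # map region feats to text space

    def forward(self, region_feats, text_embeds):
        # region_feats: (batch, n_regions, vision_dim) from any vision encoder
        # text_embeds:  (batch, n_tokens, text_dim) for the text to complete
        prompts = self.proj(region_feats)                # (batch, n_regions, text_dim)
        return torch.cat([prompts, text_embeds], dim=1)  # visual prefix prompt
```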
Uneven-Layered Coding Metamaterial Tile for Ultrawideband RCS Reduction and Diffuse Scattering
In this paper, a novel uneven-layered coding metamaterial tile is proposed for ultra-wideband radar cross section (RCS) reduction and diffuse scattering. The metamaterial tile is composed of two kinds of square-ring unit cells with different layer thicknesses. The reflection phase difference of 180° (±37°) between the two unit cells covers an ultra-wide frequency range. Due to the phase cancellation between the two unit cells, the metamaterial tile has a scattering pattern of four strong lobes deviating from the normal direction. The metamaterial tile and its 90°-rotated counterpart can be encoded as the ‘0’ and ‘1’ elements to cover an object, and a diffuse scattering pattern can be realized by optimizing the phase distribution, reducing the monostatic and bistatic RCSs simultaneously. The metamaterial tile achieves −10 dB RCS reduction from 6.2 GHz to 25.7 GHz, with a ratio bandwidth of 4.15:1, at normal incidence. The measured and simulated results are in good agreement and validate that the proposed uneven-layered coding metamaterial tile can greatly expand the bandwidth for RCS reduction and diffuse scattering.
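The quoted ±37° window and 4.15:1 ratio follow from the standard two-element phase-cancellation model; a quick numerical check (a sketch only, not the authors' full-wave simulation):

```python
# Two equal-amplitude reflections with phase difference dphi add to
# |(1 + exp(j*dphi))/2| in the specular direction, so a 10 dB RCS
# reduction requires |cos(dphi/2)| <= 10**(-0.5), i.e. dphi within
# 180 deg +/- 37 deg.
import numpy as np

def rcs_reduction_db(dphi_deg):
    field = abs((1 + np.exp(1j * np.radians(dphi_deg))) / 2)
    return 20 * np.log10(field)

print(rcs_reduction_db(143))   # ~ -10.0 dB: edge of the +/-37 deg window
print(rcs_reduction_db(120))   # ~ -6.0 dB: outside the window, not enough
print(f"{25.7 / 6.2:.2f}:1")   # ~ 4.15:1, the quoted ratio bandwidth
```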
Metasurface based on uneven layered fractal elements for ultra-wideband RCS reduction
A novel metasurface based on uneven layered fractal elements is designed and fabricated for ultra-wideband radar cross section (RCS) reduction in this paper. The proposed metasurface consists of two fractal subwavelength elements with different layer thicknesses. The reflection phase difference of 180° (±37°) between the two unit cells covers an ultra-wide frequency range. Ultra-wideband RCS reduction results from the phase cancellation between two local waves produced by these two unit cells. The diffuse scattering of electromagnetic (EM) waves is caused by the randomized phase distribution, leading to low monostatic and bistatic RCSs simultaneously. This metasurface can achieve −10 dB RCS reduction in an ultra-wide frequency range from 6.6 to 23.9 GHz with a ratio bandwidth (fH/fL) of 3.62:1 under normal incidence for both x- and y-polarized waves. Both simulation and measurement results are consistent, verifying the excellent RCS reduction performance of the proposed metasurface.
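To see why a randomized 0°/180° phase layout lowers both monostatic and bistatic RCS, the textbook array-factor picture is enough: the far-field pattern is approximately the 2-D Fourier transform of the aperture phase map. An illustrative sketch (grid size and padding are arbitrary, not the paper's solver):

```python
# A uniform plate dumps its power into one specular lobe; a random 0/1
# (0 rad / pi rad) coding map spreads it over many weak lobes.
import numpy as np

rng = np.random.default_rng(0)
n = 16                                   # 16 x 16 tiles, layout is illustrative
coding = rng.integers(0, 2, (n, n))      # random '0'/'1' tile map
uniform = np.zeros((n, n))               # reference uniform reflector

def peak_lobe(phase_bits):
    field = np.exp(1j * np.pi * phase_bits)           # 0 -> 0 rad, 1 -> pi rad
    pattern = np.abs(np.fft.fft2(field, s=(256, 256)))**2
    return pattern.max() / pattern.sum()              # power fraction in peak lobe

print(peak_lobe(uniform))   # strong specular peak
print(peak_lobe(coding))    # randomized phases -> diffuse, much lower peak
```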
GripRank: Bridging the Gap between Retrieval and Generation via the Generative Knowledge Improved Passage Ranking
Retrieval-enhanced text generation, which aims to leverage passages retrieved
from a large passage corpus for delivering a proper answer given the input
query, has shown remarkable progress on knowledge-intensive language tasks such
as open-domain question answering and knowledge-enhanced dialogue generation.
However, the retrieved passages are not ideal for guiding answer generation
because of the discrepancy between retrieval and generation, i.e., the
candidate passages are all treated equally during the retrieval procedure
without considering their potential to generate the proper answers. This
discrepancy makes a passage retriever deliver a sub-optimal collection of
candidate passages to generate answers. In this paper, we propose the
GeneRative Knowledge Improved Passage Ranking (GripRank) approach, addressing
the above challenge by distilling knowledge from a generative passage estimator
(GPE) to a passage ranker, where the GPE is a generative language model used to
measure how likely the candidate passages can generate the proper answer. We
realize the distillation procedure by teaching the passage ranker, via
learning to rank, to reproduce the passage ordering given by the GPE.
Furthermore, we improve the distillation quality by devising a curriculum
knowledge distillation mechanism, which allows the knowledge provided by the
GPE to be progressively distilled to the ranker
through an easy-to-hard curriculum, enabling the passage ranker to correctly
recognize the provenance of the answer from many plausible candidates. We
conduct extensive experiments on four datasets across three knowledge-intensive
language tasks. Experimental results show advantages over the state-of-the-art
methods for both passage ranking and answer generation on the KILT benchmark.
Comment: 11 pages, 4 figures
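A minimal sketch of the listwise distillation step as described (shapes and the curriculum schedule are guesses, not the released code): the ranker is trained to maximize the Plackett–Luce likelihood of the passage ordering induced by the GPE's answer log-likelihoods.

```python
import torch

def listmle_loss(ranker_scores, gpe_scores):
    """One query: teach the ranker the passage ordering induced by the GPE.

    ranker_scores, gpe_scores: (n_passages,) tensors.
    """
    order = torch.argsort(gpe_scores, descending=True)   # the GPE's ranking
    s = ranker_scores[order]
    # Negative Plackett-Luce log-likelihood of emitting exactly that ordering:
    # sum_i [ logsumexp(s_i..s_n) - s_i ], suffix logsumexp via flipped cumsum.
    return torch.sum(torch.logcumsumexp(s.flip(0), dim=0).flip(0) - s)

def curriculum_listmle(ranker_scores, gpe_scores, k):
    """Hedged guess at the easy-to-hard curriculum: rank only the k most
    confidently ordered passages first and grow k as training proceeds."""
    top = torch.argsort(gpe_scores, descending=True)[:k]
    return listmle_loss(ranker_scores[top], gpe_scores[top])
```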
HanoiT: Enhancing Context-aware Translation via Selective Context
Context-aware neural machine translation aims to use the document-level
context to improve translation quality. However, not all words in the context
are helpful. Irrelevant or trivial words may introduce noise and distract
the model from learning the relationship between the current sentence and the
auxiliary context. To mitigate this problem, we propose a novel end-to-end
encoder-decoder model with a layer-wise selection mechanism to sift and refine
the long document context. To verify the effectiveness of our method, extensive
experiments and extra quantitative analysis are conducted on four
document-level machine translation benchmarks. The experimental results
demonstrate that our model significantly outperforms previous models on all
datasets via the soft selection mechanism.
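A minimal sketch of a layer-wise soft selection mechanism in the spirit described (the module layout is hypothetical, not the HanoiT code): each layer scores every context token and down-weights the irrelevant ones before the next layer sees them, while current-sentence tokens are never filtered.

```python
import torch
import torch.nn as nn

class SoftSelectLayer(nn.Module):
    def __init__(self, d_model=512, nhead=8):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.gate = nn.Linear(d_model, 1)   # per-token relevance score

    def forward(self, x, context_mask):
        # context_mask: (batch, seq), 1 on document-context tokens,
        # 0 on current-sentence tokens (which are always kept).
        h = self.layer(x)
        keep = torch.sigmoid(self.gate(h)).squeeze(-1)   # (batch, seq) in (0, 1)
        keep = torch.where(context_mask.bool(), keep, torch.ones_like(keep))
        return h * keep.unsqueeze(-1)                    # soft sift of the context
```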
SAM3D: Zero-Shot 3D Object Detection via Segment Anything Model
With the development of large language models, many remarkable linguistic
systems like ChatGPT have thrived and achieved astonishing success on many
tasks, showing the incredible power of foundation models. In the spirit of
unleashing the capability of foundation models on vision tasks, the Segment
Anything Model (SAM), a vision foundation model for image segmentation, has
been proposed recently and presents strong zero-shot ability on many downstream
2D tasks. However, whether SAM can be adapted to 3D vision tasks has yet to be
explored, especially for 3D object detection. Motivated by this, we explore
adapting the zero-shot ability of SAM to 3D object detection in this paper. We
propose a SAM-powered BEV processing pipeline to detect objects and get
promising results on the large-scale Waymo open dataset. As an early attempt,
our method takes a step toward 3D object detection with vision foundation
models and presents the opportunity to unleash their power on 3D vision tasks.
The code is released at https://github.com/DYZhang09/SAM3D.
Comment: Technical Report.
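A conceptual sketch of such a SAM-powered BEV pipeline (grid parameters, the checkpoint path, and the box-lifting step are illustrative, not the released SAM3D code): rasterize the LiDAR sweep into a BEV intensity image, segment it with SAM's automatic mask generator, then lift each 2-D mask to a 3-D box using the heights of the points it covers.

```python
import numpy as np
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

def lidar_to_bev(points, extent=50.0, res=0.1):
    """points: (N, 4) array of x, y, z, intensity -> uint8 RGB BEV image."""
    size = int(2 * extent / res)
    img = np.zeros((size, size), dtype=np.float32)
    u = ((points[:, 0] + extent) / res).astype(int).clip(0, size - 1)
    v = ((points[:, 1] + extent) / res).astype(int).clip(0, size - 1)
    np.maximum.at(img, (v, u), points[:, 3])          # max intensity per cell
    img = (255 * img / max(img.max(), 1e-6)).astype(np.uint8)
    return np.repeat(img[..., None], 3, axis=-1)      # SAM expects HxWx3 uint8

points = np.random.default_rng(0).uniform(            # synthetic stand-in sweep
    [-50, -50, -2, 0], [50, 50, 2, 1], (4096, 4))
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # illustrative path
masks = SamAutomaticMaskGenerator(sam).generate(lidar_to_bev(points))
# Each mask's BEV bbox plus the z-range of the points inside it -> 3-D box proposal.
```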
m3P: Towards Multimodal Multilingual Translation with Multimodal Prompt
Multilingual translation supports multiple translation directions by
projecting all languages in a shared space, but the translation quality is
undermined by the difference between languages in the text-only modality,
especially when the number of languages is large. To bridge this gap, we
introduce visual context as the universal language-independent representation
to facilitate multilingual translation. In this paper, we propose m3P, a
framework that leverages multimodal prompts to guide Multimodal Multilingual
neural Machine Translation, aligning the representations of different
languages with the same meaning and generating the conditional vision-language
memory for translation. We construct a multilingual multimodal instruction
dataset (InstrMulti102) to support 102 languages. Our method aims to minimize
the representation distance of different languages by regarding the image as a
central language. Experimental results show that m3P outperforms previous
text-only baselines and multilingual multimodal methods by a large margin.
Furthermore, the probing experiments validate the effectiveness of our method
in enhancing translation under the low-resource and massively multilingual
scenario.
Comment: COLING 202
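One way to read "regarding the image as a central language" is a symmetric contrastive objective that pulls every language's sentence representation toward the paired image representation; an illustrative sketch, not the m3P training code:

```python
import torch
import torch.nn.functional as F

def center_on_image(image_emb, text_embs, tau=0.07):
    """image_emb: (batch, d); text_embs: list of (batch, d), one per language."""
    img = F.normalize(image_emb, dim=-1)
    loss = 0.0
    for txt in text_embs:
        txt = F.normalize(txt, dim=-1)
        logits = img @ txt.t() / tau                 # (batch, batch) similarities
        labels = torch.arange(img.size(0), device=img.device)
        # symmetric InfoNCE: image matches its own caption in every language
        loss += F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)
    return loss / (2 * len(text_embs))
```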
MT4CrossOIE: Multi-stage Tuning for Cross-lingual Open Information Extraction
Cross-lingual open information extraction aims to extract structured
information from raw text across multiple languages. Previous work uses a
shared cross-lingual pre-trained model to handle the different languages but
underuses the potential of language-specific representations. In this paper,
we propose an effective multi-stage tuning framework called MT4CrossOIE,
designed for enhancing cross-lingual open information extraction by injecting
language-specific knowledge into the shared model. Specifically, the
cross-lingual pre-trained model is first tuned in a shared semantic space
(e.g., the embedding matrix) while the encoder is kept fixed, and the remaining
components are optimized in a second stage. After sufficient training, we freeze the pre-trained
model and tune the multiple extra low-rank language-specific modules using
mixture-of-LoRAs for model-based cross-lingual transfer. In addition, we
leverage two-stage prompting to encourage the large language model (LLM) to
annotate the multi-lingual raw data for data-based cross-lingual transfer. The
model is trained with multi-lingual objectives on our proposed dataset
OpenIE4++ by combining the model-based and data-based transfer techniques.
Experimental results on various benchmarks emphasize the importance of
aggregating multiple plug-in-and-play language-specific modules and demonstrate
the effectiveness of MT4CrossOIE in cross-lingual
OIE (code: https://github.com/CSJianYang/Multilingual-Multimodal-NLP).
Comment: 10 pages
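A minimal mixture-of-LoRAs sketch in the spirit described (routing and shapes are illustrative, not the MT4CrossOIE code): the frozen pre-trained projection is augmented with one low-rank adapter per language, mixed by a soft router.

```python
import torch
import torch.nn as nn

class MoLoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, n_langs=3, rank=8, alpha=16):
        super().__init__()
        self.base = base.requires_grad_(False)           # frozen pre-trained weight
        d_in, d_out = base.in_features, base.out_features
        self.A = nn.Parameter(torch.randn(n_langs, rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(n_langs, d_out, rank))
        self.router = nn.Linear(d_in, n_langs)           # soft language routing
        self.scale = alpha / rank

    def forward(self, x):                                # x: (batch, d_in)
        w = torch.softmax(self.router(x), dim=-1)        # (batch, n_langs)
        delta = torch.einsum("eri,bi->ber", self.A, x)   # per-language down-proj
        delta = torch.einsum("eor,ber->beo", self.B, delta)  # up-proj
        return self.base(x) + self.scale * torch.einsum("be,beo->bo", w, delta)
```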
LogLG: Weakly Supervised Log Anomaly Detection via Log-Event Graph Construction
Fully supervised log anomaly detection methods suffer from the heavy burden of
annotating massive unlabeled log data. Recently, many semi-supervised methods
have been proposed to reduce annotation costs with the help of parsed
templates. However, these methods consider each keyword independently, which
disregards the correlation between keywords and the contextual relationships
among log sequences. In this paper, we propose a novel weakly supervised log
anomaly detection framework, named LogLG, to explore the semantic connections
among keywords from sequences. Specifically, we design an end-to-end iterative
process, where the keywords of unlabeled logs are first extracted to construct
a log-event graph. Then, we build a subgraph annotator to generate pseudo
labels for unlabeled log sequences. To improve the annotation quality, we
adopt a self-supervised task to pre-train a subgraph annotator. After that, a
detection model is trained with the generated pseudo labels. Conditioned on the
classification results, we re-extract the keywords from the log sequences and
update the log-event graph for the next iteration. Experiments on five
benchmarks validate the effectiveness of LogLG for detecting anomalies on
unlabeled log data and demonstrate that LogLG, as the state-of-the-art weakly
supervised method, achieves significant performance improvements compared to
existing methods.
Comment: 12 pages
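A toy rendering of the iterative loop described above (every component is a trivial stand-in: the graph is keyword co-occurrence counts and the annotator is a seed-keyword stub, whereas LogLG uses a self-supervised, pre-trained subgraph annotator):

```python
from collections import Counter
from itertools import combinations

def build_log_event_graph(sequences):
    """Stand-in log-event graph: keyword co-occurrence counts per sequence."""
    edges = Counter()
    for seq in sequences:
        edges.update(combinations(sorted(set(seq.lower().split())), 2))
    return edges

def annotate(seq, seeds):
    """Stub subgraph annotator: pseudo-label 1 if any seed keyword appears."""
    return int(bool(seeds & set(seq.lower().split())))

logs = ["disk read ok", "disk write ok", "disk error timeout", "read ok"]
graph = build_log_event_graph(logs)          # rebuilt each iteration in LogLG
pseudo = [annotate(s, {"error", "timeout"}) for s in logs]
print(pseudo)  # [0, 0, 1, 0]: pseudo labels train the detector, whose
               # predictions drive keyword re-extraction in the next round
```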