5,944 research outputs found
IRGen: Generative Modeling for Image Retrieval
While generative modeling has become ubiquitous in natural language processing and computer vision, its application to image retrieval remains unexplored. In this paper, we recast image retrieval as a form of generative modeling by employing a sequence-to-sequence model, in keeping with the current trend toward unified models. Our framework, IRGen, is a unified model that enables end-to-end differentiable search, achieving superior performance thanks to direct optimization. In developing IRGen, we tackle the key technical challenge of converting an image into a short sequence of semantic units that enables efficient and effective retrieval. Experiments demonstrate that our model yields significant improvements on three commonly used benchmarks, for example 22.9% higher precision@10 than the best baseline method on the In-shop dataset, with a comparable recall@10 score.
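To make the retrieval-as-generation idea concrete, here is a minimal sketch of the general pattern (not IRGen's actual architecture): an image feature is encoded into a decoder state, and a short sequence of discrete semantic IDs is generated autoregressively, so retrieval reduces to a lookup of the generated identifier. The GRU decoder, layer sizes, and identifier length are illustrative assumptions.

```python
# A toy sketch of retrieval-as-generation, assuming (not from the paper)
# a seq2seq decoder that maps an image feature to discrete "semantic IDs".
import torch
import torch.nn as nn

class ToyIRGen(nn.Module):
    """Toy seq2seq retriever: image feature -> short sequence of semantic IDs."""
    def __init__(self, feat_dim=512, vocab=256, id_len=4, hidden=512):
        super().__init__()
        self.encode = nn.Linear(feat_dim, hidden)     # image feature -> decoder state
        self.embed = nn.Embedding(vocab + 1, hidden)  # +1 for the BOS token
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)          # predicts the next semantic ID
        self.id_len, self.vocab = id_len, vocab

    @torch.no_grad()
    def generate(self, img_feat):
        """Greedy decoding of a length-id_len identifier for one image."""
        h = self.encode(img_feat).unsqueeze(0)        # initial hidden state (1, 1, hidden)
        tok = torch.full((1, 1), self.vocab, dtype=torch.long)  # BOS
        ids = []
        for _ in range(self.id_len):
            out, h = self.rnn(self.embed(tok), h)
            tok = self.head(out[:, -1]).argmax(-1, keepdim=True)
            ids.append(tok.item())
        return ids  # retrieval = lookup of this identifier in the index

model = ToyIRGen()
print(model.generate(torch.randn(1, 512)))            # e.g. [17, 203, 5, 99]
```

In training, the target ID sequences would come from quantizing image semantics, and the model would be optimized with a standard cross-entropy loss over the sequence, which is what makes the whole search pipeline differentiable end to end.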
Advances and Challenges of Multi-task Learning Method in Recommender System: A Survey
Multi-task learning has been widely applied in computer vision, natural language processing, and other fields, achieving strong performance. In recent years a large body of work on multi-task learning for recommender systems has appeared, but no prior literature summarizes it. To bridge this gap, we provide a systematic literature survey of multi-task recommender systems, aiming to help researchers and practitioners quickly understand the current progress in this direction. In this survey, we first introduce the background and motivation of multi-task learning-based recommender systems. We then provide a taxonomy of multi-task learning-based recommendation methods according to the stages at which multi-task learning techniques are applied: task relationship discovery, model architecture, and optimization strategy. Finally, we discuss applications and promising future directions in this area.
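As a concrete reference point for the "model architecture" stage of this taxonomy, below is a minimal shared-bottom sketch, the simplest multi-task recommendation architecture: two task heads over one shared representation, trained with a weighted sum of per-task losses (the simplest "optimization strategy"). The feature sizes, task names, and loss weight are illustrative assumptions, not from the survey.

```python
# A minimal shared-bottom multi-task recommender sketch; dimensions and
# the click/conversion task pair are illustrative assumptions.
import torch
import torch.nn as nn

class SharedBottomRec(nn.Module):
    def __init__(self, n_feats=64, hidden=32):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(n_feats, hidden), nn.ReLU())
        self.click_head = nn.Linear(hidden, 1)   # task 1: click prediction
        self.buy_head = nn.Linear(hidden, 1)     # task 2: conversion prediction

    def forward(self, x):
        z = self.shared(x)                        # representation shared across tasks
        return torch.sigmoid(self.click_head(z)), torch.sigmoid(self.buy_head(z))

model = SharedBottomRec()
x = torch.randn(8, 64)                            # a batch of user-item features
p_click, p_buy = model(x)
# Optimization strategy: combine per-task losses, here a simple weighted sum.
loss = nn.functional.binary_cross_entropy(p_click, torch.rand(8, 1)) \
     + 0.5 * nn.functional.binary_cross_entropy(p_buy, torch.rand(8, 1))
```

More elaborate architectures in the taxonomy (gating or expert-based models) replace the shared bottom with task-specific mixtures, and more elaborate optimization strategies learn the loss weights instead of fixing them.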
Using Visual Cropping to Enhance Fine-Detail Question Answering of BLIP-Family Models
Visual Question Answering is a challenging task, as it requires seamless
interaction between perceptual, linguistic, and background knowledge systems.
While the recent progress of vision and language models like BLIP has led to improved performance on this task, we still lack a clear understanding of how well such models perform on different kinds of questions and reasoning types.
As our initial analysis of BLIP-family models revealed difficulty with
answering fine-detail questions, we investigate the following question: Can
visual cropping be employed to improve the performance of state-of-the-art
visual question answering models on fine-detail questions? Given the recent
success of the BLIP-family models, we study a zero-shot and a fine-tuned BLIP
model. We define three controlled subsets of the popular VQA-v2 benchmark to
measure whether cropping can help model performance. Besides human cropping, we devise two automatic cropping strategies, one based on CLIP multi-modal embeddings and one on the gradients of a BLIP visual QA model. Our experiments demonstrate that the
performance of BLIP model variants can be significantly improved through human
cropping, and automatic cropping methods can produce comparable benefits. A
deeper dive into our findings indicates that the performance enhancement is
more pronounced in zero-shot models than in fine-tuned models and more salient
with smaller bounding boxes than larger ones. We perform case studies to
connect quantitative differences with qualitative observations across question
types and datasets. Finally, we see that the cropping enhancement is robust, as
we gain an improvement of 4.59% (absolute) in the general VQA-random task by
simply inputting a concatenation of the original and gradient-based cropped
images. We make our code available to facilitate further innovation on visual
cropping methods for question answering.
Comment: 16 pages, 5 figures, 7 tables
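As an illustration of what a CLIP-embedding-based cropping strategy can look like, here is a hedged sketch (the paper's exact procedure may differ): sliding-window crops are scored by CLIP image-text similarity against the question, and the best-scoring crop is kept. The window size and stride are illustrative choices.

```python
# A sketch of CLIP-similarity-based question-guided cropping; window size,
# stride, and the specific CLIP checkpoint are illustrative assumptions.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def best_crop(image: Image.Image, question: str, win=224, stride=112):
    crops, boxes = [], []
    for top in range(0, max(1, image.height - win + 1), stride):
        for left in range(0, max(1, image.width - win + 1), stride):
            boxes.append((left, top, left + win, top + win))
            crops.append(image.crop(boxes[-1]))
    inputs = processor(text=[question], images=crops,
                       return_tensors="pt", padding=True)
    scores = model(**inputs).logits_per_image.squeeze(1)  # one score per crop
    best = scores.argmax().item()
    return crops[best], boxes[best]
```

In the spirit of the abstract's final result, the selected crop would then be fed to the VQA model alongside the original image, for example by concatenating the two.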
Specialized translation at work for a small, expanding company: my experience internationalizing Bioretics© S.r.l. into Chinese
Global markets are currently immersed in two all-encompassing and unstoppable processes: internationalization and globalization. While the former pushes companies to look beyond the borders of their country of origin to forge relationships with foreign trading partners, the latter fosters standardization across countries by reducing spatiotemporal distances and breaking down geographical, political, economic, and socio-cultural barriers. In recent decades, another domain has emerged to propel these unifying drives: Artificial Intelligence, with its advanced technologies aimed at reproducing human cognitive abilities in machines. The “Language Toolkit – Le lingue straniere al servizio dell’internazionalizzazione dell’impresa” project, promoted by the Department of Interpreting and Translation (Forlì Campus) in collaboration with the Romagna Chamber of Commerce (Forlì-Cesena and Rimini), seeks to help Italian SMEs make their way into the global market. It is precisely within this project that this dissertation was conceived. Its purpose is to present the translation and localization project, from English into Chinese, of a series of texts produced by Bioretics© S.r.l.: an investor deck, the company website, and part of the installation and use manual of the Aliquis© framework software, the company's flagship product. The dissertation is structured as follows: Chapter 1 presents the project and the company in detail; Chapter 2 outlines the internationalization and globalization processes and the Artificial Intelligence market in both Italy and China; Chapter 3 provides the theoretical foundations for every aspect of specialized translation, including website localization; Chapter 4 describes the resources and tools used to perform the translations; Chapter 5 analyzes the source texts; and Chapter 6 comments on translation strategies and choices.
A textual and visual features-jointly driven hybrid intelligent system for digital physical education teaching quality evaluation
The use of intelligent computing in digital teaching quality evaluation is a practical demand in smart cities. Existing work falls into two categories: textual data-based approaches and visual data-based approaches. Because of the gap between these formats and modalities, integrating the two for digital teaching quality evaluation remains very challenging, even though each reflects distinct knowledge from its own perspective. To bridge this gap, this paper proposes a hybrid intelligent system for digital teaching quality evaluation driven jointly by textual and visual features. Visual features are extracted with a multiscale convolutional neural network that introduces receptive fields of different sizes; textual features serve as auxiliary content for the major visual features and are extracted with a recurrent neural network. Finally, we evaluate the proposed method's practical running performance in simulation experiments on a real-world dataset collected from teaching activities. The results reveal that the proposed hybrid intelligent system improves the efficiency of digital teaching quality evaluation by more than 10%.
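A minimal sketch of the described design might look as follows, with parallel convolution branches of different kernel sizes standing in for the multiscale receptive fields and a GRU for the auxiliary textual branch; all dimensions are illustrative assumptions rather than the paper's configuration.

```python
# A toy version of the textual + visual fusion: multiscale CNN branches for
# the image, an RNN for the text, concatenated into one quality score.
import torch
import torch.nn as nn

class HybridEvaluator(nn.Module):
    def __init__(self, vocab=5000, emb=64, hidden=64):
        super().__init__()
        # Multiscale visual branch: 3x3, 5x5, 7x7 receptive fields in parallel.
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Conv2d(3, 8, k, padding=k // 2), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(1)) for k in (3, 5, 7))
        # Auxiliary textual branch: embedding + GRU over token IDs.
        self.embed = nn.Embedding(vocab, emb)
        self.rnn = nn.GRU(emb, hidden, batch_first=True)
        self.score = nn.Linear(8 * 3 + hidden, 1)

    def forward(self, image, tokens):
        vis = torch.cat([b(image).flatten(1) for b in self.branches], dim=1)
        _, h = self.rnn(self.embed(tokens))          # last hidden state = text summary
        return self.score(torch.cat([vis, h[-1]], dim=1))  # quality estimate

model = HybridEvaluator()
out = model(torch.randn(2, 3, 64, 64), torch.randint(0, 5000, (2, 20)))
```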
QR-CLIP: Introducing Explicit Open-World Knowledge for Location and Time Reasoning
Daily images may convey abstract meanings whose understanding requires memorizing and inferring profound information from them. To encourage such human-like reasoning, in this work we teach machines to predict where and when an image was taken, rather than performing basic tasks like traditional segmentation or classification. Inspired by Horn's QR theory, we design a novel QR-CLIP model consisting of two components: 1) the Quantity module first retrieves broader open-world knowledge as candidate language inputs; 2) the Relevance module then carefully weighs the vision and language cues and infers the location and time. Experiments show QR-CLIP's effectiveness: it outperforms the previous SOTA on location and time reasoning by an average relative lift of about 10% and 130%, respectively. This study lays a technical foundation for location and time reasoning and suggests that effectively introducing open-world knowledge is key to these tasks.
Comment: Technical Report. Github: https://github.com/Shi-Wm/QR-CLI
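To illustrate the two-stage idea, here is a hedged sketch: a hard-coded toy candidate list stands in for the Quantity module's open-world retrieval, and off-the-shelf CLIP image-text similarity stands in for the Relevance module's vision-language scoring. The actual QR-CLIP modules are more involved than this.

```python
# A toy stand-in for QR-CLIP's pipeline: assemble candidate location/time
# phrases, then rank them against the image with plain CLIP similarity.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def rank_candidates(image: Image.Image, candidates: list[str]):
    inputs = processor(text=candidates, images=image,
                       return_tensors="pt", padding=True)
    probs = model(**inputs).logits_per_image.softmax(-1).squeeze(0)
    return sorted(zip(candidates, probs.tolist()), key=lambda p: -p[1])

# "Quantity" step, here a hard-coded toy list instead of open-world retrieval:
candidates = ["a photo taken in Paris in winter",
              "a photo taken in Tokyo in summer"]
# rank_candidates(Image.open("query.jpg"), candidates) -> [(phrase, prob), ...]
```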
2023-2024 Catalog
The 2023-2024 Governors State University Undergraduate and Graduate Catalog is a comprehensive listing of current information regarding:Degree RequirementsCourse OfferingsUndergraduate and Graduate Rules and Regulation
Introduction to Psychology
Introduction to Psychology is a modified version of Psychology 2e - OpenStax
Beam scanning by liquid-crystal biasing in a modified SIW structure
A fixed-frequency beam-scanning 1D antenna based on Liquid Crystals (LCs) is designed for application in 2D scanning with lateral alignment. The 2D array environment requires full decoupling of adjacent 1D antennas, which often conflicts with the LC requirement of DC biasing; the proposed design accommodates both. The LC medium is placed inside a Substrate Integrated Waveguide (SIW), modified to work as a Groove Gap Waveguide with radiating slots etched on the upper broad wall, that radiates as a Leaky-Wave Antenna (LWA). This allows effective application of the DC bias voltage needed for tuning the LCs while keeping the RF field laterally confined, making it possible to place several antennas in parallel and achieve 2D beam scanning. The design is validated by simulations employing the actual properties of a commercial LC medium.
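For intuition, the fixed-frequency scanning mechanism can be summarized with the standard leaky-wave relations (a sketch from textbook LWA theory, not the paper's exact dispersion analysis):

```latex
% Illustrative fixed-frequency scanning relation for a TE10-like guided mode:
\[
  \sin\theta \;\approx\; \frac{\beta(\varepsilon_r)}{k_0},
  \qquad
  \beta(\varepsilon_r) \;=\; \sqrt{\varepsilon_r\,k_0^2 - \left(\frac{\pi}{a}\right)^2},
\]
% where $\theta$ is the main-beam angle from broadside, $k_0$ the free-space
% wavenumber, $a$ the guide width, and $\varepsilon_r$ the LC permittivity,
% tuned by the DC bias between its perpendicular and parallel values.
```

Sweeping the bias voltage thus sweeps the phase constant β, and hence the beam angle θ, at a fixed operating frequency, which is exactly what the DC biasing scheme above enables.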
Knowledge Distillation and Continual Learning for Optimized Deep Neural Networks
Over the past few years, deep learning (DL) has achieved state-of-the-art performance on various human tasks such as speech generation, language translation, image segmentation, and object detection. While traditional machine learning models require hand-crafted features, deep learning algorithms can automatically extract discriminative features and learn complex knowledge from large datasets. This powerful learning ability makes deep learning models attractive to both academia and big corporations.
Despite their popularity, deep learning methods still have two main limitations: large memory consumption and catastrophic forgetting. First, DL algorithms use very deep neural networks (DNNs) with billions of parameters, which have large model sizes and slow inference speeds. This restricts the application of DNNs in resource-constrained devices such as mobile phones and autonomous vehicles. Second, DNNs are known to suffer from catastrophic forgetting: when a model incrementally learns new tasks, its performance on old tasks drops significantly. The ability to accommodate new knowledge while retaining previously learned knowledge is called continual learning. Since the real-world environments in which a model operates are always evolving, a robust neural network needs this continual learning ability to adapt to new changes.
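Since the thesis centers on knowledge distillation as the remedy for the first limitation, a minimal sketch of the standard distillation loss (Hinton et al.'s soft-target formulation, with illustrative temperature and weighting choices) shows the core mechanism; a similar KL term against a frozen copy of the old model is also a common recipe against catastrophic forgetting.

```python
# A sketch of the standard knowledge distillation loss: the student matches
# the teacher's temperature-softened outputs plus the hard labels.
# T and alpha are illustrative hyperparameter choices.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T  # T^2 keeps gradient scale comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Usage: a small student trained against a large frozen teacher's logits.
loss = distillation_loss(torch.randn(8, 10), torch.randn(8, 10),
                         torch.randint(0, 10, (8,)))
```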