Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
Strong Artificial Intelligence (Strong AI) or Artificial General Intelligence
(AGI) with abstract reasoning ability is the goal of next-generation AI. Recent
advancements in Large Language Models (LLMs), along with the emerging field of
Multimodal Large Language Models (MLLMs), have demonstrated impressive
capabilities across a wide range of multimodal tasks and applications.
In particular, various MLLMs, each with distinct model architectures, training
data, and training stages, have been evaluated across a broad range of MLLM
benchmarks. These studies have, to varying degrees, revealed different aspects
of the current capabilities of MLLMs. However, the reasoning abilities of MLLMs
have not been systematically investigated. In this survey, we comprehensively
review the existing evaluation protocols of multimodal reasoning, categorize
and illustrate the frontiers of MLLMs, introduce recent trends in applications
of MLLMs on reasoning-intensive tasks, and finally discuss current practices
and future directions. We believe our survey establishes a solid base and sheds
light on the important topic of multimodal reasoning.
Token-Scaled Logit Distillation for Ternary Weight Generative Language Models
Generative Language Models (GLMs) have shown impressive performance in tasks
such as text generation, understanding, and reasoning. However, the large model
size poses challenges for practical deployment. To solve this problem,
Quantization-Aware Training (QAT) has become increasingly popular. However,
current QAT methods for generative models have resulted in a noticeable loss of
accuracy. To counteract this issue, we propose a novel knowledge distillation
method specifically designed for GLMs. Our method, called token-scaled logit
distillation, prevents overfitting and improves learning from both the teacher
model and the ground truth. This research marks the first evaluation of
ternary weight quantization-aware training of large-scale GLMs, achieving less than
1.0 degradation in perplexity and no loss of accuracy on a reasoning task.
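The abstract does not give the distillation formula. As a rough illustration of the general idea only, the sketch below computes a logit-distillation loss as a per-token KL divergence between the teacher's and the student's output distributions, then combines the tokens with a weighted average. The `token_weights` argument is a hypothetical placeholder for a per-token scale; it is not the token-scaling rule proposed in the paper, and all function names are illustrative.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over one token's vocabulary logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) between two distributions over the same vocabulary."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)


def weighted_logit_distillation(
    teacher_logits: Sequence[Sequence[float]],
    student_logits: Sequence[Sequence[float]],
    token_weights: Optional[Sequence[float]] = None,
    temperature: float = 1.0,
) -> float:
    """Weighted average over tokens of KL(teacher || student).

    token_weights is a HYPOTHETICAL per-token scale, not the paper's rule.
    With uniform weights this is plain per-token logit distillation.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must cover the same tokens")
    n = len(teacher_logits)
    weights = list(token_weights) if token_weights is not None else [1.0] * n
    if len(weights) != n or sum(weights) <= 0.0:
        raise ValueError("need one non-negative weight per token, not all zero")
    losses = [
        kl_divergence(softmax(t, temperature), softmax(s, temperature))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)


# Usage: two tokens over a three-word vocabulary.
teacher = [[2.0, 0.5, -1.0], [0.0, 1.0, 0.0]]
student = [[2.0, 0.5, -1.0], [1.0, 0.0, 0.0]]
loss = weighted_logit_distillation(teacher, student)  # only the second token contributes
```

Setting a token's weight to zero removes it from the loss entirely, so any token-scaling scheme amounts to choosing how much each position's teacher signal counts.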
Study of Augmented Reality based manufacturing for further integration of quality control 4.0: a systematic literature review
Augmented Reality (AR) has gradually become a mainstream technology enabling Industry 4.0, and its maturity has grown over time. AR has been applied to support various processes at the shop-floor level, such as assembly and maintenance. Because many manufacturing processes require high quality and near-zero error rates to meet the demands and ensure the safety of end-users, AR can also equip operators in the quality sector with immersive interfaces that enhance productivity, accuracy and autonomy. However, there is currently no systematic review of AR technology for the quality sector. The purpose of this paper is to conduct a systematic literature review (SLR) to assess the emerging interest in using AR as an assistive technology for the quality sector in an Industry 4.0 context. Five research questions (RQs), together with a set of selection criteria, are predefined to support the objectives of this SLR. Several research databases are searched in the paper identification phase, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodology, to answer the predefined RQs. The review finds that, although AR-based solutions for the quality sector lag behind those for assembly and maintenance, there is a growing interest in developing and implementing AR-assisted quality applications. Current AR-based solutions for the quality sector fall into three main categories: AR-based apps as a virtual Lean tool, AR-assisted metrology, and AR-based solutions for in-line quality control. In this SLR, an AR architecture layer framework is improved and used to classify articles into different layers, which are finally integrated into a systematic design and development methodology for building long-term AR-based solutions for the quality sector.
MULTI-MODAL TASK INSTRUCTIONS TO ROBOTS BY NAIVE USERS
This thesis presents a theoretical framework for the design of user-programmable
robots. The objective of the work is to investigate multi-modal unconstrained natural
instructions given to robots in order to design a learning robot. A corpus-centred
approach is used to design an agent that can reason, learn and interact with a human in a
natural unconstrained way. The corpus-centred design approach is formalised and
developed in detail. It requires the developer to record a human during interaction and
analyse the recordings to find instruction primitives. These are then implemented into a
robot. The focus of this work has been on how to combine speech and gesture using
rules extracted from the analysis of a corpus. A multi-modal integration algorithm is
presented that uses timing and semantics to group, match and unify gesture and
language. The algorithm always achieves correct pairings on the corpus and asks the
user questions when a case is ambiguous or information is missing. The domain of card
games was chosen because it offers a variety of games that are rich in rules and
contain sequences. A further focus of the work is on the translation of rule-based
instructions. Most multi-modal interfaces to date have only considered sequential
instructions. The combination of frame-based reasoning, a knowledge base organised as
an ontology and a problem solver engine is used to store these rules. Understanding
rule instructions, which contain conditional and imaginary situations, requires an agent
with complex reasoning capabilities. A test system for the agent implementation is also
described, and tests that verify the implementation by playing back the corpus are
presented. Furthermore, deployment test results with the implemented agent and human
subjects are presented and discussed. The tests showed that the rate of errors caused
by sentences not defined in the grammar does not decrease at an acceptable rate when
new grammar is introduced. This was particularly the case for complex verbal rule
instructions, which can be expressed in a wide variety of ways.
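The abstract describes grouping and matching gestures with words by timing and semantics, and asking the user when a pairing is ambiguous or information is missing. The sketch below is a hypothetical, greedy version of that idea, not the thesis's algorithm: each word is paired with the closest unused gesture of a compatible semantic kind within a time window; otherwise a clarification question is generated. The `Event` fields, the `kind` labels and the 1-second `max_gap` are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Event:
    label: str    # the word ("this", "there") or gesture ("point-card")
    kind: str     # semantic slot it can fill, e.g. "object" or "location" (assumed)
    start: float  # seconds
    end: float    # seconds


def time_gap(a: Event, b: Event) -> float:
    """Seconds between two time intervals; 0.0 if they overlap."""
    return max(0.0, max(a.start, b.start) - min(a.end, b.end))


def integrate(words: List[Event], gestures: List[Event],
              max_gap: float = 1.0) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Hypothetical greedy speech-gesture pairing; returns (pairs, questions)."""
    pairs: List[Tuple[str, str]] = []
    questions: List[str] = []
    used = set()
    for word in words:
        # Timing and semantics: same kind, not yet used, close enough in time.
        candidates = sorted(
            (time_gap(word, g), i) for i, g in enumerate(gestures)
            if i not in used and g.kind == word.kind and time_gap(word, g) <= max_gap
        )
        if not candidates:
            questions.append(f"Which {word.kind} do you mean by '{word.label}'?")
        elif len(candidates) > 1 and candidates[0][0] == candidates[1][0]:
            questions.append(f"'{word.label}' is ambiguous: which one do you mean?")
        else:
            best = candidates[0][1]
            used.add(best)
            pairs.append((word.label, gestures[best].label))
    return pairs, questions


# Usage: "put this there" with two pointing gestures.
words = [Event("this", "object", 0.0, 0.4), Event("there", "location", 1.0, 1.3)]
gestures = [Event("point-card", "object", 0.1, 0.5), Event("point-table", "location", 1.2, 1.6)]
pairs, questions = integrate(words, gestures)
```

Greedy matching keeps the sketch short; a corpus-derived rule set could instead score several candidate groupings before committing.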
Mission planning framework for fleets of interconnected drones
The use of aerial drones has become more popular as they have also become
more accessible, in terms of both cost and usability. Nowadays, these
vehicles can be small and offer a good cost-benefit ratio, which has
allowed several services and applications supported by aerial drone
networks to emerge. Scenarios that benefit from aerial drones include the
monitoring of emergencies and natural disasters, the patrolling of urban
areas and support to police forces, and tourist applications such as the
real-time video transmission of points of interest. In these situations,
drone control commonly depends on human intervention, which requires
professionals specialized in controlling them. However, in recent years
several solutions have emerged that enable the autonomous flight of these
vehicles, minimizing manual intervention.
Given the enormous diversity of use cases, many of the existing solutions
for autonomous control focus on specific scenarios. Generic mission planning
platforms also exist, but most of them only support missions consisting of a
linear sequence of waypoints, so their mission support is not very flexible.
In this dissertation, we propose a modular infrastructure that can be
used in various scenarios, enabling the autonomous control and monitoring
of a fleet of aerial drones in a mission context. This platform has two main
components: one integrated into the vehicle's onboard computer and the other in
the ground control station. The former communicates with the flight controller
to collect telemetry data and send movement instructions to the drone. The
latter monitors this data and sends commands remotely, also enabling robust
mission planning with multiple
drones. A mission can be described in a script that the ground module
interprets, sending the commands to the assigned vehicles. These missions
can describe different paths, modifying the behaviour of the drones according
to external factors, such as a sensor reading. It is also possible to define
plugins to be reused in various missions, for example, by integrating an
algorithm that ensures that all drones maintain connectivity.
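To illustrate how a ground module might interpret a mission script with sensor-dependent branches and reusable plugins, here is a minimal in-memory sketch. The step format (`goto` and `if_sensor` tuples), the `Drone` class and the connectivity plugin are hypothetical; a real deployment would forward commands to the onboard module and flight controller instead of updating positions directly.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Position = Tuple[float, float, float]  # x, y, altitude in metres (assumed frame)


@dataclass
class Drone:
    name: str
    position: Position
    commands: List[str] = field(default_factory=list)

    def goto(self, target: Position) -> None:
        # Stand-in for sending a movement instruction to the onboard module;
        # the position is updated immediately so the sketch runs offline.
        self.commands.append(f"goto {target}")
        self.position = target


Fleet = Dict[str, Drone]
Plugin = Callable[[Fleet], Optional[str]]


def connectivity_plugin(max_range: float) -> Plugin:
    """Hypothetical plugin: warn when the fleet's range graph is disconnected."""
    def check(fleet: Fleet) -> Optional[str]:
        names = list(fleet)
        if not names:
            return None
        seen, stack = {names[0]}, [names[0]]
        while stack:  # depth-first search over drones within radio range
            current = fleet[stack.pop()]
            for other in names:
                if other not in seen and math.dist(current.position, fleet[other].position) <= max_range:
                    seen.add(other)
                    stack.append(other)
        return None if len(seen) == len(names) else "warning: fleet lost connectivity"
    return check


def run_mission(fleet: Fleet, steps: list, read_sensor: Callable[[], float],
                plugins: List[Plugin], log: Optional[List[str]] = None) -> List[str]:
    """Interpret hypothetical steps: ("goto", drone_name, position) or
    ("if_sensor", threshold, then_steps, else_steps)."""
    log = [] if log is None else log
    for step in steps:
        if step[0] == "goto":
            _, name, target = step
            fleet[name].goto(target)
            for plugin in plugins:  # plugins run after every movement command
                message = plugin(fleet)
                if message:
                    log.append(message)
        elif step[0] == "if_sensor":
            _, threshold, then_steps, else_steps = step
            branch = then_steps if read_sensor() > threshold else else_steps
            run_mission(fleet, branch, read_sensor, plugins, log)
        else:
            raise ValueError(f"unknown step type: {step[0]!r}")
    return log


# Usage: d2's path depends on a (simulated) sensor reading.
fleet = {"d1": Drone("d1", (0.0, 0.0, 10.0)), "d2": Drone("d2", (5.0, 0.0, 10.0))}
mission = [
    ("goto", "d1", (10.0, 0.0, 10.0)),
    ("if_sensor", 30.0, [("goto", "d2", (15.0, 0.0, 10.0))], [("goto", "d2", (0.0, 0.0, 10.0))]),
]
log = run_mission(fleet, mission, read_sensor=lambda: 35.0, plugins=[connectivity_plugin(20.0)])
# d2 takes the "then" branch because 35.0 > 30.0
```

The connectivity plugin treats two drones as linked when they are within `max_range` metres of each other, an assumed radio model, and reports a warning whenever that graph splits.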
The solution was evaluated in scenarios with a single drone and with
the collaboration of multiple drones. The tests were performed in a simulated
environment and also in an environment with real drones. The observed
behaviour is similar in both environments.
Master's in Computer and Telematics Engineering
SLAM research for port AGV based on 2D LIDAR
With the growth of international trade, the transshipment of goods at international container ports has become very busy. The AGV (Automated Guided Vehicle) has been used as a new generation of automated equipment for horizontal container transport. The AGV is an automated unmanned vehicle that can work 24 hours a day, increasing productivity and reducing labor costs compared with container trucks. The ability to obtain information about the surrounding environment is a prerequisite for the AGV to complete tasks automatically in the port area. At present, AGVs that rely on RFID tags for positioning and navigation suffer from excessive cost. This dissertation investigates the application of light detection and ranging (LIDAR) simultaneous localization and mapping (SLAM) technology to port AGVs. In this master's thesis, a mobile test platform based on a laser range finder was developed to scan 360-degree environmental information (distance and angle) centered on the LIDAR and upload it to a real-time database to generate maps of the surroundings, and an obstacle avoidance strategy was developed based on the acquired information. The effectiveness of the platform was verified by experiments in multiple scenarios. Then, building on the first platform, a second experimental platform with an encoder and an IMU sensor was developed. On this platform, SLAM is enabled by the GMapping algorithm together with the encoder and IMU sensor. Based on the resulting SLAM map of the environment, the platform's path planning and obstacle avoidance functions were realized.
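The first platform turns 360-degree (angle, distance) readings into a map and an obstacle avoidance decision. The sketch below shows only that geometry under stated assumptions: readings are LIDAR-centred polar pairs in degrees and metres, a distance of 0 is assumed to mean "no return", 0 degrees is straight ahead, and the 5 cm grid, 0.5 m safety distance and 30-degree forward sector are illustrative values. It is not GMapping and does not estimate the robot's pose.

```python
import math
from typing import List, Sequence, Set, Tuple

Reading = Tuple[float, float]  # (angle in degrees, distance in metres), LIDAR-centred
Cell = Tuple[int, int]


def scan_to_points(scan: Sequence[Reading]) -> List[Tuple[float, float]]:
    """Convert polar LIDAR readings into x/y points in the sensor frame."""
    points = []
    for angle_deg, distance in scan:
        if distance <= 0.0:  # assumption: a zero distance means "no return"
            continue
        theta = math.radians(angle_deg)
        points.append((distance * math.cos(theta), distance * math.sin(theta)))
    return points


def occupied_cells(points: Sequence[Tuple[float, float]], resolution: float = 0.05) -> Set[Cell]:
    """Mark the grid cells (side length in metres) that contain at least one return."""
    return {(math.floor(x / resolution), math.floor(y / resolution)) for x, y in points}


def obstacle_ahead(scan: Sequence[Reading], safety_distance: float = 0.5,
                   half_width_deg: float = 30.0) -> bool:
    """True if a valid return inside the forward sector is closer than safety_distance."""
    for angle_deg, distance in scan:
        relative = (angle_deg + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
        if 0.0 < distance < safety_distance and abs(relative) <= half_width_deg:
            return True
    return False


# Usage: a tiny four-reading scan; the 350-degree return is 0.3 m away, 10 degrees right of ahead.
scan = [(0.0, 1.0), (90.0, 2.0), (350.0, 0.3), (180.0, 0.0)]
points = scan_to_points(scan)
grid = occupied_cells(points)
must_avoid = obstacle_ahead(scan)
```

A full system would also transform each scan into the map frame using the robot's pose, which GMapping estimates from the laser scans and odometry; the sketch leaves that step out.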
Proceedings of the Space Shuttle Integrated Electronics Conference, volume 3
Proceedings of the Space Shuttle Integrated Electronics Conference with emphasis on data systems design - Vol.
Design and implementation of an automatic speech recognition interface for a Multipurpose Assistant Robot (MASHI)
This project focuses on the initial stage of the work and on the study of online services needed to design and implement an automatic speech recognition system for the robotic platform MASHI. The system will be implemented on two Raspberry Pi 3 boards using a master-slave structure. Online resources and services will be used to maintain the wireless connection and control of the platform. The intended purpose of this automatic speech recognition system is to serve as an efficient interface for interaction between MASHI and people inside public buildings; the interaction of the system with other interconnected devices is also considered.
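The abstract names a master-slave structure across two Raspberry Pi 3 boards but not the protocol between them. The sketch below assumes one: the master maps a recognized transcript to a command from a small hypothetical vocabulary and sends it to the slave as a newline-delimited JSON message. The transcript itself would come from the online speech recognition service, which is not modelled here; `socket.socketpair()` stands in for the network link, and a TCP connection (for example via `socket.create_connection`) would replace it between two physical boards.

```python
import json
import socket
from typing import Dict, Optional, TextIO

# Hypothetical vocabulary mapping recognized phrases to robot commands.
COMMANDS: Dict[str, str] = {
    "hello": "greet",
    "follow me": "follow",
    "stop": "stop",
}


def transcript_to_command(transcript: str) -> Optional[str]:
    """Return the first command whose phrase appears in the transcript, if any."""
    text = transcript.lower()
    for phrase, command in COMMANDS.items():
        if phrase in text:
            return command
    return None


def master_send(sock: socket.socket, transcript: str) -> bool:
    """Master side: send one JSON line per recognized command; False if none matched."""
    command = transcript_to_command(transcript)
    if command is None:
        return False
    sock.sendall((json.dumps({"command": command}) + "\n").encode("utf-8"))
    return True


def slave_receive(reader: TextIO) -> str:
    """Slave side: block until one full JSON line arrives and return its command."""
    line = reader.readline()
    if not line:
        raise ConnectionError("master closed the connection")
    return json.loads(line)["command"]


# Usage: both ends in one process, linked by a local socket pair.
master, slave = socket.socketpair()
reader = slave.makefile("r", encoding="utf-8")
sent = master_send(master, "Hello robot")
received = slave_receive(reader)
```

Newline-delimited JSON keeps message framing trivial on a byte stream, and the slave reads through a single `makefile` reader so that bytes buffered after one message are not lost before the next.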