1,751 research outputs found

    Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning

    Strong Artificial Intelligence (Strong AI) or Artificial General Intelligence (AGI) with abstract reasoning ability is the goal of next-generation AI. Recent advancements in Large Language Models (LLMs), along with the emerging field of Multimodal Large Language Models (MLLMs), have demonstrated impressive capabilities across a wide range of multimodal tasks and applications. Particularly, various MLLMs, each with distinct model architectures, training data, and training stages, have been evaluated across a broad range of MLLM benchmarks. These studies have, to varying degrees, revealed different aspects of the current capabilities of MLLMs. However, the reasoning abilities of MLLMs have not been systematically investigated. In this survey, we comprehensively review the existing evaluation protocols of multimodal reasoning, categorize and illustrate the frontiers of MLLMs, introduce recent trends in applications of MLLMs on reasoning-intensive tasks, and finally discuss current practices and future directions. We believe our survey establishes a solid base and sheds light on this important topic, multimodal reasoning

    Token-Scaled Logit Distillation for Ternary Weight Generative Language Models

    Generative Language Models (GLMs) have shown impressive performance in tasks such as text generation, understanding, and reasoning. However, the large model size poses challenges for practical deployment. To solve this problem, Quantization-Aware Training (QAT) has become increasingly popular. However, current QAT methods for generative models have resulted in a noticeable loss of accuracy. To counteract this issue, we propose a novel knowledge distillation method specifically designed for GLMs. Our method, called token-scaled logit distillation, prevents overfitting and provides superior learning from the teacher model and ground truth. This research marks the first evaluation of ternary weight quantization-aware training of large-scale GLMs with less than 1.0 degradation in perplexity and no loss of accuracy in a reasoning task
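
For readers unfamiliar with ternary weights, the following is a minimal sketch of threshold-based ternarization, in which each weight is mapped to -1, 0, or +1 and shared with a single scale factor. It is not the method of the paper above: the function name `ternarize`, the `threshold_factor` of 0.7, and the mean-magnitude scale are assumptions borrowed from a commonly used heuristic, and the abstract does not describe the quantizer or the distillation loss.

```python
def ternarize(weights, threshold_factor=0.7):
    """Map each weight to -1, 0, or +1 plus one shared scale (illustrative only).

    threshold_factor is an assumed heuristic value, not taken from the abstract.
    """
    if not weights:
        return [], 0.0
    # Threshold is a fraction of the mean absolute weight.
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = threshold_factor * mean_abs
    # The scale is the mean magnitude of the weights that survive the threshold.
    kept = [abs(w) for w in weights if abs(w) > delta]
    alpha = sum(kept) / len(kept) if kept else 0.0
    codes = [0 if abs(w) <= delta else (1 if w > 0 else -1) for w in weights]
    return codes, alpha


codes, alpha = ternarize([0.9, -0.8, 0.05, -0.02])
# Each weight is then approximated as alpha * code.
approx = [alpha * c for c in codes]
```

Storing only the codes and one scale per tensor is what makes the model small; the accuracy cost of that approximation is what quantization-aware training tries to recover.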

    Study of Augmented Reality based manufacturing for further integration of quality control 4.0: a systematic literature review

    Augmented Reality (AR) has gradually become a mainstream technology enabling Industry 4.0 and its maturity has also grown over time. AR has been applied to support different processes on the shop-floor level, such as assembly and maintenance. As various processes in manufacturing require high quality and near-zero error rates to ensure the demands and safety of end-users, AR can also equip operators with immersive interfaces to enhance productivity, accuracy and autonomy in the quality sector. However, there is currently no systematic review paper about AR technology enhancing the quality sector. The purpose of this paper is to conduct a systematic literature review (SLR) to draw conclusions about the emerging interest in using AR as an assisting technology for the quality sector in an Industry 4.0 context. Five research questions (RQs), with a set of selection criteria, are predefined to support the objectives of this SLR. In addition, different research databases are used for the paper identification phase following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodology to find the answers to the predefined RQs. It is found that, although the quality sector lags behind the assembly and maintenance sectors in terms of AR-based solutions, there is growing interest in developing and implementing AR-assisted quality applications. There are three main categories of current AR-based solutions for the quality sector: AR-based apps as a virtual Lean tool, AR-assisted metrology and AR-based solutions for in-line quality control. In this SLR, an AR architecture layer framework has been improved to classify articles into different layers which are finally integrated into a systematic design and development methodology for the development of long-term AR-based solutions for the quality sector in the future


    This thesis presents a theoretical framework for the design of user-programmable robots. The objective of the work is to investigate multi-modal unconstrained natural instructions given to robots in order to design a learning robot. A corpus-centred approach is used to design an agent that can reason, learn and interact with a human in a natural unconstrained way. The corpus-centred design approach is formalised and developed in detail. It requires the developer to record a human during interaction and analyse the recordings to find instruction primitives. These are then implemented in a robot. The focus of this work has been on how to combine speech and gesture using rules extracted from the analysis of a corpus. A multi-modal integration algorithm is presented that can use timing and semantics to group, match and unify gesture and language. The algorithm always achieves correct pairings on a corpus and initiates questions to the user in cases of ambiguity or missing information. The domain of card games has been investigated because of its variety of games, which are rich in rules and contain sequences. A further focus of the work is on the translation of rule-based instructions. Most multi-modal interfaces to date have only considered sequential instructions. The combination of frame-based reasoning, a knowledge base organised as an ontology and a problem solver engine is used to store these rules. The understanding of rule instructions, which contain conditional and imaginary situations, requires an agent with complex reasoning capabilities. A test system of the agent implementation is also described. Tests to confirm the implementation by playing back the corpus are presented. Furthermore, deployment test results with the implemented agent and human subjects are presented and discussed. The tests showed that the rate of errors caused by sentences not being defined in the grammar does not decrease at an acceptable rate when new grammar is introduced. This was particularly the case for complex verbal rule instructions, which can be expressed in a large variety of ways
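
The timing side of such a speech and gesture integration step can be sketched as follows. This is an illustration under stated assumptions, not the thesis's algorithm: the `Event` type, the `pair_speech_with_gestures` function, and the `max_gap` tolerance are hypothetical names, and the semantic matching and unification that the abstract mentions are left out.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """A spoken word or a gesture with start and end times in seconds."""
    label: str
    start: float
    end: float


def pair_speech_with_gestures(words, gestures, max_gap=1.0):
    """Pair each word with the single gesture that overlaps it in time.

    Words with no candidate or more than one candidate are returned as
    ambiguous, so a caller could ask the user a clarifying question.
    max_gap is an assumed tolerance, in seconds.
    """
    pairs, ambiguous = [], []
    for word in words:
        candidates = [
            g for g in gestures
            if g.start <= word.end + max_gap and g.end >= word.start - max_gap
        ]
        if len(candidates) == 1:
            pairs.append((word.label, candidates[0].label))
        else:
            ambiguous.append(word.label)
    return pairs, ambiguous


words = [Event("this", 0.0, 0.3), Event("that", 5.0, 5.3)]
gestures = [Event("point_card_A", 0.1, 0.6)]
pairs, ambiguous = pair_speech_with_gestures(words, gestures)
```

Here "this" is paired with the pointing gesture, while "that" has no gesture nearby and would trigger a question to the user.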

    Framework de planeamento de missões para frotas de drones interligados (Mission planning framework for fleets of networked drones)

    The usage of aerial drones has become more popular as they also become more accessible, both in economic and usability terms. Nowadays, these vehicles can present reduced dimensions and a good cost-benefit ratio, which makes it possible for several services and applications supported by aerial drone networks to emerge. Some scenarios that benefit from the use of aerial drones are the monitoring of emergency situations and natural disasters, the patrolling of urban areas and support to police forces, and tourist applications such as the real-time video transmission of points of interest. It is common for the control of the drone to be dependent on human intervention in these situations, which requires professionals specialized in its control. However, in recent years, several solutions have emerged that enable the autonomous flight of these vehicles, minimizing manual interference. Taking into account the enormous diversity of use cases, many of the existing solutions for autonomous control focus on specific scenarios. Generic mission planning platforms also exist, but most of them only allow missions consisting of linear waypoints to be traversed. These situations translate into mission support that is not very flexible. In this dissertation, we propose a modular infrastructure that can be used in various scenarios, enabling the autonomous control and monitoring of a fleet of aerial drones in a mission context. This platform has two main components, one integrated into the onboard computer of the vehicle, and the other one in the ground control. The former communicates with the flight controller so that it can collect telemetry data and send movement instructions to the drone. The latter makes it possible to monitor this data and send the commands remotely, also enabling robust mission planning with multiple drones. A mission can be described in a script that the ground module interprets, sending the commands to the assigned vehicles. These missions can describe different paths, modifying the behaviour of the drones according to external factors, such as a sensor reading. It is also possible to define plugins to be reused in various missions, for example, by integrating an algorithm that ensures that all drones maintain connectivity. The solution was evaluated in scenarios with a single drone and with the collaboration of multiple drones. The tests were performed in a simulated environment and also in an environment with real drones. The observed behaviour is similar in both scenarios. Master's in Computer and Telematics Engineering

    SLAM research for port AGV based on 2D LIDAR

    With the increase in international trade, the transshipment of goods at international container ports is very busy. The AGV (Automated Guided Vehicle) has been used as a new generation of automated container horizontal transport equipment. The AGV is an automated unmanned vehicle that can work 24 hours a day, increasing productivity and reducing labor costs compared to using container trucks. The ability to obtain information about the surrounding environment is a prerequisite for the AGV to automatically complete tasks in the port area. At present, AGV positioning and navigation based on RFID tags suffers from excessive cost. This dissertation investigates the application of light detection and ranging (LIDAR) simultaneous localization and mapping (SLAM) technology to port AGVs. In this master's thesis, a mobile test platform based on a laser range finder is developed to scan 360-degree environmental information (distance and angle) centered on the LIDAR and upload the information to a real-time database to generate maps of the surrounding environment, and an obstacle avoidance strategy was developed based on the acquired information. The effectiveness of the platform was verified by experiments in multiple scenarios. Then, based on the first platform, another experimental platform with an encoder and an IMU sensor was developed. In this platform, the functionality of SLAM is enabled by the GMapping algorithm and the installation of the encoder and IMU sensor. Based on the established SLAM map of the environment, the path planning and obstacle avoidance functions of the platform were realized
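
The first step of turning a 360-degree scan of (angle, distance) readings into map points can be sketched as a polar-to-Cartesian conversion. This is a minimal illustration, not the thesis's implementation: the function names `scan_to_points` and `nearest_obstacle`, the `pose` parameter, and the convention that a non-positive distance marks an invalid return are all assumptions.

```python
import math


def scan_to_points(scan, pose=(0.0, 0.0, 0.0)):
    """Convert (angle_deg, distance_m) readings into world-frame (x, y) points.

    pose is the assumed sensor position and heading: (x, y, heading_rad).
    Readings with a non-positive distance are treated as invalid and skipped.
    """
    px, py, heading = pose
    points = []
    for angle_deg, distance in scan:
        if distance <= 0:
            continue
        theta = heading + math.radians(angle_deg)
        points.append((px + distance * math.cos(theta),
                       py + distance * math.sin(theta)))
    return points


def nearest_obstacle(scan):
    """Return (distance_m, angle_deg) of the closest valid reading, or None."""
    valid = [(distance, angle) for angle, distance in scan if distance > 0]
    return min(valid) if valid else None


scan = [(0, 1.0), (90, 2.0), (180, 0.0)]
points = scan_to_points(scan)
closest = nearest_obstacle(scan)
```

A full SLAM pipeline such as GMapping additionally estimates the pose itself from odometry and scan matching; the sketch assumes the pose is already known.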

    Proceedings of the Space Shuttle Integrated Electronics Conference, volume 3

    Proceedings of space shuttle integrated electronics conference with emphasis on data systems design - Vol.

    Design and implementation of an automatic speech recognition interface for a Multipurpose Assistant Robot (MASHI)

    This project focuses on the initial phase of the work and on the study of online services in order to design and implement an automatic speech recognition system for the robotic platform MASHI. This system will be implemented on two Raspberry Pi 3 devices using a master-slave structure. Online resources and services will be used to maintain the wireless connection and control of the platform. The automatic speech recognition system is intended to serve as an efficient interface for interaction between MASHI and people inside public buildings. The interaction of the system with other interconnected devices is also considered