7,969 research outputs found

    A Review of Verbal and Non-Verbal Human-Robot Interactive Communication

    In this paper, an overview of human-robot interactive communication is presented, covering verbal as well as non-verbal aspects of human-robot interaction. Following a historical introduction and a motivation for fluid human-robot communication, ten desiderata are proposed, which provide an organizational axis for both recent and future research on human-robot communication. The ten desiderata are then examined in detail, culminating in a unifying discussion and a forward-looking conclusion.

    Using ChatGPT and other LLMs in Professional Environments

    Large language models such as ChatGPT, Google’s Bard, and Microsoft’s new Bing, to name a few, have developed rapidly in recent years, becoming very popular in different environments and supporting a wide range of tasks. A close look at their outputs reveals several limitations and challenges that can be further improved upon. The main challenge of these models is the possibility of generating biased or inaccurate results, since they rely on large amounts of data but have no access to non-public information. Moreover, these language models need to be properly monitored and trained to prevent them from generating inappropriate or offensive content and to ensure that they are used ethically and safely. This study investigates the use of ChatGPT and other large language models, such as Blender and BERT, in professional environments. It has been found that none of the large language models, including ChatGPT, has been used in unstructured dialogues. Moreover, involving the models in professional environments requires extensive training and monitoring by domain professionals, or fine-tuning through an API.
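    The abstract's closing point about "fine-tuning through an API" can be made concrete with a short sketch. The snippet below uses the OpenAI fine-tuning API purely as an illustration, since the study names no specific provider; the file name and training data are hypothetical.

    # A minimal sketch of domain fine-tuning through an API, assuming the
    # OpenAI Python SDK; the provider choice and file name are illustrative,
    # not details taken from the study.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Upload a JSONL file of dialogues curated by domain professionals
    # (hypothetical file name).
    training_file = client.files.create(
        file=open("professional_dialogues.jsonl", "rb"),
        purpose="fine-tune",
    )

    # Launch a fine-tuning job on a base chat model and report its status.
    job = client.fine_tuning.jobs.create(
        training_file=training_file.id,
        model="gpt-3.5-turbo",
    )
    print(job.id, job.status)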

    Ziya-Visual: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning

    Recent advancements have expanded the capabilities of large language models (LLMs) in zero-shot image-to-text generation and understanding by integrating multi-modal inputs. However, such success is typically limited to English scenarios, due to the lack of large-scale, high-quality non-English multi-modal resources, which makes it extremely difficult to establish competitive counterparts in other languages. In this paper, we introduce the Ziya-Visual series, a set of bilingual large-scale vision-language models (LVLMs) designed to incorporate visual semantics into an LLM for multi-modal dialogue. Composed of Ziya-Visual-Base and Ziya-Visual-Chat, our models adopt the Querying Transformer from BLIP-2 and further explore the benefits of optimization schemes such as instruction tuning, multi-stage training, and a low-rank adaptation module for visual-language alignment. In addition, we exploit the understanding ability of GPT-4 in multi-modal scenarios, translating our gathered English image-text datasets into Chinese and generating instruction-response pairs through in-context learning. The experimental results demonstrate that, compared to existing LVLMs, Ziya-Visual achieves competitive performance across a wide range of English-only tasks, including zero-shot image-text retrieval, image captioning, and visual question answering. The evaluation leaderboard assessed by GPT-4 also indicates that our models possess satisfactory image-text understanding and generation capabilities in Chinese multi-modal dialogue scenarios. Code, demo, and models are available at https://huggingface.co/IDEA-CCNL/Ziya-BLIP2-14B-Visual-v1.
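    The low-rank adaptation module mentioned above follows the standard LoRA idea: a frozen pretrained weight is augmented with a trainable low-rank update. The sketch below shows a generic PyTorch version; the rank, scaling, and the exact layers Ziya-Visual adapts are assumptions, not details from the paper.

    # A generic low-rank adaptation (LoRA) layer of the kind the abstract
    # mentions for visual-language alignment; hyperparameters and placement
    # are illustrative assumptions.
    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """A frozen linear layer plus a trainable low-rank update: y = Wx + (alpha/r) * B(Ax)."""
        def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False  # the pretrained weight stays frozen
            self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)  # A: r x in
            self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))        # B: out x r, zero-initialized
            self.scaling = alpha / r

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Frozen projection plus the low-rank correction learned during alignment.
            return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)

    # Hypothetical usage: wrap a projection inside the vision-language alignment module.
    proj = LoRALinear(nn.Linear(768, 768), r=8)
    out = proj(torch.randn(2, 32, 768))  # (batch, tokens, hidden)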

    IndoToD: A Multi-Domain Indonesian Benchmark For End-to-End Task-Oriented Dialogue Systems

    Task-oriented dialogue (ToD) systems have mostly been created for high-resource languages such as English and Chinese. However, there is a need to develop ToD systems for other regional or local languages to broaden their ability to comprehend dialogue contexts in various languages. This paper introduces IndoToD, an end-to-end multi-domain ToD benchmark in Indonesian. We extend two English ToD datasets, covering four different domains, to Indonesian, using delexicalization to efficiently reduce the size of the annotations. To ensure high-quality data collection, we hire native speakers to manually translate the dialogues. Along with the original English datasets, these new Indonesian datasets serve as an effective benchmark for evaluating Indonesian and English ToD systems, as well as for exploring the potential benefits of cross-lingual and bilingual transfer learning approaches.
    Comment: 2023 1st Workshop in South East Asian Language Processing (SEALP), co-located with AACL 2023
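    Delexicalization, the annotation-reduction step mentioned above, replaces slot values in an utterance with placeholder tokens, so that only the resulting templates need translation and annotation. The sketch below illustrates the general idea; the slot names and example turn are invented, not taken from IndoToD.

    # A minimal sketch of delexicalization for ToD data, assuming slot
    # annotations are available as a name -> value mapping; slot names and
    # the example utterance are hypothetical, not from the IndoToD datasets.
    def delexicalize(utterance: str, slots: dict[str, str]) -> str:
        """Replace each annotated slot value with a [slot_name] placeholder."""
        for name, value in slots.items():
            utterance = utterance.replace(value, f"[{name}]")
        return utterance

    turn = "I booked a table at Sari Ratu for 7 pm."
    slots = {"restaurant_name": "Sari Ratu", "time": "7 pm"}
    print(delexicalize(turn, slots))
    # -> "I booked a table at [restaurant_name] for [time]."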