412,710 research outputs found

Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models

Author: Chen Chi
Li Peng
Liu Yang
Luo Fuwen
Mi Xiaoyue
Qin Ruoyu
Sun Maosong
Publication date: 14/09/2023

Recently, Multimodal Large Language Models (MLLMs) that enable Large Language Models (LLMs) to interpret images through visual instruction tuning have achieved significant success. However, existing visual instruction tuning methods only utilize image-language instruction data to align the language and image modalities, lacking a more fine-grained cross-modal alignment. In this paper, we propose Position-enhanced Visual Instruction Tuning (PVIT), which extends the functionality of MLLMs by integrating an additional region-level vision encoder. This integration promotes a more detailed comprehension of images for the MLLM. In addition, to efficiently achieve a fine-grained alignment between the vision modules and the LLM, we design multiple data generation strategies to construct an image-region-language instruction dataset. Finally, we present both quantitative experiments and qualitative analysis that demonstrate the superiority of the proposed model. Code and data will be released at https://github.com/PVIT-official/PVIT

arXiv.org e-Print Archive

Primary students' spatial visualization and spatial orientation: an evidence base for instruction

Author: Diezmann Carmel
Lowrie Thomas
Publication venue: 'American Association on Intellectual and Developmental Disabilities (AAIDD)'
Publication date: 01/01/2009

This paper reports on the performance of 58 students aged 11 to 12 on a spatial visualization task and a spatial orientation task. The students completed these tasks and explained their thinking during individual interviews. The qualitative data were analysed to inform pedagogical content knowledge for spatial activities. The study revealed that “matching” or “matching and eliminating” were the typical strategies that students employed on these spatial tasks. However, errors in making associations between parts of the same or different shapes were noted. Students also experienced general difficulties with visual memory and with using language to explain their thinking. The students’ specific difficulties in spatial visualization related to obscured items, the perspective used, and the placement and orientation of shapes.

ACU Research Bank

Queensland University of Technology ePrints Archive

University of Canberra Research Repository

SVIQUEL: A Spatial Visual Query and Exploration Language

Author: Kaushik Sudhir
Rundensteiner Elke A.
Publication venue: Digital WPI
Publication date: 01/12/1997

The need to analyze and query spatial data is becoming increasingly important with the advent of applications such as Geographic Information Systems, Image Databases, and Remote Sensing. The focus of our research is to support spatial data analysis by developing a direct manipulation environment to visually query as well as browse spatial data, and to review the visual results for trend analysis. In this paper, we present a visual query language (SVIQUEL) which allows us to specify the relative spatial position (both topology and direction) between objects using direct manipulation. This query language builds upon the notion of dynamic query filters and significantly extends them to support integrated querying of both topological and directional types of spatial data. In order to facilitate continuous querying, as required by a direct manipulation environment, we designed an integrated neighborhood model for both kinds of spatial relationships (topology and direction). Our spatial query palette, SVIQUEL, allows us to query over any of the continuous sets of neighboring values. SVIQUEL is complemented by a Spatial Query Disambiguation diagram (SQUAD), which gives qualitative visual representations of the quantitative query. This increases the utility of the system for spatial browsing of data with no particular query in mind. Mapping functions between the quantitative SVIQUEL and the qualitative SQUAD have been developed. The resulting tight coupling between SVIQUEL and SQUAD allows users to work with either qualitative query specifications or at a quantitative level of detail, depending on their particular needs, as well as to freely switch between the two while working in a continuous data exploration mode.

DigitalCommons@WPI

Visualising text-based data: Identifying the potential of visual knowledge production through design practice

Author: Lorber-Kasunic J
Sweetapple K
Publication venue: Studies in Material Thinking
Publication date: 28/08/2015

An increase in the availability of digitised data coupled with the development of digital tools has enabled humanities scholars to visualise data in ways that were previously difficult, if not impossible. While digitisation has led to an increase in the use of methods that chart, graph and map text-based data, opportunities for visual methods that are non-aggregative remain underdeveloped. In this paper we use ‘Writing Rights’, a collaborative project between design and humanities scholars that examines the process of writing the ‘Déclaration des Droits de l’Homme et du Citoyen’ (1789), to explore this issue. Through a series of visual experiments we discuss how the production of knowledge is enacted textually, within the written language, and graphically with the visual arrangement of the text. We argue that by drawing on the domain expertise of design, with its commitment to the semantic potential of the visual, practices that more wholly account for the qualitative nature of humanities data can be developed

OPUS - University of Technology Sydney

ANALISIS PEMANFAATAN MEDIA PEMBELAJARAN AUDIO VISUAL UNTUK MENINGKATKAN HASIL BELAJAR BAHASA INDONESIA SISWA SD

Author: Cahyani Berliana Henu
Khosiyono Banun Havifah Cahyo
Nisa Ana Fitrotun
Prihati Wahyuning
Publication venue: Program Studi Pendidikan Guru Sekolah Dasar FKIP Universitas Pasundan
Publication date: 09/12/2023

This study aims to evaluate the effect of using audio-visual learning media on student learning outcomes in Indonesian language learning at SD Negeri 1 Maron. The Indonesian language plays an important role in daily life in Indonesia, so good Indonesian language skills are needed, especially for elementary school students. Audio-visual learning media are considered a tool that can make learning materials easier and more enjoyable for students to understand. However, many teachers still do not make optimal use of these media in the learning process. Therefore, this study was conducted to determine the extent to which audio-visual learning media can improve the Indonesian language learning outcomes of elementary school students. The research was qualitative and used the case study method. It involved homeroom teachers and 4th grade students of SD Negeri 1 Maron as research subjects. Data were collected through interviews, observations, and documentation. The results showed that the use of audio-visual media in learning can help students understand the learning material better and can attract their attention. Teachers also stated that the use of audio-visual media can increase students' interest in learning Indonesian. Thus, this research contributes to improving the quality of Indonesian language education in primary schools through the use of audio-visual media.

Pasundan University Journal

Valley: Video Assistant with Large Language model Enhanced abilitY

Author: Dong Junwei
Hu Linmei
Li Da
Lu Pengcheng
Luo Ruipu
Qiu Minghui
Wang Tao
Wei Zhongyu
Yang Min
Zhao Ziwang
Publication date: 08/10/2023

Large language models (LLMs), with their remarkable conversational capabilities, have demonstrated impressive performance across various applications and have emerged as formidable AI assistants. This raises an intuitive question: Can we harness the power of LLMs to build multimodal AI assistants for visual applications? Recently, several multi-modal models have been developed for this purpose. They typically pre-train an adaptation module to align the semantics of the vision encoder and language model, followed by fine-tuning on instruction-following data. However, despite the success of this pipeline in image and language understanding, its effectiveness in joint video and language understanding has not been widely explored. In this paper, we aim to develop a novel multi-modal foundation model capable of comprehending video, image, and language within a general framework. To achieve this goal, we introduce Valley, a Video Assistant with Large Language model Enhanced abilitY. Valley consists of an LLM, a temporal modeling module, a visual encoder, and a simple projection module designed to bridge the visual and textual modalities. To empower Valley with video comprehension and instruction-following capabilities, we construct a video instruction dataset and adopt a two-stage tuning procedure to train it. Specifically, we employ ChatGPT to facilitate the construction of task-oriented conversation data encompassing various tasks, including multi-shot captions, long video descriptions, action recognition, causal relationship inference, etc. Subsequently, we adopt a pre-training-then-instruction-tuning pipeline to align visual and textual modalities and improve the instruction-following capability of Valley. Qualitative experiments demonstrate that Valley has the potential to function as a highly effective video assistant that can make complex video understanding scenarios easy.

arXiv.org e-Print Archive

Profil Gaya Belajar pada Mahasiswa Program Studi Bahasa Jepang

Author: Ardipradja Ari Rahmat Utama
Widiyani Anggun
Publication venue: 'IPM2KPE'
Publication date: 10/05/2023

The purpose of this study was to determine the learning style profile of students in the Japanese Language Study Program at STBA YAPARI-ABA Bandung. The study used a qualitative descriptive method, with observation and questionnaires as data collection instruments. The respondents were 100 Japanese language study program students from four levels of study, selected by random sampling. The result of the study is a mapping of student learning styles from level 1 to level 4. Although a tendency in learning style can be observed in each dimension, in several dimensions the respondents showing no tendency form the dominant group. Viewed by study level, the profile combinations differ. Level 1 has a sensing - visual - reflective - sequential - introverted - inductive learning style profile, level 2 has a sensing - visual - active - global - extroverted - inductive profile, while levels 3 and 4 share the same profile, namely sensing - visual - reflective - global - introverted - deductive. In conclusion, the overall learning style profile of students of the STBA YAPARI-ABA Bandung Japanese Language Study Program follows a sensing - visual - reflective - global - introverted - inductive pattern. Keywords: Learning Style, Japanese Language Students, Profil

Institut Penelitian Matematika Komputer, Keperawatan, Pendidikan dan Ekonomi (IPM2KPE): Open Journal System

Evidence-Based Dialogue Maps as a research tool to evaluate the quality of school pupils’ scientific argumentation

Author: Alexandra Okada
Conklin J.
Reason P.
Simon Buckingham Shum
Toulmin S.
van Eemeren F.H.
Walton D.
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2008

This pilot study focuses on the potential of Evidence-based Dialogue Mapping as a participatory action research tool to investigate young teenagers’ scientific argumentation. Evidence-based Dialogue Mapping is a technique for representing graphically an argumentative dialogue through Questions, Ideas, Pros, Cons and Data. Our research objective is to better understand the usage of Compendium, a Dialogue Mapping software tool, as both (1) a learning strategy to scaffold school pupils’ argumentation and (2) as a method to investigate the quality of their argumentative essays. The participants were a science teacher-researcher, a knowledge mapping researcher and 20 pupils, 12-13 years old, in a summer science course for “gifted and talented” children in the UK. This study draws on multiple data sources: discussion forum, science teacher-researcher’s and pupils’ Dialogue Maps, pupil essays, and reflective comments about the uses of mapping for writing. Through qualitative analysis of two case studies, we examine the role of Evidence-based Dialogue Maps as a mediating tool in scientific reasoning: as conceptual bridges for linking and making knowledge intelligible; as support for the linearisation task of generating a coherent document outline; as a reflective aid to rethinking reasoning in response to teacher feedback; and as a visual language for making arguments tangible via cartographic conventions

CiteSeerX

Crossref

OPUS - University of Technology Sydney

Open Research Online (The Open University)

CREATING VISUAL ASSETS OF KOREAN VOCABULARY CARD GAME NOLJA

Author: Maharani Satyaning
Publication venue: 'Universitas Ciputra Surabaya'
Publication date: 12/10/2021

The Korean vocabulary card game Nolja is designed to support interactive learning of Korean vocabulary and is aimed at women aged 15-25. However, as a new game, Nolja has serious design problems: the placement of illustrations on the player cards makes the game finish too quickly, the cards offer no tools to assess language proficiency, the typeface on the cards is hard to read, and the strokes in the illustrations are too thin. Therefore, it is necessary to design visual assets for the Nolja vocabulary card game. Visual asset design for Nolja includes game theme design, illustration, the educational game system, layout, and content arrangement. The design aims to produce a Nolja card game that is interactive, educational, and suitable for women aged 15-25 who are motivated to learn, and that conveys Korean language subject matter well. The design process uses qualitative methods: interviews with experts and extreme users in games and Korean language learning, and a survey distributed to 100 members of the target market. It is also supported by a literature study of books, journals, and articles from the Internet. The design produces visual assets for the Nolja vocabulary card game with a stronger educational game system and a design style that matches the target market's wishes, so that the game can be well received by the market. Keywords: Korean language, Visual learning media, Flashcards, Picture card games, Visual assets

Universitas Ciputra Surabaya e-Journal