6,058 research outputs found

    Visual-Semantic Learning

    Visual-semantic learning is an attractive and challenging research direction aiming to understand the complex semantics of heterogeneous data from two domains: visual signals (images and videos) and natural language (captions and questions). It requires memorizing the rich information within each modality and jointly comprehending multiple modalities. Artificial intelligence (AI) systems with human-level intelligence are expected to learn like humans: efficiently leveraging brain memory for better comprehension, rationally incorporating common-sense knowledge into reasoning, quickly gaining an in-depth understanding from a few samples, and analyzing relationships among abundant and informative events. These capacities are effortless for humans but challenging for machines.
    To bridge the gap between human-level intelligence and present-day visual-semantic learning, we start from its basic understanding ability by studying visual question answering (e.g., Image-QA and Video-QA) from the perspectives of memory augmentation and common-sense knowledge incorporation. We then extend it to a more challenging setting with limited and partially unlabeled training data (few-shot visual-semantic learning) to imitate the fast learning ability of humans. Finally, to further enhance visual-semantic performance in natural videos with numerous spatio-temporal dynamics, we investigate exploiting event-correlated information for a comprehensive understanding of cross-modal semantics.
    To study the essential visual-semantic understanding ability of the human brain with memory, we first propose a novel Memory Augmented Deep Recurrent Neural Network (MA-DRNN) model for Video-QA, which features a new method for encoding videos and questions and memory augmentation using the emerging Differentiable Neural Computer (DNC). Specifically, we encode semantic (question) information before visual (video) information, which leads to better visual-semantic representations, and we leverage the DNC's external memory to store and retrieve valuable information from questions and videos and to model long-term visual-semantic dependencies.
    Beyond basic understanding, to tackle visual-semantic reasoning that requires external knowledge beyond the visible content (e.g., KB-Image-QA), we propose a novel framework that endows the model with the ability to answer more general questions and better exploits external knowledge by generating Multiple Clues for Reasoning with Memory Neural Networks (MCR-MemNN). Specifically, a well-defined detector predicts image-question-related relation phrases, each delivering two complementary clues for retrieving supporting facts from an external knowledge base (KB). These facts are encoded into a continuous embedding space using a content-addressable memory. Afterward, mutual interactions between the visual-semantic representation and the supporting facts stored in memory are captured to distill the most relevant information across the three modalities (image, question, and KB). Finally, the answer is predicted by choosing the supporting fact with the highest score.
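    A minimal sketch (assuming PyTorch, which the abstract does not specify) of the two ideas described for MA-DRNN: encoding the question before the video, and reading from and writing to a content-addressable external memory in the spirit of a DNC. The module names, dimensions, single read head, and simplified addressing are illustrative assumptions, not the thesis's exact architecture.

# Illustrative sketch only: semantics-first encoding plus a simplified external
# memory with content-based read/write, loosely in the spirit of MA-DRNN's DNC.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleExternalMemory(nn.Module):
    """Content-addressable memory: cosine-similarity read, gated erase/add write."""

    def __init__(self, slots: int, width: int):
        super().__init__()
        self.slots, self.width = slots, width

    def init_state(self, batch: int, device):
        return torch.zeros(batch, self.slots, self.width, device=device)

    def read(self, memory, key):
        # Attend over slots by cosine similarity to the read key.
        scores = F.cosine_similarity(memory, key.unsqueeze(1), dim=-1)
        weights = F.softmax(scores, dim=-1)                        # (B, slots)
        return torch.bmm(weights.unsqueeze(1), memory).squeeze(1)  # (B, width)

    def write(self, memory, key, erase, add):
        scores = F.cosine_similarity(memory, key.unsqueeze(1), dim=-1)
        w = F.softmax(scores, dim=-1).unsqueeze(-1)                # (B, slots, 1)
        memory = memory * (1 - w * erase.unsqueeze(1))             # erase step
        return memory + w * add.unsqueeze(1)                       # add step


class MemoryAugmentedVideoQA(nn.Module):
    def __init__(self, vocab, vid_dim, hid=256, slots=32):
        super().__init__()
        self.embed = nn.Embedding(vocab, hid)
        self.q_rnn = nn.GRU(hid, hid, batch_first=True)
        self.v_rnn = nn.GRU(vid_dim, hid, batch_first=True)
        self.memory = SimpleExternalMemory(slots, hid)
        self.controller = nn.Linear(hid, 3 * hid)   # produces read key, erase, add
        self.classifier = nn.Linear(2 * hid, vocab)  # answer logits (simplified)

    def forward(self, question_ids, video_feats):
        B = question_ids.size(0)
        mem = self.memory.init_state(B, question_ids.device)

        # 1) Encode the question first and write its summary into memory,
        #    so the stored semantics can condition later retrieval.
        _, q_h = self.q_rnn(self.embed(question_ids))
        key, erase, add = self.controller(q_h[-1]).chunk(3, dim=-1)
        mem = self.memory.write(mem, key, torch.sigmoid(erase), torch.tanh(add))

        # 2) Encode the video, then read question-relevant content back.
        _, v_h = self.v_rnn(video_feats)
        read_vec = self.memory.read(mem, v_h[-1])

        return self.classifier(torch.cat([v_h[-1], read_vec], dim=-1))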
    Furthermore, to enable fast, in-depth understanding from a small number of samples, especially with heterogeneity in multi-modal scenarios such as image question answering (Image-QA) and image captioning (IC), we study few-shot visual-semantic learning and present the Hierarchical Graph ATtention Network (HGAT). This two-stage network models the intra- and inter-modal relationships of limited image-text samples. The main contributions of HGAT are threefold: 1) it sheds light on tackling few-shot multi-modal learning problems, focusing primarily, but not exclusively, on the visual and semantic modalities, through better exploitation of the intra-relationships within each modality and an attention-based co-learning framework between modalities built on a hierarchical graph-based architecture; 2) it achieves superior performance on both visual question answering and image captioning in the few-shot setting; and 3) it can be easily extended to the semi-supervised setting, where image-text samples are partially unlabeled.
    Although various attention mechanisms have been utilized to build contextualized representations by modeling intra- and inter-modal relationships between the two modalities, one limitation of the predominant visual-semantic methods is the lack of reasoning with event correlation, that is, sensing and analyzing relationships among the abundant and informative events contained in a video. To this end, we introduce dense captions as a new auxiliary modality and distill event-correlated information from them to infer the correct answer. We propose a novel end-to-end trainable model, Event-Correlated Graph Neural Networks (EC-GNNs), to perform cross-modal reasoning over the three modalities (caption, video, and question). Besides exploiting a new modality, we employ cross-modal reasoning modules to explicitly model inter-modal relationships and aggregate relevant information across modalities, and we propose a question-guided self-adaptive multi-modal fusion module that collects question-oriented and event-correlated evidence through multi-step reasoning.
    To evaluate the proposed models, we conduct extensive experiments on the VTW, MSVD-QA, and TGIF-QA datasets for Video-QA, the Toronto COCO-QA and Visual Genome-QA datasets for few-shot Image-QA, the COCO-FITB dataset for few-shot IC, and the FVQA and Visual7W + ConceptNet datasets for KB-Image-QA. The experimental results demonstrate these models' effectiveness and superiority over baseline methods.
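    A minimal sketch (again assuming PyTorch) of a question-guided, multi-step fusion over caption, video, and question summaries, in the spirit of the EC-GNN fusion module described above. The two reasoning steps, the GRU-cell query update, and all dimensions are illustrative assumptions rather than the proposed model.

# Illustrative sketch only: question-guided multi-step fusion over three
# modality summaries (caption, video, question).
import torch
import torch.nn as nn
import torch.nn.functional as F


class QuestionGuidedFusion(nn.Module):
    def __init__(self, dim=256, steps=2):
        super().__init__()
        self.steps = steps
        self.attn = nn.Linear(2 * dim, 1)   # scores each modality against the query
        self.update = nn.GRUCell(dim, dim)  # refines the query after every step

    def forward(self, question_vec, modality_vecs):
        # modality_vecs: (B, M, D) stacked caption / video / question summaries.
        query = question_vec                 # (B, D)
        for _ in range(self.steps):
            q_exp = query.unsqueeze(1).expand_as(modality_vecs)
            scores = self.attn(torch.cat([modality_vecs, q_exp], dim=-1)).squeeze(-1)
            weights = F.softmax(scores, dim=-1)                   # (B, M)
            evidence = (weights.unsqueeze(-1) * modality_vecs).sum(dim=1)
            query = self.update(evidence, query)                  # multi-step refinement
        return query                                              # fused, question-oriented evidence


# Example usage with a batch of 4 and three modality summaries:
fusion = QuestionGuidedFusion(dim=256)
fused = fusion(torch.randn(4, 256), torch.randn(4, 3, 256))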

    The Impact of Co-presence and Visual Elements in 3D VLEs on Interpersonal Emotional Connection in Telecollaboration

    The purpose of this study is to examine participants' perception of the usefulness of the visual elements in 3D Virtual Learning Environments (VLEs), which represent co-presence, in developing interpersonal emotional connections with their partners in the initial stage of telecollaboration. To fulfill this purpose, two Japanese students and two American students were paired and participated in conversational sessions in two different virtual environments: one where they shared the environment with their partners and one where they did not. The participants had five twenty-minute conversational sessions in Japanese in Second Life. Following a single-subject research design, quantitative data were obtained from a Likert scale adapted from a measure of social presence, while qualitative data were obtained from participants' narrative reflections and conversation analysis. Both kinds of data were analyzed together, and the following conclusions were reached: (1) learners may find avatars useful as cues for remembering the content of the conversation; (2) 3D VLEs may help native speakers, or non-native speakers with higher proficiency, to reinforce emotional connections; (3) for non-native speakers, 3D VLEs may bring a positive effect, a sense of connection with their partners, and a negative effect, discomfort; and (4) other factors, such as the topic of conversation, increasingly affect emotional connections as the collaboration goes on.

    Context visuals in L2 listening tests: the effectiveness of photographs and video vs. audio-only format

    Although visual support in the form of pictures and video has been widely used in language teaching, there appears to be a dearth of research on the role of visual aids in L2 listening tests (Buck, 2000; Ockey, 2007) and an absence of sound theoretical perspectives on this issue (Ginther, 2001; Gruba, 1999). Existing studies of the role of visual support in L2 listening tests have yielded inconclusive results: while some showed that visuals can improve test-takers' performance on L2 listening tests (e.g., Ginther, 2002), others revealed no facilitative effect of visuals on test-takers' listening comprehension (e.g., Coniam, 2001; Gruba, 1993; Ockey, 2007). The present study, conducted at Iowa State University in Spring 2008, investigated the influence of context visuals, namely a single photograph and video, on test-takers' performance on a computer-based Listening Test developed specifically for this study. The Listening Test, consisting of six listening passages and 30 multiple-choice questions, was administered to 34 international students from three English listening classes. In particular, the study examined whether test-takers perform differently on three types of listening passages: passages with a single photograph, video-mediated passages, and audio-only passages. In addition, participants' responses on the Post-Test Questionnaire were analyzed to determine whether their preferences for visual stimuli in listening tests corresponded with their actual performance on the different passage types. The results indicated that while no difference was found between the scores for photo-mediated and audio-only passages, participants' performance on video-mediated passages was significantly lower.

    DIGITAL LITERACY IN FOREIGN LANGUAGE THROUGH TEXT MINING AND FANFICTION WRITING

    This study investigates how digital literacy in a foreign language (FL) may be supported by the use of a digital resource that can aid the processes of reading and writing. The research is based on studies by Feldman and Sanger (2006) on text mining and on research by Black (2007; 2009) on the incorporation of fanfiction (texts based on existing media), a text genre typical of the Internet, into language learning. Using the text mining tool Sobek, which extracts frequent terms from a text, the participants of this study created digital media narratives in English as a foreign language. The undergraduate Brazilian students who participated in the research used Sobek to mediate the production of fanfictions. In the proposed task, each student read a fanfiction and used the mining tool to develop graphs of recurrent terms found in the story. The data analysis showed that the use of a digital tool supported text production in the FL and the ensuing practice of digital literacy, as the authors relied on the mining resource to create new fanfictions.
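    The abstract describes Sobek as extracting frequent terms from a text and producing graphs of recurrent terms. The sketch below does not reproduce Sobek's actual interface; it is a generic Python illustration of that kind of processing: counting frequent terms and linking terms that co-occur within a small window. The input file name, window size, and thresholds are hypothetical.

# Illustrative sketch only (not Sobek's API): frequent-term extraction and a
# simple co-occurrence graph over a story text.
from collections import Counter
import re


def frequent_terms(text: str, top_n: int = 10, min_len: int = 4):
    """Return the most frequent terms of at least `min_len` characters."""
    words = re.findall(r"[a-zA-Z']+", text.lower())
    words = [w for w in words if len(w) >= min_len]
    return Counter(words).most_common(top_n)


def cooccurrence_edges(text: str, terms, window: int = 10):
    """Link two frequent terms whenever they appear within `window` words."""
    words = re.findall(r"[a-zA-Z']+", text.lower())
    term_set = {t for t, _ in terms}
    edges = Counter()
    for i, w in enumerate(words):
        if w not in term_set:
            continue
        for other in words[i + 1:i + window]:
            if other in term_set and other != w:
                edges[tuple(sorted((w, other)))] += 1
    return edges


story = open("fanfiction.txt", encoding="utf-8").read()  # hypothetical input file
terms = frequent_terms(story)
graph = cooccurrence_edges(story, terms)
print(terms)                     # recurring terms a writer might reuse
print(graph.most_common(5))      # strongest term-to-term links for the graph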

    The role of content-rich visuals in the L2 academic listening assessment construct

    Despite the growing recognition that second language (L2) listening is a skill incorporating the ability to process visual information along with the auditory stimulus, standardized L2 listening assessments have predominantly operationalized this skill as visual-free (Buck, 2001; Kang, Gutierrez Arvizu, Chaipuapae, & Lesnov, 2016). This study attempted to clarify the nature of the L2 academic listening assessment construct with regard to the role of visual information. This goal was achieved by developing an interpretive argument for including video-based visuals in L2 academic listening tests, with particular attention to content-related visuals that provide graphical illustration, description, or explanation of the auditory listening message. Using Kane's validity framework, the explanation inference was of primary concern because it is used to justify the measured construct (Kane, 1992; 2004; 2006; 2013). The explanation inference was supported by two types of evidence.
    First, the performances of 143 English as a second language (ESL) and English as a foreign language (EFL) students on an academic English listening comprehension test were quantitatively analyzed for the effect of delivery mode (audio-only vs. video-based) and its relationships with test-takers' listening proficiency (lower vs. higher), item video-dependence (whether or not an item was cued by the video), item type (local vs. global), and viewing behavior (self-reported on a scale from 1, did not watch the video, to 5, watched all of the video). Analyses were based on both classical test theory (ANOVA and correlations) and item response theory (Rasch analysis). In the video-based version of the test, content-rich videos were used, defined as videos containing relevant graphical content-related visual cues for 60% of the video length. The findings showed that video-dependent items were easier with videos than without for both lower-level and higher-level test-takers, regardless of item type. Video-independent items were unexpectedly harder with videos in general; in particular, video-independent global items were harder in the video-based mode than in the audio-only mode for lower-level test-takers. Viewing behavior had a weak positive relationship with listening comprehension, regardless of proficiency.
    Second, stakeholders' perceptions about using content-rich videos were investigated. Using a questionnaire, the same 143 test-takers reported their perceptions of test difficulty, motivation towards listening, listening authenticity, and whether content-rich videos should be used in high-stakes academic listening tests; the effects of mode and proficiency on these perceptions were examined. Similarly, 310 ESL and EFL teachers provided their opinions about the effects of content-rich videos on listening difficulty, motivation, authenticity, and on using content-rich videos in L2 listening tests; the effects of teachers' background (professional location, education level, and teaching-related experience) on their perceptions were examined. Test-takers found the video-based mode easier than the audio-only mode; however, their perceptions of motivation, authenticity, and video use were not affected by mode, and they were in favor of including content-rich videos in L2 academic listening tests. Teachers were more favorable towards the video-based mode than the audio-only mode in terms of listening difficulty, motivation, authenticity, and using videos in L2 academic listening tests.
    The study discussed how these findings support the interpretive argument for including content-rich video-based visual information in the assessment construct of L2 academic listening comprehension. Challenges revealed by the findings were also addressed, and limitations acknowledged. The study also offers theoretical and practical implications for the field of L2 assessment; as its primary implication, it recommends that test developers start using content-rich visual information in L2 academic listening tests.
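    The Rasch analysis mentioned above models the probability of a correct response as a function of the gap between a test-taker's ability and an item's difficulty. The short Python sketch below shows that relationship; the ability and difficulty values are invented for illustration and are not estimates from the study.

# Illustrative sketch only: the dichotomous Rasch item response function.
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """P(correct) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


# An item becoming "easier with video" corresponds to a lower difficulty
# estimate in the video-based mode, hence a higher success probability at the
# same ability level (all numbers below are hypothetical).
theta = 0.5                                        # test-taker ability in logits
print(rasch_probability(theta, difficulty=0.8))    # audio-only mode, harder item
print(rasch_probability(theta, difficulty=0.2))    # video-based mode, easier item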

    Investigating the Instructional and Assessment Strategies That Teachers Use in Reading Classes in Elementary Schools: A UAE Study

    The purpose of this study is to investigate the instructional strategies and assessment strategies that teachers use in their reading classes, and to explore the types of reading difficulties students face while reading in English. The research is based on three questions. Firstly, what kind of instructional strategies do elementary school teachers use in their reading classes? Secondly, what type of assessment strategies do elementary school teachers use to assess reading? Lastly, what do elementary school teachers perceive as the difficulties that their ESL learners face while reading in English? To answer these questions, the researcher employed a combination of qualitative and quantitative methods: a questionnaire and classroom observations. The questionnaire was distributed to 186 teachers in 13 public schools in Al Ain, and the researcher observed six teachers from three different grades. The results showed that the reading strategies teachers preferred were predicting the content of the story from the cover picture or title, reading aloud, and retelling the story. Most teachers frequently used answering questions, reading aloud, and retelling the story as assessment strategies in their classrooms. In addition, the teachers viewed students' inability to guess the meaning of words from the text as a reading difficulty. Moreover, the study revealed that elementary school students in Al Ain faced difficulties in reading: they had trouble pronouncing words, using the features of the text, and identifying the main idea of the text.

    Effects on reading comprehension using a four square vocabulary design

    The purpose of this study was to determine whether utilizing a four square vocabulary box when presented with difficult science text would improve student reading comprehension. Three high school students who receive special education services participated in the eight-week study. Participants received explicit instruction from the author on how to use a four square vocabulary box when reading expository text. In addition, participants were given an overview of lesson objectives, background knowledge was introduced, comprehension of the text was assessed, and vocabulary was reviewed. A pretest and posttest were given to determine whether the use of a four square vocabulary box would increase student reading comprehension. Results indicated that student reading levels did not increase. This study raises questions about how to instruct students with disabilities to understand difficult vocabulary found in expository text and whether more practice time is needed for students to improve reading comprehension.

    Reading in the Content Area: Its Impact on Teaching in the Social Studies Classroom

    This study focused on evaluating the sufficiency of research in reading in the content area used to instruct classroom teachers. The research used was conducted between 1970 and 2000 and incorporated into textbooks written between 1975 and 2005. Studies examined were those reported in the following journals: Review of Educational Research, Review of Research in Education, Social Education, Theory and Research in Social Education, Reading Research Quarterly, and Research in the Teaching of English. Some attention was also given to two major educational curriculum and issues journals, Educational Leadership and Phi Delta Kappan, as these sources might identify relevant research studies for further investigation. References cited in more than one text helped identify and establish a baseline of those studies considered most significant by textbook authors. The findings of this study showed that the majority of citations looked at the following themes:
    - Learners acquire meaning from the printed page through thought.
    - Reading can and should be done for different purposes using a variety of materials.
    - A number of techniques can be used to teach reading skills.
    - Reading materials need to be selected according to changes in a child's interests.
    - Reading ability is the level of reading difficulty that students can cope with; it depends on ability rather than age or grade level.
    - Readability contributes to both the reader's degree of comprehension and the need for teacher assistance when reading difficulty exceeds the reader's capability.
    - Reading instruction, in some form, needs to be carried on into the secondary grades.
    Research findings from the 1970s were concerned with reading strategies, reading skills, reading comprehension, readability, attitudes towards reading, vocabulary, study skills, and content area reading programs. In the 1980s, research cited in content area reading books looked at reading comprehension, reading skills, vocabulary, learning strategies, curriculum issues, purposes for reading and writing, content area reading programs, readability, schema theory, thinking skills, summarizing, comprehension strategies, and cooperative learning. By the 1990s, more research cited in content area reading books focused on reading strategies, curriculum issues, how to read documents and graphs, reading skills, vocabulary, attitudes towards reading, reading comprehension, and activating background knowledge.