
    An Ontology based Text-to-Picture Multimedia m-Learning System

    Multimedia Text-to-Picture is the process of building a mental representation from words associated with images. From a research perspective, multimedia instructional message items are illustrations of material, using words and pictures, designed to promote user comprehension. Illustrations can be presented in a static form, such as images, symbols, icons, figures, tables, charts, and maps, or in a dynamic form, such as animations or video clips. Owing to the intuitiveness and vividness of visual illustration, many text-to-picture systems have been proposed in the literature, such as Word2Image, Chat with Illustrations, and others discussed in the literature review chapter of this thesis. However, these systems share some common limitations, especially in the images they present: the retrieved materials are not fully suitable for educational purposes. Many of them are not context-based and do not take the needs of learners into consideration (i.e., they are general-purpose images). Manually finding suitable pedagogic images to illustrate educational content for learners is inefficient and requires enormous effort, which makes it a very challenging task. In addition, the available learning systems that mine text based on keyword or sentence selection provide incomplete pedagogic illustrations, because words and their semantically related terms are not considered when finding illustrations. In this dissertation, we propose new approaches based on a semantic conceptual graph and semantically distributed weights to mine optimal illustrations that match Arabic text in the children's-story domain. We combine these approaches with the best keyword- and sentence-selection algorithms in order to improve the retrieval of images matching the Arabic text. Our findings show significant improvements in modelling Arabic vocabulary with the most meaningful images and the best coverage of the domain of discourse. We also develop a mobile Text-to-Picture system with two novel features: (1) a conceptual graph visualization (CGV) and (2) a visual illustrative assessment. The CGV shows the relationships between terms associated with a picture, enabling learners to discover the semantic links between Arabic terms and improve their understanding of Arabic vocabulary. The assessment component allows the instructor to automatically follow up on learners' performance. Our experiments demonstrate the efficiency of our multimedia text-to-picture system in enhancing learners' knowledge and boosting their comprehension of Arabic vocabulary.
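
    The abstract gives no implementation details, but the core retrieval idea (scoring candidate images by a semantically weighted match between text terms and image tags) can be sketched roughly as follows. This is a minimal illustration under assumed inputs: the function names, the toy exact-match similarity, and the weight dictionary are placeholders, not the dissertation's actual conceptual-graph method.

```python
# Minimal sketch of semantically weighted text-to-picture retrieval
# (hypothetical names; the dissertation derives term weights from a
# semantic conceptual graph rather than a plain dictionary).

def score_image(text_terms, image_tags, term_weights, similarity):
    """Score one candidate image against the terms selected from the text.

    term_weights : importance weight per text term (e.g., TF-IDF-style)
    similarity   : function (term, tag) -> semantic similarity in [0, 1]
    """
    score = 0.0
    for term in text_terms:
        # Credit each term by its best-matching image tag, scaled by the
        # term's distributed weight in the source text.
        best = max((similarity(term, tag) for tag in image_tags), default=0.0)
        score += term_weights.get(term, 1.0) * best
    return score

def best_illustrations(text_terms, candidates, term_weights, similarity, k=3):
    """Return the k highest-scoring image ids from {image_id: [tags]}."""
    ranked = sorted(
        candidates.items(),
        key=lambda item: score_image(text_terms, item[1], term_weights, similarity),
        reverse=True,
    )
    return [image_id for image_id, _ in ranked[:k]]

# Toy usage with an exact-match similarity standing in for a semantic one.
exact = lambda a, b: 1.0 if a == b else 0.0
images = {"img1": ["cat", "tree"], "img2": ["dog", "ball"]}
print(best_illustrations(["cat"], images, {"cat": 2.0}, exact, k=1))  # ['img1']
```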

    Intelligent iconic pictorial database system

    Ph.D. (Doctor of Philosophy)

    Learning Transferable Spatiotemporal Representations from Natural Script Knowledge

    Pre-training on large-scale video data has become a common recipe for learning transferable spatiotemporal representations in recent years. Despite some progress, existing methods are mostly limited to highly curated datasets (e.g., K400) and exhibit unsatisfactory out-of-the-box representations. We argue that this is because they capture only pixel-level knowledge rather than spatiotemporal commonsense, which is far from cognition-level video understanding. Inspired by the great success of image-text pre-training (e.g., CLIP), we take the first step towards exploiting language semantics to boost transferable spatiotemporal representation learning. We introduce a new pretext task, Turning to Video for Transcript Sorting (TVTS), which sorts shuffled ASR scripts by attending to learned video representations. We do not rely on descriptive captions and learn purely from video, i.e., we leverage natural transcribed-speech knowledge to provide noisy but useful semantics over time. Furthermore, rather than the simple concept learning of vision-caption contrast, we encourage cognition-level temporal commonsense reasoning via narrative reorganization. These advantages enable our model to contextualize what is happening, as humans do, and to apply seamlessly to large-scale uncurated video data in the real world. Note that our method differs from methods designed for video-text alignment (e.g., Frozen) and multimodal representation learning (e.g., Merlot). Our method demonstrates strong out-of-the-box spatiotemporal representations on diverse video benchmarks, e.g., +13.6% gains over VideoMAE on SSV2 via linear probing.
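
    As a rough illustration of such a transcript-sorting pretext objective (not the paper's actual architecture), the sketch below lets shuffled transcript-segment embeddings attend to video features and classifies each segment into a position slot. The dimensions, the single cross-attention layer, and the linear classification head are all assumptions.

```python
# Hypothetical sketch of a transcript-sorting pretext task in the spirit
# of TVTS: predict the original order of shuffled transcript segments by
# attending to video features. All sizes and layers are assumptions.
import torch
import torch.nn as nn

class TranscriptSorter(nn.Module):
    def __init__(self, dim=256, num_segments=4):
        super().__init__()
        # Shuffled text segments attend to video frame features.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        # Classify each video-contextualized segment into a position slot.
        self.position_head = nn.Linear(dim, num_segments)

    def forward(self, text_feats, video_feats):
        # text_feats:  (batch, num_segments, dim), segments in shuffled order
        # video_feats: (batch, num_frames, dim)
        attended, _ = self.cross_attn(text_feats, video_feats, video_feats)
        return self.position_head(attended)  # (batch, num_segments, num_segments)

# Toy training step on random tensors.
model = TranscriptSorter()
text = torch.randn(2, 4, 256)    # embeddings of 4 shuffled transcript segments
video = torch.randn(2, 16, 256)  # embeddings of 16 video frames
true_positions = torch.stack([torch.randperm(4) for _ in range(2)])
logits = model(text, video)
loss = nn.functional.cross_entropy(logits.reshape(-1, 4), true_positions.reshape(-1))
loss.backward()
```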

    Deficits of knowledge versus executive control in semantic cognition: Insights from cued naming

    Deficits of semantic cognition in semantic dementia and in aphasia consequent on CVA (stroke) are qualitatively different. Patients with semantic dementia are characterised by progressive degradation of central semantic representations, whereas multimodal semantic deficits in stroke aphasia reflect impairment of the executive processes that help to direct and control semantic activation in a task-appropriate fashion [Jefferies, E., & Lambon Ralph, M. A. (2006). Semantic impairment in stroke aphasia vs. semantic dementia: A case-series comparison. Brain, 129, 2132-2147]. We explored interactions between these two aspects of semantic cognition by examining the effects of cumulative phonemic cueing on picture naming in case series of these two patient types. The stroke aphasic patients with multimodal semantic deficits cued very readily and demonstrated near-perfect name retrieval when cumulative phonemic cues reached or exceeded the target name's uniqueness point. Knowledge of the picture names was therefore largely intact for the aphasic patients, but they were unable to retrieve this information without cues that helped to direct activation towards the target response. Equivalent phonemic cues brought significant but much more limited benefit to the semantic dementia patients: their naming remained severely impaired even when most of the word had been provided. In contrast to the pattern in the stroke aphasia group, successful cueing was mainly confined to the more familiar unnamed pictures. We propose that this limited cueing effect in semantic dementia follows from the fact that concepts deteriorate in a graded fashion [Rogers, T. T., Lambon Ralph, M. A., Garrard, P., Bozeat, S., McClelland, J. L., Hodges, J. R., et al. (2004). The structure and deterioration of semantic memory: A neuropsychological and computational investigation. Psychological Review, 111, 205-235]. For partially degraded items, the residual conceptual knowledge may be insufficient to drive speech production to completion, but these items may reach threshold when bolstered by cues. © 2007 Elsevier Ltd. All rights reserved.
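
    The uniqueness point referred to above, i.e. the position at which a word's initial segments no longer match any other word in the lexicon, can be made concrete with a small sketch. This toy version uses letters in place of phonemes and an assumed mini-lexicon; it is illustrative only.

```python
# Illustrative computation of a word's uniqueness point over a toy
# lexicon, with letters standing in for phonemes.

def uniqueness_point(word, lexicon):
    """1-based length of the shortest prefix of `word` shared with no
    other lexicon entry (falls back to the full word length)."""
    others = [w for w in lexicon if w != word]
    for n in range(1, len(word) + 1):
        if not any(other.startswith(word[:n]) for other in others):
            return n
    return len(word)

lexicon = ["camel", "camera", "candle", "cat"]
for w in lexicon:
    print(w, uniqueness_point(w, lexicon))
# "candle" diverges from the rest at its third segment ("can..."),
# while "camel" and "camera" only separate at their fifth segment.
```

    On this toy account, a cumulative cue reaching "can" would suffice for "candle": the finding that the aphasic patients succeeded once cues reached this point is what supports intact lexical knowledge alongside impaired controlled retrieval.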

    Knowledge transformers : a link between learning and creativity

    The purpose of this paper is to investigate whether knowledge transformers that are featured in the learning process are also present in the creative process. First, this was achieved by reviewing accounts of inventions and discoveries with a view to explaining them in terms of knowledge transformers. Second, it was achieved by reviewing models and theories of creativity and identifying the presence of the knowledge transformers. The investigation shows some evidence that the creative process can be explained through knowledge transformers. Hence, it is suggested that one of the links between learning and creativity is through the knowledge transformers.

    The Emotional Facet of Subjective and Neural Indices of Similarity.

    Emotional similarity refers to the tendency to group stimuli together because they evoke the same feelings in us. The majority of research on similarity perception conducted to date has focused on non-emotional stimuli. Different models have been proposed to explain how we represent semantic concepts and judge the similarity among them; these models are supported by behavioural and neural evidence, often combined using multivariate pattern analyses. By contrast, less is known about the cognitive and neural mechanisms underlying judgements of similarity between real-life emotional experiences. This review summarizes the major findings, debates, and limitations in the semantic-similarity literature, which serve as background to the emotional facet of similarity that is its focus. A multi-modal and overarching approach, relating different levels of neuroscientific explanation (i.e., computational, algorithmic, and implementational), would be key to further unveiling what makes emotional experiences similar to each other.
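
    As one concrete instance of the multivariate pattern analyses mentioned above, representational similarity analysis (RSA) compares a neural dissimilarity structure against behavioural similarity judgements. The sketch below runs on random placeholder data; the array shapes and the Spearman rank correlation are generic RSA conventions, not details of any particular study reviewed here.

```python
# Minimal representational similarity analysis (RSA) sketch on toy data:
# correlate a neural dissimilarity structure with behavioural ratings.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_stimuli, n_voxels = 8, 100

# Placeholder voxel patterns (one row per stimulus) and behavioural
# pairwise dissimilarity ratings in condensed (upper-triangle) form.
neural_patterns = rng.normal(size=(n_stimuli, n_voxels))
behavioural_dissim = rng.uniform(size=n_stimuli * (n_stimuli - 1) // 2)

# Neural representational dissimilarity: 1 - correlation between patterns.
neural_dissim = pdist(neural_patterns, metric="correlation")

# Second-order (rank) correlation between the two dissimilarity structures.
rho, p = spearmanr(neural_dissim, behavioural_dissim)
print(f"RSA correlation: rho={rho:.2f}, p={p:.3f}")
```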

    Contextual Bag-Of-Visual-Words and ECOC-Rank for Retrieval and Multi-class Object Recognition

    UPC Master's final project carried out in collaboration with the Dept. of Applied Mathematics and Analysis, Universitat de Barcelona.

    Multi-class object categorization is an important line of research in the Computer Vision and Pattern Recognition fields. An artificially intelligent system can interact with its environment only if it can distinguish among a set of cases, instances, situations, objects, etc. The world is inherently multi-class, and thus the efficiency of a system can be determined by its accuracy in discriminating among a set of cases. A recently applied procedure in the literature is the Bag-Of-Visual-Words (BOVW). This methodology is based on natural language processing theory, where sentences are defined in terms of word frequencies. Analogously, in the pattern-recognition domain, an object is described by the frequency of appearance of its parts. However, a general drawback of this method is that the dictionary construction does not take geometrical information about object parts into account. In order to include part relations in the BOVW model, we propose the Contextual BOVW (C-BOVW), where the dictionary construction is guided by a geometrically based merging procedure. As a result, objects are described as sentences in which geometrical information is implicitly considered. In order to extend the proposed system to the multi-class case, we used the Error-Correcting Output Codes (ECOC) framework. State-of-the-art multi-class techniques are frequently defined as ensembles of binary classifiers. In this sense, the ECOC framework, based on error-correcting principles, has proved to be a powerful tool, able to classify a huge number of classes while correcting classification errors produced by the individual learners. In our case, the C-BOVW sentences are learnt by means of an ECOC configuration, obtaining high discriminative power. Moreover, we used the ECOC outputs obtained by the new methodology to rank classes. In some situations, more than one label is required to work with multiple hypotheses and find similar cases, as in the well-known retrieval problems. In this sense, we also included contextual and semantic information to modify the ECOC outputs and defined an ECOC-rank methodology. By altering the ECOC output values by means of class adjacency based on features and class relations based on ontologies, we also report a significant improvement in class-retrieval problems.
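
    For orientation, the baseline BOVW pipeline that C-BOVW extends (cluster local descriptors into a visual vocabulary, then describe each image as a histogram of visual-word frequencies) can be sketched as follows. The descriptor dimensionality and vocabulary size are arbitrary assumptions, and the geometry-guided merging and ECOC-rank steps described above are deliberately omitted.

```python
# Baseline Bag-Of-Visual-Words sketch; the C-BOVW variant additionally
# merges visual words using geometric context, which is omitted here.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-in local descriptors (SIFT, for instance, yields 128-D vectors);
# here: 10 images with 50 random descriptors each.
descriptors_per_image = [rng.normal(size=(50, 128)) for _ in range(10)]

# 1) Build the visual vocabulary by clustering all local descriptors.
vocab_size = 32
kmeans = KMeans(n_clusters=vocab_size, n_init=10, random_state=0)
kmeans.fit(np.vstack(descriptors_per_image))

# 2) Describe each image as a normalized histogram of visual-word counts,
#    i.e. the "sentence" of word frequencies used by the classifier.
def bovw_histogram(descriptors):
    words = kmeans.predict(descriptors)
    hist = np.bincount(words, minlength=vocab_size).astype(float)
    return hist / hist.sum()

histograms = np.array([bovw_histogram(d) for d in descriptors_per_image])
print(histograms.shape)  # (10, 32): one visual-word histogram per image
```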