378 research outputs found

    Symbolic and Visual Retrieval of Mathematical Notation using Formula Graph Symbol Pair Matching and Structural Alignment

    Large data collections containing millions of math formulae in different formats are available online, and retrieving math expressions from these collections is challenging. We propose a framework for retrieving mathematical notation that uses symbol pairs extracted from visual and semantic representations of mathematical expressions, applied first in the symbolic domain for retrieval from text documents. We further adapt our model to retrieve mathematical notation from images and lecture videos. Graph-based representations are used in each modality to describe math formulas. For symbolic formula retrieval, where the structure is known, we use symbol layout trees and operator trees. For image-based formula retrieval, where the structure is unknown, we use a more general Line of Sight graph representation. Paths in these graphs define symbol-pair tuples that are used as the entries of our inverted index of mathematical notation. Our retrieval framework uses a three-stage approach: a fast selection of candidates in the first stage, a more detailed matching algorithm with similarity metric computation in the second stage, and, when relevance assessments are available, an optional third stage that uses linear regression over multiple similarity scores to estimate relevance for final re-ranking. Our model has been evaluated using large collections of documents, and preliminary results are presented for videos and cross-modal search. The proposed framework can be adapted to other domains, such as chemistry or technical diagrams, where two visually similar elements from a collection are usually related to each other.
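
As a rough illustration of the symbol-pair inverted index described in the abstract above, the sketch below indexes toy layout trees and ranks candidates by shared pairs. The tree encoding, the relation codes (`n` for next, `a` for above), the `max_dist` cutoff, and the Dice-style score are all assumptions made for this example; they are not taken from the paper and cover only a first-stage candidate selection.

```python
from collections import defaultdict

def symbol_pairs(tree, max_dist=2):
    """Collect (ancestor, descendant, relation_path) tuples from a toy tree.

    A node is (symbol, [(relation, child_node), ...]); relations are
    single-character codes, so len(path) is the distance between symbols.
    """
    pairs = []

    def walk(node, ancestors):
        symbol, children = node
        pairs.extend((anc, symbol, path) for anc, path in ancestors)
        for relation, child in children:
            # Keep only ancestors that stay within max_dist after this step.
            nxt = [(anc, path + relation) for anc, path in ancestors
                   if len(path) < max_dist]
            nxt.append((symbol, relation))
            walk(child, nxt)

    walk(tree, [])
    return pairs

def build_index(formulas):
    """Map each symbol pair to the set of formula ids that contain it."""
    index, sizes = defaultdict(set), {}
    for fid, tree in formulas.items():
        pairs = set(symbol_pairs(tree))
        sizes[fid] = len(pairs)
        for pair in pairs:
            index[pair].add(fid)
    return index, sizes

def search(query_tree, index, sizes):
    """Rank indexed formulas by a Dice coefficient over matched pairs."""
    query = set(symbol_pairs(query_tree))
    hits = defaultdict(int)
    for pair in query:
        for fid in index.get(pair, ()):
            hits[fid] += 1
    scored = [(2 * m / (len(query) + sizes[fid]), fid) for fid, m in hits.items()]
    return sorted(scored, key=lambda item: (-item[0], item[1]))

# Hypothetical toy collection: x^2 + z, a + y, x^2 + y.
FORMULAS = {
    "f1": ("x", [("a", ("2", [])), ("n", ("+", [("n", ("z", []))]))]),
    "f2": ("a", [("n", ("+", [("n", ("y", []))]))]),
    "f3": ("x", [("a", ("2", [])), ("n", ("+", [("n", ("y", []))]))]),
}
INDEX, SIZES = build_index(FORMULAS)
QUERY = ("x", [("a", ("2", [])), ("n", ("+", [("n", ("y", []))]))])
RESULTS = search(QUERY, INDEX, SIZES)
print(RESULTS)  # ranked f3, f1, f2
```

In a fuller pipeline, the candidates returned here would be passed to a more detailed structural matching stage; that stage is not sketched.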

    Artificial Intelligence methodologies to early predict student outcome and enrich learning material

    The abstract is in the attachment.

    Emerging technologies for learning report (volume 3)

    Multimodal behavioral and physiological signals as indicators of cognitive load

    Multimedia Development of English Vocabulary Learning in Primary School

    In this paper, we describe a prototype of a web-based intelligent handwriting education system for autonomous learning of Bengali characters. Bengali is used by more than 211 million people in India and Bangladesh. Because of socio-economic limitations, not everyone in this population has the chance to go to school. This research project aimed to develop an intelligent Bengali handwriting education system. As an intelligent tutor, the system automatically checks for handwriting errors, such as stroke production errors, stroke sequence errors, and stroke relationship errors, and immediately gives students feedback so they can correct themselves. The system can be accessed from a smartphone or iPhone, which allows students to practice their Bengali handwriting anytime and anywhere. Bengali characters are multi-stroke, with extremely long cursive shapes and variability in both stroke order and stroke direction. Because of this structural complexity, recognition speed is a crucial issue when applying traditional online handwriting recognition algorithms to Bengali language learning. In this work, we adopt a hierarchical recognition approach to improve recognition speed, which makes our system suitable for web-based language learning. We combine a writing-speed-independent recognition methodology with the hierarchical recognition algorithm, which makes the system usable by learners of all ages, especially children and older citizens. The experimental results show that the proposed hierarchical recognition algorithm provides higher accuracy than a traditional multi-stroke recognition algorithm under greater writing variability.

    Multi-Modal Deep Learning to Understand Vision and Language

    Developing intelligent agents that can perceive and understand the rich visual world around us has been a long-standing goal in the field of artificial intelligence. In the last few years, significant progress has been made towards this goal, and recent advances in general visual and language understanding have been attributed to deep learning. Convolutional neural networks have been used to learn image representations, while recurrent neural networks have demonstrated the ability to generate text from visual stimuli. In this thesis, we develop methods and techniques using hybrid convolutional and recurrent neural network architectures that connect visual data and natural language utterances. The work is divided into two broad parts. In the first part, we introduce a general-purpose attention mechanism, modeled using a continuous function, for video understanding. The use of an attention-based hierarchical approach along with automatic boundary detection advances state-of-the-art video captioning results. We also develop techniques for summarizing and annotating long videos. In the second part, we introduce architectures and training techniques that produce a common connection space where natural language sentences are efficiently and accurately connected with visual modalities. In this connection space, similar concepts lie close together, while dissimilar concepts lie far apart, irrespective of their modality. We discuss four modality transformations: visual to text, text to visual, visual to visual, and text to text. We introduce a novel attention mechanism to align multi-modal embeddings, which are learned through a multi-modal metric loss function. The common vector space is shown to enable bidirectional generation of images and text. The learned common vector space is evaluated on multiple image-text datasets for cross-modal retrieval and zero-shot retrieval. The models are shown to advance the state of the art on tasks that require joint processing of images and natural language.
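
To make the idea of a common vector space concrete, here is a minimal sketch of cross-modal retrieval by cosine similarity. The three-dimensional embeddings and their names are hand-written placeholders invented for this example; in the thesis such vectors would be produced by learned encoders, which are not reproduced here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical embeddings already projected into one shared space.
IMAGE_EMBEDDINGS = {
    "img_dog": [0.9, 0.1, 0.0],
    "img_cat": [0.7, 0.6, 0.1],
    "img_car": [0.0, 0.2, 0.9],
}
TEXT_EMBEDDINGS = {
    "a dog running": [0.8, 0.2, 0.1],
    "a parked car": [0.1, 0.1, 0.95],
}

def retrieve(query_vec, gallery, k=1):
    """Return the k gallery keys closest to the query, whatever their modality."""
    ranked = sorted(gallery, key=lambda name: cosine(query_vec, gallery[name]),
                    reverse=True)
    return ranked[:k]

# Text-to-image and image-to-text use the same function because both
# modalities live in the same space.
print(retrieve(TEXT_EMBEDDINGS["a dog running"], IMAGE_EMBEDDINGS))
print(retrieve(IMAGE_EMBEDDINGS["img_car"], TEXT_EMBEDDINGS))
```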

    Temporal Segmentation of Video Lectures: a speech-based optimization framework

    Video lectures are very popular nowadays. Following new teaching trends, students increasingly seek educational videos on the web for many different purposes: to learn something new, to review content for exams, or just out of curiosity. Unfortunately, finding specific content in this type of video is not an easy task. Many video lectures are long and cover several topics, and not all of these topics are relevant to the user who has found the video. As a result, the user spends a lot of time trying to find a topic of interest among content that is irrelevant to them. Temporal segmentation of video lectures into topics can solve this problem by allowing users to navigate non-linearly through the topics of a video lecture. However, temporal segmentation of video lectures is a time-consuming task that must be automated. For this reason, in this work we propose an optimization framework for the temporal video lecture segmentation problem. Our proposal uses only information from the teacher's speech, so it does not depend on additional resources such as slides, textbooks, or manually generated subtitles. This makes our proposal versatile: it can be applied to a wide range of video lectures, since it requires only that the teacher's speech be present in the video. To do this, we formulate the problem as a linear programming model in which we combine prosodic and semantic features of speech that may indicate topic transitions. To optimize this model, we use an elitist genetic algorithm with local search. Through experiments, we evaluate different aspects of our approach, such as its sensitivity to parameter variation and its convergence behavior. We also show that our method outperforms state-of-the-art methods, in both Recall and F1-Score, on two different datasets of video lectures. Finally, we provide the implementation of our framework so that other researchers can contribute and reproduce our results.
    Funding: CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
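
The sketch below shows the general shape of an elitist genetic algorithm with local search applied to choosing topic boundaries. Everything specific in it is an assumption for illustration: the per-position `SCORES` stand in for combined prosodic and semantic cues, and `COST`, `MIN_GAP`, the toy objective, and the operators (tournament selection, one-point crossover, bit-flip mutation, hill climbing) are not taken from the authors' framework.

```python
import random

# Hypothetical topic-transition scores, one per candidate boundary position.
SCORES = [0.1, 0.8, 0.7, 0.2, 0.05, 0.9, 0.3, 0.15, 0.85, 0.1]
COST = 0.5    # assumed fixed cost for placing a boundary
MIN_GAP = 2   # assumed minimum spacing; closer boundaries are penalized

def fitness(bits):
    """Toy objective: reward high-scoring boundaries, penalize crowding."""
    gain = sum(score - COST for score, bit in zip(SCORES, bits) if bit)
    chosen = [i for i, bit in enumerate(bits) if bit]
    crowded = sum(1 for a, b in zip(chosen, chosen[1:]) if b - a < MIN_GAP)
    return gain - crowded

def local_search(bits):
    """Bit-flip hill climbing: accept any single flip that improves fitness."""
    best = list(bits)
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            candidate = best[:]
            candidate[i] ^= 1
            if fitness(candidate) > fitness(best):
                best, improved = candidate, True
    return best

def evolve(generations=30, pop_size=20, elite=2, mutation=0.1, seed=0):
    rng = random.Random(seed)
    n = len(SCORES)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        nxt = [ind[:] for ind in pop[:elite]]  # elitism: best survive unchanged
        while len(nxt) < pop_size:
            # Tournament selection of two parents, then one-point crossover.
            a, b = (max(rng.sample(pop, 3), key=fitness) for _ in range(2))
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]
            nxt.append([bit ^ (rng.random() < mutation) for bit in child])
        nxt[0] = local_search(nxt[0])  # refine the current best individual
        pop = nxt
    return max(pop, key=fitness), history

BEST, HISTORY = evolve()
print([i for i, bit in enumerate(BEST) if bit], fitness(BEST))
```

Because the elites are copied unchanged and local search only accepts improvements, the best fitness recorded in `HISTORY` never decreases from one generation to the next.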