1,833 research outputs found

    Multiple Intelligences in the Mathematics Classroom: A Curriculum Project on Linear Equations and Inequalities in One Variable

    This curriculum project explores the use of Howard Gardner’s Multiple Intelligences theory in the mathematics classroom. The theory describes eight distinct intelligences, all of which can be found in heterogeneous classrooms, and students often show a blend of them. The project discusses methods for integrating six of these intelligences into the mathematics classroom: Verbal/Linguistic, Logical/Mathematical, Visual/Spatial, Kinesthetic, Interpersonal, and Intrapersonal. The Algebra unit it focuses on is Linear Equations and Inequalities in One Variable. The curriculum provided includes a variety of ways to incorporate the multiple intelligences theory throughout each of the thirteen lessons within the unit.

    A Review of Recent Advances and Challenges in Grocery Label Detection and Recognition

    Unlike traditional local shops, where the customer receives a personalised service, large retail stores leave customers to make purchase decisions independently, supported mostly by the information available on the package. People are also becoming more aware of the importance of food ingredients and more demanding about the type of products they buy and the information provided on the package, even though it is often hard to interpret. Big shops such as supermarkets have also introduced important challenges for the retailer, due to the large number of different products in the store, variable customer footfall and the daily need to reposition items. In this scenario, the automatic detection and recognition of products on or off the shelves has gained increased interest, as these technologies may improve the shopping experience through self-assisted shopping apps and autonomous shopping, or benefit stock management through real-time inventory, automatic shelf monitoring and product tracking. These solutions can also have an important impact on customers with visual impairments. Despite recent developments in computer vision, automatic grocery product recognition is still very challenging, with most works focusing on the detection or recognition of a small number of products, often under controlled conditions. This paper discusses the challenges related to this problem and reviews proposed methods for retail product label processing, with a special focus on assisted analysis for customer support, including for the visually impaired. It also details the public datasets used in this area, identifies their limitations, and discusses future research directions in related fields.

    Data and methods for a visual understanding of sign languages

    Signed languages are complete, natural languages used as the first or preferred mode of communication by millions of people worldwide, yet they remain marginalized. Designing, building, and evaluating models that work on sign languages presents compelling research challenges and requires interdisciplinary, collaborative effort. Recent advances in Machine Learning (ML) and Artificial Intelligence (AI) have the power to improve accessibility for sign language users and to narrow the communication barrier between the Deaf community and non-signers. However, recent AI-powered technologies still do not account for sign language in their pipelines, mainly because sign languages are visual languages that use manual and non-manual features to convey information and have no standard written form. The goal of this thesis is to contribute to the development of technologies that account for sign language, by creating large-scale multimodal resources suitable for training modern data-hungry machine learning models and by developing automatic systems for computer vision tasks aimed at a better visual understanding of sign languages. Part I introduces the How2Sign dataset, a large-scale collection of multimodal and multiview sign language videos in American Sign Language. Part II presents, in Chapter 4, a framework called Spot-Align, based on sign spotting methods, that automatically annotates sign instances in continuous sign language; we further present the benefits of this framework and establish a baseline for the sign language recognition task on How2Sign. Chapter 5 uses the different annotations and modalities of How2Sign to explore sign language video retrieval by learning cross-modal embeddings. Chapter 6 explores sign language video generation by applying Generative Adversarial Networks to the sign language domain, and proposes an evaluation protocol based on How2Sign topics and English translation to assess whether, and how well, sign language users can understand automatically generated sign language videos.

    AI-assisted patent prior art searching - feasibility study

    This study seeks to understand the feasibility, technical complexities and effectiveness of using artificial intelligence (AI) solutions to improve the operational processes of registering IP rights. The Intellectual Property Office commissioned Cardiff University to undertake this research, which was funded through the BEIS Regulators’ Pioneer Fund (RPF). The RPF was set up to help address barriers to innovation in the UK economy.

    Text Recognition in Multimedia Documents: A Study of two Neural-based OCRs Using and Avoiding Character Segmentation

    Text embedded in multimedia documents carries important semantic information that helps to access the content automatically. This paper proposes two neural-based OCRs that handle the text recognition problem in different ways. The first approach segments a text image into individual characters before recognizing them, while the second avoids the segmentation step by integrating a multi-scale scanning scheme that jointly localizes and recognizes characters at each position and scale. Some linguistic knowledge is also incorporated into the proposed schemes to remove errors due to recognition confusions. Both OCR systems are applied to caption texts embedded in videos and to text in natural scene images, and the reported results show that the proposed approaches outperform state-of-the-art methods.

    Multimedia Retrieval


    A comparative study of text detection and recognition approaches for restricted computing scenarios

    Advisors: Ricardo da Silva Torres, Allan da Silva Pinto. Master's dissertation, Universidade Estadual de Campinas, Instituto de Computação.
    Abstract: Texts are fundamental elements for effective communication in our daily lives. The mobility of people and vehicles in urban environments and the search for a product of interest on a supermarket shelf are examples of activities in which understanding the textual elements present in the environment is essential to completing the task. Recently, several advances in computer vision have been reported in the literature, with the development of algorithms and methods that aim to recognize objects and texts in scenes. However, text detection and recognition are still open problems, because several factors act as sources of variability during scene text generation and capture and can significantly affect the detection and recognition rates of current algorithms. Examples of these factors include different shapes of textual elements (e.g., circular or curved), font styles and sizes, texture, color, and brightness and contrast variation, among others. In addition, recent state-of-the-art methods based on deep learning demand high computational processing costs, which hinders their use in restricted computing scenarios. This dissertation presents a comparative study of text detection and recognition techniques, considering both methods based on deep learning and methods that use classical machine learning algorithms. It also presents an algorithm for fusing bounding boxes, based on genetic programming (GP), developed both to act as a post-processing step for a single text detector and to exploit the complementarity of the text detection algorithms investigated in this dissertation. According to the comparative study, the methods based on deep learning are more effective and less efficient than the classical text detection methods investigated, considering the adopted metrics. Furthermore, the proposed GP-based fusion algorithm was able to learn complementary information from the investigated methods, which resulted in improved precision and recall rates. The experiments were conducted on text detection problems involving horizontal, vertical and arbitrary orientations.
    Master's program in Computer Science; degree: Master in Computer Science; CAPE