68 research outputs found

    Towards robust real-world historical handwriting recognition

    In this thesis, we build a bridge from the past to the future by using artificial-intelligence methods for text recognition in a historical Dutch collection of the Natuurkundige Commissie, which explored Indonesia (1820-1850). In spite of the successes of systems like 'ChatGPT', reading historical handwriting is still quite challenging for AI. Whereas GPT-like methods work on digital texts, historical manuscripts are only available as extremely diverse collections of (pixel) images. Despite their great results, current deep learning (DL) methods are very data greedy and time consuming, depend heavily on human experts from the humanities for labeling, and require machine-learning experts to design the models. Ideally, the use of deep learning methods should require minimal human effort, have an algorithm observe the evolution of the training process, and avoid inefficient use of the already sparse labeled data. We present several approaches towards dealing with these problems, aiming to improve the robustness of current methods and the autonomy of training. We applied our novel word and line text recognition approaches to nine data sets differing in time period, language, and difficulty: three locally collected historical Latin-based data sets from Naturalis, Leiden; four public Latin-based benchmark data sets for comparability with other approaches; and two Arabic data sets. Using ensemble voting of just five neural networks, we achieved a level of accuracy that required hundreds of neural networks in earlier studies. Moreover, we increased the speed of evaluation of each training epoch without the need for labeled data.

    Apraxia World: Deploying a Mobile Game and Automatic Speech Recognition for Independent Child Speech Therapy

    Children with speech sound disorders typically improve pronunciation quality by undergoing speech therapy, which must be delivered frequently and with high intensity to be effective. As such, clinic sessions are supplemented with home practice, often under caregiver supervision. However, traditional home practice can grow boring for children due to monotony. Furthermore, practice frequency is limited by caregiver availability, making it difficult for some children to reach therapy dosage. To address these issues, this dissertation presents a novel speech therapy game to increase engagement, and explores automatic pronunciation evaluation techniques to afford children independent practice. The therapy game, called Apraxia World, delivers customizable, repetition-based speech therapy while children play through platformer-style levels using typical on-screen tablet controls; children complete in-game speech exercises to collect assets required to progress through the levels. Additionally, Apraxia World provides pronunciation feedback according to an automated pronunciation evaluation system running locally on the tablet.
Apraxia World offers two advantages over current commercial and research speech therapy games: first, the game provides extended gameplay to support long therapy treatments; second, it affords some therapy practice independence via automatic pronunciation evaluation, allowing caregivers to lightly supervise the practice instead of directly administering it. Pilot testing indicated that children enjoyed the game-based therapy much more than traditional practice and that the exercises did not interfere with gameplay. During a longitudinal study, children made clinically significant pronunciation improvements while playing Apraxia World at home. Furthermore, children remained engaged in the game-based therapy over the two-month testing period, and some even wanted to continue playing post-study. The second part of the dissertation explores word- and phoneme-level pronunciation verification for child speech therapy applications. Word-level pronunciation verification is accomplished using a child-specific template-matching framework, where an utterance is compared against correctly and incorrectly pronounced examples of the word. This framework identified mispronounced words better than both a standard automated baseline and co-located caregivers. Phoneme-level mispronunciation detection is investigated using a technique from the second-language learning literature: training phoneme-specific classifiers with phonetic posterior features. This method also outperformed the standard baseline and, more significantly, identified mispronunciations better than student clinicians.

    Legal Knowledge and Information Systems - JURIX 2017: The Thirtieth Annual Conference

    The proceedings of the 30th International Conference on Legal Knowledge and Information Systems – JURIX 2017. For three decades, the JURIX conferences have been held under the auspices of the Dutch Foundation for Legal Knowledge Based Systems (www.jurix.nl). Over time, JURIX has become a European conference in terms of its diverse venues throughout Europe and the nationalities of its participants.

    Drawing, Handwriting Processing Analysis: New Advances and Challenges

    Drawing and handwriting are communication skills that have been fundamental to geopolitical, ideological and technological developments throughout history. Drawing and handwriting are still useful in defining innovative applications in numerous fields. In this regard, researchers have to solve new problems, such as how drawing and handwriting can become an efficient way to command various connected objects, or how to validate graphomotor skills as evident and objective sources of data useful in the study of human beings, their capabilities and their limits from birth to decline.

    Pattern Recognition

    A wealth of advanced pattern recognition algorithms is emerging from the intersection of technologies for effective visual features and the human-brain cognition process. Effective visual features are made possible by rapid developments in appropriate sensor equipment, novel filter designs, and viable information processing architectures, while a better understanding of the human-brain cognition process broadens the ways in which a computer can perform pattern recognition tasks. The present book is intended to collect representative research from around the globe focusing on low-level vision, filter design, features and image descriptors, data mining and analysis, and biologically inspired algorithms. The 27 chapters covered in this book disclose recent advances and new ideas in promoting the techniques, technology and applications of pattern recognition.

    A Software Engineered Voice-Enabled Job Recruitment Portal System

    Conventional job portal systems usually depend on Internet access, and the resulting inability of job seekers to get timely information about the status of their submitted applications has caused many applicants to lose their placements. Worse still, the erratic services offered by Internet Service Providers and the poor infrastructure in most developing countries have greatly hindered the expected benefits of Internet usage. As a result, online vacancy notifications go unattended simply because a job seeker is unaware of them or has no access to the Internet. With the increasing patronage of mobile phones, a self-service job vacancy notification with audio functionality, or an automated job vacancy notification sent to all qualified job seekers through mobile phones, would provide a solution to these challenges. In this paper, we present a Voice-enabled Job Recruitment Portal (JRP) System. The system is accessed through two interfaces: the voice user interface (VUI) and the web interface. The VUI was developed using VoiceXML and the web interface using PHP, and both interfaces were integrated with Apache and MySQL as the middleware and back-end component respectively. The JRP proposed in this paper takes the hassle of job hunting away from job seekers, provides job status information to the job seeker in real time, and offers the employer other benefits such as cost, effectiveness, speed, accuracy, ease of documentation, convenience and better logistics in seeking the right candidate for a job.

    Sensor-rich real-time adaptive gesture and affordance learning platform for electronic music control

    Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2004. Includes bibliographical references (p. [151]-156). Acoustic musical instruments have traditionally featured static mappings from input gesture to output sound, their input affordances being tied to the physics of their sound-production mechanism. More recently, the advent of digital sound synthesizers and electronic music controllers has abolished the tight coupling between input gesture and resultant sound, making an exponentially large range of input-to-output mappings possible, as well as an infinite set of possible timbres. This revolutionary change in the way sound can be produced and controlled brings with it the burden of design: Compelling and natural mappings from gesture to sound now must be created in order to create a playable electronic music instrument. The goal of this thesis is to present a device that allows flexible assignment of input gesture to output sound, so acting as a laboratory to help further understanding about the connection from gesture to sound. An embodied multi-degree-of-freedom gestural input device was constructed. The device was built to support six-degree-of-freedom inertial sensing, five isometric buttons, two digital buttons, two-axis bend sensing, isometric rotation sensing, and isotonic electric field sensing of position. Software was written to handle the incoming serial data, and to implement a trainable interface by which a user can explore the sounds possible with the device, associate a custom inertial gesture with a sound for later playback, make custom input degree-of-freedom (DOF) to effect modulation mappings, and play with the resulting configuration. A user study with 25 subjects was run to evaluate the system in terms of its engaging-ness, enjoyability, ability to inspire interest in future play and performance, ease of gesturing and novelty.
In addition to these subjective measures, implicit data was collected about the types of gesture-to-sound and input-DOF-to-effect mappings that the subjects created. Favorable and interesting results were found in the data from the study, indicating that a flexible trainable musical instrument is not only a compelling performance tool, but is a useful laboratory for understanding the connection between human gesture and sound. by Jeffrey Merrill. S.M.

    Applied and Computational Linguistics

    The current state of applied and computational linguistics is examined, and linguistic theories of the 20th and early 21st centuries are analyzed from the perspective of distinguishing different aspects of language for the purpose of formalized description in electronic linguistic resources. A critical review is offered of topical problems in applied (computational) linguistics, such as the compilation of computer lexicons and electronic text corpora, automatic natural language processing, automatic speech synthesis and recognition, machine translation, and the creation of intelligent robots capable of perceiving information in natural language. Intended for undergraduate and postgraduate students in the humanities and for academic and teaching staff of higher education institutions in Ukraine.

    Designing Embodied Interactive Software Agents for E-Learning: Principles, Components, and Roles

    Embodied interactive software agents are complex autonomous, adaptive, and social software systems with a digital embodiment that enables them to act on and react to other entities (users, objects, and other agents) in their environment through bodily actions, which include the use of verbal and non-verbal communicative behaviors in face-to-face interactions with the user. These agents have been developed for various roles in different application domains, in which they perform tasks that have been assigned to them by their developers or delegated to them by their users or by other agents. In computer-assisted learning, embodied interactive pedagogical software agents have the general task of promoting human learning by working with students (and other agents) in computer-based learning environments, among them e-learning platforms based on Internet technologies, such as the Virtual Linguistics Campus (www.linguistics-online.com). In these environments, pedagogical agents provide contextualized, qualified, personalized, and timely assistance, cooperation, instruction, motivation, and services for both individual learners and groups of learners. This thesis develops a comprehensive, multidisciplinary, and user-oriented view of the design of embodied interactive pedagogical software agents, which integrates theoretical and practical insights from various academic and other fields. The research intends to contribute to the scientific understanding of issues, methods, theories, and technologies that are involved in the design, implementation, and evaluation of embodied interactive software agents for different roles in e-learning and other areas.
For developers, the thesis provides sixteen basic principles (Added Value, Perceptible Qualities, Balanced Design, Coherence, Consistency, Completeness, Comprehensibility, Individuality, Variability, Communicative Ability, Modularity, Teamwork, Participatory Design, Role Awareness, Cultural Awareness, and Relationship Building) plus a large number of specific guidelines for the design of embodied interactive software agents and their components. Furthermore, it offers critical reviews of theories, concepts, approaches, and technologies from different areas and disciplines that are relevant to agent design. Finally, it discusses three pedagogical agent roles (virtual native speaker, coach, and peer) in the scenario of the linguistic fieldwork classes on the Virtual Linguistics Campus and presents detailed considerations for the design of an agent for one of these roles (the virtual native speaker).