
    A global method for music symbol recognition in typeset music sheets

    This paper presents an optical music recognition (OMR) system that can automatically recognize the main musical symbols of a scanned paper-based music score. Two major stages are distinguished: the first one, using low-level pre-processing, detects the isolated objects and outputs hypotheses about them; the second one takes the final decision through high-level processing that includes contextual information and music writing rules. This article describes both stages of the method: after explaining the first stage, the symbol analysis process, in detail, it shows through initial experiments that its outputs can be used efficiently as inputs to a high-level decision process.
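
    As an illustration of the two-stage structure above, the sketch below shows how low-level symbol hypotheses might be checked against a simple music-writing rule before the final decision. The hypothesis structure, the confidence threshold, and the accidental-before-note rule are illustrative assumptions, not the system described in the paper.

```python
from dataclasses import dataclass

@dataclass
class SymbolHypothesis:
    """One candidate produced by the low-level stage (hypothetical structure)."""
    kind: str          # e.g. "note_head", "sharp", "quarter_rest"
    x: float           # horizontal position on the staff
    staff_line: int    # nearest staff line / space index
    confidence: float  # score from the low-level classifier

def apply_writing_rules(hypotheses):
    """High-level stage: keep hypotheses consistent with a simple music-writing
    rule (here: an accidental must be followed by a note head at roughly the
    same staff position)."""
    accepted = []
    for h in hypotheses:
        if h.kind == "sharp":
            has_note = any(
                other.kind == "note_head"
                and other.x > h.x
                and abs(other.staff_line - h.staff_line) <= 1
                for other in hypotheses
            )
            if not has_note:
                continue  # reject an accidental that modifies no note
        if h.confidence >= 0.5:
            accepted.append(h)
    return accepted

# Usage: hypotheses from the low-level detector are filtered contextually.
candidates = [
    SymbolHypothesis("sharp", x=10.0, staff_line=3, confidence=0.8),
    SymbolHypothesis("note_head", x=14.0, staff_line=3, confidence=0.9),
    SymbolHypothesis("sharp", x=40.0, staff_line=1, confidence=0.7),  # no note follows
]
print([h.kind for h in apply_writing_rules(candidates)])  # ['sharp', 'note_head']
```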

    Semantic segmentation of armature components of electric motors


    Group interaction and group tracking for video-surveillance in underground railway stations

    In this paper we propose an approach to recognize the behavior of groups of people in the subway. Violent behavior or vandalism performed by a group can be detected in order to alert subway security. The proposed system is composed of three main layers: the detection of people in the video, the detection and tracking of groups among the detected individuals, and the detection of events and scenarios of interest based on the tracked actors (groups). The main focus of this paper is on the group tracking and event detection layers.
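
    The middle layer, which forms groups from individually detected people, can be sketched as a simple proximity clustering step. The clustering method and the distance threshold below are illustrative assumptions and do not reproduce the authors' group-tracking algorithm.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def group_individuals(positions, max_distance=1.5):
    """Cluster detected people into groups by ground-plane proximity.
    `positions` is an (N, 2) array of (x, y) coordinates in metres; people
    linked by distances below `max_distance` share a group label."""
    if len(positions) < 2:
        return np.ones(len(positions), dtype=int)
    tree = linkage(positions, method="single")  # single-link agglomerative clustering
    return fcluster(tree, t=max_distance, criterion="distance")

# Usage: three people standing together and one person walking alone.
people = np.array([[0.0, 0.0], [0.8, 0.2], [1.2, -0.1], [8.0, 5.0]])
print(group_individuals(people))  # e.g. [1 1 1 2] -> one group of three, one singleton
```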

    Native language identification of fluent and advanced non-native writers

    This is an accepted manuscript of an article published by ACM in ACM Transactions on Asian and Low-Resource Language Information Processing in April 2020, available online at https://doi.org/10.1145/3383202; the accepted version may differ from the final published version.

    Native Language Identification (NLI) aims at identifying the native languages of authors by analyzing their text samples written in a non-native language. Most existing studies investigate this task for educational applications such as second language acquisition and require learner corpora. This article performs NLI in the challenging context of user-generated content (UGC), where authors are fluent and advanced non-native speakers of a second language. Existing NLI studies with UGC (i) rely on content-specific/social-network features and may not be generalizable to other domains and datasets, (ii) are unable to capture the variations of language usage patterns within a text sample, and (iii) are not associated with any outlier-handling mechanism. Moreover, since a sizable number of people have acquired non-English second languages due to economic and immigration policies, there is a need to gauge the applicability of NLI with UGC to other languages. Unlike existing solutions, we define a topic-independent feature space, which makes our solution generalizable to other domains and datasets. Based on this feature space, we present a solution that mitigates the effect of outliers in the data and helps capture the variations of language usage patterns within a text sample. Specifically, we represent each text sample as a point set and identify the top-k stylistically similar text samples (SSTs) from the corpus. We then apply a probabilistic k-nearest-neighbors classifier on the identified top-k SSTs to predict the native languages of the authors. To conduct experiments, we create three new corpora, each written in a different language: English, French, and German. Our experimental studies show that our solution outperforms competitive methods and achieves more than 80% accuracy across languages.

    Research funded by the Higher Education Commission and Grants for Development of New Faculty Staff at Chulalongkorn University, the Digital Economy Promotion Agency (# MP-62-0003), and Thailand Research Funds (MRG6180266 and MRG6280175).
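
    A minimal sketch of the prediction step described above is given below: each text sample is represented as a point set, the top-k stylistically similar samples are retrieved, and a probabilistic k-nearest-neighbors vote is taken. The feature space, the set-to-set distance, and the inverse-distance weighting are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def set_distance(a, b):
    """Symmetric average nearest-neighbour distance between two point sets
    (one row per text chunk); an assumed stand-in for the paper's
    stylistic-similarity measure."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # pairwise distances
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

def predict_native_language(query, corpus, labels, k=5):
    """Probabilistic k-NN over the top-k stylistically similar text samples."""
    dists = np.array([set_distance(query, doc) for doc in corpus])
    top_k = np.argsort(dists)[:k]
    weights = 1.0 / (dists[top_k] + 1e-9)  # closer neighbours vote with more weight
    probs = {}
    for idx, w in zip(top_k, weights):
        probs[labels[idx]] = probs.get(labels[idx], 0.0) + w
    total = sum(probs.values())
    return {lang: w / total for lang, w in probs.items()}

# Usage with toy feature points (rows = text chunks, columns = style features).
rng = np.random.default_rng(0)
corpus = [rng.normal(loc=i % 2, size=(6, 4)) for i in range(10)]
labels = ["L1-German" if i % 2 else "L1-French" for i in range(10)]
query = rng.normal(loc=1, size=(6, 4))
print(predict_native_language(query, corpus, labels))
```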

    Definition of a library to support decision-making in the assessment of prominent ears.

    The use of digital tools to support medical practice is steadily gaining importance as technology matures and as healthcare professionals gain confidence in it, to the point that they are beginning to seek software tailored to their needs instead of waiting for the companies that dominate the market to release systems that may (or may not) be useful to them. The objective of this work is to support physicians in calculating a digital photographic index, developed to aid the formal clinical decision that motivates the indication for an elective aesthetic surgical procedure in pediatric patients, known as otoplasty. To this end, a system was created that detects the position of the ears as automatically and precisely as possible, assisting in the calculation of digital photographic measurements so that auricular protrusion can be assessed in the respective images, supporting their clinical characterization and serving as the basis of a clinical decision-support model for proposing corrective surgery. The index value is based on the doctoral work of José Lopes dos Santos, a specialist in pediatric surgery, and the system was built using only free and open-source software, targeting mobile devices running Android. To achieve this objective, OpenCV was explored as the image-processing system, given its portability across platforms, and several approaches for the automatic detection of facial landmarks were analyzed and refined. The mobile solution was evaluated by comparing its results with the values of the traditional digital measurement method computed on a personal computer, and it successfully contributed to a more efficient consultation time.

    The usage of digital tools as medical practice support is constantly gaining importance as technology evolves and as healthcare professionals gain confidence in such tools. This is evident in the way these professionals are starting to search for custom-made software that suits their needs, instead of waiting for the dominant players in medical software to release systems that may (or may not) be useful to their practice. This is the case of this Master's project, which will serve as support to healthcare professionals for the calculation of a photographic index, created to assess the real necessity of a corrective surgical intervention in infants. For such a task, an automatic and as precise as possible ear-position system will be developed, which shows the automatically calculated index value used to evaluate whether the patient's ears can be classified as “prominent ears” or not, and thereby decide whether the patient is an otoplasty surgery candidate. The challenge of this work is, as mentioned, to detect and mark as accurately as possible the region of both of the patient's ears, allowing doctors to easily define the exact area of each ear manually and then obtain the calculated index, based on the doctoral work of José Lopes dos Santos, using only open-source and free software and targeting mobile devices (with an initial focus on Android). To tackle this challenge, OpenCV will be explored as the image-processing system, due to its portability, and the best approaches for automatic head-feature estimation will be analyzed. To assess the developed solution, a comparison will be made between the efficiency of the developed application and the manual calculation performed by a doctor.
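
    A rough sketch of the kind of processing involved is shown below: one of OpenCV's stock Haar cascades locates the head region in a profile photograph as a starting point for manual refinement, and a hypothetical protrusion measure is computed from two marked points. The cascade choice, the point names, and the index formula are assumptions; the clinically validated index from the cited doctoral work is not reproduced here.

```python
import cv2

def detect_profile_face(image_bgr):
    """Locate the head region in a profile photograph with OpenCV's stock
    profile-face Haar cascade; the detected box is only a starting point
    that the clinician can refine manually."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_profileface.xml")
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return faces[0] if len(faces) else None  # (x, y, w, h) or None

def protrusion_index(helix_point, mastoid_point, pixels_per_mm):
    """Hypothetical photographic measure: horizontal helix-to-head distance in mm."""
    distance_px = abs(helix_point[0] - mastoid_point[0])
    return distance_px / pixels_per_mm

# Usage: the two points would come from detection plus manual adjustment.
print(protrusion_index(helix_point=(412, 300), mastoid_point=(350, 305), pixels_per_mm=3.2))
```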

    Video-Based Environment Perception for Automated Driving using Deep Neural Networks

    Automated vehicles require highly accurate environment perception in order to drive safely and comfortably. At the same time, the perception algorithms must meet the real-time requirements of the application with the available computing power. Camera images are a very important source of information for automated vehicles: they contain more detail than data from other sensors such as lidar or radar and are often comparatively inexpensive, which makes it possible to equip an automated vehicle with a surround-view sensor setup without increasing the overall cost too much. In this work we present an efficient and accurate approach to video-based environment perception for automated vehicles. It is based on deep learning and addresses the problems of object detection, object tracking, and semantic segmentation of camera images. We first propose a fast CNN architecture for simultaneous object detection and semantic segmentation. This architecture is scalable, so that accuracy can easily be traded for computation time by changing a single scaling factor. We then modify this architecture to predict embedding vectors for each detected object. These embedding vectors are used as an association metric for object tracking. They are also used in a novel non-maximum suppression algorithm, which we call FeatureNMS. FeatureNMS can achieve higher recall in crowded scenes where the assumptions of the classical NMS algorithm do not hold. We then extend our single-image CNN architecture to a multi-image architecture that takes two consecutive video frames as input. The multi-image architecture estimates the optical flow between the two frames inside the neural network, which makes it possible to estimate a displacement vector between the frames for each detected object. These displacement vectors are also used as an association metric for object tracking. Finally, we present a simple tracking-by-detection approach that requires little computing power. It relies on a strong object detector and on the embedding and displacement vectors estimated by our CNN architecture. The high recall of the object detector leads to frequent detections of the tracked objects, and our discriminative association metrics based on the embedding and displacement vectors allow new detections to be reliably assigned to existing tracks. Together, these two components make it possible to use a simple constant-velocity motion model with a Kalman filter. The presented methods for video-based environment perception achieve good results on the challenging Cityscapes and BDD100K datasets while remaining computationally efficient and meeting the real-time requirements of the application. We successfully use the proposed architecture in the perception module of an automated test vehicle, where it has proven itself in practice.
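
    The association step, which combines embedding and displacement cues to match new detections to existing tracks, can be illustrated with the sketch below. The cost weights, the position normalization, and the gating threshold are assumptions for illustration; the Kalman-filter prediction and the exact metrics of the thesis are not reproduced.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(tracks, detections, w_embed=0.5, w_pos=0.5, max_cost=1.0):
    """Match tracks to detections with a cost that mixes embedding distance and
    predicted-position distance. `tracks` and `detections` are lists of dicts
    with 'embedding' (unit vector) and 'center' (x, y); the track centers are
    assumed to be already displacement-compensated."""
    cost = np.zeros((len(tracks), len(detections)))
    for i, t in enumerate(tracks):
        for j, d in enumerate(detections):
            embed_dist = 1.0 - float(np.dot(t["embedding"], d["embedding"]))
            pos_dist = np.linalg.norm(np.subtract(t["center"], d["center"])) / 100.0
            cost[i, j] = w_embed * embed_dist + w_pos * pos_dist
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one assignment
    return [(i, j) for i, j in zip(rows, cols) if cost[i, j] <= max_cost]

# Usage with toy unit embeddings and pixel centers.
tracks = [{"embedding": np.array([1.0, 0.0]), "center": (100, 50)}]
detections = [{"embedding": np.array([0.98, 0.2]) / np.linalg.norm([0.98, 0.2]),
               "center": (104, 52)}]
print(associate(tracks, detections))  # [(0, 0)]
```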

    Convolutional neural networks in industrial image processing and robotics

    In the first part of this dissertation, a framework for the design of CNNs for FPGAs is presented, consisting of a preprocessing algorithm, an augmentation technique, a custom quantization scheme, and a pruning step for the CNN. The combination of conventional image processing with neural networks is shown in the second part with an example from robotics, where an image-based visual servoing process is successfully carried out for a robotic gripping task.
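
    As a rough illustration of the model compression mentioned above, the sketch below applies generic magnitude pruning followed by uniform symmetric fixed-point quantization to one weight tensor. The dissertation's custom quantization scheme and pruning criterion are not reproduced here.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity=0.5):
    """Zero out the smallest-magnitude weights; `sparsity` is the fraction removed."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def quantize_fixed_point(weights, bits=8):
    """Uniform symmetric quantization to signed fixed-point codes, as could be
    mapped onto FPGA integer arithmetic; returns integer codes and the scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(weights)) / qmax if np.any(weights) else 1.0
    codes = np.clip(np.round(weights / scale), -qmax - 1, qmax).astype(np.int8)
    return codes, scale

# Usage: prune, then quantize, one convolution kernel.
rng = np.random.default_rng(1)
kernel = rng.normal(scale=0.1, size=(3, 3, 16, 32))
pruned = prune_by_magnitude(kernel, sparsity=0.6)
codes, scale = quantize_fixed_point(pruned, bits=8)
print(np.count_nonzero(pruned) / pruned.size, codes.dtype, scale)
```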

    Forum Bildverarbeitung 2020

    Image processing plays a key role in fast and contact-free data acquisition in many technical areas, e.g., in quality control or robotics. These conference proceedings of the “Forum Bildverarbeitung”, which took place on 26-27 November 2020 in Karlsruhe as a joint event of the Karlsruhe Institute of Technology and the Fraunhofer Institute of Optronics, System Technologies and Image Exploitation, contain the articles of the contributions.