Combining representations for improved sketch recognition
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Cataloged from PDF version of thesis. Includes bibliographical references (p. 89-96).
Sketching is a common means of conveying, representing, and preserving information, and it has become a subject of research as a method for human-computer interaction, specifically in the area of computer-aided design. Digitally collected sketches contain both spatial and temporal information; additionally, they may contain a conceptual structure of shapes and subshapes. These multiple aspects suggest several ways of representing sketches, each with advantages and disadvantages for recognition. Most existing sketch recognition systems are based on a single representation and do not use all available information. We propose combining several representations and systems as a way to improve recognition accuracy. This thesis presents two methods for combining recognition systems. The first improves recognition by improving segmentation, while the second seeks to predict how well systems will recognize a given domain or symbol and combines their outputs accordingly. We show that combining several recognition systems based on different representations can improve the accuracy of existing recognition methods.
by Sonya J. Cates. Ph.D.
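The second method, combining outputs according to how well each system is expected to recognize a given symbol, can be illustrated with a reliability-weighted vote. The sketch below is a hypothetical illustration, not the thesis's actual combination scheme: the recognizer names, labels, scores, the per-symbol `reliability` table, and the fallback weight of 0.5 are all assumptions made for the example.

```python
from collections import defaultdict


def combine(outputs, reliability, default_weight=0.5):
    """Combine recognizer outputs by a reliability-weighted vote.

    outputs: {recognizer: {label: score}}, scores in [0, 1].
    reliability: {recognizer: {label: estimated accuracy}}, a hypothetical
        per-symbol estimate of how well each recognizer handles that label.
    Returns the label with the highest weighted score.
    """
    totals = defaultdict(float)
    for name, scores in outputs.items():
        weights = reliability.get(name, {})
        for label, score in scores.items():
            # Fall back to a neutral weight when no estimate is available.
            totals[label] += weights.get(label, default_weight) * score
    return max(totals, key=totals.get)


outputs = {
    "stroke_based": {"resistor": 0.6, "capacitor": 0.4},
    "image_based": {"resistor": 0.3, "capacitor": 0.7},
}
reliability = {
    "stroke_based": {"resistor": 0.9, "capacitor": 0.5},
    "image_based": {"resistor": 0.6, "capacitor": 0.6},
}
print(combine(outputs, reliability))  # resistor
print(combine(outputs, {}))           # capacitor (equal weights)
```

With equal weights the image-based recognizer's preference for `capacitor` wins; once the stroke-based recognizer is trusted more on resistors, the combined decision flips. How such reliability estimates are predicted per domain or symbol is the subject of the thesis and is not reproduced here.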
Mathematical Expression Recognition based on Probabilistic Grammars
Mathematical notation is well known and used all over the
world. Humankind has evolved from simple methods of representing
counts to the current well-defined mathematical notation, which can
express complex problems. Furthermore, mathematical expressions
constitute a universal language in scientific fields, and many
information resources containing mathematics have been created during
the last decades. However, to access all that information efficiently,
scientific documents have to be digitized or produced directly in
electronic formats.
Although most people are able to understand and produce mathematical
information, entering math expressions into electronic devices
requires learning specific notations or using editors. Automatic
recognition of mathematical expressions aims to fill this gap
between a person's knowledge and the input accepted by
computers. In this way, printed documents containing math expressions
could be automatically digitized, and handwriting could be used for
direct input of math notation into electronic devices.
This thesis is devoted to developing an approach for mathematical
expression recognition. We propose an approach, based on probabilistic
grammars, for recognizing any type of mathematical expression (printed
or handwritten). To do so, we develop a formal statistical framework
from which several probability distributions are derived. Throughout
the document, we deal with the definition and estimation of all these
probabilistic sources of information. Finally, we define the parsing
algorithm that globally computes the most probable mathematical
expression for a given input according to the statistical framework.
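The idea of globally computing the most probable expression can be sketched, in a heavily simplified one-dimensional form, as Viterbi-style probabilistic CYK parsing. This is a stand-in under stated assumptions, not the thesis's parser, which works on two-dimensional input: the toy grammar in Chomsky normal form, the per-position symbol hypotheses, and the function names `cyk_viterbi` and `leaves` are hypothetical. The point it shows is that symbol-classifier probabilities and grammar probabilities are scored jointly, so a locally preferred but ungrammatical symbol can be overruled.

```python
import math


def cyk_viterbi(hyps, lexical, binary, start="E"):
    """Most probable parse of a symbol-hypothesis sequence under a PCFG in CNF.

    hyps: list of {terminal: classifier probability}, one dict per position.
    lexical: {(A, terminal): P(A -> terminal)}
    binary: {(A, B, C): P(A -> B C)}
    Returns (log_prob, tree) for the start symbol, or None if no parse exists.
    """
    n = len(hyps)
    # chart[i][j]: nonterminal -> (best log prob, tree) covering hyps[i:j]
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, cand in enumerate(hyps):
        for (a, t), p_rule in lexical.items():
            p_sym = cand.get(t, 0.0)
            if p_sym > 0:
                # Grammar and symbol-classifier scores are combined here.
                score = math.log(p_rule) + math.log(p_sym)
                if score > chart[i][i + 1].get(a, (-math.inf,))[0]:
                    chart[i][i + 1][a] = (score, (a, t))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for (a, b, c), p_rule in binary.items():
                    if b in chart[i][k] and c in chart[k][j]:
                        lb, tb = chart[i][k][b]
                        lc, tc = chart[k][j][c]
                        score = math.log(p_rule) + lb + lc
                        if score > chart[i][j].get(a, (-math.inf,))[0]:
                            chart[i][j][a] = (score, (a, tb, tc))
    return chart[0][n].get(start)


def leaves(tree):
    """Read the terminal string off a parse tree."""
    if isinstance(tree[1], str):
        return tree[1]
    return "".join(leaves(child) for child in tree[1:])


# Hypothetical toy grammar: E -> x | y | E R,  R -> Op E,  Op -> + | -
lexical = {("E", "x"): 0.35, ("E", "y"): 0.35, ("Op", "+"): 0.5, ("Op", "-"): 0.5}
binary = {("E", "E", "R"): 0.3, ("R", "Op", "E"): 1.0}
# The classifier prefers "t" for the middle symbol, which the grammar rejects.
hyps = [{"x": 0.9, "y": 0.1}, {"t": 0.7, "+": 0.3}, {"y": 0.8, "x": 0.2}]
best = cyk_viterbi(hyps, lexical, binary)
print(leaves(best[1]))  # x+y
```

Working in log space avoids underflow on long inputs. Unary rules are not handled, which is why the toy grammar is kept in Chomsky normal form.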
An important point in this study is to provide objective performance
evaluation and to report results using public data and standard
metrics. We examined the problems of automatic evaluation in this
field and sought the best solutions. We also report several
experiments using public databases, and we participated in several
international competitions. Furthermore, we have released most of the
software developed in this thesis as open source.
We also explore some applications of mathematical expression
recognition. In addition to the direct applications of transcription
and digitization, we report two important proposals. First, we
developed mucaptcha, a method to tell humans and computers apart by
means of math handwriting input, which represents a novel application
of math expression recognition. Second, we tackled the problem of
layout analysis of structured documents using the statistical
framework developed in this thesis, because both math expression
recognition and layout analysis are two-dimensional problems that can
be modeled with probabilistic grammars.
The approach developed in this thesis for mathematical expression
recognition has obtained good results at different levels. It has
produced several scientific publications in international conferences
and journals, and has received awards in international competitions.
Álvaro Muñoz, F. (2015). Mathematical Expression Recognition based on Probabilistic Grammars [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/51665
Pattern Recognition
A wealth of advanced pattern recognition algorithms is emerging from the interdisciplinary combination of effective visual-feature technologies and the study of human-brain cognition. Effective visual features are made possible by rapid developments in sensor equipment, novel filter designs, and viable information processing architectures, while a better understanding of human-brain cognition broadens the ways in which computers can perform pattern recognition tasks. This book collects representative research from around the globe on low-level vision, filter design, features and image descriptors, data mining and analysis, and biologically inspired algorithms. Its 27 chapters present recent advances and new ideas in the techniques, technology, and applications of pattern recognition.
Contributions to Medical Image Segmentation and Signal Analysis Utilizing Model Selection Methods
This thesis presents contributions to model selection techniques, especially those based on information-theoretic criteria, with the goal of solving problems that arise in signal analysis and in medical image representation, segmentation, and compression.
The field of medical image segmentation is broad and is developing quickly to exploit the increasing available computational power. This thesis concentrates on several applications that allow parametric models to be used for image and signal representation. One important application is cell nuclei segmentation in histological images. We model nuclei contours as ellipses, so the complicated problem of separating overlapping nuclei can be rephrased as a model selection problem in which the number of nuclei, their shapes, and their locations define one segmentation. We present methods for model selection in this parametric setting that combine intuitive algorithms with more principled ones, namely those based on the minimum description length (MDL) principle. The results of the introduced unsupervised segmentation algorithm are compared with segmentations made by human subjects and are also evaluated with the help of a pathology expert.
Another medical image application considered is lossless compression. The objective is to combine image segmentation with image compression so that image regions can be transmitted separately, depending on the region of interest for diagnosis. Experiments on retinal color images show that our modeling, in which the MDL criterion selects the structure of the linear predictive models, outperforms publicly available image compressors such as the lossless version of JPEG 2000.
For time series modeling, the thesis presents an algorithm for detecting changes in time series signals.
The algorithm is based on one of the most recent implementations of the MDL principle, the sequentially normalized maximum likelihood (SNML) models.
This thesis contributes new methods and algorithms in which the simplicity of information-theoretic principles is combined with rather complex, problem-dependent modeling formulations, resulting in both heuristically motivated and principled algorithmic solutions.
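Change detection with MDL can be illustrated by comparing the description lengths of a no-change model and a one-change model. The sketch below uses the classic two-part MDL code length for Gaussian data, with a (k/2) log n cost per model, rather than the SNML models the thesis relies on; the single mean-shift model, the `min_len` guard, and all function names are assumptions for illustration.

```python
import math


def rss(segment):
    """Residual sum of squares around the segment mean."""
    mean = sum(segment) / len(segment)
    return sum((v - mean) ** 2 for v in segment)


def code_length(residual, n, k):
    """Two-part MDL code length in nats, up to shared constants:
    Gaussian data cost plus (k/2) log n for k free parameters."""
    return 0.5 * n * math.log(max(residual, 1e-12) / n) + 0.5 * k * math.log(n)


def detect_change(x, min_len=2):
    """Index of the single mean change that gives the shortest description
    of x, or None if the no-change model is shorter."""
    n = len(x)
    best_idx, best_len = None, code_length(rss(x), n, 1)  # one mean
    for t in range(min_len, n - min_len + 1):
        # Two means plus the change location: k = 3.
        cand = code_length(rss(x[:t]) + rss(x[t:]), n, 3)
        if cand < best_len:
            best_idx, best_len = t, cand
    return best_idx


shifted = [0.1, -0.2, 0.0, 0.15, -0.1, 5.1, 4.9, 5.05, 4.95, 5.0]
flat = [0.1, -0.2, 0.0, 0.15, -0.1, 0.05, -0.05, 0.2, -0.15, 0.0]
print(detect_change(shifted))  # 5
print(detect_change(flat))     # None
```

A change is reported only when the drop in residual cost outweighs the penalty for the extra parameters, which is what keeps the flat series from being split.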
Multiple Object Tracking in Light Microscopy Images Using Graph-based and Deep Learning Methods
Multi-object tracking (MOT) is an image analysis problem that involves localizing objects in an image sequence and linking them over time, with numerous applications in areas such as autonomous driving, robotics, and surveillance. Beyond technical application areas, there is also a great need for MOT in biomedical applications. For example, experiments acquired by light microscopy over several hours or days can contain hundreds or even thousands of similar-looking objects, which makes manual analysis impossible. To draw reliable conclusions from the tracked objects, however, the predicted trajectories must be of high quality. Domain-specific MOT approaches that can account for the particularities of light microscopy data are therefore needed. This thesis develops two novel methods for the MOT problem in light microscopy images and presents approaches for comparing tracking methods.
To separate the performance of a tracking method from the quality of the segmentation, an approach is proposed that allows the tracking method to be analyzed independently of the segmentation; this also makes it possible to study how robust tracking methods are to degraded segmentation data. Furthermore, a graph-based tracking method is proposed that bridges the gap between tracking methods that are easy to apply but less accurate and high-performing tracking methods with many parameters that are hard to tune. The proposed tracking method has only a few manually tunable parameters and is easily applicable to 2D and 3D datasets. By modeling prior knowledge about the shape of the tracking graph, it can also automatically correct certain types of segmentation errors. In addition, a deep-learning-based approach is proposed that learns instance segmentation and object tracking simultaneously in a single neural network. This approach also learns to predict representations that are interpretable by humans. To show how the two proposed tracking methods perform compared with other current domain-specific tracking approaches, they are applied to a domain-specific benchmark. Finally, additional evaluation criteria for tracking methods are introduced and used to compare the two proposed methods.
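The frame-to-frame linking step that graph-based trackers build on can be sketched with greedy nearest-neighbour assignment. This is a deliberately simple, hypothetical illustration: it does not model prior knowledge about the shape of the tracking graph (for example, cell divisions) and is not either of the methods proposed in the thesis; `link_frames` and the `max_dist` gate are assumptions.

```python
import math


def link_frames(prev, curr, max_dist):
    """Greedy nearest-neighbour linking of object centroids between frames.

    prev, curr: lists of (x, y) centroids in consecutive frames.
    Returns (links, ended, started): links as (i_prev, j_curr) pairs, indices
    of prev objects with no successor, and curr objects with no predecessor.
    """
    candidates = []
    for i, p in enumerate(prev):
        for j, c in enumerate(curr):
            d = math.dist(p, c)
            if d <= max_dist:  # gate out implausible jumps
                candidates.append((d, i, j))
    candidates.sort()
    used_prev, used_curr, links = set(), set(), []
    for _, i, j in candidates:
        # Accept the shortest remaining edge whose endpoints are both free.
        if i not in used_prev and j not in used_curr:
            links.append((i, j))
            used_prev.add(i)
            used_curr.add(j)
    ended = [i for i in range(len(prev)) if i not in used_prev]
    started = [j for j in range(len(curr)) if j not in used_curr]
    return sorted(links), ended, started


prev = [(10, 10), (50, 50)]
curr = [(52, 49), (11, 12), (90, 90)]
print(link_frames(prev, curr, max_dist=10))  # ([(0, 1), (1, 0)], [], [2])
```

Unlinked objects in the current frame are candidates for track starts, such as objects entering the field of view or daughter cells after a division; a graph-based tracker would resolve such cases jointly over many frames rather than greedily per frame pair.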
Metrics reloaded: Pitfalls and recommendations for image analysis validation
The field of automatic biomedical image analysis crucially depends on robust and meaningful performance metrics for algorithm validation. Current metric usage, however, is often ill-informed and does not reflect the underlying domain interest. Here, we present a comprehensive framework that guides researchers towards choosing performance metrics in a problem-aware manner. Specifically, we focus on biomedical image analysis problems that can be interpreted as a classification task at image, object or pixel level. The framework first compiles domain interest-, target structure-, data set- and algorithm output-related properties of a given problem into a problem fingerprint, while also mapping it to the appropriate problem category, namely image-level classification, semantic segmentation, instance segmentation, or object detection. It then guides users through the process of selecting and applying a set of appropriate validation metrics while making them aware of potential pitfalls related to individual choices. In this paper, we describe the current status of the Metrics Reloaded recommendation framework, with the goal of obtaining constructive feedback from the image analysis community. The current version has been developed within an international consortium of more than 60 image analysis experts and will be made openly available as a user-friendly toolkit after community-driven optimization
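As an example of why metric choice should be problem-aware, the following sketch computes two common overlap metrics for semantic segmentation, the Dice similarity coefficient and intersection over union (IoU), on flattened binary masks. It is an illustration under stated assumptions, not part of the Metrics Reloaded toolkit, and the masks are made up. The same absolute error of two missed pixels costs far more on a small structure than on a large one.

```python
def dice_and_iou(pred, ref):
    """Dice similarity coefficient and IoU (Jaccard) for binary masks given as
    equal-length sequences of 0/1 pixel labels."""
    tp = sum(1 for p, r in zip(pred, ref) if p and r)
    fp = sum(1 for p, r in zip(pred, ref) if p and not r)
    fn = sum(1 for p, r in zip(pred, ref) if r and not p)
    if tp + fp + fn == 0:
        # Both masks empty: the score is undefined and a convention is needed.
        return float("nan"), float("nan")
    dice = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return dice, iou


# The same two missed pixels on a large and on a small structure.
large = dice_and_iou([1] * 98 + [0] * 2, [1] * 100)
small = dice_and_iou([1] * 2 + [0] * 2, [1] * 4)
print(large)  # approx. (0.990, 0.98)
print(small)  # approx. (0.667, 0.5)
```

The two metrics are monotonically related (Dice = 2·IoU / (1 + IoU)), so they rank algorithms identically; the choice between them matters less than awareness of structure-size effects and of the empty-mask case, which here returns NaN rather than silently picking a convention.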
A Survey of Knowledge Representation in Service Robotics
Within the realm of service robotics, researchers have placed a great amount
of effort into learning, understanding, and representing motions as
manipulations for task execution by robots. The task of robot learning and
problem-solving is very broad, as it integrates a variety of tasks such as
object detection, activity recognition, task/motion planning, localization,
knowledge representation and retrieval, and the intertwining of
perception/vision and machine learning techniques. In this paper, we solely
focus on knowledge representations and notably how knowledge is typically
gathered, represented, and reproduced to solve problems as done by researchers
in the past decades. In accordance with the definition of knowledge
representations, we discuss the key distinction between such representations
and useful learning models that have extensively been introduced and studied in
recent years, such as machine learning, deep learning, probabilistic modelling,
and semantic graphical structures. Along with an overview of such tools, we
discuss the problems that have existed in robot learning and the solutions,
technologies, or developments (if any) that have contributed to solving
them. Finally, we discuss key principles that should be considered when
designing an effective knowledge representation.
Comment: Accepted for RAS Special Issue on Semantic Policy and Action
Representations for Autonomous Robots - 22 pages
Pictures in Your Mind: Using Interactive Gesture-Controlled Reliefs to Explore Art
Tactile reliefs offer many benefits over the more classic raised line drawings or tactile diagrams, as depth, 3D shape, and surface textures are directly perceivable. Although often created for blind and visually impaired (BVI) people, a wider range of people may benefit from such multimodal material. However, some reliefs are still difficult to understand without proper guidance or accompanying verbal descriptions, hindering autonomous exploration.
In this work, we present a gesture-controlled interactive audio guide (IAG) based on recent low-cost depth cameras that can be operated directly with the hands on relief surfaces during tactile exploration. The interactively explorable, location-dependent verbal and captioned descriptions promise rapid tactile accessibility to 2.5D spatial information in a home or education setting, to online resources, or as a kiosk installation at public places.
We present a working prototype, discuss design decisions, and present the results of two evaluation studies: the first with 13 BVI test users, and the second, a follow-up study, with 14 test users across a wide range of people with differences and difficulties associated with perception, memory, cognition, and communication. The participant-led research method of this latter study prompted new, significant, and innovative developments.
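The location-dependent descriptions imply a lookup from the tracked fingertip position to an annotated region of the relief. The abstract does not say how the prototype does this; the sketch below is a hypothetical illustration using even-odd ray casting for point-in-polygon tests, with made-up region names, coordinates, and texts, and it assumes the depth camera has already produced a fingertip position in relief-surface coordinates.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting: toggle on each edge crossed by a ray to +x."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def describe(x, y, regions, default="No description for this area."):
    """Return the description of the first region containing the fingertip."""
    for _name, polygon, text in regions:
        if point_in_polygon(x, y, polygon):
            return text
    return default


# Hypothetical annotated regions in relief-surface coordinates (cm).
regions = [
    ("sun", [(0, 0), (10, 0), (10, 10), (0, 10)], "The sun, upper left."),
    ("boat", [(20, 20), (40, 20), (30, 35)], "A sailing boat."),
]
print(describe(5, 5, regions))    # The sun, upper left.
print(describe(30, 25, regions))  # A sailing boat.
print(describe(50, 50, regions))  # No description for this area.
```

Returning the first matching region keeps the lookup simple; overlapping annotations (for example, a detail inside a larger figure) would need an explicit priority order in the region list.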