2 research outputs found

    Abordagens livres de segmentação para reconhecimento automático de cadeias numéricas manuscritas utilizando aprendizado profundo

    Advisor: Prof. Dr. Luiz Eduardo Soares de Oliveira. Co-advisor: Prof. Dr. Robert Sabourin. Thesis (doctorate) - Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Tecnologia. Defense: Curitiba, 12/03/2019. Includes references: p. 83-90.
    Abstract: Over the last decades, several authors have approached the recognition of handwritten digit strings in a similar way with regard to connected digits: there is a consensus that these components must be segmented. The proposed approaches therefore concentrate on determining segmentation points by applying heuristics to features extracted from the object, background, contour, and so on. However, a common problem among these approaches is the production of fragmented digits, which over-segments the string. The methodologies are thus categorized by how they handle the components resulting from this process: (a) those that produce only one possible segmentation, and (b) those that define a set of segmentation hypotheses together with a fusion method to determine the most probable hypothesis. Although the second category achieves higher recognition rates, its computational cost is a drawback because of the repeated classifier calls needed to evaluate the many hypotheses produced. Moreover, the variability of connection types between digits and the lack of context, such as the number of digits in the string, expose the limitations of approaches based on heuristic processes. To avoid these problems, we show that it is possible to outperform traditional methods by implementing deep learning models that classify connected digits directly, reducing the segmentation step to a connected component detection process. In addition, taking advantage of advances in object detection, we propose a new approach to the problem in which digits are treated as objects in an image, so that a sequence of digits is a sequence of objects. To validate our hypotheses, experiments on well-known datasets compared our proposals with the state of the art in terms of recognition, correct segmentation, and computational cost. The results reached recognition rates of around 97% on a dataset of connected digit pairs and 95% on the string samples of NIST SD19, surpassing state-of-the-art levels. Besides the high recognition rates, there was also a significant reduction in classifier calls (computational cost), especially in complex cases, which indicates the potential of the proposed approaches.
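    The abstract describes reducing segmentation to a connected component detection process. A minimal sketch of that step follows, assuming a binarized image stored as rows of 0/1 values and 8-connectivity. The function name `connected_components` and the toy image are illustrative assumptions, not taken from the thesis, and the classifier that would receive each component is omitted.

```python
from collections import deque


def connected_components(image):
    """Return bounding boxes (left, top, right, bottom) of the 8-connected
    foreground components of a binary image, sorted left to right."""
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if image[y][x] == 0 or seen[y][x]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            left, top, right, bottom = x, y, x, y
            while queue:
                cy, cx = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        inside = 0 <= ny < height and 0 <= nx < width
                        if inside and image[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((left, top, right, bottom))
    return sorted(boxes)


# Toy image (assumption): one isolated blob and one wider blob that could
# stand for two touching digits.
image = [
    [1, 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0, 1],
]
boxes = connected_components(image)
```

    Each box would then be cropped and passed whole to a classifier, so touching digits are never split by heuristics and the number of classifier calls equals the number of components.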

    Geometric data understanding: deriving case specific features

    One tradition uses precise geometric modeling, in which uncertainties in the data can be treated as noise. Another relies on the statistical nature of vast quantities of data, in which geometric regularity is intrinsic to the data and statistical models usually capture it only indirectly. This work focuses on two real-world examples of problems whose geometric content is intangible in the raw data representation: point cloud data of natural resources and silhouette recognition from video input. This content could be discovered and modeled to some degree by machine learning (ML) approaches such as deep learning, but that requires either direct coverage of the geometry in the samples or the addition of a special geometry-invariant layer. Geometric content is central when spatial variables must be observed directly, or when one needs a mapping to a geometrically consistent data representation in which, for example, outliers or noise can be easily discerned. In this thesis we consider the transformation of the original input data to a geometric feature space in two example problems. The first example is the curvature of surfaces, which has met renewed interest since the introduction of ubiquitous point cloud data and the maturation of discrete differential geometry. Curvature spectra can characterize a spatial sample rather well and provide useful features for ML purposes. The second example involves projective methods applied to stereo video signal analysis in swimming analytics. The aim is to find meaningful local geometric representations for feature generation that also facilitate additional analysis based on a geometric understanding of the model. The approach has four benefits. First, the features are associated directly with some geometric quantity, which makes it easier to express geometric constraints in a natural way, as shown in the thesis. Second, visualization and further feature generation are much easier. Third, the approach provides sound baseline methods for comparison with more traditional ML approaches, e.g. neural network methods. Fourth, most ML methods can use the geometric features presented in this work as additional features.