28 research outputs found

    Recognition Technology for Four Arithmetic Operations

    Get PDF
    Numeral recognition is an important research direction in the field of pattern recognition, with broad application prospects. Aiming at the four arithmetic operations in common printed formats, this article adopts a hybrid recognition method and applies it to automatic calculation. The method mainly uses a BP neural network and template matching to distinguish numerals and operators, in order to increase operation speed and recognition accuracy. Sample images of the four arithmetic operations are extracted from printed books and used to test the performance of the proposed recognition method. Experiments show that the method achieves a correct recognition rate of 96% and a correct calculation rate of 89%.
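    The recognise-then-calculate idea can be sketched as follows; the tiny 3x3 glyph templates and the overlap score are illustrative assumptions, not the paper's actual BP-network-plus-template-matching pipeline:

```python
import numpy as np

# Hypothetical operator templates: each operator is a small binary glyph.
# These 3x3 arrays are illustrative stand-ins, not the paper's templates.
TEMPLATES = {
    "+": np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]]),
    "-": np.array([[0, 0, 0], [1, 1, 1], [0, 0, 0]]),
    "x": np.array([[1, 0, 1], [0, 1, 0], [1, 0, 1]]),
}

def match_operator(glyph):
    """Template matching: pick the operator whose template overlaps best
    with the segmented glyph (a Jaccard-like score on binary pixels)."""
    scores = {}
    for op, t in TEMPLATES.items():
        inter = float((glyph * t).sum())
        union = float(t.sum() + glyph.sum() - inter)
        scores[op] = inter / union
    return max(scores, key=scores.get)

def evaluate(a, op, b):
    """Automatic calculation once digits and operator are recognised."""
    return {"+": a + b, "-": a - b, "x": a * b}[op]

glyph = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])  # a clean '+' glyph
op = match_operator(glyph)
print(op, evaluate(7, op, 5))  # → + 12
```

    In the paper, digits are handled by the BP neural network while template matching covers the fixed operator shapes; the split plays to each method's strengths.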

    Multi-script handwritten character recognition: Using feature descriptors and machine learning

    Get PDF

    Advances in Image Processing, Analysis and Recognition Technology

    Get PDF
    For many decades, researchers have been trying to make computers' analysis of images as effective as human vision. For this purpose, many algorithms and systems have been created. The whole process covers various stages, including image processing, representation and recognition. The results of this work can be applied to many computer-assisted areas of everyday life. They improve particular activities and provide handy tools, which are sometimes only for entertainment but quite often significantly increase our safety. Indeed, the range of practical implementations of image processing algorithms is particularly wide. Moreover, the rapid growth of computing power has allowed for the development of more sophisticated and effective algorithms and tools. Although significant progress has been made so far, many issues remain, creating the need for novel approaches.

    How much do we know about the User-Item Matrix?: Deep Feature Extraction for Recommendation

    Get PDF
    Collaborative filtering-based recommender systems typically operate on a high-dimensional sparse user-item matrix. Matrix completion is one of the most common formulations: rows and columns represent users and items, and predicting a user's ratings of items corresponds to filling in the missing entries of the matrix. In practice, predicting one user's interests from millions of other users, each of whom has seen only a small subset of thousands of items, is a very challenging task. We consider how to extract the key features of users and items from the rating matrix, capture them in low-dimensional vectors, and create embeddings that represent the characteristics of users and items well, by exploring what kind of user/item information in the matrix to use. Recent studies have focused on utilising side information, such as a user's age or a movie's genre, but such information is not always available and is hard to extract. More importantly, there has been no recent research on how to efficiently extract important latent features from a sparse data matrix with no side information (the first problem). The second problem is that most matrix completion techniques focus on semantic similarity between users and items, transforming the rating matrix into a user/item similarity matrix or a graph, and thereby neglect the position of each element (user, item and rating) in the matrix. We argue that position is fundamental in matrix completion, since the entry to be filled is specified by the positions of its row and column. To address the first problem, we aim to represent a high-dimensional sparse user-item matrix in a low-dimensional space with a small number of important features, and propose a Global-Local Kernel-based matrix completion framework, named GLocal-K, which is divided into two major stages.
    First, we pre-train an autoencoder with a local kernelised weight matrix, which transforms the data into a feature space using a 2D RBF kernel. Then, the pre-trained autoencoder is fine-tuned with the rating matrix, produced by a convolution-based global kernel that captures the characteristics of each item. GLocal-K outperforms state-of-the-art baselines on three collaborative filtering benchmarks. However, its feature extraction degrades when the data is very large or extremely sparse. To address the second problem and this limitation of GLocal-K, we propose SUPER-Rec, a novel position-enhanced user/item representation training model for recommendation. We first capture the rating positions in the matrix using relative positional rating encoding, and store the position-enhanced rating information and its user-item relationships in an embedding of fixed dimension, unaffected by the matrix size. We then apply the trained position-enhanced user and item representations to the simplest traditional machine learning models, to highlight the pure novelty of the SUPER-Rec representation. We contribute the first formal introduction and quantitative analysis of position-enhanced user/item representations in the recommendation domain, and provide a principled discussion of SUPER-Rec, which improves RMSE, MAE, NDCG and AUC (i.e., both rating and ranking prediction accuracy) by a large margin over various state-of-the-art matrix completion models on both explicit and implicit feedback datasets. For example, SUPER-Rec reduces RMSE on ML-1M by 28.2% relative to the best baseline, whereas differences of only 0.3% to 4.1% are typical among the baselines themselves.
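    The local kernelised weight matrix can be sketched with a 2D RBF kernel in numpy; the function names, embedding sizes and elementwise-product formulation below are assumptions for illustration, not the published GLocal-K code:

```python
import numpy as np

def rbf_kernel_matrix(U, V, gamma=1.0):
    """2D RBF kernel: K[i, j] = exp(-gamma * ||U[i] - V[j]||^2)."""
    sq_dist = ((U[:, None, :] - V[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dist)

def kernelised_weights(W, U, V, gamma=1.0):
    """Local kernelised weight matrix: elementwise product of a dense
    weight matrix with the RBF kernel, which damps connections between
    distant (U, V) embedding pairs and keeps nearby ones strong."""
    return W * rbf_kernel_matrix(U, V, gamma)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # dense autoencoder weights (toy size)
U = rng.normal(size=(4, 2))   # per-input-unit embedding (assumed)
V = rng.normal(size=(3, 2))   # per-hidden-unit embedding (assumed)
Wk = kernelised_weights(W, U, V)
print(Wk.shape)  # → (4, 3)
```

    Because the kernel values lie in (0, 1], the product acts as a soft, learnable sparsification of the autoencoder weights.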

    A compression-based system for handwritten digit recognition

    Get PDF
    Master's in Electronics and Telecommunications Engineering. The recognition of handwritten digits is a human-acquired ability. With little effort, a human can properly recognize, in milliseconds, a sequence of handwritten digits. With the help of a computer, the task of handwriting recognition can be easily automated, improving and speeding up a significant number of processes. Postal mail sorting, bank check verification and handwritten digit data entry belong to a wide group of applications that can be performed in a more effective and automated way. In recent years, a number of techniques and methods have been proposed to automate the handwritten digit recognition mechanism. However, solving this challenging image recognition problem usually requires complex and computationally demanding machine learning techniques, such as deep learning. This dissertation introduces a novel solution to the problem of handwritten digit recognition, using metrics of similarity between digit images. The metrics are computed based on data compression, namely through the use of Finite Context Models.
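    A compression-based similarity metric can be sketched with the normalized compression distance; here zlib stands in for the Finite Context Models used in the dissertation, so this is an illustration of the principle rather than the dissertation's method:

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: if x and y share structure, the
    concatenation x + y compresses almost as well as the larger input
    alone, so the distance is small. Smaller means more similar."""
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Toy stand-ins for serialized digit images of two classes.
a = b"3333333333" * 20
b2 = b"3333333333" * 20
c = b"8888888888" * 20
print(ncd(a, b2) < ncd(a, c))  # same-class pair is closer
```

    Classification then reduces to assigning each digit image to the class whose reference images yield the smallest distance.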

    Deep learning in food category recognition

    Get PDF
    Integrating artificial intelligence with food category recognition has been a field of research interest for the past few decades. It is potentially one of the next steps in revolutionizing human interaction with food. The modern advent of big data and the development of data-oriented fields like deep learning have driven advancements in food category recognition. With increasing computational power and ever-larger food datasets, the approach's full potential has yet to be realized. This survey provides an overview of methods that can be applied to various food category recognition tasks, including detecting type, ingredients, quality, and quantity. We survey the core components for constructing a machine learning system for food category recognition, including datasets, data augmentation, hand-crafted feature extraction, and machine learning algorithms. We place a particular focus on the field of deep learning, including the utilization of convolutional neural networks, transfer learning, and semi-supervised learning. We provide an overview of relevant studies to promote further developments in food category recognition for research and industrial applications. Funding: MRC (MC_PC_17171); Royal Society (RP202G0230); BHF (AA/18/3/34220); Hope Foundation for Cancer Research (RM60G0680); GCRF (P202PF11); Sino-UK Industrial Fund (RP202G0289); LIAS (P202ED10); Data Science Enhancement Fund (P202RE237); Fight for Sight (24NN201); Sino-UK Education Fund (OP202006); BBSRC (RM32G0178B8).
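    Data augmentation, one of the core components surveyed, can be sketched minimally in numpy; the geometric transforms below are generic illustrations, not taken from any system the survey covers:

```python
import numpy as np

def augment(image):
    """Simple geometric data augmentation: mirror and rotate an image
    array to multiply the effective size of a small labelled dataset,
    a standard step before training a food-recognition CNN."""
    return [
        image,                 # original
        np.fliplr(image),      # horizontal mirror
        np.rot90(image),       # rotate 90 degrees counter-clockwise
        np.rot90(image, 2),    # rotate 180 degrees
    ]

img = np.arange(9).reshape(3, 3)   # toy stand-in for a food image
variants = augment(img)
print(len(variants))  # → 4
```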

    Facial expression detection on partial and full face images with machine learning methods

    Get PDF
    Facial expressions are an important part of interpersonal communication and also play an important role in human-machine interaction. Facial expression detection supports decision-making in important tasks such as criminal detection, driver attention monitoring and patient monitoring, so automatic facial expression recognition is a popular machine learning research area. This thesis presents facial expression classification studies that fall under two headings: analysis of partial face images with classical machine learning methods, and analysis of whole face images with deep learning methods. In the first application, unlike previous studies in the literature, facial expressions were classified using only the eye and eyebrow regions, and high accuracy was achieved. With this approach, expression detection is unaffected by lower-face occlusions or mouth movements during speech, and through the selection of robust features it can run with fewer features on resource-constrained devices. Comparative experiments also show that the proposed system generalizes well. The remaining facial expression classification studies in the thesis were implemented with deep learning methods on whole face images. One of the proposed approaches is facial segmentation with a CNN. The segmented images preserve the features relevant to facial expression while storing no personal data, so personal privacy is also protected.
    Furthermore, combining the segmented image with the original raw image increased recognition accuracy, because the eyebrow, eye and mouth regions that are crucial for facial expressions are exactly what the segmented images retain. The proposed CNN classification architecture thereby forces its earlier layers to learn to detect and localize these facial regions, providing decoupled and guided training.
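    The idea of classifying from the eye and eyebrow regions only can be sketched as an upper-face crop; the crop fractions and function name below are hypothetical, not the thesis's calibrated regions:

```python
import numpy as np

def upper_face_roi(face, top=0.2, bottom=0.55):
    """Crop the eye/eyebrow band from a face image array.
    The (top, bottom) fractions are illustrative assumptions; a real
    system would derive the band from detected facial landmarks."""
    h = face.shape[0]
    return face[int(top * h):int(bottom * h), :]

face = np.zeros((100, 80))        # toy stand-in for an aligned face image
roi = upper_face_roi(face)
print(roi.shape)  # → (35, 80)
```

    Restricting the input this way is what makes the classifier insensitive to lower-face occlusions and mouth motion during speech.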

    Nonlinear Parametric and Neural Network Modelling for Medical Image Classification

    Get PDF
    System identification and artificial neural networks (ANNs) are families of algorithms, from systems engineering and machine learning respectively, that use structure detection and learning strategies to build models of complex systems from input-output data. These models play an essential role in science and engineering because they fill the gap in cases where we know the input-output behaviour of a system but lack a mathematical model to understand it, predict its future changes, or prevent threats. In this context, the nonlinear approximation of systems is very popular, since it better describes complex instances. Digital image processing, in turn, is an area of systems engineering that is expanding the analytical scope of a variety of real-life problems while becoming more attractive and affordable over time. Medicine has made the most of it by supporting important human decision-making processes through computer-aided diagnosis (CAD) systems. This thesis presents three different frameworks for breast cancer detection, with approaches ranging from nonlinear system identification, through nonlinear system identification coupled with simple neural networks, to multilayer neural networks. In particular, the nonlinear system identification approaches, the Nonlinear AutoRegressive with eXogenous inputs (NARX) model and the MultiScales Radial Basis Function (MSRBF) neural network, appear for the first time in image processing. Alongside these contributions, the Multilayer-Fuzzy Extreme Learning Machine (ML-FELM) neural network is presented for faster training and more accurate image classification. A central research aim is to exploit nonlinear system identification and multilayer neural networks to enhance feature extraction and thereby bolster classification in CAD systems.
    In the case of multilayer neural networks, extraction is carried out through stacked autoencoders, a bottleneck architecture that promotes a data transformation between layers. In the case of nonlinear system identification, the goal is to add flexible models capable of capturing distinctive features of digital images that simpler approaches might miss. Detecting nonlinearities in digital images is complementary to linear modelling: the aim is to extract features in greater depth, capturing both linear and nonlinear elements. This aim is relevant because, according to previous work cited in the first chapter, not all spatial relationships in digital images can be explained appropriately by linear dependencies. Experimental results show that the methodologies based on system identification produced reliable image models with customised mathematical structure. The models came to include nonlinearities in different proportions, depending on the case under examination. The information about nonlinearity and model structure was used as part of the whole image model. It was found that, in some instances, the models of different clinical classes in the breast cancer detection problem presented a particular structure. For example, NARX models of the malignant class showed a higher nonlinearity percentage and depended more on exogenous inputs than the other classes. Regarding classification performance, comparisons of the three new CAD systems with existing methods had variable results. The NARX model performed better in three cases but was outperformed in two; however, this comparison must be taken with caution, since different databases were used. The MSRBF model was better in 5 out of 6 cases and had superior specificity in all instances, surpassing the closest competing model by 3.5%.
    The ML-FELM model was the best in 6 out of 6 cases, although it was surpassed in accuracy by 0.6% in one case and in specificity by 0.22% in another.
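    The NARX structure can be sketched as least-squares regression on lagged regressors; the synthetic system, lag orders and single quadratic term below are illustrative assumptions, not the thesis's identified image models:

```python
import numpy as np

def narx_features(y, u, ny=2, nu=2):
    """Build NARX regressors: lagged outputs y(k-1..k-ny), lagged
    exogenous inputs u(k-1..k-nu), plus one quadratic term y(k-1)^2
    as the nonlinear element (a minimal model-structure choice)."""
    n = max(ny, nu)
    rows, targets = [], []
    for k in range(n, len(y)):
        feats = [y[k - i] for i in range(1, ny + 1)]
        feats += [u[k - i] for i in range(1, nu + 1)]
        feats.append(y[k - 1] ** 2)
        rows.append(feats)
        targets.append(y[k])
    return np.array(rows), np.array(targets)

rng = np.random.default_rng(1)
u = rng.uniform(-1.0, 1.0, 200)
y = np.zeros(200)
for k in range(2, 200):   # synthetic nonlinear system to identify
    y[k] = 0.5 * y[k - 1] - 0.1 * y[k - 1] ** 2 + 0.3 * u[k - 1]

X, t = narx_features(y, u)
theta, *_ = np.linalg.lstsq(X, t, rcond=None)  # least-squares fit
mse = float(np.mean((X @ theta - t) ** 2))
print(round(mse, 8))
```

    Because the regressor set contains the true terms, least squares recovers the coefficients (0.5 on y(k-1), 0.3 on u(k-1), -0.1 on the quadratic term) essentially exactly, which is the sense in which the identified structure itself becomes a feature.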

    Design and Implementation of Hardware Accelerators for Neural Processing Applications

    Full text link
    The primary motivation for this work was the need to implement hardware accelerators for a newly proposed ANN structure called the Auto Resonance Network (ARN) for robotic motion planning. ARN is an approximating, feed-forward, hierarchical and explainable network. It can be used in various AI applications, but its application base was small. Therefore, the objective of the research was twofold: to develop a new application using ARN and to implement a hardware accelerator for ARN. Following the suggestions of the Doctoral Committee, an image recognition system using ARN was implemented. An accuracy of around 94% was achieved with only 2 layers of ARN, and the network required a small training set of only about 500 images. The publicly available MNIST dataset was used for this experiment, and all coding was done in Python. The massive parallelism seen in ANNs presents several challenges to CPU design. For a given functionality, e.g., multiplication, several copies of a serial module can be realized within the same area as one parallel module; the advantage of serial modules over parallel modules under area constraints is discussed. One module often needed in ANNs is a multi-operand adder. One problem in its implementation is estimating the number of carry bits when the number of operands changes. A theorem giving the exact number of carry bits required for a multi-operand addition is presented in the thesis, which alleviates this problem. The main advantage of the modular approach to multi-operand addition is the possibility of pipelined addition with low reconfiguration overhead, resulting in an overall increase in throughput for the large numbers of additions typically seen in several DNN configurations.
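    The carry-bit question can be illustrated by computing the exact width of the worst-case sum; this sketches the underlying arithmetic only, not the thesis's theorem or its proof:

```python
def sum_width(n_operands: int, w: int) -> int:
    """Exact bit-width needed for the sum of n unsigned w-bit operands.
    The worst case is every operand at its maximum, n * (2**w - 1);
    bits beyond w are the carry bits a multi-operand adder must provide."""
    max_sum = n_operands * ((1 << w) - 1)
    return max_sum.bit_length()

# Adding five 1-bit operands: max sum is 5, which needs 3 bits,
# i.e. 2 carry bits beyond the operand width.
print(sum_width(5, 1))        # → 3
print(sum_width(3, 8) - 8)    # carry bits for three 8-bit operands
```

    Knowing this width exactly, rather than over-provisioning, is what lets a modular adder be resized cheaply when the operand count changes.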