Regularization and Kernelization of the Maximin Correlation Approach
Robust classification becomes challenging when each class consists of
multiple subclasses. Examples include multi-font optical character recognition
and automated protein function prediction. In correlation-based
nearest-neighbor classification, the maximin correlation approach (MCA)
provides the worst-case optimal solution by minimizing the maximum
misclassification risk through an iterative procedure. Despite the optimality,
the original MCA has drawbacks that have limited its wide applicability in
practice. That is, the MCA tends to be sensitive to outliers, cannot
effectively handle nonlinearities in datasets, and suffers from having high
computational complexity. To address these limitations, we propose an improved
solution, named regularized maximin correlation approach (R-MCA). We first
reformulate MCA as a quadratically constrained linear programming (QCLP)
problem, incorporate regularization by introducing slack variables in the
primal problem of the QCLP, and derive the corresponding Lagrangian dual. The
dual formulation enables us to apply the kernel trick to R-MCA so that it can
better handle nonlinearities. Our experimental results demonstrate that the
regularization and kernelization make the proposed R-MCA more robust and
accurate for various classification tasks than the original MCA. Furthermore,
when the data size or dimensionality grows, R-MCA runs substantially faster by
solving either the primal or dual (whichever has a smaller variable dimension)
of the QCLP.
Comment: Submitted to IEEE Access
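As an illustrative aside, the maximin idea behind MCA, finding a single template direction that maximizes the worst-case correlation against a set of unit-normalized subclass exemplars, can be sketched with plain subgradient ascent in NumPy. This is a toy sketch under invented data, not the paper's iterative MCA procedure or the proposed R-MCA/QCLP solver; the function name and step schedule are assumptions for the demo.

```python
import numpy as np

# Toy sketch (not the paper's algorithm): maximize the minimum correlation
# min_i <w, x_i> over unit vectors w, via projected subgradient ascent.
def maximin_direction(X, steps=3000, lr0=0.1):
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalize rows
    w = X.mean(axis=0)
    w /= np.linalg.norm(w)
    for t in range(steps):
        corr = X @ w
        i = np.argmin(corr)                    # worst-correlated exemplar
        w = w + (lr0 / np.sqrt(t + 1)) * X[i]  # subgradient step toward it
        w /= np.linalg.norm(w)                 # project back to unit sphere
    return w

# Three subclass exemplars of one class (made-up 2-D data)
X = np.array([[1.0, 0.1], [0.6, 0.8], [0.2, 1.0]])
w = maximin_direction(X)
```

The resulting direction achieves a higher worst-case correlation than the naive class mean, which is the property the maximin formulation is after.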
Accuracy Affecting Factors for Optical Handwritten Character Recognition
Optical character recognition (OCR) refers to a technique that converts images of typed, handwritten or printed text into machine-encoded text, enabling automatic processing of paper records such as passports, invoices, medical forms, receipts, etc. Pattern recognition, artificial intelligence and computer vision are all research fields that enable OCR. Using OCR on handwritten text could greatly benefit many of the emerging information systems by ensuring a smooth transition from paper format to the digital world. Nowadays, OCR has evolved into a multi-step process: segmentation, pre-processing, feature extraction, classification, post-processing and application-specific optimization.
This thesis proposes techniques to improve the overall accuracy of OCR systems by showing the effects of pre-processing, feature extraction and morphological processing. It also compares the accuracies of different well-known and commonly used classifiers in the field. Using the proposed techniques, an accuracy of over 98% was achieved. In addition, a dataset of handwritten Japanese Hiragana characters with considerable variability was collected as part of this thesis.
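One of the pipeline stages listed above, feature extraction, can be illustrated with a classic zoning scheme. This is a generic textbook sketch, not the thesis's actual method: the glyph image is binarized with a global threshold (an assumed value), split into a 4x4 grid, and the ink density of each zone becomes one feature.

```python
import numpy as np

# Illustrative zoning feature extractor for a grayscale glyph image.
# Threshold and grid size are assumptions, not values from the thesis.
def zoning_features(img, grid=4, threshold=128):
    binary = (img < threshold).astype(float)   # ink = dark pixels
    h, w = binary.shape
    feats = []
    for i in range(grid):
        for j in range(grid):
            zone = binary[i * h // grid:(i + 1) * h // grid,
                          j * w // grid:(j + 1) * w // grid]
            feats.append(zone.mean())          # ink density of this zone
    return np.array(feats)

# Synthetic 32x32 glyph: white background with a vertical stroke (a "1")
img = np.full((32, 32), 255, dtype=np.uint8)
img[4:28, 14:18] = 0
f = zoning_features(img)
```

The 16-dimensional vector `f` would then feed a classifier such as k-nearest neighbors or an SVM in a conventional OCR pipeline.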
Artificial Eye for the Blind
The main backbone of our Artificial Eye model is the Raspberry Pi 3, which is connected to the webcam, ultrasonic proximity sensor and speaker; it also runs all our software models, i.e., object detection, optical character recognition, Google text-to-speech conversion and the Mycroft voice assistant model. First, the ultrasonic proximity sensor measures the distance between itself and any obstacle in front of it. When the proximity sensor detects an obstacle within its specified range, the blind person hears an audio prompt about an obstacle in his way at a certain distance. At this point the webcam captures an image of the scene, and the object detection model and the optical character recognition model begin to run on the Raspberry Pi. The image captured is first sent through the Tesseract OCR module to detect any text in the image and then through the object detection model to detect the objects in front of the blind person. The text and the objects detected are conveyed to the blind person by converting the text to speech using the gTTS module. Alongside this process, an active Mycroft voice assistant model runs, which the blind person can use to interact with the system, asking about the weather, daily news, any information on the internet, etc.
Comment: 23 pages, 16 figures
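The control flow described above (sensor trigger, image capture, OCR, object detection, speech output) can be sketched as a small hardware-free simulation. Every device and model call here is a stub standing in for the real component (ultrasonic driver, webcam, Tesseract, the detector, gTTS), and the range constant is an assumption; the paper's actual threshold is not stated.

```python
# Hypothetical, simplified sketch of the described assistive loop.
RANGE_CM = 150  # assumed obstacle-detection range

def assistive_loop(distance_cm, read_camera, run_ocr, detect_objects, speak):
    announcements = []
    if distance_cm <= RANGE_CM:                 # proximity sensor trigger
        announcements.append(f"Obstacle ahead at {distance_cm} cm")
        frame = read_camera()                   # webcam capture
        text = run_ocr(frame)                   # Tesseract OCR stand-in
        objects = detect_objects(frame)         # object-detection stand-in
        if text:
            announcements.append(f"Text: {text}")
        if objects:
            announcements.append("Objects: " + ", ".join(objects))
    for msg in announcements:
        speak(msg)                              # gTTS -> speaker stand-in
    return announcements

# Stubbed run: a sign reading "EXIT" and a chair, 90 cm away
out = assistive_loop(
    90,
    read_camera=lambda: "frame",
    run_ocr=lambda f: "EXIT",
    detect_objects=lambda f: ["chair"],
    speak=lambda m: None,
)
```

On the real device the same loop would run continuously, with the Mycroft assistant handling user queries in parallel.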
Thirty Years of Machine Learning: The Road to Pareto-Optimal Wireless Networks
Future wireless networks hold substantial potential for supporting a broad range of complex, compelling applications in both military and civilian fields, where users can enjoy high-rate, low-latency, low-cost and reliable information services. Achieving this ambitious goal requires new radio techniques for adaptive learning and intelligent decision making, owing to the complex, heterogeneous nature of the network structures and wireless services. Machine learning (ML) algorithms have achieved great success in supporting big data analytics, efficient parameter estimation and interactive decision making. Hence, in this article, we review the thirty-year history of ML by elaborating on supervised learning, unsupervised learning, reinforcement learning and deep learning. Furthermore, we investigate their employment in the compelling applications of wireless networks, including heterogeneous networks (HetNets), cognitive radio (CR), the Internet of Things (IoT), machine-to-machine (M2M) networks, and so on. This article aims to assist readers in clarifying the motivation and methodology of the various ML algorithms, so as to invoke them for hitherto unexplored services and scenarios of future wireless networks.
Comment: 46 pages, 22 figures
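As a minimal illustration of the reinforcement-learning style of decision making the article surveys, consider tabular Q-learning for dynamic channel selection, a standard toy example in cognitive-radio settings. This sketch is not taken from the article; the channel success probabilities and hyperparameters are invented for the demo.

```python
import random

# Toy single-state Q-learning: pick the channel with the highest estimated
# transmission success rate, balancing exploration and exploitation.
def q_learning_channels(success_prob, episodes=5000, alpha=0.1,
                        epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [0.0] * len(success_prob)
    for _ in range(episodes):
        if rng.random() < epsilon:              # explore a random channel
            a = rng.randrange(len(q))
        else:                                   # exploit the best estimate
            a = max(range(len(q)), key=lambda i: q[i])
        reward = 1.0 if rng.random() < success_prob[a] else 0.0
        q[a] += alpha * (reward - q[a])         # incremental Q-update
    return q

# Made-up success probabilities for three channels
q = q_learning_channels([0.2, 0.9, 0.5])
```

After training, the Q-values approximate the per-channel success rates, so greedy selection settles on the most reliable channel.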
Extracting structured information from 2D images
Convolutional neural networks can handle an impressive array of supervised learning tasks while relying on a single backbone architecture, suggesting that one solution fits all vision problems. But for many tasks, we can directly exploit the problem structure within neural networks to deliver more accurate predictions. In this thesis, we propose novel deep learning components that exploit the structured output space of an increasingly complex set of problems. We start from Optical Character Recognition (OCR) in natural scenes and leverage the constraints imposed by the spatial outline of letters and language requirements. Conventional OCR systems do not work well in natural scenes due to distortions, blur, or letter variability. We introduce a new attention-based model, equipped with extra information about neuron positions to guide its focus across characters sequentially. It surpasses the previous state of the art by a significant margin. We then turn to dense labeling tasks employing encoder-decoder architectures. We start with an experimental study that documents the drastic impact that decoder design can have on task performance. Rather than optimizing one decoder per task separately, we propose new robust layers for the upsampling of high-dimensional encodings. We show that these better suit the structured per-pixel output across all tasks. Finally, we turn to the problem of urban scene understanding. There is an elaborate structure in both the input space (multi-view recordings, aerial and street-view scenes) and the output space (multiple fine-grained attributes for holistic building understanding). We design new models that benefit from the relatively simple cuboidal geometry of buildings to create a single unified representation from multiple views.
To benchmark our model, we build a new large-scale multi-view dataset of building images with fine-grained attributes and show systematic improvements over a broad range of strong CNN-based baselines.
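The position-aware attention idea mentioned above can be sketched generically. This is not the thesis's architecture: it is ordinary scaled dot-product attention in NumPy, where sinusoidal position encodings are added to the keys so that attention weights can depend on where a feature sits in the sequence, in the spirit of reading characters in spatial order.

```python
import numpy as np

# Sinusoidal position encodings (standard construction, assumed here)
def positions(n, d):
    pos = np.arange(n)[:, None]
    i = np.arange(d)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

# Scaled dot-product attention with a numerically stable softmax
def attention(q, k, v):
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

n, d = 6, 8
feats = np.random.default_rng(0).normal(size=(n, d))  # toy feature sequence
k = feats + positions(n, d)        # keys carry positional information
out, w = attention(feats[:1], k, feats)
```

Because the keys mix content with position, the model can learn to step its focus along the character sequence rather than attending purely by feature similarity.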