Analysis of a high-resolution hand-written digits data set with writer characteristics
The contributions in this article are two-fold. First, we introduce a new
hand-written digit data set that we collected. It contains high-resolution
images of hand-written digits together with various writer characteristics
which are not available in the well-known MNIST database. The data set is
publicly available and is designed to create new research opportunities.
Second, we perform a first analysis of this new data set. We begin with simple
supervised tasks. We assess the predictability of the writer characteristics
gathered, the effect of using some of those characteristics as predictors in
classification tasks, and the effect of higher-resolution images on
classification accuracy. We also explore semi-supervised applications; we can
leverage the high quantity of hand-written digits data sets already existing
online to improve the accuracy of various classification tasks with noticeable
success. Finally, we also demonstrate the generative perspective offered by
this new data set; we are able to generate images that mimic the writing style
of specific writers. The data set provides new research opportunities and our
analysis establishes benchmarks and showcases some of the new opportunities
made possible with this new data set.
Comment: Data set available here:
https://drive.google.com/drive/folders/1f2o1kjXLvcxRgtmMMuDkA2PQ5Zato4Or?usp=sharin
Sequence-to-Sequence Contrastive Learning for Text Recognition
We propose a framework for sequence-to-sequence contrastive learning (SeqCLR) of visual representations, which we apply to text recognition. To account for the sequence-to-sequence structure, each feature map is divided into different instances over which the contrastive loss is computed. This operation enables us to contrast at a sub-word level, where from each image we extract several positive pairs and multiple negative examples. To yield effective visual representations for text recognition, we further suggest novel augmentation heuristics, different encoder architectures and custom projection heads. Experiments on handwritten text and on scene text show that when a text decoder is trained on the learned representations, our method outperforms non-sequential contrastive methods. In addition, when the amount of supervision is reduced, SeqCLR significantly improves performance compared with supervised training, and when fine-tuned with 100% of the labels, our method achieves state-of-the-art results on standard handwritten text recognition benchmarks.
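The idea of dividing a feature map into instances and contrasting them can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the instance mapping or the exact loss, so the sketch assumes average pooling over equal windows along the sequence axis and a standard NT-Xent contrastive loss. The names `to_instances`, `nt_xent`, and the temperature value are hypothetical.

```python
import numpy as np

def to_instances(feature_map, num_instances):
    """Split a (T, C) frame sequence into windows and average-pool each one.

    Assumption: window average pooling stands in for the paper's
    instance-mapping step, which the abstract does not describe.
    """
    chunks = np.array_split(feature_map, num_instances, axis=0)
    return np.stack([chunk.mean(axis=0) for chunk in chunks])

def nt_xent(za, zb, temperature=0.5):
    """NT-Xent loss where za[i] and zb[i] form a positive pair.

    All other instances from both views act as negatives.
    """
    n = za.shape[0]
    z = np.concatenate([za, zb], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # an instance is never its own negative
    # Index of the positive partner for every row of z.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(
        np.exp(sim - row_max).sum(axis=1, keepdims=True)
    )
    return -log_prob[np.arange(2 * n), pos].mean()

# Two "augmented views" of the same image: 20 frames, 8 channels each.
rng = np.random.default_rng(0)
view_a = rng.normal(size=(20, 8))
view_b = view_a + 0.01 * rng.normal(size=(20, 8))

inst_a = to_instances(view_a, 5)  # 5 sub-word instances per view
inst_b = to_instances(view_b, 5)
loss = nt_xent(inst_a, inst_b)
```

Each image here contributes five positive pairs rather than one, which is the sense in which the contrast happens at a sub-word level; a whole-image contrastive method would pool all frames into a single instance.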
Semantic Sentiment Analysis of Twitter Data
The Internet and the proliferation of smart mobile devices have changed the way
information is created, shared, and spread; e.g., microblogs such as Twitter,
weblogs such as LiveJournal, social networks such as Facebook, and instant
messengers such as Skype and WhatsApp are now commonly used to share thoughts
and opinions about anything in the surrounding world. This has resulted in the
proliferation of social media content, thus creating new opportunities to study
public opinion at a scale that was never possible before. Naturally, this
abundance of data has quickly attracted business and research interest from
various fields including marketing, political science, and social studies,
among many others, which are interested in questions like these: Do people like
the new Apple Watch? Do Americans support ObamaCare? How do the Scottish feel
about Brexit? Answering these questions requires studying the sentiment of
opinions people express in social media, which has given rise to the fast
growth of the field of sentiment analysis in social media, with Twitter being
especially popular for research due to its scale, representativeness, variety
of topics discussed, as well as ease of public access to its messages. Here we
present an overview of work on sentiment analysis on Twitter.
Comment: Microblog sentiment analysis; Twitter opinion mining; In the
Encyclopedia on Social Network Analysis and Mining (ESNAM), Second edition.
201
Detecting the Authors of Texts by Neural Network Committee Machines
This paper proposes a means of using a boosting-by-filtering algorithm in artificial neural networks to identify the author of a text. This approach involves filtering the training examples by different versions of a weak learning algorithm. It assumes the availability of a large source of examples, with the examples being either discarded or kept during training. An advantage of this approach is its small memory requirement. Once the network has been trained, its hidden layer activations are recorded as a representation of the selected lexical descriptors of an author. This stored information can then be used to identify texts written by the same author. The texts studied are literary works of two Bosnian writers, Ivo Andrić (1892-1975) and M. Meša Selimović (1910-1982). The data collected by counting syntactic characteristics in 1466 paragraphs of "Na Drini ćuprija" by Ivo Andrić, and "Derviš i smrt" by M. Meša Selimović each
- …