1,022 research outputs found

    Advances in Character Recognition

    Get PDF
    This book presents advances in character recognition, and it consists of 12 chapters that cover wide range of topics on different aspects of character recognition. Hopefully, this book will serve as a reference source for academic research, for professionals working in the character recognition field and for all interested in the subject

    Text-detection and -recognition from natural images

    Get PDF
    Text detection and recognition from images could have numerous functional applications for document analysis, such as assistance for visually impaired people; recognition of vehicle license plates; evaluation of articles containing tables, street signs, maps, and diagrams; keyword-based image exploration; document retrieval; recognition of parts within industrial automation; content-based extraction; object recognition; address block location; and text-based video indexing. This research exploited the advantages of artificial intelligence (AI) to detect and recognise text from natural images. Machine learning and deep learning were used to accomplish this task.In this research, we conducted an in-depth literature review on the current detection and recognition methods used by researchers to identify the existing challenges, wherein the differences in text resulting from disparity in alignment, style, size, and orientation combined with low image contrast and a complex background make automatic text extraction a considerably challenging and problematic task. Therefore, the state-of-the-art suggested approaches obtain low detection rates (often less than 80%) and recognition rates (often less than 60%). This has led to the development of new approaches. The aim of the study was to develop a robust text detection and recognition method from natural images with high accuracy and recall, which would be used as the target of the experiments. This method could detect all the text in the scene images, despite certain specific features associated with the text pattern. Furthermore, we aimed to find a solution to the two main problems concerning arbitrarily shaped text (horizontal, multi-oriented, and curved text) detection and recognition in a low-resolution scene and with various scales and of different sizes.In this research, we propose a methodology to handle the problem of text detection by using novel combination and selection features to deal with the classification algorithms of the text/non-text regions. The text-region candidates were extracted from the grey-scale images by using the MSER technique. A machine learning-based method was then applied to refine and validate the initial detection. The effectiveness of the features based on the aspect ratio, GLCM, LBP, and HOG descriptors was investigated. The text-region classifiers of MLP, SVM, and RF were trained using selections of these features and their combinations. The publicly available datasets ICDAR 2003 and ICDAR 2011 were used to evaluate the proposed method. This method achieved the state-of-the-art performance by using machine learning methodologies on both databases, and the improvements were significant in terms of Precision, Recall, and F-measure. The F-measure for ICDAR 2003 and ICDAR 2011 was 81% and 84%, respectively. The results showed that the use of a suitable feature combination and selection approach could significantly increase the accuracy of the algorithms.A new dataset has been proposed to fill the gap of character-level annotation and the availability of text in different orientations and of curved text. The proposed dataset was created particularly for deep learning methods which require a massive completed and varying range of training data. The proposed dataset includes 2,100 images annotated at the character and word levels to obtain 38,500 samples of English characters and 12,500 words. Furthermore, an augmentation tool has been proposed to support the proposed dataset. The missing of object detection augmentation tool encroach to proposed tool which has the ability to update the position of bounding boxes after applying transformations on images. This technique helps to increase the number of samples in the dataset and reduce the time of annotations where no annotation is required. The final part of the thesis presents a novel approach for text spotting, which is a new framework for an end-to-end character detection and recognition system designed using an improved SSD convolutional neural network, wherein layers are added to the SSD networks and the aspect ratio of the characters is considered because it is different from that of the other objects. Compared with the other methods considered, the proposed method could detect and recognise characters by training the end-to-end model completely. The performance of the proposed method was better on the proposed dataset; it was 90.34. Furthermore, the F-measure of the method’s accuracy on ICDAR 2015, ICDAR 2013, and SVT was 84.5, 91.9, and 54.8, respectively. On ICDAR13, the method achieved the second-best accuracy. The proposed method could spot text in arbitrarily shaped (horizontal, oriented, and curved) scene text.</div

    Modular Networks: Learning to Decompose Neural Computation

    Get PDF
    Scaling model capacity has been vital in the success of deep learning. For a typical network, necessary compute resources and training time grow dramatically with model size. Conditional computation is a promising way to increase the number of parameters with a relatively small increase in resources. We propose a training algorithm that flexibly chooses neural modules based on the data to be processed. Both the decomposition and modules are learned end-to-end. In contrast to existing approaches, training does not rely on regularization to enforce diversity in module use. We apply modular networks both to image recognition and language modeling tasks, where we achieve superior performance compared to several baselines. Introspection reveals that modules specialize in interpretable contexts.Comment: NIPS 201

    Deep learning methods for knowledge base population

    Get PDF
    Knowledge bases store structured information about entities or concepts of the world and can be used in various applications, such as information retrieval or question answering. A major drawback of existing knowledge bases is their incompleteness. In this thesis, we explore deep learning methods for automatically populating them from text, addressing the following tasks: slot filling, uncertainty detection and type-aware relation extraction. Slot filling aims at extracting information about entities from a large text corpus. The Text Analysis Conference yearly provides new evaluation data in the context of an international shared task. We develop a modular system to address this challenge. It was one of the top-ranked systems in the shared task evaluations in 2015. For its slot filler classification module, we propose contextCNN, a convolutional neural network based on context splitting. It improves the performance of the slot filling system by 5.0% micro and 2.9% macro F1. To train our binary and multiclass classification models, we create a dataset using distant supervision and reduce the number of noisy labels with a self-training strategy. For model optimization and evaluation, we automatically extract a labeled benchmark for slot filler classification from the manual shared task assessments from 2012-2014. We show that results on this benchmark are correlated with slot filling pipeline results with a Pearson's correlation coefficient of 0.89 (0.82) on data from 2013 (2014). The combination of patterns, support vector machines and contextCNN achieves the best results on the benchmark with a micro (macro) F1 of 51% (53%) on test. Finally, we analyze the results of the slot filling pipeline and the impact of its components. For knowledge base population, it is essential to assess the factuality of the statements extracted from text. From the sentence "Obama was rumored to be born in Kenya", a system should not conclude that Kenya is the place of birth of Obama. Therefore, we address uncertainty detection in the second part of this thesis. We investigate attention-based models and make a first attempt to systematize the attention design space. Moreover, we propose novel attention variants: External attention, which incorporates an external knowledge source, k-max average attention, which only considers the vectors with the k maximum attention weights, and sequence-preserving attention, which allows to maintain order information. Our convolutional neural network with external k-max average attention sets the new state of the art on a Wikipedia benchmark dataset with an F1 score of 68%. To the best of our knowledge, we are the first to integrate an uncertainty detection component into a slot filling pipeline. It improves precision by 1.4% and micro F1 by 0.4%. In the last part of the thesis, we investigate type-aware relation extraction with neural networks. We compare different models for joint entity and relation classification: pipeline models, jointly trained models and globally normalized models based on structured prediction. First, we show that using entity class prediction scores instead of binary decisions helps relation classification. Second, joint training clearly outperforms pipeline models on a large-scale distantly supervised dataset with fine-grained entity classes. It improves the area under the precision-recall curve from 0.53 to 0.66. Third, we propose a model with a structured prediction output layer, which globally normalizes the score of a triple consisting of the classes of two entities and the relation between them. It improves relation extraction results by 4.4% F1 on a manually labeled benchmark dataset. Our analysis shows that the model learns correct correlations between entity and relation classes. Finally, we are the first to use neural networks for joint entity and relation classification in a slot filling pipeline. The jointly trained model achieves the best micro F1 score with a score of 22% while the neural structured prediction model performs best in terms of macro F1 with a score of 25%

    Automated Assessment of the Aftermath of Typhoons Using Social Media Texts

    Full text link
    Disasters are one of the major threats to economics and human societies, causing substantial losses of human lives, properties and infrastructures. It has been our persistent endeavors to understand, prevent and reduce such disasters, and the popularization of social media is offering new opportunities to enhance disaster management in a crowd-sourcing approach. However, social media data is also characterized by its undue brevity, intense noise, and informality of language. The existing literature has not completely addressed these disadvantages, otherwise vast manual efforts are devoted to tackling these problems. The major focus of this research is on constructing a holistic framework to exploit social media data in typhoon damage assessment. The scope of this research covers data collection, relevance classification, location extraction and damage assessment while assorted approaches are utilized to overcome the disadvantages of social media data. Moreover, a semi-supervised or unsupervised approach is prioritized in forming the framework to minimize manual intervention. In data collection, query expansion strategy is adopted to optimize the search recall of typhoon-relevant information retrieval. Multiple filtering strategies are developed to screen the keywords and maintain the relevance to search topics in the keyword updates. A classifier based on a convolutional neural network is presented for relevance classification, with hashtags and word clusters as extra input channels to augment the information. In location extraction, a model is constructed by integrating Bidirectional Long Short-Time Memory and Conditional Random Fields. Feature noise correction layers and label smoothing are leveraged to handle the noisy training data. Finally, a multi-instance multi-label classifier identifies the damage relations in four categories, and the damage categories of a message are integrated with the damage descriptions score to obtain damage severity score for the message. A case study is conducted to verify the effectiveness of the framework. The outcomes indicate that the approaches and models developed in this study significantly improve in the classification of social media texts especially under the framework of semi-supervised or unsupervised learning. Moreover, the results of damage assessment from social media data are remarkably consistent with the official statistics, which demonstrates the practicality of the proposed damage scoring scheme

    Resolving pronominal anaphora using commonsense knowledge

    Get PDF
    Coreference resolution is the task of resolving all expressions in a text that refer to the same entity. Such expressions are often used in writing and speech as shortcuts to avoid repetition. The most frequent form of coreference is the anaphor. To resolve anaphora not only grammatical and syntactical strategies are required, but also semantic approaches should be taken into consideration. This dissertation presents a framework for automatically resolving pronominal anaphora by integrating recent findings from the field of linguistics with new semantic features. Commonsense knowledge is the routine knowledge people have of the everyday world. Because such knowledge is widely used it is frequently omitted from social communications such as texts. It is understandable that without this knowledge computers will have difficulty making sense of textual information. In this dissertation a new set of computational and linguistic features are used in a supervised learning approach to resolve the pronominal anaphora in document. Commonsense knowledge sources such as ConceptNet and WordNet are used and similarity measures are extracted to uncover the elaborative information embedded in the words that can help in the process of anaphora resolution. The anaphoric system is tested on 350 Wall Street Journal articles from the BBN corpus. When compared with other systems available such as BART (Versley et al. 2008) and Charniak and Elsner 2009, our system performed better and also resolved a much wider range of anaphora. We were able to achieve a 92% F-measure on the BBN corpus and an average of 85% F-measure when tested on other genres of documents such as children stories and short stories selected from the web

    Named Entity Recognition in Chinese Clinical Text

    Get PDF
    Objective: Named entity recognition (NER) is one of the fundamental tasks in natural language processing (NLP). In the medical domain, there have been a number of studies on NER in English clinical notes; however, very limited NER research has been done on clinical notes written in Chinese. The goal of this study is to develop corpora, methods, and systems for NER in Chinese clinical text. Materials and methods: To study entities in Chinese clinical text, we started with building annotated clinical corpora in Chinese. We developed an NER annotation guideline in Chinese by extending the one used in the 2010 i2b2 NLP challenge. We randomly selected 400 admission notes and 400 discharge summaries from Peking Union Medical College Hospital (PUMCH) in China. For each note, four types of entities including clinical problems, procedures, labs, and medications were annotated according to the developed guideline. In addition, an annotation tool was developed to assist two MD students to annotate Chinese clinical documents. A comparison of entity distribution between Chinese and English clinical notes (646 English and 400 Chinese discharge summaries) was performed using the annotated corpora, to identify the important features for NER. In the NER study, two-thirds of the 400 notes were used for training the NER systems and one-third were used for testing. We investigated the effects of different types of features including bag-of-characters, word segmentation, part-of-speech, and section information, with different machine learning (ML) algorithms including Conditional Random Fields (CRF), Support Vector Machines (SVM), Maximum Entropy (ME), and Structural Support Vector Machines (SSVM) on the Chinese clinical NER task. All classifiers were trained on the training dataset, evaluated on the test set, and microaveraged precision, recall, and F-measure were reported. Results: Our evaluation on the independent test set showed that most types of features were beneficial to Chinese NER systems, although the improvements were limited. By combining word segmentation and section information, the system achieved the highest performance, indicating that these two types of features are complementary to each other. When the same types of optimized features were used, CRF and SSVM outperformed SVM and ME. More specifically, SSVM reached the highest performance among the four algorithms, with F-measures of 93.51% and 90.01% for admission notes and discharge summaries respectively. Conclusions: In this study, we created large annotated datasets of Chinese admission notes and discharge summaries and then systematically evaluated different types of features (e.g., syntactic, semantic, and segmentation information) and four ML algorithms including CRF, SVM, SSVM, and ME for clinical NER in Chinese. To the best of our knowledge, this is one of the earliest comprehensive effort in Chinese clinical NER research and we believe it will provide valuable insights to NLP research in Chinese clinical text. Our results suggest that both word segmentation and section information improves NER in Chinese clinical text, and SSVM, a recent sequential labelling algorithm, outperformed CRF and other classification algorithms. Our best system achieved F-measures of 90.01% and 93.52% on Chinese discharge summaries and admission notes, respectively, indicating a promising start on Chinese NLP research

    End-to-end named entity recognition for spoken Finnish

    Get PDF
    Named entity recognition is a natural language processing task in which the system tries to find named entities and classify them in predefined categories. The categories can vary, depending on the domain in which they are going to be used but some of the most common include: person, location, organization, date and product. Named entity recognition is an integral part of other large natural language processing tasks, such as information retrieval, text summarization, machine translation, and question answering. Doing named entity recognition is a difficult task due to the lack of annotated data for certain languages or domains. Named entity ambiguity is another challenging aspect that arises when doing named entity recognition. Often times, a word can represent a person, organization, product, or any other category, depending on the context it appears in. Spoken data, which can be the output of a speech recognition system, imposes additional challenges to the named entity recognition system. Named entities are often capitalized and the system learns to rely on capitalization in order to detect the entities, which is neglected in the speech recognition output. The standard way of doing named entity recognition from speech involves a pipeline approach of two systems. First, a speech recognition system transcribes the speech and generates the transcripts, after which a named entity recognition system annotates the transcripts with the named entities. Since the speech recognition system is not perfect and makes errors, those errors are propagated to the named entity recognition system, which is hard to recover from. In this thesis, we present two approaches of doing named entity recognition from Finnish speech in an end-to-and manner, where one system generates the transcripts and the annotations. We will explore the strengths and weaknesses of both approaches and see how they compare to the standard pipeline approach
    • …
    corecore