    Transfer Topic Labeling with Domain-Specific Knowledge Base: An Analysis of UK House of Commons Speeches 1935-2014

    Topic models are widely used in natural language processing, allowing researchers to estimate the underlying themes in a collection of documents. Most topic models use unsupervised methods and hence require the additional step of attaching meaningful labels to estimated topics. This process of manual labeling is not scalable and suffers from human bias. We present a semi-automatic transfer topic labeling method that seeks to remedy these problems. Domain-specific codebooks form the knowledge base for automated topic labeling. We demonstrate our approach with a dynamic topic model analysis of the complete corpus of UK House of Commons speeches 1935-2014, using the coding instructions of the Comparative Agendas Project to label topics. We show that our method works well for a majority of the topics we estimate, but we also find that institution-specific topics, in particular on subnational governance, require manual input. We validate our results using human expert coding.
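
    The idea of labeling topics from a codebook can be illustrated with a minimal sketch. It is not the authors' method: it assumes each topic is summarized by its top words, each codebook category by a short text description, and that bag-of-words cosine similarity with a fixed threshold is an acceptable matching rule. The function names, the example codebook, and the threshold value are all hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def label_topic(top_words, codebook, threshold=0.2):
    """Return (label, score); label is None when no category is close enough.

    top_words: list of a topic's highest-probability words (assumed input).
    codebook:  dict mapping a category label to its description text
               (a stand-in for real coding instructions).
    threshold: hypothetical cut-off below which the topic is left for
               manual labeling.
    """
    topic_vec = Counter(w.lower() for w in top_words)
    scores = {
        label: cosine(topic_vec, Counter(text.lower().split()))
        for label, text in codebook.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        return None, scores[best]
    return best, scores[best]


# Toy codebook, invented for illustration only.
codebook = {
    "Health": "health hospital doctors nurses medical care",
    "Defence": "defence army navy military forces war",
}
print(label_topic(["hospital", "nurses", "care", "patients"], codebook))
print(label_topic(["scotland", "devolution", "assembly"], codebook))
```

    In this toy setup the second topic shares no vocabulary with any category and is returned unlabeled, which mirrors the abstract's observation that some institution-specific topics still need manual input.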

    Open Directory Project based universal taxonomy for Personalization of Online (Re)sources

    Content personalization depends on the ability to classify content into (predefined) thematic units or information domains. Content nodes in a single thematic unit are related to a greater or lesser extent. An existing connection between two available content nodes assumes that the user will be interested in both resources (but not necessarily to the same extent). Such a connection (and its value) can be established through automatic content classification and labeling. One approach to classifying content nodes is to use a predefined classification taxonomy. With such a taxonomy it is possible to automatically classify and label existing content nodes and to create additional descriptors for future use in content personalization and recommendation systems. For these purposes, existing web directories can be used to create a universal, purely content-based classification taxonomy. This work analyzes the Open Directory Project (ODP) web directory and proposes a novel use of its structure and content as the basis for such a classification taxonomy. The goal of a unified classification taxonomy is to allow content personalization from heterogeneous sources. In this work we focus on the overall quality of ODP as the basis for such a classification taxonomy and on the use of its hierarchical structure for automatic labeling. Because of the structure of data in ODP, different grouping schemes are devised and tested to find the optimal combination of content and structure for the proposed classification taxonomy and for automatic labeling processes. The results provide an in-depth analysis of ODP and of ODP-based content classification and automatic labeling models. Although the use of ODP is well documented, this question has not been answered to date.

    Explicit fuzzy modeling of shapes and positioning for handwritten Chinese character recognition

    In this paper, we present a new method for on-line Chinese character recognition that relies on an explicit description of character structure. Unlike most known structural approaches, this model can describe characters written in a fluent style, thanks to a flexible fuzzy modeling of the shapes and positioning of their structural components (primitives and radicals). We designed a process for incremental training of the models, combined with automatic structural labeling, to minimize the manual effort required in model design. First experiments show that the method is able to recognize irregularly written characters and has a convincing generalization ability.

    Automated Identification of Wood Surface Defects Based on Deep Learning

    Wood plates are widely used in the interior design of houses, primarily for their aesthetic value. Because of this aesthetic role, surface defect detection is necessary. The development of computer vision and CNN-based object detection methods has opened the way to automating wood surface defect detection. This paper investigates deep learning applications for automatic wood surface defect detection. It evaluates deep learning algorithms across data generation and labeling, preprocessing, model training, and evaluation. Many adjustments to the dataset size, the model, and the neural network architecture were made to evaluate the model's performance on this task. The results indicate that these modifications can increase the detection performance of YOLOv5s. The model with GCNet added, trained on 4800 images, achieved an mAP of 88.1%. The paper also evaluates the time performance of the models on different GPUs. The results show that on an A100 40GB GPU, the maximum time to process a wood plate is 2.2 seconds. Finally, an active learning approach has been implemented that continually improves performance during detection while requiring less manual labeling. After detecting 500 images over 5 cycles, the model achieved an mAP of 98.8%. The paper concludes that the modified YOLOv5s model is suitable for wood surface defect detection and can perform with high accuracy in real time. Moreover, applying the active learning approach can ease the labeling process by increasing performance during detection.
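
    The abstract does not say how images are chosen for manual labeling in each active learning cycle. As a hedged illustration only, the sketch below assumes a least-confidence strategy: each image is scored by its least confident detection, and the lowest-scoring images are sent to an annotator first. The function name, the input format, and the handling of images with no detections are assumptions, not details from the paper.

```python
def select_for_labeling(confidences, budget):
    """Pick the images the detector is least sure about.

    confidences: dict mapping an image id to the list of confidence
                 scores of its detections (hypothetical input format).
    budget:      how many images an annotator can label this cycle.
    """
    def certainty(scores):
        # Assumption: an image with no detections is treated as
        # maximally uncertain, so it is reviewed first.
        return min(scores) if scores else 0.0

    ranked = sorted(confidences, key=lambda img: certainty(confidences[img]))
    return ranked[:budget]


# Invented detector outputs for four images.
detections = {
    "a.png": [0.95, 0.90],
    "b.png": [0.40],
    "c.png": [],
    "d.png": [0.70, 0.20],
}
print(select_for_labeling(detections, budget=2))
```

    The selected images would then be labeled and added to the training set before the next cycle; other selection criteria (for example entropy over class scores) would fit the same loop.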

    Annotating Object Instances with a Polygon-RNN

    We propose an approach for semi-automatic annotation of object instances. While most current methods treat object segmentation as a pixel-labeling problem, we here cast it as a polygon prediction task, mimicking how most current datasets have been annotated. In particular, our approach takes as input an image crop and sequentially produces vertices of the polygon outlining the object. This allows a human annotator to intervene at any time and correct a vertex if needed, producing a segmentation as accurate as the annotator desires. We show that our approach speeds up the annotation process by a factor of 4.7 across all classes in Cityscapes, while achieving 78.4% agreement in IoU with the original ground truth, matching the typical agreement between human annotators. For cars, our speed-up factor is 7.3 for an agreement of 82.2%. We further show generalization capabilities of our approach to unseen datasets.
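
    The agreement figures above are reported as IoU (intersection over union). A minimal sketch of that measure on binary masks follows. It assumes the predicted polygon and the ground truth have already been rasterized to equal-sized grids of 0/1 values; the function name and the toy masks are illustrative and not taken from the paper.

```python
def mask_iou(pred, truth):
    """Intersection over union of two equal-sized binary masks.

    pred, truth: lists of rows, each row a list of 0/1 pixel values.
    Two empty masks are treated as being in perfect agreement.
    """
    intersection = 0
    union = 0
    for row_pred, row_truth in zip(pred, truth):
        for p, t in zip(row_pred, row_truth):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    return intersection / union if union else 1.0


# Toy 2x3 masks overlapping in the middle column.
pred = [[1, 1, 0],
        [1, 1, 0]]
truth = [[0, 1, 1],
         [0, 1, 1]]
print(mask_iou(pred, truth))
```

    Here the masks share 2 pixels out of 6 covered in total, so the IoU is one third; identical masks give 1.0.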