9,654 research outputs found

    A Rule-based Methodology and Feature-based Methodology for Effect Relation Extraction in Chinese Unstructured Text

    Get PDF
    The Chinese language differs significantly from English, both in lexical representation and grammatical structure. These differences lead to problems in Chinese NLP, such as word segmentation and the language's flexible syntactic structure. Many conventional methods and approaches in Natural Language Processing (NLP) developed for English text prove ineffective when applied to these language-specific problems in the comparatively younger field of Chinese NLP. Relation Extraction (RE) is an area of NLP that seeks to identify semantic relationships between entities in text. The term “Effect Relation” (ER) is introduced in this research to refer to a specific type of relationship between two entities, where one entity has a certain “effect” on the other entity. In this research project, a case study on Chinese text from Traditional Chinese Medicine (TCM) journal publications is built to closely examine the forms of Effect Relation in this text domain. The case study targets the effect of a prescription or herb in the treatment of a disease, symptom or body part. A rule-based methodology is introduced in this thesis. It utilises predetermined rules and templates derived from the characteristics and patterns observed in the dataset. This methodology achieves an F-score of 0.85 in its Named Entity Recognition (NER) module, 0.79 in its Semantic Relationship Extraction (SRE) module, and an overall F-score of 0.46. A second methodology, taking a feature-based approach, is also introduced in this thesis. It views the RE task as a classification problem and utilises a mathematical classification model with features consisting of contextual information and rules. It achieves F-scores of 0.73 (NER), 0.88 (SRE) and an overall score of 0.41. The role of functional words in the contemporary Chinese language, and in relation to the ERs in this research, is explored. Functional words are found to be effective, when used as rules in the rule-based methodology, in detecting ER entities with complex structure.
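
    The feature-based methodology described above treats relation extraction as a classification problem over candidate entity pairs, with contextual information and functional-word cues as features. The sketch below is a minimal, generic illustration of that pattern in Python with scikit-learn; the segmented sentences, cue words (治疗, 主治, 用于) and labels are invented placeholders, not the thesis's actual feature set or data.

```python
# Illustrative sketch of a feature-based Effect Relation classifier.
# The feature names, example sentences and labels below are invented for
# demonstration; the thesis's actual features and dataset are not shown here.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def pair_features(tokens, head_idx, tail_idx):
    """Contextual features for a candidate (entity, entity) pair."""
    between = tokens[min(head_idx, tail_idx) + 1:max(head_idx, tail_idx)]
    return {
        "head": tokens[head_idx],              # e.g. a herb or prescription
        "tail": tokens[tail_idx],              # e.g. a disease or symptom
        "distance": abs(head_idx - tail_idx),  # token distance between entities
        "has_effect_cue": any(w in {"治疗", "主治", "用于"} for w in between),
        "between_bow": " ".join(between),      # crude bag of intervening words
    }

# Hypothetical pre-segmented sentences with entity positions and relation labels.
train = [
    (["黄芪", "治疗", "气虚"], 0, 2, 1),   # herb -> symptom: Effect Relation
    (["本文", "讨论", "黄芪"], 0, 2, 0),   # no Effect Relation
]
X = [pair_features(t, h, r) for t, h, r, _ in train]
y = [label for *_, label in train]

model = make_pipeline(DictVectorizer(), LogisticRegression())
model.fit(X, y)
print(model.predict([pair_features(["当归", "主治", "血虚"], 0, 2)]))
```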

    Clothing Co-Parsing by Joint Image Segmentation and Labeling

    Full text link
    This paper aims at developing an integrated system of clothing co-parsing, in order to jointly parse a set of clothing images (unsegmented but annotated with tags) into semantic configurations. We propose a data-driven framework consisting of two phases of inference. The first phase, referred to as "image co-segmentation", iterates to extract consistent regions on images and jointly refines the regions over all images by employing the exemplar-SVM (E-SVM) technique [23]. In the second phase (i.e. "region co-labeling"), we construct a multi-image graphical model by taking the segmented regions as vertices, and incorporate several contexts of clothing configuration (e.g., item location and mutual interactions). The joint label assignment can be solved using the efficient Graph Cuts algorithm. In addition to evaluating our framework on the Fashionista dataset [30], we construct a dataset called CCP consisting of 2098 high-resolution street fashion photos to demonstrate the performance of our system. We achieve 90.29% / 88.23% segmentation accuracy and 65.52% / 63.89% recognition rate on the Fashionista and the CCP datasets, respectively, which are superior to state-of-the-art methods.
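
    The co-segmentation phase builds on the exemplar-SVM idea: one linear SVM is trained per exemplar region, with that region as its only positive example and a large pool of other regions as negatives, and the resulting detectors then propose matching regions across images for joint refinement. The snippet below is a rough sketch of that idea using scikit-learn, with random placeholder descriptors rather than the paper's region features.

```python
# Minimal sketch of the exemplar-SVM (E-SVM) idea used in the co-segmentation
# phase: one class-weighted linear SVM per exemplar region, trained against many
# negative regions. Feature vectors here are random placeholders.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
exemplars = rng.normal(size=(5, 128))    # hypothetical exemplar-region descriptors
negatives = rng.normal(size=(200, 128))  # descriptors of unrelated regions

def train_exemplar_svm(exemplar, negatives, c_pos=0.5, c_neg=0.01):
    """Train one heavily class-weighted linear SVM for a single exemplar."""
    X = np.vstack([exemplar[None, :], negatives])
    y = np.array([1] + [0] * len(negatives))
    clf = LinearSVC(class_weight={1: c_pos, 0: c_neg}, C=1.0)
    clf.fit(X, y)
    return clf

esvms = [train_exemplar_svm(e, negatives) for e in exemplars]

# Each E-SVM scores candidate regions in other images; high scores propose
# regions that resemble the exemplar and can then be refined jointly.
candidates = rng.normal(size=(50, 128))
scores = np.stack([clf.decision_function(candidates) for clf in esvms])
print(scores.shape)  # (num_exemplars, num_candidates)
```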

    Content Recognition and Context Modeling for Document Analysis and Retrieval

    Get PDF
    The nature and scope of available documents are changing significantly in many areas of document analysis and retrieval as complex, heterogeneous collections become accessible to virtually everyone via the web. The increasing level of diversity presents a great challenge for document image content categorization, indexing, and retrieval. Meanwhile, the processing of documents with unconstrained layouts and complex formatting often requires effective leveraging of broad contextual knowledge. In this dissertation, we first present a novel approach for document image content categorization, using a lexicon of shape features. Each lexical word corresponds to a scale- and rotation-invariant local shape feature that is generic enough to be detected repeatably and is segmentation free. A concise, structurally indexed shape lexicon is learned by clustering and partitioning feature types through graph cuts. Our idea finds successful application in several challenging tasks, including content recognition of diverse web images and language identification on documents composed of mixed machine-printed text and handwriting. Second, we address two fundamental problems in signature-based document image retrieval. Facing continually increasing volumes of documents, detecting and recognizing unique, evidentiary visual entities (e.g., signatures and logos) provides a practical and reliable supplement to the OCR recognition of printed text. We propose a novel multi-scale framework to detect and segment signatures jointly from document images, based on the structural saliency under a signature production model. We formulate the problem of signature retrieval in the unconstrained setting of geometry-invariant deformable shape matching and demonstrate state-of-the-art performance in signature matching and verification. Third, we present a model-based approach for extracting relevant named entities from unstructured documents. In a wide range of applications that require structured information from diverse, unstructured document images, processing OCR text does not give satisfactory results due to the absence of linguistic context. Our approach enables learning of inference rules collectively, based on contextual information from both page layout and text features. Finally, we demonstrate the importance of mining general web user behavior data for improving document ranking and the overall web search experience. The context of web user activities reveals their preferences and intents, and we emphasize the analysis of individual user sessions for creating aggregate models. We introduce a novel algorithm for estimating web page and web site importance, and discuss its theoretical foundation based on an intentional surfer model. We demonstrate that our approach significantly improves large-scale document retrieval performance.
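
    The first contribution revolves around a shape lexicon learned by clustering local features, with images then indexed over that lexicon. As a loose, generic illustration of the lexicon idea only (the dissertation learns its lexicon with structural indexing and graph-cut partitioning, not plain k-means), the following sketch clusters placeholder local descriptors into a small vocabulary and indexes each image as a histogram over it.

```python
# Generic bag-of-features sketch: cluster local descriptors into a small
# "lexicon" and index each image as a normalized histogram of lexicon words.
# Descriptors below are random placeholders, and plain k-means stands in for
# the dissertation's structurally indexed, graph-cut-partitioned lexicon.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Hypothetical local shape descriptors extracted from a set of document images.
descriptors_per_image = [rng.normal(size=(rng.integers(50, 80), 64)) for _ in range(10)]

lexicon = KMeans(n_clusters=32, n_init=10, random_state=0)
lexicon.fit(np.vstack(descriptors_per_image))

def image_signature(descs, lexicon):
    """Histogram of lexicon-word assignments, usable as an index for categorization."""
    words = lexicon.predict(descs)
    hist = np.bincount(words, minlength=lexicon.n_clusters).astype(float)
    return hist / hist.sum()

signatures = np.stack([image_signature(d, lexicon) for d in descriptors_per_image])
print(signatures.shape)  # (num_images, lexicon_size)
```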

    Sentiment Analysis of Assamese Text Reviews: Supervised Machine Learning Approach with Combined n-gram and TF-IDF Feature

    Get PDF
    Sentiment analysis (SA) is a challenging application of natural language processing (NLP) in various Indian languages. However, there is limited research on sentiment categorization in Assamese texts. This paper investigates sentiment categorization on Assamese textual data, using a dataset created by translating Bengali resources into Assamese with Google Translate. The study employs multiple supervised ML methods, including Decision Tree, K-Nearest Neighbour, Multinomial Naive Bayes, Logistic Regression, and Support Vector Machine, combined with n-gram and Term Frequency-Inverse Document Frequency (TF-IDF) feature extraction methods. The experimental results show that Multinomial Naive Bayes and Support Vector Machine achieve over 80% accuracy in analyzing sentiments in Assamese texts, while the unigram model performs better than higher-order n-gram models on both datasets. The proposed model is shown to be an effective tool for sentiment classification of domain-independent Assamese text data.
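
    The feature and classifier combination described here (n-gram TF-IDF features feeding Multinomial Naive Bayes or an SVM) maps directly onto a standard scikit-learn pipeline. The sketch below shows that generic setup; the two Assamese example reviews and their labels are placeholders rather than items from the paper's translated dataset.

```python
# Generic sketch of the n-gram + TF-IDF + supervised-classifier setup described
# above, using scikit-learn. The Assamese example reviews and labels are
# placeholders, not taken from the paper's dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

train_texts = ["ছবিখন বৰ ভাল লাগিল", "সেৱা বৰ বেয়া আছিল"]  # placeholder reviews
train_labels = ["positive", "negative"]

model = Pipeline([
    # Unigram TF-IDF features; set ngram_range=(1, 2) or (1, 3) for higher-order n-grams.
    ("tfidf", TfidfVectorizer(ngram_range=(1, 1))),
    ("clf", MultinomialNB()),   # swap in LinearSVC() for the SVM variant
])
model.fit(train_texts, train_labels)
print(model.predict(["খাদ্য ভাল আছিল"]))  # placeholder test review
```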

    Sino-African Philosophy: A Re-“Constructive Engagement”

    Get PDF
    “Constructive-Engagement” is a meta-philosophical and meta-methodological “strategy” suggested by Chinese and comparative philosophy scholar Bo Mou for analyzing and enriching philosophical exchange. In this paper, I will use this strategy towards an end, on a scale, and with a topic not attempted before. I will use it as a “template” for redesigning a poorly developing area of cross-cultural comparison I call Sino-African reflective studies (SARS). My goal in this work-in-progress is to design a plan for reconstituting SARS as Sino-African philosophy (SAP), an inclusive yet coherent field of research and innovation unified through organizing principles. I will design the overhaul of SARS in three stages. First, by surveying SARS for its basic features including its structural flaws. Second, by remapping SARS in line with “renovation” principles drawn from its literature. Third, by blueprinting SARS in line with “construction” principles theorized from the constructive-engagement strategy (CES)

    Local Image Patterns for Counterfeit Coin Detection and Automatic Coin Grading

    Get PDF
    Coins are an essential part of our life, and we still use them for everyday transactions. Counterfeiting of coins has always been an issue, but it has worsened over time as counterfeiting technology has advanced, making detection more difficult. Through this thesis, we propose a counterfeit coin detection method that is robust and applicable to all types of coins, whether they bear letters, images, or both. We use two different types of feature extraction methods: the first is SIFT (Scale-Invariant Feature Transform) features, and the second is RFR (Rotation and Flipping invariant Regional Binary Patterns) features, making our system complete in all aspects and very generic at the same time. The feature extraction methods used here are scale, rotation, illumination, and flipping invariant. We concatenate both feature sets and use them to train our classifiers. The two feature sets are highly complementary: SIFT provides the most discriminative features, which are scale and rotation invariant but lose spatial information when clustered, while the second feature set captures the spatial structure of each coin image. We train SVM classifiers with the two sets of features from each image. The method has an accuracy of 99.61% with both high- and low-resolution images. We also took pictures of the coins at 90° and 45° angles using a mobile phone camera to check the robustness of the proposed method, and we achieved promising results even with these low-resolution pictures. We also work on the problem of coin grading, another issue in the field of numismatic studies. The algorithm proposed above is customized to the coin grading problem: it calculates the wear on a coin and assigns it a grade. This grade can be used to remove low-quality coins from the system, which are otherwise sold to coin collectors online for a considerable price. Coin grading is currently done manually by coin experts and is a time-consuming and expensive process. We use digital images and apply computer vision and machine learning algorithms to calculate the wear on a coin and then assign it a grade based on its quality level. Our method calculates the amount of wear on coins, assigns them a label, and achieves an accuracy of 98.5%.
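
    The detection pipeline above concatenates two descriptor sets per coin image and trains an SVM on the result. The following sketch illustrates that general pattern with OpenCV's SIFT and a simple intensity histogram standing in for the thesis's RFR features; the images and genuine/counterfeit labels are random placeholders, not the thesis's data or exact pipeline.

```python
# Generic sketch of "extract two descriptor sets, concatenate, train an SVM".
# A plain intensity histogram is a stand-in for the thesis's RFR (rotation- and
# flipping-invariant regional binary pattern) features, which are not implemented here.
import cv2
import numpy as np
from sklearn.svm import SVC

sift = cv2.SIFT_create()

def coin_features(gray_image, n_keypoints=50):
    """SIFT descriptors pooled to a fixed length, concatenated with a histogram."""
    _, desc = sift.detectAndCompute(gray_image, None)
    if desc is None:
        desc = np.zeros((1, 128), dtype=np.float32)
    pooled = desc[:n_keypoints].mean(axis=0)               # crude fixed-length pooling
    hist = cv2.calcHist([gray_image], [0], None, [32], [0, 256]).ravel()
    hist /= hist.sum() + 1e-9
    return np.concatenate([pooled, hist])

# Hypothetical grayscale coin images with genuine (1) / counterfeit (0) labels.
rng = np.random.default_rng(2)
images = [rng.integers(0, 256, size=(128, 128), dtype=np.uint8) for _ in range(6)]
labels = [1, 1, 1, 0, 0, 0]

X = np.stack([coin_features(img) for img in images])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X[:2]))
```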

    A history and theory of textual event detection and recognition

    Get PDF

    On the persistence of race:Unique skulls and average tissue depths in the practice of forensic craniofacial depiction

    Get PDF
    The (re-)surfacing of race in forensic practices has received plenty of attention from STS scholars, especially in connection with modern forensic genetic technologies. In this article, I describe the making of facial depictions based on the skulls of unknown deceased individuals. Based on ethnographic research in the field of craniofacial identification and forensic art, I present a material-semiotic analysis of how race comes to matter in the face-making process. The analysis sheds light on how race as a translation device enables oscillation between the individual skull and population data, and allows for slippage between categories that otherwise do not neatly map onto one another. The subsuming logic of race is ingrained in methods and technologies, in that it sits at the basis of standard choices and tools. However, the skull does not easily let itself be reduced to a racial type. Moreover, the careful efforts of practitioners to articulate the individual characteristics of each skull provide clues for how similarities and differences can be done without the effect of producing race. Such methods value the skull itself as an object of interest, rather than treating it as a vehicle for practicing race science. I argue that efforts to undo the persistence of race in forensic anthropology should focus critical attention on the socio-material configuration of methods and technologies, including data practices and reference standards.