586 research outputs found

    A Text Classifier Based on Sentence Category VSM

    PACLIC 20 / Wuhan, China / 1-3 November, 200

    Patent Analytics Based on Feature Vector Space Model: A Case of IoT

    The number of approved patents worldwide increases rapidly each year, which requires new patent analytics to efficiently mine the valuable information attached to these patents. The vector space model (VSM) represents documents as high-dimensional vectors, where each dimension corresponds to a unique term. While originally proposed for information retrieval systems, VSM has also been widely applied in patent analytics, where it is used as a fundamental tool to map patent documents to structured data. However, the VSM method suffers from several limitations when applied to patent analysis tasks, such as the loss of sentence-level semantics and the curse of dimensionality. To address these limitations, we propose a patent analytics approach based on the feature vector space model (FVSM), where the FVSM is constructed by mapping patent documents to feature vectors extracted by convolutional neural networks (CNN). The applications of FVSM to three typical patent analysis tasks, i.e., patent similarity comparison, patent clustering, and patent map generation, are discussed. A case study using patents related to Internet of Things (IoT) technology is presented to demonstrate the performance and effectiveness of FVSM. The proposed FVSM can be adopted by other patent analysis studies to replace VSM, and various big data learning tasks can be performed on top of it.
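
As a rough illustration of the plain VSM described in the abstract above (not the CNN-based FVSM the paper proposes), the sketch below maps each document to a term-frequency vector with one dimension per unique term and compares documents by cosine similarity. The whitespace tokenizer, the function names, and the sample texts are assumptions made for illustration only.

```python
import math
from collections import Counter


def term_vector(text):
    """Map a document to a sparse term-frequency vector.

    Each unique lowercase token is one dimension (hypothetical tokenizer:
    plain whitespace splitting, no stemming or stop-word removal).
    """
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    shared_terms = a.keys() & b.keys()
    dot = sum(a[t] * b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


# Made-up sample texts, not taken from any patent corpus.
doc_sensor = term_vector("wireless sensor network for smart home devices")
doc_gateway = term_vector("gateway for wireless smart home sensor data")
doc_engine = term_vector("combustion engine fuel injection valve")

print(cosine_similarity(doc_sensor, doc_gateway))
print(cosine_similarity(doc_sensor, doc_engine))
```

Because every distinct term adds a dimension, the vectors grow with the vocabulary and ignore word order, which is the kind of limitation the abstract attributes to VSM.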

    Topic modeling for entity linking using keyphrase

    This paper proposes an Entity Linking system that applies a topic modeling ranking. We apply a novel approach in order to provide new relevant elements to the model. These elements are keyphrases related to the queries and gathered from a huge Wikipedia-based knowledge resource. Peer Reviewed. Postprint (author's final draft)

    DCU and ISI@INEX 2010: Ad-hoc and data-centric tracks

    We describe the participation of Dublin City University (DCU) and the Indian Statistical Institute (ISI) in INEX 2010. The main contributions of this paper are: i) a simplified version of the Hierarchical Language Model (HLM), which scores XML elements with a combined probability of generating the given query from the element itself and from the top-level article node, is shown to outperform the baselines of Language Model (LM) and Vector Space Model (VSM) scoring of XML elements; ii) Expectation Maximization (EM) feedback in LM is shown to be the most effective on the domain-specific collection of IMDB; iii) automated removal of sentences indicating aspects of irrelevance from the narratives of INEX ad-hoc topics is shown to improve retrieval effectiveness.
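
One possible reading of the "combined probability" in contribution i) is a per-term linear mixture of unigram language models estimated from the element, its top-level article, and the whole collection. The sketch below follows that reading; the mixture form, the collection smoothing term, the weights `w_elem` and `w_art`, and all names are assumptions for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def unigram_prob(term, tokens):
    """Maximum-likelihood probability of a term under a token list."""
    if not tokens:
        return 0.0
    return Counter(tokens)[term] / len(tokens)


def hlm_score(query, element, article, collection, w_elem=0.5, w_art=0.3):
    """Log-probability of generating the query from an XML element.

    Each query term is generated from a mixture of the element's own model,
    the top-level article model, and a collection model for smoothing.
    The weights are hypothetical; the remainder goes to the collection.
    """
    w_coll = 1.0 - w_elem - w_art
    score = 0.0
    for term in query:
        p = (w_elem * unigram_prob(term, element)
             + w_art * unigram_prob(term, article)
             + w_coll * unigram_prob(term, collection))
        if p == 0.0:
            return float("-inf")  # term unseen everywhere
        score += math.log(p)
    return score


# Made-up token lists standing in for an article, two of its elements,
# and the full collection.
article = "the film stars a famous actor in a drama".split()
element_cast = "famous actor".split()
element_genre = "a drama".split()
collection = article + "another film about music".split()

print(hlm_score(["actor"], element_cast, article, collection))
print(hlm_score(["actor"], element_genre, article, collection))
```

Under this sketch, an element that does not contain a query term still receives some score from its parent article, which is the intuition behind mixing the two levels.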

    Text Document Categorization using Enhanced Sentence Vector Space Model and Bi-Gram Text Representation Model Based on Novel Fusion Techniques

    The text document classification task falls under the Automatic Classification (also known as Pattern Recognition) problem in Machine Learning and Text Mining. Large text documents need to be classified into specific classes to make them clearer and simpler to search, and classified data are easy for users to browse. The key issue in conventional text document classification is representing the features used to assign an unknown document to predefined categories. A combination of classifiers is fused to increase the classification accuracy for a single text document. This paper presents a novel fusion approach that classifies text documents using the ES-VSM and Bigram representation models. ES-VSM (Enhanced Sentence Vector Space Model) is an advanced form of the sentence-based vector space model and an extension of the simple VSM, and it is used for the constructive representation of text documents. The main objective of the study is to boost the accuracy of text classification by accounting for the features extracted from the text document. The proposed system concatenates two different representation models of the text documents, designed for two different classifiers, and feeds them as one input to the classifier. An enhanced S-VSM and an interval-valued representation model are considered for the effective representation of text documents. A word-level neural network Bigram representation of text documents is proposed to capture the semantic information present in the text data effectively. The proposed approach improves the overall accuracy of text document classification to a significant extent. Keywords: ES-VSM, Fusion, Text Document Classification, Neural Network, Text Representation, Machine Learning. DOI: 10.7176/NMMC/93-03. Publication date: September 30th 2020

    Deep Multimodal Image-Repurposing Detection

    Nefarious actors on social media and other platforms often spread rumors and falsehoods through images whose metadata (e.g., captions) have been modified to provide visual substantiation of the rumor/falsehood. This type of modification is referred to as image repurposing, in which an unmanipulated image is often published along with incorrect or manipulated metadata to serve the actor's ulterior motives. We present the Multimodal Entity Image Repurposing (MEIR) dataset, a substantially more challenging dataset than those previously available to support research into image repurposing detection. The new dataset includes location, person, and organization manipulations on real-world data sourced from Flickr. We also present a novel, end-to-end, deep multimodal learning model for assessing the integrity of an image by combining information extracted from the image with related information from a knowledge base. The proposed method is compared against state-of-the-art techniques on existing datasets as well as MEIR, where it outperforms existing methods across the board, with AUC improvement up to 0.23. Comment: To be published at ACM Multimedia 2018 (orals)