
    Measuring Short Text Semantic Similarity with Deep Learning Models

    Natural language processing (NLP), a subfield of artificial intelligence (AI), is the ability of a computer program to understand human language as it is spoken. Developing NLP applications is challenging because computers traditionally require humans to "speak" to them in a programming language that is precise, unambiguous and highly structured, or through a limited number of clearly enunciated voice commands. We study the use of deep learning models, the state-of-the-art AI method, for the problem of measuring short text semantic similarity in NLP. In particular, we propose a novel deep neural network architecture to identify semantic similarity for pairs of question sentences. In the proposed network, multiple channels of knowledge about each question pair are used to improve the text representation, and a dense layer then learns a classifier for duplicate question pairs. Through extensive experiments on the Quora test collection, the proposed approach shows a remarkable and significant improvement over strong baselines, which verifies the effectiveness of deep models as well as the proposed deep multi-channel framework.
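
    The abstract does not spell out the channels or layer sizes, so the following is only a minimal sketch of a multi-channel question-pair classifier under assumptions: two hypothetical channels per question (word-token embeddings encoded by an LSTM and a precomputed character-level feature vector), with all names and dimensions illustrative rather than the authors' configuration.

    # Minimal sketch (assumed architecture): two channels per question feeding a dense classifier.
    import torch
    import torch.nn as nn

    class MultiChannelQuestionMatcher(nn.Module):
        def __init__(self, vocab_size=30000, emb_dim=100, char_feat_dim=64, hidden=128):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
            self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True)
            self.char_proj = nn.Linear(char_feat_dim, hidden)
            # Dense layers classify the combined representation of both questions.
            self.classifier = nn.Sequential(
                nn.Linear(4 * hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def encode(self, tokens, char_feats):
            _, (h, _) = self.encoder(self.embedding(tokens))
            word_repr = h[-1]                                   # channel 1: word-level encoding
            char_repr = torch.relu(self.char_proj(char_feats))  # channel 2: character-level features
            return torch.cat([word_repr, char_repr], dim=-1)

        def forward(self, q1_tokens, q1_chars, q2_tokens, q2_chars):
            pair = torch.cat([self.encode(q1_tokens, q1_chars),
                              self.encode(q2_tokens, q2_chars)], dim=-1)
            return torch.sigmoid(self.classifier(pair))         # probability the pair is a duplicate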

    MultiSiam: A Multiple Input Siamese Network For Social Media Text Classification And Duplicate Text Detection

    Social media accounts post increasingly similar content, creating a chaotic experience across platforms and making it difficult to access the desired information. These posts can be organized by categorizing them and grouping duplicates across social handles and accounts. There can be more than one duplicate of a post; a conventional Siamese neural network, however, only considers a pair of inputs for duplicate text detection. In this paper, we first propose a multiple-input Siamese network, MultiSiam. This condensed network is then used to propose another model, SMCD (Social Media Classification and Duplication Model), to perform both duplicate text grouping and categorization. The MultiSiam network, just like the Siamese network, can be used in multiple applications by changing the sub-network appropriately.
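
    The paper's sub-network and grouping procedure are not described in the abstract; the sketch below only illustrates the general multiple-input Siamese idea: one shared encoder embeds any number of posts, and pairwise distances between the embeddings drive duplicate grouping. The mean-pooled bag-of-embeddings encoder and all sizes are assumptions.

    # Sketch of a multiple-input Siamese setup: a single shared encoder, many inputs.
    import torch
    import torch.nn as nn

    class SharedTextEncoder(nn.Module):
        def __init__(self, vocab_size=30000, emb_dim=128, out_dim=64):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
            self.proj = nn.Linear(emb_dim, out_dim)

        def forward(self, token_ids):                       # token_ids: (num_posts, seq_len)
            pooled = self.embedding(token_ids).mean(dim=1)  # mean-pool word vectors per post
            return self.proj(pooled)                        # (num_posts, out_dim)

    def duplicate_distances(token_ids, encoder):
        """Embed all posts with the one shared encoder and return pairwise L2 distances."""
        emb = encoder(token_ids)
        return torch.cdist(emb, emb, p=2)                   # small distance => likely duplicates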

    Paraphrase Detection Using Manhattan's Recurrent Neural Networks and Long Short-Term Memory

    Natural Language Processing (NLP) is a part of artificial intelligence that can extract sentence structures from natural language. Architectures such as Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) are widely discussed in NLP, for example for summarizing papers that contain many sentences. Siamese Similarity applies a twin recurrent network architecture to machine learning for sentence similarity. This architecture, also called Manhattan LSTM, can be applied to the case of detecting paraphrase sentences, which must first be recognized by the model. Word2vec is used to convert sentences to vectors so that they can be processed by the network. This research develops paraphrase sentence detection using Siamese Similarity with word2vec embeddings. The experimental results showed that the amount of training data influences accuracy on new data more than the number of epochs or the variation in the training data: with 800,000 training pairs, accuracy reached 99% on the training data and 82.4% on new data, whereas half of the training data yielded only 64% on new data. The amount of training data had little effect on accuracy on the training data itself.
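
    A Manhattan LSTM (MaLSTM) scores a sentence pair with the exponential of the negative L1 (Manhattan) distance between the final hidden states of two weight-sharing LSTMs. A minimal sketch follows, assuming 300-dimensional word2vec vectors as input and an illustrative hidden size rather than the paper's settings.

    # Sketch of a Manhattan LSTM: twin (weight-sharing) LSTMs + exp(-L1 distance) similarity.
    import torch
    import torch.nn as nn

    class MaLSTM(nn.Module):
        def __init__(self, emb_dim=300, hidden=50):   # sizes are assumptions, not the paper's
            super().__init__()
            self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)   # shared by both inputs

        def forward(self, sent_a, sent_b):            # sent_*: (batch, seq_len, emb_dim) word2vec vectors
            _, (h_a, _) = self.lstm(sent_a)
            _, (h_b, _) = self.lstm(sent_b)
            l1 = torch.sum(torch.abs(h_a[-1] - h_b[-1]), dim=1)
            return torch.exp(-l1)                     # similarity in (0, 1]; 1 means identical encodings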

    Neural Paraphrase Identification of Questions with Noisy Pretraining

    We present a solution to the problem of paraphrase identification of questions. We focus on a recent dataset of question pairs annotated with binary paraphrase labels and show that a variant of the decomposable attention model (Parikh et al., 2016) results in accurate performance on this task, while being far simpler than many competing neural architectures. Furthermore, when the model is pretrained on a noisy dataset of automatically collected question paraphrases, it obtains the best reported performance on the dataset.
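
    The decomposable attention model follows an attend / compare / aggregate pattern over the two token sequences. The sketch below shows that pattern only; the feed-forward sizes are assumptions and it omits the specifics of the paper's variant (including the noisy pretraining step).

    # Sketch of the decomposable attention pattern (attend / compare / aggregate).
    import torch
    import torch.nn as nn

    class DecomposableAttention(nn.Module):
        def __init__(self, emb_dim=300, hidden=200, num_classes=2):
            super().__init__()
            self.attend = nn.Sequential(nn.Linear(emb_dim, hidden), nn.ReLU())
            self.compare = nn.Sequential(nn.Linear(2 * emb_dim, hidden), nn.ReLU())
            self.aggregate = nn.Linear(2 * hidden, num_classes)

        def forward(self, a, b):   # a: (batch, len_a, emb_dim), b: (batch, len_b, emb_dim)
            # Attend: soft-align every token of one question with the other question.
            scores = torch.bmm(self.attend(a), self.attend(b).transpose(1, 2))
            beta = torch.bmm(torch.softmax(scores, dim=2), b)                   # b aligned to a
            alpha = torch.bmm(torch.softmax(scores, dim=1).transpose(1, 2), a)  # a aligned to b
            # Compare: process each token together with its aligned counterpart, then pool.
            v_a = self.compare(torch.cat([a, beta], dim=2)).sum(dim=1)
            v_b = self.compare(torch.cat([b, alpha], dim=2)).sum(dim=1)
            # Aggregate: classify the pooled comparison vectors (paraphrase vs. not).
            return self.aggregate(torch.cat([v_a, v_b], dim=1))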

    Advancing duplicate question detection with deep learning


    An Inclusive Report on Robust Malware Detection and Analysis for Cross-Version Binary Code Optimizations

    Numerous practices exist for binary code similarity detection (BCSD), such as control flow graph analysis, semantics scrutiny, code obfuscation, malware detection and analysis, vulnerability search, etc. On the basis of expert knowledge, existing solutions often compare particular syntactic aspects extracted from binary code; they either incur substantial performance overheads or detect inaccurately. Furthermore, few tools are available for comparing cross-version binaries, which may differ not only in syntax but also marginally in semantics. Binary code similarity detection has existed for the past 10 years, but the research area has not yet been systematically analysed. This paper presents a comprehensive analysis of existing cross-version binary code optimization techniques along four characteristics: 1. structural analysis, 2. semantic analysis, 3. syntactic analysis, 4. validation metrics. This helps researchers select the most suitable tool for their implementation of binary code analysis. Furthermore, the paper presents the scope of the area along with future directions for the research.