467 research outputs found

    A Review on Identification of Contextual Similar Sentences

    Get PDF
    The task of identifying contextual similar sentences plays a crucial role in various natural language processing applications such as information retrieval, paraphrase detection, and question answering systems. This paper presents a comprehensive review of the methodologies, techniques, and advancements in the identification of contextual similar sentences. Beginning with an overview of the importance and challenges associated with this task, the paper delves into the various approaches employed, including traditional similarity metrics, deep learning architectures, and transformer-based models. Furthermore, the review explores different datasets and evaluation metrics used to assess the performance of these methods. Additionally, the paper discusses recent trends, emerging research directions, and potential applications in the field. By synthesizing existing literature, this review aims to provide researchers and practitioners with insights into the state-of-the-art techniques and future avenues for advancing the identification of contextual similar sentences

    Revisiting the challenges and surveys in text similarity matching and detection methods

    Get PDF
    The massive amount of information from the internet has revolutionized the field of natural language processing. One of the challenges was estimating the similarity between texts. This has been an open research problem although various studies have proposed new methods over the years. This paper surveyed and traced the primary studies in the field of text similarity. The aim was to give a broad overview of existing issues, applications, and methods of text similarity research. This paper identified four issues and several applications of text similarity matching. It classified current studies based on intrinsic, extrinsic, and hybrid approaches. Then, we identified the methods and classified them into lexical-similarity, syntactic-similarity, semantic-similarity, structural-similarity, and hybrid. Furthermore, this study also analyzed and discussed method improvement, current limitations, and open challenges on this topic for future research directions

    Combining Supervised and Unsupervised Learning to Detect and Semantically Aggregate Crisis-Related Twitter Content

    Get PDF
    Twitter is an immediate and almost ubiquitous platform and therefore can be a valuable source of information during disasters. Current methods for identifying and classifying crisis-related content are often based on single tweets, i.e., already known information from the past is neglected. In this paper, the combination of tweet-wise pre-trained neural networks and unsupervised semantic clustering is proposed and investigated. The intention is to (1) enhance the generalization capability of pre-trained models, (2) to be able to handle massive amounts of stream data, (3) to reduce information overload by identifying potentially crisis-related content, and (4) to obtain a semantically aggregated data representation that allows for further automated, manual and visual analyses. Latent representations of each tweet based on pre-trained sentence embedding models are used for both, clustering and tweet classification. For a fast, robust and time-continuous processing, subsequent time periods are clustered individually according to a Chinese restaurant process. Clusters without any tweet classified as crisis-related are pruned. Data aggregation over time is ensured by merging semantically similar clusters. A comparison of our hybrid method to a similar clustering approach, as well as first quantitative and qualitative results from experiments with two different labeled data sets demonstrate the great potential for crisis-related Twitter stream analyses

    Collective Intelligence in Collaborative Tagging System

    Get PDF
    Recently, a new form of organizing, sharing and finding information, named tagging, has gained importance because its results are the product of the combined efforts of the actual users’ opinions of the information. In this paper, we explore the conceptual model of the del.icio.us tagging system in order to investigate the degree to which the tagging system’s conceptual model reflects the human conceptual knowledge structure at both the population level and the individual level. We use datasets extracted from the del.icio.us system from 2003 to 2007 to obtain the strength of connection among tags, and compare that with data for the association of the same concepts by actual human beings. The results show that, overall, the conceptual model for the del.icio.us tagging system captures human’s notion of concept similarity. Several potential applications are mentioned

    Hierarchical categorisation of web tags for Delicious

    Get PDF
    In the scenario of social bookmarking, a user browsing the Web bookmarks web pages and assigns free-text labels (i.e., tags) to them according to their personal preferences. The benefits of social tagging are clear – tags enhance Web content browsing and search. However, since these tags may be publicly available to any Internet user, a privacy attacker may collect this information and extract an accurate snapshot of users’ interests or user profiles, containing sensitive information, such as health-related information, political preferences, salary or religion. In order to hinder attackers in their efforts to profile users, this report focuses on the practical aspects of capturing user interests from their tagging activity. More accurately, we study how to categorise a collection of tags posted by users in one of the most popular bookmarking services, Delicious (http://delicious.com).Preprin

    Semantically-enhanced image tagging system

    Get PDF
    In multimedia databases, data are images, audio, video, texts, etc. Research interests in these types of databases have increased in the last decade or so, especially with the advent of the Internet and Semantic Web. Fundamental research issues vary from unified data modelling, retrieval of data items and dynamic nature of updates. The thesis builds on findings in Semantic Web and retrieval techniques and explores novel tagging methods for identifying data items. Tagging systems have become popular which enable the users to add tags to Internet resources such as images, video and audio to make them more manageable. Collaborative tagging is concerned with the relationship between people and resources. Most of these resources have metadata in machine processable format and enable users to use free- text keywords (so-called tags) as search techniques. This research references some tagging systems, e.g. Flicker, delicious and myweb2.0. The limitation with such techniques includes polysemy (one word and different meaning), synonymy (different words and one meaning), different lexical forms (singular, plural, and conjugated words) and misspelling errors or alternate spellings. The work presented in this thesis introduces semantic characterization of web resources that describes the structure and organization of tagging, aiming to extend the existing Multimedia Query using similarity measures to cater for collaborative tagging. In addition, we discuss the semantic difficulties of tagging systems, suggesting improvements in their accuracies. The scope of our work is classified as follows: (i) Increase the accuracy and confidence of multimedia tagging systems. (ii) Increase the similarity measures of images by integrating varieties of measures. To address the first shortcoming, we use the WordNet based on a tagging system for social sharing and retrieval of images as a semantic lingual ontology resource. For the second shortcoming we use the similarity measures in different ways to recognise the multimedia tagging system. Fundamental to our work is the novel information model that we have constructed for our computation. This is based on the fact that an image is a rich object that can be characterised and formulated in n-dimensions, each dimension contains valuable information that will help in increasing the accuracy of the search. For example an image of a tree in a forest contains more information than an image of the same tree but in a different environment. In this thesis we characterise a data item (an image) by a primary description, followed by n-secondary descriptions. As n increases, the accuracy of the search improves. We give various techniques to analyse data and its associated query. To increase the accuracy of the tagging system we have performed different experiments on many images using similarity measures and various techniques from VoI (Value of Information). The findings have shown the linkage/integration between similarity measures and that VoI improves searches and helps/guides a tagger in choosing the most adequate of tags

    Acta Cybernetica : Volume 18. Number 2.

    Get PDF

    Hierarchical categorisation of tags for delicious

    Get PDF
    In the scenario of social bookmarking, a user browsing the Web bookmarks web pages and assigns free-text labels (i.e., tags) to them according to their personal preferences. In this technical report, we approach one of the practical aspects when it comes to represent users' interests from their tagging activity, namely the categorization of tags into high-level categories of interest. The reason is that the representation of user profiles on the basis of the myriad of tags available on the Web is certainly unfeasible from various practical perspectives; mainly concerning the unavailability of data to reliably, accurately measure interests across such fine-grained categorisation, and, should the data be available, its overwhelming computational intractability. Motivated by this, our study presents the results of a categorization process whereby a collection of tags posted at Delicious #http://delicious.com# are classified into 200 subcategories of interest.Preprin
    • …
    corecore