3 research outputs found

    Using Fuzzy Set Similarity in Sentence Similarity Measures

    Sentence similarity measures quantify how close in meaning two short blocks of text are. STASIS computes such a measure from the semantic similarity of individual word pairs, each taken from the two blocks of text, where word similarity is based on the distance between the words in the WordNet ontology. If the vague words, referred to as fuzzy words, are not found in WordNet, their semantic similarity cannot contribute to the sentence similarity measure. FAST and FUSE transform these vague words into fuzzy set representations, type-1 and interval type-2 respectively, to create ontological structures over which the same semantic similarity measure used with WordNet can be applied. This paper investigates eliminating the step of building an ontology from the fuzzy words and instead applying fuzzy set similarity measures directly between the fuzzy words in the task of sentence similarity measurement. Performance is evaluated by correlation with human judgments of sentence similarity. In addition, statistical tests showed no significant difference between the sentence similarity values produced using fuzzy set similarity measures on fuzzy sets representing fuzzy words and those produced using FAST semantic similarity within ontologies representing fuzzy words.
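
    The comparison in this paper rests on two routes to word-level similarity. The sketch below illustrates both in Python with NLTK, assuming the WordNet corpus is installed; the Jaccard measure and the membership functions for 'huge' and 'enormous' are illustrative assumptions, not the specific fuzzy set similarity measures or fuzzy words evaluated in the paper.

        import numpy as np
        from nltk.corpus import wordnet as wn

        # Route 1 (STASIS/FAST/FUSE style): ontology-based word similarity
        # for words found in WordNet, here the standard path measure over
        # the hypernym hierarchy.
        print(wn.synset('car.n.01').path_similarity(wn.synset('bicycle.n.01')))

        # Route 2 (investigated in this paper): a fuzzy set similarity
        # applied directly between fuzzy words, each represented as a
        # discretised type-1 fuzzy set over a shared universe of discourse.
        def jaccard_similarity(mu_a, mu_b):
            """Jaccard similarity of two discretised type-1 fuzzy sets."""
            mu_a, mu_b = np.asarray(mu_a), np.asarray(mu_b)
            return np.minimum(mu_a, mu_b).sum() / np.maximum(mu_a, mu_b).sum()

        # Hypothetical membership functions for 'huge' and 'enormous' on a
        # 0-10 size scale, purely for illustration.
        x = np.linspace(0, 10, 101)
        huge = np.clip((x - 6) / 3, 0, 1)
        enormous = np.clip((x - 7) / 2.5, 0, 1)
        print(jaccard_similarity(huge, enormous))  # in [0, 1]; higher = more overlap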

    Persistent semantic identity in WordNet

    Although rarely studied, the persistence of semantic identity in the WordNet lexical database is crucial for the interoperability of all the resources that use WordNet data. The present study investigates the stability of the two primary entities of the WordNet database (the word senses and the synonym sets) by following their respective identifiers (the sense keys and the synset offsets) across all the versions released between 1995 and 2012, while also considering "drifts" of identical definitions and semantic relations. Contrary to expectations, 94.4% of the WordNet 1.5 synsets still persisted in the latest 2012 version, compared to only 89.1% of the corresponding sense keys. Meanwhile, the splits and merges between synonym sets remained few and simple. These results are presented in tables that allow the lexicographic effort needed to update WordNet-based resources to newer WordNet versions to be estimated. We discuss the specific challenges faced by both the dominant synset-based mapping paradigm (a moderate number of split synsets) and the recommended sense key-based approach (very few identity violations), and conclude that stable synset identifiers are viable but need to be complemented by stable sense keys in order to adequately handle the split synonym sets.
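
    As background to the identifier analysis above, the sketch below shows how the two identifier types can be read from a single WordNet release via NLTK (assuming the WordNet corpus is installed). NLTK ships one WordNet version, so replicating the cross-version comparison would require loading each release's database separately; the word 'bank' is an arbitrary example.

        from nltk.corpus import wordnet as wn

        for synset in wn.synsets('bank', pos=wn.NOUN)[:3]:
            # Synset offset: the byte position of the synset in the data
            # file, which can change whenever the database is recompiled.
            print(f'{synset.offset():08d}-{synset.pos()}', synset.name())
            for lemma in synset.lemmas():
                # Sense key: the lexicographer-assigned sense identifier,
                # intended to remain stable across versions.
                print('  ', lemma.key())

        # A sense key resolves back to its lemma and synset, which is what
        # a sense key-based mapping between versions relies on.
        key = wn.synsets('bank', pos=wn.NOUN)[0].lemmas()[0].key()
        print(wn.lemma_from_key(key).synset().definition())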

    Fuzzy natural language similarity measures through computing with words

    A vibrant area of research is the understanding of human language by machines, so that they can engage in conversation with humans to achieve set goals. Human language is inherently fuzzy, with words meaning different things to different people depending on the context. Fuzzy words are words with a subjective meaning, typical of everyday natural language dialogue, often ambiguous and vague and dependent on an individual’s perception. Fuzzy Sentence Similarity Measures (FSSMs) are algorithms that compare two or more short texts containing fuzzy words and return a numeric measure of the similarity of meaning between them. The motivation for this research is to create a new FSSM called FUSE (FUzzy Similarity mEasure). FUSE is an ontology-based similarity measure that uses Interval Type-2 Fuzzy Sets to model relationships between categories of human perception-based words. Four versions of FUSE (FUSE_1.0 – FUSE_4.0) have been developed, investigating the presence of linguistic hedges, the expansion of fuzzy categories and their use in natural language, the incorporation of logical operators such as ‘not’, and the introduction of the fuzzy influence factor. FUSE has been compared to several state-of-the-art traditional semantic similarity measures (SSMs), which do not consider the presence of fuzzy words. FUSE has also been compared to the only published FSSM, FAST (Fuzzy Algorithm for Similarity Testing), which has a limited dictionary of fuzzy words and uses Type-1 Fuzzy Sets to model relationships between categories of human perception-based words. Results show that FUSE improves on the limitations of traditional SSMs and the FAST algorithm, achieving a higher correlation with the average human rating (AHR) on several published, gold-standard datasets. To validate FUSE in the context of a real-world application, versions of the algorithm were incorporated into a simple Question & Answer (Q&A) dialogue system (DS), referred to as FUSION, to evaluate the improvement in natural language understanding. FUSION was tested on two different scenarios using human participants, and the results were compared to a traditional SSM known as STASIS. The DS experiments showed a True rating of 88.65% for FUSION, compared to an average True rating of 61.36% for STASIS. These results show that the FUSE algorithm can be used within real-world applications; evaluation of the DS showed an improvement in natural language understanding, allowing semantic similarity to be calculated more accurately from natural user responses. The key contributions of this work can be summarised as follows:
    - A new methodology for modelling fuzzy words using Interval Type-2 fuzzy sets, leading to a fuzzy dictionary for nine fuzzy categories: a resource that other researchers in natural language processing and Computing with Words can use in other fuzzy applications, such as semantic clustering.
    - The FSSM FUSE, expanded over four versions that investigate the incorporation of linguistic hedges, the expansion of fuzzy categories and their use in natural language, the inclusion of logical operators such as ‘not’, and the introduction of the fuzzy influence factor.
    - Integration of the FUSE algorithm into a simple Q&A DS, FUSION, demonstrating that an FSSM can be used in a real-world practical implementation and making FUSE and its fuzzy dictionary generalisable to other applications.
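
    As a generic illustration of the ideas above, the sketch below models two fuzzy words as interval type-2 fuzzy sets and compares them with a Jaccard-style similarity in the spirit of Wu and Mendel's measure, including a Zadeh-style 'very' hedge and a 'not' operator. This is not the FUSE algorithm itself: its fuzzy dictionary, nine categories and fuzzy influence factor are not reproduced, and every membership function here is an invented assumption.

        import numpy as np

        x = np.linspace(0, 100, 201)  # hypothetical 0-100 'temperature' scale

        def trap(x, a, b, c, d):
            """Trapezoidal membership: rises a->b, flat b->c, falls c->d."""
            return np.clip(np.minimum((x - a) / (b - a), (d - x) / (d - c)), 0, 1)

        # An interval type-2 (IT2) fuzzy set is a pair (lower, upper) of
        # membership functions bounding the footprint of uncertainty across
        # different individuals' perceptions of the word.
        warm = (trap(x, 45, 55, 60, 70), trap(x, 40, 50, 65, 75))
        hot = (trap(x, 60, 70, 75, 85), trap(x, 55, 65, 80, 90))

        def hedge_very(fs):
            """Zadeh-style concentration hedge: 'very warm' squares memberships."""
            lo, up = fs
            return (lo ** 2, up ** 2)

        def negate(fs):
            """'not warm': complement each bound and swap so lower <= upper."""
            lo, up = fs
            return (1 - up, 1 - lo)

        def it2_jaccard(fs_a, fs_b):
            """Jaccard similarity for IT2 sets: intersections over unions
            of both the lower and upper bounds."""
            (la, ua), (lb, ub) = fs_a, fs_b
            num = np.minimum(ua, ub).sum() + np.minimum(la, lb).sum()
            den = np.maximum(ua, ub).sum() + np.maximum(la, lb).sum()
            return num / den

        print(it2_jaccard(warm, hot))              # moderate overlap
        print(it2_jaccard(hedge_very(warm), hot))  # hedge shifts the similarity
        print(it2_jaccard(negate(warm), hot))      # 'not warm' vs 'hot'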