Semantic retrieval of trademarks based on conceptual similarity
Trademarks are signs of high reputational value, and they therefore require protection. This paper studies conceptual similarity between trademarks, which occurs when two or more trademarks evoke identical or analogous semantic content. The paper advances the state of the art by proposing a semantics-based computational approach for comparing trademarks for conceptual similarity. A trademark retrieval algorithm is developed that employs natural language processing techniques and an external knowledge source in the form of a lexical ontology. The search and indexing technique uses a similarity distance derived from Tversky's theory of similarity. The proposed retrieval algorithm is validated using two resources: a trademark database of 1400 disputed cases and a database of 378,943 company names. The accuracy of the algorithm is estimated using measures from two different domains: the R-precision score, which is commonly used in information retrieval, and collective human opinion, which is used in human-machine systems.
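The similarity distance above is derived from Tversky's feature-contrast model, which scores two items by their shared and distinctive features. The following is a minimal illustrative sketch of the Tversky index over feature sets; the feature sets and parameter choices are hypothetical, not the paper's exact model:

```python
def tversky_index(a, b, alpha=0.5, beta=0.5):
    """Tversky similarity between feature sets a and b.

    alpha and beta weight the distinctive features of each set;
    alpha = beta = 0.5 reduces to the Dice coefficient.
    """
    a, b = set(a), set(b)
    common = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    denom = common + alpha * only_a + beta * only_b
    return common / denom if denom else 0.0

# Hypothetical semantic features (e.g. hypernyms from a lexical
# ontology) evoked by two trademark names
jaguar = {"feline", "animal", "predator", "mammal"}
puma = {"feline", "animal", "mammal", "cat"}
score = tversky_index(jaguar, puma)  # high overlap -> high similarity
```

In a retrieval setting, each database trademark would be scored against the query mark this way and the results ranked by descending similarity.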
HESML: A scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset
This work is a detailed companion reproducibility paper of the methods and experiments proposed by Lastra-Díaz and García-Serrano in (2015, 2016) [56–58], and it introduces the following contributions: (1) a new and efficient representation model for taxonomies, called PosetHERep, which is an adaptation of the half-edge data structure commonly used to represent discrete manifolds and planar graphs; (2) a new Java software library called the Half-Edge Semantic Measures Library (HESML), based on PosetHERep, which implements most ontology-based semantic similarity measures and Information Content (IC) models reported in the literature; (3) a set of reproducible experiments on word similarity based on HESML and ReproZip with the aim of exactly reproducing the experimental surveys in the three aforementioned works; (4) a replication framework and dataset, called WNSimRep v1, whose aim is to assist the exact replication of most methods reported in the literature; and finally, (5) a set of scalability and performance benchmarks for semantic measures libraries. PosetHERep and HESML are motivated by several drawbacks of the current semantic measures libraries, especially their limited performance and scalability, as well as the difficulty of evaluating new methods and replicating previously reported ones. The reproducible experiments introduced herein are motivated by the lack of a set of large, self-contained and easily reproducible experiments aimed at replicating and confirming previously reported results. Likewise, the WNSimRep v1 dataset is motivated by the discovery of several contradictory results and by difficulties in reproducing previously reported methods and experiments.
PosetHERep provides a memory-efficient representation for taxonomies that scales linearly with the size of the taxonomy, together with efficient implementations of most taxonomy-based algorithms used by the semantic measures and IC models, whilst HESML offers an open framework to aid research in the area through a simpler and more efficient software architecture than the current software libraries. Finally, we show that HESML outperforms the state-of-the-art libraries, and that their performance and scalability can be significantly improved without caching by using PosetHERep.
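To make the class of measures HESML implements concrete, here is a toy sketch of one classic ontology-based measure, Lin's IC-based similarity, over a tiny hand-made taxonomy. The taxonomy and frequency counts are hypothetical, and a real library would operate on WordNet-scale ontologies with an efficient graph representation:

```python
import math

# Toy taxonomy: child -> parent (a tree), with hypothetical
# corpus frequencies per concept
parent = {"cat": "feline", "lion": "feline",
          "feline": "mammal", "dog": "mammal", "mammal": "entity"}
freq = {"cat": 10, "lion": 2, "dog": 12,
        "feline": 0, "mammal": 0, "entity": 0}

def ancestors(node):
    """Node plus all its ancestors up to the root, in order."""
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

total = sum(freq.values())

def ic(node):
    """Corpus-based Information Content: IC(c) = -log p(c), where
    p(c) pools the counts of c and all of its descendants."""
    mass = sum(f for n, f in freq.items() if node in ancestors(n))
    return -math.log(mass / total)

def lin_similarity(a, b):
    """Lin's measure: sim = 2*IC(LCS) / (IC(a) + IC(b)), where LCS
    is the lowest common subsumer of a and b in the taxonomy."""
    lcs = next(n for n in ancestors(a) if n in ancestors(b))
    return 2 * ic(lcs) / (ic(a) + ic(b))
```

Sibling concepts that share a specific, rarely occurring subsumer (e.g. "cat" and "lion" under "feline") score higher than concepts whose only common ancestor is near the root.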
Trade mark similarity assessment support system
Trade marks are valuable intangible intellectual property (IP) assets with potentially
high reputational value that can be protected. Similarity between trade marks may
potentially lead to infringement. That similarity is normally assessed based on the
visual, conceptual and phonetic aspects of the trade marks in question. This thesis
therefore proposes a trade mark similarity assessment support system that uses
these three main aspects of trade mark similarity as a mechanism to help avoid
future infringement.
A conceptual model of the proposed trade mark similarity assessment support
system is first proposed and developed based on the similarity assessment criteria
outlined in a trade mark manual. The proposed model is the first contribution of this
study, and it consists of visual, conceptual, phonetic and inference engine modules.
The second contribution of this work is an algorithm that compares trade
marks based on their visual similarity. The algorithm performs a similarity
assessment using content-based image retrieval (CBIR) technology and an
integrated visual descriptor derived using the low-level image feature, i.e. the shape
feature. The performance of the algorithm is then assessed using information
retrieval-based measures. The obtained results demonstrate better retrieval
performance in comparison to a state-of-the-art algorithm.
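In a CBIR pipeline of this kind, each trade mark image is reduced to a numeric shape-descriptor vector, and retrieval ranks database images by their similarity to the query's vector. The sketch below uses cosine similarity over hypothetical descriptor vectors; the thesis's actual integrated descriptor and matching function are not specified here:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two shape-descriptor vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = (math.sqrt(sum(x * x for x in u))
            * math.sqrt(sum(y * y for y in v)))
    return dot / norm if norm else 0.0

# Hypothetical low-level shape descriptors of two trade mark images
query_logo = [0.12, 0.80, 0.33, 0.05]
candidate = [0.10, 0.75, 0.40, 0.07]

# A retrieval system would score every database image this way
# and return the top-ranked matches
score = cosine_similarity(query_logo, candidate)
```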
The conceptual aspect of trade mark similarity is then examined and analysed
using a proposed algorithm that employs semantic technology in the conceptual
module. This contribution enables the computation of the conceptual similarity
between trade marks, with the utilisation of an external knowledge source in the
form of a lexical ontology, together with natural language processing and set
similarity theory. The proposed algorithm is evaluated using both information
retrieval and human collective opinion measures. The retrieval result produced by
the proposed algorithm outperforms the traditional string similarity comparison
algorithm in both measures.
The phonetic module examines the phonetic similarity of trade marks using
another proposed algorithm that utilises phoneme analysis. This algorithm employs
phonological features, which are extracted based on human speech articulation. In
addition, the algorithm also provides a mechanism to compare the phonetic aspect
of trade marks with typographic characters. The proposed algorithm is the fourth
contribution of this study. It is evaluated using an information retrieval based
measure. The result shows better retrieval performance in comparison to the
traditional string similarity algorithm.
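Phonological features of the kind described above can be modelled as sets of articulatory properties per phoneme, so that two phonemes differing only in voicing score as more similar than phonemes differing in place and manner as well. A minimal sketch, with a hypothetical, heavily simplified feature inventory (not the thesis's feature set or alignment method):

```python
# Hypothetical articulatory feature sets per phoneme (simplified)
FEATURES = {
    "p": {"bilabial", "plosive", "voiceless"},
    "b": {"bilabial", "plosive", "voiced"},
    "s": {"alveolar", "fricative", "voiceless"},
    "z": {"alveolar", "fricative", "voiced"},
}

def phoneme_similarity(p, q):
    """Share of articulatory features two phonemes have in common
    (Jaccard overlap of their feature sets)."""
    a, b = FEATURES[p], FEATURES[q]
    return len(a & b) / len(a | b)

def word_similarity(w1, w2):
    """Average phoneme similarity over aligned positions. A real
    system would use a proper alignment, e.g. via edit distance."""
    if not w1 or not w2:
        return 0.0
    score = sum(phoneme_similarity(p, q) for p, q in zip(w1, w2))
    return score / max(len(w1), len(w2))

# /p/ vs /b/ differ only in voicing, so they score higher
# than /p/ vs /z/, which share no feature in this toy inventory
```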
The final contribution of this study is a methodology to aggregate the overall
similarity score between trade marks. It is motivated by the understanding that trade
mark similarity should be assessed holistically; that is, the visual, conceptual and
phonetic aspects should be considered together. The proposed method is
developed in the inference engine module; it utilises fuzzy logic for the inference
process. A set of fuzzy rules and the associated membership functions is also
derived in this study, based on the trade mark manual and an analysis of a
collection of disputed trade mark cases. The method is then evaluated using both
information retrieval and human collective opinion measures. The proposed method
improves retrieval accuracy, and the experiments also show that the aggregated
similarity score correlates well with the score produced by human collective opinion.
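A fuzzy inference engine of this kind fuzzifies the three aspect scores through membership functions, fires a rule base, and defuzzifies the result into a single score. The sketch below shows the shape of such a pipeline with two invented rules and one triangular membership function; the thesis's actual rule base and membership functions, derived from the trade mark manual, are not reproduced here:

```python
def triangular(x, a, b, c):
    """Triangular membership function rising from a, peaking at b,
    falling to zero at c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def high(x):
    """Membership of a similarity score in the fuzzy set 'high'
    (illustrative parameters; the right shoulder extends past 1)."""
    return triangular(x, 0.4, 1.0, 1.6)

def overall_similarity(visual, conceptual, phonetic):
    """Toy Mamdani-style inference over two invented rules:
      Rule 1: IF any aspect is high THEN overall is high (0.9)
      Rule 2: IF all aspects are high THEN overall is very high (1.0)
    The firing strengths weight the rule outputs (centroid-like
    defuzzification over two singletons)."""
    r1 = max(high(visual), high(conceptual), high(phonetic))
    r2 = min(high(visual), high(conceptual), high(phonetic))
    if r1 + r2 == 0:
        return 0.0
    return (r1 * 0.9 + r2 * 1.0) / (r1 + r2)
```

The point of the fuzzy aggregation is that the three aspects are considered together, so one strongly similar aspect raises the overall score without any single aspect dictating it.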
The evaluations performed in the course of this study employ the following
datasets: the MPEG-7 shape dataset, the MPEG-7 trade marks dataset, a collection
of 1400 trade marks from real trade mark dispute cases, and a collection of 378,943
company names.
Fuzzy natural language similarity measures through computing with words
A vibrant area of research is machine understanding of human language, enabling
machines to engage in conversation with humans to achieve set goals. Human
language is inherently fuzzy, with words meaning different things to different
people depending on the context. Fuzzy words are words with a subjective meaning,
typically used in everyday natural language dialogue; they are often ambiguous and
vague in meaning and dependent on an individual's perception. Fuzzy Sentence
Similarity Measures (FSSMs) are algorithms that compare two or more short texts
containing fuzzy words and return a numeric measure of the similarity of meaning
between them.
The motivation for this research is to create a new FSSM called FUSE (FUzzy Similarity
mEasure). FUSE is an ontology-based similarity measure that uses Interval Type-2 Fuzzy Sets
to model relationships between categories of human perception-based words. Four versions
of FUSE (FUSE_1.0 – FUSE_4.0) have been developed, investigating the presence of
linguistic hedges, the expansion of fuzzy categories and their use in natural
language, the incorporation of logical operators such as 'not', and the
introduction of the fuzzy influence factor.
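An Interval Type-2 Fuzzy Set differs from an ordinary (Type-1) set in that each input maps to an interval of membership grades rather than a single grade, capturing disagreement between individuals about what a fuzzy word means. A minimal sketch with triangular lower and upper membership functions; the word "warm" and its parameters are invented for illustration, not taken from the FUSE dictionary:

```python
def it2_membership(x, lower_params, upper_params):
    """Interval Type-2 membership: return the interval
    [lower(x), upper(x)] for input x, using triangular lower and
    upper membership functions. The gap between the two functions
    is the set's footprint of uncertainty."""
    def tri(x, a, b, c):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    lo = tri(x, *lower_params)
    hi = tri(x, *upper_params)
    # Guard against parameterisations where the curves cross
    return min(lo, hi), max(lo, hi)

# Hypothetical model of the perception-based word "warm" (deg C):
# the upper membership function is wider than the lower one
warm_lower = (18, 25, 30)
warm_upper = (15, 25, 34)
interval = it2_membership(22, warm_lower, warm_upper)
```

At 22 degrees, different people might rate "warm" anywhere within the returned interval, which is exactly the uncertainty a Type-1 set cannot represent.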
FUSE has been compared to several state-of-the-art traditional semantic similarity
measures (SSMs), which do not consider the presence of fuzzy words. FUSE has also
been compared to the only published FSSM, FAST (Fuzzy Algorithm for Similarity
Testing), which has a limited dictionary of fuzzy words and uses Type-1 Fuzzy Sets
to model relationships between categories of human perception-based words. Results
have shown that FUSE improves on the limitations of traditional SSMs and the FAST
algorithm by achieving a higher correlation with the average human rating (AHR) on
several published and gold-standard datasets.
To validate FUSE, in the context of a real-world application, versions of the algorithm were
incorporated into a simple Question & Answer (Q&A) dialogue system (DS), referred to as
FUSION, to evaluate the improvement of natural language understanding. FUSION was tested
on two different scenarios using human participants, and the results were compared
to a traditional SSM known as STASIS. The DS experiments showed a True rating of
88.65% for FUSION, compared to an average True rating of 61.36% for STASIS. The
results showed that the FUSE algorithm can be used within real-world applications,
and evaluation of the DS showed an improvement in natural language understanding,
allowing semantic similarity to be calculated more accurately from natural user
responses.
The key contributions of this work can be summarised as follows. First, the
development of a new methodology to model fuzzy words using Interval Type-2 fuzzy
sets, leading to the creation of a fuzzy dictionary for nine fuzzy categories: a
useful resource for other researchers in natural language processing and Computing
with Words, and for other fuzzy applications such as semantic clustering. Second,
the development of an FSSM known as FUSE, expanded over four versions,
investigating the incorporation of linguistic hedges, the expansion of fuzzy
categories and their use in natural language, the inclusion of logical operators
such as 'not', and the introduction of the fuzzy influence factor. Third, the
integration of the FUSE algorithm into a simple Q&A DS referred to as FUSION,
demonstrating that FSSMs can be used in a real-world practical implementation and
making FUSE and its fuzzy dictionary generalisable to other applications.