Semantic retrieval of trademarks based on conceptual similarity
Trademarks are signs of high reputational value, and they therefore require protection. This paper studies conceptual similarity between trademarks, which occurs when two or more trademarks evoke identical or analogous semantic content. The paper advances the state of the art by proposing a semantics-based computational approach for comparing trademarks for conceptual similarity. A trademark retrieval algorithm is developed that employs natural language processing techniques and an external knowledge source in the form of a lexical ontology. The search and indexing technique uses a similarity distance derived from Tversky's theory of similarity. The proposed retrieval algorithm is validated using two resources: a trademark database of 1400 disputed cases and a database of 378,943 company names. The accuracy of the algorithm is estimated using measures from two different domains: the R-precision score, which is commonly used in information retrieval, and collective human opinion, which is used in human-machine systems.
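The similarity distance above is derived from Tversky's feature-contrast model, which scores two items by their shared and distinctive features. The following is a minimal illustrative sketch of the Tversky index over feature sets; the feature sets and parameter choices are hypothetical, not the paper's exact model:

```python
def tversky_index(a, b, alpha=0.5, beta=0.5):
    """Tversky similarity between feature sets a and b.

    alpha and beta weight the distinctive features of each set;
    alpha = beta = 0.5 reduces to the Dice coefficient.
    """
    a, b = set(a), set(b)
    common = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    denom = common + alpha * only_a + beta * only_b
    return common / denom if denom else 0.0

# Hypothetical semantic features (e.g. hypernyms from a lexical
# ontology) evoked by two trademark names
jaguar = {"feline", "animal", "predator", "mammal"}
puma = {"feline", "animal", "mammal", "cat"}
score = tversky_index(jaguar, puma)  # high overlap -> high similarity
```

In a retrieval setting, each database trademark would be scored against the query mark this way and the results ranked by descending similarity.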
HESML: A scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset
This work is a detailed companion reproducibility paper of the methods and experiments proposed by Lastra-Díaz and García-Serrano in (2015, 2016) [56–58], and it introduces the following contributions: (1) a new and efficient representation model for taxonomies, called PosetHERep, which is an adaptation of the half-edge data structure commonly used to represent discrete manifolds and planar graphs; (2) a new Java software library called the Half-Edge Semantic Measures Library (HESML), based on PosetHERep, which implements most ontology-based semantic similarity measures and Information Content (IC) models reported in the literature; (3) a set of reproducible experiments on word similarity based on HESML and ReproZip with the aim of exactly reproducing the experimental surveys in the three aforementioned works; (4) a replication framework and dataset, called WNSimRep v1, whose aim is to assist the exact replication of most methods reported in the literature; and finally, (5) a set of scalability and performance benchmarks for semantic measures libraries. PosetHERep and HESML are motivated by several drawbacks of the current semantic measures libraries, especially their limited performance and scalability, as well as the difficulty of evaluating new methods and replicating previously reported ones. The reproducible experiments introduced herein are motivated by the lack of a set of large, self-contained and easily reproducible experiments aimed at replicating and confirming previously reported results. Likewise, the WNSimRep v1 dataset is motivated by the discovery of several contradictory results and by difficulties in reproducing previously reported methods and experiments.
PosetHERep provides a memory-efficient representation for taxonomies that scales linearly with the size of the taxonomy, together with efficient implementations of most taxonomy-based algorithms used by the semantic measures and IC models, whilst HESML offers an open framework to aid research in the area through a simpler and more efficient software architecture than the current software libraries. Finally, we show that HESML outperforms the state-of-the-art libraries, and that their performance and scalability can be significantly improved without caching by using PosetHERep.
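To make the class of measures HESML implements concrete, here is a toy sketch of one classic ontology-based measure, Lin's IC-based similarity, over a tiny hand-made taxonomy. The taxonomy and frequency counts are hypothetical, and a real library would operate on WordNet-scale ontologies with an efficient graph representation:

```python
import math

# Toy taxonomy: child -> parent (a tree), with hypothetical
# corpus frequencies per concept
parent = {"cat": "feline", "lion": "feline",
          "feline": "mammal", "dog": "mammal", "mammal": "entity"}
freq = {"cat": 10, "lion": 2, "dog": 12,
        "feline": 0, "mammal": 0, "entity": 0}

def ancestors(node):
    """Node plus all its ancestors up to the root, in order."""
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

total = sum(freq.values())

def ic(node):
    """Corpus-based Information Content: IC(c) = -log p(c), where
    p(c) pools the counts of c and all of its descendants."""
    mass = sum(f for n, f in freq.items() if node in ancestors(n))
    return -math.log(mass / total)

def lin_similarity(a, b):
    """Lin's measure: sim = 2*IC(LCS) / (IC(a) + IC(b)), where LCS
    is the lowest common subsumer of a and b in the taxonomy."""
    lcs = next(n for n in ancestors(a) if n in ancestors(b))
    return 2 * ic(lcs) / (ic(a) + ic(b))
```

Sibling concepts that share a specific, rarely occurring subsumer (e.g. "cat" and "lion" under "feline") score higher than concepts whose only common ancestor is near the root.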
Trade mark similarity assessment support system
Trade marks are valuable intangible intellectual property (IP) assets with potentially
high reputational value that can be protected. Similarity between trade marks may
potentially lead to infringement. That similarity is normally assessed based on the
visual, conceptual and phonetic aspects of the trade marks in question. This thesis
therefore proposes a trade mark similarity assessment support system that uses
these three main aspects of trade mark similarity as a mechanism to help avoid
future infringement.
A conceptual model of the proposed trade mark similarity assessment support
system is first proposed and developed based on the similarity assessment criteria
outlined in a trade mark manual. The proposed model is the first contribution of this
study, and it consists of visual, conceptual, phonetic and inference engine modules.
The second contribution of this work is an algorithm that compares trade
marks based on their visual similarity. The algorithm performs a similarity
assessment using content-based image retrieval (CBIR) technology and an
integrated visual descriptor derived using the low-level image feature, i.e. the shape
feature. The performance of the algorithm is then assessed using information
retrieval-based measures. The obtained results demonstrate better retrieval
performance in comparison to a state-of-the-art algorithm.
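In a CBIR pipeline of this kind, each trade mark image is reduced to a numeric shape-descriptor vector, and retrieval ranks database images by their similarity to the query's vector. The sketch below uses cosine similarity over hypothetical descriptor vectors; the thesis's actual integrated descriptor and matching function are not specified here:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two shape-descriptor vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = (math.sqrt(sum(x * x for x in u))
            * math.sqrt(sum(y * y for y in v)))
    return dot / norm if norm else 0.0

# Hypothetical low-level shape descriptors of two trade mark images
query_logo = [0.12, 0.80, 0.33, 0.05]
candidate = [0.10, 0.75, 0.40, 0.07]

# A retrieval system would score every database image this way
# and return the top-ranked matches
score = cosine_similarity(query_logo, candidate)
```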
The conceptual aspect of trade mark similarity is then examined and analysed
using a proposed algorithm that employs semantic technology in the conceptual
module. This contribution enables the computation of the conceptual similarity
between trade marks, with the utilisation of an external knowledge source in the
form of a lexical ontology, together with natural language processing and set
similarity theory. The proposed algorithm is evaluated using both information
retrieval and human collective opinion measures. The retrieval result produced by
the proposed algorithm outperforms the traditional string similarity comparison
algorithm in both measures.
The phonetic module examines the phonetic similarity of trade marks using
another proposed algorithm that utilises phoneme analysis. This algorithm employs
phonological features, which are extracted based on human speech articulation. In
addition, the algorithm also provides a mechanism to compare the phonetic aspect
of trade marks with typographic characters. The proposed algorithm is the fourth
contribution of this study. It is evaluated using an information retrieval based
measure. The result shows better retrieval performance in comparison to the
traditional string similarity algorithm.
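Phonological features of the kind described above can be modelled as sets of articulatory properties per phoneme, so that two phonemes differing only in voicing score as more similar than phonemes differing in place and manner as well. A minimal sketch, with a hypothetical, heavily simplified feature inventory (not the thesis's feature set or alignment method):

```python
# Hypothetical articulatory feature sets per phoneme (simplified)
FEATURES = {
    "p": {"bilabial", "plosive", "voiceless"},
    "b": {"bilabial", "plosive", "voiced"},
    "s": {"alveolar", "fricative", "voiceless"},
    "z": {"alveolar", "fricative", "voiced"},
}

def phoneme_similarity(p, q):
    """Share of articulatory features two phonemes have in common
    (Jaccard overlap of their feature sets)."""
    a, b = FEATURES[p], FEATURES[q]
    return len(a & b) / len(a | b)

def word_similarity(w1, w2):
    """Average phoneme similarity over aligned positions. A real
    system would use a proper alignment, e.g. via edit distance."""
    if not w1 or not w2:
        return 0.0
    score = sum(phoneme_similarity(p, q) for p, q in zip(w1, w2))
    return score / max(len(w1), len(w2))

# /p/ vs /b/ differ only in voicing, so they score higher
# than /p/ vs /z/, which share no feature in this toy inventory
```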
The final contribution of this study is a methodology to aggregate the overall
similarity score between trade marks. It is motivated by the understanding that trade
mark similarity should be assessed holistically; that is, the visual, conceptual and
phonetic aspects should be considered together. The proposed method is
developed in the inference engine module; it utilises fuzzy logic for the inference
process. A set of fuzzy rules and the associated membership functions is also
derived in this study, based on the trade mark manual and an analysis of a
collection of disputed trade mark cases. The method is then evaluated using both
information retrieval and human collective opinion measures. The proposed method
improves retrieval accuracy, and the experiments also show that the aggregated
similarity score correlates well with the score produced by human collective opinion.
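A fuzzy inference engine of this kind fuzzifies the three aspect scores through membership functions, fires a rule base, and defuzzifies the result into a single score. The sketch below shows the shape of such a pipeline with two invented rules and one triangular membership function; the thesis's actual rule base and membership functions, derived from the trade mark manual, are not reproduced here:

```python
def triangular(x, a, b, c):
    """Triangular membership function rising from a, peaking at b,
    falling to zero at c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def high(x):
    """Membership of a similarity score in the fuzzy set 'high'
    (illustrative parameters; the right shoulder extends past 1)."""
    return triangular(x, 0.4, 1.0, 1.6)

def overall_similarity(visual, conceptual, phonetic):
    """Toy Mamdani-style inference over two invented rules:
      Rule 1: IF any aspect is high THEN overall is high (0.9)
      Rule 2: IF all aspects are high THEN overall is very high (1.0)
    The firing strengths weight the rule outputs (centroid-like
    defuzzification over two singletons)."""
    r1 = max(high(visual), high(conceptual), high(phonetic))
    r2 = min(high(visual), high(conceptual), high(phonetic))
    if r1 + r2 == 0:
        return 0.0
    return (r1 * 0.9 + r2 * 1.0) / (r1 + r2)
```

The point of the fuzzy aggregation is that the three aspects are considered together, so one strongly similar aspect raises the overall score without any single aspect dictating it.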
The evaluations performed in the course of this study employ the following
datasets: the MPEG-7 shape dataset, the MPEG-7 trade marks dataset, a collection
of 1400 trade marks from real trade mark dispute cases, and a collection of 378,943
company names.
Fuzzy natural language similarity measures through computing with words
A vibrant area of research is machine understanding of human language, enabling
machines to engage in conversation with humans to achieve set goals. Human
language is inherently fuzzy, with words meaning different things to different
people depending on the context. Fuzzy words are words with a subjective meaning,
typically used in everyday natural language dialogue; they are often ambiguous and
vague in meaning and dependent on an individual's perception. Fuzzy Sentence
Similarity Measures (FSSMs) are algorithms that compare two or more short texts
containing fuzzy words and return a numeric measure of the similarity of meaning
between them.
The motivation for this research is to create a new FSSM called FUSE (FUzzy Similarity
mEasure). FUSE is an ontology-based similarity measure that uses Interval Type-2 Fuzzy Sets
to model relationships between categories of human perception-based words. Four versions
of FUSE (FUSE_1.0 – FUSE_4.0) have been developed, investigating the presence of
linguistic hedges, the expansion of fuzzy categories and their use in natural
language, the incorporation of logical operators such as 'not', and the
introduction of the fuzzy influence factor.
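An Interval Type-2 Fuzzy Set differs from an ordinary (Type-1) set in that each input maps to an interval of membership grades rather than a single grade, capturing disagreement between individuals about what a fuzzy word means. A minimal sketch with triangular lower and upper membership functions; the word "warm" and its parameters are invented for illustration, not taken from the FUSE dictionary:

```python
def it2_membership(x, lower_params, upper_params):
    """Interval Type-2 membership: return the interval
    [lower(x), upper(x)] for input x, using triangular lower and
    upper membership functions. The gap between the two functions
    is the set's footprint of uncertainty."""
    def tri(x, a, b, c):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    lo = tri(x, *lower_params)
    hi = tri(x, *upper_params)
    # Guard against parameterisations where the curves cross
    return min(lo, hi), max(lo, hi)

# Hypothetical model of the perception-based word "warm" (deg C):
# the upper membership function is wider than the lower one
warm_lower = (18, 25, 30)
warm_upper = (15, 25, 34)
interval = it2_membership(22, warm_lower, warm_upper)
```

At 22 degrees, different people might rate "warm" anywhere within the returned interval, which is exactly the uncertainty a Type-1 set cannot represent.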
FUSE has been compared to several state-of-the-art traditional semantic similarity
measures (SSMs), which do not consider the presence of fuzzy words. FUSE has also
been compared to the only published FSSM, FAST (Fuzzy Algorithm for Similarity
Testing), which has a limited dictionary of fuzzy words and uses Type-1 Fuzzy Sets
to model relationships between categories of human perception-based words. Results
have shown that FUSE improves on the limitations of traditional SSMs and the FAST
algorithm by achieving a higher correlation with the average human rating (AHR) on
several published and gold-standard datasets.
To validate FUSE, in the context of a real-world application, versions of the algorithm were
incorporated into a simple Question & Answer (Q&A) dialogue system (DS), referred to as
FUSION, to evaluate the improvement of natural language understanding. FUSION was tested
on two different scenarios using human participants, and the results were compared
to a traditional SSM known as STASIS. The DS experiments showed a True rating of
88.65% for FUSION, compared to an average True rating of 61.36% for STASIS. The
results showed that the FUSE algorithm can be used within real-world applications,
and evaluation of the DS showed an improvement in natural language understanding,
allowing semantic similarity to be calculated more accurately from natural user
responses.
The key contributions of this work can be summarised as follows. First, the
development of a new methodology to model fuzzy words using Interval Type-2 fuzzy
sets, leading to the creation of a fuzzy dictionary for nine fuzzy categories: a
useful resource for other researchers in natural language processing and Computing
with Words, and for other fuzzy applications such as semantic clustering. Second,
the development of an FSSM known as FUSE, expanded over four versions,
investigating the incorporation of linguistic hedges, the expansion of fuzzy
categories and their use in natural language, the inclusion of logical operators
such as 'not', and the introduction of the fuzzy influence factor. Third, the
integration of the FUSE algorithm into a simple Q&A DS referred to as FUSION,
demonstrating that FSSMs can be used in a real-world practical implementation and
making FUSE and its fuzzy dictionary generalisable to other applications.