Just an Update on PMING Distance for Web-based Semantic Similarity in Artificial Intelligence and Data Mining
One of the main problems in the classic approach to semantics is
the difficulty of acquiring and maintaining ontologies and semantic
annotations. On the other hand, the explosion of the Internet and the massive
diffusion of mobile smart devices have led to the creation of a worldwide system
whose information is checked and fueled daily by the contributions of millions
of users who interact collaboratively. Search engines, continually
exploring the Web, are a natural source of information on which to base a
modern approach to semantic annotation. A promising idea is that it is possible
to generalize the semantic similarity, under the assumption that semantically
similar terms behave similarly, and define collaborative proximity measures
based on the indexing information returned by search engines. The PMING
Distance is a proximity measure used in data mining and information retrieval
whose collaborative information expresses the degree of relationship between two
terms, using only the number of documents returned as results for a query on a
search engine. In this work, the PMING Distance is updated with a novel
formal algebraic definition that corrects previous works. This new point of
view highlights that PMING is a locally normalized linear
combination of Pointwise Mutual Information and the Normalized Google Distance.
The analyzed measure dynamically reflects the collaborative changes made to
web resources.
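The abstract characterizes PMING only as a locally normalized linear combination of PMI and NGD computed from search-engine hit counts. The sketch below computes those two standard ingredients from hit counts f(x), f(y), f(x,y) and an assumed index size N. The combining weight `rho` and the normalizers `mu_pmi` and `mu_ngd` are hypothetical placeholders chosen for illustration; this is not the corrected definition given in the paper.

```python
import math


def _check(fx: float, fy: float, fxy: float, n: float) -> None:
    # Logarithms of zero hit counts are undefined, so reject them up front.
    if min(fx, fy, fxy, n) <= 0:
        raise ValueError("hit counts and index size must be positive")


def pmi(fx: float, fy: float, fxy: float, n: float) -> float:
    """Pointwise Mutual Information, with probabilities estimated as count / n."""
    _check(fx, fy, fxy, n)
    # log2( p(x,y) / (p(x) p(y)) ) = log2( f(x,y) * n / (f(x) * f(y)) )
    return math.log2((fxy * n) / (fx * fy))


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google Distance (Cilibrasi and Vitanyi) from hit counts."""
    _check(fx, fy, fxy, n)
    lx, ly, lxy, ln = math.log(fx), math.log(fy), math.log(fxy), math.log(n)
    return (max(lx, ly) - lxy) / (ln - min(lx, ly))


def pming_sketch(fx, fy, fxy, n, rho=0.5, mu_pmi=1.0, mu_ngd=1.0):
    """Hypothetical linear combination of PMI and NGD.

    rho, mu_pmi and mu_ngd are illustrative assumptions, not the paper's values.
    """
    # PMI is a similarity, so turn it into a distance-like term before mixing.
    pmi_term = 1.0 - pmi(fx, fy, fxy, n) / mu_pmi
    ngd_term = ngd(fx, fy, fxy, n) / mu_ngd
    return rho * pmi_term + (1.0 - rho) * ngd_term
```

For example, with f(x) = f(y) = 1000, f(x,y) = 100 and N = 10^6, `ngd` evaluates to 1/3 (up to floating-point rounding) and `pmi` to log2(100). Setting `rho=0.0` reduces the sketch to plain NGD, which makes the role of each term easy to inspect.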
Utilising semantic technologies for intelligent indexing and retrieval of digital images
The proliferation of digital media has led to a huge interest in classifying and indexing media objects for generic search and usage. In particular, we are witnessing colossal growth in digital image repositories that are difficult to navigate using free-text search mechanisms, which often return inaccurate matches because they rely primarily on statistical analysis of how often query keywords recur in the image annotation or surrounding text. In this paper we present a semantically enabled image annotation and retrieval engine designed to satisfy the requirements of the commercial image collections market in terms of both the accuracy and the efficiency of the retrieval process. Our search engine relies on methodically structured ontologies for image annotation, allowing more intelligent reasoning about image content and, in turn, a more accurate set of results and a richer set of alternative matches for the original query. We also show how our carefully analysed and designed domain ontology contributes to the implicit expansion of user queries, and how lexical databases can be exploited for explicit semantic query expansion.
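The two kinds of query expansion mentioned above (implicit, via the domain ontology, and explicit, via a lexical database) can be illustrated with a toy sketch. Everything here is an assumption for illustration: the ontology, the synonym table standing in for a lexical database such as WordNet, and the helper names `implicit_expansion`, `expand_query` and `search` are hypothetical and do not describe the engine in the paper.

```python
from collections import deque

# Hypothetical toy domain ontology: concept -> direct sub-concepts.
ONTOLOGY = {
    "vehicle": ["car", "bicycle"],
    "car": ["taxi", "convertible"],
    "bicycle": [],
    "taxi": [],
    "convertible": [],
}

# Hypothetical stand-in for a lexical database (e.g. WordNet synonyms).
SYNONYMS = {
    "car": ["automobile", "motorcar"],
    "bicycle": ["bike"],
}


def implicit_expansion(concept):
    """Return the concept plus all of its descendants, in breadth-first order."""
    seen, queue = [concept], deque([concept])
    while queue:
        for child in ONTOLOGY.get(queue.popleft(), []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


def expand_query(term):
    """Ontology-based (implicit) expansion followed by synonym (explicit) expansion."""
    concepts = implicit_expansion(term)
    expanded = list(concepts)
    for c in concepts:
        for syn in SYNONYMS.get(c, []):
            if syn not in expanded:
                expanded.append(syn)
    return expanded


def search(term, annotations):
    """Return ids of images whose annotation tags intersect the expanded query."""
    terms = set(expand_query(term))
    return sorted(img for img, tags in annotations.items() if terms & set(tags))
```

With this toy data, a query for "car" also matches images annotated only with "taxi" (a sub-concept) or "automobile" (a synonym), which a plain keyword match would miss.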
Soft behaviour modelling of user communities
A soft modelling approach for describing behaviour in online user communities is introduced in this work. Behaviour models of individual users in dynamic virtual environments have been described in the literature in terms of timed transition automata, which have various drawbacks. Soft multi-agent behaviour automata are defined and proposed to describe multiple user behaviours and to recognise larger classes of user group histories, such as group histories that contain unexpected behaviours. The notion of deviation from the user community model allows the definition of a soft parsing process that assesses and evaluates the dynamic behaviour of a group of users interacting in virtual environments, such as e-learning and e-business platforms. The soft automaton model can describe virtually infinite sequences of actions performed by multiple users and subject to temporal constraints. Soft measures assess a form of distance of observed behaviours by evaluating the amount of temporal deviation, the additional or omitted actions contained in an observed history, and the actions performed by unexpected users. The proposed model allows the soft recognition of user group histories even when the observed actions only partially meet the given behaviour model constraints. This approach is more realistic for real-time user community support systems than standard Boolean model recognition when more than one user model is potentially available, and the extent of deviation from community behaviour models can guide the generation of system support through anticipation, projection, and other known techniques. Experiments based on logs from an e-learning platform and plan compilation of the soft multi-agent behaviour automaton show the expressiveness of the proposed model.
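The deviation components listed above (temporal deviation, additional actions, omitted actions, and actions by unexpected users) can be combined into a single soft score. The following is a minimal sketch under strong simplifying assumptions: the behaviour model is a linear sequence of steps instead of a full soft multi-agent automaton, events are matched greedily, and the classes `Step` and `Event`, the function `soft_deviation` and its weights are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Step:
    action: str
    users: frozenset  # users allowed to perform this step
    earliest: float   # start of the allowed time window
    latest: float     # end of the allowed time window


@dataclass
class Event:
    action: str
    user: str
    time: float


def soft_deviation(model, history, w_time=1.0, w_extra=1.0, w_omit=1.0, w_user=1.0):
    """Hypothetical soft measure: 0 means the history fits the model exactly."""
    score, i = 0.0, 0
    for ev in history:
        # Greedily look ahead for the next model step with the same action.
        j = next((k for k in range(i, len(model)) if model[k].action == ev.action), None)
        if j is None:
            score += w_extra          # additional action not foreseen by the model
            continue
        score += w_omit * (j - i)     # model steps skipped over count as omitted
        step = model[j]
        if ev.user not in step.users:
            score += w_user           # action performed by an unexpected user
        if ev.time < step.earliest:   # temporal deviation outside the window
            score += w_time * (step.earliest - ev.time)
        elif ev.time > step.latest:
            score += w_time * (ev.time - step.latest)
        i = j + 1
    score += w_omit * (len(model) - i)  # trailing steps that were never observed
    return score
```

Unlike Boolean acceptance, a history that only partially meets the constraints still receives a finite score, so several candidate models can be ranked by how far the observed group behaviour deviates from each.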
Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples
Machine Learning has been a big success story during the AI resurgence. One
particularly standout success relates to learning from massive amounts of data.
In spite of early assertions of the unreasonable effectiveness of data, there
is increasing recognition of the value of utilizing knowledge whenever it is
available or can be created purposefully. In this paper, we discuss the indispensable role
of knowledge for deeper understanding of content where (i) large amounts of
training data are unavailable, (ii) the objects to be recognized are complex
(e.g., implicit entities and highly subjective content), and (iii) applications
need to use complementary or related data in multiple modalities/media. What
brings us to the cusp of rapid progress is our ability to (a) create relevant
and reliable knowledge and (b) carefully exploit knowledge to enhance ML/NLP
techniques. Using diverse examples, we seek to foretell unprecedented progress
in our ability to deeply understand and exploit multimodal data and in the
continued incorporation of knowledge in learning techniques.
Comment: Pre-print of the paper accepted at the 2017 IEEE/WIC/ACM International
Conference on Web Intelligence (WI). arXiv admin note: substantial text
overlap with arXiv:1610.0770