A customized semantic service retrieval methodology for the digital ecosystems environment
With the emergence of the Web and its pervasive reach into the lives of individuals, organizations and businesses, people now realize that they are living in a digital environment analogous to the ecological ecosystem. Consequently, no individual or organization can ignore the huge impact of the Web on social well-being, growth and prosperity, or the changes it has brought to the world economy, transforming it from a self-contained, isolated and static environment into an open, connected and dynamic one. Recently, the European Union initiated a research vision for this ubiquitous digital environment, known as Digital (Business) Ecosystems. In the Digital Ecosystems environment there exist ubiquitous and heterogeneous species, along with ubiquitous, heterogeneous, context-dependent and dynamic services provided or requested by those species. Nevertheless, existing commercial search engines lack sufficient semantic support: they cannot disambiguate user queries and cannot provide trustworthy and reliable service retrieval. Furthermore, current semantic service retrieval research focuses on the Web service field and does not provide retrieval functions that take into account the features of Digital Ecosystem services.
Hence, in this thesis, we propose a customized semantic service retrieval methodology that enables trustworthy and reliable service retrieval in the Digital Ecosystems environment by considering the heterogeneous, context-dependent and dynamic nature of services and the heterogeneous and dynamic nature of service providers and service requesters in Digital Ecosystems. The customized semantic service retrieval methodology comprises: 1) a service information discovery, annotation and classification methodology; 2) a service retrieval methodology; 3) a service concept recommendation methodology; 4) a quality of service (QoS) evaluation and service ranking methodology; and 5) a methodology for service domain knowledge updating and for service-provider-based Service Description Entity (SDE) metadata publishing, maintenance and classification. The service information discovery, annotation and classification methodology is designed to discover ubiquitous service information from the Web, annotate the discovered information with ontology mark-up languages, and classify the annotated information by means of specific service domain knowledge, taking into account the heterogeneous and context-dependent nature of Digital Ecosystem services and the heterogeneous nature of service providers. The methodology is realized by a prototype Semantic Crawler, which discovers service advertisements and service provider profiles from webpages and annotates the information with service domain ontologies. The service retrieval methodology enables service requesters to precisely retrieve the annotated service information, taking into account the heterogeneous nature of Digital Ecosystem service requesters. The methodology is demonstrated by a prototype Service Search Engine.
Since service requesters can be divided into those who have relevant knowledge regarding their service requests and those who do not, we provide two different service retrieval modules. The module for the first group enables service requesters to retrieve service information directly by querying its attributes. The module for the second group enables service requesters to interact with the search engine to denote their queries by means of service domain knowledge, and then retrieve service information based on the denoted queries. The service concept recommendation methodology addresses the issue of incomplete or incorrect queries. It enables the search engine to recommend relevant concepts to service requesters once they find that the service concepts they eventually selected cannot be used to denote their service requests. We assume that there is some overlap between the selected concepts and the concepts that would correctly denote the service requests, since the selected concepts reflect the requesters' understanding of their requests, built up through a series of human-computer interactions. Therefore, a semantic similarity model is designed that seeks semantically similar concepts based on the selected concepts. The QoS evaluation and service ranking methodology allows service requesters to evaluate the trustworthiness of a service advertisement and to rank retrieved service advertisements by their QoS values, taking into account the context-dependent nature of services in Digital Ecosystems. The core of this methodology is an extended CCCI (Correlation of Interaction, Correlation of Criterion, Clarity of Criterion, and Importance of Criterion) metrics, which allows a service requester to evaluate the performance of a service provider in a service transaction based on QoS evaluation criteria in a specific service domain.
The evaluation result is then combined with previous results to produce the eventual QoS value of the service advertisement in a service domain. Service requesters can rank service advertisements by considering their QoS values under each criterion in a service domain. The methodology for service domain knowledge updating, and service-provider-based SDE metadata publishing, maintenance and classification, is intended to allow: 1) knowledge users to update the service domain ontologies employed in the service retrieval methodology, taking into account the dynamic nature of services in Digital Ecosystems; and 2) service providers to update their service profiles and manually annotate their published service advertisements by means of service domain knowledge, taking into account the dynamic nature of service providers in Digital Ecosystems. The methodology for service domain knowledge updating is realized by a voting system for proposals to change the service domain knowledge, with different weights assigned to the votes of domain experts and normal users. In order to validate the customized semantic service retrieval methodology, we build a prototype: a Customized Semantic Service Search Engine. Based on the prototype, we test the mathematical algorithms involved in the methodology by a simulation approach and validate the proposed functions of the methodology by a functional testing approach.
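The per-transaction evaluation and its incorporation into a running QoS value can be sketched as follows. The class and function names are hypothetical, and the weighted average below is a deliberate simplification of the extended CCCI metrics, not the thesis's exact formulation:

```python
from dataclasses import dataclass

@dataclass
class CriterionRating:
    """One QoS evaluation criterion, rated by the requester after a transaction."""
    score: float       # delivery vs. promise for this criterion, in [0, 1]
    clarity: float     # how clearly the criterion was specified, in [0, 1]
    importance: float  # weight of the criterion to the requester, > 0

def transaction_qos(ratings: list[CriterionRating]) -> float:
    """Aggregate per-criterion ratings for one transaction into a single score,
    weighting each criterion by its clarity and importance."""
    weighted = sum(r.score * r.clarity * r.importance for r in ratings)
    total = sum(r.clarity * r.importance for r in ratings)
    return weighted / total if total else 0.0

def update_advertisement_qos(prev_qos: float, n_prev: int, new_score: float) -> float:
    """Fold a new transaction score into the running QoS of an advertisement
    (a plain running average over past transactions)."""
    return (prev_qos * n_prev + new_score) / (n_prev + 1)
```

Ranking retrieved advertisements then reduces to sorting them by their running QoS values (per criterion or overall).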
Using Search Term Positions for Determining Document Relevance
The technological advancements in computer networks and the substantial reduction of their production costs have caused a massive explosion of digitally stored information.
In particular, textual information is becoming increasingly available in electronic form.
Finding text documents dealing with a certain topic is not a simple task. Users need tools to sift through non-relevant information and retrieve only pieces of information relevant to their needs.
The traditional methods of information retrieval (IR) based on search term frequency have somehow reached their limitations, and novel ranking methods based on hyperlink information are not applicable to unlinked documents.
The retrieval of documents based on the positions of search terms in a document has the potential of yielding improvements, because other terms in the environment where a search term appears (i.e. the neighborhood) are considered. That is to say, the grammatical type, position and frequency of other words help to clarify and specify the meaning of a given search term.
However, the required additional analysis task makes position-based methods slower than methods based on term frequency and requires more storage to save the positions of terms. These drawbacks directly affect the performance of the most user critical phase of the retrieval process, namely query evaluation time, which explains the scarce use of positional information in contemporary retrieval systems.
This thesis explores the possibility of extending traditional information retrieval systems with positional information in an efficient manner that permits us to optimize the retrieval performance by handling term positions at query evaluation time.
To achieve this, several abstract representations of term positions that efficiently store and operate on term positional data are investigated. In the Gauss model, descriptive statistics are used to estimate term positional information, because they minimize the effect of outliers and irregularities in the data. The Fourier model uses Fourier series to represent positional information. In the Hilbert model, functional analysis methods are used to provide reliable term position estimates and simple mathematical operators for handling positional data.
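The Gauss model's core move, replacing a term's full position list with summary statistics of its normalized positions, might be sketched like this. The function names and the proximity heuristic are our illustration, not the thesis's exact estimators:

```python
import math

def gauss_summary(positions: list[int], doc_len: int) -> tuple[float, float]:
    """Compress a term's position list into (mean, std) of positions
    normalized to [0, 1] by document length."""
    norm = [p / doc_len for p in positions]
    mu = sum(norm) / len(norm)
    var = sum((x - mu) ** 2 for x in norm) / len(norm)
    return mu, math.sqrt(var)

def proximity_score(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Heuristic closeness of two terms' position distributions: higher when
    their means are near each other and their spreads are small."""
    (mu_a, sd_a), (mu_b, sd_b) = a, b
    return 1.0 / (1.0 + abs(mu_a - mu_b) + sd_a + sd_b)
```

Storing only two floats per term-document pair is what makes such a model cheap enough to consult at query evaluation time.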
The proposed models are experimentally evaluated using standard resources of the IR research community (Text Retrieval Conference). All experiments demonstrate that the use of positional information can enhance the quality of search results. The suggested models outperform state-of-the-art retrieval utilities.
The term position models open new possibilities for analyzing and handling textual data. For instance, document clustering and compression of positional data based on these models could be interesting topics for future research.
Integration of search theories and evidential analysis to Web-wide Discovery of information for decision support
The main contribution of this research is that it addresses the issues associated with traditional information gathering and presents a novel semantic method for Web-based discovery of previously unknown intelligence for effective decision making. It provides a comprehensive theoretical background to the proposed solution, together with a demonstration of the method's effectiveness from experimental results, showing how the quality of collected information can be significantly enhanced by previously unknown information derived from the available known facts.
The quality of decisions made in business and government relates directly to the quality of the information used to formulate the decision. This information may be retrieved from an organisation’s knowledge base (Intranet) or from the World Wide Web. The purpose of this thesis is to investigate the specifics of information gathering from these sources. It has studied a number of search techniques that rely on statistical and semantic analysis of unstructured information, and identified the benefits and limitations of these techniques. It was concluded that enterprise search technologies can efficiently manipulate Intranet-held information, but require complex processing of large amounts of textual information, which is neither feasible nor scalable when applied to the Web.
Based upon the search methods investigations, this thesis introduces a new semantic Web-based search method that automates the correlation of topic-related content for discovery of hitherto unknown information from disparate and widely diverse Web-sources. This method is in contrast to traditional search methods that are constrained to specific or narrowly defined topics. It addresses the three key aspects of the information: semantic closeness to search topic, information completeness, and quality. The method is based on algorithms from Natural Language Processing combined with techniques adapted from grounded theory and Dempster-Shafer theory to significantly enhance the discovery of topic related Web-sourced intelligence.
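Dempster-Shafer theory combines evidence from independent sources while modelling uncertainty explicitly. A minimal sketch of Dempster's rule of combination follows; this is the generic rule over mass assignments, not the thesis's specific adaptation of it:

```python
from itertools import product

def combine(m1: dict[frozenset, float], m2: dict[frozenset, float]) -> dict[frozenset, float]:
    """Dempster's rule of combination for two basic mass assignments.
    Keys are focal sets (frozensets of hypotheses), values are masses
    summing to 1. Mass falling on empty intersections is the conflict,
    which the rule normalizes away."""
    combined: dict[frozenset, float] = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y
    norm = 1.0 - conflict
    return {k: v / norm for k, v in combined.items()}
```

Two weak, independent pieces of evidence for the same hypothesis reinforce each other under this rule, which is what makes it attractive for fusing content quality signals from disparate Web sources.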
This thesis also describes the development of the new search solution, showing the integration of the mathematical methods used as well as the development of the working model. Real-world experiments demonstrate the effectiveness of the model with supporting performance analysis, showing that the quality of the extracted content is significantly enhanced compared with traditional Web-search approaches.
Fundamental Approaches to Software Engineering
This open access book constitutes the proceedings of the 23rd International Conference on Fundamental Approaches to Software Engineering, FASE 2020, which took place in Dublin, Ireland, in April 2020, and was held as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2020. The 23 full papers, 1 tool paper and 6 testing competition papers presented in this volume were carefully reviewed and selected from 81 submissions. The papers cover topics such as requirements engineering, software architectures, specification, software quality, validation, verification of functional and non-functional properties, model-driven development and model transformation, software processes, security and software evolution.
Handbook of Digital Face Manipulation and Detection
This open access book provides the first comprehensive collection of studies dealing with the hot topic of digital face manipulation, such as DeepFakes, face morphing, or reenactment. It combines the research fields of biometrics and media forensics, including contributions from academia and industry. Appealing to a broad readership, introductory chapters provide a comprehensive overview of the topic, addressing readers who wish to gain a brief overview of the state of the art. Subsequent chapters, which delve deeper into various research challenges, are oriented towards advanced readers. Moreover, the book provides a good starting point for young researchers as well as a reference guide pointing to further literature. Hence, the primary readership is academic institutions and industry currently involved in digital face manipulation and detection. The book could easily be used as a recommended text for courses in image processing, machine learning, media forensics, biometrics, and the general security area.
Ontology-based semantic reminiscence support system
This thesis addresses the needs of people who find reminiscence helpful, focusing on the development of a computerised reminiscence support system which facilitates access to and retrieval of stored memories used as the basis for positive interactions between elderly and young people, and between people with cognitive impairment and members of their family or caregivers.
To model users’ background knowledge, this research defines a lightweight user-oriented ontology and its building principles. The ontology is flexible and has a simplified knowledge structure populated with semantically homogeneous ontology concepts. The user-oriented ontology differs from generic ontology models in that it does not rely on knowledge experts; its structure enables users to browse, edit and create new entries on their own.
To solve the semantic gap problem in personal information retrieval, this thesis proposes a semantic ontology-based feature matching method. It involves natural language processing and semantic feature extraction/selection using the user-oriented ontology. It comprises four stages: (i) user-oriented ontology building, (ii) semantic feature extraction for building vectors representing information objects, (iii) semantic feature selection using the user-oriented ontology, and (iv) measuring the similarity between the information objects.
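Stage (iv), measuring the similarity between information objects, typically reduces to a vector-space comparison. A minimal sketch using cosine similarity over concept-weight vectors follows; the function name and sparse-dict representation are our assumptions, not the thesis's implementation:

```python
import math

def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse semantic feature vectors,
    keyed by ontology concept, weighted by feature relevance."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

The ontology's role in stages (ii) and (iii) is to decide which concepts appear as keys and with what weights, so that the comparison reflects the user's own knowledge structure rather than raw term overlap.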
To facilitate personal information management and dynamic generation of content, the system uses ontologies and advanced algorithms for semantic feature matching. An algorithm named Onto-SVD is also proposed, which uses the user-oriented ontology to automatically detect the semantic relations within the stored memories. It combines semantic feature selection with matrix factorisation and k-means clustering to achieve topic identification based on semantic relations.
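The combination of matrix factorisation and k-means clustering can be illustrated with a generic sketch. This is not Onto-SVD itself (which also folds in ontology-based feature selection), just the SVD-plus-k-means core with a naive initialisation:

```python
import numpy as np

def topic_clusters(X: np.ndarray, rank: int, k: int, iters: int = 50) -> np.ndarray:
    """Reduce a (documents x features) matrix with truncated SVD, then
    cluster the documents with plain k-means in the reduced space."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Z = U[:, :rank] * s[:rank]          # documents in latent-topic space
    centers = Z[:k].copy()              # naive init: first k documents
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(iters):
        # assign each document to its nearest center
        d = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean of its assigned documents
        for j in range(k):
            if (labels == j).any():
                centers[j] = Z[labels == j].mean(axis=0)
    return labels
```

Working in the rank-reduced space groups documents by latent topic rather than by exact term overlap, which is what makes the factorisation step useful before clustering.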
The thesis further proposes an ontology-based personalised retrieval mechanism for the system. It aims to assist people to recall, browse and re-discover events from their lives by considering their profiles and background knowledge, and providing them with customised retrieval results. Furthermore, a user profile space model is defined, and its construction method is described. The model combines multiple user-oriented ontologies and has a self-organised structure based on relevance feedback. The identification of a person’s search intentions in this mechanism operates at the conceptual level and involves the person’s background knowledge. Based on the identified search intentions, knowledge spanning trees are automatically generated from the ontologies or user profile spaces. The knowledge spanning trees are used to expand and reformulate queries, which enhances the queries’ semantic representations by applying domain knowledge.
The crowdsourcing-based system evaluation measures users’ satisfaction with the content generated by Sem-LSB. It compares the advantages and disadvantages of three types of content presentation (i.e. unstructured, LSB-based and semantic/knowledge-based). Based on users’ feedback, the semantic/knowledge-based presentation achieved higher overall satisfaction and stronger reminiscence-support effects than the others.
Clustering Information Retrieval Search Outputs
Users are known to have difficulties in dealing with information retrieval search outputs, especially if the outputs are above a certain size. It has been argued by several researchers that search output clustering can help users in their interaction with IR systems. Clustering may provide users with an overview of the output by exploiting the topicality information that resides in the output but has not been used in the retrieval stage. It can enable them to find the relevant documents more easily and also help them to form an understanding of the different facets of the query that have been provided for their inspection. This project aimed to investigate the viability of using clustering as a way of mediating users’ interaction with search outputs and attempted to identify its possible benefits.
Can and Ozkarahan’s (1990) C3M algorithm was used to test the effectiveness of clustering as a way of presenting search output. C3M is a relatively simple, non-hierarchical method that has been shown to give comparable or superior results to the best-known hierarchical methods.
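The cover-coefficient idea at the heart of C3M can be sketched as follows. This follows the usual presentation of the cover coefficient (c_ij = alpha_i * sum_k d_ik * beta_k * d_jk, with alpha and beta the reciprocal row and column sums of the document-term matrix), but the function names are ours, and the sketch omits the seed-selection and cluster-assignment steps of the full algorithm:

```python
def cover_coefficients(D: list[list[float]]) -> list[list[float]]:
    """Cover-coefficient matrix of the C3M scheme: C[i][j] measures the
    extent to which document i is 'covered' by document j, computed from
    a document-term matrix D."""
    row_sums = [sum(row) for row in D]
    col_sums = [sum(col) for col in zip(*D)]
    n = len(D)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        alpha = 1.0 / row_sums[i]
        for j in range(n):
            C[i][j] = alpha * sum(
                D[i][k] * D[j][k] / col_sums[k]
                for k in range(len(D[i])) if col_sums[k]
            )
    return C

def estimated_cluster_count(C: list[list[float]]) -> float:
    """C3M estimates the number of clusters as the sum of the diagonal
    'decoupling' coefficients C[i][i]."""
    return sum(C[i][i] for i in range(len(C)))
```

Each row of C sums to 1, so C[i][i] reads as how much document i covers itself: near 1 for a document sharing no terms with others, small for one buried in a dense topic.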
The method was implemented in TCL and linked to the department’s experimental IR system Okapi. Implementation included a procedure of term selection for document representation which preceded the clustering process and a procedure involving cluster representation for users’ viewing following the clustering process. After some tuning of the implementation parameters for the databases used, several experiments were designed and conducted to assess whether clusters could group documents in useful ways.
One group of experiments aimed to assess the ability of the implementation to bring together topically related documents. It was quite difficult to gather data for such an assessment, but the existence of a set of data generated for the TREC Interactive track (1996) enabled us to design experiments that at least approximately satisfied our objective. TREC provided a set of queries, and groups of relevant documents with facet assignments made by expert users. It was thus possible to make an inference by measuring the correlation between the clusters to which relevant documents were assigned and the facet assignments made for those documents by TREC experts.
The utility of this data set was limited for various reasons discussed in the related chapters; however, it can be concluded that clusters cannot be relied on to bring together relevant documents assigned to a certain facet. While there was some correlation between the cluster and facet assignments of the documents when the clustering was done only on relevant documents, no correlation could be found when the clustering was based on the results of queries defined by City participants in the Interactive track.
Another group of experiments was conducted to compare output clustering with relevance ranking as a search output representation method. This comparison was necessary as an immediate consequence of clustering search output would be the loss of relevance ranking. It had to be assessed whether clustering could help users to find the relevant documents more easily than by relevance ranking, before any clustering solution could be proposed as an alternative to relevance ranked output.
For this purpose, two sets of user experiments (n=20 and n=57) were conducted based on the users’ own information needs. While changes had been made to the implementation between the first and second sets of experiments, the experimental design was almost the same in both runs. Users were first asked to rank clusters formed from the search output (top 50 documents) and then make relevance judgements for the individual documents in the same output. The precision of the cluster(s) marked best by the users was then compared to the precision values that would be attained by relevance ranking at comparable thresholds.
The results from the first group of user experiments were not conclusive (partly due to the small size of the data set), but they drew our attention to the importance of how clusters and documents are represented for users’ viewing. After some changes to the implementation, mainly related to representation issues, and an intermediate set of 10 experiments to assess two new representation formats, a set of 57 user experiments was conducted to measure and compare the precision values attainable by clustering versus relevance ranking.
These experiments revealed no significant precision difference between clustered outputs and ranked lists. The number of cases where one method performed better than the other was slightly higher for the ranked lists at the top-cluster level and slightly higher for the clustered representation at the top-two-clusters level. However, the overall average precision values were higher for the ranked list at both levels.
As such, clustering did not appear preferable to ranked lists, especially as it also incurred overheads in the computing time and resources involved in creating the clusters, and in the time and effort taken by users to inspect them.
An interesting outcome of the user experiments was the users’ ability to identify clusters that do not include relevant information. There were fewer relevant documents among the clusters marked last by the users than among the documents ranked last at similar threshold levels. This raised the possibility of using clusters as an exclusion tool to improve the precision of ranked lists. After excluding the documents of the last cluster, ranked lists performed significantly better than the clusters at the top-cluster level.
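The exclusion idea can be expressed in a few lines; the function names and list/set representation are illustrative, not the project's implementation:

```python
def precision_at(ranked: list[int], relevant: set[int], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    top = ranked[:k]
    return sum(1 for d in top if d in relevant) / len(top)

def exclude_cluster(ranked: list[int], rejected: set[int]) -> list[int]:
    """Drop the documents of a user-rejected cluster from the ranked list,
    preserving the original relevance ranking of the remainder."""
    return [d for d in ranked if d not in rejected]
```

Because non-relevant documents removed from the list pull relevant ones upward, precision at a fixed cutoff can only stay the same or improve when the rejected cluster really is low in relevant documents.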
There was also some evidence (consisting of observation of users during the experiments and a few user comments) that clusters could be used to provide the users with a glimpse of the search results, in order to decide whether to inspect the search results or initiate a new query straight away.
In summary, the cumulative experiment results imply that clustering cannot outperform relevance ranking and seems to deserve only a secondary role in users’ interaction with IR systems. However, it should also be noted that the experiment results are not representative of the whole set of possible user types and search situations, and it may be possible to identify search situations where clustering is more beneficial than relevance ranking.