19,285 research outputs found
Highly focused document retrieval in aerospace engineering: user interaction design and evaluation
Purpose – This paper seeks to describe the preliminary studies (on both users and data), the design, and the evaluation of the K-Search system for searching legacy documents in aerospace engineering. Real-world reports of jet engine maintenance challenge current indexing practice, while real users' tasks require retrieving the information in the proper context. K-Search is currently in use at Rolls-Royce plc and has evolved to include other tools for knowledge capture and management.
Design/methodology/approach – Semantic Web techniques have been used to automatically extract information from the reports while maintaining the original context, allowing more focused retrieval than with more traditional techniques. The paper combines semantic search with classical information retrieval to increase search effectiveness. An innovative user interface has been designed to take advantage of this hybrid search technique. The interface is designed to allow a flexible and personal approach to searching legacy data.
Findings – The user evaluation showed that the system is effective and well received by users. It also showed that different people look at the same data in different ways and make different use of the same system depending on their individual needs, influenced by their job profile and personal attitude.
Research limitations/implications – This study focuses on a specific case of an enterprise working in aerospace engineering. Although the findings are likely to be shared by other engineering domains (e.g. mechanical, electronic), the study does not extend the evaluation to different settings.
Originality/value – The study shows how a real context of use can present new and unexpected challenges to researchers, and how effective solutions can then be adopted and used in organizations.
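The hybrid search idea described in this abstract, blending classical keyword retrieval with semantic matching over extracted metadata, can be sketched minimally. The scoring functions, field names, and equal weighting below are illustrative assumptions, not K-Search's actual implementation:

```python
def keyword_score(query_terms, doc_text):
    """Fraction of query terms found in the document text (toy keyword IR)."""
    words = doc_text.lower().split()
    return sum(term.lower() in words for term in query_terms) / len(query_terms)

def semantic_score(query_facets, doc_metadata):
    """Fraction of requested facet/value pairs matched by extracted metadata."""
    if not query_facets:
        return 0.0
    hits = sum(doc_metadata.get(k) == v for k, v in query_facets.items())
    return hits / len(query_facets)

def hybrid_score(query_terms, query_facets, doc_text, doc_metadata, alpha=0.5):
    """Blend the two signals; alpha weights keyword vs. semantic evidence."""
    return (alpha * keyword_score(query_terms, doc_text)
            + (1 - alpha) * semantic_score(query_facets, doc_metadata))

# Hypothetical maintenance-report snippet with metadata extracted beforehand.
doc = {"text": "crack found in turbine blade during overhaul",
       "meta": {"component": "blade", "event": "crack"}}
s = hybrid_score(["crack", "blade"], {"component": "blade"},
                 doc["text"], doc["meta"])
```

Ranking documents by such a combined score lets a keyword match and a metadata (context) match reinforce each other, which is the effect the hybrid interface exposes to users.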
CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap
After addressing the state of the art during the first year of CHORUS and establishing the existing landscape in multimedia search engines, we identified and analyzed gaps in the European research effort during our second year. In this period we focused on three directions, namely technological issues, user-centred issues and use cases, and socio-economic and legal aspects. These were assessed through two central studies: firstly, a concerted vision of the functional breakdown of a generic multimedia search engine, and secondly, representative use-case descriptions with a related discussion of the requirements they pose as technological challenges. Both studies were carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations at international conferences, and surveys addressed to EU project coordinators as well as national initiative coordinators. Based on the feedback obtained, we identified two types of gaps, namely core technological gaps that involve research challenges, and "enablers", which are not necessarily technical research challenges but have an impact on innovation progress. New socio-economic trends are presented, as well as emerging legal challenges.
Structuring visual exploratory analysis of skill demand
The analysis of increasingly large and diverse data for meaningful interpretation and question answering is handicapped by human cognitive limitations. Consequently, semi-automatic abstraction of complex data within structured information spaces becomes increasingly important, if its knowledge content is to support intuitive, exploratory discovery. Exploration of skill demand is an area where regularly updated, multi-dimensional data may be exploited to assess capability within the workforce to manage the demands of the modern, technology- and data-driven economy. The knowledge derived may be employed by skilled practitioners in defining career pathways, to identify where, when and how to update their skillsets in line with advancing technology and changing work demands. This same knowledge may also be used to identify the combination of skills essential in recruiting for new roles. To address the challenges inherent in exploring the complex, heterogeneous, dynamic data that feeds into such applications, we investigate the use of an ontology to guide structuring of the information space, to allow individuals and institutions to interactively explore and interpret the dynamic skill demand landscape for their specific needs. As a test case we consider the relatively new and highly dynamic field of Data Science, where insightful, exploratory data analysis and knowledge discovery are critical. We employ context-driven and task-centred scenarios to explore our research questions and guide iterative design, development and formative evaluation of our ontology-driven, visual exploratory discovery and analysis approach, to measure where it adds value to users' analytical activity. Our findings reinforce the potential in our approach, and point us to future paths to build on.
A New Design for Open and Scalable Collaboration of Independent Databases in Digitally Connected Enterprises
"Digitally connected enterprises" refers to e-business, global supply chains, and other new business designs of the Knowledge Economy, all of which require open and scalable information supply chains across independent enterprises. Connecting proprietarily designed and controlled enterprise databases in these information supply chains is a critical success factor for them. Previous connection designs tend to rely on "hard-coded" regimes, which do not respond well to disruptions (including changes and failures), and do not afford these enterprises sufficient flexibility to join simultaneously in multiple supply chain regimes and share information for the benefit of all. The paper develops a new design: it combines matchmaking with global database query, and thereby supports the interoperation of independent databases to form on-demand information supply chains. The design provides flexible (re-)configuration to decrease the impact of disruption, and proactive control to increase collaboration and information sharing. More broadly, the paper's results contribute to a new Information System design method for massively extended enterprises, and facilitate new business designs using digital connections at the level of databases.
MapReduce based RDF assisted distributed SVM for high throughput spam filtering
This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel University. Electronic mail has become embedded in our everyday lives. Billions of legitimate emails are sent on a daily basis. The widely established underlying infrastructure, its widespread availability, and its ease of use have all acted as catalysts for such pervasive proliferation. Unfortunately, the same can be said of unsolicited bulk email, or rather spam. Various methods, as well as enabling architectures, are available to try to mitigate the permeation of spam. In this respect, this dissertation complements existing survey work in this area by contributing an extensive literature review of traditional and emerging spam filtering approaches. Techniques, approaches and architectures employed for spam filtering are appraised, critically assessing their respective strengths and weaknesses.
Velocity, volume and variety are key characteristics of the spam challenge. MapReduce (M/R) has become increasingly popular as an Internet-scale, data-intensive processing platform. In the context of machine-learning-based spam filter training, support vector machine (SVM) based techniques have proven effective. SVM training is, however, a computationally intensive process. In this dissertation, an M/R-based distributed SVM algorithm for scalable spam filter training, designated MRSMO, is presented. By distributing and processing subsets of the training data across multiple participating computing nodes, the distributed SVM reduces spam filter training time significantly. To mitigate the accuracy degradation introduced by this approach, a Resource Description Framework (RDF) based feedback loop is evaluated. Experimental results demonstrate that this improves the accuracy of the distributed SVM beyond that of the original sequential counterpart.
Effectively exploiting large-scale, "Cloud"-based, heterogeneous processing capabilities for M/R in what can be considered a non-deterministic environment requires the consideration of a number of perspectives. In this work, gSched, a Hadoop M/R based, heterogeneity-aware task-to-node matching and allocation scheme, is designed. Using MRSMO as a baseline, experimental evaluation indicates that gSched improves on the performance of the out-of-the-box Hadoop counterpart in a typical Cloud-based infrastructure.
The focal contribution to knowledge is a scalable, heterogeneous-infrastructure and machine-learning-based spam filtering scheme, able to capitalize on collaborative accuracy improvements through RDF-based end-user feedback.
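The partition-train-combine idea behind distributed classifier training can be sketched as follows. This is not MRSMO itself: a simple perceptron stands in for SVM training, and the node count, learning rate, and toy data are assumptions for illustration only:

```python
import random

def train_local(data, epochs=20, lr=0.1):
    """Train a tiny perceptron on one node's shard of the training data."""
    dim = len(data[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
            if pred != y:  # update only on mistakes
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(model, x):
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

def distributed_train(data, n_nodes=4):
    """'Map' step: each node trains independently on its own data subset."""
    shards = [data[i::n_nodes] for i in range(n_nodes)]
    return [train_local(shard) for shard in shards if shard]

def ensemble_predict(models, x):
    """'Reduce' step: majority vote over the local models."""
    votes = sum(predict(m, x) for m in models)
    return 1 if votes > 0 else -1

# Toy 1-D "spam" data: positive feature means spam (+1), negative means ham (-1).
random.seed(0)
data = ([([random.uniform(0.5, 2.0)], 1) for _ in range(40)]
        + [([random.uniform(-2.0, -0.5)], -1) for _ in range(40)])
random.shuffle(data)
models = distributed_train(data)
```

Training each shard in isolation is what cuts the wall-clock training time; the accuracy cost of never seeing the full dataset is what the RDF feedback loop in the dissertation is designed to claw back.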
Extracting tag hierarchies
Tagging items with descriptive annotations or keywords is a very natural way to compress and highlight information about the properties of a given entity. Over the years, several methods have been proposed for extracting a hierarchy between the tags of systems with a "flat", egalitarian organization of the tags, which is very common when the tags correspond to free words given by numerous independent people. Here we present a complete framework for automated tag hierarchy extraction based on tag occurrence statistics. Along with proposing new algorithms, we also introduce different quality measures enabling the detailed comparison of competing approaches from different aspects. Furthermore, we set up a synthetic, computer-generated benchmark providing a versatile tool for testing, with a couple of tunable parameters capable of generating a wide range of test beds. Besides the computer-generated input, we also use real data in our studies, including a biological example with a pre-defined hierarchy between the tags. The encouraging similarity between the pre-defined and reconstructed hierarchy, as well as the seemingly meaningful hierarchies obtained for other real systems, indicate that tag hierarchy extraction is a very promising direction for further research with great potential for practical applications.
Comment: 25 pages with 21 pages of supporting information, 25 figures
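As a rough sketch of co-occurrence-based tag hierarchy extraction (in the spirit of the framework described, not its actual algorithms), the following attaches each tag, in decreasing order of frequency, under the already-placed tag it co-occurs with most; all names and data are invented for illustration:

```python
from collections import Counter
from itertools import combinations

def extract_hierarchy(tagged_items):
    """Build a parent map: frequent tags become ancestors, and each tag is
    attached to the already-placed tag it co-occurs with most often."""
    freq = Counter(tag for tags in tagged_items for tag in tags)
    cooc = Counter()
    for tags in tagged_items:
        for a, b in combinations(sorted(set(tags)), 2):
            cooc[(a, b)] += 1
            cooc[(b, a)] += 1
    parent, placed = {}, []
    for tag, _ in freq.most_common():  # most frequent first
        if placed:
            parent[tag] = max(placed, key=lambda p: cooc[(tag, p)])
        placed.append(tag)
    return parent  # root tag(s) are absent from the mapping

items = [
    {"animal", "dog", "puppy"},
    {"dog", "puppy"},
    {"animal", "dog"},
    {"animal", "cat"},
    {"animal", "cat", "kitten"},
    {"animal"},
]
tree = extract_hierarchy(items)
```

On this toy collection the generic tag "animal" becomes the root, with "dog" and "cat" beneath it and "puppy" under "dog", mirroring the intuition that broad tags subsume their frequent co-occurring specializations.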
Development of an ontology for aerospace engine components degradation in service
This paper presents the development of an ontology for component service degradation. In this paper, degradation mechanisms in gas turbine metallic components are used as a case study to explain how a taxonomy within an ontology can be validated. The validation method used in this paper employs an iterative process and sanity checks. Data extracted from on-demand textual information are filtered and grouped into classes of degradation mechanisms. Various concepts are systematically and hierarchically arranged for use in the service maintenance ontology. The allocation of the mechanisms to the AS-IS ontology provides a robust data collection hub. Data integrity is guaranteed when the TO-BE ontology is introduced to analyse processes relative to various failure events. The initial evaluation reveals improvement in the performance of the TO-BE domain ontology based on iterations and updates with recognised mechanisms. The information extracted and collected is required to improve service knowledge and performance feedback, which are important for service engineers. Existing research areas such as natural language processing, knowledge management, and information extraction were also examined.
Multi-modal multi-semantic image retrieval
PhD thesis. The rapid growth in the volume of visual information, e.g. images and video, can overwhelm users' ability to find and access the specific visual information of interest to them. In recent years, ontology knowledge-based (KB) image information retrieval techniques have been adopted in order to attempt to extract knowledge from these images, enhancing retrieval performance. A KB framework is presented to promote semi-automatic annotation and semantic image retrieval using multimodal cues (visual features and text captions). In addition, a hierarchical structure for the KB allows metadata to be shared in a way that supports multi-semantics (polysemy) for concepts. The framework builds up an effective knowledge base pertaining to a domain-specific image collection, e.g. sports, and is able to disambiguate and assign high-level semantics to "unannotated" images.
Local feature analysis of visual content, namely using Scale Invariant Feature Transform (SIFT) descriptors, has been deployed in the "Bag of Visual Words" (BVW) model as an effective method to represent visual content information and to enhance its classification and retrieval. Local features are more useful than global features, e.g. colour, shape or texture, as they are invariant to image scale, orientation and camera angle. An innovative approach is proposed for the representation, annotation and retrieval of visual content using a hybrid technique based upon the use of unstructured visual words and upon a (structured) hierarchical ontology KB model. The structural model facilitates the disambiguation of unstructured visual words and a more effective classification of visual content, compared to a vector space model, through exploiting local conceptual structures and their relationships. The key contributions of this framework in using local features for image representation include: first, a method to generate visual words using the semantic local adaptive clustering (SLAC) algorithm, which takes term weight and the spatial locations of keypoints into account; consequently, the semantic information is preserved. Second, a technique is used to detect the domain-specific "non-informative visual words" which are ineffective at representing the content of visual data and degrade its categorisation ability. Third, a method to combine an ontology model with a visual word model to resolve synonym (visual heterogeneity) and polysemy problems is proposed. The experimental results show that this approach can discover semantically meaningful visual content descriptions and recognise specific events, e.g. sports events, depicted in images efficiently.
Since discovering the semantics of an image is an extremely challenging problem, one promising approach to enhance visual content interpretation is to use any associated textual information that accompanies an image as a cue to predict the meaning of the image, by transforming this textual information into a structured annotation, e.g. using XML, RDF, OWL or MPEG-7. Although text and image are distinct types of information representation and modality, there are some strong, invariant, implicit connections between images and any accompanying text information. Semantic analysis of image captions can be used by image retrieval systems to retrieve selected images more precisely. To do this, Natural Language Processing (NLP) is first exploited in order to extract concepts from image captions. Next, an ontology-based knowledge model is deployed in order to resolve natural language ambiguities. To deal with the accompanying text information, two methods to extract knowledge from textual information have been proposed. First, metadata can be extracted automatically from text captions and restructured with respect to a semantic model. Second, the use of LSI in relation to a domain-specific ontology-based knowledge model enables the combined framework to tolerate ambiguities and variations (incompleteness) in metadata. The use of the ontology-based knowledge model allows the system to find indirectly relevant concepts in image captions and thus leverage these to represent the semantics of images at a higher level. Experimental results show that the proposed framework significantly enhances image retrieval and leads to a narrowing of the semantic gap between lower-level machine-derived and higher-level human-understandable conceptualisation.
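The bag-of-visual-words representation underlying this framework can be sketched minimally: local descriptors are quantized to their nearest vocabulary centroid, and an image becomes a histogram of word counts. The 2-D descriptors and tiny vocabulary below are toy assumptions (real SIFT descriptors are 128-dimensional, and the thesis uses its SLAC algorithm rather than this plain nearest-centroid assignment):

```python
import math

def nearest_word(descriptor, vocabulary):
    """Index of the closest centroid (visual word) to a descriptor."""
    return min(range(len(vocabulary)),
               key=lambda i: math.dist(descriptor, vocabulary[i]))

def bovw_histogram(descriptors, vocabulary):
    """Represent an image as counts of its quantized local descriptors."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        hist[nearest_word(d, vocabulary)] += 1
    return hist

# Toy 2-D descriptors and a 3-word vocabulary (e.g. centroids learned
# offline with k-means over a training corpus of descriptors).
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, -0.1), (0.9, 1.2), (4.8, 5.1), (5.2, 4.9)]
hist = bovw_histogram(descriptors, vocabulary)
```

Such histograms give images a fixed-length vector representation, which is what allows standard classification and retrieval machinery to operate on variable numbers of local features.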
Local matching learning of large scale biomedical ontologies
Large biomedical ontologies generally describe the same domain of interest, but using different modelling choices and vocabularies. Aligning these complex and heterogeneous ontologies is a tedious task. Matching systems must provide high-quality results while accounting for the large size of these resources. Ontology matching systems must solve two problems: (i) coping with the large size of ontologies, and (ii) automating the alignment process. Ontology matching is a difficult task because of the large size of ontologies. Ontology matching systems combine different types of matchers to address these problems. The main problems in aligning large biomedical ontologies are conceptual heterogeneity, the large search space, and the reduced quality of the resulting alignments.
Ontology alignment systems combine different matchers in order to reduce heterogeneity. This combination should define the choice of matchers to combine and their weights. Different matchers handle different types of heterogeneity. Consequently, the tuning of a matcher should be automated by ontology alignment systems in order to obtain good matching quality. We propose an approach called "local matching learning" to cope with both the large size of ontologies and the automation problem. We divide a large alignment problem into a set of smaller local alignment problems. Each local alignment problem is aligned independently by a machine learning approach. We reduce the enormous search space to a set of smaller local matching tasks. We can align each local matching task efficiently to obtain better matching quality. Our partitioning approach is based on a novel multi-cut strategy that generates partitions which are neither oversized nor isolated. We can thereby overcome the problem of conceptual heterogeneity. The new partitioning algorithm is based on agglomerative hierarchical clustering (AHC). This approach generates a set of local matching tasks with a sufficient coverage ratio and no isolated partitions.
Each local alignment task is automatically aligned using machine learning techniques. A local classifier aligns a single local alignment task. The local classifiers are based on element-level and structure-level features. The class attribute of each training set is automatically labelled using an external knowledge base. We applied a feature selection technique to each local classifier in order to select the appropriate matchers for each local alignment task. This approach reduces alignment complexity and increases overall precision compared to traditional learning methods. We showed that the partitioning approach outperforms current approaches in terms of precision, coverage ratio, and absence of isolated partitions. We evaluated the local matching learning approach through various experiments based on the OAEI 2018 datasets. We concluded that it is advantageous to divide a large ontology alignment task into a set of local alignment tasks. The search space is reduced, which reduces the number of false negatives and false positives. Applying feature selection techniques to each local classifier increases the recall value of each local alignment task.
Although a considerable body of research work has addressed the problem of ontology matching, few studies have tackled the large ontologies used in the biomedical domain. We introduce a fully automated local matching learning approach that breaks down a large ontology matching task into a set of independent local sub-matching tasks. This approach integrates a novel partitioning algorithm as well as a set of matching learning techniques. The partitioning method is based on hierarchical clustering and does not generate isolated partitions.
The matching learning approach employs different techniques: (i) local matching tasks are independently and automatically aligned using their local classifiers, which are based on local training sets built from element level and structure level features, (ii) resampling techniques are used to balance each local training set, and (iii) feature selection techniques are used to automatically select the appropriate tuning parameters for each local matching context. Our local matching learning approach generates a set of combined alignments from each local matching task, and experiments show that a multiple local classifier approach outperforms conventional, state-of-the-art approaches: these use a single classifier for the whole ontology matching task. In addition, focusing on context-aware local training sets based on local feature selection and resampling techniques significantly enhances the obtained results
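The local matching idea (partition both ontologies into small local tasks, then align each task independently, shrinking the search space) can be sketched as follows. The first-letter partitioner and the plain string-similarity matcher below are toy stand-ins for the thesis's agglomerative hierarchical clustering and learned local classifiers:

```python
from collections import defaultdict
from difflib import SequenceMatcher

def sim(a, b):
    """Simple element-level matcher: normalized string similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def local_tasks(source, target):
    """Toy partitioner: bucket concepts by first letter. The real approach
    partitions with agglomerative hierarchical clustering instead."""
    tasks = defaultdict(lambda: ([], []))
    for c in source:
        tasks[c[0].lower()][0].append(c)
    for c in target:
        tasks[c[0].lower()][1].append(c)
    return tasks

def align_task(src, tgt, threshold=0.8):
    """One local matching task: greedy best match per source concept."""
    pairs = []
    for s in src:
        best = max(tgt, key=lambda t: sim(s, t), default=None)
        if best is not None and sim(s, best) >= threshold:
            pairs.append((s, best))
    return pairs

def match(source, target):
    """Align each local task independently, then merge the local alignments."""
    alignment = []
    for src, tgt in local_tasks(source, target).values():
        alignment.extend(align_task(src, tgt))
    return alignment

src_onto = ["Heart", "HeartValve", "Lung", "Kidney"]
tgt_onto = ["heart", "heart_valve", "lungs", "Liver"]
pairs = match(src_onto, tgt_onto)
```

Even in this toy form, each source concept is compared only against candidates in its own partition rather than against the whole target ontology, which is the mechanism by which partitioning reduces both the search space and the false-positive rate.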