XML Schema Clustering with Semantic and Hierarchical Similarity Measures
With the growing popularity of XML as a data representation language, collections of XML data have grown explosively. Methods are needed to manage these collections and discover useful information in them for improved document handling. We present a schema clustering process that organises heterogeneous XML schemas into groups. The methodology considers not only the linguistic and contextual similarity of elements but also their hierarchical structural similarity. We support our findings with experiments and analysis.
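The combination of linguistic, contextual and hierarchical similarity can be illustrated with a minimal Python sketch. It is not the authors' method: the synonym table `SYNONYMS`, the weights (`w_ling=0.6` and the 0.5/0.5 split inside `context_sim`), the representation of a schema as a list of root-to-element label paths, and the average-linkage clustering with a `threshold` are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical synonym table standing in for a real linguistic resource (e.g. a thesaurus).
SYNONYMS = {frozenset({"author", "writer"}), frozenset({"title", "name"})}


def name_sim(a: str, b: str) -> float:
    """Linguistic similarity of two element names: synonym lookup, else string ratio."""
    a, b = a.lower(), b.lower()
    if a == b or frozenset({a, b}) in SYNONYMS:
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


def context_sim(pa: list[str], pb: list[str]) -> float:
    """Context/hierarchy similarity: Jaccard overlap of ancestor labels plus depth agreement."""
    anc_a, anc_b = set(pa[:-1]), set(pb[:-1])
    union = anc_a | anc_b
    jaccard = len(anc_a & anc_b) / len(union) if union else 1.0
    depth_agreement = 1.0 - abs(len(pa) - len(pb)) / max(len(pa), len(pb))
    return 0.5 * jaccard + 0.5 * depth_agreement


def element_sim(pa: list[str], pb: list[str], w_ling: float = 0.6) -> float:
    """Weighted mix of the linguistic and the structural similarity of two element paths."""
    return w_ling * name_sim(pa[-1], pb[-1]) + (1 - w_ling) * context_sim(pa, pb)


def schema_sim(sa: list[list[str]], sb: list[list[str]]) -> float:
    """Symmetric average of best-match element similarities between two schemas."""
    def directed(x, y):
        return sum(max(element_sim(p, q) for q in y) for p in x) / len(x)
    return 0.5 * (directed(sa, sb) + directed(sb, sa))


def cluster(schemas: dict[str, list[list[str]]], threshold: float = 0.7) -> list[list[str]]:
    """Average-linkage agglomerative clustering; stops when no pair reaches `threshold`."""
    sim = {frozenset(pair): schema_sim(schemas[pair[0]], schemas[pair[1]])
           for pair in combinations(schemas, 2)}
    clusters = [[name] for name in schemas]

    def linkage(c1, c2):
        return sum(sim[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[i] += clusters.pop(j)  # i < j, so popping j leaves index i valid
    return clusters


# Each schema is a list of root-to-element label paths.
schemas = {
    "lib1": [["book"], ["book", "title"], ["book", "author"]],
    "lib2": [["book"], ["book", "name"], ["book", "writer"]],
    "po": [["order"], ["order", "item"], ["order", "item", "price"]],
}
print(cluster(schemas))  # [['lib1', 'lib2'], ['po']]
```

The two bibliographic schemas end up together because their element names match through synonyms and their paths have the same shape, while the purchase-order schema shares neither. Raising `w_ling` makes naming dominate; lowering it gives more weight to the tree structure.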
BSML: A Binding Schema Markup Language for Data Interchange in Problem Solving Environments (PSEs)
We describe a binding schema markup language (BSML) for describing data
interchange between scientific codes. Such a facility is an important
constituent of scientific problem solving environments (PSEs). BSML is designed
to integrate with a PSE or application composition system that views model
specification and execution as a problem of managing semistructured data. The
data interchange problem is addressed by three techniques for processing
semistructured data: validation, binding, and conversion. We present BSML and
describe its application to a PSE for wireless communications system design.
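The three processing steps (validation, binding, conversion) can be sketched in Python. BSML's own markup is not reproduced here: the binding table `BINDING`, the element names, the `LinkParams` record and the dBm-to-milliwatt conversion are hypothetical stand-ins chosen to fit the wireless-design setting, and the sketch only shows how the three stages divide the work.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical binding schema: element name -> (Python type, required?).
BINDING = {"frequency_hz": (float, True), "tx_power_dbm": (float, True), "label": (str, False)}


@dataclass
class LinkParams:
    """Typed parameters expected by a (hypothetical) consuming simulation code."""
    frequency_hz: float
    tx_power_mw: float
    label: str = ""


def validate(root: ET.Element) -> list[str]:
    """Validation: report missing required elements and values of the wrong type."""
    errors = []
    for name, (typ, required) in BINDING.items():
        node = root.find(name)
        if node is None:
            if required:
                errors.append(f"missing <{name}>")
            continue
        try:
            typ((node.text or "").strip())
        except ValueError:
            errors.append(f"<{name}> is not a valid {typ.__name__}")
    return errors


def bind(root: ET.Element) -> dict:
    """Binding: map the (already validated) elements onto typed Python values."""
    values = {}
    for name, (typ, _required) in BINDING.items():
        node = root.find(name)
        if node is not None:
            values[name] = typ((node.text or "").strip())
    return values


def convert(values: dict) -> LinkParams:
    """Conversion: adapt units to what the consumer expects (dBm -> milliwatts)."""
    return LinkParams(
        frequency_hz=values["frequency_hz"],
        tx_power_mw=10 ** (values["tx_power_dbm"] / 10),
        label=values.get("label", ""),
    )


doc = ET.fromstring("<link><frequency_hz>2.4e9</frequency_hz><tx_power_dbm>20</tx_power_dbm></link>")
if not validate(doc):
    print(convert(bind(doc)))
```

Keeping the stages separate means a producer's output can be rejected early with readable errors, before any unit conversion is attempted.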
Bounded repairability for regular tree languages
We study the problem of bounded repairability of a given restriction tree language R into a target tree language T. More precisely, we say that R is bounded repairable w.r.t. T if there exists a bound on the number of standard tree editing operations that must be applied to any tree in R in order to obtain a tree in T. We consider a number of possible specifications for tree languages: bottom-up tree automata (over the curry encoding of unranked trees) that capture the class of XML Schemas and DTDs. We also consider the special case when the restriction language R is universal, i.e., contains all trees over a given alphabet. We give an effective characterization of bounded repairability between pairs of tree languages represented with automata. This characterization introduces two tools, synopsis trees and a coverage relation between them, that allow one to reason about tree languages that undergo a bounded number of editing operations. We then employ this characterization to provide upper bounds on the complexity of deciding bounded repairability, and we show that these bounds are tight. In particular, when the input tree languages are specified with arbitrary bottom-up automata, the problem is coNEXPTIME-complete. The problem remains coNEXPTIME-complete even if we use deterministic non-recursive DTDs to specify the input languages. The complexity of the problem can be reduced if we assume that the alphabet, i.e., the set of node labels, is fixed: the problem becomes PSPACE-complete for non-recursive DTDs and coNP-complete for deterministic non-recursive DTDs. Finally, when the restriction tree language R is universal, we show that the bounded repairability problem becomes EXPTIME-complete if the target language is specified by an arbitrary bottom-up tree automaton and becomes tractable (PTIME-complete, in fact) when a deterministic bottom-up automaton is used.
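The definition of bounded repairability can be written compactly. Here dist(t, t') denotes the minimum number of edit operations turning tree t into tree t'; reading "standard tree editing operations" as node insertion, node deletion and node relabelling is an assumption, and the notation is ours rather than the paper's.

```latex
\[
  \operatorname{dist}(t, T) = \min_{t' \in T} \operatorname{dist}(t, t'),
  \qquad
  R \text{ is bounded repairable w.r.t. } T
  \iff
  \exists k \in \mathbb{N}\;\; \forall t \in R :\; \operatorname{dist}(t, T) \le k .
\]
```

For instance, if R is universal over the alphabet {a, b} and T contains exactly the trees whose nodes are all labelled a, then a tree with n nodes labelled b needs at least n edits (each operation touches a single node), so no single bound k exists and R is not bounded repairable w.r.t. T.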
Semantic technologies: from niche to the mainstream of Web 3? A comprehensive framework for web information modelling and semantic annotation
Context: Web information technologies developed and applied in the last decade
have considerably changed the way web applications operate and have
revolutionised information management and knowledge discovery. Social
technologies, user-generated classification schemes and formal semantics have a
far-reaching sphere of influence. They promote collective intelligence, support
interoperability, enhance sustainability and instigate innovation.
Contribution: The research carried out and consequent publications follow the
various paradigms of semantic technologies, assess each approach, evaluate its
efficiency, identify the challenges involved and propose a comprehensive framework for web information modelling and semantic annotation, which is the thesis’ original contribution to knowledge. The proposed framework assists web information
modelling, facilitates semantic annotation and information retrieval, enables system interoperability and enhances information quality.
Implications: Semantic technologies coupled with social media and end-user
involvement can drive innovation with wide organisational implications that can benefit a considerable range of industries. The scalable and sustainable business models of social computing and the collective intelligence of organisational social media can be effectively combined with internal research and knowledge from interoperable information repositories, back-end databases and legacy systems.
Semantified information assets can free human resources so that they can be used to better serve business development, support innovation and increase productivity.
A teachable semi-automatic web information extraction system based on evolved regular expression patterns
This thesis explores Web Information Extraction (WIE) and how it has been used in decision making and to support businesses in their daily operations. The research focuses on a WIE system based on Genetic Programming (GP), with an extensible model to enhance the automatic extractor. The system uses a human as a teacher to identify and extract relevant information from semi-structured HTML web pages.
Regular expressions, chosen as the pattern-matching tool, are generated automatically from the training data to provide an improved grammar and lexicon. This particularly benefits the GP system, which may need to extend its lexicon when new tokens appear in the web pages. These tokens allow the GP method to produce new extraction patterns for new requirements.
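The idea of evolving regular expressions from a lexicon and a teacher's labelled examples can be sketched in Python. This is a simplified stand-in, not the thesis's GP system: it evolves flat token sequences with an elitist mutation-only search instead of GP trees, and the lexicon `LEXICON`, the F1 fitness, the population settings and the training pages are hypothetical.

```python
import random
import re

# Hypothetical lexicon of regex building blocks. In the thesis the lexicon is derived
# from the training data; here it is fixed by hand for illustration.
LEXICON = [r"\d+", r"[A-Za-z]+", r"\.", r"\$", r",", r"\s"]


def f1(pattern: str, pages: list[str], targets: list[set[str]]) -> float:
    """F1 score of the strings `pattern` extracts against the teacher's labelled targets."""
    tp = fp = fn = 0
    for page, wanted in zip(pages, targets):
        found = set(re.findall(pattern, page))
        tp += len(found & wanted)
        fp += len(found - wanted)
        fn += len(wanted - found)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def mutate(genome: list[str], rng: random.Random) -> list[str]:
    """Return a copy of `genome` with one token replaced, inserted or deleted."""
    g = genome.copy()
    op = rng.choice(["replace", "insert", "delete"])
    if op == "insert":
        g.insert(rng.randrange(len(g) + 1), rng.choice(LEXICON))
    elif op == "delete" and len(g) > 1:
        del g[rng.randrange(len(g))]
    else:  # replace (also used when a one-token genome cannot shrink)
        g[rng.randrange(len(g))] = rng.choice(LEXICON)
    return g


def evolve(pages, targets, generations=200, pop_size=30, seed=0):
    """Elitist evolutionary search over token sequences; returns (regex, F1)."""
    rng = random.Random(seed)

    def score(genome):
        return f1("".join(genome), pages, targets)

    population = [[rng.choice(LEXICON)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # keep the better half unchanged
        children = [mutate(rng.choice(parents), rng) for _ in range(pop_size - len(parents))]
        population = parents + children
    best = max(population, key=score)
    return "".join(best), score(best)


# Training pages and the price strings a human teacher marked as relevant.
pages = ["Price: $12.99 today", "Now only $5.49, was $7.00"]
targets = [{"$12.99"}, {"$5.49", "$7.00"}]
print(evolve(pages, targets))
```

A pattern such as `\$\d+\.\d+` scores an F1 of 1.0 on these pages. Whether the search reaches it depends on the seed and budget; adding a token to `LEXICON` is the sketch's analogue of the lexicon growing when new tokens appear.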
Procedural Creation of Medical Reports with Hierarchical Information Processing in Radiation Oncology
Background: For many years, the oncological doctor's letter has been the pivotal means of transferring information to general practitioners, medical specialists and medical consultants. Yet, because of the large number of diagnoses and therapies, both creator and recipient need a high level of abstraction, retentiveness and analysis. In the commonly used structure of doctor's letters, all diagnoses and therapies are listed in sequential order with all diagnoses first, which makes it far from trivial to convey the important chronological and hierarchical context of oncological cases. Further important aspects are the integration of these letters into existing clinical and departmental information systems (for example via an HL7 interface), various export formats (for example PDF, HTML), fax and encrypted email. Moreover, these letters need a modern layout that, among other things, meets corporate design requirements. Methods: The requirements for a doctor's letter system are manifold and can only be represented rudimentarily with a normal word processing system. We therefore developed a system that covers all the special features and requirements of clinical use. The system is based on a scalable and extensible client-server architecture. We use the programming languages Harbour, C++, PHP and JavaScript, a Microsoft SQL Server database for data storage, and the HL7 standard as the interface to other information systems such as the hospital information system (HIS). Export formats are PDF and HTML/XML. Layouts are generated with TeX, LaTeX and MiKTeX. Results: The requirements above were met by the doctor's letter and findings system IntDok. The hierarchical presentation of diagnoses, histologies and therapies gives the recipient a first outline of the course of the disease.
A strict procedure controls the whole process of document compilation and assists the user with many well-received tools, such as text blocks, import and export functions (PDF and HTML/XML, including barcodes) and an HL7 interface to other information systems. The software also provides sophisticated mail merging. All content from previous letters can easily be inserted into the current document. A TeX server automatically produces the document layout, including high-quality hyphenation, so that a uniform, polished appearance (corporate design) is guaranteed. The documents are saved in an MS-SQL database (almost 230,000 documents since 1991), independent of any proprietary format such as MS-Word. Conclusion: Creating documents is fast, simple and well-structured. Sophisticated tools ensure optimal use of human resources and time. The system is an important module in our overall digital work environment.
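The hierarchical, chronological presentation of diagnoses, histologies and therapies can be illustrated with a small Python sketch. The `Entry` record, its fields, the date format and the example case are hypothetical and do not describe IntDok's actual data model; the sketch only shows how nesting plus date ordering yields an outline of the course of the disease.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Entry:
    """One item of a patient's history (hypothetical model, not IntDok's actual schema)."""
    when: date
    kind: str  # e.g. "Diagnosis", "Histology", "Therapy"
    text: str
    children: list["Entry"] = field(default_factory=list)


def render(entries: list[Entry], depth: int = 0) -> list[str]:
    """Render entries in chronological order, indenting each child under its parent."""
    lines = []
    for entry in sorted(entries, key=lambda item: item.when):
        lines.append(f"{'  ' * depth}{entry.when:%d.%m.%Y} {entry.kind}: {entry.text}")
        lines.extend(render(entry.children, depth + 1))
    return lines


# Hypothetical case; children are deliberately listed out of order to show the sorting.
history = [
    Entry(date(2019, 3, 4), "Diagnosis", "Primary tumour (example)", [
        Entry(date(2019, 4, 15), "Therapy", "Radiotherapy series (example)"),
        Entry(date(2019, 3, 6), "Histology", "Biopsy findings (example)"),
    ]),
]
print("\n".join(render(history)))
```

Each therapy and histology stays attached to the diagnosis it belongs to, instead of being listed after all diagnoses as in a purely sequential letter.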
Comparison and Evolution of XML Schemas
XML has become the de facto format for data exchange. We aim at establishing a multi-system environment where some local original systems work in harmony with a global integrated system, which is a conservative evolution of the local ones. Data exchange is possible in both directions, allowing activities on both levels. For this purpose, we need a schema mapping whose purpose is to ensure schema evolution and to guide the construction of a document translator, allowing automatic data adaptation with respect to type evolution. We propose a set of tools to help deal with XML database evolution. These tools are used to: (i) compute a mapping capable of obtaining a global schema that is a conservative extension of the original local schemas, and adapt XML documents accordingly; (ii) compute the set of integrity constraints for the global system on the basis of the local ones; (iii) compare the XML types of two systems in order to replace one system by another that contains it; (iv) correct a new document that is invalid with respect to a system's XML schema so that it can be added to that system. Experiments on synthetic and real data are discussed, showing the efficiency of our methods in many situations.
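Tools (iii) and (iv) can be illustrated with a deliberately simplified Python sketch. Real XML types have ordered, regular content models; here a schema is reduced to a root label plus, for each element, an unordered set of allowed child labels. That model, the names `missing_in` and `correct`, and the naive delete-only repair are assumptions for illustration, not the authors' algorithms.

```python
# Simplified, DTD-like schema model (an assumption for illustration): a root label plus,
# for each element, the set of labels allowed as its children. Order and cardinality
# are ignored, so this is far weaker than real XML types.
Schema = tuple[str, dict[str, set[str]]]
Tree = tuple[str, list["Tree"]]


def reachable(schema: Schema) -> set[str]:
    """Element labels that can occur in some document valid for `schema`."""
    root, rules = schema
    seen, stack = set(), [root]
    while stack:
        label = stack.pop()
        if label not in seen:
            seen.add(label)
            stack.extend(rules.get(label, set()))
    return seen


def missing_in(big: Schema, small: Schema) -> list[tuple[str, str]]:
    """(parent, child) pairs allowed by `small` but not by `big`.

    An empty result means every document valid for `small` is also valid for `big`,
    i.e. `big` contains `small` under this simplified model.
    """
    if big[0] != small[0]:
        return [("<root>", small[0])]
    return sorted((parent, child)
                  for parent in reachable(small)
                  for child in small[1].get(parent, set()) - big[1].get(parent, set()))


def correct(doc: Tree, schema: Schema) -> Tree:
    """Naive repair: drop every subtree whose label is not allowed under its parent."""
    label, children = doc
    allowed = schema[1].get(label, set())
    return (label, [correct(child, schema) for child in children if child[0] in allowed])


local_schema: Schema = ("book", {"book": {"title", "author"}})
global_schema: Schema = ("book", {"book": {"title", "author", "isbn"}})
print(missing_in(global_schema, local_schema))  # []
print(missing_in(local_schema, global_schema))  # [('book', 'isbn')]
print(correct(("book", [("title", []), ("price", [])]), local_schema))  # ('book', [('title', [])])
```

Here the global schema is a conservative extension of the local one, so the local system could be replaced by the global one, but not the other way round.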
An integrated framework for developing generic modular reconfigurable platforms for micro manufacturing and its implementation
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The continuing trends of miniaturisation, mass customisation, globalisation and widespread use of the Internet have a great impact on manufacturing in the 21st century. Micro manufacturing will play an increasingly important role in bridging the gap between traditional precision manufacturing and emerging technologies such as MEMS/NEMS. The key requirements for micro manufacturing in this context are hybrid manufacturing capability, modularity, reconfigurability, adaptability and energy/resource efficiency. Existing design approaches tend to have a narrow scope and are largely limited to individual manufacturing processes and applications. These requirements demand a fundamentally new approach to future applications of micro manufacturing, so as to achieve producibility, predictability and productivity across full process chains and value chains.
A novel generic modular reconfigurable platform (GMRP) is proposed in this context. The proposed GMRP offers hybrid manufacturing capabilities, modularity, reconfigurability and adaptivity both as an individual machine tool and as a micro manufacturing system, and provides a cost-effective solution to high-value micro manufacturing in an agile, responsive manner suited to mass customisation.
Because of the complexity of GMRPs, an integrated framework has been developed to assist their design. The framework incorporates a theoretical GMRP model, a design support system and extension interfaces. The GMRP model covers various relevant micro manufacturing processes and machine tool elements. The design support system includes a user-friendly interface, a design engine for the design process and design evaluation, and a scalable design knowledge base and database. The functionality of the framework can also be extended through the design support system interface, the GMRP interface and the application interface, i.e. by linking to external hardware and/or software modules.
The design support system provides a number of tools for analysing and evaluating design solutions. Kinematic simulation of machine tools can be performed using the Virtual Reality toolbox in Matlab. A module has also been developed in Matlab for multiscale modelling, simulation and analysis of results; with this module, different cutting parameters can be studied and the resulting machining performance evaluated. Mathematical models have been developed for a non-traditional micro manufacturing process, micro EDM, with simulation performed using FEA.
Various design theories and methodologies have been studied, and axiomatic design theory has been selected because of its great power and simplicity. It has been applied in the conceptual design of the GMRP and its design support system. The design support system is implemented using Matlab, Java and XML technologies. The proposed GMRP and framework have been evaluated through case studies and experimental results.
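Axiomatic design judges a concept by its design matrix, which records which design parameters (DPs) each functional requirement (FR) depends on. A diagonal matrix is uncoupled, a triangular one is decoupled, and anything else is coupled, violating the independence axiom. The Python sketch below applies that classification. The Boolean encoding, the function name `classify` and the 3x3 example are hypothetical and are not taken from the thesis's GMRP design.

```python
def classify(matrix: list[list[bool]]) -> str:
    """Classify a square design matrix (rows: FRs, columns: DPs; True = FR depends on DP).

    Only the given FR/DP ordering is checked; a matrix that becomes triangular only after
    reordering is reported as "coupled" here.
    """
    n = len(matrix)
    assert all(len(row) == n for row in matrix), "design matrix must be square"
    off_diagonal = [(i, j) for i in range(n) for j in range(n) if i != j and matrix[i][j]]
    if not off_diagonal:
        return "uncoupled"  # diagonal: each FR is satisfied by exactly one DP
    if all(i > j for i, j in off_diagonal) or all(i < j for i, j in off_diagonal):
        return "decoupled"  # triangular: independence holds if DPs are fixed in order
    return "coupled"  # violates the independence axiom


# Hypothetical 3x3 example: FR3 also depends on DP1, giving a lower-triangular matrix.
design = [
    [True, False, False],
    [False, True, False],
    [True, False, True],
]
print(classify(design))  # decoupled
```

A decoupled result tells the designer that the DPs must be fixed in a particular order (here DP1 before DP3). A coupled result signals that the concept should be reworked.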