
131 research outputs found

XML Schema Clustering with Semantic and Hierarchical Similarity Measures

Author: Iryadi Wina
Nayak Richi
Publication venue: 'Elsevier BV'
Publication date: 01/01/2007

With the growing popularity of XML as a data representation language, collections of XML data have exploded in number. Methods are required to manage these collections and to discover useful information from them for improved document handling. We present a schema clustering process that organises heterogeneous XML schemas into groups. The methodology considers not only the linguistic and contextual properties of the elements but also their hierarchical structural similarity. We support our findings with experiments and analysis

Crossref

Queensland University of Technology ePrints Archive

BSML: A Binding Schema Markup Language for Data Interchange in Problem Solving Environments (PSEs)

Author: Bae Kyung Kyoon
He Jian
Jiang Jing
Ramakrishnan Naren
Rappaport Theodore S.
Shaffer Clifford A.
Tranter William H.
Verstak Alex
Watson Layne T.
Publication date: 18/02/2002

We describe a binding schema markup language (BSML) for describing data interchange between scientific codes. Such a facility is an important constituent of scientific problem solving environments (PSEs). BSML is designed to integrate with a PSE or application composition system that views model specification and execution as a problem of managing semistructured data. The data interchange problem is addressed by three techniques for processing semistructured data: validation, binding, and conversion. We present BSML and describe its application to a PSE for wireless communications system design

arXiv.org e-Print Archive

CiteSeerX

Directory of Open Access Journals

XML document-grammar comparison: related problems and applications

Author: A. Algergawy
A. Balmin
A. Budanitsky
A. Doan
A. Formica
A. Neumann
B. Bouchou
C. Chitic
C. Werner
C.J. Rijsbergen van
C.Y. Chan
D.C. Reis
E. Bertino
E.T. Ray
F. Giunchiglia
G. Lee
G. Salton
G.M. Landau
H. Do
J. Lee
J. Tekli
K. Zhang
M. Murata
P. Resnik
P. Shvaiko
R. Luz Da
R. Rada
R. Schenkel
S. Amer-Yahia
S. Axelsson
S. Nishimura
S.M. Selkow
T. Akatsu
T. Dalamagas
T. Schlieder
W. Lian
Publication venue: 'Walter de Gruyter GmbH'

Crossref

Bounded repairability for regular tree languages

Author: Bourhis P.
Puppis G.
Riveros C.
Staworko S.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016

We study the problem of bounded repairability of a given restriction tree language R into a target tree language T. More precisely, we say that R is bounded repairable w.r.t. T if there exists a bound on the number of standard tree editing operations necessary to apply to any tree in R in order to obtain a tree in T. We consider a number of possible specifications for tree languages: bottom-up tree automata (on curry encoding of unranked trees) that capture the class of XML Schemas and DTDs. We also consider a special case when the restriction language R is universal, i.e., contains all trees over a given alphabet. We give an effective characterization of bounded repairability between pairs of tree languages represented with automata. This characterization introduces two tools, synopsis trees and a coverage relation between them, allowing one to reason about tree languages that undergo a bounded number of editing operations. We then employ this characterization to provide upper bounds to the complexity of deciding bounded repairability and we show that these bounds are tight. In particular, when the input tree languages are specified with arbitrary bottom-up automata, the problem is coNEXPTIME-complete. The problem remains coNEXPTIME-complete even if we use deterministic non-recursive DTDs to specify the input languages. The complexity of the problem can be reduced if we assume that the alphabet, the set of node labels, is fixed: the problem becomes PSPACE-complete for non-recursive DTDs and coNP-complete for deterministic non-recursive DTDs. Finally, when the restriction tree language R is universal, we show that the bounded repairability problem becomes EXPTIME-complete if the target language is specified by an arbitrary bottom-up tree automaton and becomes tractable (PTIME-complete, in fact) when a deterministic bottom-up automaton is used
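The definition in this abstract can be written compactly as follows. This is only a restatement in symbols: the distance notation d(t, t'), the bound k and the alphabet symbol Σ are assumptions introduced here for readability and are not taken from the paper.

```latex
% d(t, t') denotes the minimum number of standard tree editing
% operations needed to turn t into t' (notation assumed here).
\[
  R \text{ is bounded repairable w.r.t. } T
  \iff
  \exists k \in \mathbb{N}\;\; \forall t \in R\;\; \exists t' \in T :\;
  d(t, t') \le k
\]
% Special case from the abstract: R is universal, i.e. R contains
% every tree over a given alphabet \Sigma.
```

The key point is the order of the quantifiers: a single bound k must work for every tree in R, which is stronger than each tree in R having some finite repair of its own.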

Crossref

Archivio istituzionale della ricerca - Università degli Studi di Udine

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

On Repairing Structural Issues in Semi-Structured Documents

Author: YING SHANSHAN
Publication date: 03/06/2014

Ph.D. (Doctor of Philosophy)

ScholarBank@NUS

Semantic technologies: from niche to the mainstream of Web 3? A comprehensive framework for web Information modelling and semantic annotation

Author: Dotsika F.
Publication date: 01/01/2012

Context: Web information technologies developed and applied in the last decade have considerably changed the way web applications operate and have revolutionised information management and knowledge discovery. Social technologies, user-generated classification schemes and formal semantics have a far-reaching sphere of influence. They promote collective intelligence, support interoperability, enhance sustainability and instigate innovation. Contribution: The research carried out and consequent publications follow the various paradigms of semantic technologies, assess each approach, evaluate its efficiency, identify the challenges involved and propose a comprehensive framework for web information modelling and semantic annotation, which is the thesis’ original contribution to knowledge. The proposed framework assists web information modelling, facilitates semantic annotation and information retrieval, enables system interoperability and enhances information quality. Implications: Semantic technologies coupled with social media and end-user involvement can instigate innovative influence with wide organisational implications that can benefit a considerable range of industries. The scalable and sustainable business models of social computing and the collective intelligence of organisational social media can be resourcefully paired with internal research and knowledge from interoperable information repositories, back-end databases and legacy systems. Semantified information assets can free human resources so that they can be used to better serve business development, support innovation and increase productivity

WestminsterResearch

A teachable semi-automatic web information extraction system based on evolved regular expression patterns

Author: Nor Zainah Siau (7169549)
Publication date: 01/01/2014

This thesis explores Web Information Extraction (WIE) and how it has been used in decision making and to support businesses in their daily operations. The research focuses on a WIE system based on Genetic Programming (GP) with an extensible model to enhance the automatic extractor. This uses a human as a teacher to identify and extract relevant information from the semi-structured HTML webpages. Regular expressions, which have been chosen as the pattern matching tool, are automatically generated based on the training data to provide an improved grammar and lexicon. This particularly benefits the GP system which may need to extend its lexicon in the presence of new tokens in the web pages. These tokens allow the GP method to produce new extraction patterns for new requirements
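As a rough illustration of the idea in this abstract, the sketch below scores candidate regular expressions against teacher-labelled HTML snippets, which is the kind of fitness measure a Genetic Programming loop could use to select extraction patterns. It is a minimal sketch under stated assumptions, not the thesis's system: the function name `fitness`, the sample snippets and the candidate patterns are all hypothetical, and the evolutionary step that would generate candidates is omitted.

```python
import re


def fitness(pattern: str, examples: list[tuple[str, str]]) -> float:
    """Return the fraction of labelled examples for which the first
    capture group of `pattern` equals the value the teacher marked.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    try:
        compiled = re.compile(pattern)
    except re.error:
        return 0.0  # syntactically invalid candidates score zero
    if not examples:
        return 0.0
    hits = 0
    for html, expected in examples:
        match = compiled.search(html)
        # Only count a hit when the pattern has a capture group
        # and that group matches the labelled value exactly.
        if match and match.groups() and match.group(1) == expected:
            hits += 1
    return hits / len(examples)


# Teacher-labelled training data: (HTML snippet, value to extract).
examples = [
    ('<span class="price">$12.50</span>', "12.50"),
    ('<span class="price">$7.00</span>', "7.00"),
]

# Candidate patterns, as an evolutionary search might propose them.
candidates = [r"\$(\d+)", r"\$(\d+\.\d{2})", r"(\d+"]

# Keep the candidate that reproduces the most labelled values.
best = max(candidates, key=lambda p: fitness(p, examples))
```

Scoring by exact agreement with the labelled values keeps the human teacher in the loop: new labelled snippets change the scores, and therefore which patterns survive.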

Loughborough University Institutional Repository

Procedural Creation of Medical Reports with Hierarchical Information Processing in Radiation Oncology

Author: Fahrner Harald
Gainey Mark
Heinemann Felix Ernst
Kirrmann Stefan
Schmucker Marianne
Vogel Martin
Publication venue: University of Bern
Publication date: 18/03/2019

Background: For many years, the oncological doctor's letter has been the pivotal means of information transfer to general practitioners, medical specialists or medical consultants. Yet, both creator and recipient require a high level of abstraction, retentiveness and analysis due to the large number of diagnoses and therapies. In contrast to the commonly used structure of doctor's letters, where all diagnoses and therapies are listed in sequential order with all diagnoses first, it is by no means trivial to establish the important chronological and hierarchical context in the description of oncological cases. Additional aspects of importance are the integration of these letters into existing clinical and departmental information systems (for example via HL7 interface), various export formats (for example PDF, HTML), fax and encrypted email. Moreover, these letters need a modern layout that, among other things, meets the requirements of corporate design. Methods: The requirements for a doctor's letter system are manifold and can only be represented rudimentarily via a normal word processing system. Due to this deficiency we developed a system that covers all special features and requirements for clinical use. The system is based on a scalable and extensible client-server architecture. We use the programming languages Harbour, C++, PHP and JavaScript, a Microsoft SQL database for data storage and the HL7 standard as the interface to other information systems such as the hospital information system (HIS). Export formats are PDF and HTML/XML. Layouts are generated with TeX, LaTeX and MiKTeX. Results: The aforementioned requirements were resolved with the doctor's letter and finding system IntDok. The hierarchical presentation of diagnoses, histologies and therapies provides the recipient with a first outline of the course of the disease.
A strict procedure controls the whole process of document compilation and assists the user with many highly regarded tools such as text blocks, import and export functions (PDF and HTML/XML, including barcodes) and an HL7 interface to other information systems. The software also provides sophisticated mail merging. All content from previous letters can easily be inserted into the current document. A TeX server automatically provides the document layout, including high-quality hyphenation, so that a uniform appearance (corporate design) is guaranteed. The documents are saved in an MS-SQL database (almost 230,000 documents since 1991), independent of any proprietary formats such as MS-Word. Conclusion: Creation of documents is fast, simple and well-structured. Sophisticated tools guarantee the optimal use of human resources and time. The system is an important module in our overall digital work environment

Journal of Radiation Oncology Informatics

BOP Serials

Comparaison et évolution de schémas XML

Author: Amavi Joshua
Publication venue: HAL CCSD
Publication date: 28/11/2014

XML has become the de facto format for data exchange. We aim at establishing a multi-system environment where local original systems work in harmony with a global integrated system, which is a conservative evolution of the local ones. Data exchange is possible in both directions, allowing activities on both levels. For this purpose, we need a schema mapping whose purpose is to ensure schema evolution and to guide the construction of a document translator, allowing automatic data adaptation with respect to type evolution. We propose a set of tools to help deal with XML database evolution. These tools are used: (i) to compute a mapping capable of obtaining a global schema which is a conservative extension of the original local schemas, and to adapt XML documents; (ii) to compute the set of integrity constraints for the global system on the basis of the local ones; (iii) to compare the XML types of two systems in order to replace a system by another one that contains it; (iv) to correct a new document that is invalid with respect to the XML schema of a system, so that it can be added to that system. Experiments on synthetic and real data are discussed, showing the efficiency of our methods in many situations

Thèses en Ligne

HAL Descartes

Hal-Diderot


An integrated framework for developing generic modular reconfigurable platforms for micro manufacturing and its implementation

Author: Sun Xizhi
Publication venue: Brunel University School of Engineering and Design PhD Theses
Publication date: 01/01/2009

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The continuing trends of miniaturisation, mass customisation, globalisation and wide use of the Internet have great impacts upon manufacturing in the 21st century. Micro manufacturing will play an increasingly important role in bridging the gap between traditional precision manufacturing and emerging technologies like MEMS/NEMS. The key requirements for micro manufacturing in this context are hybrid manufacturing capability, modularity, reconfigurability, adaptability and energy/resource efficiency. The existing design approaches tend to have narrow scope and are largely limited to individual manufacturing processes and applications. The above requirements demand a fundamentally new approach to the future applications of micro manufacturing so as to obtain producibility, predictability and productivity covering the full process chains and value chains. A novel generic modular reconfigurable platform (GMRP) is proposed in such a context. The proposed GMRP is able to offer hybrid manufacturing capabilities, modularity, reconfigurability and adaptivity as both an individual machine tool and a micro manufacturing system, and provides a cost effective solution to high value micro manufacturing in an agile, responsive and mass customisation manner. An integrated framework has been developed to assist the design of GMRPs due to their complexity. The framework incorporates a theoretical GMRP model, a design support system and extension interfaces. The GMRP model covers various relevant micro manufacturing processes and machine tool elements. The design support system includes a user-friendly interface, a design engine for design process and design evaluation, together with a scalable design knowledge base and database. The functionalities of the framework can also be extended through the design support system interface, the GMRP interface and the application interface, i.e. by linking to external hardware and/or software modules.
The design support system provides a number of tools for the analysis and evaluation of the design solutions. The kinematic simulation of machine tools can be performed using the Virtual Reality toolbox in Matlab. A module has also been developed for the multiscale modelling, simulation and results analysis in Matlab. A number of different cutting parameters can be studied and the machining performance can be subsequently evaluated using this module. The mathematical models for a non-traditional micro manufacturing process, micro EDM, have been developed with the simulation performed using FEA. Various design theories and methodologies have been studied, and the axiomatic design theory has been selected because of its great power and simplicity. It has been applied in the conceptual design of GMRP and its design support system. The implementation of the design support system is carried out using Matlab, Java and XML technologies. The proposed GMRP and framework have been evaluated through case studies and experimental results

Brunel University Research Archive