    A Progressive Clustering Algorithm to Group the XML Data by Structural and Semantic Similarity

    Since XML became popular for data representation and exchange over the Web, the number of XML documents in circulation has increased rapidly. It has become a challenge for researchers to turn these documents into a more useful information utility. In this paper, we introduce a novel clustering algorithm, PCXSS, that groups heterogeneous XML documents according to their similar structural and semantic representations. We develop a global criterion function, CPSim, that progressively measures the similarity between an XML document and existing clusters, removing the need to compute the similarity between two individual documents. The experimental analysis shows the method to be fast and accurate.
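
    The document-to-cluster comparison described in this abstract can be illustrated with a minimal sketch. It is not PCXSS or CPSim: it assumes each document is reduced to a set of root-to-leaf path strings, and it swaps in a plain Jaccard overlap as the similarity. The names `progressive_cluster` and `threshold` are hypothetical.

```python
def progressive_cluster(docs, threshold=0.5):
    """Assign each document to the most similar existing cluster, one pass.

    docs: iterable of (name, set_of_path_strings). Assumption: a document
    is summarised by its root-to-leaf paths. Jaccard overlap stands in for
    the paper's CPSim, which is not reproduced here.
    """
    clusters = []  # each cluster: {"paths": set, "members": list}
    for name, paths in docs:
        best, best_sim = None, 0.0
        for cluster in clusters:
            union = paths | cluster["paths"]
            sim = len(paths & cluster["paths"]) / len(union) if union else 0.0
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            # Join the cluster and fold the document's paths into its summary.
            best["members"].append(name)
            best["paths"] |= paths
        else:
            # No cluster is similar enough: start a new one.
            clusters.append({"paths": set(paths), "members": [name]})
    return clusters
```

    Each incoming document is compared only against cluster summaries, so no pairwise document-to-document similarity is ever computed.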

    Integration of BPM systems

    New technologies have emerged to support the global economy, where, for instance, suppliers, manufacturers and retailers work together in order to minimise cost and maximise efficiency. One of the technologies that has become a buzzword for many businesses is business process management, or BPM. A business process comprises activities and tasks, the resources required to perform each task, and the business rules linking these activities and tasks. The tasks may be performed by human and/or machine actors. Workflow provides a way of describing the order of execution and the dependency relationships between the constituent activities of short- or long-running processes. Workflow allows businesses to capture not only the information but also the processes that transform the information: the process asset (Koulopoulos, T. M., 1995). Applications which involve automated, human-centric and collaborative processes across organisations are inherently different from one organisation to another. Even within the same organisation, applications are adapted over time, as ongoing change to business processes is seen as the norm in today’s dynamic business environment. The major difference lies in the specifics of business processes, which are changing rapidly in order to match the way in which businesses operate. In this chapter we introduce and discuss Business Process Management (BPM) with a focus on the integration of heterogeneous BPM systems across multiple organisations. We identify the problems and the main challenges, not only with regard to technologies but also in the social and cultural context. We also discuss the issues that have arisen in our bid to find solutions.

    Term-Specific Eigenvector-Centrality in Multi-Relation Networks

    Fuzzy matching and ranking are two information retrieval techniques widely used in web search. Their application to structured data, however, remains an open problem. This article investigates how eigenvector-centrality can be used for approximate matching in multi-relation graphs, that is, graphs where connections of many different types may exist. Based on an extension of the PageRank matrix, eigenvectors representing the distribution of a term after propagating term weights between related data items are computed. The result is an index which takes the document structure into account and can be used with standard document retrieval techniques. As the scheme takes the shape of an index transformation, all necessary calculations are performed during index time.
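
    The idea of propagating a term's weights over typed links can be sketched as a personalised PageRank-style power iteration. This is an illustrative assumption, not the article's matrix extension: the function name, the per-relation weights, the damping factor and the handling of dangling nodes are all hypothetical choices.

```python
def propagate_term_weights(edges, relation_weights, term_weights,
                           damping=0.85, iterations=50):
    """Spread one term's weight over a multi-relation graph.

    edges: list of (source, target, relation_type) triples.
    relation_weights: relation_type -> non-negative strength (assumed input).
    term_weights: item -> initial weight of the term in that item.
    Returns item -> propagated weight; the values sum to 1.
    """
    nodes = sorted({n for s, d, _ in edges for n in (s, d)} | set(term_weights))
    total = sum(term_weights.values())
    if total <= 0:
        raise ValueError("term_weights must contain some positive weight")
    # The normalised initial term distribution acts as the teleport vector.
    base = {n: term_weights.get(n, 0.0) / total for n in nodes}
    out = {n: [] for n in nodes}
    for src, dst, rel in edges:
        weight = relation_weights.get(rel, 0.0)
        if weight > 0:
            out[src].append((dst, weight))
    score = dict(base)
    for _ in range(iterations):
        nxt = {n: (1 - damping) * base[n] for n in nodes}
        for src in nodes:
            out_total = sum(w for _, w in out[src])
            if out_total > 0:
                # Split the mass over outgoing links by relation strength.
                for dst, w in out[src]:
                    nxt[dst] += damping * score[src] * w / out_total
            else:
                # Dangling item: return its mass to the initial distribution.
                for n in nodes:
                    nxt[n] += damping * score[src] * base[n]
        score = nxt
    return score
```

    Running this once per term at indexing time would give each item a weight for terms it does not literally contain, which an ordinary retrieval index can then store.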

    XML in Motion from Genome to Drug

    Information technology (IT) has emerged as central to the solution of contemporary genomics and drug discovery problems. Researchers involved in genomics, proteomics, transcriptional profiling, high-throughput structure determination, and other sub-disciplines of bioinformatics have a direct impact on this IT revolution. As the full genome sequences of many species and data from structural genomics, micro-arrays, and proteomics have become available, integrating these data into a common platform requires sophisticated bioinformatics tools. Organizing these data into knowledge bases and developing appropriate software tools for analyzing them are going to be major challenges. XML (eXtensible Markup Language) forms the backbone of biological data representation and exchange over the internet, enabling researchers to aggregate data from various heterogeneous data resources. The present article gives a comprehensive overview of the integration of XML with a particular type of biological database, mainly those dealing with the sequence-structure-function relationship, and its application to drug discovery. This e-medical science approach should be applied to other scientific domains, and the latest trend in semantic web applications is also highlighted.

    Data integration through service-based mediation for web-enabled information systems

    The Web and its underlying platform technologies have often been used to integrate existing software and information systems. Traditional techniques for data representation and transformations between documents are not sufficient to support a flexible and maintainable data integration solution that meets the requirements of modern complex Web-enabled software and information systems. The difficulty arises from the high degree of complexity of data structures, for example in business and technology applications, and from the constant change of data and its representation. In the Web context, where the Web platform is used to integrate different organisations or software systems, the problem of heterogeneity also arises. We introduce a specific data integration solution for Web applications such as Web-enabled information systems. Our contribution is an integration technology framework for Web-enabled information systems comprising, firstly, a data integration technique based on the declarative specification of transformation rules and the construction of connectors that handle the integration and, secondly, a mediator architecture based on information services and the constructed connectors to handle the integration process.

    Forum Session at the First International Conference on Service Oriented Computing (ICSOC03)

    The First International Conference on Service Oriented Computing (ICSOC) was held in Trento, December 15-18, 2003. The focus of the conference, Service Oriented Computing (SOC), is the emerging paradigm for distributed computing and e-business processing that has evolved from object-oriented and component computing to enable building agile networks of collaborating business applications distributed within and across organizational boundaries. Of the 181 papers submitted to the ICSOC conference, 10 were selected for the forum session, which took place on December 16, 2003. The papers were chosen based on their technical quality, originality, relevance to SOC, and suitability for a poster presentation or a demonstration. This technical report contains the 10 papers presented during the forum session at the ICSOC conference. In particular, the last two papers in the report were submitted as industrial papers.

    DCU and ISI@INEX 2010: Ad-hoc and data-centric tracks

    We describe the participation of Dublin City University (DCU) and the Indian Statistical Institute (ISI) in INEX 2010. The main contributions of this paper are: i) a simplified version of the Hierarchical Language Model (HLM), which scores XML elements with a combined probability of generating the given query from the element itself and from the top-level article node, is shown to outperform the baselines of Language Model (LM) and Vector Space Model (VSM) scoring of XML elements; ii) Expectation Maximization (EM) feedback in LM is shown to be the most effective on the domain-specific collection of IMDB; iii) automated removal of sentences indicating aspects of irrelevance from the narratives of INEX ad-hoc topics is shown to improve retrieval effectiveness.
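
    The combined element-and-article scoring in contribution i) can be sketched as a two-level mixture of unigram language models. This is a hedged illustration, not the authors' exact formulation: the function name `hlm_score`, the mixing weight `lam` and the small `floor` that stands in for collection smoothing are all assumptions.

```python
import math
from collections import Counter


def hlm_score(query, element_tokens, article_tokens, lam=0.6, floor=1e-9):
    """Log-probability of the query under an element/article mixture.

    lam weights the element's own unigram model against the model of the
    top-level article. floor avoids log(0) for unseen terms; it is a
    placeholder for proper collection smoothing.
    """
    elem_counts = Counter(element_tokens)
    art_counts = Counter(article_tokens)
    score = 0.0
    for term in query:
        p_elem = elem_counts[term] / len(element_tokens) if element_tokens else 0.0
        p_art = art_counts[term] / len(article_tokens) if article_tokens else 0.0
        score += math.log(lam * p_elem + (1 - lam) * p_art + floor)
    return score
```

    Elements of the same article share the article-level term, so the ranking among them is driven by how well each element itself matches the query.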