Search CORE

20,082 research outputs found

Tree model guided candidate generation for mining frequent subtrees from XML

Author: Abe K.
Agrawal R.
Chi Y.
Elizabeth Chang
Fedja Hadzic
Feng L.
Ghoting A.
Henry Tan
Ling Feng
Nijssen S.
Sidhu A. S.
Suciu D.
Tan H.
Tan H.
Tan H.
Termier A.
Tharam S. Dillon
Wang C.
Xiao Y.
Yan X.
Yang L. H.
Zhang J.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2008
Field of study

Due to the inherent flexibilities in both structure and semantics, XML association rules mining faces few challenges, such as: a more complicated hierarchical data structure and ordered data context. Mining frequent patterns from XML documents can be recast as mining frequent tree structures from a database of XML documents. In this study, we model a database of XML documents as a database of rooted labeled ordered subtrees. In particular, we are mainly coneerned with mining frequent induced and embedded ordered subtrees. Our main contributions arc as follows. We describe our unique embedding list representation of the tree structure, which enables efficient implementation ofour Tree Model Guided (TMG) candidate generation. TMG is an optimal, non-redundant enumeration strategy which enumerates all the valid candidates that conform to the structural aspects of the data. We show through a mathematical model and experiments that TMG has better complexity compared to the commonly used join approach. In this paper, we propose two algorithms, MB3Miner and iMB3-Miner. MB3-Miner mines embedded subtrees. iMB3-Miner mines induced and/or embedded subtrees by using the maximum level of embedding constraint. Our experiments with both synthetic and real datasets against two well known algorithms for mining induced and embedded subtrees, demonstrate the effeetiveness and the efficiency of the proposed techniques

Crossref

espace@Curtin

An Effective XML Keyword Search with User Search Intention over XML Documents

Author: Pradeep Kumar Reddy Gade Mr.
Publication venue: Global Journals Inc. (US)
Publication date: 30/08/2011
Field of study

The extreme success of web search engines makes keyword search the most popular search model for ordinary users. Keyword search on XML is a user friendly way to query XML databases since it allows users to pose queries without the knowledge of complex query languages and the database schema. The three main challenges faces in XML keyword search: 1) Identify the user search intention, i.e., identify the XML node types that users want to search for and search via. 2) Resolve keyword ambiguity problems: a keyword can appear as both a tag name and a text value of some node; a keyword can appear as the text values of different XML node types and carry different meanings; a keyword can appear as the tag name of different XML node types with different meanings. 3) As the search results are sub trees of the XML documents, new scoring function is needed to estimate its relevance to a given query. However, existing methods cannot resolve these challenges, thus return low result quality in term of query relevance. In this paper, we propose an IR-style approach which basically utilizes the statistics of underlying XML data to address these challenges. We first propose specific guidelines that a search engine should meet in both search intention identification and relevance oriented ranking for search results over XML documents. Then, based on these guidelines, we design novel formulae to identify the search for nodes and search via nodes of a query, and present a novel XML TF*IDF ranking strategy to rank the individual matches of all possible search intentions over XML documents

Global Journal of Computer Science and Technology (GJCST)

Description and interaction of RESTful services for automatic discovery and execution

Author: De Roo Jos
Steiner Thomas
Vallés Joaquim Gabarro
Van de Walle Rik
Van Deursen Davy
Verborgh Ruben
Publication venue: Future Technology Research Association International (FTRA)
Publication date: 01/01/2011
Field of study

Ghent University Academic Bibliography

The Topology ToolKit

Author: Favelier Guillaume
Gueunet Charles
Levine Joshua A.
Michaux Michael
Tierny Julien
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018
Field of study

This system paper presents the Topology ToolKit (TTK), a software platform designed for topological data analysis in scientific visualization. TTK provides a unified, generic, efficient, and robust implementation of key algorithms for the topological analysis of scalar data, including: critical points, integral lines, persistence diagrams, persistence curves, merge trees, contour trees, Morse-Smale complexes, fiber surfaces, continuous scatterplots, Jacobi sets, Reeb spaces, and more. TTK is easily accessible to end users due to a tight integration with ParaView. It is also easily accessible to developers through a variety of bindings (Python, VTK/C++) for fast prototyping or through direct, dependence-free, C++, to ease integration into pre-existing complex systems. While developing TTK, we faced several algorithmic and software engineering challenges, which we document in this paper. In particular, we present an algorithm for the construction of a discrete gradient that complies to the critical points extracted in the piecewise-linear setting. This algorithm guarantees a combinatorial consistency across the topological abstractions supported by TTK, and importantly, a unified implementation of topological data simplification for multi-scale exploration and analysis. We also present a cached triangulation data structure, that supports time efficient and generic traversals, which self-adjusts its memory usage on demand for input simplicial meshes and which implicitly emulates a triangulation for regular grids with no memory overhead. Finally, we describe an original software architecture, which guarantees memory efficient and direct accesses to TTK features, while still allowing for researchers powerful and easy bindings and extensions. TTK is open source (BSD license) and its code, online documentation and video tutorials are available on TTK's website

arXiv.org e-Print Archive

The University of Arizona

Towards structured sharing of raw and derived neuroimaging data across existing resources

Author: Ashish N.
Burns G. A.
Gadde S.
Ghosh S. S.
Helmer K.
Keator D. B.
Nichols B. N.
Steffener J.
Turner J. A.
Van Erp T. G. M.
Publication venue
Publication date: 06/03/2013
Field of study

Data sharing efforts increasingly contribute to the acceleration of scientific discovery. Neuroimaging data is accumulating in distributed domain-specific databases and there is currently no integrated access mechanism nor an accepted format for the critically important meta-data that is necessary for making use of the combined, available neuroimaging data. In this manuscript, we present work from the Derived Data Working Group, an open-access group sponsored by the Biomedical Informatics Research Network (BIRN) and the International Neuroimaging Coordinating Facility (INCF) focused on practical tools for distributed access to neuroimaging data. The working group develops models and tools facilitating the structured interchange of neuroimaging meta-data and is making progress towards a unified set of tools for such data and meta-data exchange. We report on the key components required for integrated access to raw and derived neuroimaging data as well as associated meta-data and provenance across neuroimaging resources. The components include (1) a structured terminology that provides semantic context to data, (2) a formal data model for neuroimaging with robust tracking of data provenance, (3) a web service-based application programming interface (API) that provides a consistent mechanism to access and query the data model, and (4) a provenance library that can be used for the extraction of provenance data by image analysts and imaging software developers. We believe that the framework and set of tools outlined in this manuscript have great potential for solving many of the issues the neuroimaging community faces when sharing raw and derived neuroimaging data across the various existing database systems for the purpose of accelerating scientific discovery

arXiv.org e-Print Archive

PubMed Central

eScholarship - University of California

Supporting text mining for e-Science: the challenges for Grid-enabled natural language processing

Author: Carroll John
Evans Roger
Klein Ewan
Publication venue
Publication date: 01/01/2005
Field of study

Over the last few years, language technology has moved rapidly from 'applied research' to 'engineering', and from small-scale to large-scale engineering. Applications such as advanced text mining systems are feasible, but very resource-intensive, while research seeking to address the underlying language processing questions faces very real practical and methodological limitations. The e-Science vision, and the creation of the e-Science Grid, promises the level of integrated large-scale technological support required to sustain this important and successful new technology area. In this paper, we discuss the foundations for the deployment of text mining and other language technology on the Grid - the protocols and tools required to build distributed large-scale language technology systems, meeting the needs of users, application builders and researchers

CiteSeerX

University of Brighton Research Portal

Sussex Research Online

Information extraction from multimedia web documents: an open-source platform and testbed

Author: Blanco Roi
Boato Giulia
Costanzo Andrea
Demidova Elena
Dupplaw David
Fontani Marco
Griffiths Thomas
Hare Jonathon
Johansson Richard
Lewis Paul H.
Matthews Michael
Minack Enrico
Moschitti Alessandro
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/06/2014
Field of study

The LivingKnowledge project aimed to enhance the current state of the art in search, retrieval and knowledge management on the web by advancing the use of sentiment and opinion analysis within multimedia applications. To achieve this aim, a diverse set of novel and complementary analysis techniques have been integrated into a single, but extensible software platform on which such applications can be built. The platform combines state-of-the-art techniques for extracting facts, opinions and sentiment from multimedia documents, and unlike earlier platforms, it exploits both visual and textual techniques to support multimedia information retrieval. Foreseeing the usefulness of this software in the wider community, the platform has been made generally available as an open-source project. This paper describes the platform design, gives an overview of the analysis algorithms integrated into the system and describes two applications that utilise the system for multimedia information retrieval

Southampton (e-Prints Soton)

Linked education: interlinking educational resources and the web of data

Author: Dietze Stefan
Dovrolis Nikolas
Giordano Daniela
Kaldoudi Eleni
Taibi Davide
Yu Hong Qing
Publication venue
Publication date: 01/01/2012
Field of study

Research on interoperability of technology-enhanced learning (TEL) repositories throughout the last decade has led to a fragmented landscape of competing approaches, such as metadata schemas and interface mechanisms. However, so far Web-scale integration of resources is not facilitated, mainly due to the lack of take-up of shared principles, datasets and schemas. On the other hand, the Linked Data approach has emerged as the de-facto standard for sharing data on the Web and offers a large potential to solve interoperability issues in the field of TEL. In this paper, we describe a general approach to exploit the wealth of already existing TEL data on the Web by allowing its exposure as Linked Data and by taking into account automated enrichment and interlinking techniques to provide rich and well-interlinked data for the educational domain. This approach has been implemented in the context of the mEducator project where data from a number of open TEL data repositories has been integrated, exposed and enriched by following Linked Data principles

CiteSeerX

Open Research Online (The Open University)

Investigation into Indexing XML Data Techniques

Author: Joan Lu
Klaib Alhadi
Publication venue
Publication date: 21/07/2014
Field of study

The rapid development of XML technology improves the WWW, since the XML data has many advantages and has become a common technology for transferring data cross the internet. Therefore, the objective of this research is to investigate and study the XML indexing techniques in terms of their structures. The main goal of this investigation is to identify the main limitations of these techniques and any other open issues. Furthermore, this research considers most common XML indexing techniques and performs a comparison between them. Subsequently, this work makes an argument to find out these limitations. To conclude, the main problem of all the XML indexing techniques is the trade-off between the size and the efficiency of the indexes. So, all the indexes become large in order to perform well, and none of them is suitable for all users’ requirements. However, each one of these techniques has some advantages in somehow

University of Huddersfield Repository