Search CORE

8,872 research outputs found

Automatic Metadata Generation using Associative Networks

Author: de Lin S.
Han H.
Herbert Van De Sompel
Johan Bollen
Mao S.
Marko A. Rodriguez
Rorvig M.
Yang H.-C.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 06/03/2009
Field of study

In spite of its tremendous value, metadata is generally sparse and incomplete, thereby hampering the effectiveness of digital information services. Many of the existing mechanisms for the automated creation of metadata rely primarily on content analysis which can be costly and inefficient. The automatic metadata generation system proposed in this article leverages resource relationships generated from existing metadata as a medium for propagation from metadata-rich to metadata-poor resources. Because of its independence from content analysis, it can be applied to a wide variety of resource media types and is shown to be computationally inexpensive. The proposed method operates through two distinct phases. Occurrence and co-occurrence algorithms first generate an associative network of repository resources leveraging existing repository metadata. Second, using the associative network as a substrate, metadata associated with metadata-rich resources is propagated to metadata-poor resources by means of a discrete-form spreading activation algorithm. This article discusses the general framework for building associative networks, an algorithm for disseminating metadata through such networks, and the results of an experiment and validation of the proposed method using a standard bibliographic dataset

arXiv.org e-Print Archive

Crossref

An overview of the planned CCAT software system

Author: Brazier Adam
Edwards Kevin
Higgins Ronan
Jenness Tim
Marsden Gaelen
Nikola Thomas
Ossenkopf Volker
Sayers Jack
Schaaf Reinhold
Shepherd Martin C.
Publication venue: 'SPIE-Intl Soc Optical Eng'
Publication date: 05/06/2014
Field of study

CCAT will be a 25m diameter sub-millimeter telescope capable of operating in the 0.2 to 2.1mm wavelength range. It will be located at an altitude of 5600m on Cerro Chajnantor in northern Chile near the ALMA site. The anticipated first generation instruments include large format (60,000 pixel) kinetic inductance detector (KID) cameras, a large format heterodyne array and a direct detection multi-object spectrometer. The paper describes the architecture of the CCAT software and the development strategy.Comment: 17 pages, 6 figures, to appear in Software and Cyberinfrastructure for Astronomy III, Chiozzi & Radziwill (eds), Proc. SPIE 9152, paper ID 9152-10

arXiv.org e-Print Archive

Caltech Authors

Data DNA: The Next Generation of Statistical Metadata

Author: Cynthia M. Taeuber
Daniel W. Gillman
Laura Smith
Publication venue: 'Brookings Institution Press'
Publication date: 03/03/2007
Field of study

Describes the components of a complete statistical metadata system and suggests ways to create and structure metadata for better access and understanding of data sets by diverse users

IssueLab

Extending a geo-catalogue with matching capabilities

Author: Dutta Biswanath
Farazi Feroz
Ivanyukovich Alexander
Maltese Vincenzo
Publication venue
Publication date: 01/05/2011
Field of study

To achieve semantic interoperability, geo-spatial applications need to be equipped with tools able to understand user terminology that is typically different from the one enforced by standards. In this paper we summarize our experience in providing a semantic extension to the geo-catalogue of the Autonomous Province of Trento (PAT) in Italy. The semantic extension is based on the adoption of the S-Match semantic matching tool and on the use of a specifically designed faceted ontology codifying domain specific knowledge. We also briefly report our experience in the integration of the ontology with the geo-spatial ontology GeoWordNet

Unitn-eprints Research

CC-interop : COPAC/Clumps Continuing Technical Cooperation. Final Project Report

Author: Dunsire Gordon
Gilby John
JISC (Joint Information Systems Committee) (Funder)
Publication venue: JISC
Publication date: 01/01/2004
Field of study

As far as is known, CC-interop was the first project of its kind anywhere in the world and still is. Its basic aim was to test the feasibility of cross-searching between physical and virtual union catalogues, using COPAC and the three functioning "clumps" or virtual union catalogues (CAIRNS, InforM25, and RIDING), all funded or part-funded by JISC in recent years. The key issues investigated were technical interoperability of catalogues, use of collection level descriptions to search union catalogues dynamically, quality of standards in cataloguing and indexing practices, and usability of union catalogues for real users. The conclusions of the project were expected to, and indeed do, contribute to the development of the JISC Information Environment and to the ongoing debate as to the feasibility and desirability of creating a national UK catalogue. They also inhabit the territory of collection level descriptions (CLDs) and the wider services of JISC's Information Environment Services Registry (IESR). The results of this project will also have applicability for the common information environment, particularly through the landscaping work done via SCONE/CAIRNS. This work is relevant not just to HE and not just to digital materials, but encompasses other sectors and domains and caters for print resources as well. Key findings are thematically grouped as follows: System performance when inter-linking COPAC and the Z39.50 clumps. The various individual Z39.50 configurations permit technical interoperability relatively easily but only limited semantic interoperability is possible. Disparate cataloguing and indexing practices are an impairment to semantic interoperability, not just for catalogues but also for CLDs and descriptions of services (like those constituting JISC's IESR). Creating dynamic landscaping through CLDs: routines can be written to allow collection description databases to be output in formats that other UK users of CLDs, including developers of the JISC information environment. Searching a distributed (virtual) catalogue or clump via Z39.50: use of Z39.50 to Z39.50 middleware permits a distributed catalogue to be searched via Z39.50 from such disparate user services as another virtual union catalogue or clump, a physical union catalogue like COPAC, an individual Z client and other IE services. The breakthrough in this Z39.50 to Z39.50 conundrum came with the discovery that the JISC-funded JAFER software (a result of the 5/99 programme) meets many of the requirements and can be used by the current clumps services. It is technically possible for the user to select all or a sub-set of available end destination Z39.50 servers (we call this "landscaping") within this middleware. Comparing results processing between COPAC and clumps. Most distributed services (clumps) do not bring back complete results sets from associated Z servers (in order to save time for users). COPAC on-the-fly routines could feasibly be applied to the clumps services. An automated search set up to repeat its query of 17 catalogues in a clump (InforM25) hourly over nearly 3 months returned surprisingly good results; for example, over 90% of responses were received in less than one second, and no servers showed slower response times in periods of traditionally heavy OPAC use (mid-morning to early evening). User behaviour when cross-searching catalogues: the importance to users of a number of on-screen features, including the ability to refine a search and clear indication that a search is processing. The importance to users of information about the availability of an item as well as the holdings data. The impact of search tools such as Google and Amazon on user behaviour and the expectations of more information than is normally available from a library catalogue. The distrust of some librarians interviewed of the data sources in virtual union catalogues, thinking that there was not true interoperability

University of Strathclyde Institutional Repository

Towards Exascale Scientific Metadata Management

Author: Blanas Spyros
Byna Surendra
Publication venue
Publication date: 29/03/2015
Field of study

Advances in technology and computing hardware are enabling scientists from all areas of science to produce massive amounts of data using large-scale simulations or observational facilities. In this era of data deluge, effective coordination between the data production and the analysis phases hinges on the availability of metadata that describe the scientific datasets. Existing workflow engines have been capturing a limited form of metadata to provide provenance information about the identity and lineage of the data. However, much of the data produced by simulations, experiments, and analyses still need to be annotated manually in an ad hoc manner by domain scientists. Systematic and transparent acquisition of rich metadata becomes a crucial prerequisite to sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and domain-agnostic metadata management infrastructure that can meet the demands of extreme-scale science is notable by its absence. To address this gap in scientific data management research and practice, we present our vision for an integrated approach that (1) automatically captures and manipulates information-rich metadata while the data is being produced or analyzed and (2) stores metadata within each dataset to permeate metadata-oblivious processes and to query metadata through established and standardized data access interfaces. We motivate the need for the proposed integrated approach using applications from plasma physics, climate modeling and neuroscience, and then discuss research challenges and possible solutions

arXiv.org e-Print Archive

eScholarship - University of California

Ontologies on the semantic web

Author: Ashburner
Berners-Lee
Berners-Lee
Bollobas
Borgida
Brachman
Brachman
Brooks
Buchanan
Burton-Jones
Bush
Cayzer
Chisholm
Copeland
Cost
Cruse
De Bruijn
Decker
Fensel
Fensel
Frege
Genesereth
Goble
Gruber
Gruber
Guha
Harré
Heery
Heflin
Hendler
Hendler
Horrocks
Horrocks
Kant
Kirk
Klein
Legg
Lenat
Lenat
Lenat
Lenat
Lindsay
Lowe
Lowe
Maedche
McCool
McGuinness
McIlraith
Minsky
Noy
Noy
Pease
Peirce
Peirce
Quillian
Quine
Rorty
Rozenberg
Schlick
Sicilia
Smith
Smith
Smith
Sowa
Sowa
Sowa
Weinberger
Weiss
Zalta
Publication venue: 'Wiley'
Publication date: 01/01/2007
Field of study

As an informational technology, the World Wide Web has enjoyed spectacular success. In just ten years it has transformed the way information is produced, stored, and shared in arenas as diverse as shopping, family photo albums, and high-level academic research. The “Semantic Web” was touted by its developers as equally revolutionary but has not yet achieved anything like the Web’s exponential uptake. This 17 000 word survey article explores why this might be so, from a perspective that bridges both philosophy and IT

Deakin Research Online

Crossref

Research Commons@Waikato