The Landscape of Rights and Licensing Initiatives for Data Sharing
Over the last twenty years, a wide variety of resources have been developed to address the rights and licensing problems inherent in contemporary data sharing practices. The landscape of developments in this area is increasingly confusing and difficult to navigate, due to the complexity of intellectual property and ethics issues associated with sharing sensitive data. This paper seeks to address this challenge, examining the landscape and presenting a Version 1.0 directory of resources. A multi-method study was pursued, with an environmental scan examining 20 resources, resulting in three high-level categories: standards, tools, and community initiatives; and a content analysis revealing the subcategories of rights, licensing, metadata & ontologies. A timeline confirms a shift in licensing standardization priorities over time, from open data to more nuanced and technologically robust solutions that accommodate more sensitive data types. This paper reports on the research undertaking, and comments on the potential for using license-specific metadata supplements and developing data-centric rights and licensing ontologies.
DARSI: An Ontology for Facilitating the Development of Data Sharing and Use Agreements
The advantages of data sharing across organizations and disciplines are indisputable; however, sensitive and restricted data cannot be easily shared due to policies and legal matters. The research presented in this paper takes a step toward systematizing the sharing of sensitive and restricted research data by developing an ontology to frame and guide DSUA (Data Sharing and Usage Agreement) development. The paper provides background context, describes the ontology creation process, and introduces the Data Sharing Agreements for Restricted and Sensitive Information (DARSI) ontology. DARSI contains four top-level classes, 20 sub-classes, 33 sub-categories and 17 simple properties for categories applicable at various levels. The discussion provides further insight into the work accomplished, and the conclusion identifies next steps.
Representing Aboutness: Automatically Indexing 19th-Century Encyclopedia Britannica Entries
Representing aboutness is a challenge for humanities documents, given the linguistic indeterminacy of the text. The challenge is even greater when applying automatic indexing to historical documents for a multidisciplinary collection, such as encyclopedias. The research presented in this paper explores this challenge with an automatic indexing comparative study examining topic relevance. The setting is the NEH-funded 19th-Century Knowledge Project, where researchers in the Digital Scholarship Center, Temple University, and the Metadata Research Center, Drexel University, are investigating the best way to index entries across four historical editions of the Encyclopedia Britannica (3rd, 7th, 9th, and 11th editions). Individual encyclopedia entries were processed using the Helping Interdisciplinary Vocabulary Engineering (HIVE) system, a linked-data, automatic indexing terminology application that uses controlled vocabularies. Comparative topic relevance evaluation was performed for three separate keyword extraction algorithms: RAKE, Maui, and Kea++. Results show that RAKE performed the best, with an average of 67% precision for RAKE, and 28% precision for both Maui and Kea++. Additionally, the highest-ranked HIVE results with both RAKE and Kea++ demonstrated relevance across all sample entries, while Maui’s highest-ranked results returned zero relevant terms. This paper reports on background information, research objectives and methods, results, and future research prospects for further optimization of RAKE’s algorithm parameters to accommodate encyclopedia entries of different lengths, and for evaluating the indexing impact of correcting the historical Long S.
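To make the precision figures above concrete, the following is a minimal sketch of how per-entry precision could be computed, assuming precision means the share of suggested terms that a reviewer judged relevant. The function name, the sample terms, and the relevance judgments are all hypothetical illustrations; they are not taken from the study or from the HIVE system.

```python
def precision(suggested, relevant):
    """Return the fraction of suggested index terms judged relevant.

    `suggested` is the list of terms an algorithm returned for one entry;
    `relevant` is the set of those terms a reviewer accepted (hypothetical data).
    """
    if not suggested:
        return 0.0  # no suggestions: treat precision as zero by convention
    hits = sum(1 for term in suggested if term in relevant)
    return hits / len(suggested)


# Hypothetical example for a single encyclopedia entry.
suggested_terms = ["astronomy", "planets", "telescopes"]
judged_relevant = {"astronomy", "planets"}
score = precision(suggested_terms, judged_relevant)
print(f"precision = {score:.2f}")
```

Averaging this value over all sample entries would give a per-algorithm figure comparable in form to the averages reported above, under the stated assumption about how relevance was judged.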
Evaluating the Impact of the Long-S upon 18th-Century Encyclopedia Britannica Automatic Subject Metadata Generation Results
This research compares automatic subject metadata generation before and after the pre-1800s Long-S character is corrected to a standard <s>. The test environment includes entries from the third edition of the Encyclopedia Britannica, and the HIVE automatic subject indexing tool. A comparative study of metadata generated before and after correction of the Long-S demonstrated an average of 26.51 percent potentially relevant terms per entry omitted from results if the Long-S is not corrected. Results confirm that correcting the Long-S increases the availability of terms that can be used for creating quality metadata records. A relationship is also demonstrated between shorter entries and an increase in omitted terms when the Long-S is not corrected.
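As a rough illustration of the correction step described above, the sketch below replaces the Long-S with a standard s and estimates the share of terms that would be omitted without correction. It assumes the transcribed text encodes the Long-S as the Unicode character U+017F; the function names, sample text, and term lists are hypothetical, and the study's exact way of computing its percentage is not stated in the abstract.

```python
LONG_S = "\u017f"  # Unicode LATIN SMALL LETTER LONG S (assumed encoding)


def correct_long_s(text):
    """Replace every Long-S character with a standard lowercase s."""
    return text.replace(LONG_S, "s")


def omitted_share(terms_before, terms_after):
    """Percent of terms found after correction that were missing before it.

    This is one plausible reading of "omitted terms"; it is an assumption,
    not the study's documented formula.
    """
    after = set(terms_after)
    if not after:
        return 0.0
    missing = after - set(terms_before)
    return 100 * len(missing) / len(after)


# Hypothetical sample text and indexing results for one entry.
sample = "The \u017fcience of mu\u017fic"
corrected = correct_long_s(sample)
share = omitted_share(["sound"], ["sound", "science", "music", "song"])
print(corrected, share)
```

In OCR output the Long-S is often misread as the letter f instead, which a simple character replacement like this would not catch; handling that case would need a dictionary-based check.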