100 research outputs found
Report of the DAMENames Ad Hoc Committee
In early 2018, the DAMEid group requested that the Cataloging and Metadata unit examine the metadata needs for the DAME. Analysis of metadata needs in both OAKTrust and Fedora made clear that the lack of name authority control was causing serious problems for users, especially when a single author has many entries in the author index. For example, Steven M. Wright, Royce E. Wisenbaker Professor II in Chemical and Electrical Engineering, has 10 different entries for his name. This problem is caused by the lack of authority control and the inconsistent ways in which names are entered into Vireo and OAKTrust. In their report to the DAMEid committee, the Metadata and Cataloging librarians strongly suggested that some type of name authority control be implemented within the DAME.
In smaller repositories with few names and fewer entities (e.g., persons, organizations, subjects), the absence of explicit disambiguation or authority control can be a manageable problem. When only a few authors share a name, it is easy to tell them apart based on the subject matter of the works attached to the name. The problem compounds as collections grow and the number of same-named entities that must be distinguished from each other increases. For example, in the large OAKTrust IR, it is hard for a user to identify the "Steven Wright" that he or she is looking for, as there are several authors so named with dozens of items in the IR. Another issue that emerges in a system with no authority control, such as OAKTrust, is that an everyday typographical error (an extra space, a missing period after an initial, a misspelling, etc.) results in a new entry in the author list. This produces multiple name entries for one person, and it means that a user cannot easily identify all the works attributed to one author.
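The typographical variants described above (extra spaces, missing periods after initials, inconsistent capitalization) can be collapsed with a simple comparison key. The following is a minimal sketch only, not the method used in OAKTrust, Vireo, or the committee's report; the function names `name_key` and `group_variants` are hypothetical. It does not catch misspellings, and it cannot tell apart different people who share a name, which is the part that genuinely requires authority control.

```python
import re
import unicodedata
from collections import defaultdict


def name_key(name: str) -> str:
    """Build a comparison key that ignores trivial typographical variation.

    Hypothetical helper: handles case, periods, accents, and spacing only.
    """
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Treat periods as spaces so "M." and "M" compare equal, then lowercase.
    text = text.replace(".", " ").lower()
    # Normalize spacing around commas, then collapse runs of whitespace.
    text = re.sub(r"\s*,\s*", ", ", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text


def group_variants(names):
    """Group raw author strings that share the same comparison key."""
    groups = defaultdict(list)
    for raw in names:
        groups[name_key(raw)].append(raw)
    return dict(groups)


if __name__ == "__main__":
    index = ["Wright, Steven M.", "Wright,  Steven M", "wright,steven m.", "Wright, Steven"]
    for key, variants in group_variants(index).items():
        print(key, variants)
```

Grouping by key only flags candidate duplicates for a cataloger to review; "Wright, Steven" and "Wright, Steven M." stay separate because the sketch cannot know whether they are the same person.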
Improving the visibility of the institution, researchers, and publications by introducing specific identifiers (PIDs)
In the course of 2021, the Belgian Health Care Knowledge Centre (KCE) decided to give further thought to improving the visibility of its publications. This led it to develop a project to set up three types of PIDs: one for the institution, another for researchers, and a third for the publications themselves. The purpose of this text is to retrace the various stages in the project's implementation and to share the initial findings.
D7.3 Training materials
This Deliverable gives a detailed description of the comprehensive training programme and of the open educational content that the University of Padua has completed so far for the project "Linked Heritage: Coordination of standard and technologies for the enrichment of Europeana" (CIP Best Practice Network). The final version of D7.3 will be released by the end of the project, when all the Learning Objects will be finished.
Building data into knowledge: Identifying challenges and their solutions in biodiversity informatics
Biologists are in a race to document biodiversity in the face of ailing ecosystems and species decline. The drive to create knowledge to support effective documentation, measurement, and conservation of biodiversity has led the community to quickly research and develop methods to organize and connect biodiversity data across providers and throughout the world. Biodiversity data came online through distributed and disconnected databases but over time have been shaped into a biodiversity network that now represents nearly 500 million biodiversity records. The ability to access these data has brought exciting new research and new challenges. In this thesis I discuss my work to solve some of those challenges and build innovative approaches and tools for biodiversity informatics. I start by documenting tools that help improve the quality and fitness for use of data. Then I present two tools for visualizing and analyzing data in a phylogenetic and conservation context. More importantly, I discuss how designing these tools to operate within a greater knowledge creation framework can make the work of documenting patterns and processes in biodiversity faster and more resilient to future changes and improved information. At the heart of that discussion is the idea that the outputs of the tools themselves should be published and directly linked back to the original data and forward to any future analyses. The outputs should also document all models, parameters, and heuristics used to arrive at the reported outcome. In this way, both the data and our research on that data can be woven into a connected fabric of knowledge and information that links biodiversity and the digital data stored in our databases. Finally, I discuss the possibility we have for expanding our biodiversity data and improving the research we can do with it through the use of citizen science. The data available today is still deficient. Natural history collections hold a wealth of data that has not yet been digitized, but as a community we lack the resources to unlock that data quickly without a novel solution. Citizen science offers us the ability to quickly generate historical biodiversity data from natural history collections. We present a novel platform for engaging citizen scientists and developing a shared, community-driven platform to harness the potential of citizen science.
Interval Privacy: A Framework for Privacy-Preserving Data Collection
The emerging public awareness and government regulations of data privacy
motivate new paradigms of collecting and analyzing data that are transparent and
acceptable to data owners. We present a new concept of privacy and
corresponding data formats, mechanisms, and theories for privatizing data
during data collection. The privacy notion, named Interval Privacy, requires the
conditional distribution of the raw data given the privatized data to equal its
unconditional distribution over a nontrivial support set. Correspondingly, the
proposed privacy mechanism will record each data value as a random interval
(or, more generally, a range) containing it. The proposed interval privacy
mechanisms can be easily deployed through survey-based data collection
interfaces, e.g., by asking a respondent whether their data value is within a
randomly generated range. Another unique feature of interval mechanisms is that
they obfuscate the truth but do not perturb it. Using a narrowed range to convey
information is complementary to the popular paradigm of perturbing data. Also,
the interval mechanisms can generate progressively refined information at the
discretion of individuals, naturally leading to privacy-adaptive data
collection. We develop different aspects of theory such as composition,
robustness, distribution estimation, and regression learning from
interval-valued data. Interval privacy provides a new perspective of
human-centric data privacy where individuals have a perceptible, transparent,
and simple way of sharing sensitive data.
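The idea of recording a value as a random interval containing it can be illustrated with a single random threshold. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `interval_report`, the bounded support `[low, high]`, and the uniform threshold are all hypothetical choices made for illustration. The threshold is drawn independently of the value, and the respondent only reveals which side of it the value falls on.

```python
import random


def interval_report(value, low, high, rng=random):
    """Return a random interval (a, b) with a <= value <= b.

    Assumed setup: the value lies in a known range [low, high], and the
    threshold is drawn uniformly and independently of the value.
    """
    if not low <= value <= high:
        raise ValueError("value must lie within [low, high]")
    # The collector's question: "Is your value at most `threshold`?"
    threshold = rng.uniform(low, high)
    if value <= threshold:
        return (low, threshold)   # answer "yes": value is in [low, threshold]
    return (threshold, high)      # answer "no": value is in (threshold, high]


if __name__ == "__main__":
    rng = random.Random(0)
    first = interval_report(62.0, 0.0, 100.0, rng)
    # Optional refinement, at the respondent's discretion: ask again
    # inside the interval that was already disclosed.
    second = interval_report(62.0, first[0], first[1], rng)
    print(first, second)
```

The reported interval always contains the true value, so the truth is obscured rather than perturbed, and repeating the question inside the previous interval gives the progressively refined disclosure that the abstract describes.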
Persistent Identifiers and Sharing of Digital Information About Scientific Specimens
Using persistent identifiers (PIDs) in digital data production and sharing concerning scientific specimens serves an overarching goal: allowing relationships to be created. The assignment of unique PIDs is an essential step for enabling findability and accessibility of digital data using the FAIR data model. Implementation of the digital extended specimen links the digital object record with associated and derived specimen parts and research data. Linking to atomized information such as collection event, collector, locality, collection, institutions, taxon (identification), people involved in analyzing and processing the specimen, other related specimens, and many other subsamples and derived and related data can be accomplished with a system incorporating numerous types of unique persistent ids. These many IDs need to be maintained by organizations to prevent broken links and provide redirects for older identifiers. While community development of best practice is influenced by experts in digital data architecture, it must incorporate challenges based on the history of data sharing concerning scientific specimens. The development of identifier systems and normalization around digital object structure and vocabulary needs to accommodate the needs of managers of diverse collections. Most providers are working with a collection management system and with limitations based on past decisions and limited time and finances, so data sharing practices should address these issues to encourage compliance. This paper will use a combination of reviews of the literature and of several interviews with workers in the field to explore community collaboration, persistent ids, and increased mobilization of shared data.
- …