
    Report of the Stanford Linked Data Workshop

    The Stanford University Libraries and Academic Information Resources (SULAIR), with the Council on Library and Information Resources (CLIR), conducted a week-long workshop on the prospects for a large-scale, multi-national, multi-institutional prototype of a Linked Data environment for discovery of and navigation among the rapidly, chaotically expanding array of academic information resources. As preparation for the workshop, CLIR sponsored a survey by Jerry Persons, Chief Information Architect emeritus of SULAIR, that was published originally for workshop participants as background material and is now publicly available. The original intention of the workshop was to devise a plan for such a prototype. However, such was the diversity of knowledge, experience, and views of the potential of Linked Data approaches that the workshop participants turned to two more fundamental goals: building common understanding and enthusiasm on the one hand, and identifying opportunities and challenges to be confronted in the preparation of the intended prototype and its operation on the other. In pursuit of those objectives, the workshop participants produced:
    1. a value statement addressing the question of why a Linked Data approach is worth prototyping;
    2. a manifesto for Linked Libraries (and Museums and Archives and …);
    3. an outline of the phases in a life cycle of Linked Data approaches;
    4. a prioritized list of known issues in generating, harvesting, and using Linked Data;
    5. a workflow with notes for converting library bibliographic records and other academic metadata to URIs;
    6. examples of potential “killer apps” using Linked Data; and
    7. a list of next steps and potential projects.
    This report includes a summary of the workshop agenda, a chart showing the use of Linked Data in cultural heritage venues, and short biographies and statements from each of the participants.
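    The conversion workflow in item 5 can be pictured with a small, hedged sketch: assuming a bibliographic record is already available as a plain Python dictionary, the rdflib library can mint URIs for the work and its creator and emit Linked Data triples. The namespace, identifiers, and field names below are illustrative assumptions, not the workshop's actual recommendations.

```python
# Minimal sketch: turning one bibliographic record into Linked Data triples.
# The namespace, identifiers, and field names are illustrative assumptions.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DC, FOAF, RDF

EX = Namespace("http://example.org/catalog/")   # hypothetical institutional namespace

record = {"id": "bib0001",
          "title": "Report of the Stanford Linked Data Workshop",
          "creator": "Stanford University Libraries"}

g = Graph()
work = URIRef(EX[record["id"]])                 # mint a URI for the bibliographic record
g.add((work, RDF.type, EX.BibliographicResource))
g.add((work, DC.title, Literal(record["title"])))

creator = URIRef(EX["agent/" + record["creator"].replace(" ", "_")])
g.add((creator, RDF.type, FOAF.Organization))
g.add((creator, FOAF.name, Literal(record["creator"])))
g.add((work, DC.creator, creator))              # link the work to its creator by URI, not by string

print(g.serialize(format="turtle"))
```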

    The United States Marine Corps Data Collaboration Requirements: Retrieving and Integrating Data From Multiple Databases

    The goal of this research is to develop an information sharing and database integration model and to suggest a framework that fully satisfies the United States Marine Corps' collaboration requirements as well as its information sharing and database integration needs. This research is exploratory; it focuses on a single initiative, IT-21, which sets out the technology direction for the United States Navy and Marine Corps for 2000-2035: becoming a 21st-century force. The IT-21 initiative states that Navy and Marine Corps information infrastructure will be based largely on commercial systems and services, and that the Department of the Navy must ensure that these systems are seamlessly integrated and that information transported over the infrastructure is protected and secure. The Delphi technique, a qualitative method, was used to develop a Holistic Model and to suggest a framework for information sharing and database integration. Data were collected primarily from mid-level to senior information officers, with a focus on Chief Information Officers. In addition, an extensive literature review was conducted to gain insight into known similarities and differences in Strategic Information Management, information sharing strategies, and database integration strategies. It is hoped that the Armed Forces and the Department of Defense will benefit from future development of the information sharing and database integration Holistic Model.
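    The retrieval-and-integration requirement in the title can be illustrated with a rough sketch: one SQL statement spanning two separate databases so that related records are retrieved together. The database files, schemas, and rows are hypothetical stand-ins for the kinds of sources the thesis discusses, not the Marine Corps' actual systems or integration framework.

```python
# Rough sketch of retrieving and integrating data from two separate databases.
# Database files, schemas, and rows are hypothetical illustrations.
import sqlite3

personnel = sqlite3.connect("personnel.db")
personnel.execute("CREATE TABLE IF NOT EXISTS marines (id INTEGER PRIMARY KEY, name TEXT, unit TEXT)")
personnel.execute("INSERT OR REPLACE INTO marines VALUES (1, 'J. Doe', '1st Battalion')")
personnel.commit()
personnel.close()

logistics = sqlite3.connect("logistics.db")
logistics.execute("CREATE TABLE IF NOT EXISTS equipment (marine_id INTEGER, item TEXT)")
logistics.execute("DELETE FROM equipment")
logistics.execute("INSERT INTO equipment VALUES (1, 'radio set')")
logistics.commit()

# Attach the personnel database so a single query can join across both sources.
logistics.execute("ATTACH DATABASE 'personnel.db' AS personnel")
rows = logistics.execute(
    "SELECT m.name, m.unit, e.item "
    "FROM personnel.marines AS m JOIN equipment AS e ON e.marine_id = m.id"
).fetchall()
print(rows)   # [('J. Doe', '1st Battalion', 'radio set')]
logistics.close()
```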

    Digital Image Users and Reuse: Enhancing practitioner discoverability of digital library reuse based on user file naming behavior

    This dissertation explores devices practitioners utilize to discover the reuse of digital library materials. The author performs two verification studies investigating two previously employed strategies that practitioners use to identify digital object reuse, specifically Google Images reverse image lookup (RIL) and embedded metadata. It describes these strategies' limitations and offers a new, unique approach for tracking reuse that employs the author's search approach based on user file naming behavior. While exploring the utility and limitations of Google Images and embedded metadata, the author observes and documents a pattern of user file naming behavior that shows promise for improving practitioners' discoverability of reuse. The author conducts a file naming assessment to examine this pattern of user file naming behavior and the impact of file naming on search engine optimization. The author derives several significant findings from this study. Because of a change to its algorithm, Google Images is no longer a viable tool for discovering reuse by the general public or other users, with the exception of industry users. Embedded metadata is not a reliable assessment tool because of its non-persistent nature. The author finds that many users generate their own, almost exclusively human-readable file names when saving and sharing digital images. The author argues that when practitioners model search terms on these "aggregated file names", they increase their discovery of reused digital objects.
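    The search approach based on file naming behavior can be sketched roughly as follows: normalize user-generated file names into human-readable tokens and aggregate them into candidate search terms a practitioner could submit to a search engine. The file names and normalization rules are illustrative assumptions, not the dissertation's exact procedure.

```python
# Minimal sketch: aggregating user file names into candidate search terms.
# File names and normalization rules are illustrative assumptions.
import re
from collections import Counter

user_file_names = [
    "golden_gate_bridge_1937.jpg",
    "Golden-Gate-Bridge-fog.png",
    "goldengatebridge.jpeg",
    "bridge at sunset.jpg",
]

def tokenize(name: str) -> list[str]:
    stem = re.sub(r"\.[a-z]+$", "", name.lower())      # drop the file extension
    return [t for t in re.split(r"[\s_\-]+", stem) if t and not t.isdigit()]

# Count tokens across all observed file names ("aggregated file names").
counts = Counter(t for name in user_file_names for t in tokenize(name))

# The most frequent tokens become candidate search terms for tracking reuse.
search_terms = [token for token, _ in counts.most_common(5)]
print(search_terms)   # e.g. ['bridge', 'golden', 'gate', ...]
```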

    Rapid Parallelization by Collaboration

    The widespread adoption of Chip Multiprocessors has renewed the emphasis on the use of parallelism to improve performance. The present and growing diversity of hardware architectures and software environments, however, continues to pose difficulties for the effective use of parallelism, delaying a quick and smooth transition to the concurrency era. In this document, we describe research being conducted at the Computer Science Department at Columbia University on a system called COMPASS that aims to simplify this transition by providing advice to programmers considering parallelizing their code. The advice proffered to the programmer is based on wisdom collected from programmers who have already parallelized some code. The utility of COMPASS rests not only on its ability to collect this wisdom unintrusively, but also on its ability to automatically seek, find, and synthesize it into advice tailored to the code the user is considering parallelizing and to the environment in which the optimized program will execute. COMPASS provides a platform and an extensible framework for sharing human expertise about code parallelization -- widely and across diverse hardware and software. By leveraging the "Wisdom of Crowds" model, which has been conjectured to scale exponentially and which has worked successfully for wikis, COMPASS aims to enable rapid parallelization of code and thus extend the benefits of Moore's law scaling to science and society.
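    The kind of advice a system like COMPASS might surface can be illustrated with a small sketch: a serial loop over independent work items and the parallel version a previous programmer could have recorded, here using Python's multiprocessing pool. The workload and the chosen strategy are hypothetical illustrations of the advice workflow, not COMPASS itself.

```python
# Minimal sketch of the before/after advice a parallelization-advice system might surface.
# The workload and the chosen parallelization strategy are illustrative assumptions.
from multiprocessing import Pool

def simulate(seed: int) -> float:
    """A stand-in for an expensive, independent unit of work."""
    total = 0.0
    for i in range(100_000):
        total += ((seed * 1103515245 + i) % 2**31) / 2**31
    return total / 100_000

def serial(seeds):
    return [simulate(s) for s in seeds]            # original, serial version

def parallel(seeds, workers: int = 4):
    with Pool(processes=workers) as pool:          # advised parallel version:
        return pool.map(simulate, seeds)           # independent iterations map cleanly to a pool

if __name__ == "__main__":
    seeds = list(range(16))
    assert serial(seeds) == parallel(seeds)        # same results, now computed concurrently
    print("parallel result matches serial result")
```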

    Closing Information Gaps with Need-driven Knowledge Sharing

    Systems for asynchronous knowledge sharing, such as intranets, wikis, or file servers, frequently suffer from a lack of user contributions. A main reason is that information providers are decoupled from information seekers and are therefore hardly aware of their information needs. Central questions of knowledge management are therefore which knowledge is particularly valuable and by what means knowledge holders can be motivated to share it. This thesis develops the approach of need-driven knowledge sharing (NKS), which consists of three elements. First, indicators of information need are collected, search queries in particular, and aggregated into a continuous forecast of the organizational information need (OIN). By matching this forecast against the information available in personal and shared information spaces, organizational information gaps (OIG) are derived that point to missing information. These gaps are made transparent through so-called mediation services and mediation spaces, which help create awareness of organizational information needs and steer knowledge sharing. The concrete realization of NKS is illustrated by three different applications, all of which build on established knowledge management systems. Inverse Search is a tool that suggests to knowledge holders documents from their personal information space that they could share to close organizational information gaps. Woogle extends conventional wiki systems with steering instruments for detecting and prioritizing missing information, so that wiki content can evolve in a demand-oriented way. In a similar way, Semantic Need, an extension for Semantic MediaWiki, steers the capture of structured, semantic data based on information needs expressed as structured queries. The implementation and evaluation of the three tools show that need-driven knowledge sharing is technically feasible and can be an important complement to knowledge management. Moreover, the concept of mediation services and mediation spaces provides a framework for analyzing and designing tools according to NKS principles. Finally, the approach presented here also offers impulses for the further evolution of Internet services and infrastructures such as Wikipedia or the Semantic Web.
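    The central NKS step, deriving organizational information gaps from aggregated search queries, can be sketched roughly as follows: count query terms, compare them with the terms covered by the shared information space, and flag frequent terms that have no coverage. The query log, documents, and coverage heuristic are illustrative assumptions, not the thesis's concrete algorithm.

```python
# Rough sketch: deriving organizational information gaps (OIG) from search queries.
# Query log, documents, and the coverage heuristic are illustrative assumptions.
from collections import Counter

query_log = ["travel expense policy", "vpn setup", "travel expense form", "onboarding checklist"]
shared_documents = {
    "doc1": "How to set up the corporate VPN on laptops",
    "doc2": "Onboarding checklist for new employees",
}

# Aggregate the query terms into a forecast of organizational information need (OIN).
need = Counter(term for q in query_log for term in q.lower().split())

# Terms already covered by the shared information space.
covered = {term for text in shared_documents.values() for term in text.lower().split()}

# Frequently requested but uncovered terms hint at information gaps to surface
# via mediation services, e.g. as suggested wiki pages or sharing prompts.
gaps = [(term, count) for term, count in need.most_common() if term not in covered]
print(gaps)   # e.g. [('travel', 2), ('expense', 2), ('policy', 1), ...]
```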

    A Learning Health System for Radiation Oncology

    The proposed research aims to address the challenges faced by clinical data science researchers in radiation oncology in accessing, integrating, and analyzing heterogeneous data from various sources. The research presents a scalable intelligent infrastructure, called the Health Information Gateway and Exchange (HINGE), which captures and structures data from multiple sources into a knowledge base with semantically interlinked entities. This infrastructure enables researchers to mine novel associations and gather relevant knowledge for personalized clinical outcomes. The dissertation discusses the design framework and implementation of HINGE, which abstracts structured data from treatment planning systems, treatment management systems, and electronic health records. It utilizes disease-specific smart templates for capturing clinical information in a discrete manner. HINGE performs data extraction, aggregation, and quality and outcome assessment functions automatically, connecting seamlessly with local IT/medical infrastructure. Furthermore, the research presents a knowledge graph-based approach to map radiotherapy data to an ontology-based data repository using FAIR (Findable, Accessible, Interoperable, Reusable) concepts. This approach ensures that the data is easily discoverable and accessible for clinical decision support systems. The dissertation explores the ETL (Extract, Transform, Load) process, data model frameworks, and ontologies, and provides a real-world clinical use case for this data mapping. To improve the efficiency of retrieving information from large clinical datasets, a search engine based on ontology-based keyword searching and synonym-based term matching was developed. The hierarchical nature of ontologies is leveraged to retrieve patient records based on parent and child classes. Additionally, patient similarity analysis is conducted using vector embedding models (Word2Vec, Doc2Vec, GloVe, and FastText) to identify similar patients based on text corpus creation methods. Results from the analysis using these models are presented. The implementation of a learning health system for predicting radiation pneumonitis following stereotactic body radiotherapy is also discussed. 3D convolutional neural networks (CNNs) are utilized with radiographic and dosimetric datasets to predict the likelihood of radiation pneumonitis. DenseNet-121 and ResNet-50 models are employed for this study, along with integrated gradient techniques to identify salient regions within the input 3D image dataset. The predictive performance of the 3D CNN models is evaluated based on clinical outcomes. Overall, the proposed learning health system provides a comprehensive solution for capturing, integrating, and analyzing heterogeneous data in a knowledge base. It offers researchers the ability to extract valuable insights and associations from diverse sources, ultimately leading to improved clinical outcomes. This work can serve as a model for implementing learning health systems in other medical specialties, advancing personalized and data-driven medicine.
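    The hierarchy-aware retrieval step can be pictured with a small sketch: expand a query class to all of its descendant classes in a toy disease hierarchy and return every patient record tagged with any of them. The hierarchy, class names, and patient records are hypothetical stand-ins for the actual ontology and repository behind HINGE.

```python
# Minimal sketch of hierarchy-aware retrieval over an ontology-tagged patient repository.
# The class hierarchy and patient records are hypothetical illustrations.
ontology = {                               # parent class -> direct child classes
    "thoracic_neoplasm": ["lung_cancer", "esophageal_cancer"],
    "lung_cancer": ["nsclc", "sclc"],
}

patients = [
    {"id": "P1", "diagnosis": "nsclc"},
    {"id": "P2", "diagnosis": "sclc"},
    {"id": "P3", "diagnosis": "esophageal_cancer"},
    {"id": "P4", "diagnosis": "breast_cancer"},
]

def descendants(cls: str) -> set[str]:
    """Return the class itself plus all classes below it in the hierarchy."""
    result = {cls}
    for child in ontology.get(cls, []):
        result |= descendants(child)
    return result

def retrieve(query_class: str):
    """Return ids of patients whose diagnosis falls under the query class."""
    wanted = descendants(query_class)
    return [p["id"] for p in patients if p["diagnosis"] in wanted]

print(retrieve("lung_cancer"))        # ['P1', 'P2']
print(retrieve("thoracic_neoplasm"))  # ['P1', 'P2', 'P3']
```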

    Doctor of Philosophy

    Serving as a record of what happened during a scientific process, often computational, provenance has become an important piece of computing. The importance of archiving not only data and results but also the lineage of these entities has led to a variety of systems that capture provenance as well as models and schemas for this information. Despite significant work focused on obtaining and modeling provenance, there has been little work on managing and using this information. Using the provenance from past work, it is possible to mine common computational structure or determine differences between executions. Such information can be used to suggest possible completions for partial workflows, summarize a set of approaches, or extend past work in new directions. These applications require infrastructure to support efficient queries and accessible reuse. In order to support knowledge discovery and reuse from provenance information, the management of those data is important. One component of provenance is the specification of the computations; workflows provide structured abstractions of code and are commonly used for complex tasks. Using change-based provenance, it is possible to store large numbers of similar workflows compactly. This storage also allows efficient computation of differences between specifications. However, querying for specific structure across a large collection of workflows is difficult because comparing graphs depends on computing subgraph isomorphism, which is NP-complete. Graph indexing methods identify features that help distinguish graphs of a collection to filter results for a subgraph containment query and reduce the number of subgraph isomorphism computations. For provenance, this work extends these methods to work for more exploratory queries and collections with significant overlap. However, comparing workflow or provenance graphs may not require exact equality; a match between two graphs may allow paired nodes to be similar yet not equivalent. This work presents techniques to better correlate graphs to help summarize collections. Using this infrastructure, provenance can be reused so that users can learn from their own and others' history. Just as textual search has been augmented with suggested completions based on past or common queries, provenance can be used to suggest how computations can be completed or which steps might connect to a given subworkflow. In addition, provenance can help further science by accelerating publication and reuse. By incorporating provenance into publications, authors can more easily integrate their results, and readers can more easily verify and repeat results. However, reusing past computations requires maintaining stronger associations with any input data and underlying code as well as providing paths for migrating old work to new hardware or algorithms. This work presents a framework for maintaining data and code as well as supporting upgrades for workflow computations.
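    The filtering idea behind graph indexing can be sketched briefly: extract cheap features (here, labeled edges) from each workflow graph and run the expensive subgraph-isomorphism check only on graphs that contain every feature of the query. The workflow graphs and the edge-label feature are illustrative assumptions, not the dissertation's actual index structure.

```python
# Sketch: feature-based filtering before exact subgraph-isomorphism checks.
# The workflow graphs and the labeled-edge feature are illustrative assumptions.
import networkx as nx
from networkx.algorithms import isomorphism

def make_workflow(edges):
    """Build a directed workflow graph whose nodes are labeled by module name."""
    g = nx.DiGraph()
    for src, dst in edges:
        g.add_node(src, label=src)
        g.add_node(dst, label=dst)
        g.add_edge(src, dst)
    return g

collection = {
    "wf1": make_workflow([("read", "filter"), ("filter", "plot")]),
    "wf2": make_workflow([("read", "train"), ("train", "evaluate")]),
    "wf3": make_workflow([("read", "filter"), ("filter", "train")]),
}
query = make_workflow([("read", "filter")])

def features(g):
    """Labeled-edge features used by the index."""
    return {(g.nodes[u]["label"], g.nodes[v]["label"]) for u, v in g.edges}

query_features = features(query)
node_match = isomorphism.categorical_node_match("label", None)

for name, g in collection.items():
    if not query_features <= features(g):       # cheap index filter
        continue                                # skip the NP-complete check entirely
    matcher = isomorphism.DiGraphMatcher(g, query, node_match=node_match)
    if matcher.subgraph_is_isomorphic():        # expensive check only for surviving candidates
        print(name, "contains the query subworkflow")
```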

    24th International Conference on Information Modelling and Knowledge Bases

    In the last three decades, information modelling and knowledge bases have become essential subjects not only in academic communities related to information systems and computer science but also in the business area where information technology is applied. The series of European-Japanese Conferences on Information Modelling and Knowledge Bases (EJC) originally started as a co-operation initiative between Japan and Finland in 1982. The practical operations were then organised by professor Ohsuga in Japan and professors Hannu Kangassalo and Hannu Jaakkola in Finland (Nordic countries). The geographical scope has since expanded to cover Europe and other countries. The workshop character, with discussion, ample time for presentations, and a limited number of participants (50) and papers (30), is typical of the conference. Suggested topics include, but are not limited to:
    1. Conceptual modelling: Modelling and specification languages; Domain-specific conceptual modelling; Concepts, concept theories and ontologies; Conceptual modelling of large and heterogeneous systems; Conceptual modelling of spatial, temporal and biological data; Methods for developing, validating and communicating conceptual models.
    2. Knowledge and information modelling and discovery: Knowledge discovery, knowledge representation and knowledge management; Advanced data mining and analysis methods; Conceptions of knowledge and information; Modelling information requirements; Intelligent information systems; Information recognition and information modelling.
    3. Linguistic modelling: Models of HCI; Information delivery to users; Intelligent informal querying; Linguistic foundations of information and knowledge; Fuzzy linguistic models; Philosophical and linguistic foundations of conceptual models.
    4. Cross-cultural communication and social computing: Cross-cultural support systems; Integration, evolution and migration of systems; Collaborative societies; Multicultural web-based software systems; Intercultural collaboration and support systems; Social computing, behavioral modeling and prediction.
    5. Environmental modelling and engineering: Environmental information systems (architecture); Spatial, temporal and observational information systems; Large-scale environmental systems; Collaborative knowledge base systems; Agent concepts and conceptualisation; Hazard prediction, prevention and steering systems.
    6. Multimedia data modelling and systems: Modelling multimedia information and knowledge; Content-based multimedia data management; Content-based multimedia retrieval; Privacy and context-enhancing technologies; Semantics and pragmatics of multimedia data; Metadata for multimedia information systems.
    Overall, we received 56 submissions. After careful evaluation, 16 papers were selected as long papers, 17 as short papers, 5 as position papers, and 3 for presentation of perspective challenges. We thank all colleagues for their support of this issue of the EJC conference, especially the program committee, the organising committee, and the programme coordination team. The long and short papers presented at the conference are revised after the conference and published in the series “Frontiers in Artificial Intelligence” by IOS Press (Amsterdam). The books “Information Modelling and Knowledge Bases” are edited by the Editing Committee of the conference. We believe that the conference will be productive and fruitful in advancing research and application of information modelling and knowledge bases.
    Bernhard Thalheim, Hannu Jaakkola, Yasushi Kiyoki