22 research outputs found
A Review of Trusted Broker Architectures for Data Sharing
Sharing data across organizational boundaries must strike a balance between the competing data quality dimensions of access and security. Data that cannot be accessed cannot be used and, consequently, has no value. At the same time, uncontrolled access to data, especially sensitive personal data, can result in dire legal and ethical consequences. This paper discusses the trade-offs between security and access for three styles of trusted broker architectures, in the hope of providing guidance for organizations trying to implement data sharing systems. Naval Postgraduate School Acquisition Research Progra
CoDoSA: A Lightweight, XML-Based Framework for Integrating Unstructured Textual Information
One of the most fundamental dimensions of information quality is access. For many organizations, a large part of their information assets is locked away in Unstructured Textual Information (UTI) in the form of email, letters, contracts, call notes, and spreadsheets. In addition to internal UTI, there is also a wealth of publicly available UTI on websites, in newspapers, courthouse records, and other sources that can add value when combined with internally managed information. This paper describes a system called Compressed Document Set Architecture (CoDoSA), designed to facilitate the integration of UTI into a structured database environment where it can be more readily accessed and manipulated. The CoDoSA Framework comprises an XML-based metadata standard and an associated Application Program Interface (API). The paper further describes how CoDoSA can facilitate the storage and management of information during the ETL (Extract, Transform, and Load) process used to integrate UTI. It also explains how CoDoSA promotes higher information quality by providing several features that simplify the governance of metadata standards and the enforcement of data quality constraints across different UTI applications and development teams. In addition, CoDoSA provides a mechanism for inserting semantic tags into captured UTI; these tags can be used in later steps to drive semantic-mediated queries and processes.
Critical Cultural Success Factors for Achieving High Quality Information in an Organization
While information and data quality practitioners are in general agreement that social, cultural, and organizational factors are the most important in determining the success or failure of an organization’s data quality programs, there is little to no existing research quantifying these factors. In this research we build on both our previous research and that of others to distill and clarify the cultural factors that are the Critical Cultural Success Factors (CCSFs) for successful Information and Data Quality programs in an organization. Using the Delphi method for gaining consensus from a group of experts, we distilled fourteen factors down to six and clarified the definitions of those six factors. We begin to explain how these CCSFs fit into Organizational Learning Theory and plan ultimately to define a new system dynamics model incorporating them, so that organizations and information quality practitioners can positively affect the success of information and data quality programs.
Theme-weighted Ranking of Keywords from Text Documents using Phrase Embeddings
Keyword extraction is a fundamental task in natural language processing that
facilitates mapping of documents to a concise set of representative single and
multi-word phrases. Keywords from text documents are primarily extracted using
supervised and unsupervised approaches. In this paper, we present an
unsupervised technique that uses a combination of theme-weighted personalized
PageRank algorithm and neural phrase embeddings for extracting and ranking
keywords. We also introduce an efficient way of processing text documents and
training phrase embeddings using existing techniques. We share an evaluation
dataset derived from an existing dataset that is used for choosing the
underlying embedding model. The evaluations for ranked keyword extraction are
performed on two benchmark datasets comprising short abstracts (Inspec) and
long scientific papers (SemEval 2010), and the technique is shown to produce
results better than the state-of-the-art systems.
Comment: preprint for paper accepted in Proceedings of 1st IEEE International
Conference on Multimedia Information Processing and Retrieva
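As a rough illustration of the idea named in this abstract, the sketch below ranks candidate phrases with a personalized PageRank whose restart distribution is weighted by each phrase's similarity to a document "theme" vector. It is not the authors' implementation: the co-occurrence graph, the use of the mean embedding as the theme vector, the toy two-dimensional embeddings, and the function name `theme_weighted_pagerank` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def theme_weighted_pagerank(phrases, embeddings, window=2, damping=0.85, iters=50):
    """Rank phrases with a personalized PageRank biased toward the theme.

    phrases    -- candidate phrases in document order (hypothetical input)
    embeddings -- dict mapping each phrase to a vector (hypothetical input)
    """
    vocab = sorted(set(phrases))

    # Undirected co-occurrence graph: phrases within `window` positions are linked.
    weights = defaultdict(float)
    for i, p in enumerate(phrases):
        for q in phrases[i + 1 : i + 1 + window]:
            if p != q:
                weights[(p, q)] += 1.0
                weights[(q, p)] += 1.0
    out_sum = {p: 0.0 for p in vocab}
    for (src, _), w in weights.items():
        out_sum[src] += w

    # Assumption: the theme vector is the mean of all phrase embeddings.
    dim = len(embeddings[vocab[0]])
    theme = [sum(embeddings[p][k] for p in vocab) / len(vocab) for k in range(dim)]

    # Restart distribution: non-negative similarity to the theme, normalized.
    raw = {p: max(cosine(embeddings[p], theme), 0.0) for p in vocab}
    total = sum(raw.values())
    personal = {p: (raw[p] / total if total else 1.0 / len(vocab)) for p in vocab}

    scores = {p: 1.0 / len(vocab) for p in vocab}
    for _ in range(iters):
        # Mass sitting on phrases with no neighbours is restarted via the theme.
        dangling = sum(scores[q] for q in vocab if out_sum[q] == 0.0)
        new_scores = {}
        for p in vocab:
            incoming = sum(
                scores[q] * weights[(q, p)] / out_sum[q]
                for q in vocab
                if (q, p) in weights
            )
            incoming += dangling * personal[p]
            new_scores[p] = (1.0 - damping) * personal[p] + damping * incoming
        scores = new_scores

    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy usage with made-up phrases and embeddings.
toy_phrases = [
    "keyword extraction", "phrase embeddings", "pagerank",
    "keyword extraction", "coffee break", "phrase embeddings",
]
toy_embeddings = {
    "keyword extraction": [0.9, 0.1],
    "phrase embeddings": [0.8, 0.2],
    "pagerank": [0.7, 0.3],
    "coffee break": [-0.2, 0.9],
}
ranked = theme_weighted_pagerank(toy_phrases, toy_embeddings)
for phrase, score in ranked:
    print(f"{phrase}: {score:.3f}")
```

The restart term `(1 - damping) * personal[p]` is where the theme weighting enters: a phrase that is well connected but far from the theme receives less restart mass than one close to it. A real system would replace the toy vectors with trained phrase embeddings.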
Visual recognition of gestures in a meeting to detect when documents being talked about are missing
Meetings frequently involve discussion of documents and can be significantly affected if a document is absent. An agent system capable of spontaneously retrieving a document at the point it is needed would have to judge whether a meeting is talking about a particular document and whether that document is already present. We report the exploratory application of agent techniques for making these two judgements. To obtain examples from which an agent system can learn, we first conducted a study in which participants made these judgements from video recordings of meetings. We then show that interactions between hands and paper documents in meetings can be used to recognise when a document being talked about is not to hand. The work demonstrates the potential for multimodal agent systems using these techniques to learn to perform specific, discourse-level tasks during meetings.