
    Multimedia information technology and the annotation of video

    The state of the art in multimedia information technology has not progressed to the point where a single solution can meet all reasonable needs of documentalists and users of video archives. In general, we do not hold an optimistic view of the usability of new technology in this domain, but digitization and growing digital processing power can be expected to cause a small revolution in video archiving. The volume of data leads to two views of the future: on the pessimistic side, the overload of data will outstrip annotation capacity; on the optimistic side, there will be enough data from which to learn selected concepts that can be deployed to support automatic annotation. At the threshold of this interesting era, we attempt to describe the state of the art in technology, sampling progress in text, sound, and image processing, as well as in machine learning.

    Towards automatic context-aware summarization of code entities

    Software developers work with many different methods and classes, and to understand those that perplex them and/or that are part of their tasks, they must deal with a huge amount of information. Providing developers with high-quality summaries of code entities can therefore help them during their maintenance and evolution tasks. Informal documentation (such as Stack Overflow) has been shown to be an important source of information about the purpose of code entities that can be leveraged. In this study, we investigate bug reports as a type of informal documentation and apply machine learning to produce summaries of code entities (methods and classes) in bug reports. In the proposed approach, code entities are extracted using an island parser that we implemented to identify code in bug reports. We then apply machine learning to select the set of useful sentences that form each code entity's summary, using logistic regression to rank sentences by their importance. To this end, a corpus of sentences is built based on the occurrence of code entities in sentences belonging to the bug reports that contain the entities in question. In the last step, the summaries were evaluated using surveys to estimate their quality. The results show that the automatically produced summaries can reduce the time and effort needed to understand the usage of code entities: the majority of participants found the summaries extremely helpful for decreasing understanding time (43.5%) and the effort to understand the code entities (39.1%). In the future, summaries could be produced from other informal documentation such as mailing lists or Stack Overflow, and the approach could be applied in practical settings, for example within an IDE such as Eclipse, to assist developers during their software maintenance and evolution tasks.
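    The sentence-ranking step lends itself to a compact illustration. The following is a minimal sketch assuming scikit-learn; the labeled sentences and the TF-IDF features are invented for illustration and do not reproduce the paper's actual corpus or feature set.

```python
# Hedged sketch: rank bug-report sentences with logistic regression and
# keep the top-scoring ones as an extractive summary of a code entity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: sentences mentioning a code entity,
# labeled 1 if judges considered them summary-worthy.
sentences = [
    "This method throws a NullPointerException when the cache is empty.",
    "Thanks for the quick fix!",
    "parseConfig() silently ignores unknown keys, which hides typos.",
    "See attached screenshot.",
]
labels = [1, 0, 1, 0]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(sentences)

model = LogisticRegression()
model.fit(X, labels)

# Rank unseen sentences by the predicted probability of being
# summary-worthy and keep the top-k as the summary.
candidates = [
    "The close() method must be called before reopening the stream.",
    "Any update on this issue?",
]
scores = model.predict_proba(vectorizer.transform(candidates))[:, 1]
ranked = sorted(zip(scores, candidates), reverse=True)
summary = [sentence for _, sentence in ranked[:1]]
print(summary)
```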

    Disagreeable Privacy Policies: Mismatches between Meaning and Users’ Understanding

    Privacy policies are verbose, difficult to understand, take too long to read, and may be the least-read items on most websites, even as users express growing concerns about information collection practices. For all their faults, though, privacy policies remain the single most important source of information for users attempting to learn how companies collect, use, and share data. These policies likewise form the basis for the self-regulatory notice-and-choice framework that is designed and promoted as a replacement for regulation. The underlying value and legitimacy of notice and choice depend, however, on the ability of users to understand privacy policies.

    This paper investigates the differences in interpretation among expert, knowledgeable, and typical users, and explores whether those groups can understand the practices described in privacy policies at a level sufficient to support rational decision-making. The paper seeks to fill an important gap in the understanding of privacy policies through primary research on user interpretation, and to inform the development of technologies combining natural language processing, machine learning, and crowdsourcing for policy interpretation and summarization. For this research, we recruited a group of law and public policy graduate students at Fordham University, Carnegie Mellon University, and the University of Pittsburgh ("knowledgeable users") and presented these law and policy researchers with a set of privacy policies from companies in the e-commerce and news & entertainment industries. We asked them nine basic questions about the policies' statements regarding data collection, data use, and retention. We then presented the same set of policies to a group of privacy experts and to a group of non-expert users.

    The findings show areas of common understanding across all groups for certain data collection and deletion practices, but also demonstrate very important discrepancies in the interpretation of privacy policy language, particularly with respect to data sharing. The discordant interpretations arose both within groups and between the experts and the two other groups.

    The presence of these significant discrepancies has critical implications. First, the common understanding of some attributes of described data practices means that semi-automated extraction of meaning from website privacy policies may be able to assist typical users and improve the effectiveness of notice by conveying the true meaning to users. However, the disagreements among experts, and between experts and the other groups, reflect that ambiguous wording in typical privacy policies undermines their ability to effectively convey notice of data practices to the general public. The results of this research will, consequently, have significant policy implications for the construction of the notice-and-choice framework and for the US reliance on this approach. The gap in interpretation indicates that privacy policies may be misleading the general public and that those policies could be considered legally unfair and deceptive. And where websites are not effectively conveying privacy policies to consumers in a way that a "reasonable person" could, in fact, understand, "notice and choice" fails as a framework. Such a failure has broad international implications, since websites extend their reach beyond the United States.
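    Since the study's central measurements concern agreement within and between rater groups, one standard statistic for quantifying such agreement is Fleiss' kappa. The following is a minimal sketch with invented tallies; the kappa statistic is our choice of illustration, and the paper's actual analysis and data are not reproduced here.

```python
# Hedged sketch: Fleiss' kappa over hypothetical answers to one of the
# nine interpretation questions (all counts below are invented).
import numpy as np

def fleiss_kappa(table: np.ndarray) -> float:
    """table[i, j] = number of raters assigning policy i to answer j."""
    n = table.sum(axis=1)[0]            # raters per policy (constant)
    N = table.shape[0]                  # number of policies rated
    p_j = table.sum(axis=0) / (N * n)   # marginal answer proportions
    P_i = (np.square(table).sum(axis=1) - n) / (n * (n - 1))
    P_bar, P_e = P_i.mean(), np.square(p_j).sum()
    return (P_bar - P_e) / (1 - P_e)

# Rows: privacy policies; columns: answers ("yes", "no", "unclear") to
# one data-sharing question, tallied over 10 hypothetical raters.
answers = np.array([
    [8, 1, 1],
    [3, 4, 3],   # heavy disagreement on this policy's sharing language
    [1, 8, 1],
    [4, 4, 2],
])
print(f"Fleiss' kappa: {fleiss_kappa(answers):.2f}")
```

    A kappa near 1 indicates strong agreement, while a value near 0 indicates agreement no better than chance, which is one way the within-group discrepancies described above could be quantified.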

    Opening Books and the National Corpus of Graduate Research

    Virginia Tech University Libraries, in collaboration with the Virginia Tech Department of Computer Science and the Old Dominion University Department of Computer Science, request $505,214 in grant funding for a 3-year project whose goal is to bring computational access to book-length documents, demonstrating this with Electronic Theses and Dissertations (ETDs). The project is motivated by the following library and community needs. (1) Despite huge volumes of book-length documents in digital libraries, there is a lack of models offering effective and efficient computational access to these long documents. (2) Nationwide open access services for ETDs generally function at the metadata level; much important knowledge and scientific data lie hidden in ETDs, and we need better tools to mine the content and facilitate the identification, discovery, and reuse of these important components. (3) A wide range of audiences can potentially benefit from this research, including but not limited to librarians, students, authors, educators, researchers, and other interested readers.

    We will answer the following key research questions: (1) How can we effectively identify and extract key parts (chapters, sections, tables, figures, citations) in both born-digital and page-image formats? (2) How can we develop effective automatic classification as well as chapter summarization techniques? (3) How can our ETD digital library most effectively serve stakeholders? In response to these questions, we plan to first compile an ETD corpus of at least 50,000 documents from multiple institutional repositories. We will make the corpus inclusive and diverse, covering a range of degrees (master's and doctoral), years, graduate programs (STEM and non-STEM), and authors (from HBCUs and non-HBCUs). Testing first with this sample, we will investigate three major research areas (RAs), outlined below.

    RA 1: Document analysis and extraction, in which we experiment with machine/deep learning models for effective ETD segmentation and subsequent information extraction. Anticipated results include new software tools that can be used and adapted by libraries for automatic extraction of structural metadata and document components (chapters, sections, figures, tables, citations, bibliographies) from ETDs, applied to both page-image and born-digital documents.

    RA 2: Adding value, in which we investigate techniques and build machine/deep learning models to automatically summarize and classify ETD chapters. Anticipated results include software implementations of a chapter-level text summarizer that generates paragraph-length summaries of ETD chapters, and a multi-label classifier that assigns subject categories to ETD chapters (a minimal sketch of such a classifier follows below). Our aim is to develop software that can be adapted or replicated by libraries to add value to their existing ETD services.

    RA 3: User services, in which we study users to identify and understand their information needs and information-seeking behaviors, so that we may establish corresponding requirements for the user interface and service components most useful for interacting with ETD content. Basing our design decisions on empirical evidence obtained from user analysis, we will construct a prototype system to demonstrate how these components can improve the user experience with ETD collections, and ultimately increase the capacity of libraries to provide access to ETDs and other long-form document content.

    Our project brings to bear cutting-edge computer science and machine/deep learning technologies to advance discovery, use, and the potential for reuse of the knowledge hidden in the text of books and book-length documents. In addition, by focusing on libraries' ETD collections (where legal restrictions from book publishers generally are not applicable), our research will open this rich corpus of graduate research and scholarship, leverage ETDs to advance further research and education, and allow libraries to achieve greater impact.
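    The RA 2 multi-label classifier can be sketched compactly. The following assumes scikit-learn; the chapter texts, subject labels, and model choice (one-vs-rest logistic regression over TF-IDF features) are illustrative assumptions, since the abstract does not specify the project's actual models.

```python
# Hedged sketch: assign multiple subject categories to ETD chapters.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical chapter texts and their (possibly multiple) subjects.
chapters = [
    "We train a convolutional network to segment scanned pages...",
    "Survey responses were coded and analyzed for recurring themes...",
    "The proposed parser extracts figures and captions from PDF files...",
]
subjects = [
    {"machine learning", "document analysis"},
    {"qualitative methods"},
    {"document analysis"},
]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(subjects)          # one binary column per subject

vec = TfidfVectorizer()
X = vec.fit_transform(chapters)

# One independent binary classifier per subject category.
clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)

new = vec.transform(["We fine-tune a transformer to label page regions."])
print(mlb.inverse_transform(clf.predict(new)))
```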

    Linked Data Entity Summarization

    On the Web, the amount of structured and Linked Data about entities is constantly growing. Descriptions of single entities often include thousands of statements, and it becomes difficult to comprehend the data unless a selection of the most relevant facts is provided. This doctoral thesis addresses the problem of Linked Data entity summarization. The contributions comprise two entity summarization approaches, a common API for entity summarization, and an approach for entity data fusion.
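    One common family of entity summarization heuristics ranks an entity's statements by how informative their predicates are corpus-wide. The following is a minimal sketch of that idea with invented data; it illustrates the problem setting rather than the specific approaches contributed by the thesis.

```python
# Hedged sketch: keep the k most informative statements of an entity,
# scoring each by the global rarity of its predicate.
import math
from collections import Counter

# Hypothetical (predicate, object) statements describing one entity.
entity = [
    ("rdf:type", "dbo:City"),
    ("rdfs:label", "Karlsruhe"),
    ("dbo:country", "dbr:Germany"),
    ("dbo:populationTotal", "308436"),
    ("owl:sameAs", "wd:Q1040"),
]

# Hypothetical corpus-wide predicate counts: ubiquitous predicates such
# as rdf:type carry less distinguishing information than rare ones.
corpus_freq = Counter({"rdf:type": 9000, "rdfs:label": 9000,
                       "owl:sameAs": 5000, "dbo:country": 800,
                       "dbo:populationTotal": 300})
total = sum(corpus_freq.values())

def informativeness(pred: str) -> float:
    # Inverse log-frequency: rarer predicates score higher.
    return -math.log(corpus_freq[pred] / total)

k = 3
summary = sorted(entity, key=lambda st: informativeness(st[0]),
                 reverse=True)[:k]
print(summary)
```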