6,038 research outputs found

    Question-based Text Summarization

    In the modern information age, finding the right information at the right time is an art (and a science). However, the abundance of information makes it difficult for people to digest it and make informed choices. In this thesis, we aim to help people who want to quickly capture the main idea of a piece of information before they read the details through text summarization. In contrast with existing works, which mainly utilize declarative sentences to summarize a text document, we aim to use a few questions as a summary. In this way, people would know what questions a given text document can address and thus they may further read it if they have similar questions in mind. A question-based summary needs to satisfy three goals: relevancy, answerability, and diversity. Relevancy measures whether a few questions can cover the main points that are discussed in a text document; answerability measures whether answers to the questions are included in the text document; and diversity measures whether there is redundant information carried by the questions. To achieve the three goals, we design a two-stage approach which consists of question selection and question diversification. The question selection component aims to find a set of candidate questions that are relevant to a text document, which in turn can be treated as answers to the questions. Specifically, we explore two lines of approaches that have been developed for traditional text summarization tasks, extractive approaches and abstractive approaches, to achieve the goals of relevancy and answerability, respectively. The question diversification component is designed to re-rank the questions with the goal of rewarding diversity in the final question-based summary. Evaluation on product review summarization tasks for two product categories shows that the proposed approach is effective for discovering meaningful questions that are representative of individual reviews.
    This thesis opens up a new direction in the intersection of information retrieval and natural language processing. Although the evaluation is limited to the product review domain, the thesis provides a general solution for question selection for many interesting applications and discusses the possibility of extending the problem to other domain-specific question-based text summarization tasks. Ph.D., Information Science -- Drexel University, 201

    Quantum Jump from Singularity to Outside of Black Hole

    Considering the role of the black hole singularity in quantum evolution, a resolution to the firewall paradox is presented. It is emphasized that if an observer has the singularity as a part of his spacetime, then the semi-classical evolution would be non-unitary as viewed by him. Specifically, a free-falling observer inside the black hole would have a Hilbert space with non-unitary evolution: a quantum jump for particles encountering the singularity to outside of the horizon as late Hawking radiation. The non-unitarity in the jump resembles the one in the collapse of the wave function, but preserves entanglements. Accordingly, we elaborate the first postulate of black hole complementarity: freely falling observers who pass through the event horizon would have non-unitary evolution, although this has no physically measurable effects for them. Besides, no information would be lost in the singularity. Taking the modified picture into account, the firewall paradox can be resolved, respecting No Drama. A by-product of our modification is that roughly half of the entropy of the black hole is released close to the end of evaporation in the form of very hot Hawking radiation. Comment: 7 figures, v2 more comprehensive, v3 matches the published version

    A Survey of Source Code Search: A 3-Dimensional Perspective

    (Source) code search has attracted wide attention from software engineering researchers because it can improve the productivity and quality of software development. Given a functionality requirement, usually described in a natural language sentence, a code search system can retrieve code snippets that satisfy the requirement from a large-scale code corpus, e.g., GitHub. To realize effective and efficient code search, many techniques have been proposed successively. These techniques improve code search performance mainly by optimizing three core components: the query understanding component, the code understanding component, and the query-code matching component. In this paper, we provide a 3-dimensional perspective survey for code search. Specifically, we categorize existing code search studies into query-end optimization techniques, code-end optimization techniques, and match-end optimization techniques according to the specific components they optimize. Considering that each end can be optimized independently and contributes to the code search performance, we treat each end as a dimension. Therefore, this survey is 3-dimensional in nature, and it provides a comprehensive summary of each dimension in detail. To understand the research trends of the three dimensions in existing code search studies, we systematically review 68 relevant studies. Unlike existing code search surveys that only focus on the query end or code end or introduce various aspects shallowly (including codebase, evaluation metrics, modeling technique, etc.), our survey provides a more nuanced analysis and review of the evolution and development of the underlying techniques used in the three ends. Based on a systematic review and summary of existing work, we outline several open challenges and opportunities at the three ends that remain to be addressed in future work. Comment: submitted to ACM Transactions on Software Engineering and Methodology

    Natural language processing methods for knowledge management - Applying document clustering for fast search and grouping of engineering documents

    Product development companies collect data in the form of Engineering Change Requests for logged design issues, tests, and product iterations. These documents are rich in unstructured data (e.g. free text). Previous research affirms that product developers find that current IT systems lack capabilities to accurately retrieve relevant documents with unstructured data. In this research, we demonstrate a method using Natural Language Processing and document clustering algorithms to find structurally or contextually related documents from databases containing Engineering Change Request documents. The aim is to radically decrease the time needed to effectively search for related engineering documents, organize search results, and create labeled clusters from these documents by utilizing Natural Language Processing algorithms. A domain knowledge expert at the case company evaluated the results and confirmed that the algorithms we applied managed to find relevant document clusters given the queries tested.

    Information Literacy Instruction in an English Capstone Course: A Study of Student Confidence, Perception, and Practice

    An English professor and an instruction librarian at the University of New Hampshire at Manchester felt that the college's new English Capstone course for majors provided a unique opportunity to assess the information literacy skill levels of graduating English majors. They therefore engaged in a three-year study to evaluate the IL competency of these students, to gauge their perceptions of library instruction provided during the Capstone course and throughout their academic careers, and to determine students' confidence and self-efficacy with respect to these skills. The researchers sought to determine the ways in which the IL program for English majors effectively met established IL goals and to identify areas for improvement.