
    Bringing Structure into Summaries: Crowdsourcing a Benchmark Corpus of Concept Maps

    Concept maps can be used to concisely represent important information and bring structure into large document collections. Therefore, we study a variant of multi-document summarization that produces summaries in the form of concept maps. However, suitable evaluation datasets for this task are currently missing. To close this gap, we present a newly created corpus of concept maps that summarize heterogeneous collections of web documents on educational topics. It was created using a novel crowdsourcing approach that allows us to efficiently determine important elements in large document collections. We release the corpus along with a baseline system and proposed evaluation protocol to enable further research on this variant of summarization. Comment: Published at EMNLP 201

    Automatic Document Summarization Using Knowledge Based System

    This dissertation describes a knowledge-based system that creates abstractive summaries of documents by generalizing new concepts, detecting main topics, and creating new sentences. The proposed system is built on the Cyc development platform, which consists of the world’s largest knowledge base and one of the most powerful inference engines. The system is unsupervised and domain independent. Its domain knowledge is provided by the comprehensive ontology of common sense knowledge contained in the Cyc knowledge base. The system generates coherent and topically related new sentences as a summary for a given document. It uses the syntactic structure and semantic features of the given documents to fuse information, and it uses the reasoning engine to generalize novel information. The proposed system consists of three main parts: knowledge acquisition, knowledge discovery, and knowledge representation. Knowledge acquisition derives the syntactic structure of each sentence in the document and maps words and their syntactic relationships into the Cyc knowledge base. Knowledge discovery abstracts novel concepts not explicitly mentioned in the document by exploring the ontology of mapped concepts, and derives the main topics described in the document by clustering the concepts. Knowledge representation creates new English sentences to summarize main concepts and their relationships. The syntactic structure of the newly created sentences is extended beyond simple subject-predicate-object triplets by incorporating adjective and adverb modifiers, which allows the system to create more complex sentences. The proposed system was implemented and tested. Test results show that the system is capable of creating new sentences that include abstracted concepts not mentioned in the original document, and of combining information from different parts of the document text to compose a summary.

    Human resources mining for examination of R&D progress and requirements

    Intelligent Tourist Routes

    Most people enjoy travelling, and Porto was voted the most interesting city in Europe to visit in 2019. Highly attractive to visitors, Porto offers countless options for tourist routes. Recent research shows that an efficient travel operator should not only take the user's needs and constraints into account but also allow some degree of free exploration of the city, adapting what it offers to the user's preferences. An overall picture of the context is a good starting point for a memorable trip. This dissertation aims to develop an intelligent system capable of maximizing visitor satisfaction by creating dynamic, personalized routes based on users' preferences and interests. These will be assessed directly, through modern segmentation and profile-discovery techniques, and indirectly, through the scores users assign to sets of photographs (standard and 360°) of the points of interest. Along the route, the user can give feedback on the suggested points of interest in order to improve the system's learning.

    Active Learning for Text Classification

    Text classification approaches are used extensively to solve real-world challenges. The success or failure of a text classification system hinges on the dataset used to train it: without a good dataset it is impossible to build a quality system. This thesis examines the applicability of active learning in text classification for the rapid and economical creation of labelled training data. Four main contributions are made. First, we present two novel selection strategies for choosing the most informative examples for manual labelling. One uses an advanced aggregated confidence measurement, instead of the direct output of classifiers, to measure the confidence of a prediction and chooses the examples with the least confidence for querying. The other is a simple but effective exploration-guided active learning selection strategy that uses only the notions of density and diversity, based on similarity. Second, we propose new methods of using deterministic clustering algorithms to help bootstrap the active learning process. We first illustrate the problems of using non-deterministic clustering for selecting initial training sets, showing how non-deterministic clustering methods can result in inconsistent behaviour in the active learning process. We then compare various deterministic clustering techniques with commonly used non-deterministic ones, and show that deterministic clustering algorithms are as good as non-deterministic clustering algorithms at selecting initial training examples for the active learning process. More importantly, we show that the use of deterministic approaches stabilises the active learning process. Our third contribution is in the area of visualising the active learning process. We demonstrate the use of an existing visualisation technique to show that a better understanding of selection strategies can be achieved with the help of visualisation. Finally, to evaluate the practicality and usefulness of active learning as a general dataset labelling methodology, it is desirable that actively labelled datasets can be reused more widely instead of being limited to one particular classifier. We compare the reusability of popular active learning methods for text classification and identify the best classifiers to use in active learning for text classification. This thesis is concerned with using active learning methods to label large unlabelled textual datasets. Our domain of interest is text classification, but most of the methods proposed are quite general and so are applicable to other domains having large collections of high-dimensional data.
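
As a rough illustration of the least-confidence querying mentioned in the abstract above, the following minimal sketch ranks unlabelled examples by the classifier's top class probability and returns the least confident ones. It is the plain baseline that works on direct classifier output, not the thesis's aggregated confidence measurement; the function name `least_confidence_query` and the sample probabilities are hypothetical, chosen only for this example.

```python
from typing import List, Sequence


def least_confidence_query(
    probabilities: Sequence[Sequence[float]], batch_size: int
) -> List[int]:
    """Return indices of the pool examples with the lowest top-class probability.

    `probabilities[i]` is assumed to hold the predicted class distribution
    for the i-th unlabelled example (hypothetical input format).
    """
    # Confidence of a prediction = probability of its most likely class.
    confidences = [max(p) for p in probabilities]
    # Sort pool indices from least to most confident (stable on ties).
    ranked = sorted(range(len(confidences)), key=lambda i: confidences[i])
    return ranked[:batch_size]


# Made-up class distributions for four unlabelled documents.
pool_probabilities = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.70, 0.30],
    [0.51, 0.49],
]

# Indices of the two documents to send to a human annotator.
query = least_confidence_query(pool_probabilities, batch_size=2)
print(query)
```

In a full active learning loop, the returned examples would be labelled, added to the training set, and the classifier retrained before the next query round.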