    Resource efficient action recognition in videos

    This thesis traces an innovative journey in the domain of real-world action recognition, focusing in particular on memory- and data-efficient systems. It begins by introducing a novel approach to smart frame selection, which significantly reduces the computational cost of video classification. It further optimizes the action recognition process by addressing the challenges of training time and memory consumption in video transformers, laying a strong foundation for memory-efficient action recognition. The thesis then delves into zero-shot learning, exposing the flaws of the currently existing protocol and establishing a new split for true zero-shot action recognition that ensures zero overlap between unseen test classes and training or pre-training classes. Building on this, a unique cluster-based representation, optimized using reinforcement learning, is proposed for zero-shot action recognition. Crucially, we show that joint visual-semantic representation learning is essential for improved performance. We also experiment with feature-generation approaches for zero-shot action recognition by introducing a synthetic sample selection methodology that extends the utility of zero-shot learning to both images and videos and selects high-quality samples for synthetic data augmentation. This form of data valuation is then incorporated into our novel video data augmentation approach, in which we generate video composites by mixing the foregrounds and backgrounds of different videos; the data valuation helps us choose good composites at a reduced overall cost. Finally, we propose the creation of a meaningful semantic space for action labels: we create a textual description dataset for each action class and propose a novel feature-generating approach to maximise the benefits of this semantic space. The research contributes significantly to the field, potentially paving the way for more efficient, resource-friendly, and robust video processing and understanding techniques.
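
    The frame-selection idea lends itself to a minimal sketch: score every frame with a cheap relevance model, then run the expensive classifier only on the top-k frames. This is a hedged illustration rather than the thesis's actual method; `score_fn`, `k`, and the commented-out loader and classifier are hypothetical placeholders.

```python
import numpy as np

def select_frames(frames, score_fn, k=8):
    """Keep the k highest-scoring frames, in temporal order.

    frames:   sequence of frames (e.g., HxWxC numpy arrays)
    score_fn: cheap per-frame relevance scorer (hypothetical; a small
              lightweight model in practice)
    k:        number of frames the expensive classifier will see
    """
    scores = np.array([score_fn(f) for f in frames])
    keep = np.sort(np.argsort(scores)[-k:])  # top-k indices, re-sorted by time
    return [frames[i] for i in keep]

# Hypothetical usage: classify a video from only the selected frames.
# video = load_video("clip.mp4")
# key_frames = select_frames(video, cheap_scorer, k=8)
# label = expensive_classifier(key_frames)
```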

    Semantics-driven Abstractive Document Summarization

    The evolution of the Web over the last three decades has led to a deluge of scientific and news articles on the Internet. Harnessing these publications in different fields of study is critical to effective end-user information consumption. Similarly, in the healthcare domain, one of the key challenges in the adoption of Electronic Health Records (EHRs) for clinical practice has been the tremendous volume of clinical notes generated; without summarization, clinical decision making and communication are inefficient and costly. In spite of the rapid advances in information retrieval and deep learning techniques towards abstractive document summarization, the results of these efforts continue to resemble extractive summaries, achieving promising results predominantly on lexical metrics but performing poorly on semantic metrics. Thus, abstractive summarization driven by the intrinsic and extrinsic semantics of documents is not adequately explored. Resources that can be used for generating semantics-driven abstractive summaries include:
    ‱ Abstracts of multiple scientific articles published in a given technical field of study, to generate an abstractive summary for topically related abstracts within the field, thus reducing the load of having to read semantically duplicate abstracts on a given topic.
    ‱ Citation contexts from different authoritative papers citing a reference paper, which can be used to generate a utility-oriented abstractive summary for a scientific article.
    ‱ Biomedical articles and the named entities characterizing them, along with background knowledge bases, to generate entity- and fact-aware abstractive summaries.
    ‱ Clinical notes of patients and clinical knowledge bases, for abstractive clinical text summarization using knowledge-driven multi-objective optimization.
    In this dissertation, we develop semantics-driven abstractive models based on intra-document and inter-document semantic analyses, along with facts about named entities retrieved from domain-specific knowledge bases, to produce summaries. Concretely, we propose a sequence of frameworks leveraging semantics at various levels of granularity (e.g., word, sentence, document, topic, citations, and named entities) by utilizing external resources. The proposed frameworks have been applied to a range of tasks, including:
    1. Abstractive summarization of topic-centric multi-document scientific articles and news articles.
    2. Abstractive summarization of scientific articles using crowd-sourced citation contexts.
    3. Abstractive summarization of biomedical articles clustered based on entity-relatedness.
    4. Abstractive summarization of clinical notes of patients with heart failure and chest X-ray recordings.
    The proposed approaches achieve impressive performance in terms of preserving semantics in abstractive summarization while paraphrasing. For summarization of topic-centric multiple scientific/news articles, we propose a three-stage approach in which abstracts of scientific or news articles are clustered based on their topical similarity, determined from topics generated using Latent Dirichlet Allocation (LDA), followed by an extractive phase and an abstractive phase. In the next stage, we focus on abstractive summarization of biomedical literature, where we leverage named entities in biomedical articles to 1) cluster related articles and 2) guide abstractive summarization. Finally, in the last stage, we turn to external resources: citation contexts pointing to a scientific article to generate a comprehensive and utility-centric abstractive summary of that article, domain-specific knowledge bases to fill gaps in information about entities in a biomedical article to be summarized, and clinical knowledge bases to guide abstractive summarization of clinical text. Thus, the bottom-up progression of exploring semantics towards abstractive summarization in this dissertation starts with (i) Semantic Analysis of Latent Topics; builds on (ii) Internal and External Knowledge-I (gleaned from abstracts and Citation Contexts); and extends it to make it comprehensive using (iii) Internal and External Knowledge-II (Named Entities and Knowledge Bases).
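
    As a rough illustration of the three-stage pipeline's first stage, the sketch below clusters abstracts by their dominant LDA topic using scikit-learn; the extractive and abstractive phases would then run per cluster. The toy corpus and hyperparameters are assumptions, not the dissertation's actual setup.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

abstracts = [  # placeholder corpus
    "neural abstractive summarization of scientific articles",
    "topic models for clustering news articles by subject",
    "summarizing collections of related news stories",
]

# Stage 1: fit LDA and group documents by their dominant topic.
X = CountVectorizer(stop_words="english").fit_transform(abstracts)
doc_topics = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(X)
dominant = doc_topics.argmax(axis=1)
clusters = {t: np.where(dominant == t)[0] for t in set(dominant)}

# Stages 2-3 (not shown): run the extractive phase, then the abstractive
# summarizer, over the documents of each topical cluster.
print(clusters)
```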

    Addressing the data bottleneck in implicit discourse relation classification

    When humans comprehend language, their interpretation consists of more than just the sum of the content of the sentences. Additional logical and semantic links (known as coherence relations or discourse relations) are inferred between sentences/clauses in the text. The identification of discourse relations is beneficial for various NLP applications such as question answering, summarization, machine translation, and information extraction. Discourse relations are categorized into implicit and explicit discourse relations depending on whether there is an explicit discourse marker between the arguments. In this thesis, we focus mainly on implicit discourse relation classification, since the explicit markers act as informative cues that make explicit relations comparatively easy for machines to identify. Recent neural-network-based approaches in particular suffer from insufficient training (and test) data. In Chapter 3 of this thesis, we first show to what extent the limited data size is a problem in implicit discourse relation classification and propose data augmentation methods based on cross-lingual data. We then propose several approaches for better exploiting and encoding the various types of existing data for the discourse relation classification task. Most existing machine learning methods train on sections 2-21 of the PDTB and test on section 23, which contains a total of fewer than 800 implicit discourse relation instances. With the help of cross-validation, we argue that the standard test section of the PDTB is too small to draw conclusions from; with more test samples in the cross-validation, we would come to very different conclusions about whether a feature is generally useful. Second, we propose a simple approach to automatically extract samples of implicit discourse relations from multilingual parallel corpora via back-translation. After back-translating from the target languages, it is easy for a discourse parser to identify those examples that are originally implicit but explicit in the back-translations. With these additional data in the training set, the experiments show significant improvements in different settings. Finally, better encoding ability is also of crucial importance for improving classification performance. We propose several methods, including a sequence-to-sequence neural network and a memory component, to obtain a better representation of the arguments. With the help of the BERT model (Devlin et al., 2019), we also show that having the correct next sentence is beneficial for the task both within and across domains. When it comes to a new domain, it is beneficial to integrate external domain-specific knowledge; in Chapter 8, we show that with entity enhancement, the performance on BioDRB improves significantly compared with other BERT-based methods. In sum, the studies reported in this dissertation contribute to addressing the data bottleneck problem in implicit discourse relation classification and propose corresponding approaches that achieve 54.82% and 69.57% on PDTB and BioDRB, respectively.

    When humans understand language, their interpretation consists of more than just the sum of the content of the sentences. Additional logical and semantic links (so-called coherence relations or discourse relations) are inferred between sentences in the text. The identification of discourse relations is beneficial for various NLP applications such as question answering, summarization, machine translation, information extraction, and so on. Discourse relations are divided into implicit and explicit discourse relations, depending on whether there is an explicit discourse marker between the arguments. In this work, we concentrate mainly on the classification of implicit discourse relations, since the explicit markers serve as helpful cues and the explicit relations are relatively easy for machines to identify. Various approaches have been proposed that have achieved impressive results in implicit discourse relation classification. Most of them, however, suffer from the fact that the data are insufficient for methods based on neural networks. In this work, we first address the problem of limited data for this task and then propose methods for data augmentation with the help of cross-lingual data. Finally, we propose several methods to better encode the arguments from different aspects. Most existing machine learning methods are trained on sections 2-21 of the PDTB and tested on section 23, which contains a total of fewer than 800 implicit discourse relation instances. With the help of cross-validation, we argue that the standard test section of the PDTB is too small to draw conclusions from. With more test samples in the cross-validation, we would come to different conclusions about whether a feature is generally beneficial for this task or not, especially when we use a relatively large label set. If we single out only our small standard test set, we run the risk of drawing wrong conclusions about which features are helpful. Second, we propose a simple approach for the automatic extraction of samples of implicit discourse relations from multilingual parallel corpora via back-translation. It is motivated by the explicitation process that occurs when humans translate a text. After back-translation from the target languages, it is easy for the discourse parser to identify those examples that are originally implicit but explicit in the back-translations. With these additional data included in the training set, the experiments show significant improvements in various settings. We initially use only French-English pairs, have no control over the quality, and concentrate mostly on intra-sentential relations. To tackle these issues, we later extend the idea with more preprocessing steps and more language pairs. With majority votes from different language pairs, the mapped implicit labels become more reliable. Finally, better encoding ability is also of crucial importance for improving classification performance. We propose a new model consisting of a classifier and a sequence-to-sequence model. Besides correctly predicting the label, they are also trained to produce a representation of the discourse relation arguments by trying to predict the arguments including a suitable implicit connective. The novel secondary task forces the internal representation to encode the semantics of the relation arguments more completely and to perform a finer-grained classification. To further capture general knowledge in contexts, we also employ a memory network to obtain an explicit context representation of training examples for contexts. For each test instance, we generate a knowledge vector by weighted reading of the memory. We evaluate the proposed model under various conditions, and the results show that the model with the memory network can facilitate the prediction of discourse relations by selecting examples that exhibit a similar semantic representation and similar discourse relations. Even though better understanding, encoding, and semantic interpretation are essential and useful for the task of implicit discourse relation classification, they do only part of the job. A good implicit discourse relation classifier should also be aware of upcoming events, causes, consequences, and so on, in order to encode discourse expectation into the sentence representations. With the help of the recently proposed BERT model, we try to find out whether having the correct next sentence is beneficial for the task or not. The experimental results show that removing the next-sentence-prediction task severely harms performance both within and across domains. The limited ability of BioBERT to learn domain-specific knowledge, i.e., entity information, entity relations, etc., motivates us to integrate external knowledge into the pre-trained language models. We propose an unsupervised method that uses information retrieval systems and knowledge graph techniques, under the assumption that if two instances share similar entities in both relational arguments, the probability is high that they have the same or a similar discourse relation. The approach achieves results on BioDRB comparable to baseline models. We then use the extracted relevant entities to improve the pre-trained model K-BERT, in order to better encode the meaning of the arguments, outperforming the original BERT and BioBERT by 6.5% and 2% in accuracy, respectively. In summary, this dissertation contributes to addressing the data bottleneck problem in implicit discourse relation classification and proposes corresponding approaches in various respects, including: demonstrating the limited-data problem and the risks of drawing conclusions from it; obtaining automatically annotated data through the explicitation process during manual translation between English and other languages; better representation of discourse relation arguments; and entity enhancement with an unsupervised method and a pre-trained language model.
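
    The back-translation mining step can be sketched briefly: round-trip a sentence pair through a pivot language and check whether an explicit connective surfaced. The translation functions and connective list below are hypothetical stand-ins; the thesis uses a discourse parser on the back-translations rather than simple string matching.

```python
CONNECTIVES = {"because", "however", "therefore", "but", "so"}  # toy list

def mine_implicit_example(arg1, arg2, translate, back_translate):
    """Return (arg1, arg2, connective) if the round trip explicates the
    relation, else None.

    translate / back_translate: hypothetical MT functions, e.g.
    English -> French and French -> English.
    """
    round_trip = back_translate(translate(arg1 + " " + arg2))
    for token in round_trip.lower().split():
        word = token.strip(",.;")
        if word in CONNECTIVES:
            # The originally implicit relation surfaced as an explicit
            # marker; keep it as a silver-labeled training example.
            return arg1, arg2, word
    return None
```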

    Neural Graph Transfer Learning in Natural Language Processing Tasks

    Natural language is essential in our daily lives, as we rely on languages to communicate and exchange information. A fundamental goal of natural language processing (NLP) is to let machines understand natural language, so as to help or replace human experts in mining knowledge and completing tasks. Many NLP tasks deal with sequential data; for example, a sentence is treated as a sequence of words. Very recently, deep learning-based language models (e.g., BERT (Devlin et al., 2018)) achieved significant improvements on many existing tasks, including text classification and natural language inference. However, not all tasks can be formulated using sequence models. Specifically, graph-structured data is also fundamental in NLP, including entity linking, entity classification, relation extraction, abstract meaning representation, and knowledge graphs (Santoro et al., 2017; Hamilton et al., 2017; Kipf and Welling, 2016). In this scenario, BERT-based pretrained models may not be suitable. The Graph Convolutional Network (GCN; Kipf and Welling, 2016) is a deep neural network model designed for graphs, and it has shown great potential in text classification, link prediction, question answering, and more. This dissertation presents novel graph models for NLP tasks, including text classification, prerequisite chain learning, and coreference resolution. We focus on different perspectives of graph convolutional network modeling: for text classification, a novel graph construction method is proposed that makes predictions interpretable; for prerequisite chain learning, we propose multiple aggregation functions that utilize neighbors for better information exchange; for coreference resolution, we study how graph pretraining can help when labeled data is limited. An important further branch is applying pretrained language models to these tasks, so this dissertation also focuses on transfer learning methods that generalize pretrained models to other domains, including medical, cross-lingual, and web data. Finally, we propose a new task, unsupervised cross-domain prerequisite chain learning, and study novel graph-based methods for transferring knowledge over graphs.
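
    For reference, a single GCN layer (Kipf and Welling, 2016) computes H' = sigma(D^{-1/2} (A + I) D^{-1/2} H W): node features are averaged over each node's self-looped neighborhood, linearly transformed, and passed through a nonlinearity. The numpy sketch below implements exactly that propagation rule on a toy graph; the shapes and values are illustrative only.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))  # D^-1/2 diagonal
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)         # ReLU activation

# Toy graph: 3 nodes in a chain, 4-dim input features, 2-dim output.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = np.random.randn(3, 4)
W = np.random.randn(4, 2)
print(gcn_layer(A, H, W).shape)  # (3, 2)
```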

    Medical Instrument Detection in 3D Ultrasound for Intervention Guidance

    Recommendation Systems: An Insight Into Current Development and Future Research Challenges

    Research on recommendation systems is swiftly producing an abundance of novel methods, constantly challenging the current state of the art. Inspired by advancements in related fields such as Natural Language Processing and Computer Vision, many hybrid approaches based on deep learning are being proposed, making solid improvements over traditional methods. On the downside, this flurry of research activity, often focused on improving over a small number of baselines, makes it hard to identify reference methods and standardized evaluation protocols. Furthermore, the traditional categorization of recommendation systems into content-based, collaborative filtering, and hybrid systems lacks the informativeness it once had. With this work, we provide a gentle introduction to recommendation systems, describing the task they are designed to solve and the challenges faced in research. Building on previous work, an extension to the standard taxonomy is presented to better reflect the latest research trends, including the diverse use of content and temporal information. To ease the approach toward the technical methodologies recently proposed in this field, we review several representative methods, selected primarily from top conferences, and systematically describe their goals and novelty. We formalize the main evaluation metrics adopted by researchers and identify the most commonly used benchmarks. Lastly, we discuss issues in current research practices by analyzing experimental results reported on three popular datasets.
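
    To make the evaluation-metric discussion concrete, here is a minimal sketch of two metrics widely reported in this literature, Recall@k and NDCG@k, computed for a single user's ranked list under a binary-relevance assumption. It is an illustration, not the survey's exact formalization.

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of the user's relevant items appearing in the top-k."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance DCG of the top-k, normalized by the ideal DCG."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal

# Example: items 'b' and 'e' are relevant for this user.
print(recall_at_k(list("abcde"), {"b", "e"}, k=3))  # 0.5
print(ndcg_at_k(list("abcde"), {"b", "e"}, k=3))    # ~0.39
```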