27 research outputs found

    Contextual relabelling of detected objects

    This is the author accepted manuscript; the final version is available from IEEE via the DOI in this record. Contextual information, such as the co-occurrence of objects and the spatial and relative sizes among objects, provides rich and complex information about scenes and can play an important role in improving object detection. In this work, we present two contextual models (a rescoring model and a relabelling model) that leverage contextual information (16 contextual relationships are applied in this paper) to enhance the state-of-the-art RCNN-based object detector (Faster RCNN). We experimentally demonstrate that our models improve detection performance on the most commonly used dataset in this field (MSCOCO).
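    The abstract does not give the exact rescoring formula, so the following is only a minimal sketch of the general idea: co-occurrence statistics estimated from training annotations are used to nudge each detection's confidence according to the other labels found in the same image. The function names, the blending weight alpha, and the toy data below are hypothetical, not the authors' implementation.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_stats(annotations):
    """Estimate P(label_a present | label_b present) from a list of images,
    each given as a set of ground-truth labels."""
    pair_counts = Counter()
    label_counts = Counter()
    for labels in annotations:
        labels = set(labels)
        label_counts.update(labels)
        for a, b in combinations(sorted(labels), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return {(a, b): pair_counts[(a, b)] / label_counts[b] for (a, b) in pair_counts}

def rescore(detections, cooc, alpha=0.7):
    """Blend each detection's visual confidence with the average co-occurrence
    support it receives from the other labels detected in the same image."""
    rescored = []
    for i, (label, score) in enumerate(detections):
        others = [l for j, (l, _) in enumerate(detections) if j != i]
        if others:
            support = sum(cooc.get((label, o), 0.0) for o in others) / len(others)
        else:
            support = score  # no context available; keep the visual score
        rescored.append((label, alpha * score + (1 - alpha) * support))
    return rescored

# toy example
stats = cooccurrence_stats([{"person", "surfboard"}, {"person", "dog"}, {"car", "dog"}])
print(rescore([("person", 0.9), ("surfboard", 0.4)], stats))
```

    In the paper the contextual cues also cover spatial and relative-size relationships (16 in total); the sketch uses co-occurrence only, to keep the example short.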

    Improving object detection performance using scene contextual constraints

    Contextual information, such as the co-occurrence of objects and the spatial and relative sizes among objects, provides rich and complex information about digital scenes. It also plays an important role in improving object detection and in identifying out-of-context objects. In this work, we present contextual models that leverage contextual information (16 contextual relationships are applied in this paper) to enhance the performance of two state-of-the-art object detectors (Faster RCNN and YOLO). The models are applied as a post-processing step that can be attached to most existing detectors, refining the confidences and associated categorical labels without refining the bounding boxes. We experimentally demonstrate that our models improve detection performance on the most commonly used dataset in this field (MSCOCO); in some experiments PASCAL2012 is also used. We also show that iterating the application of our contextual models further improves detection performance.
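    As a rough illustration of the iterated post-processing described above (confidences and labels are refined, bounding boxes are left untouched), here is a hedged sketch of the outer loop only. The `refine` callable, the detection dictionary layout, and the toy refinement rule are placeholders, not the authors' actual contextual models.

```python
def refine_until_stable(detections, refine, max_iters=5, tol=1e-3):
    """Apply a contextual refinement step repeatedly until confidences stabilise.
    `detections` is a list of dicts with 'box', 'label', 'score'; `refine` returns
    a list of the same length with possibly new labels/scores."""
    for _ in range(max_iters):
        updated = refine(detections)
        # bounding boxes are never refined, only confidences and labels
        for old, new in zip(detections, updated):
            new["box"] = old["box"]
        delta = max(abs(o["score"] - n["score"]) for o, n in zip(detections, updated))
        detections = updated
        if delta < tol:
            break
    return detections

def toy_refine(dets):
    # dampen low-confidence detections a little each pass (stand-in for a real model)
    return [{**d, "score": d["score"] * (0.9 if d["score"] < 0.5 else 1.0)} for d in dets]

dets = [{"box": (0, 0, 10, 10), "label": "dog", "score": 0.45},
        {"box": (5, 5, 20, 20), "label": "person", "score": 0.9}]
print(refine_until_stable(dets, toy_refine))
```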

    Contextual information for object detection

    Object detection has improved rapidly over the last decades, but because detectors are essential to a wide range of applications, further improvement is still needed. This thesis proposes the use of contextual information captured from digital scenes as a means of improving detection performance. Contextual information, such as the co-occurrence of objects and the spatial and relative sizes among objects, provides rich and complex knowledge about, and interpretation of, scenes. Capturing such relationships among objects provides machine learning models with vital cues that help detection methods achieve better performance. In this thesis, sixteen contextual object-object relationships captured from the MSCOCO 2017 training dataset are proposed. Building on the cues these sixteen relationships provide, two contextual models, named the Rescoring Model and the Relabelling Model, are proposed. These models explicitly encode contextual information from scenes, resulting in improved performance for two state-of-the-art detectors (Faster RCNN and YOLO). They provide even greater improvement when applied repeatedly, achieving higher AUC, mAP and F1 scores, with increases of up to 19 percentage points over the baseline detectors. Motivated by these gains, a further contextual model, named the Transformer-Encoder Detector Module, is proposed. In contrast to the previous models, it implicitly encodes contextual statistics and uses an attention mechanism to provide a deeper understanding of image content. It achieves mAP, F1 and AUC scores that are on average 13 percentage points higher than those of the Faster RCNN detector. Perturbed images, generated with two different perturbation approaches, are used to examine the behaviour of the proposed contextual models. Results show that the contextual models again outperform the baseline detector, because they use both visual and contextual features, whereas the detector depends only on visual features.
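    The thesis abstract does not spell out the architecture of the Transformer-Encoder Detector Module, so the snippet below is only a speculative sketch of the general idea: a transformer encoder attends over per-detection feature vectors so that each detection's class scores are refined in the context of the others. The feature dimension, layer counts, and class count (80, as in MSCOCO) are assumptions, and the module name is reused only as a label.

```python
import torch
import torch.nn as nn

class ContextRescoringHead(nn.Module):
    """Minimal sketch: a transformer encoder attends over the per-detection
    feature vectors of one image and predicts refined class scores."""
    def __init__(self, feat_dim=256, num_classes=80, nhead=4, num_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=nhead)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, det_feats):
        # det_feats: (num_detections, feat_dim) for a single image
        x = det_feats.unsqueeze(1)          # (seq_len, batch=1, feat_dim)
        x = self.encoder(x).squeeze(1)      # contextualised detection features
        return self.classifier(x)           # refined per-class logits

head = ContextRescoringHead()
logits = head(torch.randn(5, 256))          # 5 detections from one image
print(logits.shape)                         # torch.Size([5, 80])
```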

    Speech Recognition

    Chapters in the first part of the book cover all the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals, methods for speech-feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition, such as speaker identification and tracking, prosody modeling in emotion-detection systems, and applications able to operate in real-world environments such as mobile communication services and smart homes.

    Deliverable D1.4 Visual, text and audio information analysis for hypervideo, final release

    Having extensively evaluated the performance of the technologies included in the first release of the WP1 multimedia analysis tools, using content from the LinkedTV scenarios and through participation in international benchmarking activities, concrete decisions were made regarding the appropriateness and importance of each individual method or combination of methods. Combined with an updated list of information needs for each scenario, these decisions led to a new set of analysis requirements to be addressed in the final release of the WP1 analysis techniques. To this end, coordinated efforts in three directions, namely (a) improving a number of methods in terms of accuracy and time efficiency, (b) developing new technologies, and (c) defining synergies between methods for obtaining new types of information via multimodal processing, resulted in the final set of multimedia analysis methods for video hyperlinking. Moreover, the developed analysis modules have been integrated into a web-based infrastructure, allowing fully automatic linking of the WP1 technologies with the overall LinkedTV platform.

    Context based bioinformatics

    The goal of bioinformatics is to develop innovative and practical methods and algorithms for biological questions. In many cases, these questions are driven by new biotechnological techniques, especially by genome- and cell-wide high-throughput experimental studies. In principle there are two approaches: (1) reduction and abstraction of the question to a clearly defined optimization problem, which can be solved with appropriate and efficient algorithms; (2) development of context-based methods, incorporating as much contextual knowledge as possible into the algorithms, and derivation of practical solutions for relevant biological questions on the high-throughput data. These methods can often be supported by appropriate software tools and visualizations, allowing interactive evaluation of the results by experts. Context-based methods are often much more complex and require more involved algorithmic techniques to obtain practically relevant and efficient solutions for real-world problems, since in many cases even the simplified abstractions of these problems are NP-hard. To solve such complex problems, one often needs to employ efficient data structures and heuristic search methods that solve clearly defined sub-problems using efficient (polynomial) optimization techniques (such as dynamic programming, greedy, path- or tree-algorithms). In this thesis, we present new methods and analyses addressing open questions of bioinformatics in different contexts by incorporating the corresponding contextual knowledge. The two main contexts in this thesis are the protein structure similarity context (Part I) and the network-based interpretation of high-throughput data (Part II). For the protein structure similarity context (Part I) we analyze the consistency of gold standard structure classification systems and derive a consistent benchmark set usable for different applications. We introduce two methods (Vorolign, PPM) for the protein structure similarity recognition problem, based on different features of the structures. Building on the idea and results of Vorolign, we introduce the concept of a contact neighborhood potential, aiming to improve the results of protein fold recognition and threading. For the re-scoring of predicted structure models we introduce the method Vorescore, clearly improving fold-recognition performance and enabling the evaluation of the contact neighborhood potential for structure prediction methods in general. We introduce a contact-consistent Vorolign variant, ccVorolign, further improving structure-based fold recognition and enabling direct optimization of the neighborhood potential in the future. Due to the enforcement of contact consistency, the ccVorolign method has much higher computational complexity than the polynomial Vorolign method; this is the price of computing interpretable and consistent alignments. Finally, we introduce a novel structural alignment method (PPM) enabling the explicit modeling and handling of phenotypic plasticity in protein structures. We employ PPM to analyze the effects of alternative splicing on protein structures; with its help we test the hypothesis of whether splice isoforms of the same protein can lead to protein structures with different folds (fold transitions). In Part II of the thesis we present methods for generating and using context information for the interpretation of high-throughput experiments.
    For the generation of context information on molecular regulations we introduce novel text-mining approaches that extract relations automatically from scientific publications. In addition to the fast NER (named entity recognition) method (syngrep), we also present a novel, fully ontology-based, context-sensitive method (SynTree) allowing for the context-specific disambiguation of ambiguous synonyms, resulting in much better identification performance. This context information is important for the interpretation of high-throughput data, but is often missing in current databases. Despite all improvements, the results of automated text-mining methods remain error prone. The RelAnn application presented in this thesis helps to curate the automatically extracted regulations, enabling manual, ontology-based curation and annotation. For the usage of high-throughput data one needs additional methods for data processing, for example methods to map the hundreds of millions of short DNA/RNA fragments (so-called reads) onto a reference genome or transcriptome. Such data (RNA-seq reads) are the output of next-generation sequencing methods, measured by sequencing machines that are becoming more and more efficient and affordable. Unlike current state-of-the-art methods, our novel read-mapping method ContextMap resolves the occurring ambiguities at the final step of the mapping process, thereby employing knowledge of the complete set of possible ambiguous mappings. This approach allows for higher precision, even if more nucleotide errors are tolerated in the read mappings in the first step. The consistency between context information on molecular regulations, stored in databases or extracted by text mining, and the measured data can be used to identify and score consistent regulations (GGEA). This method substantially extends the commonly used gene-set based methods such as over-representation analysis (ORA) and gene set enrichment analysis (GSEA). Finally, we introduce the novel method RelExplain, which uses the extracted contextual knowledge to generate network-based, testable hypotheses for the interpretation of high-throughput data.
    (German abstract.) Bioinformatics is concerned with the development of innovative and practically usable methods and algorithms for biological questions. These questions often arise from new observation and measurement techniques, in particular new high-throughput methods and genome- and cell-wide studies. In principle there are two approaches: reduction and abstraction of the question to a clearly defined optimization problem, which is then solved with suitable and, as far as possible, efficient algorithms; and the development of context-based methods that exploit as much contextual knowledge and as many constraints as possible in the algorithms in order to obtain practically relevant solutions for relevant biological questions and high-throughput data. These methods can often be supported by suitable software tools and visualizations to allow interactive evaluation of the results by domain experts. Context-based methods are often considerably more demanding and require more involved algorithmic techniques in order to still obtain practically relevant and efficient solutions for real problems whose simplifying abstractions are already NP-hard. Efficient data structures and heuristic search methods are often needed, which for clearly delineated sub-problems fall back on efficient (polynomial) optimization methods (e.g. dynamic programming, greedy, path and tree algorithms) and employ them within the overall procedure. In this thesis, a number of new methods and analyses are presented that address open questions of bioinformatics from different contexts by using the corresponding contextual knowledge. The two main contexts of this thesis are (Part 1) the similarity of 3D protein structures and (Part 2) the network-based interpretation of high-throughput data. In the protein structure context (Part 1) we analyze the consistency of the gold standards available today for protein structure classification and derive a consistent benchmark set that can be used for many purposes. For a more precise determination of the similarity of protein structures we describe two methods (Vorolign, PPM) that use different structural features. Based on the results obtained with Vorolign, we introduce contact neighborhood potentials with the aim of improving fold recognition (based on known structures) and threading (for protein structure prediction). For the problem of re-scoring predicted structure models we describe the Vorescore method, with which fold recognition is clearly improved and the applicability of potentials in general can also be tested. For further improvement we introduce a contact-consistent Vorolign variant (ccVorolign), which, because of the new consistency constraint, is considerably more expensive than the polynomial Vorolign method, but in return delivers interpretable, consistent alignments. The new structural alignment method (PPM) makes it possible to model and account for phenotypic plasticity explicitly. PPM is used to investigate the effects of alternative splicing on protein structure, in particular the hypothesis of whether splice isoforms can adopt different folds (fold transitions). In the second part of the thesis, methods for generating context information and for using it to interpret high-throughput data are presented. New text-mining methods automatically extract molecular regulatory relations and the corresponding context information from scientific publications. Besides fast NER (named entity recognition) methods (such as syngrep), a fully ontology-based, context-sensitive method (SynTree) is introduced, which allows ambiguous synonyms to be resolved context-specifically and thus much more accurately. This context information, which is important for the interpretation of high-throughput data, is often missing in today's databases. Despite all improvements, however, automatic methods still produce many errors. With our application RelAnn, regulatory relations extracted from texts can be manually annotated and curated on an ontology basis. The use of current high-throughput data requires additional approaches for data processing, for example for mapping hundreds of millions of short DNA/RNA fragments (so-called reads) onto a genome or transcriptome. These data (RNA-seq) are produced by next-generation sequencing methods, which can be measured ever more cheaply with increasingly powerful machines. In the ContextMap method, in contrast to state-of-the-art approaches, the ambiguities that occur are resolved only at the end of the mapping process, when the complete mapping information is available. As a result, more errors can be tolerated during mapping while still achieving higher accuracy. The consistency between the context information from text mining and databases and the measured data can then be used to find and score consistent regulations (GGEA). This method constitutes a substantial extension of the commonly used set-based methods such as over-representation analysis (ORA) and gene set enrichment analysis (GSEA). Finally, we present the RelExplain method, which generates network-based, testable hypotheses for the explanation of high-throughput data from the extracted contextual knowledge.
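    For readers unfamiliar with the gene-set baselines that GGEA extends, the following is a minimal sketch of classical over-representation analysis (ORA) using a hypergeometric test. The gene identifiers and set sizes are made up, and GGEA itself (which additionally scores consistency with regulatory networks) is not reproduced here.

```python
from scipy.stats import hypergeom

def ora_pvalue(gene_set, de_genes, universe):
    """Over-representation analysis: hypergeometric tail probability of seeing
    at least this many differentially expressed genes inside the gene set."""
    M = len(universe)                          # all measured genes
    n = len(gene_set & universe)               # genes in the set
    N = len(de_genes & universe)               # differentially expressed genes
    k = len(gene_set & de_genes & universe)    # overlap
    return hypergeom.sf(k - 1, M, n, N)        # P(X >= k)

universe = {f"g{i}" for i in range(1000)}
pathway = {f"g{i}" for i in range(50)}                  # a hypothetical gene set
de = {f"g{i}" for i in range(20)} | {"g200", "g300"}    # mostly inside the set
print(ora_pvalue(pathway, de, universe))
```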

    Deriving and Exploiting Situational Information in Speech: Investigations in a Simulated Search and Rescue Scenario

    The need for automatic recognition and understanding of speech is emerging in tasks involving the processing of large volumes of natural conversations. In application domains such as Search and Rescue, exploiting automated systems for extracting mission-critical information from speech communications has the potential to make a real difference. Spoken language understanding has commonly been approached by identifying units of meaning (such as sentences, named entities, and dialogue acts) to provide a basis for further discourse analysis. However, this fine-grained identification of fundamental units of meaning is sensitive to high error rates in the automatic transcription of noisy speech. This thesis demonstrates that topic segmentation and identification techniques can be employed for information extraction from spoken conversations while remaining robust to such errors. Two novel topic-based approaches are presented for extracting situational information within the search and rescue context. The first approach shows that identifying changes in the context and content of first responders' reports over time can provide an estimate of their location. The second approach presents a speech-based topological map estimation technique that is inspired, in part, by automatic mapping algorithms commonly used in robotics. The proposed approaches are evaluated on a goal-oriented conversational speech corpus, which has been designed and collected based on an abstract communication model between a first responder and a task leader during a search process. Results confirm that a highly imperfect transcription of noisy speech has limited impact on information extraction performance compared with that obtained on the transcription of clean speech data. This thesis also shows that speech recognition accuracy can be improved by rescoring the initial transcription hypotheses based on the derived high-level location information. A new two-pass speech decoding architecture is presented, in which the location estimate from a first decoding pass is used to dynamically adapt a general language model, which is then used to rescore the initial recognition hypotheses. This decoding strategy has resulted in a statistically significant gain in the recognition accuracy of the spoken conversations in high background noise. It is concluded that the techniques developed in this thesis can be extended to other application domains that deal with large volumes of natural spoken conversations.
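    The two-pass architecture is described above only at a high level, so this is merely an illustrative sketch of second-pass n-best rescoring with a location-adapted language model: a general unigram LM is interpolated with a location-specific one and combined with the acoustic score. The interpolation weights, the unigram simplification, and the toy vocabulary are assumptions, not the system actually built in the thesis.

```python
import math

def rescore_nbest(nbest, general_lm, location_lm, lam=0.6, lm_weight=0.8):
    """Second-pass rescoring sketch: each hypothesis' acoustic score is combined
    with a language-model score from a general unigram LM interpolated with a
    location-specific one (both given as word -> probability dicts)."""
    def lm_logprob(words):
        total = 0.0
        for w in words:
            p = lam * general_lm.get(w, 1e-6) + (1 - lam) * location_lm.get(w, 1e-6)
            total += math.log(p)
        return total

    scored = [(hyp, acoustic + lm_weight * lm_logprob(hyp.split()))
              for hyp, acoustic in nbest]
    return max(scored, key=lambda x: x[1])

general = {"the": 0.05, "victim": 0.001, "is": 0.04, "in": 0.03,
           "kitchen": 0.0005, "chicken": 0.0006}
kitchen_context = {"kitchen": 0.02, "victim": 0.01}   # boosted by the estimated location
nbest = [("the chicken is in the victim", -42.0), ("the victim is in the kitchen", -43.0)]
print(rescore_nbest(nbest, general, kitchen_context))
```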