
    Mining sequential data: in search of segment structures

    Segmentation is a data mining technique yielding simplified representations of sequences of ordered points. A sequence is divided into some number of homogeneous blocks, and all points within a segment are described by a single value; the points themselves may be one- or multi-dimensional. The focus in this thesis is on piecewise-constant segments, for which the most likely description of each segment and the most likely segmentation into a given number of blocks can be computed efficiently. Representing sequences as segmentations is useful in, e.g., storage and indexing tasks in sequence databases, and segmentation can serve as a tool for learning about the structure of a given sequence. The discussion in this thesis begins with basic questions related to segmentation analysis, such as choosing the number of segments and evaluating the obtained segmentations. Standard model selection techniques are shown to perform well for the sequence segmentation task. A method for evaluating segmentations with respect to a known segmentation structure is proposed: applying segmentation to certain features of a sequence is shown to yield segmentations that are significantly close to the known underlying structure. Two extensions to the basic segmentation framework are introduced: unimodal segmentation and basis segmentation. The former is concerned with segmentations whose segment descriptions first increase and then decrease, and the latter with the interplay between the different dimensions and segments of the sequence. Both problems are formally defined, and algorithms for solving them are provided and analyzed. Practical applications of segmentation techniques include time series and data stream analysis, text analysis, and biological sequence analysis. In this thesis, segmentation applications are demonstrated on genomic sequences.
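
    The optimal piecewise-constant segmentation described in the abstract is classically computed with dynamic programming. The sketch below is a generic illustration of that idea rather than code from the thesis: it minimizes total squared error, uses prefix sums for constant-time segment costs, and runs in O(n²k) time. The function name `k_segmentation` and the squared-error cost are assumptions for the example.

```python
import numpy as np

def k_segmentation(x, k):
    """Optimal piecewise-constant segmentation of a 1-D sequence into
    k segments under total squared error (classic O(n^2 k) dynamic program)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Prefix sums of x and x^2 give each segment's squared error in O(1):
    # err(i, j) = sum(x[i:j]^2) - (sum(x[i:j]))^2 / (j - i)
    s1 = np.concatenate(([0.0], np.cumsum(x)))
    s2 = np.concatenate(([0.0], np.cumsum(x * x)))

    def err(i, j):
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / (j - i)

    # dp[p, j]: minimal error of splitting the prefix x[:j] into p segments
    dp = np.full((k + 1, n + 1), np.inf)
    back = np.zeros((k + 1, n + 1), dtype=int)
    dp[0, 0] = 0.0
    for p in range(1, k + 1):
        for j in range(p, n + 1):
            for i in range(p - 1, j):
                c = dp[p - 1, i] + err(i, j)
                if c < dp[p, j]:
                    dp[p, j], back[p, j] = c, i

    # Backtrack to recover boundaries 0 = b_0 < b_1 < ... < b_k = n
    bounds, j = [n], n
    for p in range(k, 0, -1):
        j = back[p, j]
        bounds.append(j)
    bounds.reverse()
    # Each segment is described by its mean, the best constant under squared error
    means = [(s1[b] - s1[a]) / (b - a) for a, b in zip(bounds, bounds[1:])]
    return bounds, means
```

    For instance, on a noisy three-level step signal, `bounds` recovers the step positions and `means` the step levels; a standard model selection criterion such as BIC, in the spirit of the model selection techniques the abstract mentions, can then be used to choose k.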

    Superpixel-guided CFAR Detection of Ships at Sea in SAR Imagery


    Data mining using concepts of independence, unimodality and homophily

    With the widespread use of information technologies, more and more complex data is generated and collected every day. Such complex data varies in structure, size, type and format, e.g. time series, texts, images, videos and graphs. Complex data is often high-dimensional and heterogeneous, which makes separating the wheat (knowledge) from the chaff (noise) more difficult. Clustering is a main mode of knowledge discovery from complex data: it groups objects in such a way that intra-group objects are more similar than inter-group objects. Traditional clustering methods such as k-means, Expectation-Maximization clustering (EM), DBSCAN and spectral clustering are either deceived by "the curse of dimensionality" or spoiled by heterogeneous information. So how can complex data be explored effectively? In some cases, only partial information about the complex data is available. For example, in social networks not every user provides profile information such as personal interests. Can we leverage the limited user information together with the friendship network to infer the likely labels of the unlabeled users, so that advertisers can target their advertising accurately? This is the problem of learning from labeled and unlabeled data, commonly known as semi-supervised classification. To gain insights into these problems, this thesis focuses on developing clustering and semi-supervised classification methods driven by the concepts of independence, unimodality and homophily. The proposed methods leverage techniques from diverse areas, such as statistics, information theory, graph theory, signal processing, optimization and machine learning. Specifically, this thesis develops four methods: FUSE, ISAAC, UNCut, and wvGN. FUSE and ISAAC are clustering techniques that discover statistically independent patterns in high-dimensional numerical data. UNCut is a clustering technique that discovers unimodal clusters in attributed graphs in which not all attributes are relevant to the graph structure. wvGN is a semi-supervised classification technique that uses the theory of homophily to infer the labels of the unlabeled vertices in a graph. The clustering and semi-supervised classification methods have been verified on various synthetic and real-world data sets; the results are superior to those of the state of the art.
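
    The abstract does not spell out how wvGN works, so the sketch below illustrates the general homophily idea it builds on, not the thesis's algorithm: in classic label propagation, each unlabeled vertex repeatedly averages the label distributions of its neighbors while the labeled vertices stay clamped, so labels spread along edges. The function name and parameters are assumptions for the example.

```python
import numpy as np

def label_propagation(A, labels, n_classes, max_iter=100, tol=1e-6):
    """Minimal homophily-based label propagation on a graph.

    A         : (n, n) symmetric adjacency (or edge-weight) matrix
    labels    : length-n integer array; class index for labeled vertices,
                -1 for unlabeled vertices
    n_classes : number of classes
    Returns a length-n array of predicted class indices."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    n = A.shape[0]

    # Row-normalize so each vertex averages its neighbors' distributions
    P = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)

    # One-hot rows for labeled vertices, uniform rows for unlabeled ones
    F = np.full((n, n_classes), 1.0 / n_classes)
    labeled = labels >= 0
    F[labeled] = np.eye(n_classes)[labels[labeled]]

    for _ in range(max_iter):
        F_new = P @ F
        F_new[labeled] = F[labeled]    # clamp the known labels
        if np.abs(F_new - F).max() < tol:
            F = F_new
            break
        F = F_new
    return F.argmax(axis=1)
```

    On a graph with homophilous structure, e.g. two densely connected communities with a few labeled members each, the iteration converges to per-vertex class distributions whose argmax labels the remaining vertices.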

    Structures in High-Dimensional Data: Intrinsic Dimension and Cluster Analysis

    With today's improved measurement and data storage technologies it has become common to collect data in search of hypotheses instead of for testing hypotheses, that is, to do exploratory data analysis. Finding patterns and structures in data is the main goal. This thesis deals with two kinds of structures that can convey relationships between different parts of data in a high-dimensional space: manifolds and clusters. They are in a way opposites of each other: a manifold structure shows that it is plausible to connect two distant points through the manifold, while a clustering shows that it is plausible to separate two nearby points by assigning them to different clusters. But clusters and manifolds can also coincide: each cluster can be a manifold of its own.

    The first paper in this thesis concerns one specific aspect of a manifold structure, namely its dimension, also called the intrinsic dimension of the data. A novel estimator of intrinsic dimension, taking advantage of "the curse of dimensionality", is proposed and evaluated. It is shown to have, in general, less bias than estimators from the literature, and it can therefore better distinguish manifolds with different dimensions.

    The second and third papers in this thesis concern cluster analysis of data generated by flow cytometry, a high-throughput single-cell measurement technology. In this area, clustering is performed routinely by manual assignment of data in two-dimensional plots in order to identify cell populations. This is a tedious and subjective task, especially since the data often has four, eight, twelve or even more dimensions, and the analysts need to decide which two dimensions to look at together, and in which order.

    In the second paper of the thesis, a new pipeline for automated cell population identification is proposed. It can process multiple flow cytometry samples in parallel using a hierarchical model that shares information between the clusterings of the samples, making corresponding clusters in different samples similar while allowing for variation in cluster location and shape.

    In the third and final paper of the thesis, statistical tests for unimodality are investigated as a tool for quality control of automated cell population identification algorithms. It is shown that the different tests have different interpretations of unimodality and thus accept different kinds of clusters as sufficiently close to unimodal.
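
    The abstract does not describe the proposed estimator itself, so as background the sketch below shows a well-known nearest-neighbor approach to intrinsic dimension, the TwoNN estimator of Facco et al. (2017), rather than the thesis's novel method: the ratio of each point's second- to first-nearest-neighbor distance has a distribution that depends only on the intrinsic dimension, which yields a simple maximum-likelihood estimate.

```python
import numpy as np

def two_nn_dimension(X):
    """TwoNN intrinsic-dimension estimate (Facco et al., 2017).

    For each point take mu = r2 / r1, the ratio of its second to first
    nearest-neighbor distance; under mild assumptions the mu follow a
    Pareto law whose exponent is the intrinsic dimension d, and the
    maximum-likelihood estimate is n / sum(log(mu))."""
    X = np.asarray(X, dtype=float)
    # Full pairwise distance matrix; fine for moderate sample sizes
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)        # ignore self-distances
    D.sort(axis=1)
    mu = D[:, 1] / D[:, 0]             # second / first nearest neighbor
    return len(mu) / np.log(mu).sum()

# Example: a 2-D plane embedded in 10 dimensions; the estimate should be near 2.
rng = np.random.default_rng(0)
X = np.zeros((500, 10))
X[:, :2] = rng.standard_normal((500, 2))
print(two_nn_dimension(X))
```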