Data complexity measured by principal graphs
How to measure the complexity of a finite set of vectors embedded in a
multidimensional space? This is a non-trivial question which can be approached
in many different ways. Here we suggest a set of data complexity measures using
universal approximators, principal cubic complexes. Principal cubic complexes
generalise the notion of principal manifolds for datasets with non-trivial
topologies. The type of the principal cubic complex is determined by its
dimension and a grammar of elementary graph transformations. The simplest
grammar produces principal trees.
We introduce three natural types of data complexity: 1) geometric (deviation
of the data's approximator from some "idealized" configuration, such as
deviation from harmonicity); 2) structural (how many elements of a principal
graph are needed to approximate the data), and 3) construction complexity (how
many applications of elementary graph transformations are needed to construct
the principal object starting from the simplest one).
We compute these measures for several simulated and real-life data
distributions and show them in the "accuracy-complexity" plots, helping to
optimize the accuracy/complexity ratio. We discuss various issues connected
with measuring data complexity. Software for computing data complexity measures
from principal cubic complexes is provided as well.
Comment: Computers and Mathematics with Applications, in press
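As a rough illustration of the geometric and structural measures described in this abstract, the sketch below scores a small graph embedded in the plane. It is not the authors' software: the function names are hypothetical, structural complexity is simply taken as the node and edge counts, and "deviation from harmonicity" is assumed to mean the summed squared distance between each non-leaf node and the mean of its neighbours.

```python
import numpy as np


def structural_complexity(edges):
    """Count the nodes and edges of a graph given as a list of (i, j) pairs."""
    nodes = {i for edge in edges for i in edge}
    return len(nodes), len(edges)


def harmonicity_deviation(positions, edges):
    """Assumed measure: sum over nodes with >= 2 neighbours of
    ||y_i - mean(neighbour positions)||^2."""
    positions = np.asarray(positions, dtype=float)
    neighbours = {i: [] for i in range(len(positions))}
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    total = 0.0
    for i, nbrs in neighbours.items():
        if len(nbrs) < 2:
            continue  # leaves carry no harmonicity constraint in this sketch
        centre = positions[nbrs].mean(axis=0)
        total += float(np.sum((positions[i] - centre) ** 2))
    return total


# Toy example: a straight 3-node chain versus the same chain with a bent middle node.
edges = [(0, 1), (1, 2)]
straight = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
bent = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]

print(structural_complexity(edges))            # (3, 2)
print(harmonicity_deviation(straight, edges))  # 0.0
print(harmonicity_deviation(bent, edges))      # 1.0
```

Under these assumptions the straight chain is perfectly harmonic, while bending the middle node raises the geometric term without changing the structural counts, which is the kind of separation between measures the abstract describes.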
Cell type identification, differential expression analysis and trajectory inference in single-cell transcriptomics
Single-cell RNA-sequencing (scRNA-seq) is a cutting-edge technology that makes it possible to quantify the transcriptome, the set of expressed RNA transcripts, of a group of cells at the single-cell level. It represents a significant upgrade from bulk RNA-seq, which measures the combined signal of thousands of cells. Measuring gene expression by bulk RNA-seq is an invaluable tool for biomedical researchers who want to understand how cells alter their gene expression due to an illness, differentiation, external stimulus, or other events. Similarly, scRNA-seq has become an essential method for biomedical researchers, and it has brought several new applications previously unavailable with bulk RNA-seq.
scRNA-seq has the same applications as bulk RNA-seq. However, the single-cell resolution also enables cell annotation based on gene markers of clusters, that is, cell populations that machine learning has identified as being, on average, dissimilar at the transcriptomic level. Researchers can use the cell clusters to detect cell-type-specific gene expression changes between conditions such as case and control groups. Clustering can sometimes even discover entirely new cell types. Besides the cluster-level representation, the single-cell resolution also makes it possible to model cells as a trajectory that represents how the cells are related at the cell level and which dynamic differentiation process they undergo in a tissue.
This thesis introduces new computational methods for cell type identification and trajectory inference from scRNA-seq data. A new cell type identification method (ILoReg) was proposed, which enables high-resolution clustering of cells into populations with subtle transcriptomic differences. In addition, two new trajectory inference methods were developed: scShaper, which is an accurate and robust method for inferring linear trajectories; and Totem, which is a user-friendly and flexible method for inferring tree-shaped trajectories. Finally, one of the included studies benchmarked methods for detecting cell-type-specific differential states from scRNA-seq data with multiple subjects per comparison group, a setting that requires tailored methods to control false discoveries.
KEYWORDS: Single-cell RNA sequencing, transcriptome, cell type identification, trajectory inference, differential expression
Single-cell RNA sequencing is a cutting-edge technology that makes it possible to computationally determine the transcriptome, that is, the set of expressed RNA transcripts, for a group of cells at single-cell resolution. Its development was a significant step forward from conventional bulk RNA sequencing, which measures the combined signal of thousands of cells. Bulk RNA sequencing is an important tool for biomedical researchers who want to understand how cells change their gene expression as a result of a disease, differentiation, an external stimulus, or another event. Single-cell RNA sequencing has likewise developed into an important tool for researchers, and it has brought several new applications.
Single-cell RNA sequencing has the same applications as bulk RNA sequencing, but it additionally enables the identification of cells based on gene markers. Gene markers are searched for with statistical methods in cell populations that machine learning methods have identified as forming groups, or clusters, that differ from one another at the transcriptome level. Researchers can use the cell clusters to study gene expression differences within cell types, for example between diseased and healthy subjects, and sometimes clustering can even identify new cell types. Single-cell measurements also make it possible to model cells as a trajectory, which represents how cells develop dynamically from one another during processes that require gene expression.
This thesis presents new computational methods for identifying cell types and trajectories from single-cell RNA sequencing data. It presents a new cell type identification method (ILoReg), which enables the identification of cell types with subtle gene expression differences. In addition, two new trajectory inference methods were developed in the thesis: scShaper, an accurate and robust method for inferring linear trajectories, and Totem, a user-friendly and flexible method for inferring tree-shaped trajectories. Finally, the thesis compared methods for detecting gene expression differences within cell types between groups that contain several subjects or other biological replicates, which requires special methods to reduce false positive findings.
KEYWORDS: single-cell RNA sequencing, clustering, trajectory inference, gene expression
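To make the idea of a linear trajectory from the abstract above concrete, here is a minimal sketch that orders cells along a single axis. It is a deliberately simple stand-in, not scShaper or Totem: it assumes pseudotime can be approximated by projecting cells onto the first principal component, and the function name and the synthetic data are hypothetical.

```python
import numpy as np


def pca_pseudotime(expression):
    """Order cells (rows) along the first principal component, scaled to [0, 1].

    The direction of the axis is arbitrary: PCA cannot tell start from end.
    """
    X = np.asarray(expression, dtype=float)
    X = X - X.mean(axis=0)  # centre each gene (column)
    # Rows of vt are principal directions; vt[0] has the largest variance.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ vt[0]
    return (scores - scores.min()) / (scores.max() - scores.min())


# Hypothetical data: 50 cells, 3 genes changing linearly with a hidden time, plus noise.
rng = np.random.default_rng(0)
true_time = np.linspace(0.0, 1.0, 50)
direction = np.array([2.0, -1.0, 0.5])
cells = np.outer(true_time, direction) + rng.normal(scale=0.05, size=(50, 3))

pseudotime = pca_pseudotime(cells)
corr = np.corrcoef(true_time, pseudotime)[0, 1]
print(abs(corr))  # close to 1 when the trajectory really is linear
```

The absolute value is used because the sign of a principal component is arbitrary. A straight-line projection like this breaks down for curved or branching processes, which is where dedicated methods for linear and tree-shaped trajectories become necessary.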
Proceedings of the second "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST'14)
The implicit objective of the biennial "international Traveling Workshop on
Interactions between Sparse models and Technology" (iTWIST) is to foster
collaboration between international scientific teams by disseminating ideas
through both specific oral/poster presentations and free discussions. For its
second edition, the iTWIST workshop took place in the medieval and picturesque
town of Namur in Belgium, from Wednesday August 27th till Friday August 29th,
2014. The workshop was conveniently located in "The Arsenal" building within
walking distance of both hotels and town center. iTWIST'14 gathered about
70 international participants and featured 9 invited talks, 10 oral
presentations, and 14 posters on the following themes, all related to the
theory, application and generalization of the "sparsity paradigm":
Sparsity-driven data sensing and processing; Union of low dimensional
subspaces; Beyond linear and convex inverse problem; Matrix/manifold/graph
sensing/processing; Blind inverse problems and dictionary learning; Sparsity
and computational neuroscience; Information theory, geometry and randomness;
Complexity/accuracy tradeoffs in numerical methods; Sparsity? What's next?;
Sparse machine learning and inference.
Comment: 69 pages, 24 extended abstracts, iTWIST'14 website:
http://sites.google.com/site/itwist1
µMatch: 3D shape correspondence for biological image data
Modern microscopy technologies allow imaging biological objects in 3D over a wide range of spatial and temporal scales, opening the way for a quantitative assessment of morphology. However, establishing a correspondence between the objects to be compared, a necessary first step of most shape analysis workflows, remains challenging for soft-tissue objects that lack striking features to use as landmarks. To address this issue, we introduce the μMatch 3D shape correspondence pipeline. μMatch implements a state-of-the-art correspondence algorithm initially developed for computer graphics and packages it in a streamlined pipeline including tools to carry out all steps from input data pre-processing to classical shape analysis routines. Importantly, μMatch does not require any landmarks on the object surface and establishes correspondence in a fully automated manner. Our open-source method is implemented in Python and can be used to process collections of objects described as triangular meshes. We quantitatively assess the validity of μMatch on a well-known benchmark dataset and further demonstrate its reliability by reproducing published results previously obtained through manual landmarking.
Surface Shape Perception in Volumetric Stereo Displays
In complex volume visualization applications, understanding the displayed objects and their spatial relationships is challenging for several reasons. One of the most important obstacles is that these objects can be translucent and can overlap spatially, making it difficult to understand their spatial structures. However, in many applications, for example medical visualization, it is crucial to have an accurate understanding of the spatial relationships among objects. The addition of visual cues has the potential to help human perception in these visualization tasks. Descriptive line elements, in particular, have been found to be effective in conveying shape information in surface-based graphics as they sparsely cover a geometrical surface, consistently following the geometry. We present two approaches for applying such line elements to a volume rendering process and verify their effectiveness in volume-based graphics. This thesis reviews our progress to date in this area and discusses its effects and limitations. Specifically, it examines the volume renderer implementation that formed the foundation of this research, the design of the pilot study conducted to investigate the effectiveness of this technique, and the results obtained. It further discusses improvements designed to address the issues revealed by the statistical analysis. The improved approach is able to handle visualization targets with general shapes, thus making it better suited to real visualization applications involving complex objects.