
    Sparse Modeling for Image and Vision Processing

    In recent years, a large amount of multi-disciplinary research has been conducted on sparse models and their applications. In statistics and machine learning, the sparsity principle is used to perform model selection---that is, automatically selecting a simple model among a large collection of them. In signal processing, sparse coding consists of representing data with linear combinations of a few dictionary elements. Subsequently, the corresponding tools have been widely adopted by several scientific communities such as neuroscience, bioinformatics, or computer vision. The goal of this monograph is to offer a self-contained view of sparse modeling for visual recognition and image processing. More specifically, we focus on applications where the dictionary is learned and adapted to data, yielding a compact representation that has been successful in various contexts. Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics and Vision
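    The dictionary-learning formulation referred to above is usually posed as a sparsity-regularized reconstruction problem. A minimal sketch in standard notation (the symbols below are generic, not taken from the monograph): given data vectors x_1, ..., x_n, one learns a dictionary D and sparse codes alpha_i by solving

        \min_{D,\ \alpha_1,\dots,\alpha_n}\ \frac{1}{n}\sum_{i=1}^{n}\Big(\tfrac{1}{2}\,\lVert x_i - D\alpha_i\rVert_2^2 + \lambda\,\lVert \alpha_i\rVert_1\Big),

    where the l1 penalty weight lambda controls how few dictionary elements each representation uses, and the columns of D are typically constrained to have unit norm.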

    Discovering robust dependencies from data

    Science revolves around forming hypotheses, designing experiments, collecting data, and testing. It was not until recently, with the advent of modern hardware and data analytics, that science shifted towards a big-data-driven paradigm that led to unprecedented success across various fields. What is perhaps the most astounding feature of this new era is that interesting hypotheses can now be automatically discovered from observational data. This dissertation investigates knowledge discovery procedures that do exactly this. In particular, we seek algorithms that discover the most informative models able to compactly “describe” aspects of the phenomena under investigation, in both supervised and unsupervised settings. We consider interpretable models in the form of subsets of the original variable set. We want the models to capture all possible interactions (e.g., linear, non-linear) between all types of variables (e.g., discrete, continuous), and, lastly, we want their quality to be meaningfully assessed. For this, we employ information-theoretic measures: the fraction of information for the supervised setting and the normalized total correlation for the unsupervised one. The former measures the uncertainty reduction of the target variable conditioned on a model, and the latter measures the information overlap of the variables included in a model. Without access to the true underlying data-generating process, we estimate the aforementioned measures from observational data. This process is prone to statistical errors, and in our case the errors manifest as biases towards larger models. This can lead to situations where the results are utterly random, hindering further analysis. We correct this behavior with notions from statistical learning theory. In particular, we propose regularized estimators that are unbiased under the hypothesis of independence, leading to robust estimation from limited data samples and arbitrary dimensionalities. Moreover, we do this for models consisting of both discrete and continuous variables. Lastly, to discover the top-scoring models, we derive effective optimization algorithms for exact, approximate, and heuristic search. These algorithms are powered by admissible, tight, and efficient-to-compute bounding functions for our proposed estimators that can be used to greatly prune the search space. Overall, the products of this dissertation can successfully assist data analysts with data exploration, discovering powerful description models, or concluding that no satisfactory models exist, implying that new experiments and data are required for the phenomena under investigation. This statement is supported by materials science researchers who corroborated our discoveries.
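    For reference, the two measures named above have standard information-theoretic forms; the following is a generic sketch in conventional notation (the exact normalization used in the dissertation may differ). For a target variable Y and a candidate model X, the fraction of information is

        F(X; Y) = \frac{I(X; Y)}{H(Y)} = \frac{H(Y) - H(Y \mid X)}{H(Y)},

    the relative reduction in uncertainty about Y once X is known. For a variable subset X = {X_1, ..., X_k}, the total correlation is

        C(X) = \sum_{i=1}^{k} H(X_i) - H(X_1, \dots, X_k),

    which is zero when the variables are independent and is commonly normalized by an upper bound such as \sum_i H(X_i) - \max_i H(X_i) so that it lies in [0, 1].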

    A review of methods for modelling both Gaussian and Non-Gaussian longitudinal data with application.

    M. Sc., University of KwaZulu-Natal, Pietermaritzburg, 2015. The study of longitudinal data plays an integral role in medicine, epidemiology, social science, and biomedical and health sciences research, where repeated measurements are obtained over time for each individual. Generally, the interest is in the dependence of the outcome variable on the covariates. The analysis of data from longitudinal studies requires special techniques that take into account the fact that the repeated measurements within one individual are correlated. In this review, we explore modern developments in the area of linear and nonlinear generalized mixed-effects regression models and various alternatives, including generalized estimating equations for the analysis of longitudinal data, and correspondence analysis (CA). Methods are described for continuous and normally distributed variables as well as for categorical variables. We apply this theory to the analysis of complete longitudinal data from the National Institute of Environmental Health Sciences (NIEHS), focusing on the relationship between blood lead levels (PbB) and some associated covariates. The results show that placebo-treated children had a gradual decrease in blood lead level, whereas succimer-treated children had an abrupt drop in blood lead level, followed by a rebound. The mean blood lead level of the succimer-treated children after initiation of treatment was 19.14 μg/dL lower than that of the placebo-treated children. After randomization, blood lead levels had fallen by similar amounts in both chelated and placebo children, despite the immediate drops in the chelated group; there was no association between change in blood lead level and change in cognitive test score. Blood lead levels continued to fall
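    As background for the model classes reviewed above, a linear mixed-effects model for repeated measurements is conventionally written as (a generic sketch, not the specific models fitted in the thesis)

        y_{ij} = x_{ij}^{\top}\beta + z_{ij}^{\top} b_i + \varepsilon_{ij}, \qquad b_i \sim N(0, D), \quad \varepsilon_{ij} \sim N(0, \sigma^2),

    where y_{ij} is the j-th measurement on subject i, \beta holds fixed effects shared across subjects, and the subject-specific random effects b_i induce the within-subject correlation that longitudinal analyses must account for. Generalized estimating equations instead model the marginal mean g(E[y_{ij}]) = x_{ij}^{\top}\beta and handle the correlation through a working correlation structure.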

    Medical image registration and soft tissue deformation for image guided surgery system

    In parallel with the developments in imaging modalities, image-guided surgery (IGS) can now provide the surgeon with high-quality three-dimensional images depicting human anatomy. Although IGS is now in wide use in neurosurgery, there remain some limitations that must be overcome before it can be employed in more general minimally invasive procedures. In this thesis, we have developed several contributions to the field of medical image registration and brain tissue deformation modeling. From the methodology point of view, medical image registration algorithms can be classified into feature-based and intensity-based methods. One of the challenges faced by feature-based registration is to determine which specific type of feature is desired for a given task and imaging type. For this reason, a point-set registration method using both point and curve features is proposed, which has the accuracy of registration based on points and the robustness of registration based on lines or curves. We have also tackled the problem of rigid registration of multimodal images using intensity-based similarity measures. Mutual information (MI) has emerged in recent years as a popular similarity metric that is widely recognized in the field of medical image registration. Unfortunately, it ignores the spatial information contained in the images, such as edges and corners, that might be useful for registration. We introduce a new similarity metric, called the Adaptive Mutual Information (AMI) measure, which incorporates gradient spatial information. Salient pixels in regions with high gradient values contribute more to the estimation of the mutual information of the image pair being registered. Experimental results showed that our proposed method improves registration accuracy and is more robust to noisy images that deviate strongly from the reference image. Along the same direction, we further improve the technique to simultaneously use all information obtained from multiple features. Using multiple spatial features, the proposed algorithm is less sensitive to the effect of noise and some inherent variations, giving more accurate registration. Brain shift is a complex phenomenon, and there are many different causes of brain deformation. We have investigated the pattern of brain deformation with respect to location and magnitude and considered the implications of this pattern for correcting brain deformation in IGS systems. A computational finite element analysis was carried out to analyze the deformation and stress tensor experienced by the brain tissue during surgical operations. Finally, we have developed a prototype visualization and navigation platform for the interpretation of IGS. The system is based on Qt (a cross-platform GUI toolkit) and integrates VTK (an object-oriented visualization library) as the rendering kernel. With this visualization software platform, we have laid a foundation for future research to integrate brain tissue deformation modeling into the system.
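    To make the mutual-information computation concrete, the following small Python sketch estimates a gradient-weighted MI between two images. The weighting scheme here (joint-histogram weights proportional to local gradient magnitude) is a generic stand-in for illustration and is not the AMI measure defined in the thesis.

        import numpy as np

        def gradient_weighted_mi(a, b, bins=32, eps=1e-12):
            # Illustrative gradient-weighted mutual information between images a and b.
            # Pixels with larger combined gradient magnitude receive larger weights in
            # the joint histogram, so salient structures (edges, corners) contribute more.
            a = np.asarray(a, dtype=float)
            b = np.asarray(b, dtype=float)

            # Per-pixel saliency: sum of the two gradient magnitudes.
            gy_a, gx_a = np.gradient(a)
            gy_b, gx_b = np.gradient(b)
            w = np.hypot(gx_a, gy_a) + np.hypot(gx_b, gy_b)
            w = w / (w.sum() + eps)

            # Weighted joint histogram over intensity pairs, then marginals.
            hist, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins, weights=w.ravel())
            p_ab = hist / (hist.sum() + eps)
            p_a = p_ab.sum(axis=1, keepdims=True)
            p_b = p_ab.sum(axis=0, keepdims=True)

            # MI = sum over bins of p(a,b) * log(p(a,b) / (p(a) p(b))), skipping empty bins.
            nz = p_ab > 0
            return float(np.sum(p_ab[nz] * np.log(p_ab[nz] / (p_a @ p_b)[nz])))

        # The metric should be larger for aligned images than for misaligned ones.
        img = np.random.rand(64, 64)
        print(gradient_weighted_mi(img, img))
        print(gradient_weighted_mi(img, np.roll(img, 8, axis=0)))

    For registration, such a metric would be evaluated inside an optimization loop over transformation parameters, choosing the transform that maximizes it.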

    Stroke-related alterations in inter-areal communication

    Beyond causing local ischemia and cell damage at the site of injury, stroke strongly affects long-range anatomical connections, perturbing the functional organization of brain networks. Several studies have reported functional connectivity (FC) abnormalities paralleling both behavioral deficits and functional recovery across different cognitive domains. These FC alterations suggest that long-range communication in the brain is altered after stroke. However, standard FC analyses cannot reveal the directionality and time scale of inter-areal information transfer. We used resting-state fMRI and covariance-based Granger causality analysis to quantify network-level information transfer and its alteration in stroke. Two main large-scale anomalies were observed in stroke patients. First, inter-hemispheric information transfer was significantly decreased with respect to healthy controls. Second, stroke caused inter-hemispheric asymmetries, as information transfer within the affected hemisphere and from the affected to the intact hemisphere was significantly reduced. Both anomalies were more prominent in resting-state networks related to attention and language, and they correlated with impaired performance in several behavioral domains. Overall, our findings support the hypothesis that stroke provokes asymmetries between the affected and spared hemispheres, with different functional consequences depending on which hemisphere is lesioned.
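    For readers unfamiliar with the measure, Granger causality quantifies directed influence by comparing prediction errors; a standard bivariate sketch (generic notation, not the covariance-based estimator used in this study) is

        GC_{X \to Y} = \ln \frac{\operatorname{var}\!\left(y_t - \hat{y}_t \mid y_{t-1}, y_{t-2}, \dots\right)}{\operatorname{var}\!\left(y_t - \hat{y}_t \mid y_{t-1}, \dots, x_{t-1}, \dots\right)},

    which is positive when the past of X improves the prediction of Y beyond Y's own past. This is what provides the directionality that undirected FC correlations cannot capture.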

    Compression of DNA sequencing data

    With the release of the latest generations of sequencing machines, the cost of sequencing a whole human genome has dropped to less than US$1,000. The potential applications in several fields lead to the forecast that the amount of DNA sequencing data will soon surpass the volume of other types of data, such as video data. In this dissertation, we present novel data compression technologies with the aim of enhancing the storage, transmission, and processing of DNA sequencing data. The first contribution in this dissertation is a method for the compression of aligned reads, i.e., read-out sequence fragments that have been aligned to a reference sequence. The method improves compression by implicitly assembling local parts of the underlying sequences. Compared to the state of the art, our method achieves the best trade-off between memory usage and compressed size. Our second contribution is a method for the quantization and compression of quality scores, i.e., values that quantify the error probability of each read-out base. Specifically, we propose two Bayesian models that are used to precisely control the quantization. With our method it is possible to compress the data down to 0.15 bits per quality score. Notably, we can recommend a particular parametrization for one of our models which, by removing noise from the data as a side effect, does not lead to any degradation in the distortion metric. This parametrization achieves an average rate of 0.45 bits per quality score. The third contribution is the first implementation of an entropy codec compliant with MPEG-G. We show that, compared to the state of the art, our method achieves the best compression ranks on average, and that adding our method to CRAM would be beneficial in terms of both achievable compression and speed. Finally, we provide an overview of the standardization landscape, and in particular of MPEG-G, into which our contributions have been integrated.
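    To give a sense of the rate-distortion trade-off discussed above, the toy Python sketch below maps Phred quality scores to a few representative levels and reports the empirical rate in bits per score. The simple Lloyd-style quantizer and the synthetic data are purely illustrative; they are not the Bayesian models or the MPEG-G entropy codec described in the dissertation.

        import numpy as np

        def quantize_quality_scores(q, levels=4, iters=20):
            # Toy Lloyd (k-means style) quantizer for Phred quality scores.
            q = np.asarray(q, dtype=float)
            reps = np.linspace(q.min(), q.max(), levels)      # initial representative levels
            for _ in range(iters):
                idx = np.abs(q[:, None] - reps[None, :]).argmin(axis=1)  # nearest level
                for k in range(levels):
                    if np.any(idx == k):
                        reps[k] = q[idx == k].mean()           # move level to cluster mean
            return reps[idx], idx

        def empirical_rate_bits(idx):
            # Entropy of the quantization indices: a lower bound on bits per score.
            counts = np.bincount(idx)
            p = counts[counts > 0] / counts.sum()
            return float(-(p * np.log2(p)).sum())

        # Synthetic scores stand in for real FASTQ/SAM quality values.
        rng = np.random.default_rng(0)
        q = np.clip(rng.normal(35, 5, size=10_000).round(), 2, 41)
        q_hat, idx = quantize_quality_scores(q, levels=4)
        print("distortion (MSE):", float(np.mean((q - q_hat) ** 2)))
        print("rate (bits/score):", empirical_rate_bits(idx))

    Fewer levels lower the rate but increase the distortion; in the dissertation, the two Bayesian models are what control this quantization precisely.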