Sparse Modeling for Image and Vision Processing
In recent years, a large amount of multi-disciplinary research has been
conducted on sparse models and their applications. In statistics and machine
learning, the sparsity principle is used to perform model selection---that is,
automatically selecting a simple model among a large collection of them. In
signal processing, sparse coding consists of representing data with linear
combinations of a few dictionary elements. Subsequently, the corresponding
tools have been widely adopted by several scientific communities such as
neuroscience, bioinformatics, or computer vision. The goal of this monograph is
to offer a self-contained view of sparse modeling for visual recognition and
image processing. More specifically, we focus on applications where the
dictionary is learned and adapted to data, yielding a compact representation
that has been successful in various contexts.
Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics
and Vision
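As an illustration of the sparse coding idea described above (representing a signal as a linear combination of a few dictionary elements), here is a minimal sketch using greedy matching pursuit. The choice of matching pursuit, the function name `matching_pursuit`, and the tiny fixed dictionary are assumptions made for illustration; the monograph itself concerns dictionaries that are learned from data, which this sketch does not attempt.

```python
import math


def matching_pursuit(signal, dictionary, n_nonzero):
    """Greedy sparse coding of `signal` over unit-norm atoms (hypothetical helper).

    Returns (coefficients, residual); at most `n_nonzero` greedy steps are taken.
    """
    residual = list(signal)
    coeffs = [0.0] * len(dictionary)
    for _ in range(n_nonzero):
        # Inner product of every atom with the current residual.
        scores = [sum(a * r for a, r in zip(atom, residual)) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda j: abs(scores[j]))
        if abs(scores[best]) < 1e-12:
            break  # nothing left to explain
        coeffs[best] += scores[best]
        # Remove the selected atom's contribution from the residual.
        residual = [r - scores[best] * a for r, a in zip(residual, dictionary[best])]
    return coeffs, residual


# Illustrative dictionary: three axis-aligned atoms plus one diagonal atom.
s = 1 / math.sqrt(2)
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [s, s, 0.0]]
coeffs, residual = matching_pursuit([2.0, 0.0, 3.0], atoms, n_nonzero=2)
```

Here the signal is reconstructed from two of the four atoms, so the code has two nonzero coefficients and a zero residual.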
Applied and Computational Statistics
Research without statistics is like water in the sand; the latter is necessary to reap the benefits of the former. This collection of articles is designed to bring together different approaches to applied statistics. The studies presented in this book are a small sample of what applied statistics means and of how statistical methods prove useful in different fields of research, from theoretical frameworks to practical applications such as genetics, computational chemistry, and experimental design. This book presents several applications of statistics: a new continuous distribution with five parameters, the modified beta Gompertz distribution; a method to calculate the p-value associated with the Anderson–Darling statistic; an approach to repeated measurement designs; a validated model to predict statement mutations score; and a new family of structural descriptors, called the extending characteristic polynomial (EChP) family, used to express the link between the structure of a compound and its properties. This collection brings together authors from Europe and Asia, each making a specific contribution to knowledge of theoretical and applied statistics.
Appropriate, accessible and appealing probabilistic graphical models
Appropriate - Many multivariate probabilistic models either use independent distributions or dependent Gaussian distributions. Yet, many real-world datasets contain count-valued or non-negative skewed data, e.g. bag-of-words text data and biological sequencing data. Thus, we develop novel probabilistic graphical models for use on count-valued and non-negative data including Poisson graphical models and multinomial graphical models. We develop one generalization that allows for triple-wise or k-wise graphical models going beyond the normal pairwise formulation. Furthermore, we also explore Gaussian-copula graphical models and derive closed-form solutions for the conditional distributions and marginal distributions (both before and after conditioning). Finally, we derive mixture and admixture, or topic model, generalizations of these graphical models to introduce more power and interpretability.
Accessible - Previous multivariate models, especially related to text data, often have complex dependencies without a closed form and require complex inference algorithms that have limited theoretical justification. For example, hierarchical Bayesian models often require marginalizing over many latent variables. We show that our novel graphical models (even the k-wise interaction models) have simple and intuitive estimation procedures based on node-wise regressions that likely have similar theoretical guarantees as previous work in graphical models. For the copula-based graphical models, we show that simple approximations could still provide useful models; these copula models also come with closed-form conditional and marginal distributions, which make them amenable to exploratory inspection and manipulation. The parameters of these models are easy to interpret and thus may be accessible to a wide audience.
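To make the idea of estimation by node-wise regression concrete, the following is a simplified sketch: one count-valued node is regressed on the remaining nodes with a Poisson regression, and the fitted weights are read as candidate edge strengths. The function names, the plain gradient-ascent fit, the absence of a sparsity penalty, and the toy data are all assumptions for illustration, not the procedure used in the thesis.

```python
import math


def poisson_regression(X, y, step=0.1, n_iter=5000):
    """Fit log E[y | x] = intercept + sum_j weights[j] * x[j] by gradient ascent."""
    n_features = len(X[0])
    intercept, weights = 0.0, [0.0] * n_features
    for _ in range(n_iter):
        grad_intercept, grad_weights = 0.0, [0.0] * n_features
        for row, target in zip(X, y):
            mean = math.exp(intercept + sum(w * v for w, v in zip(weights, row)))
            err = target - mean  # gradient of the Poisson log-likelihood
            grad_intercept += err
            for j, v in enumerate(row):
                grad_weights[j] += err * v
        intercept += step * grad_intercept / len(y)
        weights = [w + step * g / len(y) for w, g in zip(weights, grad_weights)]
    return intercept, weights


def nodewise_fit(data, node):
    """Regress column `node` on all other columns (hypothetical helper)."""
    X = [[v for j, v in enumerate(row) if j != node] for row in data]
    y = [row[node] for row in data]
    return poisson_regression(X, y)


# Toy count data with two variables; node 1 is larger whenever node 0 equals 1.
data = [[0, 1], [0, 1], [1, 3], [1, 3]]
intercept, weights = nodewise_fit(data, node=1)
```

A full node-wise procedure would repeat the fit for every node and typically add a sparsity-inducing penalty so that many weights are exactly zero; both steps are omitted here to keep the sketch short.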
Appealing - High-level visualization and interpretation of graphical models with even 100 variables has often been difficult even for a graphical model expert---despite visualization being one of the original motivators for graphical models. This difficulty is likely due to the lack of collaboration between graphical model experts and visualization experts. To begin bridging this gap, we develop a novel "what if?" interaction that manipulates and leverages the probabilistic power of graphical models. Our approach defines: the probabilistic mechanism via conditional probability; the query language to map text input to a conditional probability query; and the formal underlying probabilistic model. We then propose to visualize these query-specific probabilistic graphical models by combining the intuitiveness of force-directed layouts with the beauty and readability of word clouds, which pack many words into valuable screen space while ensuring words do not overlap via pixel-level collision detection. Although both the force-directed layout and the pixel-level packing problems are challenging in their own right, we approximate both simultaneously via adaptive simulated annealing starting from careful initialization. For visualizing mixture distributions, we also design a meaningful mapping from the properties of the mixture distribution to a color in the perceptually uniform CIELUV color space. Finally, we demonstrate our approach via illustrative visualizations of several real-world datasets.
Computer Science
Gaussian latent tree model constraints for linguistics and other applications
The relationships between languages are often modelled as phylogenetic trees whereby there is a single shared ancestral language at the root and contemporary languages appear as leaves. These can be thought of as directed acyclic graphs with hidden variables, specifically Bayesian networks. However, from a statistical perspective there is often no formal assessment of the suitability of these latent tree models. A lot of the work that seeks to address this has focused on discrete variable models. However, when observations are instead considered as functional data, the high dimensional approximations are often better considered in a Gaussian context. The high dimensional data is often inefficiently stored and so the first challenge is to project this data to a low dimension while retaining the information of interest. One approach is to use the newly developed tool named separable-canonical variate analysis to form a basis.
Extending the techniques for assessing latent tree model compatibility beyond discrete variables, the complete set of Gaussian tree constraints is derived for the first time. This set comprises equations and inequality statements in terms of correlations of observed variables. These statements must in theory be adhered to for a Gaussian latent tree model to be appropriate for a given data set. Using the separable-canonical variate analysis basis to obtain a truncated representation, the suitability of a phylogenetic tree can then be plainly assessed. However, in practice it is desirable to allow for some sampling error and as such probabilistic tools are developed alongside the theoretical derivation of Gaussian tree constraints.
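As a small illustration of what correlation constraints of this kind look like, consider the simplest case of three observed Gaussian variables sharing a single hidden parent. If each observed variable has correlation a_i with the hidden variable, then the pairwise correlations are products a_i a_j, which forces the product of the three correlations to be positive and each correlation to be at least as large in magnitude as the product of the other two. The sketch below checks only this three-leaf case under the assumption of nonzero population correlations; the function name is hypothetical, and the complete constraint set and the treatment of sampling error are those derived in the thesis, not reproduced here.

```python
def star_tree_compatible(r12, r13, r23, tol=1e-12):
    """Check the three-leaf, single-hidden-parent correlation constraints.

    Assumes exact (population) correlations that are all nonzero.
    """
    if min(abs(r12), abs(r13), abs(r23)) <= tol:
        raise ValueError("this sketch assumes nonzero correlations")
    # r12 * r13 * r23 = (a1 * a2 * a3) ** 2 must be positive.
    sign_ok = r12 * r13 * r23 > 0
    # a_i ** 2 = r_ij * r_ik / r_jk must not exceed 1 for each leaf i.
    bounds_ok = (
        abs(r12 * r13) <= abs(r23) + tol
        and abs(r12 * r23) <= abs(r13) + tol
        and abs(r13 * r23) <= abs(r12) + tol
    )
    return sign_ok and bounds_ok


# Correlations generated from hidden-variable loadings 0.9, 0.8 and 0.5.
compatible = star_tree_compatible(0.9 * 0.8, 0.9 * 0.5, 0.8 * 0.5)
# Two strong correlations with a very weak third violate an inequality.
too_weak = star_tree_compatible(0.9, 0.9, 0.1)
# A negative product of correlations violates the sign condition.
wrong_sign = star_tree_compatible(0.5, 0.5, -0.5)
```

With estimated rather than population correlations, a hard check like this would be too strict, which is why probabilistic tools are needed in practice.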
The proposed methodology is implemented in an in-depth study of a real linguistic data set to assess the phylogenies of five Romance languages. This application is distinctive because the data set consists of acoustic recordings, which are treated as functional data and then used to compare languages in a phylogenetic context. As a consequence, a wide range of theory and tools is called upon from the multivariate and functional domains, and the powerful new separable-canonical function analysis and separable-canonical variate analysis are used. Utilising the newly derived Gaussian tree constraints for hidden variable models provides a first insight into features of spoken languages that appear to be tree-compatible.
On the intelligent management of sepsis in the intensive care unit
The management of the Intensive Care Unit (ICU) in a hospital has its own, very specific requirements that involve, amongst
others, issues of risk-adjusted mortality and average length of stay; nurse turnover and communication with physicians; technical
quality of care; the ability to meet the needs of patients' families; and the avoidance of medical error due to rapidly changing circumstances and work
overload. In the end, good ICU management should lead to an improvement in patient outcomes.
Decision making in the ICU environment is a real-time challenge governed by very tight guidelines, which relate to
often complex and sensitive research ethics issues. Clinicians in this context must act upon as much available information as
possible, and could therefore, in general, benefit from at least partially automated computer-based decision support based on
qualitative and quantitative information. Those taking executive decisions at ICUs will require methods that are not only reliable,
but also, and this is a key issue, readily interpretable. Otherwise, any decision tool, regardless of its sophistication and accuracy,
risks being rendered useless.
This thesis addresses this need through the design and development of computer-based decision making tools to assist clinicians at
the ICU. It focuses on one of the main problems that they must face: the management of Sepsis. Sepsis is one of
the main causes of death for non-coronary ICU patients. Its mortality rate can reach almost one in two patients for
septic shock, its most acute manifestation. It is a transversal condition affecting people of all ages. Surprisingly, its definition was
only standardized two decades ago, as a systemic inflammatory response syndrome with confirmed infection.
The research reported in this document deals with the problem of Sepsis data analysis in general and, more specifically, with the
problem of survival prediction for patients affected with Severe Sepsis. The tools at the core of the investigated data analysis
procedures stem from the fields of multivariate and algebraic statistics, algebraic geometry, machine learning and computational
intelligence.
Beyond data analysis itself, the current thesis makes contributions from a clinical point of view, as it contributes substantial
evidence to the debate about the impact of the preadmission use of statin drugs on ICU outcome. It also sheds light on the
dependence between Septic Shock and Multiple Organ Dysfunction Syndrome. Moreover, it defines a latent set of Sepsis
descriptors to be used as prognostic factors for the prediction of mortality and achieves an improvement in predictive capability
over indicators currently in use.
- …