442 research outputs found

    A Bayesian nonparametric approach for the analysis of multiple categorical item responses

    We develop a modeling framework for joint factor and cluster analysis of datasets in which multiple categorical response items are collected on a heterogeneous population of individuals. We introduce a latent factor multinomial probit model and employ prior constructions that allow inference on the number of factors as well as clustering of the subjects into homogeneous groups according to their relevant factors. Clustering, in particular, allows us to borrow strength across subjects, which helps in estimating the model parameters, particularly when the number of observations is small. We employ Markov chain Monte Carlo techniques and obtain tractable posterior inference for our objectives, including sampling of missing data. We demonstrate the effectiveness of our method on simulated data. We also analyze two real-world educational datasets and show that our method outperforms state-of-the-art methods. In the analysis of the real-world data, we uncover hidden relationships between the questions and the underlying educational concepts, while simultaneously partitioning the students into groups of similar educational mastery.
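The abstract does not name its clustering prior. A common Bayesian nonparametric device for grouping subjects when the number of groups is unknown is the Dirichlet process, whose induced prior on partitions is the Chinese restaurant process (CRP). The sketch below draws a partition from a CRP to show how such a prior lets the number of clusters grow with the data; it is a generic illustration under that assumption, not the authors' construction, and `crp_partition` and the concentration parameter `alpha` are hypothetical names.

```python
import random

def crp_partition(n, alpha, seed=None):
    """Draw cluster labels for n subjects from a Chinese restaurant process.

    Hypothetical helper: subject i joins existing cluster k with probability
    n_k / (i + alpha) or opens a new cluster with probability alpha / (i + alpha),
    where i is the number of subjects already assigned.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)
    labels = []
    counts = []  # counts[k] = number of subjects currently in cluster k
    for _ in range(n):
        # Unnormalised weights: existing cluster sizes, then alpha for a new cluster.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels
```

Larger `alpha` favours more, smaller clusters; as `alpha` approaches zero nearly every subject lands in a single cluster.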

    Bayesian nonparametric clusterings in relational and high-dimensional settings with applications in bioinformatics.

    Recent advances in high-throughput methodologies offer researchers the ability to understand complex systems via high-dimensional and multi-relational data. One example is the realm of molecular biology, where disparate data (such as gene sequence, gene expression, and interaction information) are available for various snapshots of biological systems. This type of high-dimensional and multi-relational data allows unprecedentedly detailed analysis, but also presents challenges in accounting for all the variability. High-dimensional data often have a multitude of underlying relationships, each represented by a separate clustering structure, where the number of structures is typically unknown a priori. To address the challenges faced by traditional clustering methods on high-dimensional and multi-relational data, we developed three feature selection and cross-clustering methods: 1) the infinite relational model with feature selection (FIRM), which incorporates the rich information of multi-relational data; 2) Bayesian Hierarchical Cross-Clustering (BHCC), a deterministic approximation to the Cross Dirichlet Process mixture (CDPM) and to cross-clustering; and 3) a randomized approximation (RBHCC), based on a truncated hierarchy. An extension of BHCC, Bayesian Congruence Measuring (BCM), is proposed to measure incongruence between genes and to identify sets of congruent loci with identical evolutionary histories. We adapt our BHCC algorithm to inference in BCM, where the intended structure of each view (congruent loci) represents consistent evolutionary processes. We consider an application of FIRM to categorizing mRNA and microRNA. The model uses latent structures to encode the expression pattern and the gene ontology annotations. 
We also apply FIRM to recover the categories of ligands and proteins and to predict unknown drug-target interactions, where the latent categorization structure encodes drug-target interactions, chemical compound similarity, and amino acid sequence similarity. BHCC and RBHCC are shown to have improved predictive performance (both in terms of cluster membership and missing value prediction) compared to traditional clustering methods. Our results suggest that these novel approaches to integrating multi-relational information have a promising future in the biological sciences, where incorporating data related to varying features is often regarded as a daunting task.

    The Applications of Mixtures of Normal Distributions in Empirical Finance: A Selected Survey

    This paper provides a selected review of recent developments and applications of mixture-of-normals (MN) distribution models in empirical finance. One attractive property of the MN model is that it is flexible enough to accommodate various shapes of continuous distributions and to capture the leptokurtic, skewed, and multimodal characteristics of financial time series data. In addition, MN-based analysis fits well with the related regime-switching literature. The survey is conducted under two broad themes: (1) minimum-distance estimation methods, and (2) financial modeling and its applications.
    Keywords: Mixtures of Normal, Maximum Likelihood, Moment Generating Function, Characteristic Function, Switching Regression Model, (G)ARCH Model, Stochastic Volatility Model, Autoregressive Conditional Duration Model, Stochastic Duration Model, Value at Risk.
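The claim that a mixture of normals can be leptokurtic and skewed even though each component is symmetric with zero excess kurtosis can be checked directly from moments. The sketch below mixes the standard raw moments of each normal component and converts them to skewness and excess kurtosis; `mixture_moments` is a hypothetical helper, and the parameter values in the usage note are illustrative only.

```python
def mixture_moments(weights, means, sds):
    """Mean, variance, skewness and excess kurtosis of a finite mixture of normals.

    Hypothetical helper. Uses the raw moments of N(mu, s^2):
    E[X] = mu, E[X^2] = mu^2 + s^2, E[X^3] = mu^3 + 3 mu s^2,
    E[X^4] = mu^4 + 6 mu^2 s^2 + 3 s^4, weighted by the mixing proportions.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    e1 = e2 = e3 = e4 = 0.0
    for w, mu, s in zip(weights, means, sds):
        v = s * s
        e1 += w * mu
        e2 += w * (mu**2 + v)
        e3 += w * (mu**3 + 3 * mu * v)
        e4 += w * (mu**4 + 6 * mu**2 * v + 3 * v**2)
    # Central moments about the mixture mean e1.
    m2 = e2 - e1**2
    m3 = e3 - 3 * e1 * e2 + 2 * e1**3
    m4 = e4 - 4 * e1 * e3 + 6 * e1**2 * e2 - 3 * e1**4
    skewness = m3 / m2**1.5
    excess_kurtosis = m4 / m2**2 - 3.0
    return e1, m2, skewness, excess_kurtosis
```

For example, an equal-weight mix of N(0, 1) and N(0, 4) has zero skewness but an excess kurtosis of about 1.08 (fat tails), while giving a small weight to a component with a negative mean, such as weights `[0.9, 0.1]` and means `[0, -3]`, produces negative skewness.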

    Applications of Vine Copulas in Commodity Risk Management and Price Analysis

    This dissertation consists of three studies that focus on applications of vine copulas, a relatively new class of multivariate copula models, in commodity risk management and price analysis. The first study proposes a vine copula approach to estimate multiproduct hedge ratios that minimize the risk of refining margin erosion, the downside risk facing a typical oil refinery whose profit depends heavily on its refining margin, that is, the difference between the prices of its refined products and the cost of crude oil. The out-of-sample hedging effectiveness of two popular classes of vine copula models, canonical (C-) and drawable (D-) vine copula models, is evaluated and compared with that of a widely used nonparametric method and three standard multivariate copula models. The empirical results suggest that the D-vine copula model is a good and safe choice for managing the refinery's downside risk. The second study explores whether modeling heterogeneous dependence structures between different pairs of energy commodity returns with vine copulas improves one-step-ahead density forecasts of these returns. The value of modeling heterogeneous dependence structures is measured by comparing the performance of density forecasts based on vine copulas with density forecasts based on standard copulas that assume homogeneous dependence structures. The empirical results suggest that modeling heterogeneous dependence structures using vine copulas does not improve the quality of multivariate density forecasts of energy commodity returns. The third study applies a vine copula approach to analyze the dependence structure and tail dependence patterns among daily prices of three agricultural commodities (corn, soybean, and wheat) and two energy commodities (ethanol and crude oil) from June 2006 to June 2016. Our findings suggest that the prices of corn and crude oil are linked through the ethanol market. 
We also find that crude oil and agricultural commodity prices are statistically dependent during extreme market downturns but independent during extreme market upturns. Moreover, the results from our sub-sample analysis show that both the upper and lower tail dependence between crude oil and other commodity markets became weaker in recent years, as the ethanol market matured.
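"Dependent in downturns but independent in upturns" refers to asymmetric tail dependence. The thesis measures it through fitted vine copulas; a simpler, model-free way to see the idea is the empirical tail-dependence estimator on rank-transformed data, which counts how often both series are jointly in their lower (or upper) q-tail. This is a swapped-in nonparametric estimator, not the vine copula method of the study, and the function names and the default threshold `q = 0.1` are assumptions.

```python
def pseudo_observations(xs):
    """Rank-transform a sample to (0, 1) via u_i = rank_i / (n + 1). Assumes no ties."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return [r / (n + 1) for r in ranks]

def empirical_tail_dependence(xs, ys, q=0.1):
    """Empirical lower/upper tail-dependence estimates at threshold q (hypothetical helper).

    lower ~ P(U <= q, V <= q) / q       (joint extreme downturns)
    upper ~ P(U > 1 - q, V > 1 - q) / q (joint extreme upturns)
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal length")
    u, v = pseudo_observations(xs), pseudo_observations(ys)
    n = len(u)
    lower = sum(1 for a, b in zip(u, v) if a <= q and b <= q) / (n * q)
    upper = sum(1 for a, b in zip(u, v) if a > 1 - q and b > 1 - q) / (n * q)
    return lower, upper
```

A value near 1 means the two series almost always hit their extremes together; a value near 0 means joint extremes are rare. In practice the estimate is noisy for small q, which is one reason parametric copula models are used instead.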

    Understanding and targeting network-level sheddase regulation in invasive disease

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Biological Engineering, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (p. 197-212).
    Regulated cell-surface proteolysis underpins key processes of cellular growth and motility in both physiological and pathological contexts. However, comprehending how multiple proteolytic events cohesively integrate to yield context-dependent cellular behavior remains a challenge in the fields of both protease biology and systems biology in general. This work begins to address that challenge by quantitatively investigating the integrated effect of multiple diverse proteolytic events and their interaction with cell-signaling pathways from a computational network perspective, particularly focusing on A Disintegrin and Metalloproteinases (ADAMs). ADAMs have been studied for decades as the principal cell-surface "sheddases" responsible for cleaving growth factor ligands and receptor tyrosine kinase ectodomains from the cell surface. However, activity regulation, feedback, and catalytic promiscuity impede our understanding of context-dependent sheddase function, and clinical trials targeting metalloproteinases in cancer have failed in part due to a poor understanding of the complex functions they mediate. This thesis outlines a conceptual framework for studying protease network biology (Chapter 1), describes novel experimental methods designed for such a framework (Chapters 2-3), and applies both to understand protease regulation in invasive disease (Chapter 4). Using combined measurement and computational modeling, we present a paradigm for monitoring and analyzing complex networks of protease activities that interface with signaling pathways to influence cellular migration in the invasive diseases of cancer and endometriosis. We find sheddase activity integrates with signaling pathways to direct cell migration, especially through concomitant proteolysis of both ligands and receptors. 
We find that indirect reduction of sheddase activity through kinase inhibition can lead to an accumulation of growth-factor receptors on the cell surface, consequently producing undesired compensatory signaling feedback. Thus, here we present a novel mechanism of rapid, protease-driven resistance to kinase inhibitors, and we subsequently demonstrate strategies for overcoming resistance through drug combinations. We develop a novel microfluidic platform to study protease activities in clinical samples, and apply the technology to study the peritoneal fluid from endometriosis patients. Results indicate joint dysregulation of sheddase activity with disease. Overall, this work provides a model for measuring, understanding, and targeting networks of proteases and the kinases with which they interact.
By Miles Aaron Miller. Ph.D.

    Robust density modelling using the Student's t-distribution for human action recognition

    The extraction of human features from videos is often inaccurate and prone to outliers. Such outliers can severely affect density modelling when the Gaussian distribution is used as the model, since the Gaussian is highly sensitive to outliers. The Gaussian distribution is also often used as the base component of graphical models for recognising human actions in videos (hidden Markov models and others), and the presence of outliers can significantly degrade recognition accuracy. In contrast, the Student's t-distribution is more robust to outliers and can be exploited to improve the recognition rate in the presence of abnormal data. In this paper, we present an HMM that uses mixtures of t-distributions as observation probabilities, and show that experiments on two well-known datasets (Weizmann, MuHAVi) yield a remarkable improvement in classification accuracy. © 2011 IEEE
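The robustness argument can be made concrete in one dimension. Fitting a Student's t by EM assigns each point a weight (nu + 1) / (nu + r^2 / sigma^2), where r is its distance from the current location, so an outlier's weight shrinks instead of dragging the estimate as it does for the Gaussian (whose location estimate is simply the sample mean). The sketch below fits a single t component with fixed degrees of freedom; the paper uses mixtures of t-distributions inside an HMM, so this shows only the building block, and `fit_student_t_location`, `nu = 3`, and the iteration count are assumptions.

```python
import statistics

def fit_student_t_location(xs, nu=3.0, iters=200):
    """EM estimates of location and squared scale for a univariate Student's t, nu fixed.

    Hypothetical helper. E-step: weight each point by (nu + 1) / (nu + r^2 / sigma^2).
    M-step: weighted mean for the location, weighted mean square for the scale.
    """
    mu = statistics.fmean(xs)                      # start from the Gaussian MLE
    sigma2 = max(statistics.pvariance(xs), 1e-12)  # guard against zero variance
    n = len(xs)
    for _ in range(iters):
        # E-step: points far from mu (relative to sigma) get small weights.
        w = [(nu + 1) / (nu + (x - mu) ** 2 / sigma2) for x in xs]
        # M-step: update location, then scale using the new location.
        mu = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
        sigma2 = max(sum(wi * (x - mu) ** 2 for wi, x in zip(w, xs)) / n, 1e-12)
    return mu, sigma2
```

With five observations clustered around 0 and one outlier at 50, the sample mean is above 8, while the t location estimate stays close to 0 because the outlier's weight collapses as the fitted scale shrinks.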

    Multi-stream Longitudinal Data Analysis using Deep Learning

    Longitudinal healthcare data covers all settings in which patient information is collected at multiple follow-up times. Analyzing these data is critical to addressing many real-world healthcare problems, such as disease prediction and prevention. In this thesis, technical challenges in analyzing longitudinal administrative claims data are addressed, and novel deep-learning-based models are proposed for multi-stream data analysis and disease prediction tasks. These algorithms and frameworks are assessed mainly on substance use disorder prediction tasks and are specifically designed to tackle these disorders. Substance use disorder is a public health crisis costing the US an estimated $740 billion annually in healthcare, lost workplace productivity, and crime. Early identification and engagement of individuals at risk of developing a substance use disorder is a critical unmet need in healthcare, which could be met by automatic, artificial-intelligence-based tools trained on big healthcare data. Indeed, healthcare data can be harnessed together with artificial intelligence and machine learning to advance our understanding of the factors that increase the propensity for developing different diseases, as well as those that aid in treating these disorders. Herein, a disease prediction framework based on recurrent neural networks is first proposed. This framework includes three components: 1) data pre-processing, 2) disease prediction using long short-term memory (LSTM) models, and 3) hypothesis exploration by varying the models and the inputs. This framework is assessed on two use cases: substance use disorder prediction and mild cognitive impairment prediction. Experimental results show that the proposed model can efficiently analyze patients' data and yields efficient disease prediction tools. 
Second, the limitations of current deep learning models, including long short-term memory models, in claims data analysis are identified and addressed, and a novel transformer-based model is proposed. Specifically, leveraging real-world longitudinal claims data, a multi-stream transformer model is proposed for predicting opioid use disorder, an important case of substance use disorder. This model is designed to analyze multiple types of data streams simultaneously, such as medications, diagnoses, procedures, and demographics, by attending to segments within and across these data streams. Tested on IBM MarketScan data, the proposed model showed significantly better performance than traditional models and recently developed deep learning models.
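"Attending to segments within and across data streams" rests on the attention operation of transformer models. The sketch below is the generic scaled dot-product attention, softmax(q . k / sqrt(d)) applied as weights over value vectors, in plain Python; it is not the thesis's multi-stream architecture, whose segment construction and embeddings the abstract does not describe, and the function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """One output vector per query: a softmax(q . k / sqrt(d))-weighted sum of value vectors."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs
```

As an illustrative assumption, embedding vectors for a patient's diagnosis segments and medication segments could be concatenated into one `keys`/`values` list, so that a query from the diagnosis stream draws on both its own stream and the medication stream.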

    Analytical fusion of multimodal magnetic resonance imaging to identify pathological states in genetically selected Marchigian Sardinian alcohol-preferring (msP) rats

    [EN] Alcohol abuse is one of the most alarming issues for health authorities. It is estimated that at least 23 million European citizens are affected by alcoholism, at a cost of around 270 million euros. Excessive alcohol consumption is related to physical harm and, although it damages most body organs, the liver, pancreas, and brain are the most severely affected. Alcohol-related disorders involve not only physical harm: other psychiatric disorders such as depression are often comorbid. Alcohol is also involved in many violent behaviors and traffic injuries. Altogether, this reflects the high complexity of alcohol-related disorders and suggests the involvement of multiple brain systems. With the emergence of non-invasive diagnostic techniques such as neuroimaging and EEG, many neurobiological factors have been shown to be fundamental in the acquisition and maintenance of addictive behaviors, in relapse risk, and in the validity of available treatment alternatives. Alterations in brain structure and function reflected in non-invasive imaging studies have been repeatedly investigated. However, it is not clear to what extent imaging measures can precisely characterize and differentiate pathological stages of the disease, which is often accompanied by other pathologies. The use of animal models has elucidated the role of neurobiological mechanisms paralleling alcohol misuse. Thus, combining animal research with non-invasive neuroimaging studies is a key tool for advancing the understanding of the disorder. As the volume of data of very diverse nature available in clinical and research settings increases, datasets and methodologies must be integrated to explore the multidimensional aspects of psychiatric disorders. Complementing conventional mass-univariate statistics, interest in the predictive power of statistical machine learning applied to neuroimaging data is currently growing in the scientific community. 
This doctoral thesis covers most of the aspects mentioned above. Starting from a well-established animal model in alcohol research, the Marchigian Sardinian rat, we performed multimodal neuroimaging studies at several stages of the experimental design: the etiological mechanisms modulating high alcohol consumption (compared with Wistar control rats), alcohol consumption, and treatment with the opioid antagonist Naltrexone, a well-established drug in clinical practice but one with a heterogeneous response. Multimodal magnetic resonance imaging acquisition included Diffusion Tensor Imaging (DTI), structural imaging, and the computation of relaxometry maps. We designed an analytical framework based on two algorithms widely used in the neuroimaging field, Random Forest and Support Vector Machine, combined in a wrapper fashion. The approach was applied to the same dataset with two different aims: assessing its ability to discriminate experimental stages at the subject level, and building predictive models at the voxel level to identify key anatomical regions modified over the course of the experiment. As expected, combining multiple magnetic resonance imaging modalities increased predictive power (by 3 to 16%), with heterogeneous contributions from each modality. Surprisingly, we identified some inborn alterations correlated with high alcohol preference, as well as thalamic neuroadaptations related to Naltrexone efficacy. In addition, DTI- and relaxometry-derived biomarkers contributed reproducibly across analyses, guiding further studies in alcohol research. In summary, this research demonstrates the feasibility of combining multimodal neuroimaging, machine learning algorithms, and animal research to advance the understanding of alcohol-related disorders.
[ES] Alcohol abuse is one of the greatest concerns of the health authorities in the European Union. 
Excessive alcohol consumption affects the whole body to a greater or lesser extent, with the pancreas and liver being the most severely affected. In addition, the central nervous system suffers alcohol-related damage, which frequently occurs alongside other psychiatric conditions such as depression or other addictions such as pathological gambling. The presence of these comorbidities demonstrates the complexity of the disease, in which many neural systems interact with one another. Magnetic resonance imaging (MRI) has helped in the study of psychiatric diseases by facilitating the discovery of neurological mechanisms fundamental to the development and maintenance of alcohol addiction, to relapse, and to the effect of available treatments. Despite these advances, more research is still needed to identify the biological bases that contribute to the disease. In this respect, animal models serve to isolate the factors related solely to alcohol while controlling for other factors that facilitate the development of alcoholism. MRI studies in laboratory animals, and their subsequent evaluation in humans, play a fundamental role in understanding psychiatric conditions such as alcohol addiction. MRI has been integrated into clinical settings as a non-invasive diagnostic test. As the volume of data grows, tools and methodologies capable of fusing information of very different natures are needed in order to establish increasingly accurate diagnostic criteria. The predictive power of artificial intelligence tools such as machine learning complements traditional statistical methods. This work addresses most of these aspects. 
Multimodal MRI data were obtained from a validated model in research on alcohol-related pathologies: Marchigian-Sardinian rats, developed at the University of Camerino (Italy), whose alcohol consumption is comparable to that of humans. For each animal, data were acquired before and after alcohol consumption and under two abstinence conditions (with and without treatment with Naltrexone, an anti-relapse medication used as pharmacotherapy for alcoholism). The multimodal MRI data, consisting of diffusion, relaxometry, and structural images, were fused in a multivariate analytical scheme incorporating two tools commonly used on neuroimaging data, Random Forest and Support Vector Machine. Our scheme was applied with two distinct objectives: on the one hand, to determine which experimental phase a subject is in from biomarkers; on the other, to identify brain systems susceptible to alteration by heavy alcohol intake and their evolution during abstinence. Our results showed that fusing biomarkers derived from multiple neuroimaging modalities into a single analysis yields more accurate diagnoses than using biomarkers from a single modality (up to a 16% improvement). Biomarkers derived from diffusion and relaxometry images discriminate between experimental states. We also identified some innate traits related to subsequent alcohol-consumption behavior, as well as a relationship between treatment response and the MRI data. 
In summary, this thesis shows that multimodal MRI data from animal models, combined in multivariate analytical schemes, are a valid tool for understanding these pathologies.
[CAT] Alcohol abuse is one of the major concerns of the health authorities of the European Union. Despite the difficulty of establishing exact figures, an estimated 23 million Europeans currently suffer from alcohol-related diseases, at a cost to society exceeding 150,000 million euros. Excessive alcohol consumption affects the human body to a greater or lesser extent, with the pancreas and liver being the most affected. In addition, the brain suffers alcohol-induced damage, which frequently coexists with other conditions such as depression or other addictions such as pathological gambling. All of this demonstrates the complexity of the disease, in which multiple neural systems interact with one another. Non-invasive techniques such as electroencephalography (EEG) and magnetic resonance imaging (MRI) have helped in the study of psychiatric diseases by facilitating the discovery of neurological mechanisms fundamental to the development and maintenance of addiction, to relapse, and to the effectiveness of available treatments. Despite these advances, more research is still needed to identify the biological bases that contribute to the disease. In this direction, animal models serve to identify factors that depend solely on alcohol abuse. MRI studies in laboratory animals, with subsequent evaluation in humans, would play a fundamental role in understanding alcohol use. Non-invasive diagnostic tests have been integrated into clinical settings. As the volume of data increases, tools and methodologies are needed to fuse information of very different natures and thus establish increasingly accurate diagnostic criteria. 
The predictive capability of tools developed in the field of artificial intelligence, such as machine learning, complements traditional statistical methods. This research addresses all of these aspects. Multimodal MRI data were obtained from an animal model validated in the study of alcohol-related pathologies: Marchigian-Sardinian rats, developed at the University of Camerino (Italy), whose alcohol consumption is comparable to that of humans. For each animal, data were acquired before and after alcohol consumption and under two different abstinence conditions (with and without anti-relapse treatment). Multimodal MRI data consisting of diffusion, magnetic relaxometry, and structural images were fused in multivariate analytical schemes incorporating two methodologies validated in the neuroimaging field, Random Forest and Support Vector Machine. Our scheme was applied with two distinct objectives. The first was to determine which experimental phase a subject is in from neuroimaging-derived biomarkers. The second was to identify brain systems susceptible to alteration during heavy alcohol intake and their evolution during the treatment phase. Our results showed that biomarkers derived from several neuroimaging modalities, fused in a multivariate analysis, yield more accurate diagnoses than those derived from a single modality (up to a 16% improvement). Biomarkers derived from diffusion and relaxometry images contributed to distinguishing experimental states. We also identified innate traits related to later alcohol preference, as well as a relationship between the response to anti-relapse treatment and the MRI data. 
In summary, this work demonstrates that multimodal MRI data from animal models, combined in multivariate analytical schemes, are a highly valid tool for understanding and advancing research on psychiatric pathologies such as alcoholism.
Cosa Liñán, A. (2017). Analytical fusion of multimodal magnetic resonance imaging to identify pathological states in genetically selected Marchigian Sardinian alcohol-preferring (msP) rats [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/90523

    Learning classifier systems from first principles: A probabilistic reformulation of learning classifier systems from the perspective of machine learning

    Learning Classifier Systems (LCS) are a family of rule-based machine learning methods. They aim at the autonomous production of potentially human-readable results that form the most compact generalised representation whilst also maintaining high predictive accuracy, and they have a wide range of application areas, such as autonomous robotics, economics, and multi-agent systems. Their design is mainly approached heuristically and, even though their performance is competitive in regression and classification tasks, they do not meet their expected performance in sequential decision tasks, despite having been initially designed for such tasks. It is our contention that improvement is hindered by a lack of theoretical understanding of their underlying mechanisms and dynamics.
    EThOS - Electronic Theses Online Service, United Kingdom
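To make "rule-based" and "compact generalised representation" concrete: many LCS variants (for example XCS-style systems) encode a rule's condition as a string over {0, 1, #}, where # is a wildcard, so one rule can generalise over many inputs. The sketch below matches such conditions and predicts by a fitness-weighted vote over the matching rules. The ternary encoding, the voting scheme, and the names `matches` and `predict` are assumptions for illustration; the abstract does not specify the representation the thesis uses.

```python
from collections import defaultdict

def matches(condition, state):
    """True if a ternary condition over {'0', '1', '#'} matches a binary state string."""
    return len(condition) == len(state) and all(
        c == "#" or c == s for c, s in zip(condition, state))

def predict(rules, state):
    """Fitness-weighted vote over the match set; each rule is (condition, action, fitness)."""
    votes = defaultdict(float)
    for condition, action, fitness in rules:
        if matches(condition, state):
            votes[action] += fitness
    if not votes:
        return None  # empty match set; a full LCS would create a covering rule here
    return max(votes, key=votes.get)
```

For instance, the rule `"1##"` covers all four inputs that start with `1`, which is the kind of generalisation an LCS tries to discover while keeping the rule population small.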