112 research outputs found
Exploring Dependence with Data on Spatial Lattices
The application of Markov random field models to problems involving spatial data on lattice systems requires decisions regarding a number of important aspects of model structure. Existing exploratory techniques appropriate for spatial data do not provide direct guidance to an investigator about these decisions. We introduce an exploratory quantity that is directly tied to the structure of Markov random field models based on one-parameter exponential family conditional distributions. This exploratory diagnostic is shown to be a meaningful statistic that can inform decisions involved in modeling spatial structure with statistical dependence terms. In this article, we develop the diagnostic, show that it has stable statistical behavior, illustrate its use in guiding modeling decisions with simulated examples, and demonstrate that these properties have use in applications.
The substrate of the biopsychosocial influences in the carcinogenesis of the digestive tract
Digestive cancer represents a severe public health problem, being one of the main causes of death. It is considered a multifactorial disease, with hereditary predisposition, environmental factors, and other factors involved in carcinogenesis. Both the evolution and the pathogenesis of digestive neoplasms remain incompletely elucidated. As a multifactorial disease, it can be approached by taking into account the biopsychosocial influences exerted via the enteric nervous system. Many peptides and non-peptides with a neurotransmitter role can be found in the enteric nervous system; these can influence the neoplastic process directly or indirectly by affecting certain angiogenic, growth, and metastasis factors. However, neurotransmitters can also directly cause, through intercellular signaling, the angiogenesis, proliferation, and metastasis of digestive neoplasms. This new approach to neoplasms of the digestive tract assumes that broader psychosocial factors can play an important role in understanding the etiopathogenesis and evolution of the disease and in determining possible molecularly targeted therapies; it also suggests that behavioral strategies may be important for maintaining a healthy state with respect to the digestive tract.
Supervised keyphrase extraction as positive unlabeled learning
This paper shows that the performance of trained keyphrase extractors approximates a classifier trained on articles labeled by multiple annotators, leading to higher average F₁ scores and better rankings of keyphrases.
Clinical-evolutional particularities of the cryoglobulinemic vasculitis in the case of a patient diagnosed with hepatitis C virus in the predialitic phase
Hepatitis C virus (HCV) represents a fundamental issue for public health, with long-term evolution and the gradual appearance of several complications and associated pathologies. One of these pathologies is cryoglobulinemic vasculitis, a disorder characterized by the appearance of cryoglobulins in the patient’s serum, which typically precipitate at temperatures below normal body temperature (37°C) and dissolve again if the serum is heated. Here, we describe the case of a patient diagnosed with HCV who, during the evolution of the hepatic disease, developed a form of cryoglobulinemic vasculitis. The connection between the vasculitis and the hepatic disorder was revealed following treatment with interferon, with the temporary remission of both pathologies and subsequent relapse at the end of the 12 months of treatment, the patient becoming a non-responder. The particularity of the case lies in both the severity of the vasculitic disease from its onset and the deterioration of renal function up to the predialytic phase, a situation not typical of the evolution of cryoglobulinemia. Taking into account the hepatic disorder, the inevitable evolution towards cirrhosis, and the risk of developing hepatocellular carcinoma, close monitoring is necessary.
Asymptotic properties of computationally efficient alternative estimators for a class of multivariate normal models
Parameters of Gaussian multivariate models are often estimated using the maximum likelihood approach. In spite of its merits, this methodology is not practical when the sample size is very large, as, for example, in the case of massive georeferenced data sets. In this paper, we study the asymptotic properties of the estimators that minimize three alternatives to the likelihood function, designed to increase the computational efficiency. This is achieved by applying the information sandwich technique to expansions of the pseudo-likelihood functions as quadratic forms of independent normal random variables. Theoretical calculations are given for a first-order autoregressive time series and then extended to a two-dimensional autoregressive process on a lattice. We compare the efficiency of the three estimators to that of the maximum likelihood estimator as well as among themselves, using numerical calculations of the theoretical results and simulations
Mixture of experts models to exploit global sequence similarity on biomolecular sequence labeling
Background: Identification of functionally important sites in biomolecular sequences has broad applications ranging from rational drug design to the analysis of metabolic and signal transduction networks. Experimental determination of such sites lags far behind the number of known biomolecular sequences. Hence, there is a need to develop reliable computational methods for identifying functionally important sites from biomolecular sequences.
Results: We present a mixture of experts approach to biomolecular sequence labeling that takes into account the global similarity between biomolecular sequences. Our approach combines unsupervised and supervised learning techniques. Given a set of sequences and a similarity measure defined on pairs of sequences, we learn a mixture of experts model by using spectral clustering to learn the hierarchical structure of the model and by using Bayesian techniques to combine the predictions of the experts. We evaluate our approach on two biomolecular sequence labeling problems: RNA-protein and DNA-protein interface prediction problems. The results of our experiments show that global sequence similarity can be exploited to improve the performance of classifiers trained to label biomolecular sequence data.
Conclusion: The mixture of experts model helps improve the performance of machine learning methods for identifying functionally important sites in biomolecular sequences. This is a proceeding from IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 10 (2009): S4, doi: 10.1186/1471-2105-10-S4-S4. Posted with permission.
The compiler for the XMTC parallel language: Lessons for compiler developers and in-depth description
In this technical report, we present information on the XMTC compiler and language. We start by presenting the XMTC Memory Model and the issues we encountered when using GCC, the popular GNU compiler for C and other sequential languages, as the basis for a compiler for XMTC, a parallel language. These topics, along with some information on XMT-specific optimizations, were presented in [10]. Then, we proceed to give some more details on how outer spawn statements (i.e., parallel loops) are compiled to take advantage of XMT’s unique hardware primitives for scheduling flat parallelism and how we extended this basic compiler to support nested parallelism.
Bayesian Dynamic Linear Models for Estimation of Phenological Events from Remote Sensing Data
Estimating the timing of the occurrence of events that characterize growth cycles in vegetation from time series of remote sensing data is desirable for a wide area of applications. For example, the timings of plant life cycle events are very sensitive to weather conditions and are often used to assess the impacts of changes in weather and climate. Likewise, understanding crop phenology can have a large impact on agricultural strategies. To study phenology using remote sensing data, the timings of annual phenological events must be estimated from noisy time series that may have many missing values. Many current state-of-the-art methods consist of smoothing time series and estimating events as features of smoothed curves. A shortcoming of many of these methods is that they do not easily handle missing values and require imputation as a preprocessing step. In addition, while some currently used methods may be extendable to allow for temporal uncertainty quantification, uncertainty intervals are not usually provided with phenological event estimates. We propose methodology utilizing Bayesian dynamic linear models to estimate the timing of key phenological events from remote sensing data with uncertainty intervals. We illustrate the methodology on weekly vegetation index data from 2003 to 2007 over a region of southern India, focusing on estimating the timing of start of season and peak of greenness. Additionally, we present methods utilizing the Bayesian formulation and MCMC simulation of the model to estimate the probability that more than one growing season occurred in a given year. Supplementary materials accompanying this paper appear online. © 2018, International Biometric Society
- …