
    Clustering DNA words through distance distributions

    Functional data appear in several domains of science, for example in biomedical, meteorological or engineering studies. A functional observation can exhibit atypical behaviour over a short or a large part of the domain, and this may be due to magnitude or to shape features. Over the last ten years many outlier detection methods have been proposed. In this work we use the functional data framework to investigate the existence of DNA words with an outlying distance distribution, which may be related to biological motifs. A DNA word is a sequence defined over the genome alphabet {A, C, G, T}. The distances between successive occurrences of the same word define the inter-word distance distribution, which can be interpreted as a discrete function. Each word length k is associated with a functional dataset formed by 4^k distance distributions. As the word length increases, the diversity of patterns observed in the functional dataset grows, and so does the number of distributions displaying strong frequency peaks. We propose a two-step procedure to detect words with an outlying pattern of distances: first, the functions are clustered according to their global trend; then, an outlier detection method is applied within each cluster. The trend of each distribution is obtained by data smoothing, which smooths out some of the distributions' peaks, and similarities between the smoothed data are explored through hierarchical complete-linkage clustering. The dissimilarity between functions is evaluated using the Euclidean distance or the Generalized Minimum distance [1], which accounts for the dependence between domain points. The resulting dendrograms are then cut, yielding a partition of the distance distributions. For the second step we use the Directional Outlyingness measure, which assigns a robust measure of outlyingness to each domain point and is the building block of a graphical tool for visualizing the centrality of the curves [2]. We focus on the human genome and words of length ≤ 7. Results are compared with those obtained by applying only the second step of the procedure [3].
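
    A minimal Python sketch of the basic ingredient, the inter-word distance distribution, may help fix ideas. It assumes that overlapping occurrences are counted and that distances are measured between start positions; the function names are hypothetical. Only words actually observed in the sequence are returned (rather than all 4^k), and the smoothing, complete-linkage clustering and Directional Outlyingness steps are not shown.

```python
from collections import Counter


def word_positions(seq: str, word: str) -> list[int]:
    """Start positions of `word` in `seq` (overlapping occurrences allowed; an assumption)."""
    k = len(word)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == word]


def inter_word_distances(seq: str, word: str) -> Counter:
    """Frequency of each distance between successive occurrences of `word`."""
    pos = word_positions(seq, word)
    return Counter(b - a for a, b in zip(pos, pos[1:]))


def all_distributions(seq: str, k: int) -> dict[str, Counter]:
    """One distance distribution per observed word of length k (at most 4**k words)."""
    words = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    # Keep only words over the genome alphabet {A, C, G, T} (drops N and other symbols).
    words = {w for w in words if set(w) <= set("ACGT")}
    return {w: inter_word_distances(seq, w) for w in sorted(words)}
```

    For example, `inter_word_distances("ACACAC", "AC")` finds "AC" at positions 0, 2 and 4, so the distance 2 is observed twice. Each resulting `Counter` is one discrete function of the functional dataset for that word length.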

    Meta-analysis of a very low proportion through adjusted Wald confidence intervals

    In this paper we discuss the meta-analysis of a single low proportion. It is well known that there are several methods to perform the meta-analysis of one proportion, based on a linear combination of proportions or of transformed proportions. However, in the context of linear combinations of binomial proportions, some approximate estimators have been proposed that improve the estimation of low proportions. In this paper we show, with a simple adaptation, the possible contribution of several approximate adjusted Wald confidence intervals (CIs) to the meta-analysis of proportions. In the context of low proportions, a simulation study is carried out to compare these CIs among themselves and with other available methods with respect to bias and coverage probability, using either the fixed-effect or the random-effects model. Focusing on rare events (the case of abundant events is analogous) and taking as an example the estimation of the prevalence of methicillin-resistant Staphylococcus aureus carrying the mecC gene, we discuss the choice of meta-analysis method for this low proportion. The default methods of meta-analysis software programs are not always the best choice, in particular for the meta-analysis of a single low proportion, where methods including the adjusted Wald intervals can perform better.
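
    To show why adjusted Wald intervals matter for low proportions, here is a small sketch contrasting the plain Wald interval with the Agresti–Coull interval, the best-known adjusted Wald interval for a single proportion. This is an illustration under assumptions, not the paper's meta-analytic adaptation (which is not shown); the function names are hypothetical and both intervals are clipped to [0, 1].

```python
import math
from statistics import NormalDist


def wald_ci(x: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Plain Wald interval p +/- z*sqrt(p(1-p)/n), clipped to [0, 1]."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = x / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


def adjusted_wald_ci(x: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Agresti-Coull: add z^2/2 successes and z^2/2 failures, then apply Wald."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    n_adj = n + z * z
    p_adj = (x + z * z / 2) / n_adj
    half = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half), min(1.0, p_adj + half)
```

    With zero events out of 50, the Wald interval collapses to the single point 0, whereas the adjusted interval still has a positive upper bound (about 0.085). This degeneracy at or near zero events is what makes the choice of method important for rare events.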

    Electrocardiography in hypertensive patients without cardiovascular events: a valuable predictor tool?

    Background. Hypertension is an important risk factor for cardiovascular (CV) disease. An early diagnosis of target organ damage could prevent major CV events. Electrocardiography (ECG) is a valuable clinical technique, with wide availability and high specificity, used in the evaluation of hypertensive patients. However, the use of ECG as a predictor tool is controversial given its low sensitivity. This study aims to characterise ECG features in a hypertensive population and to identify ECG abnormalities that could predict CV events. Methods. We studied 175 hypertensive patients without previous CV events over a mean follow-up of 4.0 ± 2.20 years. ECG and pulse wave velocity (PWV) measurements were performed in all patients. Clinical characteristics and ECG abnormalities were evaluated and compared between patients with and without CV events. Results. Of the 175 patients (53.14% male), the median age was 62 years. Median systolic blood pressure was 140 mmHg and median diastolic blood pressure was 78 mmHg. Median PWV was 9.8 m/s. Of the patients, 39.4% were diabetic, 78.3% had hyperlipidaemia, and 16.0% were smokers. ECG identified left ventricular (LV) hypertrophy in 29.71% of the patients, and an LV strain pattern was present in 9.7% of the patients. Twenty-nine patients (16.57%) had a CV event. Comparative analyses showed a statistically significant association between the presence of an LV strain pattern and CV events (p 0.01). Univariate and multivariate analyses confirmed that an LV strain pattern was an independent predictor of CV events (HR 2.66, 95% CI 1.01–7.00). In the survival analysis, the Kaplan–Meier curve showed a worse prognosis for CV events in patients with an LV strain pattern (p 0.014). Conclusion. ECG is a useful everyday method to identify end-organ damage in hypertensive patients. In our study, we also observed that it may be a valuable tool for the prediction of CV events.
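
    The survival analysis relies on the Kaplan–Meier estimator. A minimal sketch for one group (for example, patients with an LV strain pattern) is given below, assuming the usual convention that subjects censored at an event time are still counted as at risk at that time. The function name is hypothetical; the log-rank comparison between groups and the Cox model behind the hazard ratio are not shown.

```python
from collections import defaultdict


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    events[i] is 1 if a CV event was observed at times[i], 0 if the subject was censored.
    """
    n_events = defaultdict(int)  # events at each time
    n_exits = defaultdict(int)   # subjects leaving the risk set at each time
    for t, e in zip(times, events):
        n_exits[t] += 1
        n_events[t] += e
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(n_exits):
        d = n_events[t]
        if d:  # the curve only steps down at event times
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        at_risk -= n_exits[t]  # events and censorings leave after time t
    return curve
```

    For instance, follow-up times [1, 2, 2, 3, 4] with event indicators [1, 1, 0, 1, 0] give survival estimates of 0.8, 0.6 and 0.3 at times 1, 2 and 3.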

    Mixture models of geometric distributions in genomic analysis of inter-nucleotide distances

    The mapping defined by inter-nucleotide distances (InD) provides a reversible numerical representation of the primary structure of DNA. If nucleotides were independently placed along the genome, a finite mixture model of four geometric distributions could be fitted to the InD, where the four marginal distributions would be the expected distributions of the four nucleotide types. We analyze a finite mixture model of geometric distributions (f2), with marginals not explicitly assigned to the nucleotide types, as an approximation to the InD. We use BIC in the composite likelihood framework to choose the number of mixture components and the EM algorithm to estimate the model parameters. Based on divergence profiles, an experimental study was carried out on the complete genomes of 45 species to evaluate f2. Although the proposed model is not suited to the InD, our analysis shows that divergence profiles involving the empirical distribution of the InD are also exhibited by profiles involving f2. This suggests that statistical regularities of the InD can be described by the model f2. Some characteristics of the DNA sequences captured by the model f2 are illustrated. In particular, clusterings of subgroups of eukaryotes (primates, mammals, animals and plants) are detected.
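
    The EM step for a mixture of geometric distributions on {1, 2, ...} has closed-form updates. The sketch below is an illustration under assumptions, not the authors' implementation: the deterministic initialisation, the stopping rule and the function names are hypothetical, and `bic` is the ordinary BIC rather than the composite-likelihood version used in the paper.

```python
import math


def geom_logpmf(x, p):
    """log P(X = x) for a geometric distribution on {1, 2, ...}: p(1-p)^(x-1)."""
    return math.log(p) + (x - 1) * math.log1p(-p)


def e_step(data, ws, ps):
    """Responsibilities of each component for each point, plus the log-likelihood."""
    resp, loglik = [], 0.0
    for x in data:
        logs = [math.log(w) + geom_logpmf(x, p) for w, p in zip(ws, ps)]
        top = max(logs)  # log-sum-exp for numerical stability
        total = sum(math.exp(v - top) for v in logs)
        loglik += top + math.log(total)
        resp.append([math.exp(v - top) / total for v in logs])
    return resp, loglik


def fit_geometric_mixture(data, m, max_iter=300, tol=1e-6):
    """EM for m geometric components; returns (weights, probabilities, log-likelihood trace)."""
    n = len(data)
    mean = sum(data) / n
    # Deterministic start (an assumption): probabilities spread by factors of 4 around 1/mean.
    ps = [min(0.99, 4.0 ** (j - (m - 1) / 2) / mean) for j in range(m)]
    ws = [1.0 / m] * m
    trace = []
    for _ in range(max_iter):
        resp, loglik = e_step(data, ws, ps)
        trace.append(loglik)
        if len(trace) > 1 and trace[-1] - trace[-2] < tol:
            break
        for j in range(m):  # M-step: weighted MLE of each geometric component
            rj = sum(r[j] for r in resp)
            ws[j] = max(rj / n, 1e-12)
            denom = sum(r[j] * x for r, x in zip(resp, data))
            ps[j] = min(max(rj / max(denom, 1e-300), 1e-9), 1 - 1e-9)
    return ws, ps, trace


def bic(loglik, m, n):
    """Ordinary BIC with 2m - 1 free parameters (m probabilities, m - 1 weights)."""
    return -2.0 * loglik + (2 * m - 1) * math.log(n)
```

    The number of components would then be chosen by fitting m = 1, 2, ... and keeping the m with the smallest BIC.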

    Clusters of functional status in COPD: an exploratory analysis

    Functional status is highly meaningful to the daily life of people with COPD but is often overlooked by treatment options. Understanding its heterogeneity might contribute to better personalised care. We aimed to explore clusters of functional status in people with COPD. Lung function, impact of the disease, activity-related dyspnoea and functional status were collected cross-sectionally. The 6-minute walk test, 1-minute sit-to-stand test, quadriceps maximum voluntary contraction and handgrip muscle strength were used to group individuals into clusters (K-means clustering). The total within-cluster sum of squares was computed for different values of k, and the optimum number of clusters was defined as the inflexion point of the curve. Differences between clusters were explored using ANOVA and post-hoc multiple pairwise comparisons. 127 people with COPD (82% male, 68±8 years, FEV1 56±20 %pred) were included in the analysis. Four clusters were found (Fig. 1): ‘over-achievers’ (Cluster 2, n=30); ‘achievers’ (Cluster 1, n=28); ‘partial-achievers’ (Cluster 4, n=39); ‘non-achievers’ (Cluster 3, n=29). Our four clusters of functional status may guide tailored treatment regimens to improve this highly meaningful outcome. Cluster validity, the clusters' behaviour over time and their differential response to treatment need further investigation.
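
    The rule for choosing k (total within-cluster sum of squares against k, then locating the bend of the curve) can be sketched in plain Python. This is an illustration under assumptions, not the authors' code: it uses k-means++ seeding with restarts, and it takes the inflexion point to be the k with the largest second difference of the WSS curve, which is one simple heuristic among several. Because the four test measures are on different scales, they would normally be standardised first. The function names are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points, k, rng):
    """k-means++ seeding: each new centre drawn with probability proportional to D(x)^2."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        d2 = [min(sq_dist(p, c) for c in centres) for p in points]
        centres.append(rng.choices(points, weights=d2)[0])
    return centres


def kmeans_wss(points, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's k-means; returns the lowest total within-cluster sum of squares over restarts."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(n_init):
        centres = kmeanspp_init(points, k, rng)
        for _ in range(max_iter):
            groups = [[] for _ in range(k)]
            for p in points:  # assignment step
                nearest = min(range(k), key=lambda c: sq_dist(p, centres[c]))
                groups[nearest].append(p)
            # Update step; an empty cluster keeps its previous centre.
            new = [tuple(sum(v) / len(g) for v in zip(*g)) if g else centres[j]
                   for j, g in enumerate(groups)]
            if new == centres:
                break
            centres = new
        wss = sum(min(sq_dist(p, c) for c in centres) for p in points)
        best = min(best, wss)
    return best


def elbow_k(wss):
    """wss[i] is the WSS for k = i + 1; return the k with the largest second difference."""
    d2 = {k: wss[k - 2] - 2 * wss[k - 1] + wss[k] for k in range(2, len(wss))}
    return max(d2, key=d2.get)
```

    Computing `[kmeans_wss(points, k) for k in range(1, 7)]` and passing the result to `elbow_k` reproduces the "WSS versus k" curve and picks its bend automatically. Visual inspection of the plotted curve remains advisable.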

    Evaluating COVID-19 in Portugal: Bootstrap confidence interval

    In this paper, we consider a compartmental model to fit the real data of confirmed active COVID-19 cases in Portugal from March 2, 2020 until September 10, 2021, in the Primary Care Cluster of the Aveiro region (ACES BV), as reported to the Public Health Unit. The model includes a deterministic component based on ordinary differential equations and a stochastic component based on bootstrap methods in regression. The main goal of this work is to take into account the variability underlying the data set and to analyse the estimation accuracy of the model, using a residual bootstrap approach to compute confidence intervals for the prediction of confirmed active COVID-19 cases. All numerical simulations were performed in the R environment (version 4.0.5). The proposed algorithm can be used, after suitable adaptation, for other communicable diseases and outbreaks.
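
    The residual bootstrap idea can be shown independently of the epidemic model. In the sketch below, a straight line fitted by least squares stands in for the ODE-based compartmental model (a deliberate simplification). Residuals from the original fit are resampled, added back to the fitted values, the model is refitted, and percentiles of the refitted predictions give the interval. Function names and defaults are hypothetical. In Python the analogue of the paper's R workflow would look like this:

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares y = a + b*x (a stand-in for fitting the ODE model)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def residual_bootstrap_ci(xs, ys, x_new, n_boot=2000, level=0.95, seed=1):
    """Point prediction at x_new and a percentile residual-bootstrap CI for it."""
    rng = random.Random(seed)
    a, b = fit_line(xs, ys)
    fitted = [a + b * x for x in xs]
    resid = [y - f for y, f in zip(ys, fitted)]
    preds = []
    for _ in range(n_boot):
        # Synthetic data set: fitted values plus residuals drawn with replacement.
        y_star = [f + rng.choice(resid) for f in fitted]
        a_s, b_s = fit_line(xs, y_star)
        preds.append(a_s + b_s * x_new)
    preds.sort()
    lo = preds[int((1 - level) / 2 * n_boot)]
    hi = preds[int((1 + level) / 2 * n_boot) - 1]
    return a + b * x_new, lo, hi
```

    Note that this interval reflects uncertainty in the fitted curve only. A prediction interval for new observations would also add a resampled residual to each bootstrap prediction.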

    Statistical, computational and visualization methodologies to unveil gene primary structure features

    Gene sequence features such as codon bias, codon context and codon expansion (e.g. trinucleotide repeats) can be better understood at the genomic scale by combining statistical methodologies with advanced computer algorithms and data visualization through sophisticated graphical interfaces. This paper presents the ANACONDA system, a bioinformatics application for gene primary structure analysis. Codon usage tables using absolute metrics and software for multivariate analysis of codon and amino acid usage are available in public databases. However, they do not provide easy-to-use computational and statistical tools to carry out detailed gene primary structure analysis on a genomic scale. We propose the use of several statistical methods (contingency table analysis, residual analysis and multivariate cluster analysis) to analyze codon bias under various aspects (degree of association, contexts and clustering). The developed solution is a software application that provides a user-guided analysis of codon sequences, considering several contexts and codon usage on a genomic scale. In our molecular biology laboratory, this tool is used mainly on particular genomes, especially those of Saccharomyces cerevisiae, Candida albicans and Escherichia coli. To illustrate the applicability and output layouts of the software, these species are used here as examples. The statistical tools incorporated in the system make it possible to obtain global views of important sequence features. It is expected that the results obtained will permit the identification of general rules that govern codon context and codon usage in any genome. Additionally, the identification of genes containing expanded codons that arise as a consequence of erroneous DNA replication events will help uncover new genes associated with human disease.
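
    One way to carry out the contingency-table residual analysis of codon context is sketched below. Each codon is paired with the codon that follows it in reading frame 0, and Haberman adjusted residuals are computed for the resulting table. Large positive values flag codon pairs that occur more often than independence predicts. This is a sketch under assumptions (frame, no stop-codon handling, hypothetical function names), not ANACONDA's code.

```python
import math
from collections import Counter


def codon_pairs(seq):
    """Consecutive (codon, next codon) pairs in reading frame 0."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    return list(zip(codons, codons[1:]))


def adjusted_residuals(pairs):
    """Haberman adjusted residuals for the (codon, next codon) contingency table."""
    obs = Counter(pairs)
    rows = Counter(a for a, _ in pairs)  # row totals: first codon of each pair
    cols = Counter(b for _, b in pairs)  # column totals: second codon of each pair
    n = len(pairs)
    res = {}
    for a in rows:
        for b in cols:
            expected = rows[a] * cols[b] / n
            var = expected * (1 - rows[a] / n) * (1 - cols[b] / n)
            res[(a, b)] = (obs[(a, b)] - expected) / math.sqrt(var) if var > 0 else 0.0
    return res
```

    Under independence the adjusted residuals are approximately standard normal, so values beyond about ±1.96 are a common informal cut-off for preferred or avoided codon contexts.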

    Evaluation of vancomycin MIC creep in methicillin-resistant Staphylococcus aureus infections: a systematic review and meta-analysis

    Vancomycin is currently the primary treatment option for methicillin-resistant Staphylococcus aureus (MRSA). However, an increasing number of MRSA isolates with high MICs within the susceptible range (vancomycin MIC creep) are being reported worldwide. Using a meta-analysis approach, this study aims to assess the evidence for vancomycin MIC creep.

    COPD profiles and treatable traits using minimal resources: identification, decision tree and longitudinal stability

    Introduction: Chronic obstructive pulmonary disease (COPD) is highly heterogeneous and complex. Hence, personalising assessments and treatments for this population across different settings and levels of available resources poses challenges and generates debate. Research efforts have been made to identify clinical phenotypes or profiles for prognostic and therapeutic purposes. Nevertheless, such profiles often do not describe treatable traits, focus on complex physiological/pulmonary measures that are frequently not available across settings, lack validation and/or have unknown stability over time. Objectives: To identify profiles and their treatable traits based on simple and meaningful measures; to develop and validate a profile decision tree; and to explore the stability of profiles over time in people with COPD. Methods: An observational, prospective study was conducted with people with COPD. Clinical characteristics, lung function, symptoms, impact of the disease (COPD Assessment Test, CAT), health-related quality of life, physical activity, lower-limb muscle strength and functional status were collected cross-sectionally, and a subsample was followed up monthly over six months. A principal component analysis and a clustering procedure with k-medoids were applied to identify profiles. Pulmonary and extrapulmonary (i.e., physical, symptoms and health status, and behavioural/lifestyle risk factors) treatable traits were identified in each profile based on the established cut-offs for each measure available in the literature. The decision tree was developed with 70% of the sample and validated with the remaining 30%, cross-sectionally. Agreement between the profile predicted by the decision tree and the profile defined by the clustering procedure was determined using Cohen’s Kappa. Stability over time was explored with a stability score, defined as the percentage ratio between the number of timepoints at which a participant was classified in the same profile (the most frequent profile for that participant) and the total number of timepoints (i.e., 6). Results: 352 people with COPD (67.4 ± 9.9 years; 78.1% male; FEV1 = 56.2 ± 20.6% predicted) participated and 90 (67.6 ± 8.9 years; 85.6% male; FEV1 = 52.1 ± 19.9% predicted) were followed up. Four profiles with distinct treatable traits were identified. The decision tree was composed of the CAT, age and FEV1% predicted, and had an agreement of 71.7% (Cohen’s Kappa = 0.62, p < 0.001) with the actual profiles. Of the participants, 48.9% remained in the same profile, whilst 51.1% moved between two (47.8%) or three (3.3%) profiles over time. The overall stability of profiles was 86.8 ± 15%. Conclusions: Profiles and treatable traits can be identified in people with COPD using simple and meaningful measures, possibly available even in minimal-resource settings. Regular assessments are recommended, as people with COPD may change profile over time and hence their needs for personalised treatment may change.
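
    The stability score and Cohen’s Kappa defined above are simple enough to state as code. The sketch below follows the stated definitions (modal-profile timepoints over total timepoints, as a percentage; observed versus chance agreement). The function names are hypothetical, and Kappa is undefined when chance agreement equals 1, a case the sketch does not handle.

```python
from collections import Counter


def stability_score(profiles):
    """Percentage of timepoints spent in the participant's most frequent profile."""
    modal_count = Counter(profiles).most_common(1)[0][1]
    return 100.0 * modal_count / len(profiles)


def cohens_kappa(predicted, actual):
    """Cohen's Kappa = (p_o - p_e) / (1 - p_e) for two categorical labelings."""
    n = len(predicted)
    p_o = sum(p == a for p, a in zip(predicted, actual)) / n  # observed agreement
    cp, ca = Counter(predicted), Counter(actual)
    p_e = sum(cp[c] * ca[c] for c in cp) / n ** 2  # agreement expected by chance
    return (p_o - p_e) / (1 - p_e)
```

    For example, a participant classified in profiles [1, 1, 2, 1, 1, 1] over the six monthly timepoints has a stability score of 5/6, about 83.3%.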

    Genome analysis with inter-nucleotide distances

    Motivation: DNA sequences can be represented by sequences of four symbols, but it is often useful to convert the symbols into real or complex numbers for further analysis. Several mapping schemes have been used in the past, but they seem unrelated to any intrinsic characteristic of DNA. The objective of this work was to find a mapping scheme directly related to DNA characteristics that would be useful in discriminating between different species. Mathematical models to explore DNA correlation structures may contribute to a better understanding of DNA and to finding a concise DNA description.
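
    A minimal sketch of an inter-nucleotide distance mapping is given below. It assumes that the InD at a position is the distance to the next occurrence of the same nucleotide, and that positions with no later occurrence are skipped. Both are assumptions about the exact definition, and the function name is hypothetical.

```python
def inter_nucleotide_distances(seq):
    """For each position, the distance to the next occurrence of the same nucleotide.

    Positions whose nucleotide does not occur again later are skipped.
    """
    next_seen = {}  # nucleotide -> nearest position to the right seen so far
    out = []
    for i in range(len(seq) - 1, -1, -1):  # scan right to left
        s = seq[i]
        if s in next_seen:
            out.append(next_seen[s] - i)
        next_seen[s] = i
    out.reverse()  # restore left-to-right order
    return out
```

    For "ACGTAACG", the first A is followed by the next A 4 positions later, the first C and G by their next occurrences 5 positions later, and the A at position 4 by another A immediately after, giving [4, 5, 5, 1]. The empirical distribution of these values is the kind of data the geometric mixture models are fitted to.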