Clustering DNA words through distance distributions
Functional data appear in several domains of science, for example, in biomedical,
meteorologic or engineering studies. A functional observation can exhibit an atypical
behaviour during a short or a large part of the domain and this may be due to
magnitude or to shape features. Over the last ten years many outlier detection
methods have been proposed. In this work we use the functional data framework to
investigate the existence of DNA words with outlying distance distribution, which
may be related to biological motifs.
A DNA word is a sequence defined in the genome alphabet {ACGT}. Distances between successive occurrences of the same word allow defining the inter-word distance
distribution, interpretable as a discrete function. Each word length k is associated
with a functional dataset formed by 4^k distance distributions. As the word length
increases, the diversity of observed patterns in the functional dataset grows, as does
the number of distributions displaying strong frequency peaks. We propose a two-step procedure to detect words with an outlying pattern of distances: first, the functions are clustered according to their global trend; then, an
outlier detection method is applied within each cluster. Each distribution trend is
obtained by data smoothing, which attenuates some of the distributions’ peaks, and similarities
between smoothed data are explored through hierarchical complete linkage clustering. The dissimilarity between functions is evaluated using the Euclidean distance
or the Generalized Minimum distance [1], which considers the dependence between
domain points. The resulting dendrograms are then cut, leading to a partition of the
distance distributions. For the second step we use the Directional Outlyingness measure which assigns a robust measure of outlyingness to each domain point and is the
building block of a graphical tool for visualization of the centrality of the curves [2].
We focus on the human genome and words of length ≤ 7. Results are compared
with those obtained by applying only the second step of the procedure [3].
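The inter-word distance distribution described in this abstract can be illustrated with a minimal sketch. The function names, the toy sequence and the choice to count overlapping occurrences are assumptions made for illustration; the abstract does not specify how the authors compute the distances.

```python
from collections import Counter

def inter_word_distances(sequence, word):
    """Distances between successive occurrences of `word` in `sequence`.

    Overlapping occurrences are counted (an assumption of this sketch).
    """
    positions = []
    start = sequence.find(word)
    while start != -1:
        positions.append(start)
        start = sequence.find(word, start + 1)
    # Difference between each occurrence and the previous one.
    return [b - a for a, b in zip(positions, positions[1:])]

def distance_distribution(sequence, word):
    """Relative frequency of each observed inter-word distance."""
    distances = inter_word_distances(sequence, word)
    if not distances:
        return {}
    counts = Counter(distances)
    return {d: counts[d] / len(distances) for d in sorted(counts)}

# Hypothetical toy sequence, not real genomic data.
toy_sequence = "ACGTACGACGTTACG"
toy_distances = inter_word_distances(toy_sequence, "ACG")
toy_distribution = distance_distribution(toy_sequence, "ACG")
```

For a word length k, repeating this for each of the 4^k words would give one discrete function per word, which is the kind of functional dataset the two-step procedure operates on.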
Meta-analysis of a very low proportion through adjusted Wald confidence intervals
In this paper we discuss the meta-analysis of a single low proportion. It is well known that there are
several methods to perform the meta-analysis of one proportion, based on a linear combination of
proportions or transformed proportions. However, in the context of a linear combination of binomial
proportions, some approximate estimators have been proposed that improve the estimation of low
proportions. In this paper we show, with a simple adaptation, the possible contribution
of several approximate adjusted Wald confidence intervals (CIs) for the meta-analysis of proportions. In
the context of low proportions, a simulation study scenario is carried out to compare these CIs amongst
themselves and with other available methods with respect to bias and coverage probabilities, using the
fixed-effect or the random-effects model. Focusing on rare events (the case of abundant
events is analogous) and taking as an example the estimation of the prevalence of methicillin-resistant Staphylococcus
aureus carrying the mecC gene, we discuss the choice of meta-analysis methods for this low proportion. The
default methods of meta-analysis software programs are not always the best choice, in
particular for the meta-analysis of one low proportion, where methods including the adjusted Wald
CIs can outperform them.
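To illustrate why an adjusted Wald interval can help with low proportions, the sketch below compares the plain Wald interval with the Agresti-Coull adjustment for a single binomial proportion. The Agresti-Coull form is one well-known adjusted Wald interval; whether it is among those studied in the paper is an assumption, and the sketch does not reproduce the paper's meta-analytic adaptation. Function names and the example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def wald_ci(events, n, level=0.95):
    """Plain Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = events / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

def adjusted_wald_ci(events, n, level=0.95):
    """Agresti-Coull interval: add z^2/2 successes and z^2/2 failures first."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    n_adj = n + z ** 2
    p_adj = (events + z ** 2 / 2) / n_adj
    half_width = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)

# Hypothetical study with no observed events among 50 subjects.
plain = wald_ci(0, 50)
adjusted = adjusted_wald_ci(0, 50)
```

With zero observed events the plain Wald interval collapses to a single point at zero, whereas the adjusted interval keeps a positive upper limit, which is the behaviour that matters for rare events.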
Electrocardiography in hypertensive patients without cardiovascular events: a valuable predictor tool?
Background. Hypertension is an important risk factor of cardiovascular (CV) disease. An early diagnosis of target organ damage
could prevent major CV events. Electrocardiography (ECG) is a valuable clinical technique, with wide availability and high
specificity, used in the evaluation of hypertensive patients. However, the use of ECG as a predictor tool is controversial given its low
sensitivity. This study aims to characterise ECG features in a hypertensive population and identify ECG abnormalities that could
predict CV events. Methods. We studied 175 hypertensive patients without previous CV events over a mean follow-up of
4.0 ± 2.20 years. ECGs and pulse wave velocity (PWV) measurements were performed in all patients. Clinical characteristics and ECG abnormalities were
evaluated and compared between patients according to whether they presented CV events. Results. Considering the 175 patients (53.14% male),
the median age was 62 years. Median systolic blood pressure was 140 mmHg and diastolic blood pressure was 78 mmHg. Median
PWV was 9.8 m/s. Of the patients, 39.4% were diabetic, 78.3% had hyperlipidaemia, and 16.0% had smoking habits. ECG
identified left ventricular (LV) hypertrophy in 29.71% of the patients, and an LV strain pattern was present in 9.7% of the patients.
Twenty-nine patients (16.57%) had a CV event. Comparative analyses showed statistical significance for the presence of an LV
strain pattern in patients with CV events (p = 0.01). Univariate and multivariate analysis confirmed that an LV strain pattern was
an independent predictor of CV events (HR 2.66, 95% CI 1.01–7.00). In the survival analysis, the Kaplan–Meier curve showed a
worse prognosis for CV events in patients with an LV strain pattern (p = 0.014). Conclusion. ECG is a useful daily method to
identify end-organ damage in hypertensive patients. In our study, we also observed that it may be a valuable tool for the prediction
of CV events.
Mixture models of geometric distributions in genomic analysis of inter-nucleotide distances
The mapping defined by inter-nucleotide distances (InD) provides a
reversible numerical representation of the primary structure of DNA. If nucleotides
were independently placed along the genome, a finite mixture model of four geometric
distributions could be fitted to the InD where the four marginal distributions would
be the expected distributions of the four nucleotide types. We analyze a finite mixture
model of geometric distributions (f2), with marginals not explicitly addressed to the
nucleotide types, as an approximation to the InD. We use BIC in the composite likelihood
framework for choosing the number of components of the mixture and the EM algorithm
for estimating the model parameters. Based on divergence profiles, an experimental study
was carried out on the complete genomes of 45 species to evaluate f2. Although the
proposed model is not suited to the InD, our analysis shows that divergence profiles
involving the empirical distribution of the InD are also exhibited by profiles involving
f2. This suggests that statistical regularities of the InD can be described by the model f2.
Some characteristics of the DNA sequences captured by the model f2 are illustrated. In
particular, clusterings of subgroups of eukaryotes (primates, mammals, animals and
plants) are detected.
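The EM estimation step mentioned in this abstract can be sketched for a finite mixture of geometric distributions. This is a generic textbook EM under stated assumptions: the support is k = 1, 2, ..., the number of components is fixed in advance (the BIC selection in the composite likelihood framework is not shown), and the data are simulated with hypothetical parameters rather than taken from a genome.

```python
import random

def geometric_pmf(k, p):
    """P(X = k) for a geometric distribution on k = 1, 2, ..."""
    return p * (1 - p) ** (k - 1)

def em_geometric_mixture(data, p_init, n_iter=200):
    """Fit mixture weights and success probabilities by EM."""
    m = len(p_init)
    weights = [1 / m] * m
    probs = list(p_init)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in data:
            joint = [w * geometric_pmf(x, p) for w, p in zip(weights, probs)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: weighted maximum-likelihood updates.
        for j in range(m):
            r_sum = sum(r[j] for r in resp)
            weights[j] = r_sum / len(data)
            probs[j] = r_sum / sum(r[j] * x for r, x in zip(resp, data))
    return weights, probs

def sample_geometric(p, rng):
    """Number of Bernoulli(p) trials up to and including the first success."""
    k = 1
    while rng.random() >= p:
        k += 1
    return k

# Simulated distances from two hypothetical components.
rng = random.Random(42)
toy_data = ([sample_geometric(0.6, rng) for _ in range(300)]
            + [sample_geometric(0.1, rng) for _ in range(300)])
weights, probs = em_geometric_mixture(toy_data, p_init=[0.5, 0.05])
```

A useful sanity check on the updates is that the fitted mixture mean, the sum of weight / probability over components, matches the sample mean after every M-step.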
Clusters of functional status in COPD: an exploratory analysis
Functional status is highly meaningful to the daily life of people with COPD but is often overlooked by treatment options. Understanding its heterogeneity might contribute to better personalised care. We aimed to explore clusters of functional status in people with COPD.
Lung function, impact of the disease, activity-related dyspnoea and functional status were collected cross-sectionally. The 6-minute walk test, 1-minute sit-to-stand test, quadriceps maximum voluntary contraction and handgrip muscle strength were used to group individuals into clusters (K-means clustering). The total within-cluster sum of squares was computed for different values of k and the optimum number of clusters was defined as the inflexion point on the curve. Differences between clusters were explored using ANOVA and post-hoc multiple pairwise comparisons.
127 people with COPD (82% male, 68±8 years, FEV1 56±20 %pred) were included in the analysis. 4 clusters were found (Fig. 1): ‘over-achievers’ (Cluster 2, n=30); ‘achievers’ (Cluster 1, n=28); ‘partial-achievers’ (Cluster 4, n=39); ‘non-achievers’ (Cluster 3, n=29).
Our 4 clusters of functional status may guide tailored treatment regimens to improve this highly meaningful outcome. Cluster validity, their behaviour over time and differential response to treatment need further investigation.
Evaluating COVID-19 in Portugal: Bootstrap confidence interval
In this paper, we consider a compartmental model to fit real data on confirmed active cases of COVID-19 in Portugal, from March 2, 2020 until September 10, 2021, in the Primary Care Cluster of the Aveiro region (ACES BV), as reported to the Public Health Unit. The model includes a deterministic component based on ordinary differential equations and a stochastic component based on bootstrap methods in regression. The main goal of this work is to take into account the variability underlying the data set and analyse the estimation accuracy of the model using a residual bootstrap approach in order to compute confidence intervals for the prediction of COVID-19 confirmed active cases. All numerical simulations are performed in the R environment (version 4.0.5). The proposed algorithm can be used, after a suitable adaptation, for other communicable diseases and outbreaks.
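The residual bootstrap idea can be sketched as follows. As a loudly stated assumption, a straight-line least-squares fit stands in for the paper's compartmental ODE model, Python is used here although the paper's simulations were run in R, and the data are invented. Only the resampling logic is the point: refit the model to fitted values plus resampled residuals, then take percentiles of the refitted predictions.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope (stand-in for the ODE fit)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope

def residual_bootstrap_ci(xs, ys, x_new, n_boot=1000, level=0.95, seed=0):
    """Percentile interval for the fitted value at x_new."""
    rng = random.Random(seed)
    intercept, slope = fit_line(xs, ys)
    fitted = [intercept + slope * x for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    predictions = []
    for _ in range(n_boot):
        # New pseudo-data: fitted values plus residuals drawn with replacement.
        y_star = [f + rng.choice(residuals) for f in fitted]
        b0, b1 = fit_line(xs, y_star)
        predictions.append(b0 + b1 * x_new)
    predictions.sort()
    lower_index = int((1 - level) / 2 * n_boot)
    upper_index = int((1 + level) / 2 * n_boot) - 1
    return predictions[lower_index], predictions[upper_index]

# Invented toy data, not the Aveiro case counts.
xs = list(range(10))
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 11.2, 12.8, 15.1, 16.9, 19.2]
intercept, slope = fit_line(xs, ys)
point = intercept + slope * 12
lower, upper = residual_bootstrap_ci(xs, ys, x_new=12)
```

Replacing `fit_line` with a routine that fits the compartmental model would leave the resampling loop unchanged.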
Statistical, computational and visualization methodologies to unveil gene primary structure features
Gene sequence features such as codon bias, codon context, and codon expansion (e.g. trinucleotide repeats) can be better understood at the genomic scale by combining statistical methodologies with advanced computer algorithms and data visualization through sophisticated graphical interfaces. This paper presents the ANACONDA system, a bioinformatics application for gene primary structure analysis. Codon usage tables using absolute metrics and software for multivariate analysis of codon and amino acid usage are available in public databases. However, they do not provide easy computational and statistical tools to carry out detailed gene primary structure analysis on a genomic scale. We propose the usage of several statistical methods--contingency table analysis, residual analysis, multivariate analysis (cluster analysis)--to analyze the codon bias under various aspects (degree of association, contexts and clustering). The developed solution is a software application that provides a user-guided analysis of codon sequences considering several contexts and codon usage on a genomic scale. The utilization of this tool in our molecular biology laboratory is focused on particular genomes, especially those from Saccharomyces cerevisiae, Candida albicans and Escherichia coli. In order to illustrate the applicability and output layouts of the software, these species are herein used as examples. The statistical tools incorporated in the system provide global views of important sequence features. It is expected that the results obtained will permit identification of general rules that govern codon context and codon usage in any genome. Additionally, identification of genes containing expanded codons that arise as a consequence of erroneous DNA replication events will permit uncovering new genes associated with human disease.
Evaluation of vancomycin MIC creep in methicillin-resistant Staphylococcus aureus infections-a systematic review and meta-analysis
Vancomycin is currently the primary treatment option for methicillin-resistant Staphylococcus aureus (MRSA). However, an increasing number of MRSA isolates with high MICs, within the susceptible range (vancomycin MIC creep), are being reported worldwide. Using a meta-analysis approach, this study aims to assess the evidence for vancomycin MIC creep.
COPD profiles and treatable traits using minimal resources: identification, decision tree and longitudinal stability
Introduction: Chronic obstructive pulmonary disease (COPD) is
highly heterogeneous and complex. Hence, personalising assessments
and treatments to this population across different settings
and available resources imposes challenges and debate. Research
efforts have been made to identify clinical phenotypes or profiles for prognostic and therapeutic purposes. Nevertheless, such profiles
often do not describe treatable traits, focus on complex physiological/
pulmonary measures which are frequently not available across
settings, lack validation and/or their stability over time is unknown.
Objectives: To identify profiles and their treatable traits based on
simple and meaningful measures; to develop and validate a profile
decision tree; and to explore profiles’ stability over time in people
with COPD.
Methods: An observational, prospective study was conducted with
people with COPD. Clinical characteristics, lung function, symptoms,
impact of the disease (COPD assessment test–CAT), health-related
quality of life, physical activity, lower-limb muscle strength
and functional status were collected cross-sectionally and a subsample
was followed up monthly over six months. A principal component
analysis and a clustering procedure with k-medoids were
applied to identify profiles. Pulmonary and extrapulmonary (i.e.,
physical, symptoms and health status, and behavioural/life-style
risk factors) treatable traits were identified in each profile based
on the established cut-offs for each measure available in the literature.
The decision tree was developed with 70% and validated
with 30% of the sample, cross-sectionally. Agreement between the
profile predicted by the decision tree and the profile defined by the
clustering procedure was determined using Cohen’s Kappa. Stability
was explored over time with a stability score defined as the
percentage ratio between the number of timepoints that a participant
was classified in the same profile (most frequent profile for
that participant) and the total number of timepoints (i.e., 6).
Results: 352 people with COPD (67.4 ± 9.9 years; 78.1% male;
FEV1 = 56.2 ± 20.6% predicted) participated and 90 (67.6 ± 8.9 years; 85.6% male; FEV1 = 52.1 ± 19.9% predicted) were followed up.
Four profiles were identified with distinct treatable traits. The
decision tree was composed of the CAT, age and FEV1% predicted
and had an agreement of 71.7% (Cohen’s Kappa = 0.62, p < 0.001)
with the actual profiles. 48.9% of participants remained in the same
profile whilst 51.1% moved between two (47.8%) and three (3.3%)
profiles over time. The overall stability of profiles was 86.8 ± 15%.
Conclusions: Profiles and treatable traits can be identified in people
with COPD with simple and meaningful measures possibly available
even in minimal-resource settings. Regular assessments are
recommended as people with COPD may change profile over time
and hence their needs of personalised treatment.
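The stability score defined in the Methods of this abstract can be written out as a short sketch. The function name and the example profile labels are hypothetical; the formula follows the abstract's definition (timepoints spent in the participant's most frequent profile, divided by the total number of timepoints, as a percentage).

```python
from collections import Counter

def stability_score(profiles):
    """Percentage of timepoints spent in the participant's most frequent profile."""
    if not profiles:
        raise ValueError("at least one timepoint is required")
    most_frequent_count = Counter(profiles).most_common(1)[0][1]
    return 100 * most_frequent_count / len(profiles)

# Hypothetical participant assessed at six monthly timepoints.
example_profiles = ["A", "A", "B", "A", "A", "A"]
example_score = stability_score(example_profiles)
```

A participant who never changes profile scores 100%, and one change out of six timepoints gives roughly 83%.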
Genome analysis with inter-nucleotide distances
Motivation: DNA sequences can be represented by sequences of four symbols, but it is often useful to convert the symbols into real or complex numbers for further analysis. Several mapping schemes have been used in the past, but they seem unrelated to any intrinsic characteristic of DNA. The objective of this work was to find a mapping scheme directly related to DNA characteristics and that would be useful in discriminating between different species. Mathematical models to explore DNA correlation structures may contribute to a better knowledge of the DNA and to find a concise DNA description