277 research outputs found
Testing statistical hypothesis on random trees and applications to the protein classification problem
Efficient automatic protein classification is of central importance in
genomic annotation. As an independent way to check the reliability of the
classification, we propose a statistical approach to test if two sets of
protein domain sequences coming from two families of the Pfam database are
significantly different. We model protein sequences as realizations of Variable
Length Markov Chains (VLMC) and we use the context trees as a signature of each
protein family. Our approach is based on a Kolmogorov--Smirnov-type
goodness-of-fit test proposed by Balding et al. [Limit theorems for sequences
of random trees (2008), DOI: 10.1007/s11749-008-0092-z]. The test statistic is
a supremum over the space of trees of a function of the two samples; its
computational cost grows, in principle, exponentially fast with the maximal number of
nodes of the potential trees. We show how to transform this problem into a
max-flow problem over a related graph, which can be solved with a Ford--Fulkerson
algorithm in time polynomial in that number. We apply the test to 10 randomly
chosen protein domain families from the seed of the Pfam-A database (high quality,
manually curated families). The test shows that the distributions of context
trees coming from different families are significantly different. We emphasize
that this is a novel mathematical approach to validate the automatic clustering
of sequences in any context. We also study the performance of the test via
simulations on Galton--Watson related processes.
Comment: Published at http://dx.doi.org/10.1214/08-AOAS218 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
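The abstract above reduces the computation of the test statistic to a max-flow problem solved with a Ford--Fulkerson algorithm. As a minimal sketch of that solver only, the following Python function implements Ford--Fulkerson with breadth-first augmenting paths (the Edmonds--Karp variant). The function name `max_flow`, the dictionary-of-dictionaries graph format, and the toy graph are assumptions made for illustration; the construction of the graph from two samples of trees is not described in the abstract and is not reproduced here.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Ford-Fulkerson with BFS augmenting paths (Edmonds-Karp variant).

    capacity: dict mapping u -> {v: capacity of edge u -> v} (hypothetical format).
    Returns the value of a maximum flow from source to sink.
    """
    # Residual graph: copy the capacities and add reverse edges with capacity 0.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    residual.setdefault(source, {})
    residual.setdefault(sink, {})

    flow = 0
    while True:
        # Breadth-first search for a path with spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path left: the flow is maximal

        # Find the bottleneck capacity along the path found.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Push the bottleneck along the path and update the residual graph.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Toy graph, invented for illustration only.
toy = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}, "t": {}}
print(max_flow(toy, "s", "t"))
```

With breadth-first search each augmenting path is a shortest one, which bounds the number of augmentations polynomially in the size of the graph.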
Castillejo’s sonnet
We discuss the authorship of the sonnet whose first line is «Si las penas que dais son verdaderas». For centuries it was attributed without question to Cristóbal de Castillejo, but several authors have recently published it as the work of Juan Boscán. We think this latter attribution is erroneous, perhaps prompted by the title «Soneto de Boscán» that Velasco, who edited Castillejo's work in 1573, gave to this sonnet. Comparing two versions of Castillejo's work that include this sonnet lets us appreciate how fine a poet the author was and how scrupulous he was in his revisions.
SnailVis: a Paradigm to Visualize Complex Networks
We propose a new non-parametric, linear-complexity algorithm to visualize complex networks that have previously been decomposed into subsets according to some criterion. We show two representations: the first includes all edges and vertices; the second is a summary that highlights the subsets and their relations. In this paper we use a community decomposition algorithm to generate the subsets, then rank them by the number of inter-community connections. We also highlight the central core of each community, that is, the subset with the highest connectivity level, which is the kmax-core of the k-core decomposition.
Sociedad Argentina de Informática e Investigación Operativ
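The kmax-core mentioned in the abstract above can be obtained by repeatedly peeling off a vertex of minimum remaining degree. The sketch below is a deliberately simple version of that peeling procedure, not the SnailVis implementation: the names `core_numbers` and `kmax_core`, the adjacency-dictionary format, and the toy graph are assumptions for illustration, and this version is quadratic in the number of vertices (linear-time variants based on degree buckets exist).

```python
def core_numbers(adj):
    """Core number of every vertex of an undirected simple graph.

    adj: dict mapping each vertex to the set of its neighbours (hypothetical format).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel off a vertex of minimum remaining degree.
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])  # core numbers never decrease along the peeling
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


def kmax_core(adj):
    """Return (kmax, set of vertices whose core number equals kmax)."""
    core = core_numbers(adj)
    kmax = max(core.values())
    return kmax, {v for v, c in core.items() if c == kmax}


# Toy graph, invented for illustration: a 4-clique {1, 2, 3, 4} with a tail 1-5-6.
toy_graph = {
    1: {2, 3, 4, 5},
    2: {1, 3, 4},
    3: {1, 2, 4},
    4: {1, 2, 3},
    5: {1, 6},
    6: {5},
}
print(kmax_core(toy_graph))
```

In this toy graph the tail vertices 5 and 6 have core number 1, while the clique forms the 3-core, which is the kmax-core.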
Obtaining Communities with a Fitness Growth Process
The study of community structure has been a hot topic of research in
recent years. But, while it has been successfully applied in several areas, the
concept lacks a general and precise definition. Facts like the hierarchical
structure and heterogeneity of complex networks make it difficult to unify the
idea of community and its evaluation. The global functional known as modularity
is probably the most widely used technique in this area. Nevertheless, its
limits have been studied in depth. Local techniques such as those by
Lancichinetti et al. and Palla et al. arose as an answer to the resolution limit
and degeneracies of modularity.
Here we start from the algorithm by Lancichinetti et al. and propose a unique
growth process for a fitness function that, while being local, finds a
community partition that covers the whole network, updating the scale parameter
dynamically. We test the quality of our results on a set of benchmarks of
heterogeneous graphs. We discuss alternative measures for evaluating the
community structure and, in light of them, infer possible explanations for
the better performance of local methods compared to global ones in these cases.
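To make the starting point of the abstract above concrete, the sketch below grows a single community from a seed vertex by greedily maximizing the local fitness of Lancichinetti et al., f = k_in / (k_in + k_out)^alpha, where k_in is the sum of internal degrees and k_out the number of edges leaving the community. It is an illustration under stated assumptions, not the authors' growth process: alpha is held fixed rather than updated dynamically, only the vertex-addition step is shown, and the names `fitness` and `grow_community`, the graph format, and the toy graph are hypothetical.

```python
def fitness(adj, community, alpha=1.0):
    """Local fitness k_in / (k_in + k_out) ** alpha of a vertex set."""
    k_in = k_out = 0
    for v in community:
        for w in adj[v]:
            if w in community:
                k_in += 1   # each internal edge is counted once from each end
            else:
                k_out += 1  # edge leaving the community
    total = k_in + k_out
    return k_in / total ** alpha if total else 0.0


def grow_community(adj, seed, alpha=1.0):
    """Greedy growth from a seed with a fixed alpha (a simplification)."""
    community = {seed}
    while True:
        frontier = {w for v in community for w in adj[v]} - community
        current = fitness(adj, community, alpha)
        best_gain, best_node = 0.0, None
        for w in frontier:
            gain = fitness(adj, community | {w}, alpha) - current
            if gain > best_gain:
                best_gain, best_node = gain, w
        if best_node is None:
            return community  # no neighbour improves the fitness
        community.add(best_node)


# Toy graph, invented for illustration: two triangles joined by the edge 2-3.
two_triangles = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2, 4, 5},
    4: {3, 5},
    5: {3, 4},
}
print(grow_community(two_triangles, 0))
print(grow_community(two_triangles, 3))
```

In this sketch alpha plays the role of the scale parameter: smaller values of alpha tend to yield larger communities, larger values smaller ones.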
Association of candidate gene polymorphisms with clinical subtypes of preterm birth in a Latin American population
Background. Preterm birth (PTB) is the leading cause of neonatal mortality and morbidity. PTB is often classified according to clinical presentation: idiopathic (PTB-I), preterm premature rupture of membranes (PTB-PPROM), and medically induced (PTB-M).
The aim of this study was to evaluate the associations between specific candidate genes and clinical subtypes of PTB.
Methods. 24 SNPs in 18 candidate genes were genotyped in 709 infant triads. Of these, 243 were PTB-I, 256 PTB-PPROM, and 210 PTB-M. The data were analyzed with a family-based association approach.
Results. PTB was nominally associated with rs2272365 in PON1, rs883319 in KCNN3, rs4458044 in CRHR1, and rs610277 in F3. In the analysis by clinical subtype, 3 SNPs were associated with PTB-I (rs2272365 in PON1, rs10178458 in COL4A3, and rs4458044 in CRHR1), rs610277 in F3 was associated with PTB-PPROM, and rs883319 in KCNN3 and rs610277 in F3 were associated with PTB-M.
Conclusions. Our study identified polymorphisms potentially associated with specific clinical subtypes of PTB in this Latin American population. These results suggest that these genes may play a specific role in the mechanisms involved in each clinical subtype. Further studies are required to confirm our results and to determine the role of these genes in the pathophysiology of the clinical subtypes.
The ocean sampling day consortium
Ocean Sampling Day was initiated by the EU-funded Micro B3 (Marine Microbial Biodiversity, Bioinformatics, Biotechnology) project to obtain a snapshot of the marine microbial biodiversity and function of the world’s oceans. It is a simultaneous global mega-sequencing campaign aiming to generate the largest standardized microbial data set in a single day. This will be achievable only through the coordinated efforts of an Ocean Sampling Day Consortium, supportive partnerships, and networks between sites. This commentary outlines the establishment, function, and aims of the Consortium and describes our vision for a sustainable study of marine microbial communities and their embedded functional traits.
A large scale hearing loss screen reveals an extensive unexplored genetic landscape for auditory dysfunction
The developmental and physiological complexity of the auditory system is likely reflected in the underlying set of genes involved in auditory function. In humans, over 150 non-syndromic loci have been identified, and there are more than 400 human genetic syndromes with a hearing loss component. Over 100 non-syndromic hearing loss genes have been identified in mouse and human, but we remain ignorant of the full extent of the genetic landscape involved in auditory dysfunction. As part of the International Mouse Phenotyping Consortium, we undertook a hearing loss screen in a cohort of 3006 mouse knockout strains. In total, we identified 67 candidate hearing loss genes. We detected known hearing loss genes, but the vast majority of the candidates, 52 genes, were novel. Our analysis reveals a large and unexplored genetic landscape involved in auditory function.
Relaxation of Adaptive Evolution during the HIV-1 Infection Owing to Reduction of CD4+ T Cell Counts
Background: The first stages of HIV-1 infection are essential in establishing the diversity of the virus population within the host. It has been suggested that adaptation to host cells and antibody evasion are the leading forces driving HIV evolution at the initial stages of infection. To gain more insight into adaptive HIV-1 evolution, genetic diversity was evaluated over the course of infection in individuals infected from the same viral source in an epidemic cluster. Multiple sequences of the V3 loop region of HIV-1 were serially sampled from four individuals: a single blood donor, two blood recipients, and another individual sexually infected by one of the blood recipients. The diversity of the viral population within each host was analyzed independently at distinct time points during HIV-1 infection.
Results: Phylogenetic analysis identified multiple HIV-1 variants transmitted through blood transfusion, but the new infections were established by a limited number of viruses. Positive selection (d(N)/d(S) > 1) was detected in the viruses within each host at all time points. In the intra-host viruses of the blood donor and of one blood recipient, X4 variants appeared in 1993 and 1989, respectively. In both patients, X4 variants never reached high frequencies during the infection. The recipient in whom X4 variants appeared developed AIDS but maintained a narrow and constant immune response against HIV-1 throughout the infection.
Conclusion: Slowing rates of adaptive evolution and increasing diversity in HIV-1 are consequences of CD4+ T cell depletion. The dynamics of the R5 to X4 shift are not associated with the initial amplitude of the humoral immune response or the intensity of positive selection.
Funding: Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), 07/52841-8
Affiliations: Fed Univ Para, Inst Biotechnol, BR-66059 Belem, Para, Brazil; Univ São Paulo, Inst Trop Med, São Paulo, SP, Brazil; CDC, Ctr Dis Control & Prevent, Branch Lab, Atlanta, GA 30333 USA; Univ Calif San Francisco, Dept Lab Med, San Francisco, CA 94143 USA; Blood Syst Res Inst, San Francisco, CA USA; Blood Syst Inc, San Francisco, CA USA; Universidade Federal de São Paulo, São Paulo, Brazil
Web of Scienc