On the entropy of protein families
Proteins are essential components of living systems, capable of performing a
huge variety of tasks at the molecular level, such as recognition, signalling,
copy, transport, ... The protein sequences realizing a given function may
largely vary across organisms, giving rise to a protein family. Here, we
estimate the entropy of those families based on different approaches, including
Hidden Markov Models used for protein databases and inferred statistical models
reproducing the low-order (1- and 2-point) statistics of multi-sequence
alignments. We also compute the entropic cost, that is, the loss in entropy
resulting from a constraint acting on the protein, such as the fixation of one
particular amino-acid on a specific site, and relate this notion to the escape
probability of the HIV virus. The case of lattice proteins, for which the
entropy can be computed exactly, allows us to provide another illustration of
the concept of cost, due to the competition of different folds. The relevance
of the entropy in relation to directed evolution experiments is stressed.
Comment: to appear in Journal of Statistical Physics
The Roles of Gene Duplication, Gene Conversion and Positive Selection in Rodent Esp and Mup Pheromone Gene Families with Comparison to the Abp Family
Three proteinaceous pheromone families, the androgen-binding proteins (ABPs), the exocrine-gland secreting peptides (ESPs) and the major urinary proteins (MUPs) are encoded by large gene families in the genomes of Mus musculus and Rattus norvegicus. We studied the evolutionary histories of the Mup and Esp genes and compared them with what is known about the Abp genes. Apparently gene conversion has played little if any role in the expansion of the mouse Class A and Class B Mup genes and pseudogenes, and the rat Mups. By contrast, we found evidence of extensive gene conversion in many Esp genes although not in all of them. Our studies of selection identified at least two amino acid sites in β-sheets as having evolved under positive selection in the mouse Class A and Class B MUPs and in rat MUPs. We show that selection may have acted on the ESPs by determining Ka/Ks for Exon 3 sequences with and without the converted sequence segment. While it appears that purifying selection acted on the ESP signal peptides, the secreted portions of the ESPs probably have undergone much more rapid evolution. When the inner gene converted fragment sequences were removed, eleven Esp paralogs were present in two or more pairs with Ka/Ks > 1.0 and thus we propose that positive selection is detectable by this means in at least some mouse Esp paralogs. We compare and contrast the evolutionary histories of all three mouse pheromone gene families in light of their proposed functions in mouse communication.
Family-specific scaling laws in bacterial genomes
Among several quantitative invariants found in evolutionary genomics, one of
the most striking is the scaling of the overall abundance of proteins, or
protein domains, sharing a specific functional annotation across genomes of
given size. The sizes of these functional categories change, on average, as
power-laws in the total number of protein-coding genes. Here, we show that such
regularities are not restricted to the overall behavior of high-level
functional categories, but also exist systematically at the level of single
evolutionary families of protein domains. Specifically, the number of proteins
within each family follows family-specific scaling laws with genome size.
Functionally similar sets of families tend to follow similar scaling laws, but
this is not always the case. To understand this systematically, we provide a
comprehensive classification of families based on their scaling properties.
Additionally, we develop a quantitative score for the heterogeneity of the
scaling of families belonging to a given category or predefined group. Under
the common reasonable assumption that selection is driven solely or mainly by
biological function, these findings point to fine-tuned and interdependent
functional roles of specific protein domains, beyond our current functional
annotations. This analysis provides a deeper view on the links between
evolutionary expansion of protein families and the functional constraints
shaping the gene repertoire of bacterial genomes.
Comment: 41 pages, 16 figures
The regulation of differentiation in mesenchymal stem cells
Peer reviewed. Publisher PDF
Inverse Statistical Physics of Protein Sequences: A Key Issues Review
In the course of evolution, proteins undergo important changes in their amino
acid sequences, while their three-dimensional folded structure and their
biological function remain remarkably conserved. Thanks to modern sequencing
techniques, sequence data accumulate at unprecedented pace. This provides large
sets of so-called homologous, i.e. evolutionarily related protein sequences, to
which methods of inverse statistical physics can be applied. Using sequence
data as the basis for the inference of Boltzmann distributions from samples of
microscopic configurations or observables, it is possible to extract
information about evolutionary constraints and thus protein function and
structure. Here we give an overview of some biologically important questions,
and how statistical-mechanics inspired modeling approaches can help to answer
them. Finally, we discuss some open questions, which we expect to be addressed
over the next years.
Comment: 18 pages, 7 figures
High-resolution temporal profiling of transcripts during Arabidopsis leaf senescence reveals a distinct chronology of processes and regulation
Leaf senescence is an essential developmental process that impacts dramatically on crop yields and involves altered
regulation of thousands of genes and many metabolic and signaling pathways, resulting in major changes in the leaf. The
regulation of senescence is complex, and although senescence regulatory genes have been characterized, there is little
information on how these function in the global control of the process. We used microarray analysis to obtain a high-resolution
time-course profile of gene expression during development of a single leaf over a 3-week period to senescence.
A complex experimental design approach and a combination of methods were used to extract high-quality replicated data
and to identify differentially expressed genes. The multiple time points enable the use of highly informative clustering to
reveal distinct time points at which signaling and metabolic pathways change. Analysis of motif enrichment, as well
as comparison of transcription factor (TF) families showing altered expression over the time course, identify clear groups
of TFs active at different stages of leaf development and senescence. These data enable connection of metabolic
processes, signaling pathways, and specific TF activity, which will underpin the development of network models to
elucidate the process of senescence.
Skewed Factor Models Using Selection Mechanisms
Traditional factor models explicitly or implicitly assume that the factors follow a multivariate normal distribution; that is, only moments up to order two are involved. However, it may happen in real data problems that the first two moments cannot explain the factors. Based on this motivation, here we devise three new skewed factor models, the skew-normal, the skew-t, and the generalized skew-normal factor models depending on a selection mechanism on the factors. The ECME algorithms are adopted to estimate related parameters for statistical inference. Monte Carlo simulations validate our new models and we demonstrate the need for skewed factor models using the classic open/closed book exam scores dataset.
Spinocerebellar Ataxia Type 2
1. Introduction: The autosomal dominant cerebellar ataxias (ADCA) are a clinically, pathologically and genetically heterogeneous group of neurodegenerative disorders caused by degeneration of the cerebellum and its afferent and efferent connections. The degenerative process may additionally involve the ponto-medullar systems, pyramidal tracts, basal ganglia, cerebral cortex, peripheral nerves (ADCA I) and the retina (ADCA II), or can be limited to the cerebellum (ADCA III) (Harding et al., 1993). The most common of these dominantly inherited autosomal ataxias, ADCA I, includes many Spinocerebellar Ataxia (SCA) subtypes, some of which are caused by pathological CAG trinucleotide repeat expansion in the coding region of the mutated gene. Such is the case for SCA1, SCA2, SCA3/MJD, SCA6, SCA7, SCA17 and Dentatorubral-pallidoluysian atrophy (DRPLA) (Matilla et al., 2006). Among the almost 30 SCAs, the variant SCA2 is the second most prevalent subtype worldwide, only surpassed by SCA3 (Schöls et al., 2004; Matilla et al., 2006; Auburger, 2011).