241 research outputs found

    Building from scratch: de novo gene birth

    On homology searches by protein Blast and the characterization of the age of genes

    BACKGROUND: It has been shown in a variety of organisms, including mammals, that genes that appeared recently in evolution, for example orphan genes, evolve faster than older genes. Low functional constraints at the time of origin of novel genes may explain these results. However, this observation has recently been attributed to an artifact caused by the inability of Blast to detect the fastest-evolving genes in different eukaryotic genomes. Distinguishing between these two possible explanations is of great importance for any study dealing with the taxon distribution of proteins and the origin of novel genes. RESULTS: Here we used simulations of protein sequences to examine the capacity of Blast to detect proteins of diverse evolutionary rates in the different species of a eukaryotic phylogenetic tree that included metazoans, fungi and plants. We simulated the evolution of protein genes with the same evolutionary rates as those observed in functional mammalian genes and with among-site rate heterogeneity. Under these conditions, we found that only a very small percentage of simulated ancestral eukaryotic proteins was affected by the Blast artifact. We show that the good detectability of Blast is due to the heterogeneity of protein evolutionary rates at different sites, since even a small conserved motif in a sequence suffices to detect its homologues. Our results indicate that Blast, at least when applied within eukaryotes, only misses homologues of extremely fast-evolving sequences, which are rare in the mammalian genome, as well as of sequences evolving homogeneously and of pseudogenes. CONCLUSION: Although great care should be exercised in the recognition of remote homologues, most functional mammalian genes can be detected in eukaryotic genomes by Blast. That is, most functional mammalian genes do not evolve so fast that they would escape detection in other metazoans, fungi or plants, had they been present in these organisms. Thus, the correlation previously found between age and rate does not seem to be due to a pure Blast artifact, at least for mammals. This may have important implications for understanding the mechanisms by which novel genes originate.

    Mutation patterns of amino acid tandem repeats in the human proteome

    BACKGROUND: Amino acid tandem repeats are found in nearly one-fifth of human proteins. Abnormal expansion of these regions is associated with several human disorders. To gain further insight into the mutational mechanisms that operate in this type of sequence, we have analyzed a large number of mutation variants derived from human expressed sequence tags (ESTs). RESULTS: We identified 137 polymorphic variants in 115 different amino acid tandem repeats. Of these, 77 contained amino acid substitutions and 60 contained gaps (expansions or contractions of the repeat unit). The analysis showed that at least about 21% of the repeats might be polymorphic in humans. We compared the mutations found in different types of amino acid repeats and in adjacent regions. Overall, repeats showed a five-fold increase in the number of gap mutations compared to adjacent regions, reflecting the action of slippage within the repetitive structures. Gap and substitution mutations were distributed very differently among the different amino acid repeat types. Among repeats containing gap variants we identified several disease and candidate disease genes. CONCLUSION: This is the first genome-wide report of the types of mutations occurring in the amino acid repeat component of the human proteome. We show that the mutational dynamics of different amino acid repeat types are very diverse. We provide a list of loci with highly variable repeat structures, some of which may be involved in disease.

    ABS: a database of Annotated regulatory Binding Sites from orthologous promoters

    Information about the genomic coordinates and the sequence of experimentally identified transcription factor binding sites is scattered across a variety of formats. The availability of standard collections of such high-quality data is important for designing, evaluating and improving novel computational approaches to identify binding motifs in promoter sequences from related genes. ABS () is a public database of known binding sites identified in promoters of orthologous vertebrate genes that have been manually curated from the literature. We have annotated 650 experimental binding sites from 68 transcription factors and 100 orthologous target genes in human, mouse, rat or chicken genome sequences. Computational predictions and promoter alignment information are also provided for each entry. A simple and easy-to-use web interface facilitates data retrieval and allows different views of the information. In addition, release 1.0 of ABS includes a customizable generator of artificial datasets based on the known sites contained in the collection, and an evaluation tool to aid the training and assessment of motif-finding programs.

    The properties of the AGN torus as revealed from a set of unbiased NuSTAR observations

    The obscuration observed in active galactic nuclei (AGN) is mainly caused by dust and gas distributed in a torus-like structure surrounding the supermassive black hole (SMBH). However, the X-ray properties of the obscuring torus have not yet been fully investigated, owing to the lack of high-quality data and proper models. In this work, we perform a broadband X-ray spectral analysis of a large, unbiased sample of obscured AGN (with line-of-sight column density $23 \le \log(N_{\rm H}) \le 24$) in the nearby universe that have high-quality archival NuSTAR data. The source spectra are analyzed using the recently developed borus02 model, which enables us to accurately characterize the physical and geometrical properties of AGN obscuring tori. We also compare the results obtained from the unbiased Compton-thin AGN with those of Compton-thick AGN. We find that Compton-thin and Compton-thick AGN may possess similar tori, whose average column density is Compton thick ($N_{\rm H,tor,ave} \sim 1.4 \times 10^{24}$ cm$^{-2}$), but they are observed through different (under-dense or over-dense) regions of the tori. We also find that the obscuring torus medium is significantly inhomogeneous: for most of the sources in the sample, the torus average column density differs significantly from the line-of-sight column density. The average torus covering factor of sources in our unbiased sample is $c_f = 0.67$, suggesting that the fraction of unobscured AGN is $\sim$33%. We develop a new method to measure the intrinsic line-of-sight column density distribution of AGN in the nearby universe, and find that the result is in good agreement with the constraints from recent population synthesis models. Comment: 16 pages, 14 figures, 7 tables; accepted by A&
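
The step from the covering factor to the unobscured fraction in the abstract above can be sketched as simple arithmetic. This is a minimal illustration, not the authors' method: it assumes that all AGN share the same torus geometry and are viewed from random directions, so that the covering factor equals the probability of an obscured line of sight. The function names and the Compton-thick threshold of $10^{24}$ cm$^{-2}$ (taken from the upper bound of the sample's column density range) are assumptions made for this example.

```python
# Assumption: identical tori viewed from random directions, so the covering
# factor is the probability that a line of sight passes through the torus.
COMPTON_THICK_NH = 1e24  # cm^-2; assumed threshold, the sample's upper N_H bound


def unobscured_fraction(covering_factor):
    """Fraction of randomly oriented lines of sight that miss the torus."""
    if not 0.0 <= covering_factor <= 1.0:
        raise ValueError("covering factor must lie in [0, 1]")
    return 1.0 - covering_factor


def is_compton_thick(column_density):
    """True if a column density (cm^-2) is at or above the assumed threshold."""
    return column_density >= COMPTON_THICK_NH


if __name__ == "__main__":
    # Average covering factor quoted in the abstract.
    print(round(unobscured_fraction(0.67), 2))
    # Average torus column density quoted in the abstract.
    print(is_compton_thick(1.4e24))
```

With the quoted value of 0.67, the complement is 0.33, which matches the roughly 33% unobscured fraction stated in the abstract.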

    Representació no-lineal de les imatges per a codificació perceptiva (Non-linear representation of images for perceptual coding)

    JPEG2000 is a wavelet-based image compression standard. After the wavelet transform, the coefficients are scalar-quantized using a uniform dead-zone quantizer. Wavelet coefficients present both statistical and perceptual dependencies. JPEG2000 takes the statistical dependencies into account in its entropy coding scheme, but not the visual ones. In this work, we aim to find a representation that is better adapted to the human visual system than the one JPEG2000 provides directly. Given the strong reduction in statistical and perceptual redundancy shown by divisive normalization, we propose to introduce divisive normalization of the coefficients into the JPEG2000 encoding scheme. Ideally, we would like to map the coefficients to a space of values in which a higher coefficient value implies a higher visual contribution, and to encode in that space. In practice, we want our coding system to be integrated into a standard, so we use JPEG2000, an ITU standard that allows a choice of distortion measure in the coding, and we use the distortion in the normalized coefficient domain as the measure for choosing which data are sent first. Note: This document originally contains other material and/or software that can only be consulted at the Biblioteca de Ciència i Tecnologia.
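
The two operations named in the abstract above, dead-zone scalar quantization and divisive normalization, can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. The function names, the one-dimensional neighbourhood, the uniform pooling weights, and the parameters `b`, `gamma`, `radius` and `r` are all hypothetical choices made for the example.

```python
import math


def deadzone_quantize(coeffs, step):
    """Uniform scalar quantizer with a dead zone: q = sign(c) * floor(|c| / step)."""
    return [int(math.copysign(math.floor(abs(c) / step), c)) for c in coeffs]


def deadzone_dequantize(indices, step, r=0.5):
    """Reconstruct each nonzero index at offset r inside its interval (r is an assumption)."""
    return [0.0 if q == 0 else math.copysign((abs(q) + r) * step, q) for q in indices]


def divisive_normalize(coeffs, b=1.0, gamma=2.0, radius=1):
    """Divide each rectified coefficient by a constant plus pooled neighbour energy.

    Hypothetical simplification: a 1-D neighbourhood with uniform weights.
    """
    n = len(coeffs)
    out = []
    for i, c in enumerate(coeffs):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        pool = sum(abs(coeffs[j]) ** gamma for j in range(lo, hi)) / (hi - lo)
        out.append(math.copysign(abs(c) ** gamma, c) / (b + pool))
    return out


def normalized_distortion(original, reconstructed, **kwargs):
    """Squared error measured in the normalized domain rather than on raw coefficients."""
    a = divisive_normalize(original, **kwargs)
    z = divisive_normalize(reconstructed, **kwargs)
    return sum((x - y) ** 2 for x, y in zip(a, z))


if __name__ == "__main__":
    coeffs = [-2.6, -0.4, 0.0, 0.9, 1.0, 3.7]
    q = deadzone_quantize(coeffs, step=1.0)
    rec = deadzone_dequantize(q, step=1.0)
    print(q, rec, normalized_distortion(coeffs, rec))
```

An encoder along the lines the abstract describes would rank candidate data by how much each reduces `normalized_distortion`, instead of by plain squared error on the wavelet coefficients.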

    C-GOALS II. Chandra Observations of the Lower Luminosity Sample of Nearby Luminous Infrared Galaxies in GOALS

    We analyze Chandra X-ray observatory data for a sample of 63 luminous infrared galaxies (LIRGs), sampling the lower-infrared luminosity range of the Great Observatories All-Sky LIRG survey (GOALS), which includes the most luminous infrared selected galaxies in the local universe. X-rays are detected for 84 individual galaxies within the 63 systems, for which arcsecond resolution X-ray images, fluxes, infrared and X-ray luminosities, spectra and radial profiles are presented. Using X-ray and MIR selection criteria, we find AGN in $(31 \pm 5)$% of the galaxy sample, compared to the $(38 \pm 6)$% previously found for GOALS galaxies with higher infrared luminosities (C-GOALS I). Using mid-infrared data, we find that $(59 \pm 9)$% of the X-ray selected AGN in the full C-GOALS sample do not contribute significantly to the bolometric luminosity of the host galaxy. Dual AGN are detected in two systems, implying a dual AGN fraction in systems that contain at least one AGN of $(29 \pm 14)$%, compared to the $(11 \pm 10)$% found for the C-GOALS I sample. Through analysis of radial profiles, we derive that most sources, and almost all AGN, in the sample are compact, with half of the soft X-ray emission generated within the inner $\sim 1$ kpc. For most galaxies, the soft X-ray sizes of the sources are comparable to those of the MIR emission. We also find that the hard X-ray faintness previously reported for the bright C-GOALS I sources is also observed in the brightest LIRGs within the sample, with $L_{\rm FIR} > 8 \times 10^{10}\,L_{\odot}$. Comment: 24 pages, 13 figures, 11 tables, accepted for publication in A&

    A Compton-thick nucleus in the dual active galactic nuclei of Mrk 266

    We present the results of our analysis of NuSTAR data of the luminous infrared galaxy Mrk 266, which contains two nuclei, south-western (SW) and north-eastern (NE), that were resolved in previous Chandra imaging. Combining the NuSTAR data with the Chandra data, we interpret the hard X-ray spectrum obtained from the NuSTAR observation as a steeply rising flux from a Compton-thick active galactic nucleus (AGN) in the SW nucleus, which is very faint in the Chandra band, confirming the previous claim. This hard X-ray component is dominated by reflection, and its intrinsic 2–10 keV luminosity is likely to be ∼1 × 10⁴³ erg s⁻¹. The NE nucleus, although bright in soft X-rays and only moderately absorbed, has a 2–10 keV luminosity of 4 × 10⁴¹ erg s⁻¹, placing it in the low-luminosity AGN class. These results have implications for understanding the detectability and duty cycles of emission from dual AGN in heavily obscured mergers.