
    Faster algorithms for 1-mappability of a sequence

    In the k-mappability problem, we are given a string x of length n and integers m and k, and we are asked to count, for each length-m factor y of x, the number of other factors of length m of x that are at Hamming distance at most k from y. We focus here on the version of the problem where k = 1. The fastest known algorithm for k = 1 requires time O(mn log n / log log n) and space O(n). We present two algorithms that require worst-case time O(mn) and O(n log^2 n), respectively, and space O(n), thus greatly improving the state of the art. Moreover, we present an algorithm that requires average-case time and space O(n) for integer alphabets if m = Ω(log n / log σ), where σ is the alphabet size.
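To make the problem statement concrete, here is a minimal brute-force sketch of k-mappability. It is not one of the algorithms from the abstract: it runs in O(n^2 m) time and only illustrates what is being counted. The function name `k_mappability_naive` is hypothetical, and the sketch assumes that "other factors" means length-m windows starting at other positions of x.

```python
def k_mappability_naive(x: str, m: int, k: int = 1) -> list[int]:
    """For each length-m window of x, count the windows at other starting
    positions whose Hamming distance to it is at most k (brute force)."""
    num_windows = len(x) - m + 1
    if m <= 0 or num_windows <= 0:
        return []
    counts = [0] * num_windows
    for i in range(num_windows):
        for j in range(i + 1, num_windows):
            mismatches = 0
            for t in range(m):
                if x[i + t] != x[j + t]:
                    mismatches += 1
                    if mismatches > k:
                        break  # this pair is already too far apart
            if mismatches <= k:
                # Hamming distance is symmetric, so credit both windows.
                counts[i] += 1
                counts[j] += 1
    return counts
```

Calling `k_mappability_naive(x, m, 1)` gives the 1-mappability counts for every window; the quadratic pair loop is the cost that the faster algorithms avoid.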

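    A general method for estimating genetic parameters in a selected sample, with an application to selection on a three-trait index (original title: Une méthode générale d'estimation des paramètres génétiques dans un échantillon sélectionné, avec une application à une sélection sur un indice à trois caractères)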

    The estimation of the genetic parameters of a population (heritabilities and genetic correlations) from a sample rests on several hypotheses, concerning either the population itself or the sample drawn from it. In particular, random sampling has to be assumed, which is not the case when the individuals measured, or their parents, are selected, a frequent situation in farm animals. In this case, the usual estimates from regression and analysis of variance may be biased, and corrections are needed to account for selection. When selection is based on several traits simultaneously, combined for instance into a selection index, the correction may be derived from the knowledge of the phenotypic variance-covariance matrix of the selected parents relative to that of the unselected population, according to a general method described, in particular, by Aitken (1934). This method includes, as particular cases, several more recent results concerning the bias due to selection on one character. However, the sampling variances of the corrected estimates so obtained have complex expressions. A numerical illustration is given, which concerns a three-trait index selection in the pig. The results show a good agreement between the estimates corrected for the bias due to selection and the estimates drawn from the unselected sample.

    Possible surface plasmon polariton excitation under femtosecond laser irradiation of silicon

    The mechanisms of ripple formation on a silicon surface by femtosecond laser pulses are investigated. We demonstrate the transient evolution of the density of the excited free carriers. As a result, the experimental conditions required for the excitation of surface plasmon polaritons are revealed. The periods of the resulting structures are then investigated as a function of laser parameters, such as the angle of incidence, laser fluence, and polarization. The obtained dependencies provide a way of better controlling the properties of the periodic structures induced by a femtosecond laser on the surface of a semiconductor material. Comment: 11 pages, 8 figures, accepted for publication in Journal of Applied Physics
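As a rough illustration of how free-carrier density relates to surface plasmon polariton (SPP) excitation, the sketch below uses a simple Drude model together with the textbook SPP dispersion at an air/material interface. The abstract does not state which model the authors use, so this is an assumption. The parameter values (unexcited permittivity, effective mass `m_eff`, collision time `tau`) are illustrative placeholders rather than values from the paper, and all function names are hypothetical.

```python
import cmath
import math

# Physical constants (SI units).
E_CHARGE = 1.602176634e-19
EPS0 = 8.8541878128e-12
M_E = 9.1093837015e-31
C_LIGHT = 2.99792458e8


def drude_permittivity(n_e, wavelength, eps_unexcited=13.6 + 0.048j,
                       m_eff=0.18, tau=1.0e-15):
    """Permittivity of the excited material for carrier density n_e (m^-3).
    Default material parameters are assumed, illustrative values."""
    omega = 2.0 * math.pi * C_LIGHT / wavelength
    omega_p_sq = n_e * E_CHARGE**2 / (EPS0 * m_eff * M_E)
    return eps_unexcited - omega_p_sq / (omega**2 + 1j * omega / tau)


def spp_allowed(eps):
    """An air/material interface supports a bound SPP when Re(eps) < -1."""
    return eps.real < -1.0


def spp_ripple_periods(eps, wavelength, angle_deg=0.0):
    """Candidate ripple periods lambda / (Re(n_spp) +/- sin(theta)),
    with n_spp = sqrt(eps / (eps + 1)) for an interface with air."""
    n_spp = cmath.sqrt(eps / (eps + 1.0)).real
    s = math.sin(math.radians(angle_deg))
    return wavelength / (n_spp + s), wavelength / (n_spp - s)
```

Under these assumptions, sweeping `n_e` and checking `spp_allowed` shows the carrier density above which the surface becomes metallic enough to support an SPP. The period formula is only meaningful when that check passes, and away from grazing incidence.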

    Photoionization and transient Wannier-Stark ladder in silicon: First-principles simulations versus Keldysh theory

    Nonlinear photoionization of dielectrics and semiconductors is widely treated in the framework of the Keldysh theory, whose validity is limited to photon energies that are small compared to the band gap and to relatively low laser intensities. Time-dependent density functional theory (TDDFT) simulations, which are free of these limitations, enable one to gain insight into the nonequilibrium dynamics of the electronic structure. Here we apply TDDFT to investigate the photoionization of crystalline silicon by ultrashort laser pulses over a wide range of laser wavelengths and intensities and compare the results with predictions of the Keldysh theory. Photoionization rates derived from the simulations considerably exceed the data obtained with the Keldysh theory within the validity range of the latter. Possible reasons for the discrepancy are discussed, and we provide fundamental data on the photoionization rates beyond the limits of the Keldysh theory. By investigating the features of the Stark shift as a function of photon energy and laser field strength, we reveal a manifestation of the transient Wannier-Stark ladder states, which become blurred with increasing laser field strength. Finally, it is shown that the TDDFT simulations can potentially provide reliable data on the electron damping time, which is of high importance for large-scale modeling.

    Revisiting the missing protein-coding gene catalog of the domestic dog

    Background: Among mammals for which there is high sequence coverage, the whole genome assembly of the dog is unique in that it predicts a low number of protein-coding genes, ~19,000, compared to the over 20,000 reported for other mammalian species. Of particular interest are the more than 400 genes annotated in primate and rodent genomes but missing in dog. Results: Using over 14,000 orthologous genes between human, chimpanzee, mouse, rat and dog, we built multiple pairwise synteny maps to infer short orthologous intervals that were targeted for characterizing the canine missing genes. Based on gene prediction and a functionality test using the ratio of replacement to silent nucleotide substitution rates (dN/dS), we provide compelling structural and functional evidence for the identification of 232 new protein-coding genes in the canine genome and 69 gene losses, characterized as undetected genes or pseudogenes. Gene loss phyletic pattern analysis using ten species from chicken to human allowed us to characterize 28 canine-specific gene losses that have functional orthologs continuously from chicken or marsupials through human, and 10 genes that arose specifically in the evolutionary lineage leading to rodents and primates. Conclusion: This study demonstrates the central role of comparative genomics for refining gene catalogs and exploring the evolutionary history of gene repertoires, particularly as applied to the characterization of species-specific gene gains and losses.

    The origins, evolution, and functional potential of alternative splicing in vertebrates.

    Alternative splicing (AS) has the potential to greatly expand the functional repertoire of mammalian transcriptomes. However, few variant transcripts have been characterized functionally, making it difficult to assess the contribution of AS to the generation of phenotypic complexity and to study the evolution of splicing patterns. We have compared the AS of 309 protein-coding genes in the human ENCODE pilot regions against their mouse orthologs in unprecedented detail, utilizing traditional transcriptomic and RNAseq data. The conservation status of every transcript has been investigated, and each transcript has been functionally categorized as coding (separated into coding sequence [CDS] or nonsense-mediated decay [NMD] linked) or noncoding. In total, 36.7% of human and 19.3% of mouse coding transcripts are species specific, and we observe a 3.6-fold excess of human NMD transcripts compared with mouse; in contrast to previous studies, the majority of species-specific AS is unlinked to transposable elements. We observe one conserved CDS variant and one conserved NMD variant per 2.3 and 11.4 genes, respectively. Subsequently, we identify and characterize equivalent AS patterns for 22.9% of these CDS or NMD-linked events in nonmammalian vertebrate genomes, and our data indicate that functional NMD-linked AS is more widespread and ancient than previously thought. Furthermore, although we observe an association between conserved AS and elevated sequence conservation, as previously reported, we emphasize that 30% of conserved AS exons display sequence conservation below the average score for constitutive exons. In conclusion, we demonstrate the value of detailed comparative annotation in generating a comprehensive set of AS transcripts, increasing our understanding of AS evolution in vertebrates. Our data support a model whereby the acquisition of functional AS has occurred throughout vertebrate evolution and is considered alongside amino acid change as a key mechanism in gene evolution.

    A novel stepwise integrative analysis pipeline reveals distinct microbiota-host interactions and link to symptoms in irritable bowel syndrome

    Although incompletely understood, microbiota-host interactions are assumed to be altered in irritable bowel syndrome (IBS). We therefore aimed to develop a novel analysis pipeline tailored for the integrative analysis of microbiota-host interactions and their association with symptoms, and to prove its utility in a pilot cohort. A multilayer stepwise integrative analysis pipeline was developed to visualize complex variable associations. Application of the pipeline was demonstrated on a dataset of IBS patients and healthy controls (HC), using the R software package to analyze colonic host mRNA and mucosal microbiota (16S rRNA gene sequencing), as well as gastrointestinal (GI) and psychological symptoms. In total, 42 IBS patients (57% female, mean age 33.6 (range 18–58)) and 20 HC (60% female, mean age 26.8 (range 23–41)) were included. Only in IBS patients did mRNA expression of Toll-like receptor 4 and genes associated with barrier function (PAR2, OCLN, TJP1) intercorrelate closely, suggesting potential functional relationships. This host gene-based “permeability cluster” was associated with mucosa-adjacent Chlamydiae and Lentisphaerae, and furthermore with satiety as well as with anxiety, depression and fatigue. In both IBS patients and HC, chromogranins, secretogranins and TLRs clustered together. In IBS patients, this host gene-based “immune-enteroendocrine cluster” was associated with specific members of Firmicutes, and with depression and fatigue, whereas in HC no significant association with microbiota was identified. We have developed a stepwise integrative analysis pipeline that allowed identification of unique host-microbiota intercorrelation patterns and their association with symptoms in IBS patients. This analysis pipeline may aid in advancing the understanding of complex variable associations in health and disease.

    A robust SNP barcode for typing Mycobacterium tuberculosis complex strains

    Strain-specific genomic diversity in the Mycobacterium tuberculosis complex (MTBC) is an important factor in pathogenesis that may affect virulence, transmissibility, host response and emergence of drug resistance. Several systems have been proposed to classify MTBC strains into distinct lineages and families. Here, we investigate single-nucleotide polymorphisms (SNPs) as robust (stable) markers of genetic variation for phylogenetic analysis. We identify ~92k SNPs across a global collection of 1,601 genomes. The SNP-based phylogeny is consistent with the gold-standard regions of difference (RD) classification system. Of the ~7k strain-specific SNPs identified, 62 markers are proposed to discriminate known circulating strains. This SNP-based barcode is the first to cover all main lineages, and classifies a greater number of sublineages than current alternatives. It may be used to classify clinical isolates to evaluate tools to control the disease, including therapeutics and vaccines whose effectiveness may vary by strain type.
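The idea of a SNP barcode can be sketched as a lookup from diagnostic positions to lineage labels. The example below is a toy illustration only: the positions, alleles and lineage labels in `TOY_BARCODE` are invented, the function name `classify` is hypothetical, and none of it reproduces the 62 markers proposed in the study.

```python
# Hypothetical barcode: genome position -> (diagnostic allele, lineage label).
# Positions, alleles and labels are made up for illustration.
TOY_BARCODE = {
    1001: ("A", "lineage2"),
    2002: ("T", "lineage2.1"),
    3003: ("G", "lineage4"),
}


def classify(sample_calls, barcode=TOY_BARCODE):
    """Return the sorted lineage labels whose diagnostic allele is present
    in the sample. sample_calls maps genome position -> called base."""
    hits = []
    for position, (allele, label) in barcode.items():
        if sample_calls.get(position) == allele:
            hits.append(label)
    return sorted(hits)
```

A sample carrying the diagnostic alleles at positions 1001 and 2002 would be assigned to both `lineage2` and its sublineage `lineage2.1`. This mirrors how a hierarchical barcode resolves lineage and sublineage from a handful of sites rather than the full set of variants.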