212 research outputs found

    Wikipedias: Collaborative web-based encyclopedias as complex networks

    Wikipedia is a popular web-based encyclopedia edited freely and collaboratively by its users. In this paper we present an analysis of Wikipedias in several languages as complex networks. The hyperlinks pointing from one Wikipedia article to another are treated as directed links while the articles represent the nodes of the network. We show that many network characteristics are common to different language versions of Wikipedia, such as their degree distributions, growth, topology, reciprocity, clustering, assortativity, path lengths and triad significance profiles. These regularities, found in the ensemble of Wikipedias in different languages and of different sizes, point to the existence of a unique growth process. We also compare Wikipedias to other previously studied networks. Comment: v3: 9 pages, 12 figures, Change of title, few paragraphs and two figures. Accepted for publication in Phys. Rev.

    Algorithms for efficient alignment-free sequence comparison (Algoritmi za učinkovitu usporedbu sekvenci bez korištenja sravnjivanja)

    Sequence comparison is an essential tool in modern biology. It is used to identify homologous regions between sequences, and to detect evolutionary relationships between organisms. Sequence comparison is usually based on alignments. However, aligning whole genomes is computationally difficult. As an alternative approach, alignment-free sequence comparison can be used. In my thesis, I concentrate on two problems that can be solved without alignment: (i) estimation of substitution rates between nucleotide sequences, and (ii) detection of local sequence homology. In the first part of my thesis, I developed and implemented a new algorithm for the efficient alignment-free computation of the number of nucleotide substitutions per site, and applied it to the analysis of large data sets of complete genomes. In the second part of my thesis, I developed and implemented a new algorithm for detecting matching regions between nucleotide sequences. I applied this solution to the classification of circulating recombinant forms of HIV, and to the analysis of bacterial genomes subject to horizontal gene transfer.

    Table of Contents
    1. GENERAL INTRODUCTION
    1.1. Suffix trees and other index data structures used in biological sequence analysis
    1.1.1. Suffix Tree
    1.1.2. The space and the time complexity of the algorithms for the suffix tree construction
    1.1.3. Suffix Array
    1.1.4. The space and the time complexity of the algorithms for suffix array construction
    1.1.5. Enhanced Suffix Array
    1.1.6. The 64-bit implementation of the lightweight suffix array construction algorithm
    1.1.7. Self-index
    1.1.8. Burrows-Wheeler transform
    1.1.9. The FM-Index and the backward search algorithm
    1.1.10. The space and the time-complexity of the FM-index
    2. EFFICIENT ESTIMATION OF PAIRWISE DISTANCES BETWEEN GENOMES
    2.1. Introduction
    2.2. Methods
    2.2.1. Definition of an alignment-free estimator of the rate of substitution, Kr
    2.2.2. Problem statement
    2.2.3. Time complexity analysis of the previous approach (kr 1)
    2.2.4. Time complexity analysis of the new approach (kr 2)
    2.2.5. Algorithm 1: Computation of all Kr values during the traversal of a generalized suffix tree of n sequences
    2.2.6. The implementation of kr version 2
    2.3. Analysis of Kr on simulated data sets
    2.3.1. Auxiliary programs
    2.3.2. Consistency of Kr
    2.3.3. The effect of horizontal gene transfer on the accuracy of Kr
    2.3.4. The effect of genome duplication on the accuracy of Kr
    2.3.5. Run time comparison of kr 1 and kr 2
    2.4. Application of kr version 2
    2.4.1. Auxiliary software used for the analysis of real data sets
    2.4.2. The analysis of 12 Drosophila genomes
    2.4.3. The analysis of 13 Escherichia coli and Shigella genomes
    2.4.4. The analysis of 825 HIV-1 pure subtype genomes
    2.5. Discussion
    3. EFFICIENT ALIGNMENT-FREE DETECTION OF LOCAL SEQUENCE HOMOLOGY
    3.1. Introduction
    3.2. Methods
    3.2.1. Problem statement – determining subtype(s) of a query sequence
    3.2.2. Construction of locally homologous segments
    3.2.3. Time complexity of computing a list of intervals Ii
    3.2.4. Algorithm 2: Construction of an interval tree
    3.2.5. Computing a list of segments Gi
    3.3. Analysis of st on simulated data sets
    3.3.1. Run-time and memory usage analysis of st
    3.3.2. Consistency of st
    3.3.3. Comparison to SCUEAL on simulated data sets
    3.4. Application of st
    3.4.1. The analysis of Neisseria meningitidis
    3.4.2. The analysis of a recombinant form of HIV-1
    3.4.3. The analysis of circulating recombinant forms of HIV-1
    3.4.4. The analysis of an avian pathogenic Escherichia coli strain
    3.5. Discussion
    4. CONCLUSION
    5. REFERENCES
    6. ELECTRONIC SOURCES
    7. LIST OF ABBREVIATIONS AND SYMBOLS
    ABSTRACT
    SAŽETAK (abstract in Croatian)
    CURRICULUM VITAE
    ŽIVOTOPIS (curriculum vitae in Croatian)

    Renormalization group scale-setting from the action - a road to modified gravity theories

    The renormalization group (RG) corrected gravitational action in Einstein-Hilbert and other truncations is considered. The running scale of the renormalization group is treated as a scalar field at the level of the action and determined in a scale-setting procedure recently introduced by Koch and Ramirez for the Einstein-Hilbert truncation. The scale-setting procedure is elaborated for other truncations of the gravitational action and applied to several phenomenologically interesting cases. It is shown how the logarithmic dependence of Newton's coupling on the RG scale leads to an exponentially suppressed effective cosmological constant, and how the scale-setting in particular RG corrected gravitational theories yields effective f(R) modified gravity theories with negative powers of the Ricci scalar R. The scale-setting at the level of the action at the non-Gaussian fixed point in Einstein-Hilbert and more general truncations is shown to lead to a universal effective action quadratic in the Ricci tensor. Comment: v1: 15 pages; v2: shortened to 10 pages, main results unchanged, published in Class. Quant. Gra

    ProteinHistorian: Tools for the Comparative Analysis of Eukaryote Protein Origin

    The evolutionary history of a protein reflects the functional history of its ancestors. Recent phylogenetic studies identified distinct evolutionary signatures that characterize proteins involved in cancer, Mendelian disease, and different ontogenic stages. Despite the potential to yield insight into the cellular functions and interactions of proteins, such comparative phylogenetic analyses are rarely performed, because they require custom algorithms. We developed ProteinHistorian to make tools for performing analyses of protein origins widely available. Given a list of proteins of interest, ProteinHistorian estimates the phylogenetic age of each protein, quantifies enrichment for proteins of specific ages, and compares variation in protein age with other protein attributes. ProteinHistorian allows flexibility in the definition of protein age by including several algorithms for estimating ages from different databases of evolutionary relationships. We illustrate the use of ProteinHistorian with three example analyses. First, we demonstrate that proteins with high expression in human, compared to chimpanzee and rhesus macaque, are significantly younger than those with human-specific low expression. Next, we show that human proteins with annotated regulatory functions are significantly younger than proteins with catalytic functions. Finally, we compare protein length and age in many eukaryotic species and, as expected from previous studies, find a positive, though often weak, correlation between protein age and length. ProteinHistorian is available through a web server with an intuitive interface and as a set of command line tools; this allows biologists and bioinformaticians alike to integrate these approaches into their analysis pipelines. ProteinHistorian's modular, extensible design facilitates the integration of new datasets and algorithms. 
    The ProteinHistorian web server, source code, and pre-computed ages for 32 eukaryotic genomes are freely available under the GNU public license at http://lighthouse.ucsf.edu/ProteinHistorian/

    Phylostratigraphic tracking of cancer genes suggests a link to the emergence of multicellularity in metazoa

    Background: Phylostratigraphy is a method used to correlate the evolutionary origin of founder genes (that is, functional founder protein domains) of gene families with particular macroevolutionary transitions. It is based on a model of genome evolution that suggests that the origin of complex phenotypic innovations will be accompanied by the emergence of such founder genes, the descendants of which can still be traced in extant organisms. The origin of multicellularity can be considered to be a macroevolutionary transition, for which new gene functions would have been required. Cancer should be tightly connected to multicellular life since it can be viewed as a malfunction of interaction between cells in a multicellular organism. A phylostratigraphic tracking of the origin of cancer genes should, therefore, also provide insights into the origin of multicellularity. Results: We find two strong peaks of the emergence of cancer related protein domains, one at the time of the origin of the first cell and the other around the time of the evolution of the multicellular metazoan organisms. These peaks correlate with two major classes of cancer genes, the 'caretakers', which are involved in general functions that support genome stability and the 'gatekeepers', which are involved in cellular signalling and growth processes. Interestingly, this phylogenetic succession mirrors the ontogenetic succession of tumour progression, where mutations in caretakers are thought to precede mutations in gatekeepers. Conclusions: A link between multicellularity and formation of cancer has often been predicted. However, this has not so far been explicitly tested. Although we find that a significant number of protein domains involved in cancer predate the origin of multicellularity, the second peak of cancer protein domain emergence is, indeed, connected to a phylogenetic level where multicellular animals have emerged. 
    The fact that we can find a strong and consistent signal for this second peak in the phylostratigraphic map implies that a complex multi-level selection process has driven the transition to multicellularity.

    Hubble expansion and structure formation in the "running FLRW model" of the cosmic evolution

    A new class of FLRW cosmological models with time-evolving fundamental parameters should emerge naturally from a description of the expansion of the universe based on the first principles of quantum field theory and string theory. Within this general paradigm, one expects that both the gravitational Newton's coupling, G, and the cosmological term, Lambda, should not be strictly constant but appear rather as smooth functions of the Hubble rate. This scenario ("running FLRW model") predicts, in a natural way, the existence of dynamical dark energy without invoking the participation of extraneous scalar fields. In this paper, we perform a detailed study of these models in the light of the latest cosmological data, which serves to illustrate the phenomenological viability of the new dark energy paradigm as a serious alternative to the traditional scalar field approaches. By performing a joint likelihood analysis of the recent SNIa data, the CMB shift parameter, and the BAOs traced by the Sloan Digital Sky Survey, we put tight constraints on the main cosmological parameters. Furthermore, we derive the theoretically predicted dark-matter halo mass function and the corresponding redshift distribution of cluster-size halos for the "running" models studied. Despite the fact that these models closely reproduce the standard LCDM Hubble expansion, their normalization of the perturbation's power-spectrum varies, imposing, in many cases, a significantly different cluster-size halo redshift distribution. This fact indicates that it should be relatively easy to distinguish between the "running" models and the LCDM cosmology using realistic future X-ray and Sunyaev-Zeldovich cluster surveys. Comment: Version published in JCAP 08 (2011) 007: 1+41 pages, 6 Figures, 1 Table. Typos corrected. Extended discussion on the computation of the linearly extrapolated density threshold above which structures collapse in time-varying vacuum models. One appendix, a few references and one figure added.

    Fate and transformation of silver nanoparticles in different biological conditions

    The exploitation of silver nanoparticles (AgNPs) in biomedicine represents more than one third of their overall application. Despite their wide use and the significant amount of scientific data on their effects on biological systems, detailed insight into their in vivo fate is still lacking. This study aimed to elucidate the biotransformation patterns of AgNPs following oral administration. Colloidal stability, biochemical transformation, dissolution, and degradation behaviour of different types of AgNPs were evaluated in systems modelled to represent biological environments relevant for oral administration, as well as in cell culture media and tissue compartments obtained from animal models. A multimethod approach was employed by implementing light scattering (dynamic and electrophoretic) techniques, spectroscopy (UV–vis, atomic absorption, nuclear magnetic resonance) and transmission electron microscopy. The obtained results demonstrated that AgNPs may transform very quickly during their journey through different biological conditions. They are able to degrade to an ionic form and again reconstruct to a nanoparticulate form, depending on the biological environment determined by specific body compartments. As suggested for other inorganic nanoparticles by other research groups, AgNPs fail to preserve their specific integrity in in vivo settings.

    Directed transport of CRP across in vitro models of the blood-saliva barrier strengthens the feasibility of salivary CRP as biomarker for neonatal sepsis

    C-reactive protein (CRP) is a commonly used serum biomarker for detecting sepsis in neonates. After the onset of sepsis, serial measurements are necessary to monitor disease progression; therefore, a non-invasive detection method is beneficial for neonatal well-being. While some studies have shown a correlation between serum and salivary CRP levels in septic neonates, the causal link behind this correlation remains unclear. To investigate this relationship, CRP was examined in serum and saliva samples from 18 septic neonates and compared with saliva samples from 22 healthy neonates. While the measured blood and saliva concentrations of the septic neonates varied individually, a correlation of CRP levels between serum and saliva samples was observed over time. To clarify the presence of active transport of CRP across the blood–salivary barrier (BSB), transport studies were performed with CRP using in vitro models of oral mucosa and submandibular salivary gland epithelium. The results showed enhanced transport toward saliva in both models, supporting the clinical relevance of salivary CRP as a biomarker. Furthermore, CRP regulated the expression of the receptor for advanced glycation end products (RAGE), and the addition of soluble RAGE during the transport studies indicated a RAGE-dependent transport process for CRP from blood to saliva.