Biological networks and epistasis in genome-wide association studies
Over the last few years, technological improvements have made it possible to genotype hundreds of thousands of SNPs, enabling whole-genome association studies. The first genome-wide association studies have recently been completed to detect causal variants for complex traits. Although increasing evidence suggests that interactions between loci, such as epistasis between two loci, should be considered, most of these studies consider each SNP independently. One reason for this choice is that testing all pairs of SNPs dramatically increases the number of tests (approximately 50 billion for a data set of 300,000 SNPs), which runs into computational limits and requires a strong multiple-testing correction.
We propose to reduce the number of tests by focusing on pairs of SNPs that belong to genes known to interact in some metabolic network. Although some interactions might be missed, these pairs of genes are good candidates for epistasis. Furthermore, the use of protein interaction databases (such as the STRING database) may reduce the number of tests by a factor of 5,000.
Results using this approach will be presented on simulated data sets and on public data sets.
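The counting argument in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' pipeline: the function names, the SNP-to-gene mapping, and the list of interacting gene pairs are all hypothetical inputs. For 300,000 SNPs the exact number of unordered pairs is about 4.5 × 10^10, the same order of magnitude as the figure quoted above.

```python
from itertools import product
from math import comb


def all_pairs_count(n_snps):
    """Number of unordered SNP pairs in an exhaustive pairwise scan."""
    return comb(n_snps, 2)


def candidate_snp_pairs(snp_to_gene, interacting_genes):
    """Return SNP pairs whose genes are listed as interacting.

    snp_to_gene: dict mapping a SNP id to a gene name (hypothetical input).
    interacting_genes: iterable of (gene, gene) pairs, e.g. taken from an
    interaction database (hypothetical input).
    """
    # Group SNPs by the gene they belong to.
    gene_to_snps = {}
    for snp, gene in snp_to_gene.items():
        gene_to_snps.setdefault(gene, []).append(snp)

    pairs = set()
    for gene_a, gene_b in interacting_genes:
        snps_a = gene_to_snps.get(gene_a, [])
        snps_b = gene_to_snps.get(gene_b, [])
        for snp_a, snp_b in product(snps_a, snps_b):
            if snp_a != snp_b:
                # Store each pair in sorted order so (a, b) and (b, a) coincide.
                pairs.add(tuple(sorted((snp_a, snp_b))))
    return pairs


if __name__ == "__main__":
    print(all_pairs_count(300_000))
    toy_map = {"rs1": "A", "rs2": "A", "rs3": "B", "rs4": "C"}
    print(sorted(candidate_snp_pairs(toy_map, [("A", "B")])))
```

Only the pairs returned by `candidate_snp_pairs` would then be tested for interaction, which also shrinks the multiple-testing burden accordingly.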

A principal component analysis of 39 scientific impact measures
The impact of scientific publications has traditionally been expressed in
terms of citation counts. However, scientific activity has moved online over
the past decade. To better capture scientific impact in the digital era, a
variety of new impact measures has been proposed on the basis of social network
analysis and usage log data. Here we investigate how these new measures relate
to each other, and how accurately and completely they express scientific
impact. We performed a principal component analysis of the rankings produced by
39 existing and proposed measures of scholarly impact that were calculated on
the basis of both citation and usage log data. Our results indicate that the
notion of scientific impact is a multi-dimensional construct that cannot be
adequately measured by any single indicator, although some measures are more
suitable than others. The commonly used citation Impact Factor is not
positioned at the core of this construct, but at its periphery, and should thus
be used with caution.
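As a rough illustration of what a principal component analysis of rankings involves, the sketch below converts scores to ranks, builds the correlation matrix between measures, and extracts only the first principal component by power iteration. It is a simplified stand-in under stated assumptions, not the procedure used in the study: the function names and the toy scores are hypothetical, ties are not handled, and a full analysis would use a complete eigendecomposition.

```python
import math


def rank(values):
    """Rank items from 1 (largest value) to n; ties are not treated specially."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def correlation_matrix(columns):
    """Pearson correlations between columns (one column per impact measure)."""
    n = len(columns[0])
    normalized = []
    for column in columns:
        mean = sum(column) / n
        deviations = [x - mean for x in column]
        norm = math.sqrt(sum(d * d for d in deviations))
        normalized.append([d / norm for d in deviations])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in normalized]
            for ci in normalized]


def first_component(matrix, iterations=200):
    """Leading eigenvalue and eigenvector of a symmetric matrix (power iteration)."""
    k = len(matrix)
    vector = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        product = [sum(row[j] * vector[j] for j in range(k)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    eigenvalue = sum(
        vector[i] * sum(matrix[i][j] * vector[j] for j in range(k))
        for i in range(k)
    )
    return eigenvalue, vector


if __name__ == "__main__":
    # Toy scores for five items under three hypothetical measures.
    scores = [[10, 8, 6, 4, 2], [100, 80, 60, 40, 20], [3, 1, 5, 2, 4]]
    corr = correlation_matrix([rank(s) for s in scores])
    eigenvalue, loadings = first_component(corr)
    print("share of variance on first component:", eigenvalue / len(corr))
```

Measures whose loadings on the leading components are similar rank items similarly; a measure with an outlying loading sits at the periphery of the construct in the sense described above.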
A practical O(n log² n) time algorithm for computing the triplet distance on binary trees
The triplet distance is a distance measure that compares two rooted trees on the same set of leaves by enumerating all subsets of three leaves and counting how often the induced topologies in the two trees are equal or different. We present an algorithm that computes the triplet distance between two rooted binary trees in time O(n log² n). The algorithm is related to an algorithm for computing the quartet distance between two unrooted binary trees in time O(n log n). While the quartet distance algorithm has a very severe overhead in the asymptotic time complexity that makes it impractical compared to O(n²) time algorithms, we show through experiments that the triplet distance algorithm can be implemented to give a competitive wall-clock running time.
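To make the definition concrete, here is a naive reference implementation that enumerates every set of three leaves directly. It is explicitly not the O(n log² n) algorithm of the paper: it runs in roughly cubic time and is only meant to show what is being counted. The nested-tuple tree encoding and all function names are assumptions made for this sketch.

```python
from itertools import combinations


def leaf_paths(tree, prefix=()):
    """Map each leaf to its path from the root (a tuple of child indices).

    A tree is either a leaf label (any non-tuple value) or a tuple of subtrees.
    """
    if not isinstance(tree, tuple):
        return {tree: prefix}
    paths = {}
    for index, child in enumerate(tree):
        paths.update(leaf_paths(child, prefix + (index,)))
    return paths


def common_prefix_length(path_a, path_b):
    """Depth of the lowest common ancestor of two leaves."""
    length = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        length += 1
    return length


def triplet_topology(paths, a, b, c):
    """Return the pair of leaves that are siblings in the induced triplet.

    In a rooted binary tree exactly one pair has a strictly deeper common
    ancestor than the other two pairs.
    """
    depths = {
        frozenset(pair): common_prefix_length(paths[pair[0]], paths[pair[1]])
        for pair in ((a, b), (a, c), (b, c))
    }
    return max(depths, key=depths.get)


def triplet_distance(tree_one, tree_two):
    """Count leaf triplets whose induced topology differs between the trees."""
    paths_one, paths_two = leaf_paths(tree_one), leaf_paths(tree_two)
    if set(paths_one) != set(paths_two):
        raise ValueError("trees must have the same leaf set")
    return sum(
        1
        for a, b, c in combinations(sorted(paths_one), 3)
        if triplet_topology(paths_one, a, b, c) != triplet_topology(paths_two, a, b, c)
    )


if __name__ == "__main__":
    t1 = ((("a", "b"), "c"), "d")
    t2 = ((("a", "c"), "b"), "d")
    print(triplet_distance(t1, t2))
```

A brute-force version like this is mainly useful as a correctness check for a faster implementation on small trees.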
Measurement Units in R
We briefly review SI units, and discuss R packages that deal with measurement units, their compatibility and conversion. Built upon udunits2 and the UNIDATA udunits library, we introduce the package units that provides a class for maintaining unit metadata. When used in expressions, it automatically converts units, and simplifies units of results when possible; in case of incompatible units, errors are raised. The class flexibly allows expansion beyond predefined units. Using units may eliminate a whole class of potential scientific programming mistakes. We discuss the potential and limitations of computing with explicit units.
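The behaviour described here (automatic conversion in arithmetic, errors on incompatible units) can be illustrated with a toy class. This is a Python analogy written for this listing, not the API of the R units package or of udunits2; the `Quantity` class and its four-entry unit table are hypothetical.

```python
class Quantity:
    """A number tagged with a unit; a toy illustration only."""

    # Hypothetical table: unit -> (dimension, factor to the base unit).
    _UNITS = {
        "m": ("length", 1.0),
        "km": ("length", 1000.0),
        "s": ("time", 1.0),
        "h": ("time", 3600.0),
    }

    def __init__(self, value, unit):
        if unit not in self._UNITS:
            raise ValueError(f"unknown unit: {unit}")
        self.value = float(value)
        self.unit = unit

    def to(self, unit):
        """Convert to another unit of the same dimension."""
        if unit not in self._UNITS:
            raise ValueError(f"unknown unit: {unit}")
        dim_from, factor_from = self._UNITS[self.unit]
        dim_to, factor_to = self._UNITS[unit]
        if dim_from != dim_to:
            raise TypeError(f"cannot convert {self.unit} to {unit}")
        return Quantity(self.value * factor_from / factor_to, unit)

    def __add__(self, other):
        # The right operand is converted to the left operand's unit first;
        # incompatible dimensions raise TypeError inside to().
        return Quantity(self.value + other.to(self.unit).value, self.unit)

    def __repr__(self):
        return f"{self.value} [{self.unit}]"


if __name__ == "__main__":
    print(Quantity(1, "km") + Quantity(500, "m"))
    try:
        Quantity(1, "km") + Quantity(1, "s")
    except TypeError as error:
        print("error:", error)
```

Adding a length to a time fails immediately instead of silently producing a meaningless number, which is the class of mistake the abstract refers to.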
Haplotype frequencies in a sub-region of chromosome 19q13.3, related to risk and prognosis of cancer, differ dramatically between ethnic groups
Background: A small region of about 70 kb on human chromosome 19q13.3 encompasses four genes, of which three, ERCC1, ERCC2, and PPP1R13L (aka RAI), are related to DNA repair and cell survival, and one, CD3EAP (aka ASE1), may be related to cell proliferation. The whole region seems related to the cellular response to external damaging agents, and markers in it are associated with risk of several cancers. Methods: We downloaded the genotypes of all markers typed in the 19q13.3 region in the HapMap populations of European, Asian and African descent and inferred haplotypes. We combined the European HapMap individuals with a Danish breast cancer case-control data set and inferred the association between HapMap haplotypes and disease risk. Results: We found that the susceptibility haplotype in our European sample had increased from 2 to 50 percent very recently in the European population, and to almost the same extent in the Asian population. The cause of this increase is unknown. The maximal proportion of overall genetic variation due to differences between groups for Europeans versus Africans and Europeans versus Asians (the F_st value) closely matched the putative location of the susceptibility variant as judged from haplotype-based association mapping. Conclusion: The combination of a common haplotype causing an increased risk of cancer in Europeans and a high differentiation between human populations is highly unusual and suggests a causal relationship with a recent increase in Europeans, caused either by genetic drift overruling selection against the susceptibility variant or by positive selection for the same haplotype. The data do not allow us to distinguish between these two scenarios. The analysis suggests that the region is not involved in cancer risk in Africans and that the susceptibility variants may be more finely mapped in Asian populations.
A Method for the Automated, Reliable Retrieval of Publication-Citation Records
BACKGROUND: Publication records and citation indices often are used to evaluate academic performance. For this reason, obtaining or computing them accurately is important. This can be difficult, largely due to a lack of complete knowledge of an individual's publication list and/or lack of time available to manually obtain or construct the publication-citation record. While online publication search engines have somewhat addressed these problems, using raw search results can yield inaccurate estimates of publication-citation records and citation indices. METHODOLOGY: In this paper, we present a new, automated method that produces estimates of an individual's publication-citation record from an individual's name and a set of domain-specific vocabulary that may occur in the individual's publication titles. Because this vocabulary can be harvested directly from a research web page or online (partial) publication list, our method delivers an easy way to obtain estimates of a publication-citation record and the relevant citation indices. Our method works by applying a series of stringent name and content filters to the raw publication search results returned by an online publication search engine. In this paper, our method is run using Google Scholar, but the underlying filters can be easily applied to any existing publication search engine. When compared against a manually constructed data set of individuals and their publication-citation records, our method provides significant improvements over raw search results. The estimated publication-citation records returned by our method have an average sensitivity of 98% and specificity of 72% (in contrast to raw search result specificity of less than 10%). When citation indices are computed using these records, the estimated indices are within 10% of the true value, compared to raw search results, which overestimate by 75% on average.
CONCLUSIONS: These results confirm that our method provides significantly improved estimates over raw search results, and these can either be used directly for large-scale (departmental or university) analysis or further refined manually to quickly give accurate publication-citation records.
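The sensitivity and specificity figures quoted in this abstract follow the usual definitions, which can be sketched as below. This is an illustration under stated assumptions, not the authors' evaluation code: the function name, the set-based representation of records, and the toy identifiers are hypothetical.

```python
def sensitivity_specificity(raw_results, kept, true_records):
    """Score a filter that keeps a subset of raw search results.

    raw_results: set of all records returned by the search engine.
    kept: set of records the filter retained.
    true_records: set of records that genuinely belong to the individual.
    """
    true_positive = len(kept & true_records)
    false_negative = len((raw_results - kept) & true_records)
    false_positive = len(kept - true_records)
    true_negative = len(raw_results - kept - true_records)

    # Sensitivity: share of genuine records that survive the filter.
    sensitivity = true_positive / (true_positive + false_negative)
    # Specificity: share of spurious records that the filter rejects.
    specificity = true_negative / (true_negative + false_positive)
    return sensitivity, specificity


if __name__ == "__main__":
    raw = {f"p{i}" for i in range(1, 11)}   # ten raw search hits
    truth = {"p1", "p2", "p3", "p4"}        # four genuine publications
    kept = {"p1", "p2", "p3", "p4", "p5"}   # filter output with one error
    print(sensitivity_specificity(raw, kept, truth))
```

Using unfiltered results corresponds to `kept == raw_results`, which keeps every spurious record and so drives specificity to zero in this formulation.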
Genomic Relationships and Speciation Times of Human, Chimpanzee, and Gorilla Inferred from a Coalescent Hidden Markov Model
The genealogical relationship of human, chimpanzee, and gorilla varies along the genome. We develop a hidden Markov model (HMM) that incorporates this variation and relate the model parameters to population genetics quantities such as speciation times and ancestral population sizes. Our HMM is an analytically tractable approximation to the coalescent process with recombination, and in simulations we see no apparent bias in the HMM estimates. We apply the HMM to four autosomal contiguous human–chimp–gorilla–orangutan alignments comprising a total of 1.9 million base pairs. We find a very recent speciation time of human–chimp (4.1 ± 0.4 million years), and fairly large ancestral effective population sizes (65,000 ± 30,000 for the human–chimp ancestor and 45,000 ± 10,000 for the human–chimp–gorilla ancestor). Furthermore, around 50% of the human genome coalesces with chimpanzee after speciation with gorilla. We also consider 250,000 base pairs of X-chromosome alignments and find an effective population size much smaller than 75% of the autosomal effective population sizes. Finally, we find that the rate of transitions between different genealogies correlates well with the region-wide present-day human recombination rate, but does not correlate with the fine-scale recombination rates and recombination hot spots, suggesting that the latter are evolutionarily transient.
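For readers unfamiliar with HMMs, the likelihood of a sequence of observations under such a model is typically computed with the forward algorithm. The sketch below is a generic, scaled forward pass for a discrete HMM; it is not the coalescent HMM of the paper, and the two-state parameters in the usage example are made up for illustration (hidden states could be thought of as alternative genealogies, observations as alignment column patterns).

```python
import math


def forward_log_likelihood(initial, transition, emission, observations):
    """Log-likelihood of an observation sequence under a discrete HMM.

    initial[s]: probability of starting in hidden state s.
    transition[r][s]: probability of moving from state r to state s.
    emission[s][o]: probability of emitting symbol o in state s.
    """
    n_states = len(initial)
    alpha = [initial[s] * emission[s][observations[0]] for s in range(n_states)]
    log_likelihood = 0.0
    for t in range(len(observations)):
        if t > 0:
            # Propagate one step along the sequence, then weight by the emission.
            alpha = [
                sum(alpha[r] * transition[r][s] for r in range(n_states))
                * emission[s][observations[t]]
                for s in range(n_states)
            ]
        # Rescale at every position to avoid underflow on long sequences.
        scale = sum(alpha)
        log_likelihood += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_likelihood


if __name__ == "__main__":
    # Hypothetical two-state example with two observation symbols.
    initial = [0.5, 0.5]
    transition = [[0.9, 0.1], [0.1, 0.9]]
    emission = [[0.8, 0.2], [0.2, 0.8]]
    print(math.exp(forward_log_likelihood(initial, transition, emission, [0, 0])))
```

Parameter estimation then amounts to maximizing this log-likelihood over the model parameters, which in the paper's setting are tied to speciation times and ancestral population sizes.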
