Error, reproducibility and sensitivity : a pipeline for data processing of Agilent oligonucleotide expression arrays
Background
Expression microarrays are increasingly used to obtain large scale transcriptomic information on a wide range of biological samples. Nevertheless, there is still much debate on the best ways to process data, to design experiments and analyse the output. Furthermore, many of the more sophisticated mathematical approaches to data analysis in the literature remain inaccessible to much of the biological research community. In this study we examine ways of extracting and analysing a large data set obtained using the Agilent long oligonucleotide transcriptomics platform, applied to a set of human macrophage and dendritic cell samples.
Results
We describe and validate a series of data extraction, transformation and normalisation steps which are implemented via a new R function. Analysis of replicate normalised reference data demonstrates that intra-array variability is small (only around 2% of the mean log signal), while interarray variability from replicate array measurements has a standard deviation (SD) of around 0.5 log2 units (6% of mean). The common practice of working with ratios of Cy5/Cy3 signal offers little further improvement in terms of reducing error. Comparison to expression data obtained using Arabidopsis samples demonstrates that the large number of genes in each sample showing a low level of transcription reflects the real complexity of the cellular transcriptome. Multidimensional scaling is used to show that the processed data identify an underlying structure which reflects some of the key biological variables which define the data set. This structure is robust, allowing reliable comparison of samples collected over a number of years and collected by a variety of operators.
Conclusions
This study outlines a robust and easily implemented pipeline for extracting, transforming, normalising and visualising transcriptomic array data from the Agilent expression platform. The analysis is used to obtain quantitative estimates of the SD arising from experimental (non-biological) intra- and interarray variability, and of a lower threshold for determining whether an individual gene is expressed. The study provides a reliable basis for further, more extensive studies of the systems biology of eukaryotic cells.
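The interarray variability figures quoted above (an SD in log2 units, also expressed as a fraction of the mean log signal) can be illustrated with a minimal sketch. This is not the R function described in the abstract; the function names and the intensity values are hypothetical, and the use of the sample SD is an assumption.

```python
import math
import statistics


def log2_signals(raw):
    """Log2-transform raw intensities; non-positive values are rejected."""
    if any(v <= 0 for v in raw):
        raise ValueError("intensities must be positive")
    return [math.log2(v) for v in raw]


def replicate_sd(replicates):
    """Sample SD (log2 units) of one probe across replicate arrays."""
    return statistics.stdev(log2_signals(replicates))


def sd_as_fraction_of_mean(replicates):
    """SD of the log2 signal divided by the mean log2 signal."""
    logs = log2_signals(replicates)
    return statistics.stdev(logs) / statistics.mean(logs)


# Hypothetical intensities for one probe measured on four replicate arrays.
example = [256.0, 512.0, 256.0, 512.0]
print(replicate_sd(example))
print(sd_as_fraction_of_mean(example))
```

With these made-up values the log2 signals are 8, 9, 8 and 9, so the SD is a little over 0.5 log2 units and the ratio to the mean is a few percent, the same order of magnitude as the figures reported in the abstract.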
The geography of recent genetic ancestry across Europe
The recent genealogical history of human populations is a complex mosaic
formed by individual migration, large-scale population movements, and other
demographic events. Population genomics datasets can provide a window into this
recent history, as rare traces of recent shared genetic ancestry are detectable
due to long segments of shared genomic material. We make use of genomic data
for 2,257 Europeans (the POPRES dataset) to conduct one of the first surveys of
recent genealogical ancestry over the past three thousand years at a
continental scale. We detected 1.9 million shared genomic segments, and used
the lengths of these to infer the distribution of shared ancestors across time
and geography. We find that a pair of modern Europeans living in neighboring
populations share around 10-50 genetic common ancestors from the last 1500
years, and upwards of 500 genetic ancestors from the previous 1000 years. These
numbers drop off exponentially with geographic distance, but since genetic
ancestry is rare, individuals from opposite ends of Europe are still expected
to share millions of common genealogical ancestors over the last 1000 years.
There is substantial regional variation in the number of shared genetic
ancestors: especially high numbers of common ancestors between many eastern
populations likely date to the Slavic and/or Hunnic expansions, while much
lower levels of common ancestry in the Italian and Iberian peninsulas may
indicate weaker demographic effects of Germanic expansions into these areas
and/or more stably structured populations. Recent shared ancestry in modern
Europeans is ubiquitous, and clearly shows the impact of both small-scale
migration and large historical events. Population genomic datasets have
considerable power to uncover recent demographic history, and will allow a much
fuller picture of the close genealogical kinship of individuals across the
world.
Comment: Full size figures available from
http://www.eve.ucdavis.edu/~plralph/research.html; or html version at
http://ralphlab.usc.edu/ibd/ibd-paper/ibd-writeup.xhtm
Inference of population splits and mixtures from genome-wide allele frequency data
Many aspects of the historical relationships between populations in a species
are reflected in genetic data. Inferring these relationships from genetic data,
however, remains a challenging task. In this paper, we present a statistical
model for inferring the patterns of population splits and mixtures in multiple
populations. In this model, the sampled populations in a species are related to
their common ancestor through a graph of ancestral populations. Using
genome-wide allele frequency data and a Gaussian approximation to genetic
drift, we infer the structure of this graph. We applied this method to a set of
55 human populations and a set of 82 dog breeds and wild canids. In both
species, we show that a simple bifurcating tree does not fully describe the
data; in contrast, we infer many migration events. While some of the migration
events that we find have been detected previously, many have not. For example,
in the human data we infer that Cambodians trace approximately 16% of their
ancestry to a population ancestral to other extant East Asian populations. In
the dog data, we infer that both the boxer and basenji trace a considerable
fraction of their ancestry (9% and 25%, respectively) to wolves subsequent to
domestication, and that East Asian toy breeds (the Shih Tzu and the Pekingese)
result from admixture between modern toy breeds and "ancient" Asian breeds.
Software implementing the model described here, called TreeMix, is available at
http://treemix.googlecode.com
Comment: 28 pages, 6 figures in main text. Attached supplement is 22 pages, 15
figures. This is an updated version of the preprint available at
http://precedings.nature.com/documents/6956/version/
Variability in EIT Images of Lung Ventilation as a Function of Electrode Planes and Body Positions
This study investigates the variability in resistivity changes in the lung region as a function of air volume, electrode plane and body position. Six normal subjects (33.8 ± 4.7 years, range from 26 to 37 years) were studied using the Sheffield Electrical Impedance Tomography (EIT) portable system. Three transverse planes, at the level of the second intercostal space, at the level of the xiphisternal joint, and midway between the upper and lower locations, were chosen for measurements. For each plane, sixteen electrodes were uniformly positioned around the thorax. Data were collected with the breath held at end expiration and after inspiring 0.5, 1.0, or 1.5 liters of air from end expiration, with the subject in both the supine and sitting positions. The average resistivity change was calculated in five regions: two 8x8 pixel local regions in the right lung, the entire right lung, the entire left lung, and the total lung region. The results show that the resistivity change averaged over electrode positions and subject positions was 7-9% per liter of air, with a slightly larger resistivity change of 10% per liter of air in the lower electrode plane. There was no significant difference (p>0.05) between supine and sitting. The two 8x8 regions show a larger inter-individual variability (coefficient of variation, CV, from 30% to 382%) compared to the entire left, entire right and total lung regions (CV from 11% to 51%). The results for the global regions are more consistent. The large inter-individual variability appears to be a problem for clinical applications of EIT, such as regional ventilation. The variability may be mitigated by choosing an appropriate electrode plane, body position and region of interest for the analysis.
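The coefficient of variation (CV) used above to express inter-individual variability is the standard deviation divided by the mean, given as a percentage. A minimal sketch follows; the six per-subject values are hypothetical, and using the sample SD (rather than the population SD) is an assumption, since the abstract does not say which was used.

```python
import statistics


def coefficient_of_variation(values):
    """Return the CV in percent: 100 * sample SD / |mean|."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * statistics.stdev(values) / abs(mean)


# Hypothetical resistivity changes (% per liter of air) for six subjects.
changes = [7.0, 9.0, 8.0, 10.0, 6.0, 8.0]
cv = coefficient_of_variation(changes)
print(f"CV = {cv:.1f}%")
```

A small region of interest averages over fewer pixels, so its per-subject values tend to scatter more around the group mean, which is consistent with the larger CV the abstract reports for the 8x8 regions.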
A preliminary study of genetic factors that influence susceptibility to bovine tuberculosis in the British cattle herd
Associations between specific host genes and susceptibility to mycobacterial infections such as tuberculosis have been reported in several species. Bovine tuberculosis (bTB) greatly impacts the UK cattle industry, yet genetic predispositions have yet to be identified. We therefore used a candidate gene approach to study 384 cattle, of which 160 had reacted positively to an antigenic skin test (‘reactors’). Our approach was unusual in that it used microsatellite markers, embraced high breed diversity and focused particularly on detecting genes showing heterozygote advantage, a mode of action often overlooked in SNP-based studies. A panel of neutral markers was used to control for population substructure, and using a general linear model-based approach we were also able to control for age. We found that substructure was surprisingly weak and identified two genomic regions that were strongly associated with reactor status, identified by markers INRA111 and BMS2753. In general, the strength of association detected tended to vary depending on whether age was included in the model. At INRA111 a single genotype appears strongly protective with an overall odds ratio of 2.2, the effect being consistent across nine diverse breeds. Our results suggest that breeding strategies could be devised that would appreciably increase genetic resistance of cattle to bTB (strictly, reduce the frequency of incidence of reactors), with implications for the current debate concerning badger-culling.
Timescales of Massive Human Entrainment
The past two decades have seen an upsurge of interest in the collective
behaviors of complex systems composed of many agents entrained to each other
and to external events. In this paper, we extend concepts of entrainment to the
dynamics of human collective attention. We conducted a detailed investigation
of the unfolding of human entrainment - as expressed by the content and
patterns of hundreds of thousands of messages on Twitter - during the 2012 US
presidential debates. By time locking these data sources, we quantify the
impact of the unfolding debate on human attention. We show that collective
social behavior covaries second-by-second to the interactional dynamics of the
debates: A candidate speaking induces rapid increases in mentions of his name
on social media and decreases in mentions of the other candidate. Moreover,
interruptions by an interlocutor increase the attention received. We also
highlight a distinct time scale for the impact of salient moments in the
debate: Mentions in social media start within 5-10 seconds after the moment;
peak at approximately one minute; and slowly decay in a consistent fashion
across well-known events during the debates. Finally, we show that public
attention after an initial burst slowly decays through the course of the
debates. Thus we demonstrate that large-scale human entrainment may hold across
a number of distinct scales, in an exquisitely time-locked fashion. The methods
and results pave the way for careful study of the dynamics and mechanisms of
large-scale human entrainment.
Comment: 20 pages, 7 figures, 6 tables, 4 supplementary figures. 2nd version
revised according to peer reviewers' comments: more detailed explanation of
the methods, and grounding of the hypothese
Genome-wide study of association and interaction with maternal cytomegalovirus infection suggests new schizophrenia loci.
Genetic and environmental components as well as their interaction contribute to the risk of schizophrenia, making it highly relevant to include environmental factors in genetic studies of schizophrenia. This study comprises genome-wide association (GWA) and follow-up analyses of all individuals born in Denmark since 1981 and diagnosed with schizophrenia as well as controls from the same birth cohort. Furthermore, we present the first genome-wide interaction survey of single nucleotide polymorphisms (SNPs) and maternal cytomegalovirus (CMV) infection. The GWA analysis included 888 cases and 882 controls, and the follow-up investigation of the top GWA results was performed in independent Danish (1396 cases and 1803 controls) and German-Dutch (1169 cases, 3714 controls) samples. The SNPs most strongly associated in the single-marker analysis of the combined Danish samples were rs4757144 in ARNTL (P=3.78 × 10^-6) and rs8057927 in CDH13 (P=1.39 × 10^-5). Both genes have previously been linked to schizophrenia or other psychiatric disorders. The strongest associated SNP in the combined analysis, including Danish and German-Dutch samples, was rs12922317 in RUNDC2A (P=9.04 × 10^-7). A region-based analysis summarizing independent signals in segments of 100 kb identified a new region-based genome-wide significant locus overlapping the gene ZEB1 (P=7.0 × 10^-7). This signal was replicated in the follow-up analysis (P=2.3 × 10^-2). Significant interaction with maternal CMV infection was found for rs7902091 (P(SNP × CMV)=7.3 × 10^-7) in CTNNA3, a gene not previously implicated in schizophrenia, stressing the importance of including environmental factors in genetic studies.
The genetic prehistory of southern Africa
Southern and eastern African populations that speak non-Bantu languages with
click consonants are known to harbour some of the most ancient genetic lineages
in humans, but their relationships are poorly understood. Here, we report data
from 23 populations analyzed at over half a million single nucleotide
polymorphisms, using a genome-wide array designed for studying human history.
The southern African Khoisan fall into two genetic groups, loosely
corresponding to the northwestern and southeastern Kalahari, which we show
separated within the last 30,000 years. We find that all individuals derive at
least a few percent of their genomes from admixture with non-Khoisan
populations that began approximately 1,200 years ago. In addition, the east
African Hadza and Sandawe derive a fraction of their ancestry from admixture
with a population related to the Khoisan, supporting the hypothesis of an
ancient link between southern and eastern Africa.
Comment: To appear in Nature Communication
Nondisjunction and transmission ratio distortion of Chromosome 2 in a (2.8) Robertsonian translocation mouse strain
Aneuploidy results from nondisjunction of chromosomes in meiosis and is the leading cause of developmental disabilities and mental retardation in humans. Therefore, understanding aspects of chromosome segregation in a genetic model is of value. Mice heterozygous for a (2.8) Robertsonian translocation were intercrossed with chromosomally normal mice and Chromosome 2 was genotyped for number and parental origin in 836 individuals at 8.5 dpc. The frequency of nondisjunction of this Robertsonian chromosome is 1.58%. Trisomy of Chromosome 2 with two maternally derived chromosomes is the most developmentally successful aneuploid karyotype at 8.5 dpc. Trisomy of Chromosome 2 with two paternally derived chromosomes is developmentally delayed and less frequent than the converse. Individuals with maternal or paternal uniparental disomy of Chromosome 2 were not detected at 8.5 dpc. Nondisjunction events were distributed randomly across litters, i.e., no evidence for clustering was found. Transmission ratio distortion is frequently observed in Robertsonian chromosomes and a bias against the transmission of the (2.8) Chromosome was detected. Interestingly, this was observed for female and male transmitting parents