152 research outputs found

    Rooted triple consensus and anomalous gene trees

    Background: Anomalous gene trees (AGTs) are gene trees whose topology differs from that of the species tree and that are more likely to be observed than congruent gene trees. In this paper we propose a rooted triple approach to finding the correct species tree in the presence of AGTs. Results: Based on simulated data, we show that our method outperforms the extended majority rule consensus strategy, while still resolving the species tree. Applying both methods to a metazoan data set of 216 genes, we tested whether AGTs substantially interfere with the reconstruction of the metazoan phylogeny. Conclusion: Evidence of AGTs was not found in this data set, suggesting that erroneously reconstructed gene trees are the most significant challenge in the reconstruction of phylogenetic relationships among species with current data. The new method does, however, rule out the erroneous reconstruction of deep or poorly resolved splits in the presence of lineage sorting.
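As a toy illustration of what a rooted triple is (not the authors' method, whose details the abstract does not give), the sketch below lists the rooted triples displayed by each gene tree and keeps the most frequent topology for every set of three taxa. The nested-tuple tree encoding, the example trees, and the function names are assumptions made for this example only.

```python
from collections import Counter
from itertools import combinations


def leaves(tree):
    """Return the leaf labels of a tree written as nested 2-tuples of strings."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def rooted_triples(tree):
    """Yield (frozenset({a, b}), c) for every rooted triple ab|c in a binary tree."""
    if isinstance(tree, str):
        return
    left, right = tree
    # Two leaves from one child plus one leaf from the other child form ab|c.
    for near, far in ((left, right), (right, left)):
        for a, b in combinations(leaves(near), 2):
            for c in leaves(far):
                yield (frozenset((a, b)), c)
    # Triples that lie entirely inside one child are found recursively.
    yield from rooted_triples(left)
    yield from rooted_triples(right)


def majority_triples(gene_trees):
    """For each set of three taxa, keep the triple topology seen most often.

    Ties are broken by first occurrence, an arbitrary choice for this sketch.
    """
    counts = Counter(t for tree in gene_trees for t in rooted_triples(tree))
    best = {}
    for (pair, out), n in counts.items():
        taxa = pair | {out}
        if taxa not in best or n > best[taxa][1]:
            best[taxa] = ((pair, out), n)
    return {taxa: triple for taxa, (triple, _) in best.items()}


# Hypothetical gene trees: two support AB|C, one supports AC|B.
gene_trees = [(("A", "B"), "C"), (("A", "B"), "C"), (("A", "C"), "B")]
for pair, out in majority_triples(gene_trees).values():
    print("".join(sorted(pair)) + "|" + out)  # prints AB|C
```

Building a full species tree from the retained triples needs a further step that is not shown here.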

    Big Genomes Facilitate the Comparative Identification of Regulatory Elements

    The identification of regulatory sequences in animal genomes remains a significant challenge. Comparative genomic methods that use patterns of evolutionary conservation to identify non-coding sequences with regulatory function have yielded many new vertebrate enhancers. However, these methods have not contributed significantly to the identification of regulatory sequences in sequenced invertebrate taxa. We demonstrate here that this differential success, which is often attributed to fundamental differences in the nature of vertebrate and invertebrate regulatory sequences, is instead primarily a product of the relatively small size of sequenced invertebrate genomes. We sequenced and compared loci involved in early embryonic patterning from four species of true fruit flies (family Tephritidae) that have genomes four to six times larger than those of Drosophila melanogaster. Unlike in Drosophila, where virtually all non-coding DNA is highly conserved, blocks of conserved non-coding sequence in tephritids are flanked by large stretches of poorly conserved sequence, similar to what is observed in vertebrate genomes. We tested the activities of nine conserved non-coding sequences flanking the even-skipped gene of the tephritid Ceratitis capitata in transgenic D. melanogaster embryos, six of which drove patterns that recapitulate those of known D. melanogaster enhancers. In contrast, none of the three non-conserved tephritid non-coding sequences that we tested drove expression in D. melanogaster embryos. Based on the landscape of non-coding conservation in tephritids, and our initial success in using conservation in tephritids to identify D. melanogaster regulatory sequences, we suggest that comparison of tephritid genomes may provide a systematic means to annotate the non-coding portion of the D. melanogaster genome. We also propose that large genomes be given more consideration in the selection of species for comparative genomics projects, to provide increased power to detect functional non-coding DNAs and to provide a less biased view of the evolution and function of animal genomes.

    Reduced Lentivirus Susceptibility in Sheep with TMEM154 Mutations

    Visna/Maedi, or ovine progressive pneumonia (OPP) as it is known in the United States, is an incurable slow-acting disease of sheep caused by persistent lentivirus infection. This disease affects multiple tissues, including those of the respiratory and central nervous systems. Our aim was to identify ovine genetic risk factors for lentivirus infection. Sixty-nine matched pairs of infected cases and uninfected controls were identified among 736 naturally exposed sheep older than five years of age. These pairs were used in a genome-wide association study with 50,614 markers. A single SNP was identified in the ovine transmembrane protein (TMEM154) that exceeded genome-wide significance (unadjusted p-value 3×10⁻⁹). Sanger sequencing of the ovine TMEM154 coding region identified six missense and two frameshift deletion mutations in the predicted signal peptide and extracellular domain. Two TMEM154 haplotypes encoding glutamate (E) at position 35 were associated with infection while a third haplotype with lysine (K) at position 35 was not. Haplotypes encoding full-length E35 isoforms were analyzed together as genetic risk factors in a multi-breed, matched case-control design, with 61 pairs of 4-year-old ewes. The odds of infection for ewes with one copy of a full-length TMEM154 E35 allele were 28 times greater than the odds for those without (p-value<0.0001, 95% CI 5–1,100). In a combined analysis of nine cohorts with 2,705 sheep from Nebraska, Idaho, and Iowa, the relative risk of infection was 2.85 times greater for sheep with a full-length TMEM154 E35 allele (p-value<0.0001, 95% CI 2.36–3.43). Although rare, some sheep were homozygous for TMEM154 deletion mutations and remained uninfected despite a lifetime of significant exposure. Together, these findings indicate that TMEM154 may play a central role in ovine lentivirus infection, and that removing sheep with the most susceptible genotypes may help eradicate OPP and protect flocks from reinfection.

    Similar exemplar pooling processes underlie the learning of facial identity and handwriting style: Evidence from typical observers and individuals with Autism

    Considerable research has addressed whether the cognitive and neural representations recruited by faces are similar to those engaged by other types of visual stimuli. For example, research has examined the extent to which objects of expertise recruit holistic representation and engage the fusiform face area. Little is known, however, about the domain-specificity of the exemplar pooling processes thought to underlie the acquisition of familiarity with particular facial identities. In the present study we sought to compare observers’ ability to learn facial identities and handwriting styles from exposure to multiple exemplars. Crucially, while handwritten words and faces differ considerably in their topographic form, both learning tasks share a common exemplar pooling component. In our first experiment, we find that typical observers’ abilities to learn facial identities and handwriting styles from exposure to multiple exemplars correlate closely. In our second experiment, we show that observers with autism spectrum disorder (ASD) are impaired at both learning tasks. Our findings suggest that similar exemplar pooling processes are recruited when learning facial identities and handwriting styles. Models of exemplar pooling originally developed to explain face learning may therefore offer valuable insights into exemplar pooling across a range of domains, extending beyond faces. Aberrant exemplar pooling, possibly resulting from structural differences in the inferior longitudinal fasciculus, may underlie the difficulties in recognising familiar faces often experienced by individuals with ASD, and leave observers overly reliant on local details present in particular exemplars.

    Commercially Available Outbred Mice for Genome-Wide Association Studies

    Genome-wide association studies using commercially available outbred mice can detect genes involved in phenotypes of biomedical interest. Useful populations need high-frequency alleles to ensure high power to detect quantitative trait loci (QTLs), low linkage disequilibrium between markers to obtain accurate mapping resolution, and an absence of population structure to prevent false positive associations. We surveyed 66 colonies for inbreeding, genetic diversity, and linkage disequilibrium, and we demonstrate that some have haplotype blocks of less than 100 Kb, enabling gene-level mapping resolution. The same alleles contribute to variation in different colonies, so that when mapping progress stalls in one, another can be used in its stead. Colonies are genetically diverse: 45% of the total genetic variation is attributable to differences between colonies. However, quantitative differences in allele frequencies, rather than the existence of private alleles, are responsible for these population differences. The colonies derive from a limited pool of ancestral haplotypes resembling those found in inbred strains: over 95% of sequence variants segregating in outbred populations are found in inbred strains. Consequently it is possible to impute the sequence of any mouse from a dense SNP map combined with inbred strain sequence data, which opens up the possibility of cataloguing and testing all variants for association, a situation that has so far eluded studies in completely outbred populations. We demonstrate the colonies' potential by identifying a deletion in the promoter of H2-Ea as the molecular change that strongly contributes to setting the ratio of CD4+ and CD8+ lymphocytes

    The Cotton Centromere Contains a Ty3-gypsy-like LTR Retroelement

    The centromere is a repeat-rich structure essential for chromosome segregation; with the long-term aim of understanding centromere structure and function, we set out to identify cotton centromere sequences. To isolate centromere-associated sequences from cotton (Gossypium hirsutum), we surveyed tandem and dispersed repetitive DNA in the genus. Centromere-associated elements in other plants include tandem repeats and, in some cases, centromere-specific retroelements. Examination of cotton genomic survey sequences for tandem repeats yielded sequences that did not localize to the centromere. However, among the repetitive sequences we also identified a gypsy-like LTR retrotransposon (Centromere Retroelement Gossypium, CRG) that localizes to the centromere region of all chromosomes in domestic upland cotton, Gossypium hirsutum, the major commercially grown cotton. The location of the functional centromere was confirmed by immunostaining with antiserum to the centromere-specific histone CENH3, which co-localizes with CRG hybridization on metaphase mitotic chromosomes. G. hirsutum is an allotetraploid composed of A and D genomes, and CRG is also present in the centromere regions of other AD cotton species. Furthermore, FISH and genomic dot blot hybridization revealed that CRG is found in D-genome diploid cotton species, but not in A-genome diploid species, indicating that this retroelement may have invaded the A-genome centromeres during allopolyploid formation and amplified during evolutionary history. CRG is also found in other diploid Gossypium species, including B and E2 genome species, but not in the C, E1, F, and G genome species tested. Isolation of this centromere-specific retrotransposon from Gossypium provides a probe for further understanding of centromere structure, and a tool for future engineering of centromere mini-chromosomes in this important crop species.

    Integrated physical, genetic and genome map of chickpea (Cicer arietinum L.)

    A physical map of chickpea was developed for the reference chickpea genotype (ICC 4958) using bacterial artificial chromosome (BAC) libraries targeting 71,094 clones (~12× coverage). High information content fingerprinting (HICF) of these clones gave high-quality fingerprinting data for 67,483 clones, and 1,174 contigs comprising 46,112 clones and 3,256 singletons were defined. In brief, 574 Mb of the genome was assembled in 1,174 contigs with an average of 0.49 Mb per contig, and the 3,256 singletons represent 407 Mb of the genome. The physical map was linked with two genetic maps with the help of 245 BAC-end sequence (BES)-derived simple sequence repeat (SSR) markers. This allowed some of the BACs to be located in the vicinity of important quantitative trait loci (QTLs) for drought tolerance and resistance to Fusarium wilt and Ascochyta blight. In addition, the fingerprinted contig (FPC) assembly was integrated with the draft genome sequence of chickpea. As a result, ~965 BACs including 163 minimum tiling path (MTP) clones could be mapped on eight pseudo-molecules of chickpea, forming 491 hypothetical contigs representing 54,013,992 bp (~54 Mb) of the draft genome. Comprehensive analysis of markers in abiotic and biotic stress tolerance QTL regions led to the identification of 654, 306 and 23 genes in the drought tolerance “QTL-hotspot” region, the Ascochyta blight resistance QTL region and the Fusarium wilt resistance QTL region, respectively. The integrated physical, genetic and genome map should provide a foundation for cloning and isolation of QTLs/genes for molecular dissection of traits, as well as markers for molecular breeding for chickpea improvement.
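The summary figures in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted in the abstract; the variable names are chosen for this example, and 1 Mb is assumed to mean 1,000,000 bp.

```python
# Figures quoted in the abstract.
assembled_mb = 574            # genome assembled into contigs, in Mb
contigs = 1_174               # number of contigs
clones_in_contigs = 46_112    # clones placed in those contigs
draft_bp = 54_013_992         # draft genome covered by mapped BACs, in bp

# Derived values (assumes 1 Mb = 1,000,000 bp).
mean_contig_mb = assembled_mb / contigs
mean_clones_per_contig = clones_in_contigs / contigs
draft_mb = draft_bp / 1_000_000

print(f"{mean_contig_mb:.2f} Mb per contig")              # prints 0.49 Mb per contig
print(f"{mean_clones_per_contig:.1f} clones per contig")  # prints 39.3 clones per contig
print(f"{draft_mb:.1f} Mb of draft genome")               # prints 54.0 Mb of draft genome
```

The first and last values agree with the 0.49 Mb average and the ~54 Mb quoted in the abstract.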

    Modelling malaria treatment practices in Bangladesh using spatial statistics

    Background: Malaria treatment-seeking practices vary worldwide and Bangladesh is no exception. Individuals from 88 villages in Rajasthali were asked about their treatment-seeking practices. A portion of these households preferred malaria treatment from the National Control Programme, but still a large number of households continued to use drug vendors and approximately one fourth of the individuals surveyed relied exclusively on non-control programme treatments. The risks of low-control programme usage include incomplete malaria treatment, possible misuse of anti-malarial drugs, and an increased potential for drug resistance. Methods: The spatial patterns of treatment-seeking practices were first examined using hot-spot analysis (Local Getis-Ord Gi statistic) and then modelled using regression. Ordinary least squares (OLS) regression identified key factors explaining more than 80% of the variation in control programme and vendor treatment preferences. Geographically weighted regression (GWR) was then used to assess where each factor was a strong predictor of treatment-seeking preferences. Results: Several factors including tribal affiliation, housing materials, household densities, education levels, and proximity to the regional urban centre, were found to be effective predictors of malaria treatment-seeking preferences. The predictive strength of each of these factors, however, varied across the study area. While education, for example, was a strong predictor in some villages, it was less important for predicting treatment-seeking outcomes in other villages. Conclusion: Understanding where each factor is a strong predictor of treatment-seeking outcomes may help in planning targeted interventions aimed at increasing control programme usage. Suggested strategies include providing additional training for the Building Resources across Communities (BRAC) health workers, implementing educational programmes, and addressing economic factors.
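To make the hot-spot step concrete, here is a minimal sketch of a local Getis-Ord statistic. The abstract names the Gi statistic but gives no formula or weighting scheme, so this example assumes the commonly used Gi* form (each site counts as its own neighbour) with binary distance-band weights. The coordinates, values, and radius are hypothetical and do not come from the study.

```python
import math


def getis_ord_gi_star(values, coords, radius):
    """Return one Gi* z-score per site, using binary weights within `radius`.

    Assumes the values are not all equal and that no site has every other
    site inside its radius; either case makes the denominator zero.
    """
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation of the attribute over all sites.
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for xi, yi in coords:
        # Weight 1 for sites within `radius` (the site itself included), else 0.
        w = [1.0 if math.hypot(xi - xj, yi - yj) <= radius else 0.0
             for xj, yj in coords]
        sum_w = sum(w)
        sum_w2 = sum(wj * wj for wj in w)
        numerator = sum(wj * vj for wj, vj in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


# Hypothetical villages along a line: high values on the left, low on the right.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
values = [10, 10, 10, 0, 0, 0]
z_scores = getis_ord_gi_star(values, coords, radius=1)
print([round(z, 2) for z in z_scores])
```

Positive scores mark sites surrounded by high values (hot spots) and negative scores mark sites surrounded by low values (cold spots).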

    Genome wide SNP discovery, analysis and evaluation in mallard (Anas platyrhynchos)

    Background: Next generation sequencing technologies make it possible to obtain, at low cost, the genomic sequence information that is currently lacking for most economically and ecologically important organisms. For the mallard duck, genomic data are limited. The mallard is, besides a species of large agricultural and societal importance, also the focal species when it comes to long distance dispersal of Avian Influenza. For large scale identification of SNPs we performed Illumina sequencing of wild mallard DNA and compared our data with ongoing genome and EST sequencing of domesticated conspecifics. This is the first study of its kind for waterfowl. Results: More than one billion base pairs of sequence information were generated, resulting in a 16× coverage of a reduced representation library of the mallard genome. Sequence reads were aligned to a draft domesticated duck reference genome and allowed for the detection of over 122,000 SNPs within our mallard sequence dataset. In addition, almost 62,000 nucleotide positions on the domesticated duck reference showed a different nucleotide compared to wild mallard. Approximately 20,000 SNPs identified within our data were shared with SNPs identified in the sequenced domestic duck or in EST sequencing projects. The shared SNPs were considered to be highly reliable and were used to benchmark non-shared SNPs for quality. Genotyping of a representative sample of 364 SNPs resulted in a SNP conversion rate of 99.7%. The correlation of the minor allele count and observed minor allele frequency in the SNP discovery pool was 0.72. Conclusion: We identified almost 150,000 SNPs in wild mallards that will likely yield good results in genotyping. Of these, ~101,000 SNPs were detected within our wild mallard sequences and ~49,000 were detected between wild and domesticated duck data. In the ~101,000 SNPs we found a subset of ~20,000 SNPs shared between wild mallards and the sequenced domesticated duck, suggesting a low genetic divergence. Comparison of quality metrics between the total SNP set (122,000 + 62,000 = 184,000 SNPs) and the validated subset shows similar characteristics for both sets. This indicates that we have detected a large number (~150,000) of accurately inferred mallard SNPs, which will benefit bird evolutionary studies, ecological studies (e.g. disentangling migratory connectivity) and industrial breeding programs.

    Diverse Roles and Interactions of the SWI/SNF Chromatin Remodeling Complex Revealed Using Global Approaches

    A systems understanding of nuclear organization and events is critical for determining how cells divide, differentiate, and respond to stimuli and for identifying the causes of diseases. Chromatin remodeling complexes such as SWI/SNF have been implicated in a wide variety of cellular processes including gene expression, nuclear organization, centromere function, and chromosomal stability, and mutations in SWI/SNF components have been linked to several types of cancer. To better understand the biological processes in which chromatin remodeling proteins participate, we globally mapped binding regions for several components of the SWI/SNF complex throughout the human genome using ChIP-Seq. SWI/SNF components were found to lie near regulatory elements integral to transcription (e.g. 5′ ends, RNA Polymerases II and III, and enhancers) as well as regions critical for chromosome organization (e.g. CTCF, lamins, and DNA replication origins). Interestingly, we also find that certain configurations of SWI/SNF subunits are associated with transcripts that have higher levels of expression, whereas other configurations of SWI/SNF factors are associated with transcripts that have lower levels of expression. To further elucidate the association of SWI/SNF subunits with each other as well as with other nuclear proteins, we also analyzed SWI/SNF immunoprecipitated complexes by mass spectrometry. Individual SWI/SNF factors are associated with their own family members, as well as with cellular constituents such as nuclear matrix proteins, key transcription factors, and centromere components, implying a ubiquitous role in gene regulation and nuclear function. We find an overrepresentation of both SWI/SNF-associated regions and proteins in cell cycle and chromosome organization. Taken together, the results from our ChIP and immunoprecipitation experiments suggest that SWI/SNF facilitates gene regulation and genome function more broadly and through a greater diversity of interactions than previously appreciated.