158 research outputs found

    The Origins of Ashkenaz, Ashkenazic Jews, and Yiddish

    Recently, the geographical origins of Ashkenazic Jews (AJs) and their native language Yiddish were investigated by applying the Geographic Population Structure (GPS) to a cohort of exclusively Yiddish-speaking and multilingual AJs. GPS localized most AJs along major ancient trade routes in northeastern Turkey adjacent to primeval villages with names that resemble the word "Ashkenaz." These findings were compatible with the hypothesis of an Irano-Turko-Slavic origin for AJs and a Slavic origin for Yiddish and at odds with the Rhineland hypothesis advocating a Levantine origin for AJs and German origins for Yiddish. We discuss how these findings advance three ongoing debates concerning (1) the historical meaning of the term "Ashkenaz;" (2) the genetic structure of AJs and their geographical origins as inferred from multiple studies employing both modern and ancient DNA and original ancient DNA analyses; and (3) the development of Yiddish. We provide additional validation to the non-Levantine origin of AJs using ancient DNA from the Near East and the Levant. Due to the rising popularity of geo-localization tools to address questions of origin, we briefly discuss the advantages and limitations of popular tools with focus on the GPS approach. Our results reinforce the non-Levantine origins of AJs

    Responding to an enquiry concerning the geographic population structure (GPS) approach and the origin of Ashkenazic Jews - a reply to Flegontov et al

    Recently, we investigated the geographical origins of Ashkenazic Jews (AJs) and their native language Yiddish by applying a biogeographical tool, the Geographic Population Structure (GPS), to a cohort of 367 exclusively Yiddish-speaking and multilingual AJs genotyped on the Genochip microarray. GPS localized most AJs along major ancient trade routes in northeastern Turkey adjacent to primeval villages with names that may be derived from the word "Ashkenaz." These findings were compatible with the hypothesis of an Irano-Turko-Slavic origin for AJs and a Slavic origin for Yiddish and at odds with the Rhineland hypothesis advocating a German origin of both. Our approach has been recently adopted by Flegontov et al. (2016a) to trace the origin of the Siberian Ket people and their language. Recently, Flegontov et al. (2016b) have raised several questions concerning the accuracy of the Genochip microarray and GPS, specifically in relation to AJs and Yiddish. Although many of these issues have been addressed in our previous papers, we take this opportunity to clarify the principles of the GPS approach, review the recent biogeographical and ancient DNA findings regarding AJs, and comment on the origin of Yiddish

    Damping of zero sound in Luttinger liquids

    We calculate the damping gamma_q of collective density oscillations (zero sound) in a one-dimensional Fermi gas with dimensionless forward scattering interaction F and quadratic energy dispersion k^2/2m at zero temperature. For wave-vectors |q|/k_F small compared with F we find to leading order gamma_q = v_F^{-1} m^{-2} Y(F) |q|^3, where v_F is the Fermi velocity, k_F is the Fermi wave-vector, and Y(F) is proportional to F^3 for small F. We also show that zero-sound damping leads to a finite maximum proportional to |k - k_F|^{-2 + 2 eta} of the charge peak in the single-particle spectral function, where eta is the anomalous dimension. Our prediction agrees with photoemission data for the blue bronze K_{0.3}MoO_3. Comment: final version as published; with more technical details; we have added a discussion of recent work which appeared after our initial cond-mat posting; 13 pages, 5 figures
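For readability, the leading-order result quoted in the abstract above can be typeset as follows. This only restates the abstract's own formula in LaTeX; no prefactors or definitions beyond those given there are assumed.

```latex
% Zero-sound damping for |q|/k_F \ll F (leading order, as stated in the abstract)
\[
  \gamma_q \;=\; \frac{Y(F)}{v_F\, m^{2}}\, |q|^{3},
  \qquad Y(F) \propto F^{3} \quad \text{for small } F .
\]
% Finite maximum of the charge peak in the single-particle spectral function
\[
  \text{peak height} \;\propto\; |k - k_F|^{-2 + 2\eta},
\]
% where \eta is the anomalous dimension.
```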

    Dynamic structure factor of Luttinger liquids with quadratic energy dispersion and long-range interactions

    We calculate the dynamic structure factor S(omega, q) of spinless fermions in one dimension with quadratic energy dispersion k^2/2m and long-range density-density interaction whose Fourier transform f_q is dominated by small momentum-transfers q << q_0 << k_F. Here q_0 is a momentum-transfer cutoff and k_F is the Fermi momentum. Using functional bosonization and the known properties of symmetrized closed fermion loops, we obtain an expansion of the inverse irreducible polarization to second order in the small parameter q_0/k_F. In contrast to perturbation theory based on conventional bosonization, our functional bosonization approach is not plagued by mass-shell singularities. For interactions which can be expanded as f_q = f_0 + f_0^{2} q^2/2 + O(q^4) with finite f_0^{2} we show that the momentum scale q_c = 1/|m f_0^{2}| separates two regimes characterized by a different q-dependence of the width gamma_q of the collective zero sound mode and other features of S(omega, q). For q_c << q << k_F we find that the line-shape in this regime is non-Lorentzian with an overall width gamma_q of order q^3/(m q_c) and a threshold singularity at the lower edge. Comment: 33 Revtex pages, 17 figures

    Validation and assessment of variant calling pipelines for next-generation sequencing

    Background: The processing and analysis of the large-scale data generated by next-generation sequencing (NGS) experiments is challenging and is a burgeoning area of new methods development. Several new bioinformatics tools have been developed for calling sequence variants from NGS data. Here, we validate the variant calling of these tools and compare their relative accuracy to determine which data processing pipeline is optimal. Results: We developed a unified pipeline for processing NGS data that encompasses four modules: mapping, filtering, realignment and recalibration, and variant calling. We processed 130 subjects from an ongoing whole exome sequencing study through this pipeline. To evaluate the accuracy of each module, we conducted a series of comparisons between the single nucleotide variant (SNV) calls from the NGS data and either gold-standard Sanger sequencing on a total of 700 variants or array genotyping data on a total of 9,935 single-nucleotide polymorphisms. A head-to-head comparison showed that the Genome Analysis Toolkit (GATK) provided more accurate calls than SAMtools (positive predictive value of 92.55% vs. 80.35%, respectively). Realignment of mapped reads and recalibration of base quality scores before SNV calling proved to be crucial to accurate variant calling. The GATK HaplotypeCaller algorithm for variant calling outperformed the UnifiedGenotyper algorithm. We also showed a relationship between mapping quality, read depth and allele balance, and SNV call accuracy. However, if best practices are used in data processing, then additional filtering based on these metrics provides little gain, and accuracies of >99% are achievable. Conclusions: Our findings will help to determine the best approach for processing NGS data to confidently call variants for downstream analyses. To enable others to implement and replicate our results, all of our code is freely available at http://metamoodics.org/wes
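The accuracy metric quoted in the abstract above, positive predictive value (PPV), is the fraction of called variants that the reference method confirms. A minimal sketch of that calculation follows; the function name and the counts are hypothetical and chosen only for illustration, not taken from the study.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """PPV = TP / (TP + FP): share of called variants confirmed by the reference."""
    called = true_positives + false_positives
    if called == 0:
        raise ValueError("no variant calls to evaluate")
    return true_positives / called


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not data from the paper).
    ppv = positive_predictive_value(true_positives=90, false_positives=10)
    print(f"PPV = {ppv:.2%}")
```

A caller with a higher PPV reports fewer false variants per call, which is the sense in which the abstract compares the two toolkits.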

    A Hybrid Likelihood Model for Sequence-Based Disease Association Studies

    In the past few years, case-control studies of common diseases have shifted their focus from single genes to whole exomes. New sequencing technologies now routinely detect hundreds of thousands of sequence variants in a single study, many of which are rare or even novel. The limitation of classical single-marker association analysis for rare variants has been a challenge in such studies. A new generation of statistical methods for case-control association studies has been developed to meet this challenge. A common approach to association analysis of rare variants is to use burden-style collapsing methods, which combine rare variant data within individuals across or within genes. Here, we propose a new hybrid likelihood model that combines a burden test with a test of the position distribution of variants. In extensive simulations and on empirical data from the Dallas Heart Study, the new model demonstrates consistently good power, in particular when applied to a gene set (e.g., multiple candidate genes with shared biological function or pathway), when rare variants cluster in key functional regions of a gene, and when protective variants are present. When applied to data from an ongoing sequencing study of bipolar disorder (191 cases, 107 controls), the model identifies seven gene sets with nominal p-values < 0.05, of which one, the MAPK signaling pathway (KEGG), reaches trend-level significance after correcting for multiple testing. © 2013 Chen et al.
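The burden-style collapsing step mentioned in the abstract above can be sketched as follows. This illustrates only the generic collapsing idea (summing rare-allele counts per individual), not the hybrid likelihood model the authors propose; the function name, the data layout, and the 1% frequency threshold are assumptions made for the example.

```python
from typing import List, Sequence


def burden_scores(
    genotypes: Sequence[Sequence[int]],
    allele_freqs: Sequence[float],
    maf_threshold: float = 0.01,  # hypothetical cutoff for "rare"
) -> List[int]:
    """Collapse rare variants into one burden score per individual.

    genotypes[i][j] is the minor-allele count (0, 1 or 2) of individual i
    at variant j; allele_freqs[j] is the minor-allele frequency of variant j.
    """
    rare = [j for j, freq in enumerate(allele_freqs) if freq < maf_threshold]
    return [sum(row[j] for j in rare) for row in genotypes]


if __name__ == "__main__":
    # Toy data, for illustration only: 3 individuals, 3 variants.
    toy_genotypes = [[0, 1, 2], [1, 0, 0], [0, 0, 1]]
    toy_freqs = [0.005, 0.2, 0.001]  # the middle variant is common and ignored
    print(burden_scores(toy_genotypes, toy_freqs))
```

The resulting per-individual scores would then be compared between cases and controls by whatever association test a given method uses.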

    SVM Classifier – a comprehensive java interface for support vector machine classification of microarray data

    MOTIVATION: Graphical user interface (GUI) software promotes novelty by allowing users to extend the functionality. SVM Classifier is a cross-platform graphical application that handles very large datasets well. The purpose of this study is to create a GUI application that allows SVM users to perform SVM training, classification and prediction. RESULTS: The GUI provides user-friendly access to state-of-the-art SVM methods embodied in the LIBSVM implementation of Support Vector Machine. We implemented the Java interface using standard Swing libraries. We used sample data from a breast cancer study for testing classification accuracy. We achieved 100% accuracy in classification among the BRCA1–BRCA2 samples with the RBF kernel of SVM. CONCLUSION: We have developed a Java GUI application that allows SVM users to perform SVM training, classification and prediction. We have demonstrated that support vector machines can accurately classify genes into functional categories based upon expression data from DNA microarray hybridization experiments. Among the different kernel functions that we examined, the SVM that uses a radial basis kernel function provides the best performance. The SVM Classifier is available at
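The radial basis function (RBF) kernel named in the abstract above has the standard form K(x, y) = exp(-gamma * ||x - y||^2). A minimal sketch of that formula follows; it is a plain restatement of the textbook kernel, not code from SVM Classifier or LIBSVM, and the value of `gamma` used below is an arbitrary illustrative choice.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """Return exp(-gamma * squared Euclidean distance between x and y)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


if __name__ == "__main__":
    # Identical expression profiles give similarity 1; distant ones approach 0.
    print(rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5))
    print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5))
```

Larger values of `gamma` make the similarity fall off faster with distance, which is why this parameter is usually tuned alongside the SVM's regularization constant.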

    QuickGO: a user tutorial for the web-based Gene Ontology browser

    The Gene Ontology (GO) has proven to be a valuable resource for functional annotation of gene products. At well over 27 000 terms, the descriptiveness of GO has increased rapidly in line with the biological data it represents. Therefore, it is vital to be able to easily and quickly mine the functional information that has been made available through these GO terms being associated with gene products. QuickGO is a fast, web-based tool for browsing the GO and all associated GO annotations provided by the GOA group. After undergoing a redevelopment, QuickGO is now able to offer many more features beyond simple browsing. Users have responded well to the new tool and given very positive feedback about its usefulness. This tutorial will demonstrate how some of these features could be useful to the researcher wanting to discover more about their dataset, particular areas of biology or to find new ways of directing their research

    VennPlex--a novel Venn diagram program for comparing and visualizing datasets with differentially regulated datapoints.

    With the development of increasingly large and complex genomic and proteomic data sets, an enhancement in the complexity of available Venn diagram analytical programs is becoming increasingly important. Current freely available Venn diagram programs often fail to represent extra complexity among datasets, such as regulation pattern differences between different groups. Here we describe the development of VennPlex, a program that illustrates the often diverse numerical interactions among multiple, high-complexity datasets, using up to four data sets. VennPlex includes versatile output features, where grouped data points in specific regions can be easily exported into a spreadsheet. This program is able to facilitate the analysis of two to four gene sets and their corresponding expression values in a user-friendly manner. To demonstrate its unique experimental utility we applied VennPlex to a complex paradigm, i.e. a comparison of the effect of multiple oxygen tension environments (1–20% ambient oxygen) upon gene transcription of primary rat astrocytes. VennPlex accurately dissects complex data sets reliably into easily identifiable groups for straightforward analysis and data output. This program, which is an improvement over currently available Venn diagram programs, is able to rapidly extract important datasets that represent the variety of expression patterns available within the data sets, showing potential applications in fields like genomics, proteomics, and bioinformatics

    Design, Validation and Annotation of Transcriptome-Wide Oligonucleotide Probes for the Oligochaete Annelid Eisenia fetida

    High-density oligonucleotide probe arrays have increasingly become an important tool in genomics studies. In organisms with incomplete genome sequence, one strategy for oligo probe design is to reduce the number of unique probes that target every non-redundant transcript through bioinformatic analysis and experimental testing. Here we adopted this strategy in making oligo probes for the earthworm Eisenia fetida, a species for which we have sequenced transcriptome-scale expressed sequence tags (ESTs). Our objectives were to identify unique transcripts as targets, to select an optimal and non-redundant oligo probe for each of these target ESTs, and to annotate the selected target sequences. We developed a streamlined and easy-to-follow approach to the design, validation and annotation of species-specific array probes. Four 244K-formatted oligo arrays were designed using eArray and were hybridized to a pooled E. fetida cRNA sample. We identified 63,541 probes with unsaturated signal intensities consistently above the background level. Target transcripts of these probes were annotated using several sequence alignment algorithms. Significant hits were obtained for 37,439 (59%) probed targets. We validated and made publicly available 63.5K oligo probes so the earthworm research community can use them to pursue ecological, toxicological, and other functional genomics questions. Our approach is efficient, cost-effective and robust because it (1) does not require a major genomics core facility; (2) allows new probes to be easily added and old probes modified or eliminated when new sequence information becomes available; (3) is not bioinformatics-intensive upfront but does provide opportunities for more in-depth annotation of biological functions for target genes; and (4) allows, if desired, EST orthologs to the UniGene clusters of a reference genome to be identified and selected in order to improve the target gene specificity of designed probes. This approach is particularly applicable to organisms with a wealth of EST sequences but an unfinished genome