Additional file 1: of A standardized framework for representation of ancestry data in genomics studies, with application to the NHGRI-EBI GWAS Catalog
Figure S1. Detailed sample description displayed in the internal GWAS Catalog curation interface. Figure S2. a Structured ancestry and recruitment information displayed in the internal GWAS Catalog curation interface. b GWAS Catalog ancestry and recruitment data entry page of the internal curation interface. Supplementary Box 1. Genomic methods of ancestry determination. Figure S3. Distribution of studies by ancestry category, focused on the Catalog traits with the highest number of studies in the Catalog. Figure S4. Methods of ancestry ascertainment used in a subset of publications included in the GWAS Catalog. Supplementary References. (DOCX 893 kb)
Additional file 2: of A standardized framework for representation of ancestry data in genomics studies, with application to the NHGRI-EBI GWAS Catalog
Table S1. GWAS Catalog countries of recruitment for which no ancestry information was provided. (XLSX 77 kb)
Additional file 3: of A standardized framework for representation of ancestry data in genomics studies, with application to the NHGRI-EBI GWAS Catalog
Table S2. GWAS Catalog detailed descriptions with ancestry category assignments. (XLSX 77 kb)
Additional file 4: of A standardized framework for representation of ancestry data in genomics studies, with application to the NHGRI-EBI GWAS Catalog
Table S3. Specific examples to illustrate the application of the framework to the GWAS Catalog. (XLSX 70 kb)
Additional file 5: of A standardized framework for representation of ancestry data in genomics studies, with application to the NHGRI-EBI GWAS Catalog
Table S4. HapMap Project and 1000 Genomes Project populations with assigned ancestry category. (XLSX 27 kb)
Distribution of issues addressed and an example region.
(Top panel) Issues for GRCh37, GRCh37.p1, and GRCh37.p2, broken down by type. Issue types are: Clone Problem: the issue is contained within a single clone; this may be a single-nucleotide difference or a clone mis-assembly. Path Problem: there is evidence that the tiling path within a given region is incorrect and that the path will need to be updated. GRC Housekeeping: changes used to help regularize the tiling path. Missing Sequence: sequence that cannot yet be placed on the assembly; mapping studies are ongoing to help place these sequences. Variation: there is evidence that complex variation is complicating a region and that an alternate allele may need to be produced. Gap: the issue concerns filling a gap. Unknown: the issue is still under investigation for classification. (Bottom panel) Details for issue HG-2, a Path Problem. The representation in NCBI36 was a mixed haplotype. The tiling paths for NCBI36 and GRCh37 are shown. Blue clones are anchor clones that are present in NCBI36, the GRCh37 chr4 path, and the GRCh37 alternate locus path. Red clones represent the UGT2B17 insertion path, and dark gray clones represent the UGT2B17 deletion path. The light gray clone was not used in NCBI36 but was used in GRCh37 to complete the alternate locus.
Assembly representation for GRCh37.p3.
The top panel shows an ideogram representation of the human genome. Sequences within the assembly are placed within containers known as assembly units. The primary assembly unit contains the sequences for the non-redundant haploid assembly; this includes the scaffolds that make up the chromosome sequences as well as unplaced and unlocalized scaffolds that are thought to represent novel sequence (not shown in this figure). Alternate loci and patches are placed in separate assembly units to facilitate annotation. Note that the seven alternate scaffolds in the MHC region are each placed in a different assembly unit, as they are all alternate representations of the same region. Other alternate loci can be added to these assembly units at the next major release if they do not overlap the existing alternates. All patches are placed in the PATCHES assembly unit, and minor releases are cumulative, so the latest minor release contains all patches. The red triangles, yellow circles, and blue circles represent regions that contain additional sequences that are not given actual chromosome coordinates but are instead given a chromosome context via alignment to the primary assembly. The red triangles represent alternate loci; these are sequences that provide a tiling path in addition to the one given in the chromosome representation and are essential for representing structurally complex loci. The circles represent patch sequences; these are minor updates made to the assembly outside of the major build cycle. Yellow circles represent “fix” patches: regions of the chromosome assembly that will change with the next major assembly update. Blue circles represent “novel” patches: sequences that will become new alternate loci in the next major assembly update. Unlocalized and unplaced sequences are not represented in this figure.
Note: a region can point to more than one type of extrachromosomal sequence; for example, a region could point to an alternate locus and to a fix or novel patch.
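The two assembly-unit rules in this caption (non-overlapping alternates can share a unit; minor releases carry all earlier patches) can be illustrated with a short sketch. This is a minimal Python sketch under stated assumptions, not GRC or NCBI software: the `AltLocus` type, the `overlaps`, `assign_units`, and `cumulative_patches` helpers, the patch names, and all coordinates are hypothetical. It assumes an alternate locus joins the first existing unit that holds no overlapping alternate (a first-fit greedy policy) and otherwise opens a new unit.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class AltLocus:
    """Hypothetical record for an alternate locus aligned to the primary assembly."""
    name: str
    chrom: str
    start: int  # start of the aligned primary-assembly region (half-open interval)
    end: int    # end of the aligned primary-assembly region (exclusive)


def overlaps(a: AltLocus, b: AltLocus) -> bool:
    """True if two alternates align to overlapping primary-assembly regions."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def assign_units(
    loci: Sequence[AltLocus],
    units: Optional[List[List[AltLocus]]] = None,
) -> List[List[AltLocus]]:
    """Place each alternate in the first unit holding no overlapping alternate.

    Assumption: first-fit greedy placement; a new unit is opened when every
    existing unit already contains an overlapping alternate. The input units
    are copied, not mutated.
    """
    result = [list(unit) for unit in (units or [])]
    for locus in loci:
        for unit in result:
            if not any(overlaps(locus, other) for other in unit):
                unit.append(locus)
                break
        else:  # no compatible unit found, so open a new one
            result.append([locus])
    return result


def cumulative_patches(
    releases: Sequence[Tuple[str, Sequence[str]]],
) -> Dict[str, List[str]]:
    """Map each minor release to every patch released up to and including it."""
    carried: List[str] = []
    contents: Dict[str, List[str]] = {}
    for release, new_patches in releases:
        carried.extend(new_patches)
        contents[release] = list(carried)
    return contents


# Illustrative data only: coordinates and patch names are made up.
mhc_alts = [AltLocus(f"MHC_ALT_{i}", "6", 1_000, 2_000) for i in range(1, 8)]
units = assign_units(mhc_alts)
print(len(units))  # 7: mutually overlapping alternates each get their own unit

chr4_alt = AltLocus("CHR4_ALT_1", "4", 500, 900)
units_after = assign_units([chr4_alt], units)
print([locus.name for locus in units_after[0]])  # ['MHC_ALT_1', 'CHR4_ALT_1']

patches = cumulative_patches([("GRCh37.p1", ["fix_a"]), ("GRCh37.p2", ["novel_b"])])
print(patches["GRCh37.p2"])  # ['fix_a', 'novel_b']
```

Because the seven MHC alternates all overlap one another, no two can share a unit, while a later alternate elsewhere in the genome fits into an existing unit. First-fit is only one possible placement policy; the caption does not say how the GRC chooses among eligible units.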