
    PCA and K-Means decipher genome

    In this paper, we aim to give a tutorial for undergraduate students studying statistical methods and/or bioinformatics. The students will learn how data visualization can help in genomic sequence analysis. Students start with a fragment of the genetic text of a bacterial genome and analyze its structure. By means of principal component analysis, they "discover" that the information in the genome is encoded by non-overlapping triplets. Next, they learn how to find gene positions. This exercise on PCA and K-Means clustering enables active study of basic bioinformatics notions. Appendix 1 contains program listings that go along with this exercise. Appendix 2 includes 2D PCA plots of triplet usage in a moving frame for a series of bacterial genomes, from GC-poor to GC-rich ones. Animated 3D PCA plots are attached as separate GIF files. The topology (cluster structure) and geometry (mutual positions of clusters) of these plots clearly depend on GC content.
    Comment: 18 pages, with program listings for MATLAB, PCA analysis of genomes and additional animated 3D PCA plots
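    The pipeline the tutorial describes (triplet counts in a moving window, PCA, then K-Means) can be sketched in Python with NumPy. This is not the authors' MATLAB code from Appendix 1. The window length (300), step (12), the centering-only PCA, and the plain Lloyd's K-Means are assumptions chosen for illustration.

```python
import itertools

import numpy as np

# Fixed order of the 64 codons so every window maps onto the same 64 columns.
CODONS = ["".join(p) for p in itertools.product("ACGT", repeat=3)]
CODON_INDEX = {codon: i for i, codon in enumerate(CODONS)}


def triplet_frequencies(seq, window=300, step=12):
    """Relative frequencies of non-overlapping triplets in each sliding window.

    window and step are assumed values for illustration.
    """
    rows = []
    for start in range(0, len(seq) - window + 1, step):
        fragment = seq[start:start + window]
        counts = np.zeros(len(CODONS))
        # Read the fragment in non-overlapping triplets starting at its first base.
        for i in range(0, window - 2, 3):
            idx = CODON_INDEX.get(fragment[i:i + 3])
            if idx is not None:  # skip triplets containing N or other symbols
                counts[idx] += 1
        rows.append(counts / max(counts.sum(), 1.0))
    return np.array(rows)


def pca(X, k=2):
    """Project the centered rows of X onto the first k principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's K-Means; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

    With a real bacterial genome as `seq`, `pca(triplet_frequencies(seq))` gives the 2D scatter to inspect, and `kmeans(..., k=7)` can label its clusters. The choice k = 7 (three frame shifts on each of two strands, plus non-coding windows) is an assumption for illustration, not something stated in the abstract.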

    Does socioeconomic disparity in cancer incidence vary across racial/ethnic groups?

    Objective: Very few studies have simultaneously examined the incidence of the leading cancers in relation to socioeconomic status (SES) and race/ethnicity in populations including Hispanics and Asians. This study aims to describe SES disparity in cancer incidence within each of four major racial/ethnic groups (non-Hispanic white, black, Hispanic, and Asian/Pacific Islander) for five major cancer sites: female breast, colorectal, cervical, lung, and prostate cancer.
    Methods: Invasive cancers of the five major sites diagnosed in California from 1998 to 2002 (n = 376,158) were included in the study. Composite area-based SES measures were used to quantify SES level and to calculate cancer incidence rates stratified by SES. The relative index of inequality (RII) was calculated to measure the SES gradient of cancer incidence within each racial/ethnic group.
    Results: Significant variations in SES disparities across the racial/ethnic groups were detected for all five major cancer sites. Female breast cancer and prostate cancer incidence increased with increasing SES in all groups, with the trend strongest among Hispanics. Cervical cancer incidence increased with decreasing SES, with the largest gradient among non-Hispanic white women. Lung cancer incidence increased with decreasing SES, except among Hispanic men and women, for whom the SES gradient ran in the opposite direction. For colorectal cancer, higher incidence was associated with lower SES in non-Hispanic whites but with higher SES in Hispanics and Asian/Pacific Islander women.
    Conclusions: Examining SES disparity stratified by race/ethnicity enhances our understanding of the complex relationships between cancer incidence, SES, and race/ethnicity.
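    The abstract does not say exactly how the RII was computed, and several definitions are in use. One common construction, assumed here, gives each SES group a ridit score (the cumulative population share up to the group's midpoint), fits a population-weighted linear regression of rate on ridit, and takes the ratio of fitted rates at the two ends of the SES scale. The function name and the orientation (top over bottom, so RII > 1 means incidence rises with SES) are illustrative choices.

```python
import numpy as np


def relative_index_of_inequality(rates, populations):
    """RII as fitted rate at the top of the SES scale over the bottom (assumed definition).

    rates and populations must be ordered from the lowest to the highest SES group.
    """
    rates = np.asarray(rates, dtype=float)
    share = np.asarray(populations, dtype=float)
    share = share / share.sum()
    # Ridit score: cumulative population share up to the midpoint of each group.
    ridit = np.cumsum(share) - share / 2
    # Weighted least squares for rate = a + b * ridit, weighting each group by its share.
    design = np.column_stack([np.ones_like(ridit), ridit])
    w = np.sqrt(share)
    (a, b), *_ = np.linalg.lstsq(design * w[:, None], rates * w, rcond=None)
    # Fitted rate at ridit 1 (highest SES) relative to ridit 0 (lowest SES).
    return (a + b) / a
```

    Under this orientation, a value above 1 would correspond to the pattern reported for breast and prostate cancer (incidence rising with SES), and a value below 1 to the pattern reported for cervical cancer.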

    Spin transport and spin torque in antiferromagnetic devices

    Ferromagnets are key materials for sensing and memory applications. In contrast, antiferromagnets, which represent the more common form of magnetically ordered materials, have found less practical application beyond their use for establishing reference magnetic orientations via exchange bias. This might change in the future due to recent progress in materials research and the discovery of antiferromagnetic spintronic phenomena suitable for device applications. Experimental demonstration of the electrical switching and detection of the Néel order opens a route towards memory devices based on antiferromagnets. Apart from their radiation and magnetic-field hardness, memory cells fabricated from antiferromagnets can be inherently multilevel, which could be used for neuromorphic computing. Switching speeds attainable in antiferromagnets far exceed those of ferromagnetic and semiconductor memory technologies. Here we review the recent progress in electronic spin-transport and spin-torque phenomena in antiferromagnets that are predominantly of relativistic quantum-mechanical origin. We discuss their utility in pure antiferromagnetic or hybrid ferromagnetic/antiferromagnetic memory devices.

    Outlook for inverse design in nanophotonics

    Recent advancements in computational inverse design have begun to reshape the landscape of structures and techniques available to nanophotonics. Here, we outline a cross section of key developments at the intersection of these two fields, moving from a recap of foundational results to motivating emerging applications in nonlinear, topological, near-field and on-chip optics.
    Comment: 13 pages, 6 figures

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents covering a variety of research fields, against which newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH (RELISH) consortium, consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency–Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to capture all relevant articles. The database server at https://relishdb.ict.griffith.edu.au is freely available for downloading annotation data and for blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of powerful new techniques for title-based and title/abstract-based search engines for relevant articles in biomedical research.
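    Two of the three baselines named above, Okapi BM25 and TF-IDF, can be sketched in a few lines of standard-library Python to show how each scores candidate articles against a seed article. This is not the RELISH evaluation code, and PubMed Related Articles is not reproduced. The tokenizer, the BM25 parameters k1 = 1.2 and b = 0.75, the BM25 IDF form ln((N − n + 0.5)/(n + 0.5) + 1), and the TF-IDF IDF form ln(N/df) + 1 are all assumed, commonly used choices.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase alphanumeric runs, no stemming or stop-word removal.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of every document against the query."""
    toks = [tokenize(d) for d in docs]
    n_docs = len(toks)
    avgdl = sum(len(t) for t in toks) / n_docs
    df = Counter(term for t in toks for term in set(t))
    scores = []
    for t in toks:
        tf = Counter(t)
        score = 0.0
        for q in set(tokenize(query)):
            if q not in tf:
                continue
            idf = math.log((n_docs - df[q] + 0.5) / (df[q] + 0.5) + 1)
            # Term-frequency saturation (k1) and length normalization (b).
            denom = tf[q] + k1 * (1 - b + b * len(t) / avgdl)
            score += idf * tf[q] * (k1 + 1) / denom
        scores.append(score)
    return scores


def tfidf_cosine_scores(query, docs):
    """Cosine similarity between TF-IDF vectors of the query and each document."""
    toks = [tokenize(d) for d in docs]
    n_docs = len(toks)
    df = Counter(term for t in toks for term in set(t))
    idf = {term: math.log(n_docs / df[term]) + 1 for term in df}

    def vectorize(tokens):
        # Query terms absent from the collection have no IDF and are dropped.
        return {w: c * idf[w] for w, c in Counter(tokens).items() if w in idf}

    qv = vectorize(tokenize(query))
    qn = math.sqrt(sum(v * v for v in qv.values()))
    scores = []
    for t in toks:
        dv = vectorize(t)
        dn = math.sqrt(sum(v * v for v in dv.values()))
        dot = sum(weight * dv.get(w, 0.0) for w, weight in qv.items())
        scores.append(dot / (qn * dn) if qn and dn else 0.0)
    return scores


def rank(scores):
    """Document indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])
```

    Because BM25 saturates term frequency and normalizes by document length while TF-IDF cosine does not, the two can order the same candidates differently, which is one plausible reason the abstract reports distinct recommendation sets from each baseline.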

    Pan-cancer analysis of whole genomes

    Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale(1-3). Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4-5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter(4); identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation(5,6); analyses timings and patterns of tumour evolution(7); describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity(8,9); and evaluates a range of more-specialized features of cancer genomes(8,10-18).
    Peer reviewed