
    Escape Excel: A tool for preventing gene symbol and accession conversion errors

    <div><p>Background</p><p>Microsoft Excel automatically converts certain gene symbols, database accessions, and other alphanumeric text into dates, scientific notation, and other numerical representations. These conversions lead to subsequent, irreversible corruption of the imported text. A recent survey of popular genomic literature estimates that one-fifth of all papers with supplementary gene lists suffer from this issue.</p><p>Results</p><p>Here, we present an open-source tool, Escape Excel, which prevents these erroneous conversions by generating an escaped text file that can be safely imported into Excel. Escape Excel is implemented in a variety of formats (<a href="http://www.github.com/pstew/escape_excel" target="_blank">http://www.github.com/pstew/escape_excel</a>), including a command-line Perl script, a Windows-only Excel Add-In, an OS X drag-and-drop application, a simple web server, and a Galaxy web environment interface. Test server implementations are accessible as a Galaxy interface (<a href="http://apostl.moffitt.org" target="_blank">http://apostl.moffitt.org</a>) and a simple non-Galaxy web server (<a href="http://apostl.moffitt.org:8000/" target="_blank">http://apostl.moffitt.org:8000/</a>).</p><p>Conclusions</p><p>Escape Excel detects and escapes a wide variety of problematic text strings so that they are not erroneously converted into other representations upon importation into Excel. Examples of problematic strings include date-like strings, time-like strings, leading zeroes in front of numbers, and long numeric and alphanumeric identifiers that should not be automatically converted into scientific notation. It is hoped that greater awareness of these potential data corruption issues, together with diligent escaping of text files prior to importation into Excel, will help to reduce the amount of Excel-corrupted data in scientific analyses and publications.</p></div>
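The detect-and-escape idea described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the tool's actual Perl implementation: the regular expressions, the function names (`needs_escape`, `escape_field`, `escape_line`), and the choice to wrap risky fields as `="..."` text formulas are all assumptions made for the example.

```python
import re

# Hypothetical patterns for strings Excel tends to auto-convert.
# These are assumptions for illustration, not Escape Excel's real rule set.
_RISKY_PATTERNS = [
    # Month-like gene symbols such as MARCH1, SEPT2, DEC1
    re.compile(
        r"^(?:jan|feb|mar|march|apr|april|may|jun|june|jul|july|"
        r"aug|sep|sept|oct|nov|dec)[-/ ]?\d{1,2}$",
        re.IGNORECASE,
    ),
    # Date-like strings such as 1/2 or 2006-03-01
    re.compile(r"^\d{1,4}[-/]\d{1,2}(?:[-/]\d{1,4})?$"),
    # Time-like strings such as 12:30 or 12:30:05
    re.compile(r"^\d{1,2}:\d{2}(?::\d{2})?$"),
    # Numbers with leading zeroes such as 00123
    re.compile(r"^0\d+$"),
    # Identifiers that look like scientific notation, such as 2310009E13
    re.compile(r"^\d+e\d+$", re.IGNORECASE),
    # Long digit runs that would be shown in scientific notation
    re.compile(r"^\d{12,}$"),
]


def needs_escape(field: str) -> bool:
    """Return True if the field matches any of the risky patterns."""
    return any(p.match(field) for p in _RISKY_PATTERNS)


def escape_field(field: str) -> str:
    """Wrap a risky field as a text formula (assumed convention), else return it unchanged."""
    if needs_escape(field):
        # Double any embedded quotes so the formula string stays valid.
        return '="' + field.replace('"', '""') + '"'
    return field


def escape_line(line: str, sep: str = "\t") -> str:
    """Escape every field of one delimited line."""
    return sep.join(escape_field(f) for f in line.split(sep))
```

For example, `escape_line("MARCH1\tTP53\t00123")` leaves `TP53` untouched and wraps the other two fields, so only values at risk of conversion are altered.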

    Screen capture of Escape Excel Galaxy web server interface.

    <p>The results from an example data processing workflow are shown, ready for download (right pane), after uploading a file with the Upload Data tool (left pane) and processing the selected file with the Escape Excel tool (tool selected in the left pane, options selected in the middle pane). A step-by-step tutorial is provided underneath the form in the middle pane.</p>

    Screen capture of Escape Excel OS X application.

    <p>Files to be escaped can be dragged and dropped onto the application, which will then automatically export escaped versions of the files.</p>

    Screen capture of Escape Excel command line tool help text.

    <p>Unknown command line options, including --help, will abort the program with a brief usage statement, including command syntax and descriptions of supported option flags.</p>
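The abort-on-unknown-option behavior described in this caption can be sketched as follows. This is a minimal Python illustration, not the Perl script itself: the flag names (`--no-dates`, `--no-sci`), the script name, and the usage text are hypothetical, since the caption does not list the real options.

```python
import sys

# Hypothetical flags and usage text; the real tool's options are not given in the source.
KNOWN_FLAGS = {"--no-dates", "--no-sci"}
USAGE = (
    "usage: escape_sketch.py [--no-dates] [--no-sci] input.txt\n"
    "  --no-dates  do not escape date-like strings\n"
    "  --no-sci    do not escape scientific-notation-like strings"
)


class UsageError(Exception):
    """Raised when the command line cannot be interpreted."""


def parse_args(argv):
    """Split argv into known flags and file names, rejecting any unknown option."""
    flags, files = set(), []
    for arg in argv:
        if arg.startswith("-") and arg != "-":
            if arg not in KNOWN_FLAGS:
                # Any unrecognized option, including --help, is an error here.
                raise UsageError(USAGE)
            flags.add(arg)
        else:
            files.append(arg)
    if not files:
        raise UsageError(USAGE)
    return flags, files


def main(argv):
    """Return a process exit code: 1 with a usage message on bad input, else 0."""
    try:
        flags, files = parse_args(argv)
    except UsageError as err:
        print(err, file=sys.stderr)
        return 1
    # The actual escaping work would happen here.
    return 0
```

Because `--help` is deliberately not in the set of known flags, it takes the same path as any other unknown option, which is how a single usage message can double as the help text.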

    Differentially Expressed Transcripts and Dysregulated Signaling Pathways and Networks in African American Breast Cancer

    <div><p>African Americans (AAs) have a higher mortality rate from breast cancer than Caucasian Americans (CAs), even when socioeconomic factors are accounted for. To better understand the driving biological factors of this health disparity, we performed a comprehensive differential gene expression analysis, including subtype- and stage-specific analysis, using the breast cancer data in the Cancer Genome Atlas (TCGA). In total, 674 unique genes and other transcripts were found differentially expressed between these two populations. The numbers of differentially expressed genes between AA and CA patients increased in each stage of tumor progression: there were 26 in stage I, 161 in stage II, and 223 in stage III. Resistin, a gene that is linked to obesity, insulin resistance, and breast cancer, was expressed more than four times higher in AA tumors. An uncharacterized, long, non-coding RNA, LOC90784, was down-regulated in AA tumors, and its expression was inversely related to cancer stage and was the lowest in triple negative AA breast tumors. Network analysis showed increased expression of a majority of components in p53 and BRCA1 subnetworks in AA breast tumor samples, and members of the aurora B and polo-like kinase signaling pathways were also highly expressed. Higher gene expression diversity was observed in more advanced stage breast tumors, suggesting increased genomic instability during tumor progression. Amplified resistin expression may indicate that insulin-resistant type II diabetes and obesity are associated with AA breast cancer. Expression of LOC90784 may have a protective effect on breast cancer patients, and its loss, particularly in triple negative breast cancer, could have detrimental effects.
This work helps elucidate molecular mechanisms of breast cancer health disparity and identifies putative biomarkers and therapeutic targets, such as resistin and the aurora B and polo-like kinase signaling pathways, for treating AA breast cancer patients.</p></div>

    LOC90784, a long non-coding RNA, was differentially expressed across comparisons of African- and Caucasian-American tumors.

    <p>The expression of this transcript was consistent in CA tumors, but its expression was inversely related to stage and was lowest in AA with TNBC. P-values for the comparisons were overall: 1.46E-14, luminal A: 1.98E-04, stage I: 6.61E-04, stage II: 3.63E-08, stage III: 2.57E-05, and triple negative: 1.86E-09 (see Supplementary Tables). Error bars are standard error.</p>

    Venn diagram depicting overlap of differentially expressed genes and other transcripts between stage-matched African- and Caucasian-American tumors.

    <p>Increases or decreases indicate AA gene expression and are relative to CA expression. Gene names: FYCO1 - FYVE and coiled-coil domain containing 1; LOC90784 - Not Available; LOC285359 - Not Available; LRRC37A2 - leucine rich repeat containing 37, member A2; MEIS3P1 - Meis homeobox 3 pseudogene 1; NOTCH2NL - notch 2 N-terminal like; PRSS45 - protease, serine, 45.</p>

    Differentially expressed subnetworks identified by Gene Expression Network Analysis.

    <p>Subnetworks containing p53 (A) and BRCA1 (B) were differentially expressed in AA tumors. Subnetworks were identified using GXNA and visualized using STRING. Starred results were not differentially expressed but were included in the subnetwork by GXNA. Values in parentheses are the mean fold changes of log<sub>2</sub>-transformed AA expression relative to CA expression, calculated as log<sub>2</sub>(CA)/log<sub>2</sub>(AA). Gene names: BRCA1 - breast cancer 1, early onset; CCNB2 - cyclin B2; CDC25B - cell division cycle 25B; CDK1 - cyclin-dependent kinase 1; CKS2 - CDC28 protein kinase regulatory subunit 2; ELAVL1 - ELAV (embryonic lethal, abnormal vision, Drosophila)-like 1 (Hu antigen R); FANCA - Fanconi anemia, complementation group A; HMGB2 - high mobility group box 2; HSF1 - heat shock transcription factor 1; HSPA8 - heat shock 70kDa protein 8; PKMYT1 - protein kinase, membrane associated tyrosine/threonine 1; PML - promyelocytic leukemia; PPP1R13L - protein phosphatase 1, regulatory subunit 13 like; PTMA - prothymosin, alpha; RAD50 - RAD50 homolog (S. cerevisiae); RAD51 - RAD51 recombinase; TP53 - tumor protein p53; TXN - thioredoxin.</p>

    APOSTL: An Interactive Galaxy Pipeline for Reproducible Analysis of Affinity Proteomics Data

    With the continuously increasing scale and depth of coverage in affinity proteomics (AP–MS) data, analysis and visualization are becoming more challenging. A number of tools have been developed to identify high-confidence interactions; however, a cohesive and intuitive pipeline for analysis and visualization is still needed. Here we present Automated Processing of SAINT Templated Layouts (APOSTL), a freely available Galaxy-integrated software suite and analysis pipeline for reproducible, interactive analysis of AP–MS data. APOSTL contains a number of tools woven together using Galaxy workflows, which make it intuitive for the user to move from raw data to publication-quality figures within a single interface. APOSTL is an evolving software project with the potential to customize individual analyses with additional Galaxy tools and widgets using the R web application framework, Shiny. The source code, data, and documentation are freely available from GitHub (https://github.com/bornea/APOSTL) and other sources.