Escape Excel: A tool for preventing gene symbol and accession conversion errors
<div><p>Background</p><p>Microsoft Excel automatically converts certain gene symbols, database accessions, and other alphanumeric text into dates, scientific notation, and other numerical representations. These conversions lead to subsequent, irreversible corruption of the imported text. A recent survey of popular genomic literature estimates that one-fifth of all papers with supplementary gene lists suffer from this issue.</p><p>Results</p><p>Here, we present an open-source tool, Escape Excel, which prevents these erroneous conversions by generating an escaped text file that can be safely imported into Excel. Escape Excel is implemented in a variety of formats (<a href="http://www.github.com/pstew/escape_excel" target="_blank">http://www.github.com/pstew/escape_excel</a>), including a command-line Perl script, a Windows-only Excel Add-In, an OS X drag-and-drop application, a simple web server, and a Galaxy web environment interface. Test server implementations are accessible as a Galaxy interface (<a href="http://apostl.moffitt.org" target="_blank">http://apostl.moffitt.org</a>) and as a simple non-Galaxy web server (<a href="http://apostl.moffitt.org:8000/" target="_blank">http://apostl.moffitt.org:8000/</a>).</p><p>Conclusions</p><p>Escape Excel detects and escapes a wide variety of problematic text strings so that they are not erroneously converted into other representations upon importation into Excel. Examples of problematic strings include date-like strings, time-like strings, leading zeroes in front of numbers, and long numeric and alphanumeric identifiers that should not be automatically converted into scientific notation. It is hoped that greater awareness of these potential data corruption issues, together with diligent escaping of text files prior to importation into Excel, will help to reduce the amount of Excel-corrupted data in scientific analyses and publications.</p></div>
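<p>The detect-and-escape idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the Escape Excel implementation (which is a Perl script with far more extensive rules): the regular expressions, the function names, and the choice of wrapping a field as <code>="..."</code> so that Excel treats it as literal text are all assumptions made for this example.</p>

```python
import re

# Illustrative patterns only (assumptions for this sketch); the real tool
# detects many more problematic forms than these five.
PATTERNS = [
    # Month-like prefix followed by a day number: SEPT2, MARCH1, DEC1, Oct-4
    re.compile(
        r"^(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*[-/ ]?\d{1,2}$",
        re.IGNORECASE,
    ),
    # Purely numeric date-like strings: 1/2, 03-04, 1/2/2015
    re.compile(r"^\d{1,2}[-/]\d{1,2}(?:[-/]\d{2,4})?$"),
    # Time-like strings: 12:30, 1:02:03
    re.compile(r"^\d{1,2}:\d{2}(?::\d{2})?$"),
    # Numbers with leading zeroes: 00123
    re.compile(r"^0\d+$"),
    # Digits-E-digits identifiers read as scientific notation: 2310009E13
    re.compile(r"^\d+[Ee]\d+$"),
]


def needs_escape(field: str) -> bool:
    """Return True if the field matches any of the illustrative patterns."""
    return any(p.match(field) for p in PATTERNS)


def escape_field(field: str) -> str:
    """Wrap a risky field as ="..." (assumed escape form); leave others alone."""
    if needs_escape(field):
        return '="' + field.replace('"', '""') + '"'
    return field


def escape_line(line: str, sep: str = "\t") -> str:
    """Escape every field of one delimited line independently."""
    return sep.join(escape_field(f) for f in line.split(sep))
```

<p>Under these assumptions, <code>escape_line("MARCH1\t12:30\tTP53")</code> wraps the first two fields and leaves <code>TP53</code> untouched, because only the date-like and time-like fields match a pattern.</p>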
Screen capture of Escape Excel Galaxy web server interface.
<p>The results from an example data processing workflow are shown, ready for download (right pane), after uploading a file with the Upload Data tool (left pane) and processing the selected file with the Escape Excel tool (tool selected in the left pane, options selected in the middle pane). A step-by-step tutorial is provided underneath the form in the middle pane.</p>
Screen capture of Escape Excel command line tool help text.
<p>Unknown command line options, including --help, will abort the program with a brief usage statement, including command syntax and descriptions of supported option flags.</p>
Screen capture of Escape Excel OS X application.
<p>Files to be escaped can be dragged and dropped onto the application, which will then automatically export escaped versions of the files.</p>
Identifying Regulatory Changes to Facilitate Nitrogen Fixation in the Nondiazotroph <i>Synechocystis</i> sp. PCC 6803
The incorporation of biological nitrogen fixation into a nondiazotrophic photosynthetic organism provides a promising solution to the increasing fixed nitrogen demand, but is accompanied by a number of challenges for accommodating two incompatible processes within the same organism. Here we present regulatory influence networks for two cyanobacteria, <i>Synechocystis</i> PCC 6803 and <i>Cyanothece</i> ATCC 51142, and evaluate them to co-opt native transcription factors that may be used to control the <i>nif</i> gene cluster once it is transferred to <i>Synechocystis</i>. These networks were further examined to identify candidate transcription factors for other metabolic processes necessary for temporal separation of photosynthesis and nitrogen fixation, glycogen catabolism and cyanophycin synthesis. Two transcription factors native to <i>Synechocystis</i>, LexA and Rcp1, were identified as promising candidates for the control of the <i>nif</i> gene cluster and other pertinent metabolic processes, respectively. Lessons learned in the incorporation of nitrogen fixation into a nondiazotrophic prokaryote may be leveraged to further progress the incorporation of nitrogen fixation in plants.
A Pilot Proteogenomic Study with Data Integration Identifies MCT1 and GLUT1 as Prognostic Markers in Lung Adenocarcinoma
<div><p>We performed a pilot proteogenomic study to compare lung adenocarcinoma to lung squamous cell carcinoma using quantitative proteomics (6-plex TMT) combined with a customized Affymetrix GeneChip. Using MaxQuant software, we identified 51,001 unique peptides that mapped to 7,241 unique proteins and from these identified 6,373 genes with matching protein expression for further analysis. We found a minor correlation between gene expression and protein expression; both datasets were able to independently recapitulate known differences between the adenocarcinoma and squamous cell carcinoma subtypes. We found 565 proteins and 629 genes to be differentially expressed between adenocarcinoma and squamous cell carcinoma, with 113 of these consistently differentially expressed at both the gene and protein levels. We then compared our results to published adenocarcinoma versus squamous cell carcinoma proteomic data that we also processed with MaxQuant. We selected two proteins consistently overexpressed in squamous cell carcinoma in all studies, MCT1 (SLC16A1) and GLUT1 (SLC2A1), for further investigation. We found differential expression of these same proteins at the gene level in our study as well as in other public gene expression datasets. These findings combined with survival analysis of public datasets suggest that MCT1 and GLUT1 may be potential prognostic markers in adenocarcinoma and druggable targets in squamous cell carcinoma. Data are available via ProteomeXchange with identifier PXD002622.</p></div>
Comparison with Existing NSCLC Proteomic Datasets.
<p>Mean intensities are given in log<sub>2</sub> scale. (A) The correlations between reporter ion intensities and peptide intensities from Kikuchi et al. were low (R = 0.3, <i>P</i> < 2.2E-16; ρ = 0.26, <i>P</i> < 2.2E-16). (B) As with the Kikuchi et al. data, correlations between reporter ion intensities and peptide intensities from Li et al. were also low (R = 0.23, <i>P</i> < 2.2E-16; ρ = 0.21, <i>P</i> < 2.2E-16).</p>
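<p>The caption reports two correlation coefficients per comparison. Assuming R denotes Pearson's correlation and ρ denotes Spearman's rank correlation (the usual reading of these symbols, not stated in the caption), the difference between them can be sketched as follows. The function names are hypothetical and no study data are used.</p>

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

<p>Because Spearman's coefficient depends only on rank order, a monotonic but nonlinear relationship gives ρ = 1 while R stays below 1, which is one reason to report both.</p>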
Proteogenomic Data Recapitulates NSCLC Histology.
<p>(A) Clustering of all identified proteins (7,241) from quantitative TMT analysis groups tissues by ADC/SCC histology. (B) Clustering of Affymetrix array probes with standard deviation > 1 (11,008 or 18% of total probes) also groups tissues by ADC/SCC histology.</p>
Differentially Expressed Proteins from Quantitative TMT Shared Between Proteomic Datasets.
<p>Italics denote entries that were also differentially expressed at the gene level. Log<sub>2</sub> fold-change was calculated as log<sub>2</sub>(SCC/ADC).</p>
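<p>The sign convention of log<sub>2</sub>(SCC/ADC) can be checked with a short sketch. The function name and the input values are hypothetical, and the inputs are assumed to be mean intensities on a linear (not log) scale.</p>

```python
from math import log2


def log2_fold_change(scc_mean: float, adc_mean: float) -> float:
    """Return log2(SCC/ADC); positive means higher in SCC, negative higher in ADC."""
    if scc_mean <= 0 or adc_mean <= 0:
        raise ValueError("intensities must be positive to take a log ratio")
    return log2(scc_mean / adc_mean)
```

<p>For example, a protein four times more abundant in SCC than in ADC has a log<sub>2</sub> fold-change of 2, and one four times more abundant in ADC has a value of -2.</p>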