Reconstructing a Z' Lagrangian using the LHC and low-energy data
We study the potential of the LHC and future low-energy experiments to
precisely measure the underlying model parameters of a new Z' boson. We
emphasize the complementary information obtained from both on- and off-peak LHC
dilepton data, from the future Q-weak measurement of the weak charge of the
proton, and from a proposed measurement of parity violation in low-energy
Moller scattering. We demonstrate the importance of off-peak LHC data and
Q-weak for removing sign degeneracies between Z' couplings that occur if only
on-peak LHC data is studied. A future precision measurement of low-energy
Moller scattering can resolve a scaling degeneracy between quark and lepton
couplings that remains after analyzing LHC dilepton data, permitting an
extraction of the individual Z' couplings rather than combinations of them. We
study how precisely Z' properties can be extracted for LHC integrated
luminosities ranging from a few inverse femtobarns to super-LHC values of an
inverse attobarn. For the several example cases studied with M_Z'=1.5 TeV, we
find that coupling combinations can be determined with relative uncertainties
reaching 30% with 30 fb^-1 of integrated luminosity, while 50% is possible with
10 fb^-1. With SLHC luminosities of 1 ab^-1, we find that products of quark and
lepton couplings can be probed to 10%.
Comment: 36 pages, 17 figures
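The degeneracy-breaking argument can be sketched numerically. In the toy model below, the coupling values and the forms of the observables are invented for illustration and are not taken from the analysis: an on-peak dilepton rate constrains only the product of quark and lepton couplings, while a Møller-type parity-violation observable depends on the lepton coupling alone, so combining the two pins down each coupling separately.

```python
import numpy as np

# Hypothetical "true" Z' couplings (illustrative values only)
g_q_true, g_l_true = 0.3, 0.5

# In this toy, the on-peak dilepton rate constrains only the product
# g_q * g_l (up to the sign degeneracies noted in the abstract).
product = g_q_true * g_l_true

# A low-energy Moller observable is modelled here as depending on the
# lepton coupling alone, ~ g_l^2.
moller_obs = g_l_true ** 2

# Combining both measurements resolves the scaling degeneracy:
g_l = np.sqrt(moller_obs)
g_q = product / g_l
```

Scaling g_q down and g_l up by a common factor leaves the LHC-measured product unchanged; that is precisely the degeneracy the Møller measurement removes.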
Transcriptional Analysis of Walleye Dermal Sarcoma Virus (WDSV)
Walleye dermal sarcoma virus (WDSV) is a complex retrovirus associated with dermal sarcomas of walleye that develop and regress on a seasonal basis. WDSV contains, in addition to gag, pol, and env, three open reading frames (ORFs) designated ORF A, ORF B, and ORF C. The polymerase chain reaction technique was used to amplify and clone cDNAs representing subgenomic viral mRNAs isolated from developing (fall) and regressing (spring) tumors. Nine different singly or multiply spliced viral transcripts were identified, and all were found to utilize a common 5′ leader sequence. This leader sequence is spliced to the pol/env junction or downstream of env to generate singly spliced transcripts. Multiply spliced transcripts contain the 5′ leader, the pol/env junction, and sequences derived from the 3′ end of the genome. One multiply spliced transcript was isolated with the potential to encode the full-length ORF A protein. In addition, WDSV produced mRNAs that utilize alternative splice acceptor sites which would allow synthesis of five variant forms of the ORF A protein. In contrast, the ORF B protein is postulated to arise from a singly spliced transcript with the potential to encode the entire open reading frame. Spliced subgenomic transcripts representing ORF C mRNAs were not identified, suggesting that ORF C may be encoded from the full-length viral genomic transcript. We estimate that at least a 100-fold lower amount of the accessory/regulatory subgenomic transcripts exists in developing vs regressing tumors. These results demonstrate that WDSV undergoes an elaborate pattern of mRNA splicing similar to that of other complex retroviruses.
A multi-tissue gene expression dataset for hibernating brown bears
Objectives: Complex physiological adaptations often involve the coordination of molecular responses across multiple tissues. Establishing transcriptomic resources for non-traditional model organisms with phenotypes of interest can provide a foundation for understanding the genomic basis of these phenotypes, and the degree to which these resemble, or contrast with, those of traditional model organisms. Here, we present a one-of-a-kind gene expression dataset generated from multiple tissues of two hibernating brown bears (Ursus arctos).
Data description: This dataset comprises 26 samples collected from 13 tissues of two hibernating brown bears. These samples were collected opportunistically and are typically not possible to attain, resulting in a highly unique and valuable gene expression dataset. In combination with previously published datasets, this new transcriptomic resource will facilitate detailed investigation of hibernation physiology in bears, and the potential to translate aspects of this biology to treat human disease.
Knowledge-based gene expression classification via matrix factorization
Motivation: Modern machine learning methods based on matrix decomposition techniques, like independent component analysis (ICA) or non-negative matrix factorization (NMF), provide new and efficient analysis tools which are currently explored to analyze gene expression profiles. These exploratory feature extraction techniques yield expression modes (ICA) or metagenes (NMF). These extracted features are considered indicative of underlying regulatory processes. They can as well be applied to the classification of gene expression datasets by grouping samples into different categories for diagnostic purposes or group genes into functional categories for further investigation of related metabolic pathways and regulatory networks.
Results: In this study we focus on unsupervised matrix factorization techniques and apply ICA and sparse NMF to microarray datasets. The latter monitor the gene expression levels of human peripheral blood cells during differentiation from monocytes to macrophages. We show that these tools are able to identify relevant signatures in the deduced component matrices and extract informative sets of marker genes from these gene expression profiles. The methods rely on the joint discriminative power of a set of marker genes rather than on single marker genes. With these sets of marker genes, corroborated by leave-one-out or random forest cross-validation, the datasets could easily be classified into related diagnostic categories, corresponding to either monocytes versus macrophages or healthy versus Niemann-Pick C disease patients.
Funding: Siemens AG, Munich; DFG (Graduate College 638); DAAD (PPP Luso-Alemã and PPP Hispano-Alemanas).
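A minimal sketch of the NMF half of such a pipeline, run on random toy data rather than the monocyte/macrophage profiles, and omitting the sparseness constraints: the classical multiplicative-update rules factor a nonnegative expression matrix X into metagenes W and sample encodings H.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "expression matrix": 20 genes x 12 samples, nonnegative entries
X = rng.random((20, 12))
k = 3  # number of metagenes to extract

# Positive random initialization keeps the multiplicative updates nonnegative
W = rng.random((20, k)) + 0.1
H = rng.random((k, 12)) + 0.1
eps = 1e-9  # guards against division by zero

err_start = np.linalg.norm(X - W @ H)
for _ in range(200):
    # Lee-Seung multiplicative updates for the Frobenius-norm objective
    H *= (W.T @ X) / (W.T @ W @ H + eps)
    W *= (X @ H.T) / (W @ H @ H.T + eps)
err_end = np.linalg.norm(X - W @ H)

# Columns of W are the "metagenes"; rows of H encode each sample and can
# feed a downstream classifier.
```

The updates monotonically reduce the reconstruction error while preserving nonnegativity, which is what makes the extracted metagenes interpretable as additive parts.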
Sources of variation in baseline gene expression levels from toxicogenomics study control animals across multiple laboratories
Background: The use of gene expression profiling in both clinical and laboratory settings would be enhanced by better characterization of variance due to individual, environmental, and technical factors. Meta-analysis of microarray data from untreated or vehicle-treated animals within the control arm of toxicogenomics studies could yield useful information on baseline fluctuations in gene expression, although control animal data have not been available on a scale and in a form best suited for data mining.
Results: A dataset of control animal microarray expression data was assembled by a working group of the Health and Environmental Sciences Institute's Technical Committee on the Application of Genomics in Mechanism Based Risk Assessment in order to provide a public resource for assessments of variability in baseline gene expression. Data from over 500 Affymetrix microarrays from control rat liver and kidney were collected from 16 different institutions. Thirty-five biological and technical factors were obtained for each animal, describing a wide range of study characteristics, and a subset were evaluated in detail for their contribution to total variability using multivariate statistical and graphical techniques.
Conclusion: The study factors that emerged as key sources of variability included gender, organ section, strain, and fasting state. These and other study factors were identified as key descriptors that should be included in the minimal information about a toxicogenomics study needed for interpretation of results by an independent source. Genes that are the most and least variable, gender-selective, or altered by fasting were also identified and functionally categorized. Better characterization of gene expression variability in control animals will aid in the design of toxicogenomics studies and in the interpretation of their results.
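The kind of factor-contribution assessment described here can be illustrated with a one-way sum-of-squares decomposition. The effect size and noise level below are invented for the sketch; the study itself used multivariate techniques over 35 factors.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy baseline expression for one gene in 30 male and 30 female control
# animals (hypothetical effect: females shifted by +1.0, residual sd 0.5)
sex_effect = {"M": 0.0, "F": 1.0}
values, labels = [], []
for sex in ("M", "F"):
    for _ in range(30):
        values.append(sex_effect[sex] + rng.normal(0.0, 0.5))
        labels.append(sex)
values = np.array(values)
labels = np.array(labels)

# One-way variance partition: between-sex sum of squares vs total
grand = values.mean()
groups = [values[labels == s] for s in ("M", "F")]
ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
ss_total = ((values - grand) ** 2).sum()
frac_explained = ss_between / ss_total  # share of variability attributable to sex
```

Repeating this partition per factor (strain, fasting state, organ section) is the simplest way to rank candidate sources of baseline variability before fitting a joint model.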
Impact of the spotted microarray preprocessing method on fold-change compression and variance stability
Background: The standard approach for preprocessing spotted microarray data is to subtract the local background intensity from the spot foreground intensity, to perform a log2 transformation, and to normalize the data with a global median or a lowess normalization. Although well motivated, the standard approaches to background correction and transformation have been widely criticized because they produce high variance at low intensities. Although various alternatives to the standard background correction methods and to the log2 transformation have been proposed, the impacts of these two successive preprocessing steps have not been compared in an objective way.
Results: In this study, we assessed the impact of eight preprocessing methods combining four background correction methods and two transformations (the log2 and the glog), using data from the MAQC study. The current results indicate that most preprocessing methods produce fold-change compression at low intensities. Fold-change compression was minimized using the Standard and the Edwards background correction methods coupled with a log2 transformation. The drawback of both methods is a high variance at low intensities, which consequently produced poor estimations of the p-values. On the other hand, effective stabilization of the variance as well as better estimations of the p-values were observed after the glog transformation.
Conclusion: As both fold-change magnitudes and p-values are important in the context of microarray class comparison studies, we recommend combining the Edwards correction with a hybrid transformation method that uses the log2 transformation to estimate fold-change magnitudes and the glog transformation to estimate p-values.
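The contrast between the two transformations can be sketched directly. The glog form and the constant c below are one common parameterization, chosen here for illustration rather than taken from the study.

```python
import numpy as np

def glog2(x, c=50.0):
    # Generalized log, base 2: behaves like log2(x) for large x but stays
    # finite and flat near zero (c is a tuning constant, assumed here).
    return np.log2((x + np.sqrt(x ** 2 + c ** 2)) / 2.0)

rng = np.random.default_rng(0)
# Low-intensity spots: small true signal plus additive background noise
low = 10.0 + rng.normal(0.0, 5.0, 10000)
low = low[low > 0]             # plain log2 cannot handle non-positive values

var_log = np.log2(low).var()   # inflated variance at the low end
var_glog = glog2(low).var()    # stabilized variance
```

For high intensities the two transforms agree closely, so glog stabilizes the noisy low end without distorting fold changes at the top of the range, which is what motivates the hybrid log2/glog recommendation.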
GeneSigDB: a manually curated database and resource for analysis of gene expression signatures
GeneSigDB (http://www.genesigdb.org or http://compbio.dfci.harvard.edu/genesigdb/) is a database of gene signatures that have been extracted and manually curated from the published literature. It provides a standardized resource of published prognostic, diagnostic and other gene signatures of cancer and related disease to the community so they can compare the predictive power of gene signatures or use these in gene set enrichment analysis. Since GeneSigDB release 1.0, we have expanded from 575 to 3515 gene signatures, which were collected and transcribed from 1604 published articles largely focused on gene expression in cancer, stem cells, immune cells, development and lung disease. We have made substantial upgrades to the GeneSigDB website to improve accessibility and usability, including adding a tag cloud browse function, facetted navigation and a ‘basket’ feature to store genes or gene signatures of interest. Users can analyze GeneSigDB gene signatures, or upload their own gene list, to identify gene signatures with significant gene overlap, and results can be viewed on a dynamic editable heatmap that can be downloaded as a publication-quality image. All data in GeneSigDB can be downloaded in numerous formats, including .gmt file format for gene set enrichment analysis or as an R/Bioconductor data file. GeneSigDB is available from http://www.genesigdb.org
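The "significant gene overlap" lookup can be sketched with a one-sided hypergeometric test; the universe, signature, and list sizes below are invented for the example and are not GeneSigDB's actual parameters.

```python
from math import comb

def overlap_pvalue(N, K, n, k):
    # P(X >= k): probability that a random n-gene list shares at least k
    # genes with a K-gene signature drawn from an N-gene universe.
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

# Toy numbers: 20000-gene universe, 100-gene signature, 50-gene user list,
# 10 genes shared, far more than the ~0.25 expected by chance.
p = overlap_pvalue(20000, 100, 50, 10)
```

Because `math.comb` is exact integer arithmetic, the tail sum avoids the floating-point underflow that plagues naive implementations at these sizes.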
Auditory-inspired morphological processing of speech spectrograms: applications in automatic speech recognition and speech enhancement
New auditory-inspired speech processing methods are presented in this paper, combining spectral subtraction and two-dimensional non-linear filtering techniques originally conceived for image processing purposes. In particular, mathematical morphology operations, like erosion and dilation, are applied to noisy speech spectrograms using specifically designed structuring elements inspired by the masking properties of the human auditory system. This is effectively complemented with a pre-processing stage comprising the conventional spectral subtraction procedure and auditory filterbanks. These methods were tested in both speech enhancement and automatic speech recognition tasks. For the first, time-frequency anisotropic structuring elements over grey-scale spectrograms were found to provide better perceptual quality than isotropic ones, revealing themselves as more appropriate (under a number of perceptual quality estimation measures and several signal-to-noise ratios on the Aurora database) for retaining the structure of speech while removing background noise. For the second, the combination of spectral subtraction and auditory-inspired morphological filtering was found to improve recognition rates in a noise-contaminated version of the Isolet database.
This work has been partially supported by the Spanish Ministry of Science and Innovation, CICYT Project No. TEC2008-06382/TEC.
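A minimal sketch of the anisotropic-opening idea on a synthetic spectrogram; the sizes and amplitudes are invented, and the real system derives its structuring elements from auditory masking properties rather than a plain rectangle.

```python
import numpy as np
from scipy.ndimage import grey_opening

rng = np.random.default_rng(0)
# Toy log-magnitude "spectrogram": 64 time frames x 32 frequency bins
spec = rng.normal(0.0, 1.0, (64, 32))
spec[:, 10] += 10.0   # a speech-like ridge sustained over time at one frequency
spec[5, 20] += 12.0   # an isolated noise burst

# Greyscale opening (erosion then dilation) with a time-elongated element:
# structures sustained over >= 9 frames survive, isolated specks are removed.
opened = grey_opening(spec, size=(9, 1))
```

Swapping the element's orientation to size=(1, 9) would instead favour broadband vertical events such as clicks, which illustrates why the time-frequency anisotropy matters for keeping speech structure while discarding noise.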
Gene identification and protein classification in microbial metagenomic sequence data via incremental clustering
Background: The identification and study of proteins from metagenomic datasets can shed light on the roles and interactions of the source organisms in their communities. However, metagenomic datasets are characterized by the presence of organisms with varying GC composition, codon usage biases, etc., and consequently gene identification is challenging. The vast amount of sequence data also requires faster protein family classification tools.
Results: We present a computational improvement to a sequence clustering approach that we developed previously to identify and classify protein-coding genes in large microbial metagenomic datasets. The clustering approach can be used to identify protein-coding genes in prokaryotes, viruses, and intron-less eukaryotes. The computational improvement is based on an incremental clustering method that does not require the expensive all-against-all computation required by the original approach, while still preserving its remote homology detection capabilities. We present evaluations of the clustering approach in protein-coding gene identification and classification, and also present the results of updating the protein clusters from our previous work with recent genomic and metagenomic sequences. The clustering results are available via CAMERA (http://camera.calit2.net).
Conclusion: The clustering paradigm is shown to be a very useful tool in the analysis of microbial metagenomic data. The incremental clustering method is shown to be much faster than the original approach in identifying genes, grouping sequences into existing protein families, and also identifying novel families that have multiple members in a metagenomic dataset. These clusters provide a basis for further studies of protein families.
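The incremental idea, assigning each new sequence to an existing cluster when it is similar enough to that cluster's representative instead of recomputing all-against-all, can be sketched with a toy k-mer Jaccard similarity. The threshold and sequences are invented, and the real method uses remote-homology-aware comparisons rather than exact k-mer overlap.

```python
def kmers(seq, k=3):
    # Set of overlapping k-mers of a sequence
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def incremental_cluster(seqs, threshold=0.5):
    # Each cluster keeps its first member's k-mers as representative; a new
    # sequence joins the best-matching cluster above threshold, else it
    # seeds a new cluster. Cost is O(n * clusters), not O(n^2) comparisons.
    clusters = []  # list of (representative_kmers, members)
    for s in seqs:
        ks = kmers(s)
        best, best_sim = None, threshold
        for rep, members in clusters:
            sim = len(ks & rep) / len(ks | rep)  # Jaccard similarity
            if sim >= best_sim:
                best, best_sim = (rep, members), sim
        if best is not None:
            best[1].append(s)
        else:
            clusters.append((ks, [s]))
    return clusters

seqs = ["ATGCATGCAT", "ATGCATGCAA", "GGGTTTGGGT", "GGGTTTGGGA"]
clusters = incremental_cluster(seqs)
# Two families emerge; adding further sequences never re-clusters old ones.
```

Because existing clusters are frozen, updating the catalog with a new batch of metagenomic sequences only costs comparisons against cluster representatives, which is the source of the speedup the abstract reports.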
Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data
Background: Classification and variable selection play an important role in knowledge discovery in high-dimensional data. Although Support Vector Machine (SVM) algorithms are among the most powerful classification and prediction methods with a wide range of scientific applications, the SVM does not include automatic feature selection, and therefore a number of feature selection procedures have been developed. Regularisation approaches extend the SVM to a feature selection method in a flexible way using penalty functions like LASSO, SCAD and Elastic Net.
We propose a novel penalty function for SVM classification tasks, Elastic SCAD, a combination of SCAD and ridge penalties which overcomes the limitations of each penalty alone.
Since SVM models are extremely sensitive to the choice of tuning parameters, we adopted an interval search algorithm, which, in comparison to a fixed grid search, finds a globally optimal solution more rapidly and more precisely.
Results: Feature selection methods with combined penalties (Elastic Net and Elastic SCAD SVMs) are more robust to a change of the model complexity than methods using single penalties. Our simulation study showed that Elastic SCAD SVM outperformed LASSO (L1) and SCAD SVMs. Moreover, Elastic SCAD SVM provided sparser classifiers, in terms of the median number of features selected, than Elastic Net SVM, and often predicted better than Elastic Net SVM in terms of misclassification error.
Finally, we applied the penalization methods described above to four publicly available breast cancer data sets. Elastic SCAD SVM was the only method providing robust classifiers in both sparse and non-sparse situations.
Conclusions: The proposed Elastic SCAD SVM algorithm provides the advantages of the SCAD penalty and at the same time avoids sparsity limitations for non-sparse data. We were the first to demonstrate that the integration of the interval search algorithm and penalized SVM classification techniques provides fast solutions for the optimization of tuning parameters. The penalized SVM classification algorithms, as well as the fixed grid and interval search for finding appropriate tuning parameters, were implemented in our freely available R package 'penalizedSVM'. We conclude that the Elastic SCAD SVM is a flexible and robust tool for classification and feature selection tasks in high-dimensional data such as microarray data sets.
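The SCAD component of the penalty has a standard closed form (Fan and Li's piecewise definition), which a short sketch can make concrete. The ridge weight mu in the Elastic SCAD combination below is an illustrative placeholder, not the paper's tuned value, and the paper's actual implementation is the R package 'penalizedSVM'.

```python
import numpy as np

def scad(beta, lam=1.0, a=3.7):
    # SCAD penalty: linear (LASSO-like) near zero, quadratic blend in the
    # middle, constant beyond a*lam so large coefficients are not shrunk.
    # a = 3.7 is the conventional default.
    b = np.abs(beta)
    small = lam * b
    mid = -(b ** 2 - 2 * a * lam * b + lam ** 2) / (2 * (a - 1))
    large = (a + 1) * lam ** 2 / 2
    return np.where(b <= lam, small, np.where(b <= a * lam, mid, large))

def elastic_scad(beta, lam=1.0, a=3.7, mu=0.5):
    # Elastic SCAD = SCAD plus a ridge term; mu = 0.5 is illustrative only.
    return scad(beta, lam, a) + mu * beta ** 2

# Penalty at the origin, at the first knot (b = lam), and in the flat tail
vals = scad(np.array([0.0, 1.0, 5.0]))
```

The flat tail is what keeps SCAD from over-shrinking large coefficients, and the added ridge term is what restores stability when the true model is not sparse, the two properties the abstract says Elastic SCAD combines.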