Fractal geometry of spin-glass models
Stability and diversity are two key properties that living entities share
with spin glasses, where they are manifested through the breaking of the phase
space into many valleys or local minima connected by saddle points. The
topology of the phase space can be conveniently condensed into a tree
structure, akin to the biological phylogenetic trees, whose tips are the local
minima and internal nodes are the lowest-energy saddles connecting those
minima. For the infinite-range Ising spin glass with p-spin interactions, we
show that the average size-frequency distribution of saddles obeys a power law
$\langle \psi(w) \rangle \sim w^{-D}$, where w=w(s) is the number of minima that can be
connected through saddle s, and D is the fractal dimension of the phase space.
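As a rough illustration of how an exponent such as D could be read off a size-frequency table, the sketch below fits a straight line to log(frequency) against log(w). It assumes the power law has the form frequency ∝ w^(-D); the function name `fit_power_law_exponent` and the synthetic numbers are hypothetical and are not taken from the paper.

```python
import math


def fit_power_law_exponent(sizes, frequencies):
    """Estimate D under the assumed form frequency ~ C * size**(-D).

    Uses an ordinary least-squares fit of log(frequency) on log(size)
    and returns the negated slope.
    """
    xs = [math.log(w) for w in sizes]
    ys = [math.log(f) for f in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -(sxy / sxx)


# Synthetic illustration only: frequencies generated with D = 1.5.
sizes = [2, 4, 8, 16, 32]
frequencies = [100.0 * w ** -1.5 for w in sizes]
estimated_d = fit_power_law_exponent(sizes, frequencies)
```

A log-log least-squares fit is the simplest choice here; it is not necessarily the estimator used by the authors.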
Simultaneous clustering of gene expression data with clinical chemistry and pathological evaluations reveals phenotypic prototypes
BACKGROUND: Commonly employed clustering methods for analysis of gene expression data do not directly incorporate phenotypic data about the samples. Furthermore, clustering of samples with known phenotypes is typically performed in an informal fashion. The inability of clustering algorithms to incorporate biological data in the grouping process can limit proper interpretation of the data and its underlying biology. RESULTS: We present a more formal approach, the modk-prototypes algorithm, for clustering biological samples based on simultaneously considering microarray gene expression data and classes of known phenotypic variables such as clinical chemistry evaluations and histopathologic observations. The strategy involves constructing an objective function with the sum of the squared Euclidean distances for numeric microarray and clinical chemistry data and simple matching for histopathology categorical values in order to measure dissimilarity of the samples. Separate weighting terms are used for microarray, clinical chemistry and histopathology measurements to control the influence of each data domain on the clustering of the samples. The dynamic validity index for numeric data was modified with a category utility measure for determining the number of clusters in the data sets. A cluster's prototype, formed from the mean of the values for numeric features and the mode of the categorical values of all the samples in the group, is representative of the phenotype of the cluster members. The approach is shown to work well with a simulated mixed data set and two real data examples containing numeric and categorical data types: one from a heart disease study and another from acetaminophen (an analgesic) exposure in rat liver that causes centrilobular necrosis.
CONCLUSION: The modk-prototypes algorithm partitioned the simulated data into clusters with samples in their respective class group and the heart disease samples into two groups (sick and buff, denoting samples having pain type representative of angina and non-angina, respectively) with an accuracy of 79%. This is on par with, or better than, the assignment accuracy of the heart disease samples by several well-known and successful clustering algorithms. Following modk-prototypes clustering of the acetaminophen-exposed samples, informative genes from the cluster prototypes were identified that are descriptive of, and phenotypically anchored to, levels of necrosis of the centrilobular region of the rat liver. The biological processes cell growth and/or maintenance, amine metabolism, and stress response were shown to discern between no and moderate levels of acetaminophen-induced centrilobular necrosis. The use of well-known and traditional measurements directly in the clustering provides some guarantee that the resulting clusters will be meaningfully interpretable.
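The mixed dissimilarity and the prototype described in this abstract can be sketched roughly as follows. This is a minimal illustration assuming each sample is a dictionary with two numeric domains and one categorical domain; the names `mixed_dissimilarity`, `cluster_prototype`, the dictionary keys, and the example weights are all hypothetical and do not come from the modk-prototypes implementation.

```python
from collections import Counter


def mixed_dissimilarity(sample, prototype, weights):
    """Weighted dissimilarity between a sample and a cluster prototype.

    Numeric domains use the sum of squared differences; the categorical
    domain uses simple matching (count of mismatching features).
    """
    def squared_euclidean(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def simple_matching(a, b):
        return sum(1 for x, y in zip(a, b) if x != y)

    return (
        weights["expression"]
        * squared_euclidean(sample["expression"], prototype["expression"])
        + weights["chemistry"]
        * squared_euclidean(sample["chemistry"], prototype["chemistry"])
        + weights["histopathology"]
        * simple_matching(sample["histopathology"], prototype["histopathology"])
    )


def cluster_prototype(samples):
    """Prototype of a cluster: mean of numeric features, mode of categorical ones."""
    def column_means(rows):
        return [sum(col) / len(col) for col in zip(*rows)]

    def column_modes(rows):
        # Ties are resolved in favour of the value seen first.
        return [Counter(col).most_common(1)[0][0] for col in zip(*rows)]

    return {
        "expression": column_means([s["expression"] for s in samples]),
        "chemistry": column_means([s["chemistry"] for s in samples]),
        "histopathology": column_modes([s["histopathology"] for s in samples]),
    }


# Made-up values, for illustration only.
sample_a = {"expression": [1.0, 2.0], "chemistry": [0.5],
            "histopathology": ["necrosis", "mild"]}
sample_b = {"expression": [0.0, 2.0], "chemistry": [1.5],
            "histopathology": ["necrosis", "none"]}
weights = {"expression": 1.0, "chemistry": 2.0, "histopathology": 0.5}

distance = mixed_dissimilarity(sample_a, sample_b, weights)
prototype = cluster_prototype([sample_a, sample_b, sample_a])
```

Keeping one weight per data domain mirrors the separate weighting terms mentioned in the abstract; how those weights are chosen is not specified here.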
A stitch in time: Efficient computation of genomic DNA melting bubbles
Background: It is of biological interest to make genome-wide predictions of
the locations of DNA melting bubbles using statistical mechanics models.
Computationally, this poses the challenge that a generic search through all
combinations of bubble starts and ends is quadratic.
Results: An efficient algorithm is described, which shows that the time
complexity of the task is O(N log N) rather than quadratic. The algorithm
exploits the fact that bubble lengths may be limited, but without a prior assumption of
a maximal bubble length. No approximations, such as windowing, have been
introduced to reduce the time complexity. More than just finding the bubbles,
the algorithm produces a stitch profile, which is a probabilistic graphical
model of bubbles and helical regions. The algorithm applies a probability peak
finding method based on a hierarchical analysis of the energy barriers in the
Poland-Scheraga model.
Conclusions: Exact and fast computation of genomic stitch profiles is thus
feasible. Sequences of several megabases have been computed, only limited by
computer memory. Possible applications are the genome-wide comparisons of
bubbles with promoters, TSS, viral integration sites, and other melting-related
regions.
Community assessment to advance computational prediction of cancer drug combinations in a pharmacogenomic screen
The effectiveness of most cancer targeted therapies is short-lived. Tumors often develop resistance that might be overcome with drug combinations. However, the number of possible combinations is vast, necessitating data-driven approaches to find optimal patient-specific treatments. Here we report AstraZeneca's large drug combination dataset, consisting of 11,576 experiments from 910 combinations across 85 molecularly characterized cancer cell lines, and results of a DREAM Challenge to evaluate computational strategies for predicting synergistic drug pairs and biomarkers. A total of 160 teams participated, providing comprehensive methodological development and benchmarking. Winning methods incorporate prior knowledge of drug-target interactions. Synergy is predicted with an accuracy matching biological replicates for >60% of combinations. However, 20% of drug combinations are poorly predicted by all methods. Genomic rationales for synergy predictions are identified, including ADAM17 inhibitor antagonism when combined with PIK3CB/D inhibition, in contrast to synergy when combined with other PI3K-pathway inhibitors in PIK3CA mutant cells.
Long term productivity and collaboration in information science
This is an accepted manuscript of an article published by Springer in Scientometrics on 02/07/2016, available online: https://doi.org/10.1007/s11192-016-2061-8
Funding bodies have tended to encourage collaborative research because it is generally more highly cited than sole-author research. But higher mean citation for collaborative articles does not imply that collaborative researchers are in general more research productive. This article assesses the extent to which research productivity varies with the number of collaborative partners for long-term researchers within three Web of Science subject areas: Information Science & Library Science, Communication and Medical Informatics. When using the whole number counting system, researchers who worked in groups of 2 or 3 were generally the most productive, in terms of producing the most papers and citations. However, when using fractional counting, researchers who worked in groups of 1 or 2 were generally the most productive. The findings need to be interpreted cautiously, however, because authors who produce few academic articles within a field may publish in other fields or leave academia and contribute to society in other ways.
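The difference between the two counting systems mentioned in this abstract can be shown with a small sketch. It assumes the usual bibliometric definitions (whole counting credits every co-author with one full paper; fractional counting credits each of n co-authors with 1/n); the function name `count_papers` and the author labels are hypothetical.

```python
from collections import defaultdict


def count_papers(papers):
    """Return (whole, fractional) paper counts per author.

    `papers` is a list of author lists, one list per paper.
    """
    whole = defaultdict(int)
    fractional = defaultdict(float)
    for authors in papers:
        share = 1 / len(authors)
        for author in authors:
            whole[author] += 1            # full credit for every co-author
            fractional[author] += share   # credit split equally among co-authors
    return dict(whole), dict(fractional)


# Made-up publication list: one solo paper, one pair, one group of four.
papers = [["A"], ["A", "B"], ["A", "B", "C", "D"]]
whole, fractional = count_papers(papers)
```

In this toy example author A has 3 papers under whole counting but 1.75 under fractional counting, which is why the two systems can rank group sizes differently.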
Why Are Older People More Likely to Vote? The Impact of Ageing on Electoral Turnout in Europe
This article analyses the reasons for higher voting participation among older people in Europe. Over their lifetimes, citizens tend to habituate voting and comply with a growing subjective norm of voting. Furthermore, the average voting participation of older people is influenced by their longer duration of residence, the lack of a mobilising partner, worse physical health and less education, although life experience replaces the function of formal education over a lifetime. Most of these factors are founded on the very nature of human behaviour and the social context of our life course. Thus, they arguably stand outside of the political process and will remain stable into the future.
Prediction of overall survival for patients with metastatic castration-resistant prostate cancer: development of a prognostic model through a crowdsourced challenge with open clinical trial data
Background: Improvements to prognostic models in metastatic castration-resistant prostate cancer have the potential to augment clinical trial design and guide treatment strategies. In partnership with Project Data Sphere, a not-for-profit initiative allowing data from cancer clinical trials to be shared broadly with researchers, we designed an open-data, crowdsourced, DREAM (Dialogue for Reverse Engineering Assessments and Methods) challenge to not only identify a better prognostic model for prediction of survival in patients with metastatic castration-resistant prostate cancer but also engage a community of international data scientists to study this disease. Methods: Data from the comparator arms of four phase 3 clinical trials in first-line metastatic castration-resistant prostate cancer were obtained from Project Data Sphere, comprising 476 patients treated with docetaxel and prednisone from the ASCENT2 trial, 526 patients treated with docetaxel, prednisone, and placebo in the MAINSAIL trial, 598 patients treated with docetaxel, prednisone or prednisolone, and placebo in the VENICE trial, and 470 patients treated with docetaxel and placebo in the ENTHUSE 33 trial. Datasets consisting of more than 150 clinical variables were curated centrally, including demographics, laboratory values, medical history, lesion sites, and previous treatments. Data from ASCENT2, MAINSAIL, and VENICE were released publicly to be used as training data to predict the outcome of interest, namely overall survival. Clinical data were also released for ENTHUSE 33, but data for outcome variables (overall survival and event status) were hidden from the challenge participants so that ENTHUSE 33 could be used for independent validation. Methods were evaluated using the integrated time-dependent area under the curve (iAUC). The reference model, based on eight clinical variables and a penalised Cox proportional-hazards model, was used to compare method performance.
Further validation was done using data from a fifth trial, ENTHUSE M1, in which 266 patients with metastatic castration-resistant prostate cancer were treated with placebo alone. Findings: 50 independent methods were developed to predict overall survival and were evaluated through the DREAM challenge. The top performer was based on an ensemble of penalised Cox regression models (ePCR), which uniquely identified predictive interaction effects with immune biomarkers and markers of hepatic and renal function. Overall, ePCR outperformed all other methods (iAUC 0.791; Bayes factor >5) and surpassed the reference model (iAUC 0.743; Bayes factor >20). Both the ePCR model and reference models stratified patients in the ENTHUSE 33 trial into high-risk and low-risk groups with significantly different overall survival (ePCR: hazard ratio 3.32, 95% CI 2.39-4.62, p […]). Interpretation: Novel prognostic factors were delineated, and the assessment of 50 methods developed by independent international teams establishes a benchmark for development of methods in the future. The results of this effort show that data-sharing, when combined with a crowdsourced challenge, is a robust and powerful framework to develop new prognostic models in advanced prostate cancer.
Generalizing across stimuli as well as subjects: A non-mathematical tutorial on mixed-effects models
Comparison of transcriptional responses in liver tissue and primary hepatocyte cell cultures after exposure to hexahydro-1,3,5-trinitro-1,3,5-triazine
BACKGROUND: Cell culture systems are useful in studying toxicological effects of chemicals such as hexahydro-1,3,5-trinitro-1,3,5-triazine (RDX); however, little is known about how accurately isolated cells reflect responses of intact organs. In this work, we compare transcriptional responses in livers of Sprague-Dawley rats and primary hepatocyte cells after exposure to RDX to determine how faithfully the in vitro model system reflects in vivo responses. RESULTS: Expression patterns were found to be markedly different between liver tissue and primary cell cultures before exposure to RDX. Liver gene expression was enriched in processes important in toxicology such as metabolism of amino acids, lipids, aromatic compounds, and drugs when compared to cells. Transcriptional responses in cells exposed to 7.5, 15, or 30 mg/L RDX for 24 and 48 hours were different from those of livers isolated from rats 24 hours after exposure to 12, 24, or 48 mg/kg RDX. Most of the differentially expressed genes identified across conditions and treatments could be attributed to differences between cells and tissue. Some similarity was observed in RDX effects on gene expression between tissue and cells, but there were also significant differences that appear to reflect the state of the cell or tissue examined. CONCLUSION: Liver tissue and primary cells express different suites of genes, which suggests they have fundamental differences in their cell physiology. Expression effects related to RDX exposure in cells reflected a fraction of liver responses, indicating that care must be taken in extrapolating from primary cells to whole animal organ toxicity effects.
Sources of variation in baseline gene expression levels from toxicogenomics study control animals across multiple laboratories
Background: The use of gene expression profiling in both clinical and laboratory settings would be enhanced by better characterization of variance due to individual, environmental, and technical factors. Meta-analysis of microarray data from untreated or vehicle-treated animals within the control arm of toxicogenomics studies could yield useful information on baseline fluctuations in gene expression, although control animal data has not been available on a scale and in a form best suited for data mining. Results: A dataset of control animal microarray expression data was assembled by a working group of the Health and Environmental Sciences Institute's Technical Committee on the Application of Genomics in Mechanism Based Risk Assessment in order to provide a public resource for assessments of variability in baseline gene expression. Data from over 500 Affymetrix microarrays from control rat liver and kidney were collected from 16 different institutions. Thirty-five biological and technical factors were obtained for each animal, describing a wide range of study characteristics, and a subset were evaluated in detail for their contribution to total variability using multivariate statistical and graphical techniques. Conclusion: The study factors that emerged as key sources of variability included gender, organ section, strain, and fasting state. These and other study factors were identified as key descriptors that should be included in the minimal information about a toxicogenomics study needed for interpretation of results by an independent source. Genes that are the most and least variable, gender-selective, or altered by fasting were also identified and functionally categorized. Better characterization of gene expression variability in control animals will aid in the design of toxicogenomics studies and in the interpretation of their results.
