2,156 research outputs found
RCytoscape: Tools for Exploratory Network Analysis
Background: Biomolecular pathways and networks are dynamic and complex, and the perturbations to them which cause disease are often multiple, heterogeneous and contingent. Pathway and network visualizations, rendered on a computer or published on paper, however, tend to be static, lacking in detail, and ill-equipped to explore the variety and quantities of data available today, and the complex causes we seek to understand.
Results: RCytoscape integrates R (an open-ended programming environment rich in statistical power and data-handling facilities) and Cytoscape (powerful network visualization and analysis software). RCytoscape extends Cytoscape's functionality beyond what is possible with the Cytoscape graphical user interface. To illustrate the power of RCytoscape, a portion of the Glioblastoma multiforme (GBM) data set from the Cancer Genome Atlas (TCGA) is examined. Network visualization reveals previously unreported patterns in the data suggesting heterogeneous signaling mechanisms active in GBM Proneural tumors, with possible clinical relevance.
Conclusions: Progress in bioinformatics and computational biology depends upon exploratory and confirmatory data analysis, upon inference, and upon modeling. These activities will eventually permit the prediction and control of complex biological systems. Network visualizations -- molecular maps -- created from an open-ended programming environment rich in statistical power and data-handling facilities, such as RCytoscape, will play an essential role in this progression
The Computational Diet: A Review of Computational Methods Across Diet, Microbiome, and Health.
Food and human health are inextricably linked. As such, revolutionary impacts on health have been derived from advances in the production and distribution of food relating to food safety and fortification with micronutrients. During the past two decades, it has become apparent that the human microbiome has the potential to modulate health, including in ways that may be related to diet and the composition of specific foods. Despite the excitement and potential surrounding this area, the complexity of the gut microbiome, the chemical composition of food, and their interplay in situ remains a daunting task to fully understand. However, recent advances in high-throughput sequencing, metabolomics profiling, compositional analysis of food, and the emergence of electronic health records provide new sources of data that can contribute to addressing this challenge. Computational science will play an essential role in this effort as it will provide the foundation to integrate these data layers and derive insights capable of revealing and understanding the complex interactions between diet, gut microbiome, and health. Here, we review the current knowledge on diet-health-gut microbiota, relevant data sources, bioinformatics tools, machine learning capabilities, as well as the intellectual property and legislative regulatory landscape. We provide guidance on employing machine learning and data analytics, identify gaps in current methods, and describe new scenarios to be unlocked in the next few years in the context of current knowledge
Data science for buildings, a multi-scale approach bridging occupants to smart-city energy planning
In the context of global carbon emission reduction goals, buildings have been identified as holding valuable energy-saving potential. With the exponential increase of smart, connected building automation systems, massive amounts of data are now accessible for analysis. These, coupled with powerful data science methods and machine learning algorithms, present a unique opportunity to identify untapped energy-saving potential from field information and effectively turn buildings into active assets of the built energy infrastructure. However, the diversity of building occupants and infrastructures, and the disparities in collected information, have produced disjointed scales of analytics that make it tedious for approaches to scale and generalize over the building stock. This, coupled with the lack of standards in the sector, has hindered the broader adoption of data science practices in the field and raised the following question: How can data science facilitate the scaling of approaches and bridge disconnected spatiotemporal scales of the built environment to deliver enhanced energy-saving strategies? This thesis addresses this question by investigating data-driven, scalable, interpretable, and multi-scale approaches across varying types of analytical classes. The work particularly explores descriptive, predictive, and prescriptive analytics to connect occupants, buildings, and urban energy planning together for improved energy performance. First, a novel multi-dimensional data-mining framework is developed, producing distinct dimensional outlines supporting systematic methodological approaches and refined knowledge discovery. Second, an automated building heat dynamics identification method is put forward, supporting large-scale thermal performance examination of buildings in a non-intrusive manner. The method produced 64% good-quality model fits, against 14% close and 22% poor ones, out of 225 Dutch residential buildings. […], which were open-sourced in the interest of developing benchmarks. Third, a pioneering hierarchical forecasting method was designed, bridging individual and aggregated building load predictions in a coherent, data-efficient fashion. The approach was evaluated over hierarchies of 37, 140, and 383 nodal elements and showed improved accuracy and coherency against disjointed prediction systems. Finally, building occupants and urban energy planning strategies are investigated through the lens of uncertainty. In a neighborhood of 41 Dutch residential buildings, occupants were found to significantly impact optimal energy community designs in the context of weather and economic uncertainties. Overall, the thesis demonstrated the added value of multi-scale approaches in all analytical classes while fostering best data-science practices in the sector through benchmarks and open-source implementations.
Analysis of Student Behaviour in Habitable Worlds Using Continuous Representation Visualization
We introduce a novel approach to visualizing temporal clickstream behaviour
in the context of a degree-satisfying online course, Habitable Worlds, offered
through Arizona State University. The current practice for visualizing
behaviour within a digital learning environment has been to generate plots
based on hand-engineered or coded features using domain knowledge. While this
approach has been effective in relating behaviour to known phenomena, features
crafted from domain knowledge are not likely well suited to make unfamiliar
phenomena salient and thus can preclude discovery. We introduce a methodology
for organically surfacing behavioural regularities from clickstream data,
conducting an expert-in-the-loop hyperparameter search, and identifying
anticipated as well as newly discovered patterns of behaviour. While these
visualization techniques have been used before in the broader machine learning
community to better understand neural networks and relationships between word
vectors, we apply them to online behavioural learner data and go a step
further, exploring the impact of the parameters of the model on producing
tangible, non-trivial observations of behaviour that are suggestive of
pedagogical improvement to the course designers and instructors. The
methodology introduced in this paper led to an improved understanding of
passing and non-passing student behaviour in the course and is widely
applicable to other datasets of clickstream activity where investigators and
stakeholders wish to organically surface principal patterns of behaviour
Trends in student behavior in online courses
Learning management systems provide an easy and effective means of access to learning materials. Students’ access to course material is logged and the amount of interaction is assumed to be a measure of student engagement within the course. In previous research, typically frequencies of student activities have been used, but this disregards any temporal information. Here, we analyze the amount of student activity over time during courses. Based on activity data over 11 online courses, we cluster students who show similar behavior over time. This results in three different groups: a large group of students who are mostly inactive; another group of students who are very active throughout the course; and a group of students who start out being active, but their activity diminishes throughout the course. These groups of students show different performance. Overall, more active students yield better results. In addition to these general trends, we identified courses in which alternative trends can be found, such as a group of students who become more active during the course. This shows that student behavior is more complex than can be identified from an individual course and more research into patterns of learning activities in multiple courses is essential
Measurement invariance in the social sciences: Historical development, methodological challenges, state of the art, and future perspectives
This review summarizes the current state of the art of statistical and (survey) methodological research on measurement (non)invariance, which is considered a core challenge for the comparative social sciences. After outlining the historical roots, conceptual details, and standard procedures for measurement invariance testing, the paper focuses in particular on the statistical developments that have been achieved in the last 10 years. These include Bayesian approximate measurement invariance, the alignment method, measurement invariance testing within the multilevel modeling framework, mixture multigroup factor analysis, the measurement invariance explorer, and the response shift-true change decomposition approach. Furthermore, the contribution of survey methodological research to the construction of invariant measurement instruments is explicitly addressed and highlighted, including the issues of design decisions, pretesting, scale adoption, and translation. The paper ends with an outlook on future research perspectives.
Community Assessment of the Predictability of Cancer Protein and Phosphoprotein Levels from Genomics and Transcriptomics.
Cancer is driven by genomic alterations, but the processes causing this disease are largely performed by proteins. However, proteins are harder and more expensive to measure than genes and transcripts. To catalyze developments of methods to infer protein levels from other omics measurements, we leveraged crowdsourcing via the NCI-CPTAC DREAM proteogenomic challenge. We asked for methods to predict protein and phosphorylation levels from genomic and transcriptomic data in cancer patients. The best performance was achieved by an ensemble of models, including as predictors transcript level of the corresponding genes, interaction between genes, conservation across tumor types, and phosphosite proximity for phosphorylation prediction. Proteins from metabolic pathways and complexes were the best and worst predicted, respectively. The performance of even the best-performing model was modest, suggesting that many proteins are strongly regulated through translational control and degradation. Our results set a reference for the limitations of computational inference in proteogenomics. A record of this paper's transparent peer review process is included in the Supplemental Information