Exploring the Evolution of Node Neighborhoods in Dynamic Networks
Dynamic Networks are a popular way of modeling and studying the behavior of
evolving systems. However, their analysis constitutes a relatively recent
subfield of Network Science, and the number of available tools is consequently
much smaller than for static networks. In this work, we propose a method
specifically designed to take advantage of the longitudinal nature of dynamic
networks. It characterizes each individual node by studying the evolution of
its direct neighborhood, based on the assumption that the way this neighborhood
changes reflects the role and position of the node in the whole network. For
this purpose, we define the concept of "neighborhood event", which
corresponds to the various transformations such groups of nodes can undergo,
and describe an algorithm for detecting such events. We demonstrate the
usefulness of our method on three real-world networks: DBLP, LastFM and Enron. We
apply frequent pattern mining to extract meaningful information from temporal
sequences of neighborhood events. This results in the identification of
behavioral trends emerging in the whole network, as well as the individual
characterization of specific nodes. We also perform a cluster analysis, which
reveals that, in all three networks, one can distinguish two types of nodes
exhibiting different behaviors: a very small group of active nodes, whose
neighborhoods undergo diverse and frequent events, and a very large group of
stable nodes.
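The paper's detection algorithm works on groups of nodes inside each neighborhood, and this abstract does not list its event types. As a much simpler, hypothetical illustration of the underlying idea, the sketch below compares one node's neighbor set across consecutive network snapshots and labels each transition. The function names and the four labels (stable, growth, shrink, turnover) are assumptions, not the authors' definitions.

```python
from typing import Dict, List, Set


def neighborhood_event(before: Set[str], after: Set[str]) -> str:
    """Label the change in one node's neighbor set between two snapshots.

    Hypothetical labels for illustration only; the paper defines its own events.
    """
    added = after - before
    removed = before - after
    if not added and not removed:
        return "stable"
    if added and not removed:
        return "growth"
    if removed and not added:
        return "shrink"
    return "turnover"  # neighbors were both gained and lost


def event_sequence(snapshots: List[Dict[str, Set[str]]], node: str) -> List[str]:
    """Label every transition between consecutive snapshots for a single node."""
    events = []
    for prev, curr in zip(snapshots, snapshots[1:]):
        # A node absent from a snapshot is treated as having no neighbors.
        events.append(neighborhood_event(prev.get(node, set()), curr.get(node, set())))
    return events
```

Each node then yields a temporal sequence of event labels, which is the kind of input a frequent pattern mining step could consume.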
Why Does Technology Advance in Cycles?
Long-run technological progress is cyclical because drastic innovations that introduce new technological opportunity are only profitable at times when repeated incremental innovation has nearly exhausted existing technological opportunity and driven entrepreneurial profit and income growth towards zero. The article presents a 'technological opportunity model' in which endogenous drastic and incremental innovations interact with exogenous discoveries in an idealized metric technology space. New ideas are created by convex combinations of existing ideas. Diminishing technological opportunity results in lower profits and growth, which in turn makes costly and risky drastic innovations profitable again. This relationship between high drastic innovation intensity and poor economic growth receives some empirical support.
Keywords: technology; growth; long waves; cycles; technological paradigms; innovations
Predicting B Cell Receptor Substitution Profiles Using Public Repertoire Data
B cells develop high affinity receptors during the course of affinity
maturation, a cyclic process of mutation and selection. At the end of affinity
maturation, a number of cells sharing the same ancestor (i.e., in the same
"clonal family") are released from the germinal center; their amino acid
frequency profile reflects the allowed and disallowed substitutions at each
position. These clonal-family-specific frequency profiles, called "substitution
profiles", are useful for studying the course of affinity maturation as well as
for antibody engineering purposes. However, most often only a single sequence
is recovered from each clonal family in a sequencing experiment, making it
impossible to construct a clonal-family-specific substitution profile. Given
the public release of many high-quality large B cell receptor datasets, one may
ask whether it is possible to use such data in a prediction model for
clonal-family-specific substitution profiles. In this paper, we present the
method "Substitution Profiles Using Related Families" (SPURF), a penalized
tensor regression framework that integrates information from a rich assemblage
of datasets to predict the clonal-family-specific substitution profile for any
single input sequence. Using this framework, we show that substitution profiles
from similar clonal families can be leveraged together with simulated
substitution profiles and germline gene sequence information to improve
prediction. We fit this model on a large public dataset and validate the
robustness of our approach on an external dataset. Furthermore, we provide a
command-line tool in an open-source software package
(https://github.com/krdav/SPURF) implementing these ideas and providing easy
prediction using our pre-fit models.
Comment: 23 pages
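A substitution profile, as described above, is the per-position amino acid frequency profile of a clonal family. The sketch below computes that empirical profile when several aligned sequences of one family are available. It assumes equal-length, pre-aligned sequences and ignores non-standard characters such as gaps; the function name is hypothetical. It does not implement SPURF itself, which instead predicts such a profile from a single input sequence using penalized tensor regression.

```python
from collections import Counter
from typing import Dict, List

# The 20 standard amino acids; any other character (e.g. a gap "-") is ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def substitution_profile(aligned: List[str]) -> List[Dict[str, float]]:
    """Per-position amino acid frequencies for one clonal family's aligned sequences."""
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in aligned if seq[pos] in AMINO_ACIDS)
        total = sum(counts.values())
        # Frequencies at each position sum to 1 unless the column holds only gaps.
        profile.append({aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS})
    return profile
```

Amino acids with frequency zero at a position are the substitutions never observed there, which is the "disallowed" signal the abstract refers to.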
Unsupervised Cluster Analysis Reveals Distinct Subtypes of ME/CFS Patients Based on Peak Oxygen Consumption and SF-36 Scores
Keywords: Biomarker; Cardiopulmonary exercise test; Chronic fatigue syndrome
Purpose
Myalgic encephalomyelitis, commonly referred to as chronic fatigue syndrome (ME/CFS), is a severe, disabling chronic disease, and an objective assessment of prognosis is crucial to evaluate the efficacy of future drugs. Attempts are ongoing to find a biomarker to objectively assess the health status of ME/CFS patients. This study therefore aims to demonstrate that oxygen consumption is a biomarker of ME/CFS and to provide a method to classify patients diagnosed with ME/CFS based on their responses to the Short Form-36 (SF-36) questionnaire, which can predict oxygen consumption as measured by cardiopulmonary exercise testing (CPET).
Methods
Two datasets were used in the study. The first contained SF-36 responses from 2,347 validated records of participants diagnosed with ME/CFS, and an unsupervised machine learning model was developed to cluster these data. The second dataset was used as a validation set and included the cardiopulmonary exercise test (CPET) results of 239 participants diagnosed with ME/CFS. Participants in this dataset were grouped by peak oxygen consumption according to Weber's classification. Only 92 patients correctly completed the SF-36 questionnaire; these were clustered using the machine learning model. Two categorical variables were then entered into a contingency table: the assigned cluster, with values {0, 1}, and the Weber class, with values {A, B, C, D}. Finally, the chi-square test of independence was used to assess the statistical significance of the relationship between the two variables.
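The abstract does not name the unsupervised algorithm that produces the two clusters. Purely as an assumption, the sketch below uses two-cluster k-means (Lloyd's algorithm) on raw item-response vectors, one row per patient (with SF-36 data, each row would hold 36 values). The function name and the deterministic initialization are hypothetical choices, not the study's model.

```python
import math
from typing import List, Sequence


def kmeans_two(rows: Sequence[Sequence[float]], iters: int = 100) -> List[int]:
    """Two-cluster k-means (Lloyd's algorithm); returns one label in {0, 1} per row."""
    # Deterministic initialization: the first row and the row farthest from it.
    first = rows[0]
    farthest = max(rows, key=lambda r: math.dist(r, first))
    centers = [list(first), list(farthest)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each row goes to its nearest center (ties go to cluster 0).
        new_labels = [0 if math.dist(r, centers[0]) <= math.dist(r, centers[1]) else 1 for r in rows]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each center to the mean of its current members.
        for k in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

In practice, items on different scales are usually standardized first, and k-means is typically restarted from several initializations; both are omitted here for brevity.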
Findings
The results indicate that the Weber classification is directly linked to the score on the SF-36 questionnaire. Furthermore, the 36-response matrix in the machine learning model was shown to give more reliable results than the subscale matrix (p < 0.05) for classifying patients with ME/CFS.
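The chi-square test of independence used in this study can be sketched in plain Python for a cluster-by-Weber-class table (2 rows by 4 columns, so 3 degrees of freedom). This is a generic Pearson test, not the authors' code: it applies no continuity correction and assumes every row and column total is nonzero. The p-value uses the exact recurrence for the chi-square upper tail with integer degrees of freedom. Both function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def chi2_sf(x: float, dof: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with integer dof >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        q, k = math.exp(-half), 2  # closed form for dof = 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1  # closed form for dof = 1
    while k < dof:
        # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * e^(-x/2) / Gamma(k/2 + 1)
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q


def chi2_independence(table: Sequence[Sequence[int]]) -> Tuple[float, int, float]:
    """Pearson chi-square test of independence; returns (statistic, dof, p-value)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / n.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof, chi2_sf(stat, dof)
```

For a 2 x 4 table, a p-value below 0.05 corresponds to a statistic above roughly 7.81, the 95th percentile of the chi-square distribution with 3 degrees of freedom.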
Implications
Low oxygen consumption on CPET can be considered a biomarker in patients with ME/CFS. Our analysis showed a close relationship between the clusters based on patients' SF-36 questionnaire scores and the Weber classification, which was based on peak oxygen consumption during CPET. The training dataset comprised raw responses to the SF-36 questionnaire, which has been shown to better preserve the original information, thus improving the quality of the model.
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability.
Optimizing Data Stream Representation: An Extensive Survey on Stream Clustering Algorithms
Analyzing data streams has received considerable attention over the past decades due to the widespread usage of sensors, social media and other streaming data sources. A core research area in this field is stream clustering, which aims to recognize patterns in an unordered, infinite and evolving stream of observations. Clustering can be a crucial support in decision making, since it aims for an optimized aggregated representation of a continuous data stream over time and allows patterns to be identified in large and high-dimensional data. A multitude of algorithms and approaches has been developed that are able to find and maintain clusters over time in the challenging streaming scenario. This survey explores, summarizes and categorizes a total of 51 stream clustering algorithms and identifies core research threads over the past decades. In particular, it identifies categories of algorithms based on distance thresholds, density grids and statistical models, as well as algorithms for high-dimensional data. Furthermore, it discusses application scenarios, available software and how to configure stream clustering algorithms. This survey is considerably more extensive and more up-to-date than comparable studies, and highlights how concepts are interrelated and have been developed over time.
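To make the "distance threshold" category concrete, here is a minimal, hypothetical sketch of a leader-style single-pass clusterer, assuming Euclidean distance and a fixed radius: each arriving point joins the nearest existing cluster if it lies within the radius, and otherwise opens a new cluster. The class name and `radius` parameter are assumptions; this is not one of the 51 surveyed algorithms, and it omits mechanisms real stream clusterers add, such as fading of old clusters and offline reclustering.

```python
import math
from typing import List, Sequence


class ThresholdStreamClusterer:
    """Single-pass, distance-threshold clustering of a data stream (leader-style)."""

    def __init__(self, radius: float) -> None:
        self.radius = radius
        self.centers: List[List[float]] = []
        self.counts: List[int] = []

    def insert(self, point: Sequence[float]) -> int:
        """Assign one arriving point and return the index of its cluster."""
        if self.centers:
            dists = [math.dist(point, c) for c in self.centers]
            best = min(range(len(dists)), key=dists.__getitem__)
            if dists[best] <= self.radius:
                # Update the cluster's running mean without storing past points.
                self.counts[best] += 1
                n = self.counts[best]
                self.centers[best] = [c + (x - c) / n for c, x in zip(self.centers[best], point)]
                return best
        # No cluster is close enough: the point becomes the center of a new cluster.
        self.centers.append(list(point))
        self.counts.append(1)
        return len(self.centers) - 1
```

Memory grows with the number of clusters rather than the number of points, which is the property that makes this family suitable for unbounded streams.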
Debt literacy, financial experiences, and overindebtedness
We analyze a national sample of Americans with respect to their debt literacy, financial experiences, and their judgments about the extent of their indebtedness. Debt literacy is measured by questions testing knowledge of fundamental concepts related to debt and by self-assessed financial knowledge. Financial experiences are the participants' reported experiences with traditional borrowing, alternative borrowing, and investing activities. Overindebtedness is a self-reported measure. Overall, we find that debt literacy is low: only about one-third of the population seems to comprehend interest compounding or the workings of credit cards. Even after controlling for demographics, we find a strong relationship between debt literacy and both financial experiences and debt loads. Specifically, individuals with lower levels of debt literacy tend to transact in high-cost manners, incurring higher fees and using high-cost borrowing. In applying our results to credit cards, we estimate that as much as one-third of the charges and fees paid by less knowledgeable individuals can be attributed to ignorance. The less knowledgeable also report that their debt loads are excessive or that they are unable to judge their debt position. JEL Classification: D14, D9
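Interest compounding is one of the concepts the debt-literacy questions probe. As an illustration with assumed figures (a $1,000 debt at 20% per year, compounded annually, with no repayments; these numbers do not come from the abstract), the hypothetical helper below counts the whole years until the balance at least doubles.

```python
def years_to_double(principal: float, annual_rate: float) -> int:
    """Whole years until a debt at least doubles under annual compounding, with no repayments."""
    if annual_rate <= 0:
        raise ValueError("annual_rate must be positive")
    balance = principal
    years = 0
    while balance < 2 * principal:
        balance *= 1 + annual_rate  # interest is added to the balance once per year
        years += 1
    return years
```

Under those assumptions, `years_to_double(1000, 0.20)` returns 4, because 1.2^3 ≈ 1.73 is still short of doubling while 1.2^4 ≈ 2.07 exceeds it.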
- …