2,678 research outputs found

    Contributions to Collective Dynamical Clustering-Modeling of Discrete Time Series

    Get PDF
    The analysis of sequential data is important in business, science, and engineering, for tasks such as signal processing, user behavior mining, and commercial transaction analysis. In this dissertation, we build upon the Collective Dynamical Modeling and Clustering (CDMC) framework for discrete time series modeling by making contributions to clustering initialization, dynamical modeling, and scaling. We first propose a modified Dynamic Time Warping (DTW) approach for clustering initialization within CDMC. The proposed approach provides DTW metrics that penalize deviations of the warping path from the path of constant slope. This reduces over-warping, while retaining the efficiency advantages of global constraint approaches, and without relying on domain-dependent constraints. Second, we investigate the use of semi-Markov chains as dynamical models of temporal sequences in which state changes occur infrequently. Semi-Markov chains allow explicitly specifying the distribution of state visit durations. This makes them superior to traditional Markov chains, which implicitly assume an exponential state duration distribution. Third, we consider convergence properties of the CDMC framework. We establish convergence by viewing CDMC from an Expectation Maximization (EM) perspective. We investigate the effect on the time to convergence of our efficient DTW-based initialization technique and selected dynamical models. We also explore the convergence implications of various stopping criteria. Fourth, we consider scaling up CDMC to process big data, using Storm, an open source distributed real-time computation system that supports batch and distributed data processing. We performed experimental evaluation on human sleep data and on user web navigation data. Our results demonstrate the superiority of the strategies introduced in this dissertation over state-of-the-art techniques in terms of modeling quality and efficiency.
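    A minimal sketch of the slope-penalized DTW idea described above, assuming a simple additive penalty on deviation from the diagonal (the constant-slope path); the penalty form and the weight `lam` are illustrative assumptions, not the dissertation's exact formulation.

```python
import numpy as np

def penalized_dtw(x, y, lam=0.1):
    """DTW distance with an extra cost for warping-path cells that
    deviate from the constant-slope (diagonal) path, discouraging
    over-warping without a hard global constraint band."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # deviation of cell (i, j) from the diagonal i/n == j/m
            dev = abs(i / n - j / m)
            cost = abs(x[i - 1] - y[j - 1]) + lam * dev
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

    With `lam = 0` this reduces to ordinary DTW; larger `lam` pulls the optimal path toward the diagonal.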

    Analyzing the Correlations between the Uninsured and Diabetes Prevalence Rates in Geographic Regions in the United States

    Get PDF
    The increasing prevalence of diagnosed diabetes has drawn the attention of researchers in recent years. Research has examined the correlations between diabetes prevalence and socioeconomic factors, obesity, social behaviors, and so on. Since 2010, diabetes preventive services have been covered under health insurance plans in order to reduce the diabetes burden and control the increase in diabetes prevalence. In this study, a hierarchical clustering model based on the Expectation-Maximization algorithm is proposed to investigate the correlations between the uninsured and diabetes prevalence rates in 3,142 counties in the United States from 2009 to 2013. The results identify geographic disparities in the uninsured and diabetes prevalence rates both within individual years and over consecutive years.
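    The study's hierarchical model is not reproduced here; the following is a generic one-dimensional Gaussian-mixture EM sketch of the E-step/M-step cycle that such Expectation-Maximization clustering builds on, with illustrative initialization choices.

```python
import numpy as np

def gmm_em_1d(x, k=2, iters=100):
    """Minimal 1-D Gaussian-mixture EM: returns component means,
    variances, and mixing weights after `iters` EM updates."""
    x = np.asarray(x, dtype=float)
    mu = np.quantile(x, np.linspace(0.0, 1.0, k))  # spread initial means
    var = np.full(k, np.var(x))
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point
        dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) \
               / np.sqrt(2.0 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the responsibilities
        nk = resp.sum(axis=0)
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-9
    return mu, var, w
```

    Applied to county-level rates, the fitted components would play the role of clusters of counties with similar uninsured or prevalence profiles.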

    Mobile application usage prediction through context-based learning

    Get PDF
    The purchase and download of new applications on all types of smartphones and tablet computers have become increasingly popular. On each mobile device, many applications are installed, often resulting in crowded icon-based interfaces. In this paper, we present a framework for the prediction of a user's future mobile application usage behavior. On the mobile device, the framework continuously monitors the user's previous use of applications together with several context parameters such as speed and location. Based on the retrieved information, the framework automatically deduces application usage patterns. These patterns define a correlation between a used application and the monitored context information or between different applications. Furthermore, by combining several context parameters, context profiles are automatically generated. These profiles typically match real-life situations such as 'at home' or 'on the train' and are used to delimit the number of possible patterns, increasing both the positive prediction rate and the scalability of the system. A concept demonstrator for Android OS was developed and the implemented algorithms were evaluated in a detailed simulation setup. It is shown that the developed algorithms perform very well, with a true positive rate of up to 90% for the considered evaluation scenarios.
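    A toy sketch of the pattern idea described above, assuming the simplest possible representation: counting how often each app follows a given (context profile, previous app) pair and predicting the most frequent successor. The class and method names are hypothetical, not the paper's API.

```python
from collections import Counter, defaultdict

class AppPredictor:
    """Predicts the next app from (context profile, previous app) patterns
    accumulated from monitored usage."""

    def __init__(self):
        # (context, prev_app) -> Counter of observed next apps
        self.counts = defaultdict(Counter)

    def observe(self, context, prev_app, next_app):
        """Record one observed app transition under a context profile."""
        self.counts[(context, prev_app)][next_app] += 1

    def predict(self, context, prev_app):
        """Return the most frequently observed next app, or None if the
        (context, prev_app) pair has never been seen."""
        c = self.counts.get((context, prev_app))
        return c.most_common(1)[0][0] if c else None
```

    Keying the patterns on a context profile ('at home', 'on the train') rather than on raw sensor values is what keeps the pattern space small, as the abstract notes.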

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Get PDF
    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbating the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.

    On the interpretation of differences between groups for compositional data

    Get PDF
    Social policies are designed using information collected in surveys, such as the Catalan Time Use Survey. Accurate comparisons of time-use data among population groups are commonly analysed using statistical methods. The total daily time spent on different activities by a single person is equal to 24 hours. Because this type of data is compositional, its sample space has particular properties that statistical methods should respect. The critical points required to interpret differences between groups are provided and described in terms of log-ratio methods. These techniques facilitate the interpretation of the relative differences detected in multivariate and univariate analysis.
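    A minimal sketch of the standard log-ratio machinery the abstract refers to: the centred log-ratio (CLR) transform maps a composition (parts summing to a constant, such as 24 hours of daily activities) into unconstrained real space, where ordinary group comparisons become meaningful. This is the generic Aitchison CLR, not necessarily the paper's exact procedure.

```python
import numpy as np

def clr(composition):
    """Centred log-ratio transform: log of each part divided by the
    geometric mean of all parts. The result always sums to zero."""
    x = np.asarray(composition, dtype=float)
    g = np.exp(np.mean(np.log(x)))  # geometric mean of the parts
    return np.log(x / g)
```

    Differences between groups are then interpreted relatively: a CLR coordinate above zero means that part is larger than the geometric-mean "baseline" of the day, not larger in absolute hours.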

    Genome-wide association of multiple complex traits in outbred mice by ultra-low-coverage sequencing

    Get PDF
    Two bottlenecks impeding the genetic analysis of complex traits in rodents are access to mapping populations able to deliver gene-level mapping resolution and the need for population-specific genotyping arrays and haplotype reference panels. Here we combine low-coverage (0.15×) sequencing with a new method to impute the ancestral haplotype space in 1,887 commercially available outbred mice. We mapped 156 unique quantitative trait loci for 92 phenotypes at a 5% false discovery rate. Gene-level mapping resolution was achieved at about one-fifth of the loci, implicating Unc13c and Pgc1a at loci for the quality of sleep, Adarb2 for home cage activity, Rtkn2 for intensity of reaction to startle, Bmp2 for wound healing, Il15 and Id2 for several T cell measures and Prkca for bone mineral content. These findings have implications for diverse areas of mammalian biology and demonstrate how genome-wide association studies can be extended via low-coverage sequencing to species with highly recombinant outbred populations.

    Disruption to control network function correlates with altered dynamic connectivity in the wider autism spectrum.

    Get PDF
    Autism is a common developmental condition with a wide, variable range of co-occurring neuropsychiatric symptoms. Contrasting with most extant studies, we explored whole-brain functional organization at multiple levels simultaneously in a large subject group reflecting autism's clinical diversity, and present the first network-based analysis of transient brain states, or dynamic connectivity, in autism. Disruption to inter-network and inter-system connectivity, rather than within individual networks, predominated. We identified coupling disruption in the anterior-posterior default mode axis, and among specific control networks specialized for task start cues and the maintenance of domain-independent task-positive status, specifically between the right fronto-parietal and cingulo-opercular networks and default mode network subsystems. These appear to propagate downstream in autism, with significantly dampened subject oscillations between brain states, and dynamic connectivity configuration differences. Our account proposes specific motifs that may provide candidates for neuroimaging biomarkers within heterogeneous clinical populations in this diverse condition.

    Thirty Years of Machine Learning: The Road to Pareto-Optimal Wireless Networks

    Full text link
    Future wireless networks have substantial potential to support a broad range of complex, compelling applications in both military and civilian fields, where users are able to enjoy high-rate, low-latency, low-cost and reliable information services. Achieving this ambitious goal requires new radio techniques for adaptive learning and intelligent decision making because of the complex heterogeneous nature of the network structures and wireless services. Machine learning (ML) algorithms have had great success in supporting big data analytics, efficient parameter estimation and interactive decision making. Hence, in this article, we review the thirty-year history of ML by elaborating on supervised learning, unsupervised learning, reinforcement learning and deep learning. Furthermore, we investigate their employment in the compelling applications of wireless networks, including heterogeneous networks (HetNets), cognitive radios (CR), the Internet of Things (IoT), machine-to-machine (M2M) networks, and so on. This article aims to assist readers in clarifying the motivation and methodology of the various ML algorithms, so as to invoke them for hitherto unexplored services as well as scenarios of future wireless networks.

    The Construction of Semantic Memory: Grammar-Based Representations Learned from Relational Episodic Information

    Get PDF
    After acquisition, memories undergo a process of consolidation, making them more resistant to interference and brain injury. Memory consolidation involves systems-level interactions, most importantly between the hippocampus and associated structures, which take part in the initial encoding of memories, and the neocortex, which supports long-term storage. This dichotomy parallels the contrast between episodic memory (tied to the hippocampal formation), collecting an autobiographical stream of experiences, and semantic memory, a repertoire of facts and statistical regularities about the world, involving the neocortex at large. Experimental evidence points to a gradual transformation of memories, following encoding, from an episodic to a semantic character. This may require an exchange of information between different memory modules during inactive periods. We propose a theory for such interactions and for the formation of semantic memory, in which episodic memory is encoded as relational data. Semantic memory is modeled as a modified stochastic grammar, which learns to parse episodic configurations expressed as an association matrix. The grammar produces tree-like representations of episodes, describing the relationships between their main constituents at multiple levels of categorization, based on its current knowledge of world regularities. These regularities are learned by the grammar from episodic memory information, through an expectation-maximization procedure, analogous to the inside–outside algorithm for stochastic context-free grammars. We propose that a Monte-Carlo sampling version of this algorithm can be mapped onto the dynamics of "sleep replay" of previously acquired information in the hippocampus and neocortex. We propose that the model can reproduce several properties of semantic memory, such as decontextualization, top-down processing, and the creation of schemata.

    Reviewing the connection between speech and obstructive sleep apnea

    Full text link
    The electronic version of this article is the complete one and can be found online at: http://link.springer.com/article/10.1186/s12938-016-0138-5
    Background: Obstructive sleep apnea (OSA) is a common sleep disorder characterized by recurring breathing pauses during sleep caused by a blockage of the upper airway (UA). The altered UA structure or function in OSA speakers has led to the hypothesis that automatic analysis of speech could support OSA assessment. In this paper we critically review several approaches using speech analysis and machine learning techniques for OSA detection, and discuss the limitations that can arise when using machine learning techniques for diagnostic applications.
    Methods: A large speech database including 426 male Spanish speakers suspected of suffering from OSA and referred to a sleep disorders unit was used to study the clinical validity of several proposals that use machine learning techniques to predict the apnea–hypopnea index (AHI) or classify individuals according to their OSA severity. The AHI describes the severity of a patient's condition. We first evaluate AHI prediction using state-of-the-art speaker recognition technologies: speech spectral information is modelled using supervector or i-vector techniques, and AHI is predicted through support vector regression (SVR). Using the same database we then critically review several OSA classification approaches previously proposed. The influence and possible interference of other clinical variables or characteristics available for our OSA population (age, height, weight, body mass index, and cervical perimeter) are also studied.
    Results: The poor results obtained when estimating AHI using supervectors or i-vectors followed by SVR contrast with the positive results reported by previous research. This fact prompted us to make a careful review of these approaches, also testing some reported results over our database. Several methodological limitations and deficiencies were detected that may have led to overoptimistic results.
    Conclusion: The methodological deficiencies observed after critically reviewing previous research can be relevant examples of potential pitfalls when using machine learning techniques for diagnostic applications. We have found two common limitations that can explain the likelihood of false discovery in previous research: (1) the use of prediction models derived from sources, such as speech, which are also correlated with other patient characteristics (age, height, sex,…) that act as confounding factors; and (2) overfitting of feature selection and validation methods when working with a high number of variables compared to the number of cases. We hope this study will not only be a useful example of relevant issues when using machine learning for medical diagnosis, but will also help in guiding further research on the connection between speech and OSA.
    The authors thank Sonia Martinez Diaz for her effort in collecting the OSA database used in this study. This research was partly supported by the Ministry of Economy and Competitiveness of Spain and the European Union (FEDER) under project "CMC-V2", TEC2012-37585-C02.
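    Pitfall (2) above, feature selection performed outside the validation split, can be illustrated with a hypothetical numeric sketch on pure noise (not the paper's data): selecting the features most correlated with the target on the full data set before splitting leaks test information into the "validation" estimate, while nesting the selection inside the training split does not.

```python
import numpy as np

def biased_vs_nested_selection(n=40, p=1000, n_top=10, seed=0):
    """Returns (r_leaky, r_nested): test-set correlations for a model
    whose features were selected on ALL samples (leaky) versus on the
    training samples only (nested). Both X and y are pure noise, so any
    apparent predictivity is an artifact of the leaky protocol."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))  # many noise features, few cases
    y = rng.normal(size=n)       # noise target (a mock AHI)

    train, test = slice(0, n // 2), slice(n // 2, n)

    def top_by_corr(rows):
        # pick the n_top features most correlated with y on `rows`
        c = np.abs([np.corrcoef(X[rows, j], y[rows])[0, 1] for j in range(p)])
        return np.argsort(c)[-n_top:]

    def fit_eval(top):
        beta, *_ = np.linalg.lstsq(X[train][:, top], y[train], rcond=None)
        pred = X[test][:, top] @ beta
        return float(np.corrcoef(pred, y[test])[0, 1])

    r_leaky = fit_eval(top_by_corr(slice(0, n)))  # selection saw test data
    r_nested = fit_eval(top_by_corr(train))       # selection on train only
    return r_leaky, r_nested
```

    With p far larger than n, the leaky protocol tends to report a spuriously high correlation even though no real signal exists, which is exactly the overoptimism the review warns about.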