Bayesian spline-based hidden Markov models with applications to actimetry data and sleep analysis
B-spline-based hidden Markov models employ B-splines to specify the emission distributions, offering a more flexible modeling approach to data than conventional parametric HMMs. We introduce a Bayesian framework for inference, enabling the simultaneous estimation of all unknown model parameters including the number of states. A parsimonious knot configuration of the B-splines is identified by the use of a trans-dimensional Markov chain sampling algorithm, while model selection regarding the number of states can be performed based on the marginal likelihood within a parallel sampling framework. Using extensive simulation studies, we demonstrate the superiority of our methodology over alternative approaches as well as its robustness and scalability. We illustrate the explorative use of our methods on animal activity data, namely from whitetip sharks. The flexibility of our Bayesian approach also facilitates the incorporation of more realistic assumptions, and we demonstrate this by developing a novel hierarchical conditional HMM to analyse human activity for circadian and sleep modeling. Supplementary materials for this article are available online.
Integrating expert-based objectivist and nonexpert-based subjectivist paradigms in landscape assessment
This thesis explores the integration of objective and subjective measures of landscape aesthetics, particularly focusing on crowdsourced geo-information. It addresses the increasing importance of considering public perceptions in national landscape governance, in line with the European Landscape Convention's emphasis on public involvement. Despite this, national landscape assessments often remain expert-centric and top-down, facing challenges in resource constraints and limited public engagement. The thesis leverages Web 2.0 technologies and crowdsourced geographic information, examining correlations between expert-based metrics of landscape quality and public perceptions. The Scenic-Or-Not initiative for Great Britain, GIS-based Wildness spatial layers, and LANDMAP dataset for Wales serve as key datasets for analysis.
The research investigates the relationships between objective measures of landscape wildness quality and subjective measures of aesthetics. Multiscale geographically weighted regression (MGWR) reveals significant correlations, with different wildness components exhibiting varying degrees of association. The study suggests the feasibility of incorporating wildness and scenicness measures into formal landscape aesthetic assessments. Comparing expert and public perceptions, the research identifies preferences for water-related landforms and variations in upland and lowland typologies. The study emphasizes the agreement between experts and non-experts on extreme scenic perceptions but notes discrepancies in mid-spectrum landscapes. To overcome limitations in systematic landscape evaluations, an integrative approach is proposed. Utilizing XGBoost models, the research predicts spatial patterns of landscape aesthetics across Great Britain, based on the Scenic-Or-Not initiatives, Wildness spatial layers, and LANDMAP data. The models achieve comparable accuracy to traditional statistical models, offering insights for Landscape Character Assessment practices and policy decisions. While acknowledging data limitations and biases in crowdsourcing, the thesis discusses the necessity of an aggregation strategy to manage computational challenges. Methodological considerations include addressing the modifiable areal unit problem (MAUP) associated with aggregating point-based observations. The thesis comprises three studies published or submitted for publication, each contributing to the understanding of the relationship between objective and subjective measures of landscape aesthetics. The concluding chapter discusses the limitations of data and methods, providing a comprehensive overview of the research
Deep generative models for network data synthesis and monitoring
Measurement and monitoring are fundamental tasks in all networks, enabling the downstream management and optimization of the network. Although networks inherently produce abundant monitoring data, accessing and effectively measuring it is another story. The challenges are manifold. First, network monitoring data is inaccessible to external users, and it is hard to provide a high-fidelity dataset without leaking commercially sensitive information. Second, effective data collection across a large-scale network system can be very expensive given the growing size of networks, e.g., the number of cells in a radio network and the number of flows in an Internet Service Provider (ISP) network. Third, it is difficult to ensure fidelity and efficiency simultaneously in network monitoring, as the resources available in a network element to support the measurement function are too limited to implement sophisticated mechanisms. Finally, understanding and explaining the behavior of the network becomes challenging due to its size and complex structure. Various emerging optimization-based solutions (e.g., compressive sensing) or data-driven solutions (e.g., deep learning) have been proposed for the aforementioned challenges. However, the fidelity and efficiency of existing methods cannot yet meet current network requirements.
The contributions made in this thesis significantly advance the state of the art in the domain of network measurement and monitoring techniques. Overall, we leverage a cutting-edge machine learning technology, deep generative modeling, throughout the entire thesis. First, we design and realize APPSHOT, an efficient method for city-scale network traffic sharing with a conditional generative model, which only requires open-source contextual data during inference (e.g., land use information and population distribution). Second, we develop GENDT, an efficient drive testing system based on generative models, which combines graph neural networks, conditional generation, and quantified model uncertainty to enhance the efficiency of mobile drive testing. Third, we design and implement DISTILGAN, a high-fidelity, efficient, versatile, and real-time network telemetry system with latent GANs and spectral-temporal networks. Finally, we propose SPOTLIGHT, an accurate, explainable, and efficient anomaly detection system for the Open RAN (Radio Access Network) system. The lessons learned through this research are summarized, and interesting topics are discussed for future work in this domain. All proposed solutions have been evaluated with real-world datasets and applied to support different applications in real systems.
Multi-site benchmark classification of major depressive disorder using machine learning on cortical and subcortical measures
Machine learning (ML) techniques have gained popularity in the neuroimaging field due to their potential for classifying neuropsychiatric disorders. However, the diagnostic predictive power of the existing algorithms has been limited by small sample sizes, lack of representativeness, data leakage, and/or overfitting. Here, we overcome these limitations with the largest multi-site sample size to date (N = 5365) to provide a generalizable ML classification benchmark of major depressive disorder (MDD) using shallow linear and non-linear models. Leveraging brain measures from standardized ENIGMA analysis pipelines in FreeSurfer, we were able to classify MDD versus healthy controls (HC) with a balanced accuracy of around 62%. But after harmonizing the data, e.g., using ComBat, the balanced accuracy dropped to approximately 52%. Accuracy results close to random chance levels were also observed in stratified groups according to age of onset, antidepressant use, number of episodes and sex. Future studies incorporating higher dimensional brain imaging/phenotype features, and/or using more advanced machine and deep learning methods may yield more encouraging prospects.
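As a reading aid for the figures above: balanced accuracy is the mean of sensitivity and specificity, so a two-class classifier that ignores its input scores 50% regardless of class proportions. The sketch below is a minimal illustration with made-up toy labels (1 = MDD, 0 = HC is an assumed encoding); it does not use the study's data or pipeline.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (assumed: 1 = MDD, 0 = HC)."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    true_neg = sum(1 for t, p in pairs if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    sensitivity = true_pos / n_pos  # recall on the positive class
    specificity = true_neg / n_neg  # recall on the negative class
    return (sensitivity + specificity) / 2


# Toy, imbalanced labels (hypothetical): 4 positives and 8 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * len(y_true)

plain_accuracy = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)
chance_level = balanced_accuracy(y_true, always_negative)
```

On these toy labels the always-negative predictor reaches a plain accuracy of 8/12 but a balanced accuracy of exactly 0.5, which is why a value near 52% is read as close to chance.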
Capsule networks with residual pose routing
Capsule networks (CapsNets) are known to be difficult to scale to deeper architectures, which are desirable for high performance in the deep learning era, because of their complex capsule routing algorithms. In this article, we present a simple yet effective capsule routing algorithm called residual pose routing. Specifically, the higher-layer capsule pose is obtained by an identity mapping on the adjacent lower-layer capsule pose. Such simple residual pose routing has two advantages: 1) reducing the routing computation complexity and 2) avoiding gradient vanishing due to its residual learning framework. On top of that, we explicitly reformulate the capsule layers by building a residual pose block. Stacking multiple such blocks results in a deep residual CapsNet (ResCaps) with a ResNet-like architecture. Results on MNIST, AffNIST, SmallNORB, and CIFAR-10/100 show the effectiveness of ResCaps for image classification. Furthermore, we successfully extend our residual pose routing to large-scale real-world applications, including 3-D object reconstruction and classification, and 2-D saliency dense prediction. The source code has been released on https://github.com/liuyi1989/ResCaps
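The identity-mapping idea described above can be illustrated with a generic ResNet-style shortcut applied to a pose vector. This is only a sketch under stated assumptions: the function names, the `tanh` residual branch and the weight matrices are hypothetical and are not taken from the released ResCaps code.

```python
import math


def residual_pose_block(pose, weight):
    """One block: higher-layer pose = lower-layer pose + residual(lower-layer pose).

    pose   -- list of floats, one capsule's pose vector
    weight -- square matrix (list of rows); a hypothetical learned transform
    """
    dim = len(pose)
    # Assumed residual branch: tanh of a linear map of the incoming pose.
    residual = [
        math.tanh(sum(pose[i] * weight[i][j] for i in range(dim)))
        for j in range(dim)
    ]
    # Identity shortcut: the lower-layer pose is carried upward unchanged.
    return [p + r for p, r in zip(pose, residual)]


def stack_blocks(pose, weights):
    """Apply several residual pose blocks in sequence (a deep, ResNet-like stack)."""
    for weight in weights:
        pose = residual_pose_block(pose, weight)
    return pose


pose = [0.5, -1.0, 2.0]
zero = [[0.0] * 3 for _ in range(3)]
unchanged = stack_blocks(pose, [zero, zero, zero, zero])
```

With an all-zero residual branch the stack reduces to the identity, so `unchanged` equals `pose`. That property is what keeps a direct path for gradients through many stacked blocks.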
Genomic resources for a historical collection of cultivated two-row European spring barley genotypes
Barley genomic resources are increasing rapidly, with the publication of a barley pangenome as one of the latest developments. Two-row spring barley cultivars are intensely studied as they are the source of high-quality grain for malting and distilling. Here we provide data from a European two-row spring barley population containing 209 different genotypes registered for the UK market between 1830 and 2014. The dataset encompasses RNA-sequencing data from six different tissues across a range of barley developmental stages; phenotypic datasets from two consecutive years of field-grown trials in the United Kingdom, Germany and the USA; and whole genome shotgun sequencing from all cultivars, which was used to complement the RNA-sequencing data for variant calling. The outcomes are a filtered SNP marker file, a phenotypic database and a large gene expression dataset, providing a comprehensive resource that allows for downstream analyses such as genome-wide association studies or expression associations.
Online semi-supervised learning in non-stationary environments
Existing Data Stream Mining (DSM) algorithms assume the availability of labelled and balanced data, immediately or after some delay, to extract worthwhile knowledge from continuous and rapid data streams. However, in many real-world applications such as Robotics, Weather Monitoring, Fraud Detection Systems, Cyber Security, and Computer Network Traffic Flow, an enormous amount of high-speed data is generated by Internet of Things sensors and real-time data on the Internet. Manual labelling of these data streams is not practical due to time consumption and the need for domain expertise. Another challenge is learning under Non-Stationary Environments (NSEs), which occurs due to changes in the data distributions in a set of input variables and/or class labels. The problem of Extreme Verification Latency (EVL) under NSEs is referred to as Initially Labelled Non-Stationary Environment (ILNSE). This is a challenging task because the learning algorithms have no direct access to the true class labels when the concept evolves. Several approaches exist that deal with NSE and EVL in isolation. However, few algorithms address both issues simultaneously. This research directly responds to ILNSE's challenge by proposing two novel algorithms: the "Predictor for Streaming Data with Scarce Labels" (PSDSL) and the Heterogeneous Dynamic Weighted Majority (HDWM) classifier. PSDSL is an Online Semi-Supervised Learning (OSSL) method for real-time DSM and is closely related to label scarcity issues in online machine learning.
The key capabilities of PSDSL include learning from a small amount of labelled data in an incremental or online manner and being available to predict at any time. To achieve this, PSDSL utilises both labelled and unlabelled data to train the prediction models, meaning it continuously learns from incoming data and updates the model as new labelled or unlabelled data becomes available over time. Furthermore, it can predict under NSE conditions when class labels are scarce. PSDSL is built on top of the HDWM classifier, which preserves the diversity of the classifiers. PSDSL and HDWM can intelligently switch and adapt to the conditions. PSDSL switches its learning state between self-learning, micro-clustering and CGC, whichever approach is beneficial, based on the characteristics of the data stream. HDWM makes use of "seed" learners of different types in an ensemble to maintain its diversity. An ensemble is simply a combination of predictive models grouped to improve on the predictive performance of a single classifier.
PSDSL is empirically evaluated against COMPOSE, LEVELIW, SCARGC and MClassification on benchmark NSE datasets as well as Massive Online Analysis (MOA) data streams and real-world datasets. The results showed that PSDSL performed significantly better than existing approaches on most real-time data streams, including randomised data instances. PSDSL performed significantly better than "Static", i.e. a classifier that is not updated after it is trained with the first examples in the data streams. When applied to MOA-generated data streams, PSDSL ranked highest (1.5) and thus performed significantly better than SCARGC, while SCARGC performed the same as Static. PSDSL achieved better average prediction accuracies in a shorter time than SCARGC.
The HDWM algorithm is evaluated on artificial and real-world data streams against existing well-known approaches such as the heterogeneous WMA and the homogeneous DWM algorithm. The results showed that HDWM performed significantly better than WMA and DWM. Also, when recurring concept drifts were present, the predictive performance of HDWM showed an improvement over DWM. In both drift and real-world streams, significance tests and post hoc comparisons found significant differences between algorithms: HDWM performed significantly better than DWM and WMA when applied to MOA data streams and four real-world datasets (Electric, Spam, Sensor and Forest cover). The seeding mechanism and dynamic inclusion of new base learners in the HDWM algorithm benefit from the use of both forgetting and retaining the models. The algorithm also provides the independence of selecting the optimal base classifier in its ensemble depending on the problem.
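The abstract does not spell out HDWM's update rule. As background, the sketch below shows the generic weighted-majority scheme that the WMA/DWM family is built on: a weighted vote over learners, with a multiplicative penalty for each learner that errs. The three toy learners, the penalty `beta = 0.5`, the rescaling step and the tiny stream are all assumptions chosen for illustration, not details of HDWM.

```python
def weighted_vote(predictions, weights):
    """Return the label whose supporting learners carry the largest total weight."""
    totals = {}
    for label, weight in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)


def update_weights(predictions, weights, true_label, beta=0.5):
    """Multiply the weight of every wrong learner by beta, then rescale so the top weight is 1."""
    penalised = [
        w * beta if p != true_label else w
        for p, w in zip(predictions, weights)
    ]
    top = max(penalised)
    return [w / top for w in penalised]


# Hypothetical learners of different kinds, standing in for a heterogeneous ensemble.
learners = [
    lambda x: int(x > 0),       # threshold rule
    lambda x: int(x % 2 == 0),  # parity rule
    lambda x: 1,                # constant rule
]
weights = [1.0, 1.0, 1.0]

# Toy labelled stream of (input, true label) pairs; the true concept is "x > 0".
stream = [(3, 1), (-2, 0), (5, 1), (-4, 0)]
for x, y in stream:
    predictions = [learner(x) for learner in learners]
    weights = update_weights(predictions, weights, y)

final_vote = weighted_vote([learner(-4) for learner in learners], weights)
```

After the four examples the learner that matches the concept keeps weight 1.0 while the others decay, so the weighted vote follows it even when the two weaker learners agree with each other.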
A new approach, Envelope-Clustering, is introduced to resolve cluster overlap conflicts during the cluster labelling process. In this process, PSDSL transforms the centroids' information of micro-clusters into micro-instances and generates new clusters called Envelopes. The nearest envelope clusters assist the conflicted micro-clusters and successfully guide the cluster labelling process after concept drifts in the absence of true class labels. PSDSL has been evaluated on the real-world problem of "keystroke dynamics", and the results show that PSDSL achieved higher prediction accuracy (85.3%) than SCARGC (81.6%), while the performance of Static (49.0%) degrades significantly due to changes in the users' typing patterns. Furthermore, the predictive accuracies of SCARGC were found to fluctuate widely (between 41.1% and 81.6%) depending on the value of the parameter "k" (number of clusters), while PSDSL automatically determines the best value for this parameter.
Life on a scale: Deep brain stimulation in anorexia nervosa
Anorexia nervosa (AN) is a severe psychiatric disorder marked by low body weight, body image abnormalities, and anxiety and shows elevated rates of morbidity, comorbidity and mortality. Given the limited availability of evidence-based treatments, there is an urgent need to investigate new therapeutic options that are informed by the disorder's underlying neurobiological mechanisms. This thesis represents the first study in the Netherlands and one of a limited number globally to evaluate the efficacy, safety, and tolerability of deep brain stimulation (DBS) in the treatment of AN. DBS has the advantage of being both reversible and adjustable. Beyond assessing the primary impact of DBS on body weight, psychological parameters, and quality of life, this research is novel in its comprehensive approach. We integrated evaluations of efficacy with critical examinations of the functional impact of DBS in AN, including fMRI and electroencephalography (EEG), as well as endocrinological and metabolic assessments. Furthermore, this work situates AN within a broader theoretical framework, specifically focusing on its manifestation as a form of self-destructive behavior. Finally, we reflect on the practical, ethical and philosophical aspects of conducting an experimental, invasive procedure in a vulnerable patient group. This thesis deepens our understanding of the neurobiological underpinnings of AN and paves the way for future research and potential clinical applications of DBS in the management of severe and enduring AN.
Socioeconomic implications of adverse birth outcomes
In the following chapters, I present research on potential risk factors for adverse birth outcomes and their implications for the socioeconomic status of the newborn in later life. Beyond that, I study the economic consequences for the parents of a newborn child with adverse birth outcomes. Except for the second chapter, all chapters have an applied econometric approach and statistically analyse observational data from Germany and the United Kingdom. The analyses are interdisciplinary and focus on explaining the assumptions required to causally interpret the results and on addressing the limitations of the analyses.
New Opportunities for Research Funding Agency Cooperation in Europe (NORFACE)/Life Course Dynamics after Preterm Birth: Protective Factors for Social and Educational Transitions, Health, and Prosperity (PremLife)/462-16-040/E