10 research outputs found

    Distributed averaging for accuracy prediction in networked systems

    Distributed averaging is among the most relevant cooperative control problems, with applications in sensor and robotic networks, distributed signal processing, data fusion, and load balancing. Consensus and gossip algorithms have been investigated and successfully deployed in multi-agent systems to perform distributed averaging in synchronous and asynchronous settings. This study proposes a heuristic approach to estimate the convergence rate of averaging algorithms in a distributed manner, relying on the computation and propagation of local graph metrics and requiring only simple data processing and lightweight message passing. The protocol enables nodes to predict the time (or the number of interactions) needed to estimate the global average with the desired accuracy. Consequently, nodes can make informed decisions on how to use measured and estimated data while gaining awareness of the global structure of the network and of their role in it. The study also presents applications to outlier identification and to performance evaluation in switching topologies.
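    As a point of reference, the following is a minimal sketch of randomised pairwise gossip averaging, a standard baseline and not the heuristic proposed in the study, that counts the pairwise interactions needed before every node is within a tolerance `eps` of the global average, i.e. the quantity the protocol above aims to predict. The function name `gossip_until_accurate`, the ring graph, and the initial values are illustrative assumptions; nodes are assumed to be labelled 0 to n-1.

```python
import random

def gossip_until_accurate(adjacency, values, eps=1e-3, max_steps=1_000_000, seed=0):
    """Randomised pairwise gossip (illustrative baseline).

    adjacency: {node: list of neighbours}, nodes labelled 0..n-1.
    At each step a random edge (i, j) is chosen and both endpoints
    replace their values with the pair's mean. Returns (steps, values)
    once every node is within eps of the true global average.
    """
    rng = random.Random(seed)
    x = list(values)
    # Pairwise averaging preserves the sum, so the target average never changes.
    target = sum(x) / len(x)
    # List each undirected edge once.
    edges = [(i, j) for i, nbrs in adjacency.items() for j in nbrs if i < j]
    for step in range(1, max_steps + 1):
        i, j = rng.choice(edges)
        mean = (x[i] + x[j]) / 2
        x[i] = x[j] = mean
        if max(abs(v - target) for v in x) <= eps:
            return step, x
    raise RuntimeError("did not reach the requested accuracy within max_steps")

# Hypothetical example: an 8-node ring with initial values 0..7 (average 3.5).
ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
steps, final = gossip_until_accurate(ring, [float(i) for i in range(8)], eps=1e-3)
print(steps, round(sum(final) / len(final), 6))
```

    The step count depends on the graph and on the random edge sequence; predicting it from local topology, without running the algorithm, is what the study addresses.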

    Large-scale assessment of mobile crowdsensed data: a case study

    Mobile crowdsensing (MCS) is a well-established paradigm that leverages the ubiquity and processing capabilities of mobile devices for large-scale data collection to monitor phenomena of common interest. Crowd-powered data collection is significantly faster and more cost-effective than traditional methods. However, it poses challenges in assessing accuracy and extracting information from large volumes of user-generated data. SmartRoadSense (SRS) is an MCS technology that uses sensors embedded in mobile phones to monitor the quality of road surfaces by computing a crowdsensed road roughness index (referred to as PPE). The present work performs statistical modelling of PPE to analyse its distribution across the road network and to show how it can be efficiently analysed and interpreted. A joint statistical analysis of open datasets is then carried out to investigate the effect of both internal and external road features on PPE. Several road properties are found to affect PPE as predicted, providing evidence that SRS can be effectively applied to assess road quality. Finally, the effect of road category and speed limit on the mean and standard deviation of PPE is evaluated, incorporating previous results on the relationship between vehicle speed and PPE. These results enable more effective and confident use of the SRS platform and its data to inform road construction and renovation decisions, especially where a lack of resources limits the use of conventional approaches. The work also exemplifies how crowdsensing technologies can benefit from open data integration and highlights the importance of making coherent, comprehensive, and well-structured open datasets available to the public.
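    Evaluating how road category affects the mean and standard deviation of PPE amounts, at its simplest, to a grouped summary. A minimal standard-library sketch follows; the record layout, the function name `ppe_summary_by_category`, and the sample values are hypothetical placeholders, not SRS data or the study's statistical model.

```python
from collections import defaultdict
from statistics import mean, stdev

def ppe_summary_by_category(records):
    """Group (road_category, ppe) pairs and return {category: (mean, sample std)}.

    Categories with a single observation get a std of 0.0 by convention here,
    since the sample standard deviation needs at least two values.
    """
    groups = defaultdict(list)
    for category, ppe in records:
        groups[category].append(ppe)
    return {
        cat: (mean(vals), stdev(vals) if len(vals) > 1 else 0.0)
        for cat, vals in groups.items()
    }

# Hypothetical readings, for illustration only.
sample = [("motorway", 0.10), ("motorway", 0.14), ("urban", 0.30), ("urban", 0.50)]
print(ppe_summary_by_category(sample))
```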

    Exploring Machine Learning for Untargeted Metabolomics Using Molecular Fingerprints

    Background: Metabolomics, the study of the substrates and products of cellular metabolism, offers valuable insights into an organism's state under specific conditions and has the potential to revolutionise preventive healthcare and pharmaceutical research. However, analysing large metabolomics datasets remains challenging, as available methods rely on limited and incompletely annotated metabolic pathways. Methods: Inspired by well-established methods in drug discovery, this study applies machine learning to metabolite fingerprints to explore how metabolite structure relates to responses under experimental conditions beyond known pathways, shedding light on metabolic processes. It evaluates how effectively fingerprints represent metabolites, addressing challenges such as class imbalance, data sparsity, high dimensionality, duplicate structural encodings, and feature interpretability. Feature importance analysis is then applied to reveal the key chemical configurations affecting classification, identifying related metabolite groups. Results: The approach is tested on two datasets, one on Ataxia Telangiectasia and another on endothelial cells under low oxygen. Machine learning on molecular fingerprints predicts metabolite responses effectively, and feature importance analysis aligns with known metabolic pathways while revealing new affected metabolite groups for further study. Conclusion: The presented approach leverages the strengths of drug discovery to address critical issues in metabolomics research and aims to bridge the gap between the two disciplines. This work lays the foundation for future research in this direction, possibly exploring alternative structural encodings and machine learning models.
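    One common remedy for the class imbalance mentioned above is inverse-frequency class weighting, where each class receives the weight n_samples / (n_classes * class_count); this is the formula scikit-learn documents for `class_weight="balanced"`. The abstract does not say which remedy the study adopted, so the sketch below is illustrative only; the helper name `balanced_class_weights` and the labels are hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class).

    Rare classes get larger weights, so each class contributes equally
    in aggregate to a weighted training loss.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}

# Hypothetical metabolite responses: most unchanged, few up or down.
labels = ["unchanged"] * 8 + ["up"] + ["down"]
print(balanced_class_weights(labels))
```

    With these weights, the weighted count of every class is the same (here 10/3), which is the design goal of this scheme.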

    Hybrid Personal Medical Digital Assistant Agents

    Autonomous intelligent systems are beginning to impact clinical practice as personal medical assistant agents, leveraging expert knowledge when needed and exploiting the vast amount of patient data available to clinicians. However, these two approaches are seldom integrated. In this paper, we propose an integrated hybrid agent architecture that combines symbolic reasoning with sub-symbolic, data-driven models. Using the PIMA dataset, we demonstrate that the hybrid approach outperforms either approach used alone. Specifically, we show that integrating a logical agent, which uses plans based on predefined expert knowledge, with rules obtained by symbolic knowledge extraction from machine learning models trained on historical data improves system reliability and clinical decision-making while reducing the number of misclassified instances.
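    The combination described above can be pictured as a decision procedure that consults expert-defined rules first and falls back on rules extracted from a trained model. The sketch below is an assumption about how such wiring might look, not the architecture from the paper; the feature names follow PIMA dataset columns (`Glucose`, `BMI`), but every threshold, the `Rule` type, and `hybrid_decide` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]
    outcome: int  # 1 = diabetic, 0 = non-diabetic (PIMA "Outcome" convention)

def hybrid_decide(patient: dict, expert_rules: list, extracted_rules: list,
                  default: int = 0) -> tuple:
    """Return (outcome, name of the rule that fired).

    Expert rules take precedence; rules extracted from a machine learning
    model are consulted only when no expert rule fires.
    """
    for rule in expert_rules + extracted_rules:
        if rule.condition(patient):
            return rule.outcome, rule.name
    return default, "default"

# Hypothetical rules with illustrative thresholds only.
expert = [Rule("expert: very high glucose", lambda p: p["Glucose"] >= 200, 1)]
extracted = [
    Rule("extracted: high glucose and BMI",
         lambda p: p["Glucose"] > 140 and p["BMI"] > 30, 1),
    Rule("extracted: low glucose", lambda p: p["Glucose"] < 100, 0),
]
print(hybrid_decide({"Glucose": 150, "BMI": 33.1}, expert, extracted))
```

    Returning the name of the rule that fired keeps every decision traceable to either expert knowledge or extracted knowledge.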

    Topological network features determine convergence rate of distributed average algorithms

    Gossip algorithms are message-passing schemes designed to compute averages and other global functions over networks through asynchronous, randomised pairwise interactions. Gossip-based protocols have drawn much attention for achieving robust and fault-tolerant communication while maintaining simplicity and scalability. However, the frequent propagation of redundant information makes them inefficient and resource-intensive. Most previous works have been devoted to deriving performance bounds and developing faster algorithms tailored to specific structures. In contrast, this study characterises the effect of topological network features on performance, so that faster convergence can be engineered by acting on the underlying network rather than on the gossip algorithm. The numerical experiments identify the topological limiting factors, the most predictive graph metrics, and the most efficient algorithms for each graph family and across all graphs, providing guidelines for designing and maintaining resource-efficient networks. Regression analyses confirm the explanatory power of structural features and demonstrate the validity of the topological approach to performance estimation. Finally, the high predictive power of local metrics, together with the possibility of computing them in a distributed manner at low computational cost, informs the design and implementation of a novel distributed approach for predicting performance from the network topology.
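    Local metrics suit distributed computation because each node can evaluate them from its own neighbour list plus its neighbours' lists, i.e. after one round of message exchange. As an illustration, the sketch below computes two standard local metrics, degree and local clustering coefficient; the abstract does not name the most predictive metrics, so this choice, the helper `local_metrics`, and the example graph are assumptions.

```python
def local_metrics(adjacency):
    """For each node, return (degree, local clustering coefficient).

    Only the node's neighbour set and its neighbours' neighbour sets are
    used, i.e. information available after one round of message exchange.
    """
    nbr_sets = {v: set(nbrs) for v, nbrs in adjacency.items()}
    result = {}
    for v, nbrs in nbr_sets.items():
        k = len(nbrs)
        # Edges among v's neighbours; each is counted once from each endpoint.
        links = sum(len(nbr_sets[u] & nbrs) for u in nbrs) // 2
        clustering = 2 * links / (k * (k - 1)) if k > 1 else 0.0
        result[v] = (k, clustering)
    return result

# Hypothetical graph: triangle 0-1-2 with a pendant node 3 attached to 2.
g = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(local_metrics(g))
```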

    Medical-informed machine learning: integrating prior knowledge into medical decision systems

    Background: Clinical medicine offers a promising arena for applying machine learning (ML) models. However, despite numerous studies employing ML in medical data analysis, only a fraction have impacted clinical care. This article underscores the importance of ML in medical data analysis while recognising that ML alone may not adequately capture the full complexity of clinical data, and therefore advocates integrating medical domain knowledge into ML. Methods: The study conducts a comprehensive review of prior efforts to integrate medical knowledge into ML and maps these integration strategies onto the phases of the ML pipeline: data pre-processing, feature engineering, model training, and output evaluation. The study further explores the significance and impact of such integration through a case study on diabetes prediction, in which clinical knowledge, encompassing rules, causal networks, intervals, and formulas, is integrated at each stage of the ML pipeline, resulting in a spectrum of integrated models. Results: The findings highlight the benefits of integration in terms of accuracy, interpretability, data efficiency, and adherence to clinical guidelines. In several cases, integrated models outperformed purely data-driven approaches, underscoring the potential of domain knowledge to enhance ML models through improved generalisation. In other cases, integration was instrumental in enhancing model interpretability and ensuring conformity with established clinical guidelines. Notably, knowledge integration also proved effective in maintaining performance in limited-data scenarios. Conclusions: By illustrating various integration strategies through a clinical case study, this work provides guidance to inspire and facilitate future integration efforts. Furthermore, the study identifies refining the representation of domain knowledge and fine-tuning its contribution to the ML model as the two main challenges to integration, and aims to stimulate further research in this direction.
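    Two of the integration points listed above, interval knowledge at the pre-processing stage and rule knowledge at the feature-engineering stage, can be sketched in a few lines. The plausibility interval and rule threshold below are illustrative assumptions (the threshold is loosely modelled on the common 200 mg/dL diagnostic cut-off for 2-hour plasma glucose), not the knowledge base used in the study; `apply_interval` and `rule_feature` are hypothetical helpers.

```python
import math

# Hypothetical knowledge base, for illustration only.
PLAUSIBLE_GLUCOSE = (40.0, 400.0)  # mg/dL; values outside are treated as missing
DIABETIC_2H_GLUCOSE = 200.0        # mg/dL; rule threshold turned into a feature

def apply_interval(value, interval):
    """Pre-processing: replace physiologically implausible values with NaN."""
    low, high = interval
    return value if low <= value <= high else math.nan

def rule_feature(glucose):
    """Feature engineering: 1 if the rule fires, 0 if not, NaN if input is missing."""
    if math.isnan(glucose):
        return math.nan
    return 1 if glucose >= DIABETIC_2H_GLUCOSE else 0

# Toy rows; in this toy data 0.0 encodes a missing reading.
rows = [{"Glucose": 0.0}, {"Glucose": 148.0}, {"Glucose": 210.0}]
for row in rows:
    row["Glucose"] = apply_interval(row["Glucose"], PLAUSIBLE_GLUCOSE)
    row["rule_diabetic_glucose"] = rule_feature(row["Glucose"])
print(rows)
```

    Keeping missing values as NaN, rather than silently imputing them, leaves the choice of imputation to a later, explicit pipeline step.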

    Investigating Participation Mechanisms in EU Code Week

    Digital competence (DC) is a broad set of skills, attitudes, and knowledge for the confident, critical, and responsible use of digital technologies in every aspect of life. DC is essential in the contemporary digital landscape, yet its diffusion is hindered by biases, misunderstandings, and limited awareness. Teaching informatics in school curricula is increasingly supported by institutions but faces serious challenges, such as teacher upskilling and support. In response, grassroots movements promoting computing literacy in informal settings have grown, including EU Code Week, whose vision is to develop computing skills while promoting diversity and raising awareness of the importance of digital skills. This study extensively analyses the EU Code Week editions from 2014 to 2021 across European Union member states, pursuing three primary objectives: first, to evaluate teacher engagement in the campaign in terms of penetration, retention, and spatial distribution; second, to characterise the multifaceted audience and themes of these initiatives; and third, to investigate the influence of socio-economic factors on engagement. The investigation uncovers the underlying mechanisms fostering Code Week’s engagement, providing insights to campaign organisers for strategic planning and resource allocation in future editions. Moreover, the analysis reveals that the most engaged areas are characterised by lower income, lower digital literacy, restricted access to technology, and less established computer education, suggesting that Code Week thrives precisely where its impact is most needed.
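    Retention, one of the engagement measures listed above, can be operationalised in several ways. The sketch below assumes one plausible definition, the share of organisers active in an edition who return in the following edition; the study's exact definition, the function name `retention`, the identifiers, and the data are assumptions for illustration.

```python
def retention(editions):
    """editions: {year: set of organiser IDs}.

    Returns {year: share of that edition's organisers who also appear in
    the following edition}. The last edition has no successor and is omitted,
    as are empty editions.
    """
    years = sorted(editions)
    return {
        y: len(editions[y] & editions[nxt]) / len(editions[y])
        for y, nxt in zip(years, years[1:])
        if editions[y]
    }

# Hypothetical organiser IDs per edition.
data = {2019: {"a", "b", "c", "d"}, 2020: {"a", "b", "e"}, 2021: {"b", "e", "f"}}
print(retention(data))
```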

    Predicting metabolic responses in genetic disorders via structural representation in machine learning

    Metabolomics has emerged as a promising discipline in pharmaceuticals and preventive healthcare. However, analysing large metabolomics datasets remains challenging due to limited and incompletely annotated biological pathways. To address this limitation, we recently proposed training machine learning classifiers on molecular fingerprints of metabolites to predict their responses under specific conditions, and analysing feature importance to identify key chemical configurations, providing insights into the affected biological processes. This study extends our previous research by evaluating various structural representations of metabolites, including the Morgan fingerprint and its variants and graph-based structural encodings, and by proposing novel representations that improve the resolution and interpretability of state-of-the-art approaches. These structural encodings were evaluated on mass spectrometry metabolomic data from a cellular model of the genetic disease Ataxia Telangiectasia. Machine learning classifiers trained on the new representations showed improved classification accuracy and interpretability. Notably, models trained on graph-based encodings showed no performance gains, even with pre-training on a larger metabolite dataset, underlining the efficacy of our proposed representations. Finally, feature importance analysis across different encoding methods consistently identifies similar structures as relevant for classification, underscoring the robustness of our approach across diverse structural representations.

    Machine Learning-Enabled Prediction of Metabolite Response in Genetic Disorders

    Metabolomics has emerged as a promising discipline in pharmaceuticals and preventive healthcare, holding great potential for disease detection and drug testing. However, analysing large metabolomics datasets remains challenging, as available methods generally rely on limited and incompletely annotated biological pathways. This study introduces a novel approach that leverages machine learning classifiers trained on molecular fingerprints of metabolites to predict their responses under specific experimental conditions. The model is evaluated on mass spectrometry metabolomic data from a cellular model of the genetic disease Ataxia Telangiectasia. Metabolite structures are encoded using the Morgan fingerprint, a well-established technique widely adopted in drug discovery. The suitability of this fingerprinting method for generating unique structural encodings of the detected metabolites is analysed, and strategies to mitigate the resolution limitations inherent to this fingerprint are introduced. Machine learning classifiers trained on these fingerprints exhibit satisfactory performance, providing evidence that the structural encoding has predictive power over the metabolic response. Feature importance analysis, conducted on the best-performing models, identifies the chemical configurations that most influence the classification process, shedding light on the affected biological processes. Remarkably, this analysis not only identifies metabolites known to participate in affected pathways but also discovers metabolites not previously associated with the disease, opening up novel opportunities for further exploration. As an initial exploration of the proposed approach, this work lays the foundation for future research that leverages alternative structural encodings, diverse machine learning models, and explainability tools.
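    Checking whether a fingerprint yields unique encodings reduces to grouping metabolites by identical bit vectors and flagging groups with more than one member; one typical cause of such collisions is stereoisomers, which Morgan fingerprints do not distinguish unless chirality is explicitly encoded. In the sketch below, fingerprints are represented as sets of on-bit indices; generating them (for example with a cheminformatics toolkit such as RDKit) is outside the sketch, and the metabolite names, bit sets, and `find_collisions` helper are hypothetical.

```python
from collections import defaultdict

def find_collisions(fingerprints):
    """fingerprints: {metabolite_name: iterable of on-bit indices}.

    Returns sorted lists of metabolites that share an identical fingerprint,
    i.e. structures the encoding cannot tell apart.
    """
    groups = defaultdict(list)
    for name, bits in fingerprints.items():
        groups[frozenset(bits)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]

# Hypothetical on-bit sets; real ones would come from a fingerprinting toolkit.
fps = {
    "metabolite_A": {1, 17, 305},
    "metabolite_B": {1, 17, 305},  # identical to A: a collision
    "metabolite_C": {4, 17, 990},
}
print(find_collisions(fps))
```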

    Robust statistical modeling improves sensitivity of high-throughput RNA structure probing experiments

    Structure probing coupled with high-throughput sequencing could revolutionize our understanding of the role of RNA structure in the regulation of gene expression. Despite recent technological advances, intrinsic noise and high sequence coverage requirements greatly limit the applicability of these techniques. Here we describe a probabilistic modeling pipeline that accounts for biological variability and biases in the data, yielding statistically interpretable scores for the probability of nucleotide modification transcriptome-wide. Using two yeast data sets, we demonstrate that our method has increased sensitivity, and thus our pipeline identifies modified regions on many more transcripts than do existing pipelines. Our method also provides confident predictions at much lower sequence coverage levels than those recommended for reliable structural probing. Our results show that statistical modeling extends the scope and potential of transcriptome-wide structure probing experiments.
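    To convey what a statistically interpretable score for the probability of nucleotide modification means, the sketch below computes a posterior probability under a deliberately simplified two-component Poisson mixture: a nucleotide's read-stop count comes either from an unmodified background rate or from an elevated modified rate. This is not the pipeline described above, which additionally accounts for biological variability and biases in the data; the rates, the prior, and the function names are hypothetical.

```python
import math

def poisson_logpmf(k, lam):
    """log P(K = k) for K ~ Poisson(lam), with lam > 0."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def modification_probability(count, rate_unmod, rate_mod, prior_mod):
    """Posterior P(modified | count) under a two-component Poisson mixture."""
    log_mod = math.log(prior_mod) + poisson_logpmf(count, rate_mod)
    log_unmod = math.log(1 - prior_mod) + poisson_logpmf(count, rate_unmod)
    # Normalise in log space for numerical stability.
    m = max(log_mod, log_unmod)
    p_mod, p_unmod = math.exp(log_mod - m), math.exp(log_unmod - m)
    return p_mod / (p_mod + p_unmod)

# Hypothetical parameters: background 2 stops per site, modified 10, prior 10%.
for k in (0, 5, 15):
    print(k, round(modification_probability(k, 2.0, 10.0, 0.1), 4))
```

    Because the output is a posterior probability rather than a raw count ratio, a threshold such as 0.95 has a direct probabilistic reading, which is the kind of interpretability the abstract refers to.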