Distributed averaging for accuracy prediction in networked systems
Distributed averaging is among the most relevant cooperative control
problems, with applications in sensor and robotic networks, distributed signal
processing, data fusion, and load balancing. Consensus and gossip algorithms
have been investigated and successfully deployed in multi-agent systems to
perform distributed averaging in synchronous and asynchronous settings. This
study proposes a heuristic approach to estimate the convergence rate of
averaging algorithms in a distributed manner, relying on the computation and
propagation of local graph metrics while entailing simple data elaboration and
small message passing. The protocol enables nodes to predict the time (or the
number of interactions) needed to estimate the global average with the desired
accuracy. Consequently, nodes can make informed decisions on their use of
measured and estimated data while gaining awareness of the global structure of
the network, as well as their role in it. The study presents relevant
applications to outlier identification and performance evaluation in switching
topologies
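To make the idea of "number of interactions needed to reach a desired accuracy" concrete, the following is a minimal sketch of plain synchronous averaging on a fixed graph. It is not the heuristic protocol proposed in the study: it measures the convergence time by brute force using the true mean, which a real node would not know. The function names, the step size `1 / (max_degree + 1)`, and the example graphs are all assumptions made for illustration.

```python
def rounds_to_accuracy(neighbors, values, tolerance, max_rounds=100_000):
    """Run synchronous averaging and count the rounds needed until every
    node is within `tolerance` of the true mean.

    neighbors[i] lists the nodes adjacent to node i; the graph is assumed
    to be undirected and connected.
    """
    n = len(values)
    max_degree = max(len(adj) for adj in neighbors)
    # Assumed step size: strictly below 1 / max_degree, which keeps the
    # iteration stable on any connected undirected graph.
    step = 1.0 / (max_degree + 1)
    # The true mean is global knowledge, used here only to measure the error.
    target = sum(values) / n
    x = list(values)
    for rounds in range(max_rounds + 1):
        if max(abs(v - target) for v in x) <= tolerance:
            return rounds, x
        # All nodes move towards their neighbours' values simultaneously.
        x = [x[i] + step * sum(x[j] - x[i] for j in neighbors[i])
             for i in range(n)]
    raise RuntimeError("did not converge; is the graph connected?")


def path_graph(n):
    """Hypothetical helper: nodes 0..n-1 connected in a line."""
    return [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]


def complete_graph(n):
    """Hypothetical helper: every node adjacent to every other node."""
    return [[j for j in range(n) if j != i] for i in range(n)]


if __name__ == "__main__":
    readings = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    for name, graph in (("path", path_graph(6)), ("complete", complete_graph(6))):
        rounds, _ = rounds_to_accuracy(graph, readings, tolerance=1e-6)
        print(name, rounds)
```

Running the same readings over a path and over a complete graph shows how strongly the topology affects the number of rounds, which is the quantity the abstract proposes to predict from local graph metrics instead of measuring it after the fact.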
Large-scale assessment of mobile crowdsensed data: a case study
Mobile crowdsensing (MCS) is a well-established paradigm that leverages mobile devices' ubiquitous nature and processing capabilities for large-scale data collection to monitor phenomena of common interest. Crowd-powered data collection is significantly faster and more cost-effective than traditional methods. However, it poses challenges in assessing the accuracy and extracting information from large volumes of user-generated data. SmartRoadSense (SRS) is an MCS technology that utilises sensors embedded in mobile phones to monitor the quality of road surfaces by computing a crowdsensed road roughness index (referred to as PPE). The present work performs statistical modelling of PPE to analyse its distribution across the road network and elucidate how it can be efficiently analysed and interpreted. Joint statistical analysis of open datasets is then carried out to investigate the effect of both internal and external road features on PPE. Several road properties affecting PPE as predicted are identified, providing evidence that SRS can be effectively applied to assess road quality conditions. Finally, the effect of road category and the speed limit on the mean and standard deviation of PPE is evaluated, incorporating previous results on the relationship between vehicle speed and PPE. These results enable more effective and confident use of the SRS platform and its data to help inform road construction and renovation decisions, especially where a lack of resources limits the use of conventional approaches. The work also exemplifies how crowdsensing technologies can benefit from open data integration and highlights the importance of making coherent, comprehensive, and well-structured open datasets available to the public
Exploring Machine Learning for Untargeted Metabolomics Using Molecular Fingerprints
Background
Metabolomics, the study of substrates and products of cellular metabolism, offers valuable insights into an organism's state under specific conditions and has the potential to revolutionise preventive healthcare and pharmaceutical research. However, analysing large metabolomics datasets remains challenging, with available methods relying on limited and incompletely annotated metabolic pathways.
Methods
This study, inspired by well-established methods in drug discovery, employs machine learning on metabolite fingerprints to explore the relationship of their structure with responses in experimental conditions beyond known pathways, shedding light on metabolic processes. It evaluates fingerprinting effectiveness in representing metabolites, addressing challenges like class imbalance, data sparsity, high dimensionality, duplicate structural encoding, and interpretable features. Feature importance analysis is then applied to reveal key chemical configurations affecting classification, identifying related metabolite groups.
Results
The approach is tested on two datasets: one on Ataxia Telangiectasia and another on endothelial cells under low oxygen. Machine learning on molecular fingerprints predicts metabolite responses effectively, and feature importance analysis aligns with known metabolic pathways, unveiling new affected metabolite groups for further study.
Conclusion
In conclusion, the presented approach leverages the strengths of drug discovery to address critical issues in metabolomics research and aims to bridge the gap between these two disciplines. This work lays the foundation for future research in this direction, possibly exploring alternative structural encodings and machine learning models
Hybrid Personal Medical Digital Assistant Agents
Autonomous intelligent systems are beginning to impact clinical practice as personal medical assistant agents, by leveraging experts' knowledge when needed and exploiting the vast amount of patient data available to clinicians. However, these approaches are seldom integrated. In this paper, we propose an integrated hybrid agent architecture that combines symbolic reasoning with sub-symbolic, data-driven models. Using the PIMA dataset, we demonstrate that this hybrid approach outperforms either approach used alone. Specifically, we show that integrating a logical agent, which uses predefined expert knowledge plans, with rules obtained by symbolic knowledge extraction from machine learning models trained on historical data, improves system reliability and clinical decision-making, while reducing misclassified instances
Topological network features determine convergence rate of distributed average algorithms
Gossip algorithms are message-passing schemes designed to compute averages and other global functions over networks through asynchronous and randomised pairwise interactions. Gossip-based protocols have drawn much attention for achieving robust and fault-tolerant communication while maintaining simplicity and scalability. However, the frequent propagation of redundant information makes them inefficient and resource-intensive. Most previous works have been devoted to deriving performance bounds and developing faster algorithms tailored to specific structures. In contrast, this study focuses on characterising the effect of topological network features on performance so that faster convergence can be engineered by acting on the underlying network rather than the gossip algorithm. The numerical experiments identify the topological limiting factors, the most predictive graph metrics, and the most efficient algorithms for each graph family and for all graphs, providing guidelines for designing and maintaining resource-efficient networks. Regression analyses confirm the explanatory power of structural features and demonstrate the validity of the topological approach in performance estimation. Finally, the high predictive capabilities of local metrics and the possibility of computing them in a distributed manner and at a low computational cost inform the design and implementation of a novel distributed approach for predicting performance from the network topology
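As an illustration of the asynchronous, randomised pairwise interactions described above, here is a minimal sketch of standard pairwise gossip averaging: at each step one edge is picked uniformly at random and its two endpoints replace their values with their mean. This is the textbook scheme, not necessarily any of the specific algorithms compared in the study; the function names, the uniform edge selection, and the ring topology are assumptions for illustration.

```python
import random


def pairwise_gossip(edges, values, tolerance, seed=0, max_steps=1_000_000):
    """Count the pairwise interactions needed until every node is within
    `tolerance` of the true mean.

    edges is a list of (i, j) pairs of a connected undirected graph.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    x = list(values)
    # The true mean is used here only to measure the error.
    target = sum(x) / len(x)
    for steps in range(max_steps + 1):
        if max(abs(v - target) for v in x) <= tolerance:
            return steps, x
        # One asynchronous interaction: a random edge wakes up and both
        # endpoints adopt the average of their two current values.
        i, j = rng.choice(edges)
        x[i] = x[j] = (x[i] + x[j]) / 2.0
    raise RuntimeError("did not converge; is the graph connected?")


def ring_edges(n):
    """Hypothetical helper: edges of a ring with n nodes."""
    return [(i, (i + 1) % n) for i in range(n)]


if __name__ == "__main__":
    steps, final = pairwise_gossip(ring_edges(8), [float(i) for i in range(8)], 1e-6)
    print(steps, final)
```

Each interaction preserves the sum of the node values, so the only value all nodes can agree on is the global average. How many interactions that takes depends on the graph, which is the dependence the study characterises through topological features.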
Medical-informed machine learning: integrating prior knowledge into medical decision systems
Background: Clinical medicine offers a promising arena for applying Machine Learning (ML) models. However, despite numerous studies employing ML in medical data analysis, only a fraction have impacted clinical care. This article underscores the importance of utilising ML in medical data analysis, recognising that ML alone may not adequately capture the full complexity of clinical data, thereby advocating for the integration of medical domain knowledge in ML. Methods: The study conducts a comprehensive review of prior efforts in integrating medical knowledge into ML and maps these integration strategies onto the phases of the ML pipeline, encompassing data pre-processing, feature engineering, model training, and output evaluation. The study further explores the significance and impact of such integration through a case study on diabetes prediction. Here, clinical knowledge, encompassing rules, causal networks, intervals, and formulas, is integrated at each stage of the ML pipeline, resulting in a spectrum of integrated models. Results: The findings highlight the benefits of integration in terms of accuracy, interpretability, data efficiency, and adherence to clinical guidelines. In several cases, integrated models outperformed purely data-driven approaches, underscoring the potential for domain knowledge to enhance ML models through improved generalisation. In other cases, the integration was instrumental in enhancing model interpretability and ensuring conformity with established clinical guidelines. Notably, knowledge integration also proved effective in maintaining performance under limited data scenarios. Conclusions: By illustrating various integration strategies through a clinical case study, this work provides guidance to inspire and facilitate future integration efforts. Furthermore, the study identifies the need to refine domain knowledge representation and fine-tune its contribution to the ML model as the two main challenges to integration and aims to stimulate further research in this direction
Investigating Participation Mechanisms in EU Code Week
Digital competence (DC) is a broad set of skills, attitudes, and knowledge for confident, critical and responsible use of digital technologies in every aspect of life. DC proves essential in the contemporary digital landscape, yet its diffusion is hindered by biases, misunderstandings, and limited awareness. Teaching Informatics in the educational curriculum is increasingly supported by institutions but faces serious challenges, such as teacher upskilling and support. In response, grassroots movements promoting computing literacy in an informal setting have grown, including EU Code Week, whose vision is to develop computing skills while promoting diversity and raising awareness of the importance of digital skills. This study extensively analyses EU Code Week editions spanning 2014 to 2021 across European Union member states, pursuing three primary objectives: firstly, to evaluate teacher engagement in the campaign in terms of penetration, retention, and spatial distribution; secondly, to characterise the multifaceted audience and themes embraced by these initiatives; and lastly, to investigate the influence of socio-economic factors on engagement. The investigation uncovers the underlying mechanisms fostering Code Week's engagement, providing insights to campaign organisers for strategic planning and resource allocation in future editions. Moreover, the analysis reveals that the most engaged areas are characterised by lower income, as well as lower digital literacy, restricted access to technology, and less established computer education, suggesting that Code Week thrives precisely where its impact is most needed
Predicting metabolic responses in genetic disorders via structural representation in machine learning
Metabolomics has emerged as a promising discipline in pharmaceuticals and preventive healthcare. However, analysing large metabolomics datasets remains challenging due to limited and incompletely annotated biological pathways. To address this limitation, we recently proposed training machine learning classifiers on molecular fingerprints of metabolites to predict their responses under specific conditions and analysing feature importance to identify key chemical configurations, providing insights into the affected biological processes. This study extends our previous research by evaluating various metabolite structural representations, including the Morgan fingerprint and its variants and graph-based structural encodings, and by proposing novel representations to improve the resolution and interpretability of state-of-the-art approaches. These structural encodings were evaluated on mass spectrometry metabolomic data for a cellular model of the genetic disease Ataxia Telangiectasia. The study found that machine learning classifiers trained on the new representations improved in classification accuracy and interpretability. Notably, models trained on graph-based encoding do not exhibit performance gains, not even with pre-training on a larger metabolite dataset, underlining the efficacy of our proposed representations. Finally, feature importance analysis across different encoding methods consistently identifies similar structures as relevant for classification, underscoring the robustness of our approach across diverse structural representations
Machine Learning-Enabled Prediction of Metabolite Response in Genetic Disorders
Metabolomics has emerged as a promising discipline in pharmaceuticals and preventive healthcare, holding great potential for disease detection and drug testing. However, analysing large metabolomics datasets remains challenging, with available methods generally relying on limited and incompletely annotated biological pathways. This study introduces a novel approach that leverages machine learning classifiers trained on molecular fingerprints of metabolites to predict their responses under specific experimental conditions. The model is evaluated on mass spectrometry metabolomic data for a cellular model of the genetic disease Ataxia Telangiectasia. In this study, metabolite structures are encoded using the Morgan fingerprint, a well-established technique widely embraced in drug discovery. The suitability of this fingerprinting method for generating unique structural encodings for detected metabolites is analysed, and strategies to mitigate resolution limitations inherent to this fingerprint are introduced. Machine learning classifiers are trained on these fingerprints and exhibit satisfactory performance, providing evidence that the structural encoding holds predictive power over the metabolic response. Feature importance analysis, conducted on the best-performing models, identifies the chemical configurations that have the greatest influence on the classification process, shedding light on affected biological processes. Remarkably, this analysis not only identifies metabolites known to participate in affected pathways but also discovers metabolites not previously associated with the disease, opening up novel opportunities for further exploration. As an initial exploration of the proposed approach, this work lays the foundation for future research that leverages alternative structural encodings, diverse machine learning models, and explainability tools
Robust statistical modeling improves sensitivity of high-throughput RNA structure probing experiments
Structure probing coupled with high-throughput sequencing could revolutionize our understanding of the role of RNA structure in regulation of gene expression. Despite recent technological advances, intrinsic noise and high sequence coverage requirements greatly limit the applicability of these techniques. Here we describe a probabilistic modeling pipeline that accounts for biological variability and biases in the data, yielding statistically interpretable scores for the probability of nucleotide modification transcriptome wide. Using two yeast data sets, we demonstrate that our method has increased sensitivity, and thus our pipeline identifies modified regions on many more transcripts than do existing pipelines. Our method also provides confident predictions at much lower sequence coverage levels than those recommended for reliable structural probing. Our results show that statistical modeling extends the scope and potential of transcriptome-wide structure probing experiments