30 research outputs found
You Are Sensing, but Are You Biased?
Mobile devices are becoming pervasive in our daily lives: they follow us everywhere and we use them for much more than communication. These devices are also equipped with a myriad of sensors that can track human activities, user patterns, location, direction and much more. Consequently, many movements, including sports, quantified self, and mobile health, are starting to rely heavily on this technology, making it pivotal that the sensors offer high accuracy.
However, heterogeneity in hardware manufacturing, slight substrate differences, electronic interference, and external disturbances are just a few of the factors that limit sensor output accuracy, which in turn hinders sensor usage in applications that require very high granularity and precision, such as quantified-self applications. Although sensor calibration is a widely studied topic in the literature, to the best of our knowledge no publicly available research specifically tackles the calibration of mobile phones, and existing methods that could be adapted for mobile devices not only require user interaction but are also not adaptive to change. Additionally, alternative approaches for more granular and accurate sensing exploit body-wide sensor networks that combine mobile phones with additional sensors; as one can imagine, these setups can be bulky, tedious, and not particularly user friendly. Moreover, existing techniques for correcting data post-acquisition can produce inconsistent results because they miss important context information provided by the device itself, which, when used, has been shown to produce better results without imposing a significant power penalty.
In this paper we introduce a novel multiposition calibration scheme that is specifically targeted at mobile devices. Our scheme exploits machine learning techniques to perform an adaptive, power-efficient auto-calibration procedure which achieves high output sensor accuracy compared to state-of-the-art techniques, without requiring any user interaction or special equipment beyond the device itself. Moreover, the energy costs associated with our approach are lower than the alternatives (such as Kalman-filter-based solutions) and the overall power penalty is < 5% compared against the power usage exhibited when using uncalibrated traces, thus enabling our technique to be used efficiently on a wide variety of devices. Finally, our evaluation illustrates that calibrated signals offer a tangible benefit in classification accuracy, ranging from 3% to 10%, over uncalibrated ones when using state-of-the-art classifiers; when using simpler SVM classifiers the improvement grows, ranging from 8% to 12%, making lower-performing classifiers much more reliable. Additionally, we show that for similar activities which are otherwise hard to distinguish, we reach an accuracy of > 95% when using neural network classifiers and > 88% when using SVM classifiers, whereas classification on uncalibrated data only reaches ~85% and ~80% respectively. This can be a make-or-break factor in the use of accelerometer and gyroscope data in applications requiring high accuracy, e.g. sports, health, and games. This work was supported by The Alan Turing Institute under grants: TU/C/000003, TU/B/000069, and EP/N510129/1.
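The abstract does not describe the procedure in detail; as a rough illustration of what multiposition accelerometer calibration can look like, the sketch below fits per-axis bias and scale parameters by least squares so that the corrected readings match gravity's magnitude across several static orientations. The sample data, parametrisation, and use of scipy.optimize.least_squares are assumptions for illustration, not the authors' adaptive, learning-based method.

```python
import numpy as np
from scipy.optimize import least_squares

G = 9.81  # expected gravity magnitude (m/s^2)

def residuals(params, readings):
    # params = [bias_x, bias_y, bias_z, scale_x, scale_y, scale_z]
    bias, scale = params[:3], params[3:]
    corrected = (readings - bias) * scale
    # In each static orientation the corrected vector should have norm G.
    return np.linalg.norm(corrected, axis=1) - G

# Hypothetical raw accelerometer samples, one per static device orientation.
raw = np.array([
    [0.12, 0.05, 9.95],   # flat on a table
    [9.90, 0.10, 0.20],   # standing on one side
    [0.08, 9.70, 0.15],   # standing on one end
    [-9.60, 0.02, 0.25],  # opposite side
    [0.05, -9.85, 0.30],  # opposite end
    [0.10, 0.07, -9.55],  # face down
])

fit = least_squares(residuals,
                    x0=[0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
                    args=(raw,))
bias, scale = fit.x[:3], fit.x[3:]
calibrated = (raw - bias) * scale
print("bias:", bias, "scale:", scale)
```

The gravity-magnitude constraint conveys the core idea of multiposition calibration; the paper's scheme is adaptive rather than a one-off fit of this kind.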
Sounds of COVID-19: exploring realistic performance of audio-based digital testing.
To identify Coronavirus disease (COVID-19) cases efficiently, affordably, and at scale, recent work has shown how audio-based approaches (including cough, breathing and voice) can be used for testing. However, there is a lack of exploration of how biases and methodological decisions impact these tools' performance in practice. In this paper, we explore the realistic performance of audio-based digital testing of COVID-19. To investigate this, we collected a large crowdsourced respiratory audio dataset through a mobile app, alongside symptoms and COVID-19 test results. Within the collected dataset, we selected 5240 samples from 2478 English-speaking participants and split them into participant-independent sets for model development and validation. In addition to controlling the language, we also balanced demographics for model training to avoid potential acoustic bias. We used these audio samples to construct an audio-based COVID-19 prediction model. The unbiased model took features extracted from breathing, coughs and voice signals as predictors and yielded an AUC-ROC of 0.71 (95% CI: 0.65-0.77). We further explored several scenarios with different types of unbalanced data distributions to demonstrate how biases and participant splits affect the performance. With these different, but less appropriate, evaluation strategies, the performance could be overestimated, reaching an AUC of up to 0.90 (95% CI: 0.85-0.95) in some circumstances. We found that an unrealistic experimental setting can result in misleading, sometimes over-optimistic, performance. Instead, we reported complete and reliable results on crowd-sourced data, which would allow medical professionals and policy makers to accurately assess the value of this technology and facilitate its deployment.
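A central methodological point above is evaluating on participant-independent splits, so that no speaker appears in both training and validation data. A minimal sketch of that evaluation pattern, assuming scikit-learn and hypothetical feature, label, and participant arrays (this is not the authors' pipeline):

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Hypothetical data: one row of audio features per sample, a binary COVID-19
# label, and the participant each sample came from.
rng = np.random.default_rng(0)
X = rng.normal(size=(5240, 64))
y = rng.integers(0, 2, size=5240)
participants = rng.integers(0, 2478, size=5240)

# Split by participant, not by sample, so the same speaker never appears
# in both the training and the validation set.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=participants))

model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
scores = model.predict_proba(X[test_idx])[:, 1]
print("AUC-ROC:", roc_auc_score(y[test_idx], scores))
```

Splitting at the sample level instead (ignoring the participant grouping) can leak speaker identity across the split and inflate the AUC, which is the kind of overestimation the abstract reports.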
Federated Linear Dimensionality Reduction
In recent years, the explosive rate of dataset expansion has offered researchers access to an unprecedented amount of information. Moreover, in addition to actual dataset size increases, the types and locations of data generators are more heterogeneous than ever, ranging from traditional servers to a myriad of IoT devices. These facts, coupled with recent emphasis on privacy and data ownership, led to the creation of federated datasets. Such datasets are characterised by their massive size and are usually scattered across decentralised edge devices, each holding their local data samples. As exciting as these federated datasets might be, they introduce an astounding challenge: how to efficiently process federated data at scale? Naturally, given these constraints, centralisation of such datasets is often intractable, thus making traditional analytical methods inapplicable.
This thesis introduces a suite of mathematical advancements that make traditional learning algorithms applicable to the federated setting by summarising massive amounts of information into succinct dataset-specific representations. Concretely, we focus primarily on linear dimensionality reduction and, in particular, on Principal Component Analysis (PCA) due to its pervasiveness and its ability to process unstructured data. The first advancement we introduce is a novel algorithm that performs streaming, memory-limited dimensionality reduction at the edge using a generalisation of incremental Singular Value Decomposition (SVD). Further, we provide a rank-adaptive SVD extension able to account for distribution shifts over time. Subsequently, building upon these constructions, we present an (ε,δ)-differentially private federated algorithm for PCA. To achieve federation, we put forth a lightweight merging algorithm that allows each subproblem to be processed locally at the edge, with the local results then combined and propagated through merging. We guarantee differential privacy via an input-perturbation scheme in which the covariance matrix of a dataset is perturbed with a non-symmetric random Gaussian matrix.
To evaluate the practicality of our innovations, we describe an algorithm able to perform task scheduling on federated data centres. The scheduler enables each decentralised node to incrementally compute its local model and independently decide whether to accept an incoming job based on the workload seen thus far. Finally, we complement our findings with an evaluation on synthetic as well as real-world datasets, including sensor node measurements, hand-written images, and wine quality readings, covering a wide range of data modalities and dimensionalities. This thesis and the associated studentship were supported by The Alan Turing Institute under grants: TU/C/000003, TU/B/000069, and EP/N510129/1.
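The merging step mentioned above combines local low-rank summaries into a single global one. A minimal sketch of one common way to merge two local truncated-SVD summaries is shown below; it is an illustrative construction under assumed data, not necessarily the thesis' exact algorithm.

```python
import numpy as np

def local_summary(X, r):
    """Truncated SVD of a local data block: keep the top-r factors."""
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :r], S[:r]

def merge(U1, S1, U2, S2, r):
    """Merge two local subspace summaries into one rank-r summary."""
    # Stack the scaled bases and re-factorise the concatenation.
    combined = np.hstack([U1 * S1, U2 * S2])
    U, S, _ = np.linalg.svd(combined, full_matrices=False)
    return U[:, :r], S[:r]

# Two edge nodes holding (features x samples) blocks that share a common
# low-rank structure plus noise.
rng = np.random.default_rng(1)
basis = np.linalg.qr(rng.normal(size=(20, 5)))[0]   # shared 5-dim subspace
X1 = basis @ rng.normal(size=(5, 300)) + 0.1 * rng.normal(size=(20, 300))
X2 = basis @ rng.normal(size=(5, 500)) + 0.1 * rng.normal(size=(20, 500))
r = 5

U1, S1 = local_summary(X1, r)
U2, S2 = local_summary(X2, r)
U, S = merge(U1, S1, U2, S2, r)

# Compare against the centralised SVD (never materialised in a real deployment).
U_full, _, _ = np.linalg.svd(np.hstack([X1, X2]), full_matrices=False)
# This norm is close to sqrt(r) when the merged and centralised subspaces agree.
print("subspace alignment:", np.linalg.norm(U.T @ U_full[:, :r]))
```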
CPU Scheduling in Data Centers Using Asynchronous Finite-Time Distributed Coordination Mechanisms
We propose an asynchronous iterative scheme that allows a set of interconnected nodes to distributively reach an agreement within a pre-specified bound in a finite number of steps. While this scheme could be adopted in a wide variety of applications, we discuss it within the context of task scheduling for data centers. In this context, the algorithm is guaranteed to approximately converge to the optimal scheduling plan, given the available resources, in a finite number of steps. Furthermore, by being asynchronous, the proposed scheme is able to account for the uncertainty introduced by straggler nodes or communication issues, in the form of latency variability, while still converging to the target objective. In addition, extensive empirical evaluation through simulations shows that the proposed method exhibits state-of-the-art performance.
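The abstract does not spell out the coordination mechanism; as background, the sketch below shows the classical ratio-consensus iteration that such schemes build on, where each node's ratio y_i/z_i converges to the network-wide average of the initial values. It is a synchronous illustration over an assumed digraph, not the paper's asynchronous finite-time variant.

```python
import numpy as np

# Directed communication topology: out_neighbors[i] are the nodes i sends to.
out_neighbors = {0: [1], 1: [2, 3], 2: [0], 3: [0, 2]}
n = len(out_neighbors)

# Initial local values (e.g. each node's available CPU capacity).
y = np.array([4.0, 7.0, 1.0, 8.0])
z = np.ones(n)

for _ in range(100):
    y_new, z_new = np.zeros(n), np.zeros(n)
    for i in range(n):
        # Each node splits its mass equally among itself and its out-neighbors,
        # which keeps the update column-stochastic on the digraph.
        share = 1.0 / (len(out_neighbors[i]) + 1)
        for j in out_neighbors[i] + [i]:
            y_new[j] += share * y[i]
            z_new[j] += share * z[i]
    y, z = y_new, z_new

print("ratios:", y / z)                 # all close to the average of y(0)
print("true average:", np.mean([4.0, 7.0, 1.0, 8.0]))
```

In the paper, each node additionally detects, in finite time and despite asynchrony, when its ratio is within the pre-specified bound of the agreement value; the loop above only illustrates the underlying averaging dynamics.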
MOSES: A Streaming Algorithm for Linear Dimensionality Reduction
This paper introduces the Memory-limited Online Subspace Estimation Scheme (MOSES) for both estimating the principal components of data and reducing its dimension. More specifically, consider a scenario where the data vectors are presented sequentially to a user who has limited storage and processing time available, for example in the context of sensor networks. In this scenario, MOSES maintains an estimate of the leading principal components of the data that has arrived so far and also reduces its dimension. In terms of its origins, MOSES slightly generalises the popular incremental Singular Value Decomposition (SVD) to handle thin blocks of data. This simple generalisation is in part what allows us to complement MOSES with a comprehensive statistical analysis that is not available for incremental SVD, despite its empirical success. This generalisation also enables us to concretely interpret MOSES as an approximate solver for the underlying non-convex optimisation program. We also find that MOSES shows state-of-the-art performance in our numerical experiments with both synthetic and real-world datasets.
MOSES: A Streaming Algorithm for Linear Dimensionality Reduction
This paper introduces the Memory-limited Online Subspace Estimation Scheme (MOSES) for both estimating the principal components of streaming data and reducing its dimension. More specifically, in various applications such as sensor networks, the data vectors are presented sequentially to a user who has limited storage and processing time available. Applied to such problems, MOSES can provide a running estimate of the leading principal components of the data that has arrived so far and also reduce its dimension. MOSES generalises the popular incremental Singular Value Decomposition (iSVD) to handle thin blocks of data, rather than just vectors. This minor generalisation in part allows us to complement MOSES with a comprehensive statistical analysis, thus providing the first theoretically sound variant of iSVD, which has been lacking despite the empirical success of this method. This generalisation also enables us to concretely interpret MOSES as an approximate solver for the underlying non-convex optimisation program. We find that MOSES consistently surpasses the state of the art in our numerical experiments with both synthetic and real-world datasets, while being computationally inexpensive.
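As a rough illustration of the block-incremental SVD idea that MOSES builds on, the sketch below keeps a rank-r factorisation and folds in one thin block of columns at a time. This is a generic incremental SVD update under assumed notation, not the published MOSES algorithm itself.

```python
import numpy as np

def update(U, S, block, r):
    """Fold a new thin block of columns into a rank-r SVD estimate."""
    if U is None:                         # first block: plain truncated SVD
        U0, S0, _ = np.linalg.svd(block, full_matrices=False)
        return U0[:, :r], S0[:r]
    # Project the block onto the current subspace and keep the residual.
    proj = U.T @ block
    resid = block - U @ proj
    Q, R = np.linalg.qr(resid)
    # Small core matrix combining the old singular values and the new block.
    k = S.size
    core = np.block([[np.diag(S), proj],
                     [np.zeros((Q.shape[1], k)), R]])
    Uc, Sc, _ = np.linalg.svd(core, full_matrices=False)
    U_new = np.hstack([U, Q]) @ Uc
    return U_new[:, :r], Sc[:r]

# Stream a 50-dimensional signal in thin blocks of 10 samples each.
rng = np.random.default_rng(0)
U, S = None, None
for _ in range(200):
    block = rng.normal(size=(50, 10))
    U, S = update(U, S, block, r=5)
print("estimated top singular values:", S)
```

Only the current rank-r factors and the incoming block are ever held in memory, which is what makes this style of update suitable for the memory-limited streaming setting described above.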
An Asynchronous Approximate Distributed Alternating Direction Method of Multipliers in Digraphs
In this work, we consider the asynchronous distributed optimization problem in which each node has its own convex cost function and can communicate directly only with its neighbors, as determined by a directed communication topology (directed graph or digraph). First, we reformulate the optimization problem so that the Alternating Direction Method of Multipliers (ADMM) can be utilized. Then, we propose an algorithm, herein called Asynchronous Approximate Distributed Alternating Direction Method of Multipliers (AsyAD-ADMM), which uses finite-time asynchronous approximate ratio consensus to solve the multi-node convex optimization problem; every node performs iterative computations and exchanges information with its neighbors asynchronously. More specifically, at every iteration of AsyAD-ADMM, each node solves a local convex optimization problem for one of the primal variables and utilizes a finite-time asynchronous approximate consensus protocol to obtain a value of the other primal variable that is close to the optimal one, since the cost function for the second primal variable is not decomposable. If the individual cost functions are convex but not necessarily differentiable, the proposed algorithm converges at a rate of O(1/k), where k is the iteration counter. The efficacy of AsyAD-ADMM is exemplified via a proof-of-concept distributed least-squares optimization problem with different performance-influencing factors investigated. This work was supported by the Academy of Finland under Grant 320043. The work of T. Charalambous was supported by the Academy of Finland under Grant 317726.
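For readers unfamiliar with the underlying primitive, here is a minimal synchronous consensus-ADMM sketch for a distributed least-squares problem. The asynchrony and the ratio-consensus step that distinguish AsyAD-ADMM are deliberately left out (the consensus step is replaced by a plain average), and all data and parameters are illustrative assumptions.

```python
import numpy as np

# Distributed least squares: minimise sum_i ||A_i x - b_i||^2 over a shared x.
rng = np.random.default_rng(2)
x_true = rng.normal(size=5)
blocks = [rng.normal(size=(30, 5)) for _ in range(4)]
nodes = [(A, A @ x_true + 0.05 * rng.normal(size=30)) for A in blocks]

rho = 1.0
z = np.zeros(5)                                  # shared consensus variable
x = [np.zeros(5) for _ in nodes]                 # local primal variables
u = [np.zeros(5) for _ in nodes]                 # scaled dual variables

for _ in range(100):
    # Local step: each node solves its own regularised least-squares problem.
    for i, (A, b) in enumerate(nodes):
        x[i] = np.linalg.solve(A.T @ A + rho * np.eye(5),
                               A.T @ b + rho * (z - u[i]))
    # Consensus step: in the paper this quantity is obtained via finite-time
    # asynchronous approximate ratio consensus; here we simply average.
    z = np.mean([x[i] + u[i] for i in range(len(nodes))], axis=0)
    # Dual step.
    for i in range(len(nodes)):
        u[i] += x[i] - z

print("recovered x:", z)
print("true x:     ", x_true)
```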
Federated Principal Component Analysis.
We present a federated, asynchronous, and (ε, δ)-differentially private algorithm for PCA in the memory-limited setting. Our algorithm incrementally computes local model updates using a streaming procedure and adaptively estimates its r leading principal components when only O(dr) memory is available, where d is the dimensionality of the data. We guarantee differential privacy via an input-perturbation scheme in which the covariance matrix of a dataset X is perturbed with a non-symmetric random Gaussian matrix with variance in O((d/n)^2 log(d)), thus improving upon the state of the art. Furthermore, contrary to previous federated or distributed algorithms for PCA, our algorithm is also invariant to permutations in the incoming data, which provides robustness against straggler or failed nodes. Numerical simulations show that, while using limited memory, our algorithm exhibits performance that closely matches or outperforms traditional non-federated algorithms, and in the absence of communication latency, it exhibits attractive horizontal scalability.
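A rough sketch of the input-perturbation idea: the empirical covariance is perturbed with Gaussian noise before the principal components are extracted. The noise scale below merely mirrors the O((d/n)^2 log d) variance quoted above with an arbitrary constant; it is not a calibrated (ε, δ) guarantee and not the paper's exact mechanism.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, r = 10_000, 50, 5
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))   # hypothetical dataset

# Empirical covariance of the (centred) data.
Xc = X - X.mean(axis=0)
C = (Xc.T @ Xc) / n

# Input perturbation: add a non-symmetric Gaussian noise matrix whose variance
# follows the O((d/n)^2 log d) scaling mentioned in the abstract (constant
# chosen arbitrarily here; a real deployment must calibrate it to (eps, delta)).
sigma2 = (d / n) ** 2 * np.log(d)
N = rng.normal(scale=np.sqrt(sigma2), size=(d, d))
C_priv = C + N

# Principal components of the perturbed covariance (symmetrised so eigh applies).
eigvals, eigvecs = np.linalg.eigh((C_priv + C_priv.T) / 2)
top = eigvecs[:, np.argsort(eigvals)[::-1][:r]]
print("top", r, "principal directions, shape:", top.shape)
```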
Recursive neural networks: recent results and applications
The basic principles and functions of neural networks are inspired by the nervous system of living organisms: they aim to simulate the neurons of the human brain in order to solve complicated real-world problems, working in a forward-only manner. A recursive neural network, on the other hand, applies a recursive design principle over a given sequence input to arrive at a scalar assessment of the structured input. This makes it well suited to input sequences in which each element must be processed in dependence on the elements that came before it, a setting that arises in many problems of our era. A common example is a device such as Amazon Alexa, which performs speech recognition: given an audio input source, it tries to predict logical expressions extracted from the different audio segments in order to form complete sentences. RNNs, however, are not without problems or difficulties. Today's problems are becoming ever more complex and involve parameters at big-data scale, creating a need for bigger and deeper RNNs. This paper aims to explore these problems and ways to mitigate them, while also describing the beneficial nature of RNNs and listing different uses of state-of-the-art RNNs in problems such as those mentioned above.
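To make the sequential-dependence point concrete, here is a minimal sketch of a single recurrent cell stepping through a sequence, where each hidden state depends on the previous one before a scalar assessment is produced. The dimensions and weights are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
input_dim, hidden_dim, seq_len = 8, 16, 20

# Randomly initialised recurrent cell parameters (illustrative only).
W_in = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_rec = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)
w_out = rng.normal(scale=0.1, size=hidden_dim)

# A hypothetical input sequence, e.g. frames of audio features.
sequence = rng.normal(size=(seq_len, input_dim))

h = np.zeros(hidden_dim)
for x_t in sequence:
    # The new hidden state depends on the current input AND the previous state.
    h = np.tanh(W_in @ x_t + W_rec @ h + b)

# Scalar assessment of the whole structured input, as described above.
score = w_out @ h
print("sequence score:", score)
```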