19 research outputs found

    BioDiffusion: A Versatile Diffusion Model for Biomedical Signal Synthesis

    Machine learning tasks involving biomedical signals frequently face limited data availability, imbalanced datasets, labeling complexity, and interference from measurement noise. These challenges often prevent machine learning algorithms from being trained effectively. To address them, we introduce BioDiffusion, a diffusion-based probabilistic model optimized for synthesizing multivariate biomedical signals. BioDiffusion produces high-fidelity, non-stationary, multivariate signals for a range of tasks, including unconditional, label-conditional, and signal-conditional generation, and these synthesized signals help mitigate the challenges above. We assess the quality of the synthesized data both qualitatively and quantitatively and show that it can improve accuracy in machine learning tasks on biomedical signals. Furthermore, in comparisons with current state-of-the-art time-series generative models, the empirical evidence suggests that BioDiffusion generates biomedical signals of higher quality.

    Experimentation and Analysis of Ensemble Deep Learning in IoT Applications

    This paper presents an experimental study of Ensemble Deep Learning (DL) techniques for analyzing time series data on IoT devices. Our earlier work showed that DL outperforms traditional machine learning techniques in fall detection applications, because important features in time series data can be learned automatically rather than determined manually by a domain expert. However, DL networks generally require large training datasets. In health care applications such as real-time smartwatch-based fall detection, no large annotated datasets are publicly available for training, due to the nature of the problem (a fall is not a common event). Fall data is also inherently noisy, since ordinary wrist motions recorded by the smartwatch can be mistaken for a fall. This paper explores combining DL (a Recurrent Neural Network) with ensemble techniques (Stacking and AdaBoost), using a fall detection application as a case study. We conducted a series of experiments in which various ensemble models were trained on two different datasets of simulated falls. Our results show that an ensemble of deep learning models combined by stacking outperforms a single deep learning model trained on the same data samples, and may therefore be better suited to small datasets.
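    The stacking idea above can be illustrated with a minimal sketch, assuming already-trained base models and a logistic-regression meta-learner; the abstract specifies neither the meta-learner nor the input features. The two base models below (model_a, model_b) are hypothetical stand-ins for the paper's RNNs, each scoring one feature of a toy window, and the meta-learner is fit on a held-out set so that it learns how to weight their outputs.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical stand-in for a trained base learner (an RNN in the paper):
# any callable mapping one feature vector to a fall probability in [0, 1].
BaseModel = Callable[[Sequence[float]], float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def meta_features(models: List[BaseModel], x: Sequence[float]) -> List[float]:
    """Level-1 features: one prediction per base model."""
    return [m(x) for m in models]


def fit_meta_learner(models: List[BaseModel], xs: List[Sequence[float]],
                     ys: List[int], lr: float = 0.5,
                     epochs: int = 2000) -> List[float]:
    """Fit logistic-regression weights [bias, w_1, ..., w_k] on base outputs.

    xs/ys should be held out from the base models' training data, so the
    meta-learner sees realistic rather than memorised base predictions.
    """
    feats = [meta_features(models, x) for x in xs]
    w = [0.0] * (len(models) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for f, y in zip(feats, ys):
            err = sigmoid(w[0] + sum(wi * fi for wi, fi in zip(w[1:], f))) - y
            grad[0] += err
            for j, fj in enumerate(f):
                grad[j + 1] += err * fj
        # One batch gradient-descent step on the mean log-loss.
        w = [wi - lr * g / len(ys) for wi, g in zip(w, grad)]
    return w


def stacked_predict(models: List[BaseModel], w: List[float],
                    x: Sequence[float]) -> float:
    f = meta_features(models, x)
    return sigmoid(w[0] + sum(wi * fi for wi, fi in zip(w[1:], f)))


# Usage with two hypothetical base models, each reading one toy feature.
def model_a(x: Sequence[float]) -> float:
    return x[0]


def model_b(x: Sequence[float]) -> float:
    return x[1]


models = [model_a, model_b]
# Toy held-out set: label 1 ("fall") only when both features are high.
xs = [(0.9, 0.9), (0.8, 0.9), (0.9, 0.8),
      (0.9, 0.1), (0.1, 0.9), (0.1, 0.1), (0.8, 0.3), (0.3, 0.8)]
ys = [1, 1, 1, 0, 0, 0, 0, 0]

w = fit_meta_learner(models, xs, ys)
scores = [stacked_predict(models, w, x) for x in xs]
```

    Neither base model alone ranks every toy fall above every non-fall (each gives some non-fall a higher score than some fall), but the stacked score does, because the meta-learner combines both signals.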

    Heterogeneous data fusion to type brain tumor biopsies

    Current research in biomedical informatics involves the analysis of multiple heterogeneous data sets, including patient demographics, clinical and pathology data, treatment history, and patient outcomes, as well as gene expression, DNA sequences, and other information sources such as gene ontology. Analyzing these data sets could lead to better disease diagnosis, prognosis, treatment, and drug discovery. In this paper, we use machine learning algorithms to build a novel framework that performs heterogeneous data fusion on both metabolic and molecular datasets, including state-of-the-art high-resolution magic angle spinning (HRMAS) proton (1H) Magnetic Resonance Spectroscopy and gene transcriptome profiling, in order to type intact brain tumor biopsies and identify different brain tumor profiles. Our experimental results show that the framework outperforms analyses based on any individual dataset.

    Complaint-driven Training Data Debugging for Query 2.0

    As the need for machine learning (ML) grows rapidly across all industry sectors, there is significant interest among commercial database providers in supporting "Query 2.0", which integrates model inference into SQL queries. Debugging Query 2.0 is very challenging, since an unexpected query result may be caused by bugs in the training data (e.g., wrong labels, corrupted features). In response, we propose Rain, a complaint-driven training data debugging system. Rain allows users to specify complaints over the query's intermediate or final output, and aims to return a minimum set of training examples whose removal would resolve the complaints. To the best of our knowledge, we are the first to study this problem. A naive solution requires retraining an exponential number of ML models. We propose two novel heuristic approaches based on influence functions, both of which require only a linear number of retraining steps. We provide an in-depth analytical and empirical analysis of the two approaches and conduct extensive experiments on four real-world datasets to evaluate their effectiveness. Results show that Rain achieves the highest recall@k among all the baselines while still returning results interactively. Comment: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
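    To see why the naive solution mentioned above is exponential, the sketch below tries removal sets in order of size and retrains a model for each one until a complaint is resolved. It is a hypothetical toy, not Rain: the 1-nearest-neighbour "model" (train_1nn), the search helper (naive_debug), the data, and the single-prediction complaint are all assumptions for illustration, and Rain's influence-function heuristics are not implemented here.

```python
from itertools import combinations
from typing import Callable, List, Optional, Sequence, Tuple

Example = Tuple[float, int]        # (feature, label)
Model = Callable[[float], int]     # feature -> predicted label


def train_1nn(data: Sequence[Example]) -> Model:
    """Hypothetical toy "training": a 1-nearest-neighbour classifier."""
    points = list(data)

    def predict(x: float) -> int:
        return min(points, key=lambda p: abs(p[0] - x))[1]

    return predict


def naive_debug(train: Callable[[Sequence[Example]], Model],
                data: Sequence[Example],
                complaint_resolved: Callable[[Model], bool]
                ) -> Tuple[Optional[List[int]], int]:
    """Try every removal set, smallest first, retraining for each one.

    Returns (indices of a minimum removal set or None, models trained).
    """
    trained = 0
    for k in range(len(data)):  # always keep at least one example
        for removed in combinations(range(len(data)), k):
            kept = [ex for i, ex in enumerate(data) if i not in removed]
            model = train(kept)
            trained += 1
            if complaint_resolved(model):
                return list(removed), trained
    return None, trained


# Toy data: (1.1, 1) is a mislabelled example near the query point.
data = [(0.0, 0), (1.0, 0), (1.1, 1), (3.0, 1), (4.0, 1)]
query = 1.2
# Complaint: the model should predict 0 for the query point.
removed, trained = naive_debug(train_1nn, data, lambda model: model(query) == 0)
print(removed, trained)  # [2] 4
```

    With n training examples this search may retrain on the order of 2^n models before it succeeds, which is the cost that linear-retraining heuristics are meant to avoid.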

    Spam Filtering with Naive Bayes -- Which Naive Bayes?

    Naive Bayes is very popular in commercial and open-source anti-spam e-mail filters. There are, however, several forms of Naive Bayes, something the anti-spam literature does not always acknowledge. We discuss five different versions of Naive Bayes and compare them on six new, non-encoded datasets that contain ham messages of particular Enron users and fresh spam messages. The new datasets, which we make publicly available, are more realistic than previous comparable benchmarks, because they maintain the temporal order of the messages in the two categories and emulate the varying proportion of spam and ham messages that users receive over time. We adopt an experimental procedure that emulates the incremental training of personalized spam filters, and we plot ROC curves that allow us to compare the different versions of NB over the entire tradeoff between true positives and true negatives.
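    The abstract does not name the five versions it compares. As a minimal sketch of how two forms of Naive Bayes can differ, the code below implements two widely used ones: multinomial Naive Bayes, which scores every word occurrence, and multivariate Bernoulli Naive Bayes, which scores the presence or absence of every vocabulary word. Both use Laplace smoothing; the toy corpus and all function names are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

Doc = List[str]


@dataclass
class NBModel:
    classes: List[str]
    vocab: FrozenSet[str]
    log_prior: Dict[str, float]
    token_counts: Dict[str, Counter]  # multinomial: word occurrences per class
    doc_freq: Dict[str, Counter]      # Bernoulli: documents per class containing word
    n_docs: Counter                   # documents per class


def train(docs: List[Doc], labels: List[str]) -> NBModel:
    classes = sorted(set(labels))
    n_docs = Counter(labels)
    token_counts = {c: Counter() for c in classes}
    doc_freq = {c: Counter() for c in classes}
    for doc, c in zip(docs, labels):
        token_counts[c].update(doc)   # every occurrence
        doc_freq[c].update(set(doc))  # presence only
    vocab = frozenset(w for doc in docs for w in doc)
    log_prior = {c: math.log(n_docs[c] / len(labels)) for c in classes}
    return NBModel(classes, vocab, log_prior, token_counts, doc_freq, n_docs)


def multinomial_score(m: NBModel, doc: Doc, c: str) -> float:
    # Each token occurrence adds log P(w|c); absent words add nothing.
    total = sum(m.token_counts[c].values())
    score = m.log_prior[c]
    for w in doc:
        if w in m.vocab:  # ignore words never seen in training
            score += math.log((m.token_counts[c][w] + 1) / (total + len(m.vocab)))
    return score


def bernoulli_score(m: NBModel, doc: Doc, c: str) -> float:
    # Every vocabulary word adds log p if present, log(1 - p) if absent.
    present = set(doc)
    score = m.log_prior[c]
    for w in m.vocab:
        p = (m.doc_freq[c][w] + 1) / (m.n_docs[c] + 2)
        score += math.log(p if w in present else 1.0 - p)
    return score


def classify(m: NBModel, doc: Doc,
             scorer: Callable[[NBModel, Doc, str], float]) -> str:
    return max(m.classes, key=lambda c: scorer(m, doc, c))


docs = [["cheap", "pills", "cheap"], ["win", "cash", "now"], ["cheap", "cash"],
        ["meeting", "tomorrow"], ["project", "meeting", "notes"], ["lunch", "tomorrow"]]
labels = ["spam"] * 3 + ["ham"] * 3
model = train(docs, labels)

msg = ["cheap", "cheap", "cheap", "meeting", "tomorrow"]
print(classify(model, msg, multinomial_score))  # spam
print(classify(model, msg, bernoulli_score))    # ham
```

    On this toy message the two forms disagree: the multinomial model is swayed by the three occurrences of "cheap", while the Bernoulli model counts "cheap" only once and also weighs the presence of "meeting" and "tomorrow" and the absence of the other spam words.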

    In Situ Wireless Channel Visualization Using Augmented Reality and Ray Tracing

    This article presents a novel methodology for predicting wireless signal propagation using ray-tracing algorithms and visualizing signal variations in situ with Augmented Reality (AR) tools. The proposed system performs a special type of spatial mapping that converts a scanned indoor environment into a vector facet model. A ray-tracing algorithm uses the facet model to predict wireless signals. Finally, an AR application overlays the signal strength predictions on the physical space in the form of holograms. Although some indoor reconstruction models have already been developed, this paper proposes an image-to-facet algorithm for indoor reconstruction and compares its performance with existing AR algorithms, such as spatial understanding, modified to create the required facet models. In addition, the paper integrates AR and ray-tracing techniques to provide an in situ network visualization interface. The results show that the accuracy of the derived facet models is acceptable and that the overall signal predictions are not significantly affected by potential inaccuracies in the indoor reconstruction. With the expected increase in densely deployed indoor 5G networks, it is believed that such AR applications for network visualization will play a key role in the successful planning of 5G networks.

    Correlation Analysis-Based Classification of Human Activity Time Series
