Experimentation and Analysis of Ensemble Deep Learning in IoT Applications
This paper presents an experimental study of Ensemble Deep Learning (DL) techniques for the analysis of time series data on IoT devices. We have shown in our earlier work that DL outperforms traditional machine learning techniques on fall detection applications, because important features in time series data can be learned automatically and need not be determined manually by a domain expert. However, DL networks generally require large datasets for training. In the health care domain, such as real-time smartwatch-based fall detection, no large annotated datasets are publicly available for training, due to the nature of the problem (i.e., a fall is not a common event). Moreover, fall data is also inherently noisy, since motions captured by the wrist-worn smartwatch can be mistaken for a fall. This paper explores combining DL (Recurrent Neural Networks) with ensemble techniques (Stacking and AdaBoost), using a fall detection application as a case study. We conducted a series of experiments using two different datasets of simulated falls to train various ensemble models. Our results show that an ensemble of deep learning models combined by the stacking technique outperforms a single deep learning model trained on the same data samples, and thus may be better suited for small datasets.
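The stacking idea described above can be sketched in a few lines. This is a minimal, hedged illustration: the paper's level-0 learners are RNNs trained on fall data, whereas here stand-in threshold models (the `ThresholdModel` class and a pre-fitted linear meta-learner are illustrative, not the paper's code) show how level-0 predictions become meta-features for a level-1 combiner.

```python
import numpy as np

class ThresholdModel:
    """Illustrative stand-in for a trained base learner (an RNN in the paper).
    Scores a sample by thresholding one feature."""
    def __init__(self, feature_idx, thresh):
        self.i, self.t = feature_idx, thresh

    def predict_proba(self, X):
        # Return a per-sample score in {0.0, 1.0}
        return (X[:, self.i] > self.t).astype(float)

def stack_predict(base_models, meta_weights, X):
    # Level-0: each base model's prediction becomes one meta-feature column
    meta_X = np.column_stack([m.predict_proba(X) for m in base_models])
    # Level-1: a (pre-trained) linear meta-learner combines the meta-features
    score = meta_X @ meta_weights
    return (score >= 0.5).astype(int)
```

In a real stacking setup the meta-learner's weights would themselves be fitted on held-out level-0 predictions, which is what lets the ensemble correct for individual base models' errors on small datasets.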
Heterogeneous data fusion to type brain tumor biopsies
Current research in biomedical informatics involves the analysis of multiple heterogeneous data sets. These include patient demographics, clinical and pathology data, treatment history, and patient outcomes, as well as gene expression, DNA sequences, and other information sources such as gene ontology. Analysis of these data sets could lead to better disease diagnosis, prognosis, treatment, and drug discovery. In this paper, we use machine learning algorithms to create a novel framework that performs heterogeneous data fusion on both metabolic and molecular datasets obtained from intact brain tumor biopsies, including state-of-the-art high-resolution magic angle spinning (HRMAS) proton (1H) Magnetic Resonance Spectroscopy and gene transcriptome profiling, in order to identify different profiles of brain tumors. Our experimental results show that our framework outperforms any analysis based on an individual dataset.
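One simple fusion strategy consistent with the idea in the abstract is weighted late fusion of per-modality class probabilities. This is only a minimal sketch: the paper's framework is more elaborate, and the function name and weight `w` here are illustrative assumptions.

```python
import numpy as np

def late_fusion_predict(p_metabolic, p_molecular, w=0.5):
    # Weighted late fusion: average the class-probability matrices produced
    # by two per-modality classifiers, then take the most probable class.
    # Rows are samples, columns are tumor classes.
    fused = w * np.asarray(p_metabolic) + (1 - w) * np.asarray(p_molecular)
    return fused.argmax(axis=1)
```

Late fusion is the weakest form of heterogeneous fusion (each modality is modeled independently); intermediate and early fusion couple the modalities more tightly, at the cost of needing aligned feature spaces.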
Complaint-driven Training Data Debugging for Query 2.0
As the need for machine learning (ML) increases rapidly across all industry sectors, there is significant interest among commercial database providers in supporting "Query 2.0", which integrates model inference into SQL queries. Debugging Query 2.0 is very challenging, since an unexpected query result may be caused by bugs in the training data (e.g., wrong labels, corrupted features). In response, we propose Rain, a complaint-driven training data debugging system. Rain allows users to specify complaints over a query's intermediate or final output, and aims to return a minimal set of training examples such that, if they were removed, the complaints would be resolved. To the best of our knowledge, we are the first to study this problem. A naive solution requires retraining an exponential number of ML models. We propose two novel heuristic approaches based on influence functions, both of which require only a linear number of retraining steps. We provide an in-depth analytical and empirical analysis of the two approaches and conduct extensive experiments to evaluate their effectiveness using four real-world datasets. Results show that Rain achieves the highest recall@k among all baselines while still returning results interactively.
Comment: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
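The influence-function idea mentioned in the abstract can be illustrated in its simplest setting. The sketch below assumes linear regression with squared loss, where the Hessian is available in closed form; Rain's actual formulation handles general differentiable models and relational queries, and `influence_on_test` is a hypothetical helper name.

```python
import numpy as np

def influence_on_test(X, y, x_test, y_test):
    """Influence-function approximation of how much removing each training
    example would change the loss on one test point, for linear regression
    with squared loss. Positive influence means removal decreases the test
    loss, i.e., the example is a candidate "bug"."""
    n, d = X.shape
    H = X.T @ X / n                   # Hessian of the average training loss
    H_inv = np.linalg.inv(H)
    theta = H_inv @ (X.T @ y / n)     # closed-form fitted parameters (OLS)
    # Gradient of the test loss w.r.t. parameters at theta
    g_test = (x_test @ theta - y_test) * x_test
    # Per-example training gradients (residual times features)
    g_train = (X @ theta - y)[:, None] * X
    # Influence of removing example i: approximately (1/n) g_i^T H^{-1} g_test
    return (g_train @ (H_inv @ g_test)) / n
```

The point of the approximation is that one Hessian factorization replaces retraining the model once per candidate training example, which is what brings the cost down from exponential (subsets) to linear (examples).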
Spam Filtering with Naive Bayes -- Which Naive Bayes?
Naive Bayes is very popular in commercial and open-source anti-spam e-mail filters. There are, however, several forms of Naive Bayes, something the anti-spam literature does not always acknowledge. We discuss five different versions of Naive Bayes and compare them on six new, non-encoded datasets that contain ham messages of particular Enron users and fresh spam messages. The new datasets, which we make publicly available, are more realistic than previous comparable benchmarks, because they maintain the temporal order of the messages in the two categories and emulate the varying proportion of spam and ham messages that users receive over time. We adopt an experimental procedure that emulates the incremental training of personalized spam filters, and we plot ROC curves that allow us to compare the different versions of Naive Bayes over the entire tradeoff between true positives and true negatives.
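One of the Naive Bayes variants such comparisons typically include, multinomial NB with Laplace smoothing, can be sketched as follows. The function names and the tiny token lists in the usage are illustrative, not the paper's code or data.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    # Multinomial Naive Bayes training: per-class token counts and priors.
    # docs is a list of token lists; labels gives each doc's class.
    counts = {c: Counter() for c in set(labels)}
    priors = Counter(labels)
    for doc, c in zip(docs, labels):
        counts[c].update(doc)
    vocab = {w for doc in docs for w in doc}
    return counts, priors, vocab

def classify(doc, counts, priors, vocab):
    # Pick the class maximizing log P(c) + sum_w log P(w | c),
    # with add-one (Laplace) smoothing for unseen tokens.
    total = sum(priors.values())
    best, best_lp = None, -math.inf
    for c, prior in priors.items():
        n_c = sum(counts[c].values())
        lp = math.log(prior / total)
        for w in doc:
            lp += math.log((counts[c][w] + 1) / (n_c + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best
```

Incremental training of the kind the experimental procedure emulates amounts to updating `counts` and `priors` with each newly labeled message, which multinomial NB supports trivially since it only stores counts.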
In Situ Wireless Channel Visualization Using Augmented Reality and Ray Tracing
This article presents a novel methodology for predicting wireless signal propagation using ray-tracing algorithms and for visualizing signal variations in situ by leveraging Augmented Reality (AR) tools. The proposed system performs a special type of spatial mapping, capable of converting a scanned indoor environment into a vector facet model. A ray-tracing algorithm uses the facet model for wireless signal predictions. Finally, an AR application overlays the signal strength predictions on the physical space in the form of holograms. Although some indoor reconstruction models have already been developed, this paper proposes an image-to-facet algorithm for indoor reconstruction and compares its performance with existing AR algorithms, such as spatial understanding, that are modified to create the required facet models. In addition, the paper orchestrates AR and ray-tracing techniques to provide an in situ network visualization interface. It is shown that the accuracy of the derived facet models is acceptable and that the overall signal predictions are not significantly affected by potential inaccuracies in the indoor reconstruction. With the expected increase of densely deployed indoor 5G networks, it is believed that these types of AR applications for network visualization will play a key role in the successful planning of 5G networks.
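The geometric core of ray-tracing over a facet model is intersecting a ray with a planar facet and reflecting it specularly. The sketch below is a simplified illustration of that step, not the paper's algorithm; real predictors also clip the hit point against the facet's boundary and attenuate the signal per bounce.

```python
import numpy as np

def reflect_ray(origin, direction, facet_point, facet_normal):
    """Intersect a ray with the plane of a facet and return the hit point
    plus the specularly reflected direction, or None on a miss."""
    n = facet_normal / np.linalg.norm(facet_normal)
    d = direction / np.linalg.norm(direction)
    denom = d @ n
    if abs(denom) < 1e-9:
        return None              # ray parallel to the facet plane
    t = ((facet_point - origin) @ n) / denom
    if t <= 0:
        return None              # facet plane is behind the ray
    hit = origin + t * d
    reflected = d - 2 * (d @ n) * n   # mirror d about the facet normal
    return hit, reflected
```

Tracing a transmitter-to-receiver path then repeats this step across facets, accumulating path length and per-reflection loss to estimate received signal strength at each point of the scanned space.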