19 research outputs found

    Machine Learning for Human Activity Detection in Smart Homes

    Recognizing human activities in domestic environments from audio and active power consumption sensors is a challenging task: on the one hand, environmental sound signals are multi-source, heterogeneous, and time-varying; on the other hand, active power consumption varies significantly between electrical appliances of similar type. Many systems have been proposed to process environmental sound signals for event detection in ambient assisted living applications. Typically, these systems use feature extraction, selection, and classification. However, despite major advances, several important questions remain unanswered, especially in real-world settings. A part of this thesis contributes to the body of knowledge in the field by addressing the following problems for ambient sounds recorded in various real-world kitchen environments: 1) which features and which classifiers are most suitable in the presence of background noise? 2) what is the effect of signal duration on recognition accuracy? 3) how do the SNR and the distance between the microphone and the audio source affect the recognition accuracy in an environment in which the system was not trained? We show that for systems that use traditional classifiers, it is beneficial to combine gammatone frequency cepstral coefficients and discrete wavelet transform coefficients and to use a gradient boosting classifier. For systems based on deep learning, we consider 1D and 2D CNNs using mel-spectrogram energies and mel-spectrogram images as inputs, respectively, and show that the 2D CNN outperforms the 1D CNN. We obtained competitive classification results for two such systems and validated the performance of our algorithms on public datasets (Google Brain/TensorFlow Speech Recognition Challenge and the 2017 Detection and Classification of Acoustic Scenes and Events Challenge). 
For energy-based human activity recognition in a household environment, we apply machine learning techniques to infer the state of household appliances from their energy consumption data and use rule-based scenarios that exploit these states to detect human activity. Since most activities within a house are related to the operation of an electrical appliance, this unimodal approach has the significant advantage of using inexpensive smart plugs and smart meters for each appliance. This part of the thesis proposes the use of unobtrusive and easy-to-install tools (smart plugs) for data collection and a decision engine that combines energy signal classification using dominant classifiers (compared in advance with grid search) and a probabilistic measure for appliance usage. It helps preserve the privacy of the resident, since all the activities are stored in a local database. DNNs have received great research interest in the field of computer vision. In this thesis we adapted different architectures for the problem of human activity recognition. We analyze the quality of the extracted features, and more specifically how model architectures and parameters affect the ability of the features automatically extracted by DNNs to separate activity classes in the final feature space. Additionally, the architectures that we applied to our main problem were also applied to text classification, in which we consider the input text as an image and apply 2D CNNs to learn the local and global semantics of the sentences from the variations of the visual patterns of words. This work serves as a first step toward creating a dialogue agent that would not require any natural language preprocessing. 
Finally, since in many domestic environments human speech is present with other environmental sounds, we developed a Convolutional Recurrent Neural Network to separate the sound sources and applied novel post-processing filters in order to have an end-to-end noise-robust system. Our algorithm ranked first in the Apollo-11 Fearless Steps Challenge.
Funding: Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement No. 676157, project ACROSSIN

    Identification and Recognition of Remote-Controlled Malware

    This thesis encapsulates research on the detection of botnets. First, we design and implement Sandnet, an observation and monitoring infrastructure to study the botnet phenomenon. Using Sandnet, we evaluate detection approaches based on traffic analysis and rogue visual monetization. In particular, we identify and recognize botnet C&C channels with the help of traffic analysis. To a large degree, our clustering and classification leverage the sequence of message lengths per flow. As a result, our implementation, CoCoSpot, proves to reliably detect active C&C communication of a variety of botnet families, even in the face of fully encrypted C&C messages. Furthermore, we found a botnet that uses DNS as the carrier protocol for its command and control channel. With the help of statistical entropy as well as behavioral features, we design and implement a classifier that detects DNS-based C&C, even in mixed network traffic of benign users. Finally, perceptual clustering of Sandnet screenshots enables us to group malware into rogue visual monetization campaigns and study their monetization properties.
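
The abstract names statistical entropy as one feature for spotting DNS-based C&C. The sketch below shows one plausible form of such a feature: the Shannon entropy, in bits per byte, of a DNS query name. The function name `shannon_entropy` and the sample query names are hypothetical, and the abstract does not say which fields the thesis actually measures, so this is an illustration of the idea rather than the author's classifier.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    # H = -sum(p * log2(p)) over the observed byte values
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical query names: a readable host name and a random-looking label
# of the kind that encoded or encrypted payloads tend to produce.
readable = b"mail.example.com"
encoded = b"q7x2kd9vbz0mf4tw.example.com"

if __name__ == "__main__":
    print(round(shannon_entropy(readable), 3))
    print(round(shannon_entropy(encoded), 3))
```

A single entropy value is a weak signal on its own, which is consistent with the abstract combining it with behavioral features.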

    Efficient data reconfiguration for today's cloud systems

    Performance of big data systems largely relies on efficient data reconfiguration techniques. Data reconfiguration operations deal with changing configuration parameters that affect data layout in a system. They can be user-initiated, like changing the shard key or block size in NoSQL databases, or system-initiated, like changing replication in a distributed interactive analytics engine. Current data reconfiguration schemes are heuristics at best and often do not scale well as data volume grows. As a result, system performance suffers. In this thesis, we show that data reconfiguration mechanisms can be done in the background by using new optimal or near-optimal algorithms and coupling them with performant system designs. We explore four different data reconfiguration operations affecting three popular types of systems: storage, real-time analytics, and batch analytics. In NoSQL databases (storage), we explore new strategies for changing table-level configuration and for compaction, as they improve read/write latencies. In distributed interactive analytics engines, a good replication algorithm can save costs by judiciously using memory that is sufficient to provide the highest throughput and low latency for queries. Finally, in batch processing systems, we explore prefetching and caching strategies that can improve the number of production jobs meeting their SLOs. All these operations happen in the background without affecting the fast path. Our contributions in each of the problems are two-fold: 1) we model the problem and design algorithms inspired by well-known theoretical abstractions, 2) we design and build a system on top of popular open source systems used in companies today. Finally, using real-life workloads, we evaluate the efficacy of our solutions. Morphus and Parqua provide several 9s of availability while changing table-level configuration parameters in databases. 
By halving memory usage in a distributed interactive analytics engine, Getafix reduces the cost of deploying the system by 10 million dollars annually and improves query throughput. We are the first to model the problem of compaction and provide formal bounds on its runtime. Finally, NetCachier helps 30% more production jobs to meet their SLOs compared to the existing state of the art.

    Machine learning for particle identification in the LHCb detector

    The LHCb experiment is a specialised b-physics experiment at the Large Hadron Collider at CERN. It has a broad physics program with the primary objective being the search for CP violations that would explain the matter-antimatter asymmetry of the Universe. LHCb studies very rare phenomena, making it necessary to process millions of collision events per second to gather enough data in a reasonable time frame. Thus software and data analysis tools are essential for the success of the experiment. Particle identification (PID) is a crucial ingredient of most of the LHCb results. The quality of the particle identification depends heavily on the data processing algorithms. This dissertation aims to leverage recent advances in the field of machine learning to improve the PID at LHCb. The thesis contribution consists of four essential parts related to LHCb internal projects. Muon identification aims to quickly separate muons from the other charged particles using only information from the Muon subsystem. The second contribution is a method that takes into account a priori information on label noise and improves the accuracy of a machine learning model for classification of such data. Such data are common in high-energy physics and, in particular, are used to develop the data-driven muon identification methods. Global PID combines information from different subdetectors into a single set of PID variables. Cherenkov detector fast simulation aims to improve the speed of the PID variables simulation in Monte-Carlo.

    Scalable Profiling and Visualization for Characterizing Microbiomes

    Metagenomics is the study of the combined genetic material found in microbiome samples, and it serves as an instrument for studying microbial communities, their biodiversities, and the relationships to their host environments. Creating, interpreting, and understanding microbial community profiles produced from microbiome samples is a challenging task as it requires large computational resources along with innovative techniques to process and analyze datasets that can contain terabytes of information. The community profiles are critical because they provide information about what microorganisms are present in the sample, and in what proportions. This is particularly important as many human diseases and environmental disasters are linked to changes in microbiome compositions. In this work we propose novel approaches for the creation and interpretation of microbial community profiles. This includes: (a) a cloud-based, distributed computational system that generates detailed community profiles by processing large DNA sequencing datasets against large reference genome collections, (b) the creation of Microbiome Maps: interpretable, high-resolution visualizations of community profiles, and (c) a machine learning framework for characterizing microbiomes from the Microbiome Maps that delivers deep insights into microbial communities. The proposed approaches have been implemented in three software solutions: Flint, a large scale profiling framework for commercial cloud systems that can process millions of DNA sequencing fragments and produces microbial community profiles at a very low cost; Jasper, a novel method for creating Microbiome Maps, which visualizes the abundance profiles based on the Hilbert curve; and Amber, a machine learning framework for characterizing microbiomes using the Microbiome Maps generated by Jasper with high accuracy. 
Results show that Flint scales well for reference genome collections that are an order of magnitude larger than those used by competing tools, while using less than a minute to profile a million reads on the cloud with 65 commodity processors. Microbiome Maps produced by Jasper are compact, scalable representations of extremely complex microbial community profiles with numerous demonstrable advantages, including the ability to display latent relationships that are hard to elicit. Finally, experiments show that by using images as input instead of unstructured tabular input, the carefully engineered software, Amber, can outperform other sophisticated machine learning tools available for classification of microbiomes.
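
The abstract says Jasper lays abundance profiles out along a Hilbert curve. As a rough illustration of why that helps, the sketch below uses the standard index-to-coordinate Hilbert mapping to place a 1D list of abundances on a square grid, so that values adjacent in the list stay adjacent in the image. The function names `d2xy` and `hilbert_image`, the ordering of the input values, and the zero padding are all assumptions for illustration; the abstract does not describe Jasper's actual layout or ordering.

```python
from typing import List, Tuple


def d2xy(order: int, d: int) -> Tuple[int, int]:
    """Map index d along a Hilbert curve of the given order to (x, y) on a 2^order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so sub-curves connect end to end
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_image(values: List[float], order: int) -> List[List[float]]:
    """Place values along the curve on a 2^order x 2^order grid; unused cells stay 0.0."""
    n = 1 << order
    if len(values) > n * n:
        raise ValueError("too many values for this curve order")
    grid = [[0.0] * n for _ in range(n)]
    for d, value in enumerate(values):
        x, y = d2xy(order, d)
        grid[y][x] = value
    return grid


if __name__ == "__main__":
    # Hypothetical relative abundances, in whatever order the profile lists them
    abundances = [0.30, 0.25, 0.20, 0.10, 0.08, 0.04, 0.02, 0.01]
    for row in hilbert_image(abundances, order=2):
        print(row)
```

Because consecutive indices always land in neighbouring cells, an image model such as a 2D CNN sees related entries as local patterns, which is one plausible reading of the advantage the abstract claims for image input over tabular input.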

    Volume 59, Number 06 (June 1941)

    Economics of Piano Study Music As a Social Force Problems of the Advanced Piano Student (interview with Artur Rubinstein) Teaching the Teens Musical Development in the Philippines How Fast Shall I Play It? The Rhythms and Speed of the Classics What the Little Mother Did: In Which the Great American Baritone Tells Why Students of Singing Should Study the Piano You Can’t Get Away from It! Making Practice Profitable (interview with Mischa Elman) Morning Music and What It Meant: Some Interesting Known Facts About Ancient Concerts and Their Givers Four Strong Foundations: The Importance of Proper Hand, Wrist, Arm and Forearm Motion in the Study of the Piano Check Up Piano Class Methods in Beethoven’s Time Technic of the Month—Octaves Flight of the Clipperino: A Modern Composer Writes a Piano Concerto in Six Movements Accordion Questions Answered
    https://digitalcommons.gardner-webb.edu/etude/1248/thumbnail.jp