Machine Learning for Human Activity Detection in Smart Homes
Recognizing human activities in domestic environments from audio and active power consumption sensors is a challenging task: on the one hand, environmental sound signals are multi-source, heterogeneous, and time-varying; on the other hand, active power consumption varies significantly even among electrical appliances of the same type.
Many systems have been proposed to process environmental sound signals for event detection in ambient assisted living applications. Typically, these systems rely on feature extraction, selection, and classification. However, despite major advances, several important questions remain unanswered, especially in real-world settings. Part of this thesis contributes to the body of knowledge in the field by addressing the following problems for ambient sounds recorded in various real-world kitchen environments: 1) which features and which classifiers are most suitable in the presence of background noise? 2) what is the effect of signal duration on recognition accuracy? 3) how do the SNR and the distance between the microphone and the audio source affect recognition accuracy in an environment on which the system was not trained? We show that for systems using traditional classifiers, it is beneficial to combine gammatone frequency cepstral coefficients and discrete wavelet transform coefficients and to use a gradient boosting classifier. For systems based on deep learning, we consider 1D and 2D CNNs using mel-spectrogram energies and mel-spectrogram images as inputs, respectively, and show that the 2D CNN outperforms the 1D CNN. We obtained competitive classification results for two such systems and validated the performance of our algorithms on public datasets (the Google Brain/TensorFlow Speech Recognition Challenge and the 2017 Detection and Classification of Acoustic Scenes and Events Challenge).
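As an illustration of the deep-learning front end described above, the following is a minimal numpy-only sketch of how a mel-spectrogram (the 2D CNN input) can be computed. All parameters here (16 kHz sampling, 512-point FFT, 40 mel bands) are illustrative assumptions, not the thesis configuration.

```python
import numpy as np

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filterbank (illustrative, not the thesis implementation)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, c):
            fb[m - 1, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):
            fb[m - 1, k] = (hi - k) / max(hi - c, 1)
    return fb

def mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    """Frame, window, FFT, then project power spectra onto the mel filterbank."""
    window = np.hanning(n_fft)
    frames = [signal[i:i + n_fft] * window
              for i in range(0, len(signal) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (n_frames, n_fft//2+1)
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T  # (n_frames, n_mels)
    return np.log(mel + 1e-10)                          # log mel energies

# One second of synthetic audio: a 1 kHz tone.
t = np.arange(16000) / 16000.0
spec = mel_spectrogram(np.sin(2 * np.pi * 1000 * t))
print(spec.shape)  # (61, 40)
```

The resulting (time, mel-band) matrix can be flattened per frame for a 1D CNN or treated as an image for a 2D CNN, matching the two input styles compared in the text.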
Regarding the problem of energy-based human activity recognition in a household environment, machine learning techniques are applied to infer the state of household appliances from their energy consumption data, and rule-based scenarios that exploit these states are used to detect human activity. Since most activities within a house are related to the operation of an electrical appliance, this unimodal approach has a significant advantage: it relies only on inexpensive smart plugs and smart meters for each appliance. This part of the thesis proposes the use of unobtrusive and easy-to-install tools (smart plugs) for data collection and a decision engine that combines energy signal classification using dominant classifiers (compared in advance via grid search) and a probabilistic measure of appliance usage. It also helps preserve the privacy of the resident, since all activities are stored in a local database.
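The decision-engine idea can be sketched as follows. The thresholds, rule base, and usage priors below are hypothetical stand-ins: the thesis uses trained classifiers and a learned probabilistic measure rather than fixed values.

```python
from collections import defaultdict

# Hypothetical power thresholds (watts) for inferring on/off appliance states;
# the actual system classifies energy signals with trained models instead.
ON_THRESHOLD = {"kettle": 1000, "tv": 60, "oven": 800}

# Rule base: an active appliance implies a candidate activity.
RULES = {"kettle": "preparing hot drink", "tv": "watching tv", "oven": "cooking"}

def infer_states(readings):
    """Map per-appliance power readings (watts) to on/off states."""
    return {app: power >= ON_THRESHOLD[app] for app, power in readings.items()}

def detect_activities(readings, usage_prior):
    """Combine rule hits with a probabilistic usage measure per appliance."""
    scores = defaultdict(float)
    for app, is_on in infer_states(readings).items():
        if is_on:
            scores[RULES[app]] += usage_prior.get(app, 0.5)
    return dict(scores)

# Example: kettle drawing 1800 W, TV and oven effectively idle.
prior = {"kettle": 0.9, "tv": 0.7, "oven": 0.8}  # hypothetical usage priors
print(detect_activities({"kettle": 1800, "tv": 5, "oven": 20}, prior))
# → {'preparing hot drink': 0.9}
```

Because all readings and inferred activities stay in a local store like this, no behavioral data needs to leave the home, which is the privacy property the text highlights.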
DNNs have received great research interest in the field of computer vision. In this thesis, we adapt different architectures to the problem of human activity recognition. We analyze the quality of the extracted features, and more specifically how model architectures and parameters affect the ability of the features automatically extracted by DNNs to separate activity classes in the final feature space. Additionally, the architectures that we applied to our main problem were also applied to text classification, in which we treat the input text as an image and apply 2D CNNs to learn the local and global semantics of sentences from the variations in the visual patterns of words. This work serves as a first step toward creating a dialogue agent that would not require any natural language preprocessing.
Finally, since in many domestic environments human speech is present alongside other environmental sounds, we developed a Convolutional Recurrent Neural Network to separate the sound sources and applied novel post-processing filters in order to obtain an end-to-end noise-robust system. Our algorithm ranked first in the Apollo-11 Fearless Steps Challenge.
Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement No. 676157, project ACROSSIN
Identification and Recognition of Remote-Controlled Malware
This thesis encapsulates research on the detection of botnets. First, we design and implement Sandnet, an observation and monitoring infrastructure to study the botnet phenomenon. Using Sandnet, we evaluate detection approaches based on traffic analysis and rogue visual monetization. Specifically, we identify and recognize botnet C&C channels with the help of traffic analysis. To a large degree, our clustering and classification leverage the sequence of message lengths per flow. As a result, our implementation, CoCoSpot, proves to reliably detect active C&C communication of a variety of botnet families, even in the face of fully encrypted C&C messages. Furthermore, we found a botnet that uses DNS as the carrier protocol for its command and control channel. With the help of statistical entropy as well as behavioral features, we design and implement a classifier that detects DNS-based C&C, even in mixed network traffic of benign users. Finally, perceptual clustering of Sandnet screenshots enables us to group malware into rogue visual monetization campaigns and study their monetization properties.
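The statistical-entropy feature used against DNS-based C&C can be illustrated with a plain Shannon-entropy computation over hostname labels; the example domains below are invented, and a real classifier would combine this with the behavioral features the text mentions.

```python
import math
from collections import Counter

def shannon_entropy(s):
    """Shannon entropy (bits per character) of a string."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Benign hostnames tend to use low-entropy, dictionary-like labels, while
# DNS-tunnelled C&C traffic often carries high-entropy encoded payloads.
benign_label = "mail"                 # as in mail.example.com
suspect_label = "q3xv8zk1t0pfa9d2"    # as in q3xv8zk1t0pfa9d2.example.com
print(shannon_entropy(benign_label), shannon_entropy(suspect_label))  # → 2.0 4.0
```

A simple threshold on such per-label entropy already separates these two examples; the thesis classifier additionally has to cope with mixed benign traffic, which is why entropy alone is not sufficient.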
Graphics Processing Unit-Accelerated Numerical Simulations and Theoretical Study of Qubit Dynamics in Realistic Systems
Quantum computers are thought to be the future of computation, using the properties of quantum mechanics to solve problems intractable to classical computers.
Quantum computing leverages non-classical properties, such as entanglement, to achieve an exponential improvement in computational power. A quantum computer would enable us to address many real-world problems, such as how to synthesize fertilizers more efficiently, how to combat global warming, or how to simulate protein folding in biological systems.
Although much work has been done to describe the use and implementation of entanglement generation theoretically, it is still a challenge to develop such protocols experimentally.
The bulk of this work is focused on creating Graphics Processing Unit (GPU)-accelerated computer simulations of quantum systems with advanced numerical and analytical techniques. Simulations can guide experiments attempting to create the building blocks of quantum computers: qubits and their control devices. However, simulation of more realistic device setups in two-dimensional systems has faced problems owing to the space and time domain scaling associated with solutions of the many-particle time-dependent Schrödinger equation (TDSE). Nevertheless, recent advances in computer hardware performance have made previously intractable two-particle problems readily solvable. I have developed custom GPU-accelerated software based on a staggered-leapfrog algorithm that opens up new possibilities for simulating two-dimensional two-particle systems accurately.
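A minimal one-dimensional CPU analogue of the staggered-leapfrog (Visscher-style) TDSE scheme may look as follows, assuming units where ħ = m = 1 and an illustrative harmonic potential; the actual software is a GPU-accelerated two-particle, two-dimensional implementation.

```python
import numpy as np

# 1D grid and harmonic potential, in units where hbar = m = 1 (illustrative).
x = np.linspace(-10, 10, 200)
dx = x[1] - x[0]
V = 0.5 * x ** 2
dt = 0.001  # must satisfy the leapfrog stability bound dt < 2 / E_max

def apply_H(psi):
    """H psi = -(1/2) psi'' + V psi, finite differences, zero boundaries."""
    out = V * psi
    out[1:-1] += -0.5 * (psi[2:] - 2 * psi[1:-1] + psi[:-2]) / dx ** 2
    return out

# Gaussian wave packet; real and imaginary parts are stored separately and
# advanced on staggered (interleaved) time steps.
R = np.exp(-0.5 * (x - 1.0) ** 2)
R /= np.sqrt(np.sum(R ** 2) * dx)
I = np.zeros_like(R)

for _ in range(1000):
    R += dt * apply_H(I)
    I -= dt * apply_H(R)

norm = np.sum(R ** 2 + I ** 2) * dx
print(round(norm, 3))  # stays close to 1 when dt respects the stability bound
```

The appeal of this scheme for GPUs is that each update is a purely local stencil operation over the grid, which parallelizes naturally; the cost of the 2D two-particle case grows with the fourth power of the per-axis resolution, which is the scaling problem the text refers to.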
I focus on three research projects. Firstly, optimally defining a charge-based solid state qubit, and controlling it in a simple and experimentally achievable way, while accounting for imperfections of the waveform generators. I simulate the physical qubit on a fine-grained lattice, and propose an innovative control scheme that accounts for finite rise/fall time of the experimental apparatus, while being relatively fast and resulting in very high operation fidelity. An optimal pulsing scheme with rise time-dependent parameters is found, and shown to be able to achieve an arbitrary qubit rotation. Since the proposed pulse sequence reduces to sine waves to minimize total pulse duration, it is straightforward to implement experimentally, and easily generalisable to different systems. I also show how the fidelity remains sufficiently high independently of the initial qubit state. The proposed sequence can even reduce errors caused by charge noise under certain conditions. Readout techniques are discussed as well, and found to not present significant issues.
Secondly, I aid the effort to create a Surface Acoustic Wave quantum computer prototype by describing how to produce a universal quantum gate set with a Root-of-SWAP operation used as a physical two-qubit gate. Using realistic parameters, it is shown how this operation can be performed with high fidelity.
Previous work has been done to simulate a proposed Root-of-SWAP method in one dimension - this work focuses on extending this to two dimensions.
We find that the method of generating Root-of-SWAP mentioned above breaks down in two dimensions: unwanted excitations are introduced in the extra dimension, causing a phase difference to appear and thus ruining the coherence of the state.
I propose instead to implement the Root-of-SWAP operation via a tunneling interaction across the effective double dot. This was previously considered; however, it was thought to be unstable against variations in tunnel barrier height, which have an exponential impact on the speed of the quantum operation. Using newly available computing power, we were able to run detailed two-dimensional simulations investigating this method and its robustness against variations in the double dot potential. We find that the method produces high-fidelity Root-of-SWAP states and is robust against small variations in the tunnel barrier. Additionally, we find a relation between the tunnel barrier height and the spin measurement probability, providing a way for experimentalists to estimate an actual device barrier indirectly.
Finally, I theoretically model and simulate transport through a single electron transistor (SET) device. It is shown that a single donor structure can reliably be engineered from doped quantum dots by taking advantage of the tunability of the electron tunneling rates as well as the interplay, at low temperatures, between disorder conferred by randomness in the dopant distribution and electron-electron interaction originating from the high doping concentration. It is possible to electrostatically isolate a single donor from the large ensemble of dopants. I investigate how such a complex system is expected to conduct, and verify the hypothesis that two donors take part in the transport by numerically reproducing the experimental measurements. Finally, it is shown that this device can be used as a single-atom detector of the charge occupancy of a nearby capacitively coupled double quantum dot. While this final part does not make use of the GPU-accelerated software, it is still closely related to the rest of this work and the theme of modeling realistic quantum devices.
Project for Developing Innovation Systems of the Ministry of Education, Culture, Sports, Science and Technology (MEXT)
Engineering and Physical Sciences Research Council (EPSRC) and Hitachi via CASE studentships RG 9463
Efficient data reconfiguration for today's cloud systems
Performance of big data systems relies largely on efficient data reconfiguration techniques. Data reconfiguration operations deal with changing configuration parameters that affect the data layout in a system. They can be user-initiated, like changing the shard key or block size in NoSQL databases, or system-initiated, like changing the replication factor in a distributed interactive analytics engine. Current data reconfiguration schemes are heuristics at best and often do not scale well as data volume grows. As a result, system performance suffers.
In this thesis, we show that {\it data reconfiguration mechanisms can be done in the background by using new optimal or near-optimal algorithms, coupling them with performant system designs}. We explore four different data reconfiguration operations affecting three popular types of systems -- storage, real-time analytics, and batch analytics. In NoSQL databases (storage), we explore new strategies for changing table-level configuration and for compaction, as they improve read/write latencies. In distributed interactive analytics engines, a good replication algorithm can save costs by judiciously using just enough memory to provide the highest throughput and low latency for queries. Finally, in batch processing systems, we explore prefetching and caching strategies that can increase the number of production jobs meeting their SLOs. All these operations happen in the background without affecting the fast path.
Our contributions to each of these problems are two-fold: 1) we model the problem and design algorithms inspired by well-known theoretical abstractions, and 2) we design and build a system on top of popular open-source systems used in companies today. Finally, using real-life workloads, we evaluate the efficacy of our solutions. Morphus and Parqua provide several 9s of availability while changing table-level configuration parameters in databases. By halving memory usage in a distributed interactive analytics engine, Getafix reduces the cost of deploying the system by 10 million dollars annually and improves query throughput. We are the first to model the problem of compaction and provide formal bounds on its runtime. Finally, NetCachier helps 30\% more production jobs meet their SLOs compared to the existing state of the art.
Machine learning for particle identification in the LHCb detector
The LHCb experiment is a specialised b-physics experiment at the Large Hadron Collider at CERN. It has a broad physics program, with the primary objective being the search for CP violation that would explain the matter-antimatter asymmetry of the Universe. LHCb studies very rare phenomena, making it necessary to process millions of collision events per second to gather enough data in a reasonable time frame. Thus, software and data analysis tools are essential for the success of the experiment.
Particle identification (PID) is a crucial ingredient of most LHCb results. The quality of particle identification depends heavily on the data processing algorithms. This dissertation aims to leverage recent advances in the machine learning field to improve PID at LHCb.
The thesis contribution consists of four essential parts related to LHCb internal projects. Muon identification aims to quickly separate muons from other charged particles using only information from the Muon subsystem. The second contribution is a method that takes into account a priori information on label noise and improves the accuracy of a machine learning model for classification of such data. Such data are common in high-energy physics and, in particular, are used to develop data-driven muon identification methods. Global PID combines information from different subdetectors into a single set of PID variables. Cherenkov detector fast simulation aims to improve the speed of simulating the PID variables in Monte Carlo.
Scalable Profiling and Visualization for Characterizing Microbiomes
Metagenomics is the study of the combined genetic material found in microbiome samples, and it serves as an instrument for studying microbial communities, their biodiversities, and the relationships to their host environments. Creating, interpreting, and understanding microbial community profiles produced from microbiome samples is a challenging task as it requires large computational resources along with innovative techniques to process and analyze datasets that can contain terabytes of information.
The community profiles are critical because they provide information about what microorganisms are present in the sample, and in what proportions. This is particularly important as many human diseases and environmental disasters are linked to changes in microbiome compositions.
In this work we propose novel approaches for the creation and interpretation of microbial community profiles. This includes: (a) a cloud-based, distributed computational system that generates detailed community profiles by processing large DNA sequencing datasets against large reference genome collections, (b) the creation of Microbiome Maps: interpretable, high-resolution visualizations of community profiles, and (c) a machine learning framework for characterizing microbiomes from the Microbiome Maps that delivers deep insights into microbial communities.
The proposed approaches have been implemented in three software solutions: Flint, a large scale profiling framework for commercial cloud systems that can process millions of DNA sequencing fragments and produces microbial community profiles at a very low cost; Jasper, a novel method for creating Microbiome Maps, which visualizes the abundance profiles based on the Hilbert curve; and Amber, a machine learning framework for characterizing microbiomes using the Microbiome Maps generated by Jasper with high accuracy.
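The Hilbert-curve layout that Jasper relies on can be sketched with the standard distance-to-coordinate conversion below; Jasper's actual rendering details may differ.

```python
def d2xy(n, d):
    """Map distance d along a Hilbert curve to (x, y) on an n x n grid.

    n must be a power of two. This is the standard iterative algorithm,
    shown for illustration only.
    """
    x = y = 0
    s = 1
    while s < n:
        rx = 1 & (d // 2)
        ry = 1 & (d ^ rx)
        if ry == 0:                      # rotate/reflect the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        d //= 4
        s *= 2
    return x, y

# Consecutive distances land on adjacent cells, so neighboring entries of an
# abundance profile stay neighbors in the 2D Microbiome Map image.
path = [d2xy(8, d) for d in range(64)]
print(path[:4])  # → [(0, 0), (0, 1), (1, 1), (1, 0)]
```

This locality-preserving property is the reason a Hilbert layout (rather than simple row-major order) keeps related taxa visually clustered in the map.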
Results show that Flint scales well for reference genome collections that are an order of magnitude larger than those used by competing tools, while taking less than a minute to profile a million reads on the cloud with 65 commodity processors. Microbiome Maps produced by Jasper are compact, scalable representations of extremely complex microbial community profiles with numerous demonstrable advantages, including the ability to display latent relationships that are hard to elicit. Finally, experiments show that by using images as input instead of unstructured tabular input, the carefully engineered software, Amber, can outperform other sophisticated machine learning tools available for the classification of microbiomes.
Strong, thorough, and efficient memory protection against existing and emerging DRAM errors
Memory protection is necessary to ensure the correctness of data in the presence of unavoidable faults. As such, large-scale systems typically employ Error Correcting Codes (ECC) to trade off redundant storage and bandwidth for increased reliability. Single Device Data Correction (SDDC) ECC mechanisms are required to meet the reliability demands of servers and large-scale systems by tolerating even severe faults that disable an entire memory chip. In the future, however, stronger memory protection will be required due to increasing levels of system integration, shrinking process technology, and growing transfer rates. The energy efficiency of memory protection is also important, as DRAM already consumes a significant fraction of the system energy budget. This dissertation develops a novel set of ECC schemes to provide strong, safe, flexible, and thorough protection against existing and emerging types of DRAM errors. This research also reduces the energy consumption of such protection while only marginally impacting performance. First, this dissertation develops Bamboo ECC, a technique with stronger-than-SDDC correction and very safe detection capabilities (≥ 99.999994% of data errors of any severity are detected). Bamboo ECC changes the ECC layout based on frequent DRAM error patterns, can correct concurrent errors from multiple devices, and all but eliminates the risk of silent data corruption. Bamboo ECC also provides flexible configurations that enable more adaptive graceful downgrade schemes, in which the system continues to operate correctly even after severe chip faults, albeit at a reduced capacity to protect against future faults. These strength, safety, and flexibility advantages translate to a significantly more reliable memory sub-system for future exascale computing. Then, this dissertation focuses on emerging error types from scaling process technology and increasing data bandwidth.
As DRAM process technology scales down below 10 nm, DRAM cells are becoming more vulnerable to errors from an imperfect manufacturing process. At the same time, DRAM signal transfers are getting more susceptible to timing and electrical noise as DRAM interfaces keep increasing signal transfer rates and decreasing I/O voltage levels. With individual DRAM chips getting more vulnerable to errors, industry and academia have proposed mechanisms to tolerate these emerging types of errors; yet they are inefficient because they rely on multiple levels of redundancy in the case of cell errors and on ad-hoc schemes with suboptimal protection coverage for transmission errors. Active Guardband ECC and All-Inclusive ECC make systematic use of ECC and existing mechanisms to provide thorough end-to-end protection without requiring redundancy beyond what is common today. Finally, this dissertation targets the energy efficiency of memory protection. Frugal ECC combines ECC with fine-grained compression to provide versatile and energy-efficient protection. Frugal ECC compresses main memory at cache-block granularity, using any leftover space to store ECC information. Frugal ECC allows more energy-efficient memory configurations while maintaining SDDC protection. Its tailored compression scheme minimizes insufficiently compressed blocks and results in acceptable performance overhead. The strong, thorough, and efficient protection described in this dissertation may allow for more aggressive design of future computing systems with larger integration, finer process technology, higher transfer rates, and better energy efficiency.
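The SDDC-class codes discussed above are far stronger than this, but the core ECC mechanism (redundant check bits whose syndrome locates an error) can be illustrated with a textbook Hamming(7,4) code; this is a teaching example, not any of the dissertation's schemes.

```python
def hamming74_encode(d):
    """Encode 4 data bits as a 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """Correct up to one flipped bit; the syndrome is its 1-based position."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # parity over positions 1,3,5,7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # parity over positions 2,3,6,7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # parity over positions 4,5,6,7
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1         # flip the erroneous bit back
    return [c[2], c[4], c[5], c[6]]

# A single-bit fault in the stored word is transparently corrected on read.
word = [1, 0, 1, 1]
stored = hamming74_encode(word)
stored[4] ^= 1                       # simulate a flipped DRAM bit
print(hamming74_decode(stored))      # → [1, 0, 1, 1]
```

Schemes like Bamboo ECC generalize this idea to symbol-based codes laid out across DRAM pins and bursts, which is what lets them tolerate whole-chip failures rather than single bits.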
Volume 59, Number 06 (June 1941)
Economics of Piano Study
Music As a Social Force
Problems of the Advanced Piano Student (interview with Artur Rubinstein)
Teaching the Teens
Musical Development in the Philippines
How Fast Shall I Play It? The Rhythms and Speed of the Classics
What the Little Mother Did: In Which the Great American Baritone Tells Why Students of Singing Should Study the Piano
You Can’t Get Away from It!
Making Practice Profitable (interview with Mischa Elman)
Morning Music and What It Meant: Some Interesting Known Facts About Ancient Concerts and Their Givers
Four Strong Foundations: The Importance of Proper Hand, Wrist, Arm and Forearm Motion in the Study of the Piano
Check Up
Piano Class Methods in Beethoven's Time
Technic of the Month—Octaves
Flight of the Clipperino: A Modern Composer Writes a Piano Concerto in Six Movements
Accordion Questions Answered