5,724 research outputs found

    Comparing Probabilistic Models for Melodic Sequences

    Modelling the real-world complexity of music is a challenge for machine learning. We address the task of modelling melodic sequences from the same music genre. We perform a comparative analysis of two probabilistic models: a Dirichlet Variable Length Markov Model (Dirichlet-VMM) and a Time Convolutional Restricted Boltzmann Machine (TC-RBM). We show that the TC-RBM learns descriptive music features, such as underlying chords and typical melody transitions and dynamics. We assess the models for future prediction and compare their performance to a VMM, which is the current state of the art in melody generation. We show that both models perform significantly better than the VMM, with the Dirichlet-VMM marginally outperforming the TC-RBM. Finally, we evaluate the short-order statistics of the models, using the Kullback-Leibler divergence between test sequences and model samples, and show that our proposed methods match the statistics of the music genre significantly better than the VMM.
    Comment: in Proceedings of the ECML-PKDD 2011. Lecture Notes in Computer Science, vol. 6913, pp. 289-304. Springer (2011)
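
The short-order comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names, the additive smoothing constant `alpha`, the choice of bigrams, and the toy pitch sequences are all assumptions made for illustration.

```python
import itertools
import math
from collections import Counter


def ngram_distribution(sequences, n, alphabet, alpha=1.0):
    """Additively smoothed distribution over every possible n-gram.

    Smoothing (alpha > 0, an assumption of this sketch) keeps all
    probabilities positive so the KL divergence stays finite.
    """
    counts = Counter(
        tuple(seq[i:i + n])
        for seq in sequences
        for i in range(len(seq) - n + 1)
    )
    support = list(itertools.product(alphabet, repeat=n))
    total = sum(counts.values()) + alpha * len(support)
    return {gram: (counts[gram] + alpha) / total for gram in support}


def kl_divergence(p, q):
    """KL(p || q) in nats for two distributions on the same support."""
    return sum(p[g] * math.log(p[g] / q[g]) for g in p if p[g] > 0)


# Hypothetical toy data: symbols stand for pitches 0..5.
alphabet = range(6)
test_melodies = [[0, 2, 4, 2, 0], [0, 2, 4, 5, 4]]
model_samples = [[0, 2, 4, 2, 0], [4, 2, 0, 2, 4]]

p = ngram_distribution(test_melodies, 2, alphabet)
q = ngram_distribution(model_samples, 2, alphabet)
kl = kl_divergence(p, q)
print(f"KL(test || model) over bigrams: {kl:.4f} nats")
```

A smaller divergence means the samples reproduce the n-gram statistics of the held-out melodies more closely.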

    Inducing Probabilistic Grammars by Bayesian Model Merging

    We describe a framework for inducing probabilistic grammars from corpora of positive samples. First, samples are incorporated by adding ad-hoc rules to a working grammar; subsequently, elements of the model (such as states or nonterminals) are merged to achieve generalization and a more compact representation. The choice of what to merge and when to stop is governed by the Bayesian posterior probability of the grammar given the data, which formalizes a trade-off between a close fit to the data and a default preference for simpler models ('Occam's Razor'). The general scheme is illustrated using three types of probabilistic grammars: hidden Markov models, class-based n-grams, and stochastic context-free grammars.
    Comment: To appear in Grammatical Inference and Applications, Second International Colloquium on Grammatical Inference; Springer Verlag, 1994. 13 pages
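
The posterior trade-off mentioned in this abstract is Bayes' rule applied to a grammar and a corpus. The symbols M (grammar) and X (data) are notation assumed here, not taken from the abstract:

```latex
P(M \mid X) = \frac{P(M)\, P(X \mid M)}{P(X)} \;\propto\; P(M)\, P(X \mid M),
\qquad
\log P(M \mid X) = \log P(M) + \log P(X \mid M) - \log P(X)
```

The prior term P(M) expresses the preference for simpler grammars and the likelihood term P(X | M) expresses fit to the data; P(X) does not depend on M, so it can be ignored when comparing candidate merges.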

    Protein secondary structure: Entropy, correlations and prediction

    Is protein secondary structure primarily determined by local interactions between residues closely spaced along the amino acid backbone, or by non-local tertiary interactions? To answer this question we have measured the entropy densities of primary structure and secondary structure sequences, and the local inter-sequence mutual information density. We find that the important inter-sequence interactions are short-ranged, that correlations between neighboring amino acids are essentially uninformative, and that only 1/4 of the total information needed to determine the secondary structure is available from local inter-sequence correlations. Since the remaining information must come from non-local interactions, this observation supports the view that the majority of most proteins fold via a cooperative process where secondary and tertiary structure form concurrently. To provide a more direct comparison to existing secondary structure prediction methods, we construct a simple hidden Markov model (HMM) of the sequences. This HMM achieves a prediction accuracy comparable to other single-sequence secondary structure prediction algorithms, and can extract almost all of the inter-sequence mutual information. This suggests that these algorithms are almost optimal, and that we should not expect a dramatic improvement in prediction accuracy. However, local correlations between secondary and primary structure are probably of under-appreciated importance in many tertiary structure prediction methods, such as threading.
    Comment: 8 pages, 5 figures
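
The inter-sequence mutual information discussed in this abstract can be illustrated with a simple plug-in estimate over aligned symbols. This is only a sketch: the function name and the toy residue and structure strings are hypothetical, and the plug-in estimator is biased for small samples, so it should not be read as the paper's entropy-density method.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from two aligned sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must be aligned and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # p(x, y) * log2( p(x, y) / (p(x) p(y)) ), written in terms of counts.
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


# Hypothetical toy data: residues (primary) aligned with structure labels.
dependent = mutual_information("AAGGAAGG", "HHEEHHEE")
independent = mutual_information("AGAG", "HHEE")
print(dependent, independent)
```

In the first toy pair the structure label is fully determined by the residue, so the estimate equals the one-bit entropy of the labels; in the second pair the two sequences are empirically independent and the estimate is zero.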

    Inducing a Semantically Annotated Lexicon via EM-Based Clustering

    We present a technique for automatic induction of slot annotations for subcategorization frames, based on induction of hidden classes in the EM framework of statistical estimation. The models are empirically evaluated by a general decision test. Induction of slot labeling for subcategorization frames is accomplished by a further application of EM, and applied experimentally on frame observations derived from parsing large corpora. We outline an interpretation of the learned representations as theoretical-linguistic decompositional lexical entries.
    Comment: 8 pages, uses colacl.sty. Proceedings of the 37th Annual Meeting of the ACL, 1999

    Classification and modeling of power line noise using machine learning techniques

    A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy in the School of Electrical and Information Engineering, Faculty of Engineering and Built Environment, June 2017.
    The realization of robust, reliable and efficient data transmission has been the theme of recent research, most importantly in real channels such as the noisy, fading-prone power line communication (PLC) channel. The focus is to exploit old techniques or create new techniques capable of improving the transmission reliability and also increasing the transmission capacity of real communication channels. Multi-carrier modulation schemes such as Orthogonal Frequency Division Multiplexing (OFDM), utilizing conventional single-carrier modulation, are developed to facilitate robust data transmission, increase transmission capacity (efficient bandwidth usage) and further reduce design complexity in PLC systems. However, the reliability of data transmission is subject to several inhibiting factors as a result of the varying nature of the PLC channel. These inhibiting factors include noise, perturbation and disturbances. The Additive White Gaussian Noise (AWGN) model often assumed in communication systems fails to capture the attributes of the noise encountered on the PLC channel, because the periodic noise and random noise pulses injected by power electronic appliances on the network deviate from AWGN. The noise is categorized as non-white, non-Gaussian and unstable due to its impulsive attributes; thus, it is labeled Non-additive White Gaussian Noise (NAWGN). This noise and these disturbances result in long burst errors that corrupt the signals being transmitted; thus, the PLC channel is labeled a horrible or burst-error channel.
The efficient and optimal performance of a conventional linear receiver in a white Gaussian noise environment can therefore degrade drastically in this NAWGN environment. Transmission reliability in such an environment can be greatly enhanced if we know and exploit the channel's statistical attributes; hence the need for developing statistical channel models based on empirical data. In this thesis, attention is focused on developing a reconfigurable software-defined un-coded single-carrier and multi-carrier PLC transceiver as a tool for realizing an optimized channel model for the narrowband PLC (NB-PLC) channel. First, a novel reconfigurable software-defined un-coded single-carrier and multi-carrier PLC transceiver is developed for real-time NB-PLC transmission. The transceivers can be adapted to implement different waveforms for several real-time scenarios and performance evaluation. Because noise parameters vary from country to country as a result of the dependence of noise impairment on mains voltages, power line topology, place and time, the developed transceivers are capable of facilitating constant measurement campaigns to capture these varying noise parameters before statistical and mathematically inclined channel models are derived. Furthermore, the single-carrier (Binary Phase Shift Keying (BPSK), Differential BPSK (DBPSK), Quadrature Phase Shift Keying (QPSK) and Differential QPSK (DQPSK)) PLC transceiver system developed is used to facilitate First-Order semi-hidden Fritchman Markov modeling (SHFMM) of the NB-PLC channel, utilizing the efficient iterative Baum-Welch algorithm (BWA) for parameter estimation. The performance of each modulation scheme is evaluated in mildly and heavily disturbed scenarios for both the residential and laboratory sites considered.
The First-Order estimated error statistics of the realized First-Order SHFMM have been analytically validated in terms of performance metrics such as log-likelihood ratio (LLR), error-free run distribution (EFRD), error probabilities, mean square error (MSE) and the Chi-square (χ²) test. The reliability of the model results is also confirmed by an excellent match between the empirically obtained error sequence and the SHFMM regenerated error sequence, as shown by the error-free run distribution plot. This thesis also reports a novel development of a low-cost, low-complexity Frequency-shift keying (FSK) - On-off keying (OOK) in-house hybrid PLC and VLC system. The functionality of this hybrid PLC-VLC transceiver system was ascertained at both the residential and laboratory sites at three different times of the day: morning, afternoon and evening. First and Second-Order SHFMMs of the hybrid system are realized. The error statistics of the realized First and Second-Order SHFMMs have been analytically validated in terms of LLR, EFRD, error probabilities, MSE and the Chi-square (χ²) test. The Second-Order SHFMMs have also been analytically validated to be superior to the First-Order SHFMMs, although at the expense of added computational complexity. The reliability of both First and Second-Order SHFMM results is confirmed by an excellent match between the empirical error sequences and the SHFMM regenerated error sequences, as shown by the EFRD plot. In addition, the multi-carrier (QPSK-OFDM, Differential QPSK (DQPSK)-OFDM and Differential 8-PSK (D8PSK)-OFDM) PLC transceiver system developed is used to facilitate First and Second-Order modeling of the NB-PLC system using the SHFMM and BWA for parameter estimation. The performance of each OFDM modulation scheme is evaluated and compared, taking into consideration the mildly and heavily disturbed noise scenarios for the two measurement sites considered.
The estimated error statistics of the realized SHFMMs have been analytically validated in terms of LLR, EFRD, error probabilities, MSE and the Chi-square (χ²) test. The estimated Second-Order SHFMMs have been analytically validated to outperform the First-Order SHFMMs, although with added computational complexity. The reliability of the models is confirmed by an excellent match between the empirical data and the SHFMM generated data, as shown by the EFRD plot. The statistical models obtained using Baum-Welch to adjust the parameters of the adopted SHFMM are often only locally optimal. To solve this problem, a novel Metropolis-Hastings algorithm, a Bayesian inference approach based on Markov Chain Monte Carlo (MCMC), is developed to optimize the parameters of the adopted SHFMM. The algorithm is used to optimize the model results obtained from the single-carrier and multi-carrier PLC systems as well as those of the hybrid PLC-VLC system. As deduced from the results, the models obtained utilizing the novel Metropolis-Hastings algorithm are more precise, near-optimal models with parameter sets that are closer to the global maxima. Generally, the model results obtained in this thesis are relevant to enhancing transmission reliability on the PLC channel: the models can be used to improve the adopted modulation schemes, create adaptive modulation techniques, and develop and evaluate forward error correction (FEC) codes, such as a concatenation of Reed-Solomon and Permutation codes and other robust codes suitable for exploiting and mitigating noise impairments encountered on the low-voltage NB-PLC channel. Furthermore, the reconfigurable software-defined NB-PLC transceiver test-bed developed can be utilized for future measurement campaigns as well as adapted for multiple-input and multiple-output (MIMO) PLC applications. MT201
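
The error-free run distribution used above as a validation metric can be estimated directly from a binary error sequence. The sketch below assumes the common definition Pr(0^m | 1), the probability that an error is followed by at least m error-free symbols. The function name and the toy sequence are hypothetical, and dropping the final unterminated run is a simplification of this sketch, not a detail from the thesis.

```python
def error_free_run_distribution(errors, max_run):
    """Empirical Pr(0^m | 1) for m = 0..max_run.

    `errors` is a sequence of 0 (correct symbol) and 1 (error).
    Only runs that start after an error and end at the next error are
    counted; the trailing run after the last error is dropped.
    """
    runs = []
    current = None  # None until the first error has been seen
    for bit in errors:
        if bit == 1:
            if current is not None:
                runs.append(current)
            current = 0
        elif current is not None:
            current += 1
    if not runs:
        raise ValueError("need at least two errors to measure a run")
    return [sum(r >= m for r in runs) / len(runs) for m in range(max_run + 1)]


# Hypothetical toy error sequence; the runs between errors are 2, 0, 1, 3.
toy_errors = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1]
efrd = error_free_run_distribution(toy_errors, max_run=4)
print(efrd)
```

Comparing this curve for a measured error sequence with the curve for a sequence regenerated from a fitted model is one way to read the "excellent match" described in the abstract.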
