26,900 research outputs found
Features for the classification and clustering of music in symbolic format
Tese de mestrado, Engenharia Informática, Universidade de Lisboa, Faculdade de Ciências, 2008Este documento descreve o trabalho realizado no âmbito da disciplina de Projecto em Engenharia Informática do Mestrado em Engenharia Informática da Faculdade de Ciências da Universidade de Lisboa. Recuperação de Informação Musical é, hoje em dia, um ramo altamente activo de investigação e desenvolvimento na área de ciência da computação, e incide em diversos tópicos, incluindo a classificação musical por géneros. O trabalho apresentado centra-se na Classificação de Pistas e de Géneros de música armazenada usando o formato MIDI. Para resolver o problema da classificação de pistas MIDI, extraimos um conjunto de descritores que são usados para treinar um classificador implementado através de uma técnica de Máquinas de Aprendizagem, Redes Neuronais, com base nas notas, e durações destas, que descrevem cada faixa. As faixas são classificadas em seis categorias: Melody (Melodia), Harmony (Harmonia), Bass (Baixo) e Drums (Bateria). Para caracterizar o conteúdo musical de cada faixa, um vector de descritores numérico, normalmente conhecido como ”shallow structure description”, é extraído. Em seguida, eles são utilizados no classificador — Neural Network — que foi implementado no ambiente Matlab. Na Classificação por Géneros, duas propostas foram usadas: Modelação de Linguagem, na qual uma matriz de transição de probabilidades é criada para cada tipo de pista midi (Melodia, Harmonia, Baixo e Bateria) e também para cada género; e Redes Neuronais, em que um vector de descritores numéricos é extraído de cada pista, e é processado num Classificador baseado numa Rede Neuronal. Seis Colectâneas de Musica no formato Midi, de seis géneros diferentes, Blues, Country, Jazz, Metal, Punk e Rock, foram formadas para efectuar as experiências. Estes géneros foram escolhidos por partilharem os mesmos instrumentos, na sua maioria, como por exemplo, baixo, bateria, piano ou guitarra. Estes géneros também partilham algumas características entre si, para que a classificação não seja trivial, e para que a robustez dos classificadores seja testada. As experiências de Classificação de Pistas Midi, nas quais foram testados, numa primeira abordagem, todos os descritores, e numa segunda abordagem, os melhores descritores, mostrando que o uso de todos os descritores é uma abordagem errada, uma vez que existem descritores que confundem o classificador. Provou-se que a melhor maneira, neste contexto, de se classificar estas faixas MIDI é utilizar descritores cuidadosamente seleccionados. As experiências de Classificação por Géneros, mostraram que os Classificadores por Instrumentos (Single-Instrument) obtiveram os melhores resultados. Quatro géneros, Jazz, Country, Metal e Punk, obtiveram resultados de classificação com sucesso acima dos 80%
O trabalho futuro inclui: algoritmos genéticos para a selecção de melhores descritores; estruturar pistas e musicas; fundir todos os classificadores desenvolvidos num único classificador.This document describes the work carried out under the discipline of Computing Engineering Project of the Computer Engineering Master, Sciences Faculty of the Lisbon University. Music Information Retrieval is, nowadays, a highly active branch of research and development in the computer science field, and focuses several topics, including music genre classification. The work presented in this paper focus on Track and Genre Classification of music stored using MIDI format, To address the problem of MIDI track classification, we extract a set of descriptors that are used to train a classifier implemented by a Neural Network, based on the pitch levels and durations that describe each track. Tracks are classified into four classes: Melody, Harmony, Bass and Drums. In order to characterize the musical content from each track, a vector of numeric descriptors, normally known as shallow structure description, is extracted. Then they are used as inputs for the classifier which was implemented in the Matlab environment. In the Genre Classification task, two approaches are used: Language Modeling, in which a transition probabilities matrix is created for each type of track (Melody, Harmony, Bass and Drums) and also for each genre; and an approach based on Neural Networks, where a vector of numeric descriptors is extracted from each track (Melody, Harmony, Bass and Drums) and fed to a Neural Network Classifier. Six MIDI Music Corpora were assembled for the experiments, from six different genres, Blues, Country, Jazz, Metal, Punk and Rock. These genres were selected because all of them have the same base instruments, such as bass, drums, piano or guitar. Also, the genres chosen share some characteristics between them, so that the classification isn’t trivial, and tests the classifiers robustness. Track Classification experiments using all descriptors and best descriptors were made, showing that using all descriptors is a wrong approach, as there are descriptors which confuse the classifier. Using carefully selected descriptors proved to be the best way to classify these MIDI tracks. Genre Classification experiments showed that the Single-Instrument Classifiers achieved the best results. Four genres achieved higher than 80% success rates: Jazz, Country, Metal and Punk. Future work includes: genetic algorithms; structurize tracks and songs; merge all presented classifiers into one full Automatic Genre Classification System
Machine Learning for Fluid Mechanics
The field of fluid mechanics is rapidly advancing, driven by unprecedented
volumes of data from field measurements, experiments and large-scale
simulations at multiple spatiotemporal scales. Machine learning offers a wealth
of techniques to extract information from data that could be translated into
knowledge about the underlying fluid mechanics. Moreover, machine learning
algorithms can augment domain knowledge and automate tasks related to flow
control and optimization. This article presents an overview of past history,
current developments, and emerging opportunities of machine learning for fluid
mechanics. It outlines fundamental machine learning methodologies and discusses
their uses for understanding, modeling, optimizing, and controlling fluid
flows. The strengths and limitations of these methods are addressed from the
perspective of scientific inquiry that considers data as an inherent part of
modeling, experimentation, and simulation. Machine learning provides a powerful
information processing framework that can enrich, and possibly even transform,
current lines of fluid mechanics research and industrial applications.Comment: To appear in the Annual Reviews of Fluid Mechanics, 202
Expression cartography of human tissues using self organizing maps
Background: The availability of parallel, high-throughput microarray and sequencing experiments poses a challenge how to best arrange and to analyze the obtained heap of multidimensional data in a concerted way. Self organizing maps (SOM), a machine learning method, enables the parallel sample- and gene-centered view on the data combined with strong visualization and second-level analysis capabilities. The paper addresses aspects of the method with practical impact in the context of expression analysis of complex data sets.
Results: The method was applied to generate a SOM characterizing the whole genome expression profiles of 67 healthy human tissues selected from ten tissue categories (adipose, endocrine, homeostasis, digestion, exocrine, epithelium, sexual reproduction, muscle, immune system and nervous tissues). SOM mapping reduces the dimension of expression data from ten thousands of genes to a few thousands of metagenes where each metagene acts as representative of a minicluster of co-regulated single genes. Tissue-specific and common properties shared between groups of tissues emerge as a handful of localized spots in the tissue maps collecting groups of co-regulated and co-expressed metagenes. The functional context of the spots was discovered using overrepresentation analysis with respect to pre-defined gene sets of known functional impact. We found that tissue related spots typically contain enriched populations of gene sets well corresponding to molecular processes in the respective tissues. Analysis techniques normally used at the gene-level such as two-way hierarchical clustering provide a better signal-to-noise ratio and a better representativeness of the method if applied to the metagenes. Metagene-based clustering analyses aggregate the tissues into essentially three clusters containing nervous, immune system and the remaining tissues. 
Conclusions: The global view on the behavior of a few well-defined modules of correlated and differentially expressed genes is more intuitive and more informative than the separate discovery of the expression levels of hundreds or thousands of individual genes. The metagene approach is less sensitive to a priori selection of genes. It can detect a coordinated expression pattern whose components would not pass single-gene significance thresholds and it is able to extract context-dependent patterns of gene expression in complex data sets.

A Review on Energy Consumption Optimization Techniques in IoT Based Smart Building Environments
In recent years, due to the unnecessary wastage of electrical energy in
residential buildings, the requirement of energy optimization and user comfort
has gained vital importance. In the literature, various techniques have been
proposed addressing the energy optimization problem. The goal of each technique
was to maintain a balance between user comfort and energy requirements such
that the user can achieve the desired comfort level with the minimum amount of
energy consumption. Researchers have addressed the issue with the help of
different optimization algorithms and variations in the parameters to reduce
energy consumption. To the best of our knowledge, this problem is not solved
yet due to its challenging nature. The gap in the literature is due to the
advancements in the technology and drawbacks of the optimization algorithms and
the introduction of different new optimization algorithms. Further, many newly
proposed optimization algorithms which have produced better accuracy on the
benchmark instances but have not been applied yet for the optimization of
energy consumption in smart homes. In this paper, we have carried out a
detailed literature review of the techniques used for the optimization of
energy consumption and scheduling in smart homes. The detailed discussion has
been carried out on different factors contributing towards thermal comfort,
visual comfort, and air quality comfort. We have also reviewed the fog and edge
computing techniques used in smart homes
Expression cartography of human tissues using self organizing maps
Background: The availability of parallel, high-throughput microarray and sequencing experiments poses a challenge how to best arrange and to analyze the obtained heap of multidimensional data in a concerted way. Self organizing maps (SOM), a machine learning method, enables the parallel sample- and gene-centered view on the data combined with strong visualization and second-level analysis capabilities. The paper addresses aspects of the method with practical impact in the context of expression analysis of complex data sets.
Results: The method was applied to generate a SOM characterizing the whole genome expression profiles of 67 healthy human tissues selected from ten tissue categories (adipose, endocrine, homeostasis, digestion, exocrine, epithelium, sexual reproduction, muscle, immune system and nervous tissues). SOM mapping reduces the dimension of expression data from ten thousands of genes to a few thousands of metagenes where each metagene acts as representative of a minicluster of co-regulated single genes. Tissue-specific and common properties shared between groups of tissues emerge as a handful of localized spots in the tissue maps collecting groups of co-regulated and co-expressed metagenes. The functional context of the spots was discovered using overrepresentation analysis with respect to pre-defined gene sets of known functional impact. We found that tissue related spots typically contain enriched populations of gene sets well corresponding to molecular processes in the respective tissues. Analysis techniques normally used at the gene-level such as two-way hierarchical clustering provide a better signal-to-noise ratio and a better representativeness of the method if applied to the metagenes. Metagene-based clustering analyses aggregate the tissues into essentially three clusters containing nervous, immune system and the remaining tissues. 
Conclusions: The global view on the behavior of a few well-defined modules of correlated and differentially expressed genes is more intuitive and more informative than the separate discovery of the expression levels of hundreds or thousands of individual genes. The metagene approach is less sensitive to a priori selection of genes. It can detect a coordinated expression pattern whose components would not pass single-gene significance thresholds and it is able to extract context-dependent patterns of gene expression in complex data sets.

Building Block and Building Rule: Dual Descriptor Method for Biological Sequence Analysis
The emergence of “Systems Biology” in recent years highlights the systematic viewpoint of bio-system modeling. Building on such a background, Dual Descriptor Method, a generic methodology for biological sequence analysis is proposed. From a systematic perspective, Dual Descriptor is defined as a two element set of Composition Weight Map and Position Weight Function which aim at reflecting the composition and permutation information of a sequence. An alternate training algorithm is provided to get an optimum description of the building patterns of the sequences. In this paper, dual descriptor method has been applied to the analysis of two typical problems of molecular biology: gene identification and the prediction of protein function. Satisfactory and insightful results are achieved. Owing to the generality of this methodology, dual descriptor method has wide application perspective for many problems of pattern recognition, especially those involved in “Systems Biology”
Identifying stochastic oscillations in single-cell live imaging time series using Gaussian processes
Multiple biological processes are driven by oscillatory gene expression at
different time scales. Pulsatile dynamics are thought to be widespread, and
single-cell live imaging of gene expression has lead to a surge of dynamic,
possibly oscillatory, data for different gene networks. However, the regulation
of gene expression at the level of an individual cell involves reactions
between finite numbers of molecules, and this can result in inherent randomness
in expression dynamics, which blurs the boundaries between aperiodic
fluctuations and noisy oscillators. Thus, there is an acute need for an
objective statistical method for classifying whether an experimentally derived
noisy time series is periodic. Here we present a new data analysis method that
combines mechanistic stochastic modelling with the powerful methods of
non-parametric regression with Gaussian processes. Our method can distinguish
oscillatory gene expression from random fluctuations of non-oscillatory
expression in single-cell time series, despite peak-to-peak variability in
period and amplitude of single-cell oscillations. We show that our method
outperforms the Lomb-Scargle periodogram in successfully classifying cells as
oscillatory or non-oscillatory in data simulated from a simple genetic
oscillator model and in experimental data. Analysis of bioluminescent live cell
imaging shows a significantly greater number of oscillatory cells when
luciferase is driven by a {\it Hes1} promoter (10/19), which has previously
been reported to oscillate, than the constitutive MoMuLV 5' LTR (MMLV) promoter
(0/25). The method can be applied to data from any gene network to both
quantify the proportion of oscillating cells within a population and to measure
the period and quality of oscillations. It is publicly available as a MATLAB
package.Comment: 36 pages, 17 figure
Overcoming Inter-Subject Variability in BCI Using EEG-Based Identification
The high dependency of the Brain Computer Interface (BCI) system performance on the BCI user is a well-known issue of many BCI devices. This contribution presents a new way to overcome this problem using a synergy between a BCI device and an EEG-based biometric algorithm. Using the biometric algorithm, the BCI device automatically identifies its current user and adapts parameters of the classification process and of the BCI protocol to maximize the BCI performance. In addition to this we present an algorithm for EEG-based identification designed to be resistant to variations in EEG recordings between sessions, which is also demonstrated by an experiment with an EEG database containing two sessions recorded one year apart. Further, our algorithm is designed to be compatible with our movement-related BCI device and the evaluation of the algorithm performance took place under conditions of a standard BCI experiment. Estimation of the mu rhythm fundamental frequency using the Frequency Zooming AR modeling is used for EEG feature extraction followed by a classifier based on the regularized Mahalanobis distance. An average subject identification score of 96 % is achieved
- …