775 research outputs found

    Improving clustering with metabolic pathway data

    Background: It is common practice in bioinformatics to validate each group returned by a clustering algorithm through manual analysis, according to a priori biological knowledge. This procedure helps to find functionally related patterns and to propose hypotheses about their behavior and the biological processes involved. This knowledge is therefore used only as a second step, after the data have been clustered according to their expression patterns alone. It would thus be very useful to improve the clustering of biological data by incorporating prior knowledge into the cluster formation itself, in order to enhance the biological value of the clusters. Results: A novel training algorithm for clustering is presented, which evaluates the internal biological connections of the data points while the clusters are being formed. Within this training algorithm, the calculation of distances between data points and neuron centroids includes a new term based on information from well-known metabolic pathways. Standard self-organizing map (SOM) training and biologically-inspired SOM (bSOM) training were compared on two real data sets of transcripts and metabolites from the species Solanum lycopersicum and Arabidopsis thaliana. Classical data mining validation measures were used to evaluate the clustering solutions obtained by both algorithms. Moreover, a new measure that takes into account the biological connectivity of the clusters was applied. The results of bSOM show important improvements in convergence and performance for the proposed clustering method in comparison to standard SOM training, in particular from the application point of view. Conclusions: Analyses of the clusters obtained with bSOM indicate that including biological information during training can certainly increase the biological value of the clusters found with the proposed method. 
It is worth highlighting that this fact has effectively improved the results, which can simplify their further analysis.
    Affiliation: Milone, Diego Humberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
    Affiliation: Stegmayer, Georgina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
    Affiliation: Lopez, Mariana Gabriela. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
    Affiliation: Kamenetzky, Laura. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; Argentina
    Affiliation: Carrari, Fernando Oscar. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Instituto Nacional de Tecnología Agropecuaria. Centro de Investigación en Ciencias Veterinarias y Agronómicas. Instituto de Biotecnología; Argentin
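
    The abstract above says that the bSOM distance between a data point and a neuron centroid includes an extra term derived from metabolic pathway information, but it does not give the formula. The following is only a minimal sketch of that general idea under stated assumptions: the penalty (the fraction of a neuron's current members that share no pathway with the candidate item), the additive combination, and the names `pathway_penalty`, `combined_distance` and `weight` are all hypothetical and are not taken from the paper.

```python
import math


def euclidean(x, w):
    """Plain Euclidean distance between a data point and a centroid."""
    return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))


def pathway_penalty(item_pathways, member_pathways):
    """Hypothetical biological term: fraction of the neuron's current
    members that share no metabolic pathway with the candidate item."""
    if not member_pathways:
        return 0.0  # an empty neuron carries no biological evidence
    unrelated = sum(1 for p in member_pathways if not (item_pathways & p))
    return unrelated / len(member_pathways)


def combined_distance(x, w, item_pathways, member_pathways, weight=0.5):
    """Assumed combination: expression distance plus a weighted penalty."""
    return euclidean(x, w) + weight * pathway_penalty(item_pathways, member_pathways)


# Example: one of the two current members shares a pathway with the item.
d = combined_distance(
    [0.0, 0.0], [3.0, 4.0],
    {"glycolysis"},
    [{"glycolysis"}, {"TCA cycle"}],
)
print(d)
```

    With a term of this kind, two neurons at similar expression distance are no longer tied: the one whose members are biologically connected to the item is preferred as the best-matching unit.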

    An empirical study of neighbourhood decay in Kohonen's self organizing map

    In this paper, empirical results are presented which suggest that the region size and its rate of decay play a much more significant role in the learning, and especially the development, of topographic feature maps. Using these results as a basis, a scheme for decaying the region size during SOM training is proposed. The proposed technique provides near-optimal training time. The scheme avoids the need for sophisticated learning gain decay schemes and removes the need for a priori knowledge of likely training times. It also has some potential uses for continuous learning.
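
    The abstract above does not state the decay scheme it proposes, so the following is only an illustrative sketch of the general mechanism it discusses: a one-dimensional SOM trained with a shrinking neighbourhood radius and a constant learning gain. The exponential schedule, the Gaussian neighbourhood function, and the names `radius_at` and `train_som_1d` are assumptions made for this example, not the paper's method.

```python
import math
import random


def radius_at(step, total_steps, start_radius, end_radius=0.5):
    """Assumed schedule: shrink the radius exponentially from start to end."""
    frac = step / max(1, total_steps - 1)
    return start_radius * (end_radius / start_radius) ** frac


def train_som_1d(data, n_units, total_steps, gain=0.1, seed=0):
    """Train a 1-D SOM with a fixed gain and a decaying neighbourhood."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for step in range(total_steps):
        x = rng.choice(data)
        # Best-matching unit: the unit whose weights are closest to x.
        bmu = min(
            range(n_units),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, weights[j])),
        )
        radius = radius_at(step, total_steps, start_radius=n_units / 2)
        for j in range(n_units):
            # Gaussian neighbourhood centred on the BMU along the map axis.
            h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
            weights[j] = [w + gain * h * (a - w) for a, w in zip(x, weights[j])]
    return weights


weights = train_som_1d([[0.0], [1.0]], n_units=4, total_steps=200)
print(weights)
```

    Early in training the wide radius moves most units together, which orders the map; as the radius shrinks, updates become local and the units specialise, even though the gain itself never changes.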

    Mining Dynamic Document Spaces with Massively Parallel Embedded Processors

    Océ is currently investigating future document management services. One of these services is access to dynamic document spaces, i.e. improving access to document spaces that are frequently updated (such as newsgroups). This process is computationally intensive. This paper describes research on software development for massively parallel processors. A prototype has been built which processes streams of information from specified newsgroups and transforms them into personal information maps. Although this technology does speed up the training part compared to a general-purpose processor implementation, its real benefits emerge with larger problem dimensions because of the scalable approach. It is recommended to improve the quality of the map and its visualisation, and to better profile the performance of the other parts of the pipeline, i.e. feature extraction and visualisation.

    Slip and Adhesion in a Railway Wheelset Simulink Model Proposed for Detection Driving Conditions Via Neural Networks

    The steadily growing operation of locomotives with very high tractive power in modern railway transport has caused problems with optimally supplying torque from the motor to the wheelsets. Losses arising from inadequate torque values lead to wheel slip, accompanied by excessive wear and limited acceleration. In models simulating the dynamics of torque transmission from the drive units to the wheels, the most important parts are the submodel of the drive and the submodel of the balance between traction forces and drive resistances. Research in this field, carried out within a PhD program and SGS (CTU Students Grant Competition), has focused on increasing the quality of these submodels. This contribution presents an innovated part of the existing Simulink model that uses new data sources and modeling techniques. This improvement supports the application of operating point detection methods based on machine learning techniques. New control facilities provided by pulse-width modulated frequency control of the asynchronous motor will be used for automatic submission of optimal operating points. The idea is to use the data obtained via simulation for on-line training of a polynomial neural unit as an approximation of the current driving conditions.

    SOM-VAE: Interpretable Discrete Representation Learning on Time Series

    High-dimensional time series are common in many domains. Since human cognition is not optimized to work well in high-dimensional spaces, these areas could benefit from interpretable low-dimensional representations. However, most representation learning algorithms for time series data are difficult to interpret. This is due to non-intuitive mappings from data features to salient properties of the representation and non-smoothness over time. To address this problem, we propose a new representation learning framework building on ideas from interpretable discrete dimensionality reduction and deep generative modeling. This framework allows us to learn discrete representations of time series, which give rise to smooth and interpretable embeddings with superior clustering performance. We introduce a new way to overcome the non-differentiability in discrete representation learning and present a gradient-based version of the traditional self-organizing map algorithm that is more performant than the original. Furthermore, to allow for a probabilistic interpretation of our method, we integrate a Markov model in the representation space. This model uncovers the temporal transition structure, improves clustering performance even further and provides additional explanatory insights as well as a natural representation of uncertainty. We evaluate our model in terms of clustering performance and interpretability on static (Fashion-)MNIST data, a time series of linearly interpolated (Fashion-)MNIST images, a chaotic Lorenz attractor system with two macro states, as well as on a challenging real world medical time series application on the eICU data set. Our learned representations compare favorably with competitor methods and facilitate downstream tasks on the real world data. Comment: Accepted for publication at the Seventh International Conference on Learning Representations (ICLR 2019)

    Somoclu: An Efficient Parallel Library for Self-Organizing Maps

    Somoclu is a massively parallel tool, written in C++, for training self-organizing maps on large data sets. It builds on OpenMP for multicore execution, and on MPI for distributing the workload across the nodes in a cluster. It is also able to boost training by using CUDA if graphics processing units are available. A sparse kernel is included, which is useful for high-dimensional but sparse data, such as the vector spaces common in text mining workflows. Python, R and MATLAB interfaces facilitate interactive use. Apart from fast execution, memory use is highly optimized, enabling training large emergent maps even on a single computer. Comment: 26 pages, 9 figures. The code is available at https://peterwittek.github.io/somoclu