31,644 research outputs found

    Partial mixture model for tight clustering of gene expression time-course

    Get PDF
    Background: Tight clustering arose recently from a desire to obtain tighter and potentially more informative clusters in gene expression studies. Scattered genes with relatively loose correlations should be excluded from the clusters. However, in the literature there is little work dedicated to this area of research. On the other hand, there has been extensive use of maximum likelihood techniques for model parameter estimation. By contrast, the minimum distance estimator has been largely ignored. Results: In this paper we show the inherent robustness of the minimum distance estimator that makes it a powerful tool for parameter estimation in model-based time-course clustering. To apply minimum distance estimation, a partial mixture model that can naturally incorporate replicate information and allow scattered genes is formulated. We provide experimental results of simulated data fitting, where the minimum distance estimator demonstrates superior performance to the maximum likelihood estimator. Both biological and statistical validations are conducted on a simulated dataset and two real gene expression datasets. Our proposed partial regression clustering algorithm scores top in Gene Ontology driven evaluation, in comparison with four other popular clustering algorithms. Conclusion: For the first time partial mixture model is successfully extended to time-course data analysis. The robustness of our partial regression clustering algorithm proves the suitability of the ombination of both partial mixture model and minimum distance estimator in this field. We show that tight clustering not only is capable to generate more profound understanding of the dataset under study well in accordance to established biological knowledge, but also presents interesting new hypotheses during interpretation of clustering results. In particular, we provide biological evidences that scattered genes can be relevant and are interesting subjects for study, in contrast to prevailing opinion

    Using a Machine Learning Approach to Implement and Evaluate Product Line Features

    Get PDF
    Bike-sharing systems are a means of smart transportation in urban environments with the benefit of a positive impact on urban mobility. In this paper we are interested in studying and modeling the behavior of features that permit the end user to access, with her/his web browser, the status of the Bike-Sharing system. In particular, we address features able to make a prediction on the system state. We propose to use a machine learning approach to analyze usage patterns and learn computational models of such features from logs of system usage. On the one hand, machine learning methodologies provide a powerful and general means to implement a wide choice of predictive features. On the other hand, trained machine learning models are provided with a measure of predictive performance that can be used as a metric to assess the cost-performance trade-off of the feature. This provides a principled way to assess the runtime behavior of different components before putting them into operation.Comment: In Proceedings WWV 2015, arXiv:1508.0338

    A foundation for machine learning in design

    Get PDF
    This paper presents a formalism for considering the issues of learning in design. A foundation for machine learning in design (MLinD) is defined so as to provide answers to basic questions on learning in design, such as, "What types of knowledge can be learnt?", "How does learning occur?", and "When does learning occur?". Five main elements of MLinD are presented as the input knowledge, knowledge transformers, output knowledge, goals/reasons for learning, and learning triggers. Using this foundation, published systems in MLinD were reviewed. The systematic review presents a basis for validating the presented foundation. The paper concludes that there is considerable work to be carried out in order to fully formalize the foundation of MLinD

    Forecasting Long-Term Government Bond Yields: An Application of Statistical and AI Models

    Get PDF
    This paper evaluates several artificial intelligence and classical algorithms on their ability of forecasting the monthly yield of the US 10-year Treasury bonds from a set of four economic indicators. Due to the complexity of the prediction problem, the task represents a challenging test for the algorithms under evaluation. At the same time, the study is of particular significance for the important and paradigmatic role played by the US market in the world economy. Four data-driven artificial intelligence approaches are considered, namely, a manually built fuzzy logic model, a machine learned fuzzy logic model, a self-organising map model and a multi-layer perceptron model. Their performance is compared with the performance of two classical approaches, namely, a statistical ARIMA model and an econometric error correction model. The algorithms are evaluated on a complete series of end-month US 10-year Treasury bonds yields and economic indicators from 1986:1 to 2004:12. In terms of prediction accuracy and reliability of the modelling procedure, the best results are obtained by the three parametric regression algorithms, namely the econometric, the statistical and the multi-layer perceptron model. Due to the sparseness of the learning data samples, the manual and the automatic fuzzy logic approaches fail to follow with adequate precision the range of variations of the US 10-year Treasury bonds. For similar reasons, the self-organising map model gives an unsatisfactory performance. Analysis of the results indicates that the econometric model has a slight edge over the statistical and the multi-layer perceptron models. This suggests that pure data-driven induction may not fully capture the complicated mechanisms ruling the changes in interest rates. Overall, the prediction accuracy of the best models is only marginally better than the prediction accuracy of a basic one-step lag predictor. This result highlights the difficulty of the modelling task and, in general, the difficulty of building reliable predictors for financial markets.interest rates; forecasting; neural networks; fuzzy logic.

    Methodology for automatic recovering of 3D partitions from unstitched faces of non-manifold CAD models

    Get PDF
    Data exchanges between different software are currently used in industry to speed up the preparation of digital prototypes for Finite Element Analysis (FEA). Unfortunately, due to data loss, the yield of the transfer of manifold models rarely reaches 1. In the case of non-manifold models, the transfer results are even less satisfactory. This is particularly true for partitioned 3D models: during the data transfer based on the well-known exchange formats, all 3D partitions are generally lost. Partitions are mainly used for preparing mesh models required for advanced FEA: mapped meshing, material separation, definition of specific boundary conditions, etc. This paper sets up a methodology to automatically recover 3D partitions from exported non-manifold CAD models in order to increase the yield of the data exchange. Our fully automatic approach is based on three steps. First, starting from a set of potentially disconnected faces, the CAD model is stitched. Then, the shells used to create the 3D partitions are recovered using an iterative propagation strategy which starts from the so-called manifold vertices. Finally, using the identified closed shells, the 3D partitions can be reconstructed. The proposed methodology has been validated on academic as well as industrial examples.This work has been carried out under a research contract between the Research and Development Direction of the EDF Group and the Arts et MĂ©tiers ParisTech Aix-en-Provence

    Functional Testing of Feature Model Analysis Tools: a Test Suite

    Get PDF
    A Feature Model (FM) is a compact representation of all the products of a software product line. Automated analysis of FMs is rapidly gaining importance: new operations of analysis have been proposed, new tools have been developed to support those operations and different logical paradigms and algorithms have been proposed to perform them. Implementing operations is a complex task that easily leads to errors in analysis solutions. In this context, the lack of specific testing mechanisms is becoming a major obstacle hindering the development of tools and affecting their quality and reliability. In this article, we present FaMa Test Suite, a set of implementation–independent test cases to validate the functionality of FM analysis tools. This is an efficient and handy mechanism to assist in the development of tools, detecting faults and improving their quality. In order to show the effectiveness of our proposal, we evaluated the suite using mutation testing as well as real faults and tools. Our results are promising and directly applicable in the testing of analysis solutions. We intend this work to be a first step toward the development of a widely accepted test suite to support functional testing in the community of automated analysis of feature models.CICYT TIN2009-07366CICYT TIN2006-00472Junta de Andalucía TIC-253

    Kernel discriminant analysis and clustering with parsimonious Gaussian process models

    Full text link
    This work presents a family of parsimonious Gaussian process models which allow to build, from a finite sample, a model-based classifier in an infinite dimensional space. The proposed parsimonious models are obtained by constraining the eigen-decomposition of the Gaussian processes modeling each class. This allows in particular to use non-linear mapping functions which project the observations into infinite dimensional spaces. It is also demonstrated that the building of the classifier can be directly done from the observation space through a kernel function. The proposed classification method is thus able to classify data of various types such as categorical data, functional data or networks. Furthermore, it is possible to classify mixed data by combining different kernels. The methodology is as well extended to the unsupervised classification case. Experimental results on various data sets demonstrate the effectiveness of the proposed method
    • 

    corecore