
    A classification of plane and planar 2-trees

    We present new functional equations for the species of plane and of planar (in the sense of Harary and Palmer, 1973) 2-trees and some associated pointed species. We then deduce the explicit molecular expansion of these species, i.e., a classification of their structures according to their stabilizers. This yields explicit formulas, in terms of Catalan numbers, for their associated generating series, including the asymmetry index series. This work is closely related to the enumeration of polyene hydrocarbons of molecular formula C_nH_{n+2}. Comment: 26 pages, 14 figures
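    The abstract does not state its formulas, so the following is only a minimal sketch of the Catalan numbers it refers to, using the standard closed form C_n = binom(2n, n) / (n + 1). The function name `catalan` is an illustrative choice, and nothing here reproduces the paper's generating series.

```python
from math import comb


def catalan(n: int) -> int:
    """Return the n-th Catalan number C_n = binom(2n, n) / (n + 1)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # The division is always exact, so integer division is safe here.
    return comb(2 * n, n) // (n + 1)


# First eight Catalan numbers, C_0 through C_7.
first_terms = [catalan(n) for n in range(8)]
print(first_terms)  # [1, 1, 2, 5, 14, 42, 132, 429]
```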

    Functional Data Representation with Merge Trees

    In this paper we address the problem of representing functional data with the tools of algebraic topology. We represent functions by means of merge trees and compare this representation with that offered by persistence diagrams. We show that these two structures, although not equivalent, are both invariant under homeomorphic re-parametrizations of the functions they represent, thus allowing for a statistical analysis that is indifferent to functional misalignment. We employ a novel metric for merge trees and prove a few theoretical results related to its specific implementation when merge trees represent functions. To showcase the good properties of our topological approach to functional data analysis, we first go through a few examples using data generated in silico, which illustrate and compare the different representations provided by merge trees and persistence diagrams. We then test the approach on the Aneurisk65 dataset, replicating, from our different perspective, the supervised classification analysis that helped make this dataset a benchmark for methods dealing with misaligned functional data.

    Hierarchical Multi-classification with Predictive Clustering Trees in Functional Genomics

    This paper investigates how predictive clustering trees can be used to predict gene function in the genome of the yeast Saccharomyces cerevisiae. We consider the MIPS FunCat classification scheme, in which each gene is annotated with one or more classes selected from a given functional class hierarchy. This setting presents two important challenges to machine learning: (1) each instance is labeled with a set of classes instead of just one class, and (2) the classes are structured in a hierarchy; ideally the learning algorithm should also take this hierarchical information into account. Predictive clustering trees generalize decision trees and can be applied to a wide range of prediction tasks by plugging in a suitable distance metric. We define an appropriate distance metric for hierarchical multi-classification and present experiments evaluating this approach on a number of data sets that are available for yeast.
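    The abstract does not specify the distance metric, so the sketch below is only one plausible instance of a hierarchy-aware distance between label sets: a weighted Euclidean distance on binary class vectors, where a class at depth d gets weight w0 ** d so that disagreements near the root cost more. The slash-separated label encoding, the function names, and the default w0 = 0.75 are all assumptions made for illustration.

```python
from math import sqrt


def ancestors_closure(labels):
    """Expand labels such as '01/02' so that every ancestor class is included."""
    closed = set()
    for label in labels:
        parts = label.split("/")
        for depth in range(1, len(parts) + 1):
            closed.add("/".join(parts[:depth]))
    return closed


def hierarchical_distance(labels_a, labels_b, w0=0.75):
    """Weighted Euclidean distance between two label sets (hypothetical metric).

    On binary class vectors the squared difference is 1 exactly for classes
    in the symmetric difference, so only those classes contribute.
    """
    a = ancestors_closure(labels_a)
    b = ancestors_closure(labels_b)
    total = 0.0
    for cls in a ^ b:
        depth = cls.count("/")  # top-level classes have depth 0
        total += w0 ** depth
    return sqrt(total)


gene_x = {"01/01"}
gene_y = {"01/02"}  # shares the top-level class with gene_x
gene_z = {"02"}     # differs from gene_x already at the top level

print(hierarchical_distance(gene_x, gene_y))
print(hierarchical_distance(gene_x, gene_z))
```

    Under these assumptions, two genes that share a top-level class come out closer than two genes that differ at the root, which is the kind of hierarchical information the abstract says the learner should exploit.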

    Pooling random forest and functional data analysis for biomedical signals supervised classification: theory and application to electrocardiogram data

    Scientific progress has contributed to creating many devices to gather vast amounts of biomedical data over time. The goal of these devices is generally to monitor people's health conditions and to diagnose and prevent diseases, for example, to discover cardiovascular disorders or predict epileptic seizures. A common way of investigating these data is classification, but these instruments generate signals often characterized by high dimensionality. Learning from these data is a challenging task due to many issues, for example, the trade-off between complexity and accuracy and the curse of dimensionality (COD). This study proposes a supervised classification method based on the joint use of functional data analysis, classification trees, and random forest to deal with massive biomedical data recorded over time. For this purpose, this research suggests different original tools to extract features and train functional classifiers, interpret the classification rules, assess leaves' quality and composition, avoid the classical drawbacks due to the COD, and improve the accuracy of the functional classifiers. Focusing on ECG data as a possible example, the final purpose of this study is to offer an original approach to identify and classify patients at risk using different types of biomedical signals. The results confirm that this line of research is promising; indeed, the interpretative tools prove very useful for understanding classification rules. Furthermore, the accuracy of the proposed functional classifier exceeds the previous classification record on a well-known ECG dataset.

    Bayesian Deep Net GLM and GLMM

    Deep feedforward neural networks (DFNNs) are a powerful tool for functional approximation. We describe flexible versions of generalized linear and generalized linear mixed models incorporating basis functions formed by a DFNN. The consideration of neural networks with random effects is not widely used in the literature, perhaps because of the computational challenges of incorporating subject specific parameters into already complex models. Efficient computational methods for high-dimensional Bayesian inference are developed using Gaussian variational approximation, with a parsimonious but flexible factor parametrization of the covariance matrix. We implement natural gradient methods for the optimization, exploiting the factor structure of the variational covariance matrix in computation of the natural gradient. Our flexible DFNN models and Bayesian inference approach lead to a regression and classification method that has a high prediction accuracy, and is able to quantify the prediction uncertainty in a principled and convenient way. We also describe how to perform variable selection in our deep learning method. The proposed methods are illustrated in a wide range of simulated and real-data examples, and the results compare favourably to a state of the art flexible regression and classification method in the statistical literature, the Bayesian additive regression trees (BART) method. User-friendly software packages in Matlab, R and Python implementing the proposed methods are available at https://github.com/VBayesLab. Comment: 35 pages, 7 figures, 10 tables

    Towards precise classification of cancers based on robust gene functional expression profiles

    BACKGROUND: Development of robust and efficient methods for analyzing and interpreting high-dimensional gene expression profiles continues to be a focus in computational biology. The accumulated experimental evidence supports the assumption that genes express and perform their functions in a modular fashion in cells. Therefore, there is room for the development of timely and relevant computational algorithms that use robust functional expression profiles towards precise classification of complex human diseases at the modular level. RESULTS: Inspired by the insight that genes act as a module to carry out a highly integrated cellular function, we define a low-dimensional functional expression profile for data reduction. After annotating each individual gene to functional categories defined in a proper gene function classification system, such as the Gene Ontology applied in this study, we identify those functional categories enriched with differentially expressed genes. For each functional category or functional module, we compute a summary measure(s) for the raw expression values of the annotated genes to capture the overall activity level of the module. In this way, we can treat the gene expressions within a functional module as an integrative data point to replace the multiple values of individual genes. We compare the classification performance of decision trees based on functional expression profiles with that of decision trees based on conventional gene expression profiles using four publicly available datasets. The comparison indicates that precise classification of tumour types and improved interpretation can be achieved with the reduced functional expression profiles. CONCLUSION: This modular approach is demonstrated to be a powerful alternative approach to analyzing high-dimensional microarray data and is robust to the high measurement noise and intrinsic biological variance inherent in microarray data. Furthermore, efficient integration with current biological knowledge has facilitated the interpretation of the underlying molecular mechanisms for complex human diseases at the modular level.
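    The abstract does not say which summary measure is used per module, so the following minimal sketch assumes the arithmetic mean of the annotated genes' expression values. The function name `module_profile`, the gene identifiers, and the module names are hypothetical and serve only to show how many gene-level values collapse into one value per functional module.

```python
from statistics import mean


def module_profile(expression, modules):
    """Collapse gene-level expression into one summary value per module.

    expression: dict mapping gene id -> expression value for one sample
    modules:    dict mapping module name -> list of annotated gene ids
    The mean is an assumed summary measure, not necessarily the paper's.
    """
    profile = {}
    for module, genes in modules.items():
        # Ignore annotated genes that were not measured in this sample.
        values = [expression[g] for g in genes if g in expression]
        if values:
            profile[module] = mean(values)
    return profile


# Hypothetical sample with four measured genes and two modules.
expression = {"g1": 2.0, "g2": 4.0, "g3": 1.0, "g4": 3.0}
modules = {"cell_cycle": ["g1", "g2"], "apoptosis": ["g3", "g4", "g5"]}

profile = module_profile(expression, modules)
print(profile)  # {'cell_cycle': 3.0, 'apoptosis': 2.0}
```

    The resulting low-dimensional profile, one entry per module, is the kind of input a decision tree would then be trained on in place of the full gene-level vector.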

    Automatic Response Assessment in Regions of Language Cortex in Epilepsy Patients Using ECoG-based Functional Mapping and Machine Learning

    Accurate localization of brain regions responsible for language and cognitive functions in epilepsy patients should be carefully determined prior to surgery. Electrocorticography (ECoG)-based Real Time Functional Mapping (RTFM) has been shown to be a safer alternative to electrical cortical stimulation mapping (ESM), which is currently the clinical gold standard. Conventional methods for analyzing RTFM signals are based on statistical comparison of signal power at certain frequency bands. Compared to the gold standard (ESM), they have limited accuracy when assessing channel responses. In this study, we address the accuracy limitation of the current RTFM signal estimation methods by analyzing the full frequency spectrum of the signal and replacing signal power estimation methods with machine learning algorithms, specifically random forest (RF), as a proof of concept. We train RF with the power spectral density of the time-series RTFM signal in a supervised learning framework where ground truth labels are obtained from the ESM. Results obtained from RTFM of six adult patients in a strictly controlled experimental setup reveal a state-of-the-art detection accuracy of approximately 78% for the language comprehension task, an improvement of 23% over the conventional RTFM estimation method. To the best of our knowledge, this is the first study exploring the use of machine learning approaches for determining RTFM signal characteristics, and using the whole frequency band for better region localization. Our results demonstrate the feasibility of a machine learning based RTFM signal analysis method over the full spectrum becoming a clinical routine in the near future. Comment: This paper will appear in the Proceedings of IEEE International Conference on Systems, Man and Cybernetics (SMC) 201

    Automatic test cases generation from software specifications modules

    A new technique is proposed in this paper to extend the Integrated Classification Tree Methodology (ICTM) developed by Chen et al. [13]. This software assists testers in constructing test cases from functional specifications. A Unified Modelling Language (UML) class diagram and the Object Constraint Language (OCL) are used in this paper to represent the software specifications. Each classification and associated class in the software specification is represented by classes and attributes in the class diagram. Software specification relationships are represented by associated and hierarchical relationships in the class diagram. To ensure that relationships are consistent, an automatic methodology is proposed to capture and control the class relationships in a systematic way. This helps to reduce duplicate and illegitimate test cases, which improves testing efficiency and minimises the time and cost of testing. The methodology introduced in this paper extracts only the legitimate test cases by removing duplicate test cases and those incompatible with the software specifications. Because executing all of the test cases would take a large amount of time, a methodology is also proposed to select a best testing path. This path guarantees the highest coverage of system units without using all generated test cases, further reducing the time and cost of testing.
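    The abstract gives no algorithmic detail, so the sketch below is not the ICTM extension itself. It only illustrates the general idea of keeping legitimate test cases: enumerate one class per classification, skip duplicates, and drop combinations that violate a specification constraint. The function name, the example classifications, and the constraint (standing in for an OCL rule) are all hypothetical.

```python
from itertools import product


def legitimate_test_cases(classifications, is_consistent):
    """Enumerate class combinations, dropping duplicates and illegitimate ones.

    classifications: dict mapping classification name -> list of classes
    is_consistent:   predicate standing in for the specification constraints
    """
    names = list(classifications)
    seen = set()
    cases = []
    for combo in product(*(classifications[n] for n in names)):
        if combo in seen:  # duplicate test case
            continue
        seen.add(combo)
        case = dict(zip(names, combo))
        if is_consistent(case):  # keep only legitimate combinations
            cases.append(case)
    return cases


# Hypothetical specification with three classifications.
classifications = {
    "account": ["savings", "current"],
    "amount": ["within_limit", "over_limit"],
    "overdraft": ["allowed", "not_allowed"],
}


def no_overdraft_on_savings(case):
    """Assumed constraint: savings accounts never allow overdraft."""
    return not (case["account"] == "savings" and case["overdraft"] == "allowed")


cases = legitimate_test_cases(classifications, no_overdraft_on_savings)
print(len(cases))  # 6 of the 8 raw combinations remain
```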