A classification of plane and planar 2-trees
We present new functional equations for the species of plane and of planar
(in the sense of Harary and Palmer, 1973) 2-trees and some associated pointed
species. We then deduce the explicit molecular expansion of these species, i.e.,
a classification of their structures according to their stabilizers. This
yields explicit formulas in terms of Catalan numbers for their associated
generating series, including the asymmetry index series. This work is closely
related to the enumeration of polyene hydrocarbons of molecular formula
C_nH_{n+2}. Comment: 26 pages, 14 figures
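For reference, the Catalan numbers mentioned in this abstract have the closed form C_n = binom(2n, n) / (n + 1). The sketch below only computes these numbers; it does not reproduce the paper's formulas for 2-trees, which the abstract does not state.

```python
from math import comb


def catalan(n: int) -> int:
    """Return the n-th Catalan number C_n = binom(2n, n) / (n + 1)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # The division is always exact, so integer division is safe here.
    return comb(2 * n, n) // (n + 1)


# First eight Catalan numbers, C_0 through C_7.
first_terms = [catalan(n) for n in range(8)]
print(first_terms)
```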
Functional Data Representation with Merge Trees
In this paper we address the problem of representing functional data with
the tools of algebraic topology. We represent functions by means of merge trees
and this representation is compared with that offered by persistence diagrams.
We show that these two tree structures, although not equivalent, are both
invariant under homeomorphic re-parametrizations of the functions they
represent, thus allowing for a statistical analysis which is indifferent to
functional misalignment. We employ a novel metric for merge trees and we prove
a few theoretical results related to its specific implementation when merge
trees represent functions. To showcase the good properties of our topological
approach to functional data analysis, we first go through a few examples using
data generated in silico and employed to illustrate and compare the
different representations provided by merge trees and persistence diagrams, and
then we test it on the Aneurisk65 dataset replicating, from our different
perspective, the supervised classification analysis which helped make
this dataset a benchmark for methods dealing with misaligned functional data.
Hierarchical Multi-classification with Predictive Clustering Trees in Functional Genomics
This paper investigates how predictive clustering trees can be used to predict gene function in the genome of the yeast Saccharomyces cerevisiae. We consider the MIPS FunCat classification scheme, in which each gene is annotated with one or more classes selected from a given functional class hierarchy. This setting presents two important challenges to machine learning: (1) each instance is labeled with a set of classes instead of just one class, and (2) the classes are structured in a hierarchy; ideally the learning algorithm should also take this hierarchical information into account. Predictive clustering trees generalize decision trees and can be applied to a wide range of prediction tasks by plugging in a suitable distance metric. We define an appropriate distance metric for hierarchical multi-classification and present experiments evaluating this approach on a number of data sets that are available for yeast.
Pooling random forest and functional data analysis for biomedical signals supervised classification: theory and application to electrocardiogram data
Scientific progress has contributed to creating many devices to gather vast amounts of biomedical data over time. The goal of these devices is generally to monitor people's health conditions and to diagnose and prevent patients' diseases, for example, to discover cardiovascular disorders or predict epileptic seizures. A common way of investigating these data is classification, but these instruments generate signals often characterized by high dimensionality. Learning from these data is a challenging task due to many issues, for example, the trade-off between complexity and accuracy and the curse of dimensionality (COD). This study proposes a supervised classification method based on the joint use of functional data analysis, classification trees, and random forest to deal with massive biomedical data recorded over time. For this purpose, this research suggests different original tools to extract features and train functional classifiers, interpret the classification rules, assess leaves' quality and composition, avoid the classical drawbacks due to the COD, and improve the accuracy of the functional classifiers. Focusing on ECG data as a possible example, the final purpose of this study is to offer an original approach to identify and classify patients at risk using different types of biomedical signals. The results confirm that this line of research is promising; indeed, the interpretative tools prove very useful for understanding classification rules. Furthermore, the accuracy of the proposed functional classifier is excellent: it breaks the previous classification record on a well-known ECG dataset.
Bayesian Deep Net GLM and GLMM
Deep feedforward neural networks (DFNNs) are a powerful tool for functional
approximation. We describe flexible versions of generalized linear and
generalized linear mixed models incorporating basis functions formed by a DFNN.
The consideration of neural networks with random effects is not widely used in
the literature, perhaps because of the computational challenges of
incorporating subject specific parameters into already complex models.
Efficient computational methods for high-dimensional Bayesian inference are
developed using Gaussian variational approximation, with a parsimonious but
flexible factor parametrization of the covariance matrix. We implement natural
gradient methods for the optimization, exploiting the factor structure of the
variational covariance matrix in computation of the natural gradient. Our
flexible DFNN models and Bayesian inference approach lead to a regression and
classification method that has a high prediction accuracy, and is able to
quantify the prediction uncertainty in a principled and convenient way. We also
describe how to perform variable selection in our deep learning method. The
proposed methods are illustrated in a wide range of simulated and real-data
examples, and the results compare favourably to a state of the art flexible
regression and classification method in the statistical literature, the
Bayesian additive regression trees (BART) method. User-friendly software
packages in Matlab, R and Python implementing the proposed methods are
available at https://github.com/VBayesLab. Comment: 35 pages, 7 figures, 10 tables
Towards precise classification of cancers based on robust gene functional expression profiles
BACKGROUND: Development of robust and efficient methods for analyzing and interpreting high-dimensional gene expression profiles continues to be a focus in computational biology. The accumulated experimental evidence supports the assumption that genes express and perform their functions in a modular fashion in cells. Therefore, there is room for developing timely and relevant computational algorithms that use robust functional expression profiles towards precise classification of complex human diseases at the modular level. RESULTS: Inspired by the insight that genes act as a module to carry out a highly integrated cellular function, we define a low-dimensional functional expression profile for data reduction. After annotating each individual gene to functional categories defined in a proper gene function classification system, such as the Gene Ontology applied in this study, we identify those functional categories enriched with differentially expressed genes. For each functional category or functional module, we compute a summary measure (s) for the raw expression values of the annotated genes to capture the overall activity level of the module. In this way, we can treat the gene expressions within a functional module as an integrative data point to replace the multiple values of individual genes. We compare the classification performance of decision trees based on functional expression profiles with that of conventional gene expression profiles using four publicly available datasets; the comparison indicates that precise classification of tumour types and improved interpretation can be achieved with the reduced functional expression profiles. CONCLUSION: This modular approach is demonstrated to be a powerful alternative for analyzing high-dimensional microarray data and is robust to the high measurement noise and intrinsic biological variance inherent in microarray data.
Furthermore, efficient integration with current biological knowledge has facilitated the interpretation of the underlying molecular mechanisms for complex human diseases at the modular level.
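The data-reduction step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which summary measure is used, so the arithmetic mean is an assumption, and the gene and module names are hypothetical.

```python
from statistics import mean


def functional_profile(expression, module_genes):
    """Collapse per-gene expression values into one value per functional module.

    expression:   dict mapping gene name -> expression value for one sample
    module_genes: dict mapping module name -> list of annotated gene names
    The mean is an assumed summary measure; the source does not specify one.
    """
    profile = {}
    for module, genes in module_genes.items():
        # Use only the annotated genes that were actually measured.
        values = [expression[g] for g in genes if g in expression]
        if values:
            profile[module] = mean(values)
    return profile


# Hypothetical sample and annotation, for illustration only.
sample = {"geneA": 2.0, "geneB": 4.0, "geneC": 1.0, "geneD": 3.0}
modules = {
    "cell_cycle": ["geneA", "geneB"],
    "apoptosis": ["geneC", "geneD", "geneE"],  # geneE is not measured
}
profile = functional_profile(sample, modules)
print(profile)
```

Each sample is then described by one value per module rather than one value per gene, and that reduced vector is what a decision tree would be trained on.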
Automatic Response Assessment in Regions of Language Cortex in Epilepsy Patients Using ECoG-based Functional Mapping and Machine Learning
Brain regions responsible for language and cognitive functions in epilepsy
patients should be accurately localized prior to surgery.
Electrocorticography (ECoG)-based Real Time Functional Mapping (RTFM) has been
shown to be a safer alternative to the electrical cortical stimulation mapping
(ESM), which is currently the clinical/gold standard. Conventional methods for
analyzing RTFM signals are based on statistical comparison of signal power at
certain frequency bands. Compared to the gold standard (ESM), they have limited
accuracy when assessing channel responses.
In this study, we address the accuracy limitation of the current RTFM signal
estimation methods by analyzing the full frequency spectrum of the signal and
replacing signal power estimation methods with machine learning algorithms,
specifically random forest (RF), as a proof of concept. We train RF with power
spectral density of the time-series RTFM signal in supervised learning
framework where ground truth labels are obtained from the ESM. Results obtained
from RTFM of six adult patients in a strictly controlled experimental setup
reveal state-of-the-art detection accuracy for the
language comprehension task, an improvement over the conventional
RTFM estimation method. To the best of our knowledge, this is the first study
exploring the use of machine learning approaches for determining RTFM signal
characteristics, and using the whole-frequency band for better region
localization. Our results demonstrate the feasibility of machine learning based
RTFM signal analysis method over the full spectrum to be a clinical routine in
the near future. Comment: This paper will appear in the Proceedings of IEEE International
Conference on Systems, Man and Cybernetics (SMC) 201
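The feature-extraction idea in this abstract, describing each signal by its power over the full frequency range, can be sketched as below. The abstract does not name a spectral estimator, so the plain periodogram here is an assumption, and the function name and test signal are hypothetical. The random forest classifier is omitted: this shows only how a feature vector could be built.

```python
import cmath
import math


def periodogram(signal, fs):
    """Return (frequencies, power) for a real signal sampled at fs Hz.

    Uses a direct DFT of the mean-removed signal, scaled by 1 / (fs * n).
    The result covers frequencies 0 .. fs/2 and is not doubled for
    one-sidedness. This is an assumed estimator, not the paper's method.
    """
    n = len(signal)
    avg = sum(signal) / n
    centered = [x - avg for x in signal]
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / (fs * n))
    return freqs, power


# Hypothetical test signal: one second of a 10 Hz sine sampled at 100 Hz.
fs = 100.0
signal = [math.sin(2 * math.pi * 10 * t / fs) for t in range(100)]
freqs, power = periodogram(signal, fs)
peak_freq = freqs[power.index(max(power))]
print(peak_freq)
```

In a supervised setup like the one described, the `power` list for each channel would serve as the feature vector and the ESM result as its label.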
Automatic test cases generation from software specifications modules
A new technique is proposed in this paper to extend the Integrated Classification Tree Methodology (ICTM) developed by Chen et al. [13]. This software assists testers in constructing test cases from functional specifications. A Unified Modelling Language (UML) class diagram and Object Constraint Language (OCL) are used in this paper to represent the software specifications. Each classification and associated class in the software specification is represented by classes and attributes in the class diagram. Software specification relationships are represented by associated and hierarchical relationships in the class diagram. To ensure that relationships are consistent, an automatic methodology is proposed to capture and control the class relationships in a systematic way. This can help to reduce duplicate and illegitimate test cases, which improves the testing efficiency and minimises the time and cost of the testing. The methodology introduced in this paper extracts only the legitimate test cases, by removing the duplicate test cases and those incompatible with the software specifications. Large amounts of time would have been needed to execute all of the test cases; therefore, a methodology was proposed which aimed to select a best testing path. This path guarantees the highest coverage of system units and avoids using all generated test cases. This path reduces the time and cost of the testing.