Kernel Functions for Graph Classification
Graphs are information-rich structures, but their complexity makes them difficult to analyze. Given their broad and powerful representation capacity, the classification of graphs has become an intense area of research. Many established classifiers represent objects with vectors of explicit features. When the number of features grows, however, these vector representations suffer from typical problems of high dimensionality such as overfitting and high computation time. This work instead focuses on using kernel functions to map graphs into implicitly defined spaces that avoid the difficulties of vector representations. The introduction of kernel classifiers has kindled great interest in kernel functions for graph data. By using kernels, the problem of graph classification changes from finding a good classifier to finding a good kernel function. This work explores several novel uses of kernel functions for graph classification. The first technique is the use of structure-based features to add structural information to the kernel function. A strength of this approach is the ability to identify specific structural features that contribute significantly to the classification process. Discriminative structures can then be passed to domain-specific researchers for additional analysis. The next approach is the use of wavelet functions to represent graph topology as simple real-valued features. This approach achieves order-of-magnitude decreases in kernel computation time by eliminating costly topological comparisons, while retaining competitive classification accuracy. Finally, this work examines the use of even simpler graph representations and their utility for classification. The models produced from the kernel functions presented here yield excellent performance with respect to both efficiency and accuracy, as demonstrated in a variety of experimental studies.
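The claim that kernels shift the problem from choosing a classifier to choosing a kernel can be illustrated with a small sketch. The code below is not one of the kernels developed in this work; it assumes a deliberately simple kernel (an inner product of node-label and edge-label-pair counts) and a hypothetical `(labels, edges)` graph encoding, and then uses only kernel values to run a nearest-neighbor classifier.

```python
from collections import Counter
from math import sqrt

# Assumed encoding (not from the source): a graph is a (labels, edges) pair,
# where labels maps node id -> label and edges is a list of (u, v) pairs.

def features(graph):
    """Count node labels and unordered endpoint-label pairs of edges."""
    labels, edges = graph
    counts = Counter(("node", lab) for lab in labels.values())
    for u, v in edges:
        pair = tuple(sorted((labels[u], labels[v])))
        counts[("edge",) + pair] += 1
    return counts

def kernel(g1, g2):
    """Inner product of the two sparse count vectors."""
    f1, f2 = features(g1), features(g2)
    return sum(count * f2[key] for key, count in f1.items())

def kernel_distance(g1, g2):
    """Distance in the kernel-induced space, computed from kernel values only."""
    squared = kernel(g1, g1) - 2 * kernel(g1, g2) + kernel(g2, g2)
    return sqrt(max(squared, 0.0))

def nearest_neighbor_label(query, training):
    """1-NN over (graph, class_label) pairs using the kernel distance."""
    return min(training, key=lambda item: kernel_distance(query, item[0]))[1]

# Toy data, invented for illustration.
g_a = ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
g_b = ({0: "C", 1: "N"}, [(0, 1)])
query = ({0: "C", 1: "O"}, [(0, 1)])
predicted = nearest_neighbor_label(query, [(g_a, "alcohol"), (g_b, "amine")])
```

The classifier never inspects graph structure directly, so swapping in a richer kernel changes the model without touching `nearest_neighbor_label`.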
Genome-wide Protein-chemical Interaction Prediction
The analysis of protein-chemical reactions on a large scale is critical to understanding the complex interrelated mechanisms that govern biological life at the cellular level. Chemical proteomics is a new research area aimed at genome-wide screening of such chemical-protein interactions. Traditional approaches to such screening involve in vivo or in vitro experimentation, which, while becoming faster with the application of high-throughput screening technologies, remains costly and time-consuming compared to in silico methods. Early in silico methods depend on knowing 3D protein structures (docking) or knowing binding information for many chemicals (ligand-based approaches). Typical machine learning approaches follow a global classification approach in which a single predictive model is trained for an entire data set, but such an approach is unlikely to generalize well to the protein-chemical interaction space considering its diversity and heterogeneous distribution. In response to the global approach, work on local models has recently emerged to improve generalization across the interaction space by training a series of independent models, each localized to predict a single interaction. This work examines current approaches to genome-wide protein-chemical interaction prediction and explores new computational methods based on modifications to the boosting framework for ensemble learning. The methods are described and compared to several competing classification methods. Genome-wide chemical-protein interaction data sets are acquired from publicly available resources, and a series of experimental studies are performed in order to compare the performance of each method under a variety of conditions.
Macroeconomic Indicator Forecasting with Deep Neural Networks
Economic policymaking relies upon accurate forecasts of economic conditions. Current methods for unconditional forecasting are dominated by inherently linear models that exhibit model dependence and have high data demands. We explore deep neural networks as an opportunity to improve forecast accuracy with limited data while remaining agnostic as to functional form. We focus on predicting civilian unemployment using models based on four different neural network architectures. Each of these models outperforms benchmark models at short time horizons. One model, based on an encoder-decoder architecture, outperforms benchmark models at every forecast horizon (up to four quarters).
Cook, T.; Smalter Hall, A. (2018). Macroeconomic Indicator Forecasting with Deep Neural Networks. In 2nd International Conference on Advanced Research Methods and Analytics (CARMA 2018). Editorial Universitat Politècnica de València. 261-261. https://doi.org/10.4995/CARMA2018.2018.8571
GPD: A Graph Pattern Diffusion Kernel for Accurate Graph Classification with Applications in Cheminformatics
Graph data mining is an active research area. Graphs are general modeling tools to organize information from heterogeneous sources and have been applied in many scientific, engineering, and business fields. With the fast accumulation of graph data, building highly accurate predictive models for graph data emerges as a new challenge that has not been fully explored in the data mining community. In this paper, we demonstrate a novel technique called the graph pattern diffusion (GPD) kernel. Our idea is to leverage existing frequent pattern discovery methods and to explore the application of kernel classifiers (e.g., support vector machines) in building highly accurate graph classification models. In our method, we first identify all frequent patterns from a graph database. We then map subgraphs to graphs in the graph database and use a process we call “pattern diffusion” to label nodes in the graphs. Finally, we design a graph alignment algorithm to compute the inner product of two graphs. We have tested our algorithm on a number of chemical structure data sets. The experimental results demonstrate that our method is significantly better than competing methods such as kernel functions based on paths, cycles, and subgraphs.
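The abstract does not spell out the diffusion rule, so the following is only a sketch of what labeling nodes by "pattern diffusion" could look like. It assumes that a frequent-pattern miner has already reported which nodes each pattern covers, and it uses a hypothetical update in which every node keeps a share `1 - alpha` of its own pattern vector and takes a share `alpha` of the mean of its neighbors' vectors. The function name, the parameters `steps` and `alpha`, and the update rule itself are illustrative assumptions, not the published GPD definition.

```python
def diffuse(adjacency, pattern_nodes, num_patterns, steps=1, alpha=0.5):
    """Spread pattern membership along edges (assumed rule, see lead-in).

    adjacency: dict node -> list of neighboring nodes
    pattern_nodes: dict pattern index -> set of nodes covered by that pattern
    """
    # Start with a 0/1 membership vector per node.
    vectors = {node: [0.0] * num_patterns for node in adjacency}
    for pattern, nodes in pattern_nodes.items():
        for node in nodes:
            vectors[node][pattern] = 1.0

    for _ in range(steps):
        updated = {}
        for node, neighbors in adjacency.items():
            if not neighbors:
                # Isolated nodes have nothing to receive.
                updated[node] = vectors[node][:]
                continue
            mean = [
                sum(vectors[m][p] for m in neighbors) / len(neighbors)
                for p in range(num_patterns)
            ]
            updated[node] = [
                (1 - alpha) * vectors[node][p] + alpha * mean[p]
                for p in range(num_patterns)
            ]
        vectors = updated
    return vectors

# Toy path graph 0 - 1 - 2 with one pattern covering nodes 0 and 1.
adjacency = {0: [1], 1: [0, 2], 2: [1]}
node_vectors = diffuse(adjacency, {0: {0, 1}}, num_patterns=1)
```

After one step, node 2 carries a nonzero weight for the pattern even though no embedding covers it, which is the kind of soft labeling a later alignment step could compare across graphs.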
Production, purification, and characterization of recombinant hFSH glycoforms for functional studies
Previously, our laboratory demonstrated the existence of a β-subunit glycosylation-deficient human FSH glycoform, hFSH21. A third variant, hFSH18, has recently been detected in FSH glycoforms isolated from purified pituitary hLH preparations. Human FSH21 abundance in individual female pituitaries progressively decreased with increasing age. Hypo-glycosylated glycoform preparations are significantly more active than fully-glycosylated hFSH preparations. The purpose of this study was to produce, purify, and chemically characterize both glycoform variants expressed by a mammalian cell line. Recombinant hFSH was expressed in a stable GH3 cell line and isolated from serum-free cell culture medium by sequential hydrophobic and immunoaffinity chromatography. FSH glycoform fractions were separated by Superdex 75 gel filtration. Western blot analysis revealed the presence of both hFSH18 and hFSH21 glycoforms in the low molecular weight fraction; however, their electrophoretic mobilities differed from those associated with the corresponding pituitary hFSH variants. Edman degradation of the FSH21/18-derived β-subunit before and after peptide-N-glycanase F digestion confirmed that it possessed a mixture of both mono-glycosylated FSHβ subunits, as both Asn7 and Asn24 were partially glycosylated. FSH receptor-binding assays confirmed our previous observations that hFSH21/18 exhibits greater receptor-binding affinity and occupies more FSH binding sites when compared to fully-glycosylated hFSH24. Thus, the age-related reduction in hypo-glycosylated hFSH significantly reduces circulating levels of FSH biological activity, which may further compromise reproductive function. Taken together, the ability to express and isolate recombinant hFSH glycoforms opens the way to study functional differences between them both in vivo and in vitro.
Application of kernel functions for accurate similarity search in large chemical databases
Background
Similarity search in chemical structure databases is an important problem with many applications in chemical genomics, drug design, and efficient chemical probe screening, among others. It is widely believed that structure-based methods provide an efficient way to perform such queries. Recently, various graph kernel functions have been designed to capture the intrinsic similarity of graphs. Though successful in constructing accurate predictive and classification models, graph kernel functions cannot be applied to large chemical compound databases due to their high computational complexity and the difficulty of indexing similarity search for large databases.
Results
To bridge graph kernel functions and similarity search in chemical databases, we applied a novel kernel-based similarity measurement, developed in our team, to measure the similarity of graph-represented chemicals. In our method, we utilize a hash table to support a new graph kernel function definition, efficient storage, and fast search. We have applied our method, named G-hash, to large chemical databases. Our results show that the G-hash method achieves state-of-the-art performance for k-nearest neighbor (k-NN) classification. Moreover, the similarity measurement and the index structure are scalable to large chemical databases, with a smaller index size and faster query processing time than state-of-the-art indexing methods such as Daylight fingerprints, C-tree, and GraphGrep.
Conclusions
Designing an efficient similarity query processing method for large chemical databases is challenging, since we need to balance running time efficiency and similarity search accuracy. Our previous similarity search method, G-hash, provides a new way to perform similarity search in chemical databases. Experimental study validates the utility of G-hash in chemical databases.
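To make the role of the hash table more concrete, here is a minimal sketch of a hash-indexed node-matching similarity. It is not the published G-hash implementation: the node key (a node's label plus the sorted labels of its neighbors), the class name `HashIndex`, and the scoring rule (a sum of products of matching key counts) are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict

def node_keys(labels, edges):
    """Count hashable node keys: (label, sorted tuple of neighbor labels)."""
    neighbors = defaultdict(list)
    for u, v in edges:
        neighbors[u].append(labels[v])
        neighbors[v].append(labels[u])
    return Counter((labels[n], tuple(sorted(neighbors[n]))) for n in labels)

class HashIndex:
    """Inverted index from node key to the graphs that contain it."""

    def __init__(self):
        self.table = defaultdict(dict)  # key -> {graph_id: count}

    def add(self, graph_id, labels, edges):
        for key, count in node_keys(labels, edges).items():
            self.table[key][graph_id] = count

    def query(self, labels, edges, k=1):
        """Return the k highest-scoring graphs as (graph_id, score) pairs."""
        scores = Counter()
        for key, count in node_keys(labels, edges).items():
            # Only graphs sharing this key are touched; the rest are skipped.
            for graph_id, stored in self.table.get(key, {}).items():
                scores[graph_id] += count * stored
        return scores.most_common(k)

# Toy database, invented for illustration.
index = HashIndex()
index.add("ethanol", {0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
index.add("methylamine", {0: "C", 1: "N"}, [(0, 1)])
hits = index.query({0: "C", 1: "O"}, [(0, 1)], k=1)
```

Because a query only visits the buckets for its own node keys, the cost depends on how many database graphs share those keys rather than on the total database size, which is the property that makes a hash-based kernel attractive for k-NN search.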
GPM: A Graph Pattern Matching Kernel with Diffusion for Chemical Compound Classification
Classifying chemical compounds is an active topic in drug design and other cheminformatics applications. Graphs are general tools for organizing information from heterogeneous sources and have been applied in modelling many kinds of biological data. With the fast accumulation of chemical structure data, building highly accurate predictive models for chemical graphs emerges as a new challenge. In this paper, we demonstrate a novel technique called the Graph Pattern Matching kernel (GPM). Our idea is to leverage existing frequent pattern discovery methods and explore their application to kernel classifiers (e.g., support vector machines) for graph classification. In our method, we first identify all frequent patterns from a graph database. We then map subgraphs to graphs in the database and use a diffusion process to label nodes in the graphs. Finally, the kernel is computed using a set matching algorithm. We performed experiments on 16 chemical structure data sets and compared our method to other major graph kernels. The experimental results demonstrate excellent performance of our method.
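The final step, computing a kernel value by set matching, can be sketched as follows. The abstract does not specify the matching algorithm, so this example assumes a one-to-one matching between the node feature vectors of two graphs that maximizes the total dot-product similarity, found by brute force. The function names and the similarity choice are hypothetical, and the exhaustive search is only practical for very small graphs.

```python
from itertools import permutations

def node_similarity(a, b):
    """Dot product of two node feature vectors (assumed similarity)."""
    return sum(x * y for x, y in zip(a, b))

def set_matching_kernel(nodes_a, nodes_b):
    """Best total similarity over one-to-one matchings of the smaller node set.

    Brute force over all injective assignments, so the cost grows
    factorially; a polynomial-time assignment solver such as the
    Hungarian algorithm would replace this loop for realistic graphs.
    """
    small, large = sorted((nodes_a, nodes_b), key=len)
    return max(
        sum(node_similarity(small[i], large[j]) for i, j in enumerate(chosen))
        for chosen in permutations(range(len(large)), len(small))
    )

# Toy node vectors, e.g. per-node pattern weights after a diffusion step.
graph_a = [[1.0, 0.0], [0.0, 1.0]]
graph_b = [[0.0, 2.0], [3.0, 0.0], [1.0, 1.0]]
value = set_matching_kernel(graph_a, graph_b)
```

In this toy case the best matching pairs the first node of `graph_a` with the second node of `graph_b` and the second with the first, and the unmatched third node contributes nothing.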