8 research outputs found

    On behavior of signs for the heat equation and a diffusion method for data separation

    Consider the solution u(x, t) of the heat equation with initial data u0. The diffusive sign SD[u0](x) is defined as the limit of the sign of u(x, t) as t → 0. A sufficient condition on x ∈ R^d and u0 for SD[u0](x) to be well-defined is given, together with a few examples of u0 violating and fulfilling this condition. It turns out that this diffusive sign is also related to a variational problem whose energy is the Dirichlet energy with a fidelity term. If the initial data is a difference of the characteristic functions of two disjoint sets A and B, the boundary of the set where SD[u0](x) = 1 (or −1) is roughly an equidistant hypersurface from A and B, which gives a separation of two data sets.
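
The separation property described in the abstract above can be illustrated with a minimal one-dimensional sketch. It is not taken from the paper: the intervals, the fixed small time `t`, and the function names are assumptions chosen for illustration, and the limit t → 0 is only approximated by evaluating the sign at one small positive time. For initial data given by the indicator of an interval, the heat solution has a closed form in terms of the error function.

```python
import math


def heat_of_interval(x, t, a, b):
    """Solution of u_t = u_xx on the real line at (x, t) when the
    initial data is the indicator function of the interval [a, b]."""
    s = math.sqrt(4.0 * t)
    return 0.5 * (math.erf((b - x) / s) - math.erf((a - x) / s))


def sign_at_small_time(x, set_a, set_b, t=1e-2):
    """Sign of u(x, t) for initial data 1_A - 1_B, with A and B intervals.

    Assumption: a fixed small t stands in for the limit t -> 0. Taking t
    much smaller makes both terms underflow in floating point far from
    A and B, so the result would be 0 there.
    """
    u = heat_of_interval(x, t, *set_a) - heat_of_interval(x, t, *set_b)
    return (u > 0) - (u < 0)


if __name__ == "__main__":
    A = (0.0, 1.0)  # hypothetical first data set
    B = (3.0, 4.0)  # hypothetical second data set
    for x in (0.5, 1.5, 2.5, 3.5):
        print(x, sign_at_small_time(x, A, B))
```

In this sketch the sign is positive at points closer to A and negative at points closer to B, so the sign change sits near the point equidistant from the two intervals, in line with the behavior the abstract describes.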

    A simplified approach to disulfide connectivity prediction from protein sequences

    Background: Prediction of disulfide bridges from protein sequences is useful for characterizing structural and functional properties of proteins. Several methods based on different machine learning algorithms have been applied to this problem, and public domain prediction services exist. These methods, however, still leave room for significant improvement in both prediction accuracy and overall architectural complexity. Results: We introduce new methods for predicting disulfide bridges from protein sequences. The methods take advantage of two new decomposition kernels for measuring the similarity between protein sequences according to the amino acid environments around cysteines. Disulfide connectivity is predicted in two passes. First, a binary classifier is trained to predict whether a given protein chain has at least one intra-chain disulfide bridge. Second, a multiclass classifier (implemented by 1-nearest neighbor) is trained to predict connectivity patterns. The two passes can be easily cascaded to obtain connectivity prediction from sequence alone. We report an extensive experimental comparison on several data sets that have been previously employed in the literature to assess the accuracy of cysteine bonding state and disulfide connectivity predictors. Conclusion: We reach state-of-the-art results on bonding state prediction with a simple method that classifies chains rather than individual residues. The prediction accuracy reached by our connectivity prediction method compares favorably with all but the most complex other approaches. Moreover, our method does not need any model selection or hyperparameter tuning, a property that makes it less prone to overfitting and to overestimation of prediction accuracy.

    Ensembles based on Random Projection for gene expression data analysis

    In this work we focus on methods for solving classification problems characterized by high-dimensional, low-cardinality data. These features are relevant in bio-molecular data analysis and particularly in class prediction with microarray data. Many methods have been proposed to approach this problem, which is characterized by the so-called curse of dimensionality (a term introduced by Richard Bellman (9)). Among them are gene selection methods, principal and independent component analysis, and kernel methods. In this work we propose and experimentally analyze two ensemble methods based on two randomized techniques for data compression: Random Subspaces and Random Projections. While Random Subspaces, originally proposed by T. K. Ho, is a technique related to feature subsampling, Random Projections is a feature extraction technique motivated by the Johnson-Lindenstrauss theory of distance-preserving random projections. The randomness underlying the proposed approach leads to diverse sets of extracted features corresponding to low-dimensional subspaces with low metric distortion and approximate preservation of the expected loss of the trained base classifiers. In the first part of the work we justify our approach with two theoretical results. The first concerns unsupervised learning: we prove that a clustering algorithm minimizing the (quadratic) objective function provides an ε-closed solution if applied to data compressed according to Johnson-Lindenstrauss theory. The second concerns supervised learning: we prove that polynomial kernels are approximately preserved by Random Projections, up to a degradation proportional to the square of the degree of the polynomial. In the second part of the work, we propose ensemble algorithms based on Random Subspaces and Random Projections, and we experimentally compare them with single SVMs and other state-of-the-art ensemble methods, using three gene expression data sets: Colon, Leukemia and DLBL-FL (Diffuse Large B-cell and Follicular Lymphoma). The obtained results confirm the effectiveness of the proposed approach. Moreover, we observed a certain performance degradation of Random Projection methods when the base learners are SVMs with polynomial kernels of high degree.
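
The distance-preservation idea behind Random Projections mentioned in the abstract above can be sketched as follows. This is an illustrative example only, not the ensemble method of the work: it assumes a Gaussian projection matrix with N(0, 1/k) entries (one common construction), synthetic uniform data, and hypothetical dimensions d = 500 and k = 400. It measures how much pairwise Euclidean distances change after projection.

```python
import math
import random


def random_projection_matrix(d, k, rng):
    """Return a k x d matrix with i.i.d. Gaussian entries of variance 1/k."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, point):
    """Map a d-dimensional point to k dimensions by matrix-vector product."""
    return [sum(w * x for w, x in zip(row, point)) for row in matrix]


def distance(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def distortion_ratios(points, matrix):
    """Ratio (projected distance / original distance) for every pair of points."""
    projected = [project(matrix, p) for p in points]
    ratios = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            ratios.append(
                distance(projected[i], projected[j]) / distance(points[i], points[j])
            )
    return ratios


if __name__ == "__main__":
    rng = random.Random(0)
    d, k = 500, 400  # hypothetical original and reduced dimensions
    points = [[rng.random() for _ in range(d)] for _ in range(5)]
    matrix = random_projection_matrix(d, k, rng)
    ratios = distortion_ratios(points, matrix)
    print(min(ratios), max(ratios))
```

Ratios close to 1 indicate low metric distortion. A Random Subspace method would instead select a random subset of the original coordinates, which is feature subsampling and not feature extraction.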

    Text Mining Biomedical Literature for Genomic Knowledge Discovery

    The last decade has been marked by unprecedented growth in both the production of biomedical data and the amount of published literature discussing it. Almost every known or postulated piece of information pertaining to genes, proteins, and their role in biological processes is reported somewhere in the vast amount of published biomedical literature. We believe the ability to rapidly survey and analyze this literature and extract pertinent information constitutes a necessary step toward both the design and the interpretation of any large-scale experiment. Moreover, automated literature mining offers an as yet untapped opportunity to integrate many fragments of information gathered by researchers from multiple fields of expertise into a complete picture exposing the interrelated roles of various genes, proteins, and chemical reactions in cells and organisms. In this thesis, we show that functional keywords in biomedical literature, particularly Medline, represent very valuable information and can be used to discover new genomic knowledge. To validate our claim we present an investigation into text mining biomedical literature to assist microarray data analysis, yeast gene function classification, and biomedical literature categorization. We conduct the following studies: 1. We test sets of genes to discover common functional keywords among them and use these keywords to cluster them into groups; 2. We show that it is possible to link genes to diseases by an expert human interpretation of the functional keywords for the genes; none of these diseases are as yet mentioned in public databases; 3. By clustering genes based on commonality of functional keywords it is possible to group genes into meaningful clusters that reveal more information about their functions, links to diseases, and roles in metabolic pathways; 4. Using extracted functional keywords, we are able to demonstrate that for yeast genes, we can make a better functional grouping of genes in comparison to available public microarray and phylogenetic databases; 5. We show an application of our approach to literature classification. Using functional keywords as features, we are able to extract epidemiological abstracts automatically from Medline with higher sensitivity and accuracy than a human expert.
    Ph.D. Committee Chair: Shamkant B. Navathe; Committee Co-Chair: Brian J. Ciliax; Committee Member: Ashwin Ram; Committee Member: Edward Omiecinski; Committee Member: Ray Dingledine; Committee Member: Venu Dasig

    Different miRNA signatures of oral and pharyngeal squamous cell carcinomas: a prospective translational study

    BACKGROUND: MicroRNAs (miRNAs) are small non-coding RNAs that regulate mRNA translation/decay and may serve as biomarkers. We characterised the expression of miRNAs in clinically sampled oral and pharyngeal squamous cell carcinoma (OSCC and PSCC) and described the influence of human papilloma virus (HPV). METHODS: Biopsies obtained from 51 patients with OSCC/PSCC and 40 control patients were used for microarray analysis. The results were correlated to clinical data and HPV status. Supervised learning by support vector machines was employed to generate a diagnostic miRNA signature. RESULTS: One hundred and fourteen miRNAs were differentially expressed between OSCC and normal oral epithelium, with the downregulation of miR-375 and upregulation of miR-31 as the most significant aberrations. Pharyngeal squamous cell carcinoma exhibited 38 differentially expressed miRNAs compared with normal pharyngeal epithelium. Differences in the miRNA expression pattern of both normal epithelium and SCC were observed between the oral cavity and the pharynx. Human papilloma virus infection was associated with perturbations of 21 miRNAs, most significantly miR-127-3p and miR-363. A molecular classifier including 61 miRNAs was generated for OSCC with an accuracy of 93%. CONCLUSION: MicroRNAs may serve as useful biomarkers in OSCC and PSCC. The influence of HPV on miRNA may provide a mechanism for the distinct clinical behaviour of HPV-infected tumours.