    Extreme Data Mining: Inference from Small Datasets

    Neural networks have been applied successfully in many fields. However, satisfactory results are generally obtained only under large-sample conditions. With small training sets, performance may degrade, or the learning task may not be accomplished at all. This deficiency severely limits the application of neural networks. The main reason small datasets cannot provide enough information is that there are gaps between samples, and even the domain from which the samples are drawn cannot be determined with certainty. Several computational intelligence techniques have been proposed to overcome the limits of learning from small datasets. We have the following goals: i. to discuss the meaning of "small" in the context of inferring from small datasets; ii. to give an overview of computational intelligence solutions for this problem; iii. to illustrate the introduced concepts with a real-life application.
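    One simple way to fill gaps between samples is to generate synthetic ("virtual") samples by linear interpolation between pairs of real samples. The sketch below is a generic illustration under the assumption of numeric feature vectors; it is not necessarily one of the techniques surveyed in this work, and the function name `interpolate_virtual_samples` is hypothetical. It is related to, but simpler than, SMOTE, which interpolates only between nearest neighbours.

```python
import random


def interpolate_virtual_samples(samples, n_new, seed=0):
    """Hypothetical helper: create n_new virtual points, each lying on the
    segment between two randomly chosen real samples (numeric tuples)."""
    rng = random.Random(seed)
    virtual = []
    for _ in range(n_new):
        a, b = rng.sample(samples, 2)  # two distinct real samples
        t = rng.random()               # position along the segment, in [0, 1)
        virtual.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return virtual


if __name__ == "__main__":
    real = [(0.0, 0.0), (1.0, 2.0), (3.0, 1.0)]
    for point in interpolate_virtual_samples(real, 5):
        print(point)
```

    Because every virtual point lies between two observed points, this scheme only fills gaps inside the convex hull of the data; it cannot recover the unknown domain of the samples, which is the second difficulty named above.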

    Binding Affinity and Specificity of SH2 Domain Interactions in Receptor Tyrosine Kinase Signaling Networks

    Receptor tyrosine kinase (RTK) signaling mechanisms play a central role in intracellular signaling and control the development of multicellular organisms, cell growth, cell migration, and programmed cell death. Dysregulation of these signaling mechanisms results in developmental defects and diseases such as cancer. Control of this network relies on the specificity and selectivity of Src Homology 2 (SH2) domain interactions with phosphorylated target peptides. In this work, we review the current quantitative understanding of SH2 domain interactions and identify severe limitations in the accuracy and availability of SH2 domain interaction data. We propose a framework to address some of these limitations and present new results that improve the quality and accuracy of currently available data. Furthermore, we supplement published results with a large body of high-confidence negative interactions extracted from rejected data, allowing for improved modeling and prediction of SH2 interactions. We present and analyze new experimental results on the dynamic response of downstream signaling proteins to RTK signaling. Our data identify differences in downstream response depending on the character and dose of the receptor stimulus, which has implications for previous studies that used high-dose stimulation. Finally, we review some of the methods used in this work, focusing on pitfalls in clustering biological data: the high dimensionality of data from high-throughput experiments, the failure to consider more than one clustering method for a given problem, and the difficulty of determining whether clustering has produced meaningful results.
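    As a small illustration of the last two pitfalls, the sketch below runs two different clustering methods (k-means and single-linkage agglomerative clustering) on the same toy 2-D data and scores each partition with the silhouette coefficient, one common internal validity index. The data, the function names and the choice of silhouette are assumptions made for illustration, not taken from this work; real high-dimensional expression data would call for tested library implementations and further validation.

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    # Seed with the first point, then repeatedly add the point farthest
    # from all centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members (keep it if empty).
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(
                tuple(sum(xs) / len(members) for xs in zip(*members)) if members else centers[c]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def single_linkage(points, k):
    """Agglomerative clustering: merge the two closest clusters until k remain;
    cluster distance is the smallest point-to-point distance."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(points[i], points[j]) for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # a < b, so index a is unaffected
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; assumes at least two clusters."""
    n = len(points)
    total = 0.0
    for i in range(n):
        own = labels[i]
        same = [math.dist(points[i], points[j]) for j in range(n) if j != i and labels[j] == own]
        if not same:
            continue  # a point alone in its cluster contributes 0 by convention
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(points[i], points[j]) for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != own
        )
        total += (b - a) / max(a, b)
    return total / n


if __name__ == "__main__":
    # Toy data: two well-separated groups of four points each.
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    for name, labels in [("k-means", kmeans(data, 2)), ("single linkage", single_linkage(data, 2))]:
        print(f"{name}: labels={labels} silhouette={silhouette(data, labels):.3f}")
```

    On clearly separated data the two methods agree and both score highly; when they disagree, or both score poorly, that is a signal that the clustering may not be meaningful.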

    In Silico Methodologies for Selection and Prioritization of Compounds in Drug Discovery

    Ph.D. (Doctor of Philosophy)

    Molecular analysis of the Oryzias latipes (Medaka) transcriptome.

    Based on oligonucleotide fingerprinting (OFP) analysis and subsequent EST production, a non-redundant set of 10,016 medaka cDNA clones was established from three embryonic stages (gastrula, neurula and organogenesis) and one adult tissue (ovary) as a valuable resource for further research on the medaka transcriptome. In a first round, 26,880 medaka gastrula clones were subjected to OFP cluster analysis, and representatives of each cluster, as well as clones left as singletons, were chosen for EST production. In total, 7,680 cDNA clones were sequenced and 6,909 high-quality 5′ reads were obtained. The advantage of OFP lies not only in normalisation: subjecting cDNA libraries from different developmental stages or tissues to fingerprinting analysis also gives insight into differential expression. Therefore, in a second round, cDNA inserts from the ovary, neurula and organogenesis libraries were included in addition to the gastrula clones. This approach yielded another 11,468 high-quality 5′ ESTs. All EST sequences were deposited in the GenBank EST database under accession numbers AM137442 to AM156757. EST clustering grouped the resulting 18,377 high-quality sequences into 3,268 clusters and 7,274 singletons, giving 10,542 unique sequences; further clustering reduced this set to 10,016 unique sequences. The high-quality EST clusters and singletons were annotated, and functions were assigned to 8,155 of these sequences, many of which show similarity to proteins with important functions, e.g. in development. ESTs showing no similarity to any known protein still contain a large amount of valuable, high-quality sequence information and should therefore be regarded as new medaka sequence data, either protein-coding or non-coding.
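    The first-round selection step (sequencing candidates drawn from each fingerprint cluster, with singletons kept as they are) and the cluster/singleton bookkeeping behind the reported counts can be sketched as follows. This is a minimal sketch, assuming cluster assignments are already available as a mapping from clone ID to cluster ID; the function names, the input format and the rule of picking exactly one representative per cluster are hypothetical, not the pipeline used in this work.

```python
from collections import Counter, defaultdict


def pick_representatives(cluster_of):
    """Choose one clone per fingerprint cluster for EST sequencing.

    cluster_of maps a clone ID to the ID of the cluster it fell into
    (hypothetical input format). A clone alone in its cluster is a
    singleton and is therefore its own representative.
    """
    members = defaultdict(list)
    for clone, cluster in cluster_of.items():
        members[cluster].append(clone)
    # Assumption: take the lexicographically smallest clone ID; a real
    # pipeline might instead prefer, e.g., the clone with the longest insert.
    return sorted(min(clones) for clones in members.values())


def count_unique(cluster_of):
    """Return (multi-member clusters, singletons, unique sequences).

    Each multi-member cluster and each singleton contributes one unique
    sequence; in the abstract, 3,268 clusters + 7,274 singletons = 10,542.
    """
    sizes = Counter(cluster_of.values())
    n_multi = sum(1 for s in sizes.values() if s > 1)
    n_single = sum(1 for s in sizes.values() if s == 1)
    return n_multi, n_single, n_multi + n_single


if __name__ == "__main__":
    toy = {"c1": "A", "c2": "A", "c3": "B", "c4": "C", "c5": "C", "c6": "C"}
    print(pick_representatives(toy))  # ['c1', 'c3', 'c4']
    print(count_unique(toy))          # (2, 1, 3)
```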