20 research outputs found

    Development of effective gene selection algorithms for microarray data analysis

    Issued as final report. National Science Foundation (U.S.)

    Discovering gene functional relationships using a literature-based NMF model

    The rapid growth of the biomedical literature and genomic information presents a major challenge for determining the functional relationships among genes. Several bioinformatics tools have been developed to extract and identify gene relationships from various biological databases. However, an intuitive user-interface tool that allows the biologist to determine functional relationships among genes is still not available. In this study, we develop a Web-based bioinformatics software environment called FAUN or Feature Annotation Using Nonnegative matrix factorization (NMF) to facilitate both the discovery and classification of functional relationships among genes. Both the computational complexity and parameterization of NMF for processing gene sets are discussed. We tested FAUN on three manually constructed gene document collections, and then used it to analyze several microarray-derived gene sets obtained from studies of the developing cerebellum in normal and mutant mice. FAUN provides utilities for collaborative knowledge discovery and identification of new gene relationships from text streams and repositories (e.g., MEDLINE). It is particularly useful for the validation and analysis of gene associations suggested by microarray experimentation. The FAUN site is publicly available at http://grits.eecs.utk.edu/faun
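As a rough illustration of the factorization the abstract refers to, the sketch below applies NMF to a tiny term-by-document matrix using the standard Lee and Seung multiplicative updates. It is not FAUN's implementation: the update rule, the toy matrix, the rank, and all function names here are assumptions chosen for illustration only.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def frobenius_error(a, w, h):
    """Frobenius norm of A - WH, the quantity the updates try to reduce."""
    wh = matmul(w, h)
    return sum((a[i][j] - wh[i][j]) ** 2
               for i in range(len(a)) for j in range(len(a[0]))) ** 0.5

def nmf(a, rank, iterations=200, seed=0, eps=1e-9):
    """Approximate nonnegative A (m x n) as W (m x rank) times H (rank x n)."""
    rng = random.Random(seed)
    m, n = len(a), len(a[0])
    # Random positive starting factors (an assumed initialization).
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T A) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, a)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(rank)]
        # W <- W * (A H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(a, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(m)]
    return w, h

# Hypothetical toy data: rows are terms, columns are gene documents.
# The two blocks mimic two groups of genes that share vocabulary.
terms_by_docs = [
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 1.0],
    [0.0, 0.0, 1.0, 4.0],
]

w, h = nmf(terms_by_docs, rank=2)
# Each column of h gives one document's weight on each of the 2 features;
# documents dominated by the same feature are candidates for a shared function.
```

In this reading, the columns of `w` play the role of term-weighted features and the columns of `h` describe how strongly each gene document expresses each feature. Choosing the rank and the number of iterations is the kind of parameterization question the abstract says the paper discusses.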

    Discovering gene functional relationships using FAUN (Feature Annotation Using Nonnegative matrix factorization)

    Background: Searching the enormous amount of information available in biomedical literature to extract novel functional relationships among genes remains a challenge in the field of bioinformatics. While numerous software tools have been developed to extract and identify gene relationships from biological databases, few deal effectively with extracting new (or implied) gene relationships, a process that is useful in interpreting discovery-oriented genome-wide experiments. Results: In this study, we develop a Web-based bioinformatics software environment called FAUN, or Feature Annotation Using Nonnegative matrix factorization (NMF), to facilitate both the discovery and classification of functional relationships among genes. Both the computational complexity and parameterization of NMF for processing gene sets are discussed. FAUN is tested on three manually constructed gene document collections. Its utility and performance as a knowledge discovery tool are demonstrated using a set of genes associated with autism. Conclusions: FAUN not only helps researchers use biomedical literature efficiently, but also provides utilities for knowledge discovery. This Web-based software environment may be useful for the validation and analysis of functional associations in gene subsets identified by high-throughput experiments.

    Concept Based Knowledge Discovery from Biomedical Literature

    Philosophiae Doctor - PhD. This thesis introduces novel methods for knowledge discovery and presents a software system that is able to extract information from biomedical literature, review interesting connections between various biomedical concepts and, in so doing, generate new hypotheses. The experimental results obtained using the methods described in this thesis are compared with published results obtained by other methods, and a number of case studies are described. This thesis shows how the technology presented can be integrated with the researcher's own knowledge, experimentation and observations for optimal progression of scientific research. South Africa

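    Modelo para descoberta de conhecimento baseado em associação semântica e temporal entre elementos textuais [Model for knowledge discovery based on semantic and temporal association between textual elements]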

    Thesis (doctorate) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia e Gestão do Conhecimento, Florianópolis, 2016. The increasing complexity of organizational activities, the rapid expansion of the Internet, and advances in the knowledge society are among the factors behind the unprecedented volume of digital data. This growing body of data has great potential for pattern analysis and knowledge discovery. In this sense, analyzing the relationships present in this immense volume of information can provide new and possibly unexpected insights. This research found a shortage of studies that adequately consider the semantics and temporality of relationships between textual elements, features considered important for knowledge discovery. This work therefore proposes a knowledge discovery model comprising a high-level ontology for representing relationships and the Latent Semantic Indexing (LSI) technique for determining the strength of association between terms that are not directly related. Both the representation of domain knowledge and the determination of associative strength between terms take into account the time at which the relationships occur. The model was evaluated in two types of experiments: one on document classification and another on semantic and temporal association between terms. The results show that the model (i) has the potential to be applied to knowledge-intensive tasks such as classification, and (ii) can display curves of the associative strength between two terms over time, contributing to hypothesis generation and, consequently, to knowledge discovery.

    Advances in knowledge discovery and data mining Part II

    19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19-22, 2015, Proceedings, Part II

    Extracting unrecognized gene relationships from the biomedical literature via matrix factorizations-0

    Copyright information: Taken from "Extracting unrecognized gene relationships from the biomedical literature via matrix factorizations", http://www.biomedcentral.com/1471-2105/8/S9/S6. BMC Bioinformatics 2007;8(Suppl 9):S6. Published online 27 Nov 2007. PMCID: PMC2217664. Figure caption (partially lost in extraction): [...] matrix built from genes directly associated with the Reelin signaling pathway. Each gene is plotted at a three-coordinate position [symbols lost in extraction]. Legend: red circle, RELN; red circle and blue diamonds, genes associated with the Reelin signaling pathway; black dots, other genes.

    Extracting unrecognized gene relationships from the biomedical literature via matrix factorizations-1

    Copyright information: Taken from "Extracting unrecognized gene relationships from the biomedical literature via matrix factorizations", http://www.biomedcentral.com/1471-2105/8/S9/S6. BMC Bioinformatics 2007;8(Suppl 9):S6. Published online 27 Nov 2007. PMCID: PMC2217664. Figure caption (partially lost in extraction): [...] matrix built from known genes associated with the Alzheimer's disease pathway. Each gene is plotted at a three-coordinate position [symbols lost in extraction]. Legend: red circle, APP; red circle and blue diamonds, genes associated with the Alzheimer's disease pathway; black dots, other genes.