6 research outputs found

    Structural pattern detection and domain recognition for protein function prediction

    Proteins are essential components of the cell that control and affect all of its functions. In proteins, structural patterns consist of a few amino acids assembled in a specific arrangement. Because of their specific structures, these patterns are recognized as the functionally important sites of proteins and are conserved even in distantly related proteins. Moreover, several structural patterns merge to form domains, which are also associated with the protein's function. In this work, we introduce a method for finding structural patterns common to a protein pair by using graphlet mappings. We represent protein structures as graphs and then generate graphlets. Local alignments are produced by mapping the generated graphlets from protein pairs. By merging these local alignments, we then try to recognize functionally important domains. These common domains are very useful in protein function prediction, fold classification, and homology detection. Our algorithm was first applied to the fold classification problem, where 80% accuracy was observed. It was also used for protein function prediction, where 97% accuracy was observed.
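The graph representation and graphlet generation steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the distance cutoff of 7.0, the use of one coordinate per residue, the restriction to 3-node graphlets, and the function names `contact_graph` and `three_node_graphlets` are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist

def contact_graph(coords, cutoff=7.0):
    """Adjacency sets linking residues whose coordinates lie within `cutoff`.

    `coords` holds one (x, y, z) point per residue; the cutoff is an assumed value.
    """
    adj = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if dist(coords[i], coords[j]) <= cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def three_node_graphlets(adj):
    """Enumerate connected induced 3-node subgraphs as (kind, nodes) pairs."""
    found = []
    for a, b, c in combinations(sorted(adj), 3):
        # Count how many of the three possible edges are present.
        edges = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
        if edges == 3:
            found.append(("triangle", (a, b, c)))
        elif edges == 2:
            found.append(("path", (a, b, c)))
    return found

if __name__ == "__main__":
    toy = [(0, 0, 0), (5, 0, 0), (10, 0, 0), (5, 4, 0)]
    for kind, nodes in three_node_graphlets(contact_graph(toy)):
        print(kind, nodes)
```

The brute-force enumeration is cubic in the number of residues, so it suits only small toy inputs; mapping graphlets between two proteins and merging the resulting local alignments are not covered here.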

    EnzyMiner: automatic identification of protein level mutations and their impact on target enzymes from PubMed abstracts

    BACKGROUND: A better understanding of the mechanisms behind an enzyme's functionality and stability, as well as knowledge of mutations and their impact, is crucial for researchers working with enzymes. Although several enzyme databases are currently available, the scientific literature still remains, by and large, the up-to-date source for learning the effects of a mutation on an enzyme. However, going through vast amounts of scientific documents to extract information on a desired mutation has always been a time-consuming process. In this paper, therefore, we describe a unique method, termed EnzyMiner, which automatically identifies the PubMed abstracts that contain information on the impact of a protein-level mutation on the stability and/or the activity of a given enzyme. RESULTS: We present an automated system that identifies the abstracts containing an amino-acid-level mutation and then classifies them according to the mutation's effect on the enzyme. For mutation identification, MuGeX, an automated mutation-gene extraction system, has an accuracy of 93.1% with a 91.5 F-measure. For impact analysis, document classification is performed to identify the abstracts that report a change in the enzyme's stability or activity resulting from the mutation. The system was trained on lipases and tested on amylases with an accuracy of 85%. CONCLUSION: EnzyMiner identifies the abstracts that contain a protein mutation for a given enzyme and, with the help of information extraction and machine learning techniques, checks whether each abstract is related to a disease. For disease-related abstracts, the mutation list and direct links to the abstracts are retrieved from the system and displayed on the Web. Abstracts that are not disease related are, in addition to having the mutation list, categorized into two groups according to whether the mutation affects the enzyme's stability or its functionality, and these groups are also displayed on the Web.
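To make the idea of spotting an amino-acid-level mutation in an abstract concrete, here is a minimal regular-expression sketch. It is not MuGeX and makes no claim about how MuGeX works: the pattern, the function name `find_point_mutations`, and the example sentence are assumptions, and the heuristic only covers simple substitution mentions such as `D96L` or `Ser82Ala`.

```python
import re

# Standard 20 amino acids in one-letter and three-letter notation.
ONE = "ACDEFGHIKLMNPQRSTVWY"
THREE = ("Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
         "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val")

# Wild-type residue, position, mutant residue (e.g. D96L, Ser82Ala, Ser-82-Ala).
MUTATION_RE = re.compile(
    rf"\b(?:([{ONE}])(\d+)([{ONE}])"
    rf"|({THREE})-?(\d+)-?({THREE}))\b"
)

def find_point_mutations(text):
    """Return (wild_type, position, mutant) tuples for substitution mentions."""
    hits = []
    for m in MUTATION_RE.finditer(text):
        # Group 1 is set only when the one-letter alternative matched.
        wt, pos, mut = m.group(1, 2, 3) if m.group(1) else m.group(4, 5, 6)
        hits.append((wt, int(pos), mut))
    return hits

if __name__ == "__main__":
    sentence = "The D96L and Ser82Ala variants of the lipase were more thermostable."
    print(find_point_mutations(sentence))
```

A pattern this simple will also match unrelated tokens of the same shape (a strain or plasmid name, for instance), which is one reason a real system needs additional context checks before classifying an abstract.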

    Event clustering within news articles

    This paper summarizes our group’s efforts in the event sentence coreference identification shared task, which was organized as part of the Automated Extraction of Socio-Political Events from News (AESPEN) Workshop. Our main approach consists of three steps. First, we use a transformer-based model to predict whether a pair of sentences refers to the same event. Next, we use these predictions as initial scores and recalculate each pair's score by considering how the two sentences in the pair relate to the other sentences. Finally, the resulting scores are used to construct the clusters, starting with the highest-scoring pairs. Our proposed approach outperforms the baseline approach across all evaluation metrics.
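The second and third steps can be sketched as follows, taking the pairwise scores from the first step as given. The abstract does not specify the rescoring formula or the merge criterion, so both are assumptions here: `rescore` blends each pair score with how similarly the two sentences score against every other sentence, and `greedy_clusters` merges clusters in descending score order when their average link reaches a threshold. The names, `alpha`, and `threshold` are hypothetical.

```python
from itertools import combinations

def _key(i, j):
    """Canonical (low, high) key for an unordered sentence pair."""
    return (min(i, j), max(i, j))

def rescore(scores, n, alpha=0.5):
    """Blend each pair score with the agreement of the pair's scores to other sentences."""
    new = {}
    for i, j in combinations(range(n), 2):
        others = [k for k in range(n) if k not in (i, j)]
        if others:
            agree = sum(1 - abs(scores[_key(i, k)] - scores[_key(j, k)])
                        for k in others) / len(others)
        else:
            agree = scores[(i, j)]  # only two sentences: nothing else to compare with
        new[(i, j)] = (1 - alpha) * scores[(i, j)] + alpha * agree
    return new

def greedy_clusters(scores, n, threshold=0.5):
    """Merge clusters pair by pair, visiting the highest-scoring pairs first."""
    cluster_of = {i: {i} for i in range(n)}  # sentence -> shared cluster set
    for (i, j), _ in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        a, b = cluster_of[i], cluster_of[j]
        if a is b:
            continue
        # Average score over all cross-cluster pairs (assumed merge criterion).
        link = sum(scores[_key(x, y)] for x in a for y in b) / (len(a) * len(b))
        if link >= threshold:
            merged = a | b
            for x in merged:
                cluster_of[x] = merged
    unique = {id(c): c for c in cluster_of.values()}
    return sorted(sorted(c) for c in unique.values())

if __name__ == "__main__":
    raw = {(0, 1): 0.9, (2, 3): 0.8, (0, 2): 0.1, (0, 3): 0.1, (1, 2): 0.1, (1, 3): 0.1}
    print(greedy_clusters(rescore(raw, 4), 4))
```

In this toy input, sentences 0 and 1 score high together, as do 2 and 3, while every cross pair scores low, so rescoring sharpens the separation before clustering.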

    Evolutionary selection of minimum number of features for classification of gene expression data using genetic algorithms

    Selecting the most relevant factors from genetic profiles that can optimally characterize cellular states is of crucial importance in identifying complex disease genes and biomarkers for disease diagnosis and in assessing drug efficiency. In this paper, we present an approach that uses a genetic algorithm for the feature subset selection problem and that can be used to select a near-optimal set of genes for the classification of cancer data. In a substantial improvement over existing methods, we classified cancer data with high accuracy using fewer features.
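A genetic algorithm for feature subset selection can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's method: the nearest-centroid classifier, the per-feature `penalty`, truncation selection, one-point crossover, and all parameter values are choices made here, and fitness uses training accuracy rather than a held-out estimate to keep the example short.

```python
import random

def centroid_accuracy(X, y, mask):
    """Training accuracy of a nearest-centroid classifier on the selected columns."""
    cols = [i for i, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(r[c] for r in rows) / len(rows) for c in cols]

    def predict(x):
        return min(centroids, key=lambda lab: sum(
            (x[c] - m) ** 2 for c, m in zip(cols, centroids[lab])))

    return sum(predict(x) == lab for x, lab in zip(X, y)) / len(y)

def fitness(X, y, mask, penalty=0.01):
    """Reward accuracy and charge a small cost per selected feature."""
    return centroid_accuracy(X, y, mask) - penalty * sum(mask)

def select_features(X, y, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Evolve boolean feature masks; requires at least two features."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(X, y, m), reverse=True)
        parents = pop[: pop_size // 2]      # truncation selection; the best masks survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)       # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with probability p_mut.
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda m: fitness(X, y, m))
```

The per-feature penalty is what pushes the search toward smaller subsets: two masks with equal accuracy are ranked by how few features they use.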