1,371 research outputs found

    Translation of semantic aspects of OODINI graphical representation to ONTO OODB data definition language

    In this thesis we present a system that translates the semantic elements of the OODINI graphical schema language from the OODAL API to type definitions in ONTOS DB. To translate the semantic constraints of the graphical language, we add more information to the existing class data structure in the OODAL API. After a brief review of OODINI, ONTOS DB, and the existing translator, which cannot translate semantic constraints, we describe in detail the methods for translating the essential relationship, the dependent relationship, the multi-valued essential relationship, and the multi-valued dependent relationship. We employ an Inverse Reference to a Set of Type to achieve this goal. Setof and Tupleof relationships are special cases of the above relationships. To validate the result of the translation, we give examples of translating a schema containing each of the relationships discussed.

    Exact Single-Source SimRank Computation on Large Graphs

    SimRank is a popular measure for evaluating node-to-node similarities based on graph topology. In recent years, single-source and top-k SimRank queries have received increasing attention due to their applications in web mining, social network analysis, and spam detection. However, a fundamental obstacle in studying SimRank has been the lack of ground truths. The only exact algorithm, Power Method, is computationally infeasible on graphs with more than 10^6 nodes. Consequently, no existing work has evaluated the actual trade-offs between query time and accuracy on large real-world graphs. In this paper, we present ExactSim, the first algorithm that computes the exact single-source and top-k SimRank results on large graphs. With high probability, this algorithm produces ground truths with a rigorous theoretical guarantee. We conduct extensive experiments on real-world datasets to demonstrate the efficiency of ExactSim. The results show that ExactSim provides the ground truth for any single-source SimRank query with a precision up to 7 decimal places within a reasonable query time. Comment: ACM SIGMOD 202
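
    For readers unfamiliar with the baseline mentioned in this abstract, the following is a minimal sketch of the textbook iterative SimRank update, assumed here to be what the abstract calls the Power Method. It is not ExactSim and is not taken from the paper. The function name, the decay factor `c=0.6`, and the iteration count are illustrative assumptions. The sketch stores a full node-by-node similarity table, which hints at why this approach does not scale to very large graphs.

```python
def simrank_power_method(in_neighbors, c=0.6, iterations=10):
    """Iterative all-pairs SimRank (illustrative sketch, not ExactSim).

    in_neighbors maps every node to the list of its in-neighbors.
    c is the decay factor; 0.6 is an assumed, commonly used value.
    """
    nodes = list(in_neighbors)
    # Start from the identity: a node is fully similar only to itself.
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        new = {a: {b: 0.0 for b in nodes} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[a][b] = 1.0
                    continue
                ia, ib = in_neighbors[a], in_neighbors[b]
                if not ia or not ib:
                    continue  # no in-neighbors: similarity stays 0
                # Average similarity over all pairs of in-neighbors.
                total = sum(sim[i][j] for i in ia for j in ib)
                new[a][b] = c * total / (len(ia) * len(ib))
        sim = new
    return sim


# Tiny example graph: u -> a and u -> b.
graph = {"u": [], "a": ["u"], "b": ["u"]}
scores = simrank_power_method(graph)
single_source = scores["a"]  # similarities of every node to source "a"
```

    A single-source query is read off as one row of the table, so the whole table has to be computed first; the memory cost grows quadratically with the number of nodes.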

    Investigating the biological relevance in trained embedding representations of protein sequences

    As genome sequencing is becoming faster and cheaper, an abundance of DNA and protein sequence data is available. However, experimental annotation of structural or functional information develops at a much slower pace. Therefore, machine learning techniques have been widely adopted to make accurate predictions on unseen sequence data. In recent years, deep learning has been gaining popularity, as it allows for effective end-to-end learning. One consideration for its application to sequence data is the choice of a suitable and effective sequence representation strategy. In this paper, we investigate the significance of three common encoding schemes on the multi-label prediction problem of Gene Ontology (GO) term annotation, namely a one-hot encoding, an ad-hoc trainable embedding, and pre-trained protein vectors, using different hyper-parameters. We found that traditional unigram one-hot encodings achieved very good results, only slightly outperformed by unigram ad-hoc trainable embeddings and bigram pre-trained embeddings (by at most 3% for the F_max score), suggesting that exploring different encoding strategies is potentially beneficial. Most interestingly, when analyzing and visualizing the trained embeddings, we found that biologically relevant (dis)similarities between amino acid n-grams were implicitly learned, which were consistent with their physicochemical properties.
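
    As an illustration of the simplest of the three encoding schemes named in this abstract, the following is a minimal sketch of a unigram one-hot encoding of a protein sequence. It is not the authors' code. The 20-letter alphabet ordering, the function name, and the choice to encode unknown residues as all-zero rows are assumptions made for this example.

```python
# The 20 standard amino acids in one-letter code (ordering is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_encode(sequence, alphabet=AMINO_ACIDS):
    """Return one row per residue, with a single 1 at the residue's index.

    Residues outside the alphabet become all-zero rows (an assumption;
    other conventions add a dedicated 'unknown' column instead).
    """
    index = {aa: i for i, aa in enumerate(alphabet)}
    encoded = []
    for residue in sequence:
        row = [0] * len(alphabet)
        if residue in index:
            row[index[residue]] = 1
        encoded.append(row)
    return encoded


encoded = one_hot_encode("ACX")  # "X" is not in the alphabet
```

    A trainable embedding, by contrast, would map each residue index to a learned dense vector rather than a fixed sparse row.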