3 research outputs found

    Word-Graph Construction Techniques for Context Analysis

    A Nomo-Word Graph Construction Analysis Method (NWGC-AM) is used to build word graphs from the corresponding construction phrases and to classify citations into essential and non-essential groups. The graph resemblance metrics used in this work are Nomo Maximum Common Subgraph Node Resemblance (NMCS-NR), Maximum Common Subgraph Directed Edge Resemblance (MCS-DER), and Maximum Common Subgraph Undirected Edge Resemblance (MCS-UER). The experiments included five distinct classifiers: Random Forest, Naive Bayes, K-Nearest Neighbors (KNN), Decision Trees, and Support Vector Machines (SVM). The annotated dataset used for the studies comprised 361 citations. The Decision Tree classifier exhibits superior performance, attaining an accuracy rate of 0.98.
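    The abstract does not define its resemblance metrics precisely, but a minimal sketch of what edge-based maximum-common-subgraph resemblance over word graphs might look like is shown below. The word-graph construction (one directed edge per adjacent word pair) and the normalization by the larger edge set are illustrative assumptions, not the paper's exact formulas.

    ```python
    def directed_edges(phrase):
        """Build a directed word graph from a phrase: one edge per adjacent word pair."""
        words = phrase.lower().split()
        return set(zip(words, words[1:]))

    def mcs_der(p1, p2):
        """Directed-edge resemblance (MCS-DER-style, assumed):
        shared directed edges divided by the larger edge set."""
        e1, e2 = directed_edges(p1), directed_edges(p2)
        if not e1 or not e2:
            return 0.0
        return len(e1 & e2) / max(len(e1), len(e2))

    def mcs_uer(p1, p2):
        """Undirected-edge resemblance (MCS-UER-style, assumed):
        same ratio, but edge direction is ignored."""
        u1 = {frozenset(e) for e in directed_edges(p1)}
        u2 = {frozenset(e) for e in directed_edges(p2)}
        if not u1 or not u2:
            return 0.0
        return len(u1 & u2) / max(len(u1), len(u2))
    ```

    For example, `mcs_der("graph construction method", "graph construction analysis")` shares one of two directed edges, giving 0.5, while reversing a two-word phrase yields 0.0 directed but 1.0 undirected resemblance.
    
    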

    Implementing Semantic Document Search Using a Bounded Random Walk in a Probabilistic Graph

    Given a set of documents and an input query expressed in natural language, the problem of document search is retrieving all relevant documents ordered by degree of relevance. Semantic document search fetches not only documents that contain words from the input query, but also documents that are semantically relevant. For example, the query "friendly pets" will also match documents that contain the words "dog" and "cat", among others. One way to implement semantic search is to use a probabilistic graph in which the input query is connected to the documents through paths that contain semantically similar words and phrases, where we use WordNet to initially populate the graph. Each edge in the graph is labeled with the conditional probability that the destination node is relevant given that the source node is relevant. Our semantic document search algorithm works in two phases. In the first phase, we find all documents in the graph that are close to the input query and create a bounded subgraph that includes the query, the found documents, and the paths that connect them. In the second phase, we simulate multiple random walks. Each random walk starts at the input query and continues until a document is reached, a jump outside the bounding subgraph is made, or the number of allowed jumps is exhausted. This allows us to rank the documents based on the number of random walks that terminated in them. We experimentally validated the algorithm on the Cranfield benchmark, which contains 1400 documents and 225 natural language queries. We show that we achieve a higher value for the mean average precision (MAP) measure than a keywords-based search algorithm and a previously published algorithm that relies on a variation of the probabilistic graph.
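    The second phase described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the graph layout (query node, intermediate word nodes, document sink nodes), the edge probabilities, and the convention that unassigned probability mass models a jump outside the bounded subgraph are all assumptions for the sake of the example.

    ```python
    import random

    def rank_documents(graph, query, num_walks=1000, max_jumps=5, seed=0):
        """Simulate bounded random walks from the query node.

        graph: node -> list of (neighbor, probability) pairs; nodes with no
        outgoing edges are treated as documents (sinks). Returns a dict
        mapping each document to the number of walks that terminated in it.
        """
        rng = random.Random(seed)
        counts = {}
        for _ in range(num_walks):
            node, jumps = query, 0
            while jumps < max_jumps:
                edges = graph.get(node, [])
                if not edges:  # sink node: a document was reached
                    counts[node] = counts.get(node, 0) + 1
                    break
                # Sample the next node; residual probability mass represents
                # a jump outside the bounded subgraph, which ends the walk.
                r, acc, nxt = rng.random(), 0.0, None
                for neighbor, p in edges:
                    acc += p
                    if r < acc:
                        nxt = neighbor
                        break
                if nxt is None:
                    break
                node, jumps = nxt, jumps + 1
        return counts
    ```

    A toy usage, with made-up probabilities: `rank_documents({"friendly pets": [("dog", 0.5), ("cat", 0.5)], "dog": [("doc1", 0.9)], "cat": [("doc2", 0.8)]}, "friendly pets")` counts more terminations in `doc1` and `doc2` than in unreachable documents, giving a ranking by walk count.
    
    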