28 research outputs found

    A hybrid similarity measure method for patent portfolio analysis

    © 2016 Elsevier Ltd. Similarity measures are fundamental tools for identifying relationships within or across patent portfolios. Many bibliometric indicators are used to determine similarity measures; for example, bibliographic coupling, citation and co-citation, and co-word distribution. This paper aims to construct a hybrid similarity measure method based on multiple indicators to analyze patent portfolios. Two models are proposed: categorical similarity and semantic similarity. The categorical similarity model emphasizes international patent classifications (IPCs), while the semantic similarity model emphasizes textual elements. We introduce fuzzy set routines to translate the rough technical (sub-)categories of IPCs into defined numeric values, and we calculate the categorical similarities between patent portfolios using membership grade vectors. In parallel, we identify and highlight core terms in a 3-level tree structure and compute the semantic similarities by comparing the tree-based structures. A weighting model is designed to consider: 1) the bias that exists between the categorical and semantic similarities, and 2) the weighting or integrating strategy for a hybrid method. A case study measuring the technological similarities between selected firms in China's medical device industry is used to demonstrate the reliability of our method, and the results indicate the practical meaning of our method in a broad range of informetric applications.

    Detecting and predicting the topic change of Knowledge-based Systems: A topic-based bibliometric analysis from 1991 to 2016

    © 2017. The journal Knowledge-based Systems (KnoSys) has been published for over 25 years, during which time its main foci have been extended to a broad range of studies in computer science and artificial intelligence. Answering the questions “What is the KnoSys community interested in?” and “How does such interest change over time?” is important to both the editorial board and the audience of KnoSys. This paper conducts a topic-based bibliometric study to detect and predict the topic changes of KnoSys from 1991 to 2016. A Latent Dirichlet Allocation model is used to profile the hotspots of KnoSys and predict possible future trends from a probabilistic perspective. A model of scientific evolutionary pathways applies a learning-based process to detect the topic changes of KnoSys in sequential time slices. Six main research areas of KnoSys are identified, i.e., expert systems, machine learning, data mining, decision making, optimization, and fuzzy. The results also indicate that the interest of KnoSys communities in the area of computational intelligence has risen, and that the ability to construct practical systems through knowledge use and accurate prediction models is highly emphasized. Such empirical insights can be used as a guide for KnoSys submissions.

    A heuristic information retrieval study : an investigation of methods for enhanced searching of distributed data objects exploiting bidirectional relevance feedback

    A thesis submitted for the degree of Doctor of Philosophy of the University of Luton. The primary aim of this research is to investigate methods of improving the effectiveness of current information retrieval systems. This aim can be achieved by accomplishing numerous supporting objectives. A foundational objective is to introduce a novel bidirectional, symmetrical fuzzy logic theory which may prove valuable to information retrieval, including internet searches of distributed data objects. A further objective is to design, implement and apply the novel theory to an experimental information retrieval system called ANACALYPSE, which automatically computes the relevance of a large number of unseen documents from expert relevance feedback on a small number of documents read. A further objective is to define a methodology used in this work as an experimental information retrieval framework consisting of multiple tables including various formulae which allow a plethora of syntheses of similarity functions, term weights, relative term frequencies, document weights, bidirectional relevance feedback and history-adjusted term weights. The evaluation of bidirectional relevance feedback reveals a better correspondence between system ranking of documents and users' preferences than feedback-free system ranking. The assessment of similarity functions reveals that the Cosine and Jaccard functions perform significantly better than the DotProduct and Overlap functions. The evaluation of history tracking of the documents visited from a root page reveals better system ranking of documents than tracking-free information retrieval. The assessment of stemming reveals that system information retrieval performance remains unaffected, while stop word removal does not appear to be beneficial and can sometimes be harmful. The overall evaluation of the experimental information retrieval system, in comparison to a leading-edge commercial information retrieval system and also to the expert's gold standard of judged relevance according to established statistical correlation methods, reveals enhanced system information retrieval effectiveness.
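
The four similarity functions named in this abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's own formulae, so the code below assumes the common set-based textbook definitions over unweighted term sets; the function names and the sample query and document are hypothetical.

```python
import math

# Assumption: documents and queries are represented as sets of terms.
# These are the standard set-based definitions, not necessarily the
# weighted variants used in the thesis.

def dot_product(a: set, b: set) -> float:
    """Number of shared terms."""
    return float(len(a & b))

def cosine(a: set, b: set) -> float:
    """Shared terms normalised by the geometric mean of the set sizes."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def jaccard(a: set, b: set) -> float:
    """Shared terms divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def overlap(a: set, b: set) -> float:
    """Shared terms divided by the size of the smaller set."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

# Hypothetical query and document term sets.
query = {"fuzzy", "retrieval", "feedback"}
document = {"fuzzy", "retrieval", "ranking", "web"}

scores = {
    "DotProduct": dot_product(query, document),
    "Cosine": cosine(query, document),
    "Jaccard": jaccard(query, document),
    "Overlap": overlap(query, document),
}
```

DotProduct is unnormalised and Overlap normalises only by the smaller set, whereas Cosine and Jaccard account for the size of both sets. This is one possible reason for the performance gap the abstract reports, though the abstract itself does not state a cause.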

    Information retrieval (Part I):Introduction


    The intellectual structure and substance of the knowledge utilization field: A longitudinal author co-citation analysis, 1945 to 2004

    Background: It has been argued that science and society are in the midst of a far-reaching renegotiation of the social contract between science and society, with society becoming a far more active partner in the creation of knowledge. On the one hand, new forms of knowledge production are emerging, and on the other, both science and society are experiencing a rapid acceleration in new forms of knowledge utilization. Concomitantly since the Second World War, the science underpinning the knowledge utilization field has had exponential growth. Few in-depth examinations of this field exist, and no comprehensive analyses have used bibliometric methods. Methods: Using bibliometric analysis, specifically first author co-citation analysis, our group undertook a domain analysis of the knowledge utilization field, tracing its historical development between 1945 and 2004. Our purposes were to map the historical development of knowledge utilization as a field, and to identify the changing intellectual structure of its scientific domains. We analyzed more than 5,000 articles using citation data drawn from the Web of Science®. Search terms were combinations of knowledge, research, evidence, guidelines, ideas, science, innovation, technology, information theory and use, utilization, and uptake. Results: We provide an overview of the intellectual structure and how it changed over six decades. The field does not become large enough to represent with a co-citation map until the mid-1960s. Our findings demonstrate vigorous growth from the mid-1960s through 2004, as well as the emergence of specialized domains reflecting distinct collectives of intellectual activity and thought. Until the mid-1980s, the major domains were focused on innovation diffusion, technology transfer, and knowledge utilization. Beginning slowly in the mid-1980s and then growing rapidly, a fourth scientific domain, evidence-based medicine, emerged. The field is dominated in all decades by one individual, Everett Rogers, and by one paradigm, innovation diffusion. Conclusion: We conclude that the received view that social science disciplines are in a state where no accepted set of principles or theories guide research (i.e., that they are pre-paradigmatic) could not be supported for this field. Second, we document the emergence of a new domain within the knowledge utilization field, evidence-based medicine. Third, we conclude that Everett Rogers was the dominant figure in the field and, until the emergence of evidence-based medicine, his representation of the general diffusion model was the dominant paradigm in the field.

    Cluster Analysis of Legal Documents

    Single-link cluster analysis has been used to provide classifications of several collections of legal documents, based on various characteristics of the text. Each document was represented in terms of the chosen characteristics by a vector whose elements were the frequencies of occurrence of the characteristics in that document. The values of similarity between documents were determined by calculating the cosine of the angle between each pair of document vectors. The clustering algorithm then operated on these similarity coefficients to group documents which were most similar. A suite of computer programs was written to perform the classification. Four programs were required to (a) select the document descriptors from the full-text of the documents, (b) construct document vectors, (c) calculate similarity coefficients, and (d) perform single-link clustering. Three classification experiments were performed. The first classified the full-text of both the English and French versions of the Treaties of the Council of Europe. The words of the full-text, taken singly and in pairs, were used to describe the treaties, and the two cases of including and excluding the 'common' words were investigated. The best classification was based on single words with common words excluded. Since each treaty was a lengthy collection of non-homogeneous clauses, it was thought that a classification of the individual articles would be more useful. In this case the formal and non-formal clauses clustered separately, whereas before the formal clauses, present in every treaty, had caused semantically unrelated treaties to be brought together. During the course of this study an opportunity arose to investigate the use of cluster analysis to test the trustworthiness of certain oral confessions presented as evidence in criminal proceedings. The common or function words, which are generally agreed to characterise the style of an author, were used as document descriptors for two sets of statements, one which the defendant admitted, the other which he was alleged to have made but which he denied. The two sets of statements clustered separately, indicating a difference in style. On the basis of this and other comparative tests it was possible to say that the disputed statements were unlikely to have been made by the defendant. The third experiment involved the use of the marginal citations in Statutes as document descriptors. Statutes were regarded as semantically related if they cited the same Acts. The Public General Acts of Parliament for the three years 1973 - 1975 were successfully clustered into groups of related Acts.
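
The pipeline this abstract describes (term-frequency vectors, cosine similarity coefficients, single-link clustering) can be sketched roughly as follows. This is not the original suite of programs: the whitespace tokeniser, the similarity threshold, the function names, and the sample texts are all assumptions for illustration. Cutting a single-link hierarchy at a threshold is equivalent to taking connected components of the graph that links every pair whose similarity meets the threshold, which is what the union-find loop does.

```python
import math
from collections import Counter

def term_vector(text: str) -> Counter:
    """Frequency-of-occurrence vector; whitespace tokenisation is an assumption."""
    return Counter(text.lower().split())

def cosine(u: Counter, v: Counter) -> float:
    """Cosine of the angle between two sparse frequency vectors."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def single_link_clusters(vectors: list, threshold: float) -> list:
    """Single-link clusters at one similarity level, via union-find."""
    n = len(vectors)
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent pointers to the cluster root, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the clusters of any pair that is similar enough.
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical miniature "documents".
documents = [
    "treaty trade tariff",
    "trade tariff customs",
    "court appeal judge",
    "judge appeal sentence",
]
vectors = [term_vector(d) for d in documents]
clusters = single_link_clusters(vectors, threshold=0.5)
```

Here `clusters` holds lists of document indices. A full single-link analysis would record the merges at every similarity level rather than at one threshold.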

    Integrating and conceptualizing heterogeneous ontologies on the web

    Master's. MASTER OF SCIENCE