
    Automatic Discovery and Ranking of Synonyms for Search Keywords in the Web

    Search engines are an indispensable part of a web user's life. Yet a large majority of web users run into difficulties with keyword-based search engines, such as inaccurate results for their queries and irrelevant URLs that nonetheless contain the given keyword. Relevant URLs may also be missed because they contain a synonym of the keyword rather than the keyword itself. These conditions are known as the polysemy and synonymy problems, respectively. To alleviate them, we propose an algorithm called automatic discovery and ranking of synonyms for search keywords in the web (ADRS). The proposed method generates a list of candidate synonyms for each keyword using the relevance factor of the URLs associated with the synonyms. These candidates are then ranked using co-occurrence frequencies and various page-count-based measures. A major advantage of the algorithm is its high scalability, which makes it applicable to online data on the dynamic, domain-independent, and unstructured World Wide Web. The experimental results show that the proposed algorithm performs best when combined with the WebJaccard measure.
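    The abstract names WebJaccard but does not define it. The sketch below assumes the page-count formulation of Bollegala et al. (2007): WebJaccard(P, Q) = H(P AND Q) / (H(P) + H(Q) - H(P AND Q)), set to 0 when H(P AND Q) <= c, with c = 5. It scores each candidate synonym against the keyword from search-engine page counts and sorts the candidates. The function names, the threshold, and all page counts are hypothetical illustrations rather than the ADRS implementation, and the relevance-factor step that generates candidates is omitted.

```python
def web_jaccard(h_p: int, h_q: int, h_pq: int, c: int = 5) -> float:
    """WebJaccard similarity from search-engine page counts.

    h_p, h_q: page counts for the single-term queries P and Q.
    h_pq:     page count for the conjunctive query "P AND Q".
    c:        noise threshold; co-occurrence counts at or below c score 0.
    """
    if h_pq <= c:
        return 0.0
    return h_pq / (h_p + h_q - h_pq)


def rank_candidates(keyword_count: int, candidates: dict, c: int = 5) -> list:
    """Rank candidate synonyms of one keyword by WebJaccard, best first.

    candidates maps each synonym to a pair
    (page count of the synonym, page count of keyword AND synonym).
    """
    scores = {
        syn: web_jaccard(keyword_count, h_syn, h_both, c)
        for syn, (h_syn, h_both) in candidates.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical page counts for the keyword "car" (illustration only).
ranking = rank_candidates(
    keyword_count=1_000_000,
    candidates={
        "automobile": (400_000, 150_000),
        "vehicle": (900_000, 120_000),
        "carriage": (50_000, 3),
    },
)
for synonym, score in ranking:
    print(f"{synonym}: {score:.4f}")
```

    With these made-up counts, "automobile" scores 0.12 and ranks first. "carriage" drops to 0 because its co-occurrence count falls below the noise threshold.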

    A Systematic Identification and Analysis of Scientists on Twitter

    Metrics derived from Twitter and other social media (often referred to as altmetrics) are increasingly used to estimate the broader social impact of scholarship. Such efforts, however, may produce highly misleading results, because the entities that participate in conversations about science on these platforms are largely unknown. For instance, if altmetric activity is generated mainly by scientists, does it really capture the broader social impact of science? Here we present a systematic approach to identifying and analyzing scientists on Twitter. Our method can identify scientists across many disciplines without relying on external bibliographic data, and it can easily be adapted to identify other stakeholder groups in science. We investigate the demographics, sharing behaviors, and interconnectivity of the identified scientists. We find that Twitter has been adopted by scholars across the disciplinary spectrum, with an over-representation of social scientists and of computer and information scientists; an under-representation of mathematical, physical, and life scientists; and a better representation of women than in scholarly publishing. Analysis of shared URLs reveals a distinct imprint of scholarly sites, yet only a small fraction of shared URLs are science-related. We find assortative mixing by discipline in the networks between scientists, suggesting that disciplinary walls are maintained in social media. Our work contributes to the literature both methodologically and conceptually: we provide new methods for disambiguating and identifying particular actors on social media and for describing the behaviors of scientists, thus providing foundational information for the construction and use of indicators based on social media metrics.
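    The abstract reports assortative mixing by discipline without naming the statistic. One standard way to quantify it is Newman's assortativity coefficient for a categorical attribute, r = (sum_i e_ii - sum_i a_i^2) / (1 - sum_i a_i^2). Here e_ij is the fraction of edge ends linking discipline i to discipline j, and a_i is the fraction of edge ends attached to discipline i. The sketch below computes r for an undirected network. It is an assumption that this matches the authors' measure, and the toy network and labels are hypothetical.

```python
from collections import Counter


def discipline_assortativity(edges, discipline):
    """Newman's assortativity coefficient r for a categorical node attribute.

    edges:      iterable of (u, v) pairs in an undirected network.
    discipline: mapping from node to its discipline label.
    r > 0 means ties form mostly within disciplines; r < 0, mostly across them.
    """
    # Count every undirected edge in both directions so the mixing matrix is symmetric.
    pairs = Counter()
    for u, v in edges:
        du, dv = discipline[u], discipline[v]
        pairs[(du, dv)] += 1
        pairs[(dv, du)] += 1
    total = sum(pairs.values())
    if total == 0:
        raise ValueError("network has no edges")

    # Sum of e_ii: fraction of edge ends that stay within one discipline.
    within = sum(n for (i, j), n in pairs.items() if i == j) / total

    # a_i: fraction of edge ends attached to discipline i (row sums of e).
    ends = Counter()
    for (i, _), n in pairs.items():
        ends[i] += n
    sum_a2 = sum((n / total) ** 2 for n in ends.values())
    if sum_a2 >= 1.0:
        raise ValueError("r is undefined when every node has the same label")
    return (within - sum_a2) / (1 - sum_a2)


# Hypothetical network: three social scientists (S*) and three physicists (P*).
labels = {"S1": "social", "S2": "social", "S3": "social",
          "P1": "physics", "P2": "physics", "P3": "physics"}
edges = [("S1", "S2"), ("S2", "S3"), ("P1", "P2"), ("P2", "P3"), ("S1", "P1")]
r = discipline_assortativity(edges, labels)
print(f"r = {r:.2f}")  # r = 0.60
```

    Four of the five ties stay within a discipline, so r is clearly positive. A network with only cross-discipline ties would instead give a negative value.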

    The Extraction of Social Networks from Web Using Search Engines

    In this paper, our aim is to build a large collection of vocabulary and concepts related to a user's field of interest (articles, people, conferences, books, etc.) from the vast body of information available on the web, and to express this collection in the form of a social network. In other words, we introduce a way for researchers to specify a topic of interest in a particular field and then observe and extract the social network of concepts related to that topic. To extract the nodes of this network, we sampled web pages through the Google search engine and applied text processing and information retrieval techniques. The social network extracted in this research covers scientific conferences in the field of computer science. To evaluate the effectiveness of the method, the network extracted from the search engine results is compared with the scientific conferences listed in the DBLP (Digital Bibliography and Library Project) database. Social network analysis of the results shows that the extracted network is highly accurate.
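    The abstract outlines the pipeline (sample pages with a search engine, apply text processing, compare the result with DBLP) but does not give the rule for linking nodes. The sketch below assumes that two vocabulary terms are linked whenever they co-occur in the same search-result snippet. It builds a weighted network and measures what fraction of its nodes appear in a reference list. The snippets, the vocabulary, the venue FOOCONF, and the reference set standing in for DBLP are all hypothetical, and no real search API is called.

```python
import re
from collections import Counter
from itertools import combinations


def extract_network(snippets, vocabulary, min_weight=1):
    """Build a weighted co-occurrence network from search-result snippets.

    Two vocabulary terms are linked when both appear (as whole words) in the
    same snippet; the edge weight is the number of such snippets.
    Edges with weight below min_weight are dropped.
    """
    patterns = {term: re.compile(rf"\b{re.escape(term)}\b") for term in vocabulary}
    weights = Counter()
    for text in snippets:
        found = sorted(term for term, pat in patterns.items() if pat.search(text))
        for u, v in combinations(found, 2):
            weights[(u, v)] += 1
    return {edge: w for edge, w in weights.items() if w >= min_weight}


def node_precision(network, reference):
    """Fraction of nodes in the extracted network that also occur in the reference set."""
    nodes = {term for edge in network for term in edge}
    return len(nodes & reference) / len(nodes) if nodes else 0.0


# Hypothetical snippets for a seed query about computer science conferences.
snippets = [
    "Papers accepted at SIGIR and CIKM on learning to rank",
    "Call for papers: SIGIR, KDD, CIKM deadlines",
    "KDD and WSDM joint workshop on web mining",
    "FOOCONF is a fictional venue co-located with WSDM",
]
vocabulary = ["SIGIR", "CIKM", "KDD", "WSDM", "FOOCONF"]
reference = {"SIGIR", "CIKM", "KDD", "WSDM"}  # stand-in for a DBLP venue list

network = extract_network(snippets, vocabulary)
print(network)
print(node_precision(network, reference))  # 0.8
```

    Raising `min_weight` prunes edges supported by a single snippet. Here `min_weight=2` keeps only the edge between CIKM and SIGIR, which removes the spurious FOOCONF node.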

    An Efficient approach for finding the essential experts in Digital Library

    Name ambiguity is a special case of identity uncertainty in which one person can be referenced by multiple name variations in different situations, or can share the same name with other people. In this paper, we focus on the name disambiguation problem. When non-unique values are used as entity identifiers, homonyms can cause confusion. In particular, when (part of) the names of entities are used as their identifiers, the problem is often referred to as the name disambiguation problem, where the goal is to separate entities that have been conflated because of name homonyms (e.g., if only the last name is used as the identifier, one cannot distinguish "Vannevar Bush" from "George Bush"). We formalize the problem in a unified probabilistic framework and propose an algorithm for parameter estimation. We use a dynamic approach to estimate the number of people K, and we find the experts in a digital library by counting the number of accesses to their papers.
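    The paper's own method is a probabilistic framework with dynamic estimation of K, which the abstract does not detail. The sketch below uses a much simpler stand-in, a coauthor-overlap heuristic rather than the authors' algorithm. It treats papers published under the same name as belonging to one person when they share a coauthor, takes K as the number of resulting clusters, and ranks the people by the total accesses to their papers. The record layout and all data are hypothetical.

```python
from collections import defaultdict


def disambiguate(papers):
    """Split papers published under one name string into per-person clusters.

    papers: list of dicts with keys "id", "coauthors" (set of names), "accesses" (int).
    Heuristic (an assumption, not the paper's probabilistic model): two papers
    belong to the same person if they share at least one coauthor, directly or
    through a chain of papers. The estimated K is the number of clusters returned.
    """
    parent = list(range(len(papers)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_paper_of = {}  # coauthor name -> index of the first paper listing them
    for idx, paper in enumerate(papers):
        for name in paper["coauthors"]:
            if name in first_paper_of:
                parent[find(idx)] = find(first_paper_of[name])
            else:
                first_paper_of[name] = idx

    clusters = defaultdict(list)
    for idx, paper in enumerate(papers):
        clusters[find(idx)].append(paper)
    return list(clusters.values())


def rank_experts(clusters):
    """Rank people by total accesses to their papers, most accessed first."""
    scored = [(sum(p["accesses"] for p in c), sorted(p["id"] for p in c)) for c in clusters]
    return sorted(scored, reverse=True)


# Hypothetical papers all attributed to the ambiguous name "G. Bush".
papers = [
    {"id": "p1", "coauthors": {"Smith", "Lee"}, "accesses": 120},
    {"id": "p2", "coauthors": {"Lee"}, "accesses": 30},
    {"id": "p3", "coauthors": {"Garcia"}, "accesses": 200},
    {"id": "p4", "coauthors": {"Garcia", "Chen"}, "accesses": 15},
    {"id": "p5", "coauthors": {"Novak"}, "accesses": 40},
]
clusters = disambiguate(papers)
print("estimated K =", len(clusters))  # estimated K = 3
for total, ids in rank_experts(clusters):
    print(total, ids)
```

    Coauthor overlap can merge two people who share a collaborator, and it can split one person whose coauthor sets never overlap. This sketch is only meant to make the ranking step concrete.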