
    A Graph-structured Dataset for Wikipedia Research

    Wikipedia is a rich and invaluable source of information. Its central place on the Web makes it a particularly interesting object of study for scientists. Researchers from different domains have used various complex datasets related to Wikipedia to study language, social behavior, knowledge organization, and network theory. Although it is a scientific treasure, the large size of the dataset hinders pre-processing and may be a challenging obstacle for potential new studies. This issue is particularly acute in scientific domains where researchers may not be proficient in programming and data processing. On the one hand, Wikipedia dumps are large, which makes parsing and extracting relevant information cumbersome. On the other hand, the API is straightforward to use but restricted to a relatively small number of requests. The middle ground is the mesoscopic scale, where researchers need a subset of Wikipedia ranging from thousands to hundreds of thousands of pages, but no efficient solution exists at this scale. In this work, we propose an efficient data structure to make requests and access subnetworks of Wikipedia pages and categories. We provide convenient tools for accessing and filtering viewership statistics or "pagecounts" of Wikipedia web pages. The dataset organization leverages principles of graph databases that allow rapid and intuitive access to subgraphs of Wikipedia articles and categories. The dataset and deployment guidelines are available on the LTS2 website: https://lts2.epfl.ch/Datasets/Wikipedia/
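To make the idea of requesting a subnetwork of pages and categories concrete, here is a minimal sketch in plain Python. It is not the authors' implementation or the dataset's API. The edge list, the page titles, the view counts, and the function names `extract_subgraph` and `filter_by_views` are all hypothetical, and the graph is treated as a directed list of (source, target) links.

```python
from collections import deque

# Toy, made-up data: directed links between categories and pages.
EDGES = [
    ("Category:Physics", "Quantum mechanics"),
    ("Category:Physics", "Category:Optics"),
    ("Category:Optics", "Laser"),
    ("Laser", "Photon"),
    ("Category:Biology", "Cell"),
]
# Toy, made-up viewership statistics ("pagecounts") per page.
PAGECOUNTS = {"Quantum mechanics": 1200, "Laser": 800, "Photon": 300, "Cell": 950}


def extract_subgraph(edges, seed, max_depth):
    """Return the nodes within max_depth hops of seed and the edges among them."""
    # Build an adjacency map from the edge list.
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set())

    # Breadth-first search, recording the hop distance of each reached node.
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the requested radius
        for neighbor in adjacency.get(node, ()):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)

    nodes = set(depth)
    sub_edges = [(s, t) for s, t in edges if s in nodes and t in nodes]
    return nodes, sub_edges


def filter_by_views(nodes, pagecounts, min_views):
    """Keep only the nodes whose recorded view count reaches min_views."""
    return {n for n in nodes if pagecounts.get(n, 0) >= min_views}


nodes, sub_edges = extract_subgraph(EDGES, "Category:Physics", max_depth=2)
popular = filter_by_views(nodes, PAGECOUNTS, min_views=500)
```

A graph database would answer the same kind of query through indexed traversals rather than an in-memory scan. The sketch only illustrates the shape of the request: a seed, a radius, and a filter on viewership.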

    The Analysis of Existing Experience for the Ethnobotanical Information System

    Ethnobotanical research reflects the traditional knowledge of a region. Over the past decade, medicinal plants used for healing by indigenous people have attracted significant attention, which has advanced scientific and ethnobotanical knowledge and research into treating health problems. A public database has been built from data assembled from various verifiable sources, including journals, travel records, and treatises on medicinal plants written by explorers, botanists, doctors, and researchers who visited the countries during the last three centuries. In addition, ethnobotanical data described in historical natural collections and in ancient and medieval writings from the studied region have been incorporated into the database. Such databases have to be flexible enough to serve as a valuable tool for analysts who need to store and analyze present and past ethnobotanical data from the studied location. Ethnobotanical research in Azerbaijan is advancing day by day. The database is used to inform people about national plants that grow in the different regions of Azerbaijan. This article analyzes ethnobotanical databases from different countries. Special methods such as data mining and text mining are used to compare these databases. First, suitable databases are gathered for the investigation; then the best information systems used by biologists and scientists in many countries are identified; finally, the advantages and disadvantages of all the existing ethnobotanical databases studied are examined. The features of the information systems are evaluated. The results show that each database has its own strengths, but none has become a standard for universal research. The reason is simple: none of these databases enables specialists to add their own data.
    A sample structure, the main tables, and the key components of an ethnobotanical database are also illustrated. The results show that, although a few ethnobotanical databases exist, none is a satisfactory solution for worldwide work, and none enables analysts to add their own data. There is a need to bring together all the essential properties of existing databases and to create a free database that supports ethnobotanical research. Owing to the rise and rapid development of information technologies, it is now possible to digitize, manage, and make ethnobotanical information accessible to a wider audience.

    Is That Twitter Hashtag Worth Reading

    Online social media such as Twitter, Facebook, wikis, and LinkedIn have had a great impact on the way we consume information in our day-to-day life. It has become increasingly important to find appropriate content on social media to avoid information overload. In the case of Twitter, popular information can be tracked using hashtags. Studying the characteristics of tweets containing hashtags is important for a number of tasks, such as breaking news detection, personalized message recommendation, friend recommendation, and sentiment analysis, among others. In this paper, we analyze Twitter data based on trending hashtags, which are widely used nowadays. We use event-based hashtags to learn users' thoughts on those events and to decide whether other users might find them interesting. We use topic modeling, which reveals the hidden thematic structure of the documents (tweets in this case), in addition to sentiment analysis, to explore and summarize the content of the documents. We propose a technique to determine the interestingness of an event-based Twitter hashtag and its associated sentiment. The proposed technique helps Twitter followers read relevant and interesting hashtags. Comment: 10 pages, 6 figures. Presented at the Third International Symposium on Women in Computing and Informatics (WCI-2015).