A Graph-structured Dataset for Wikipedia Research
Wikipedia is a rich and invaluable source of information. Its central place
on the Web makes it a particularly interesting object of study for scientists.
Researchers from different domains have used various complex datasets related to
Wikipedia to study language, social behavior, knowledge organization, and
network theory. Although these datasets are a scientific treasure, their large
size hinders pre-processing and can be a serious obstacle for potential
new studies. This issue is particularly acute in scientific domains where
researchers may lack technical and data-processing expertise. On the one hand,
the Wikipedia dumps are large, which makes parsing and extracting
relevant information cumbersome. On the other hand, the API is straightforward
to use but restricted to a relatively small number of requests. The middle
ground is the mesoscopic scale, where researchers need a subset of Wikipedia
ranging from thousands to hundreds of thousands of pages, but no
efficient solution exists at this scale.
In this work, we propose an efficient data structure to make requests and
access subnetworks of Wikipedia pages and categories. We provide convenient
tools for accessing and filtering viewership statistics or "pagecounts" of
Wikipedia web pages. The dataset organization leverages principles of graph
databases, which allow rapid and intuitive access to subgraphs of Wikipedia
articles and categories. The dataset and deployment guidelines are available on
the LTS2 website: https://lts2.epfl.ch/Datasets/Wikipedia/
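The idea of accessing a subnetwork of pages and categories can be illustrated with a minimal sketch. It assumes a tiny in-memory adjacency list with made-up page names, not the dataset's actual schema or query interface, and uses a plain breadth-first search to collect every node within a given number of hops of a starting node. The `GRAPH` contents and the `subgraph` helper are hypothetical.

```python
from collections import deque

# Hypothetical toy graph: each node (page or category) maps to the nodes it links to.
GRAPH = {
    "Category:Physics": ["Quantum mechanics", "Classical mechanics"],
    "Quantum mechanics": ["Wave function", "Classical mechanics"],
    "Classical mechanics": ["Isaac Newton"],
    "Wave function": [],
    "Isaac Newton": ["Category:Physics"],
}


def subgraph(graph, start, max_hops):
    """Return the nodes within max_hops of start and the edges among them."""
    depth = {start: 0}          # hop distance of every node reached so far
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue            # do not expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    # Keep only the edges whose two endpoints were both reached.
    edges = [(u, v) for u in depth for v in graph.get(u, []) if v in depth]
    return set(depth), edges


nodes, edges = subgraph(GRAPH, "Category:Physics", 1)
print(sorted(nodes))
print(edges)
```

A graph database would typically answer the same kind of neighborhood query with an index-backed traversal rather than a scan of an in-memory dictionary, which is what makes subsets of thousands of pages practical.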
The Analysis of Existing Experience for the Ethnobotanical Information System
Ethnobotanical research reflects the traditional knowledge of a region. Over the past decade, medicinal plants used for healing by indigenous people have become an important topic and have advanced scientific and ethnobotanical knowledge and research into treating health problems. A public database has been built from data assembled from various verifiable sources, including journals, travel records, and treatises on therapeutic plants written by explorers, botanists, doctors, and researchers who visited the countries concerned during the last three centuries. In addition, ethnobotanical data described in historical natural collections and in ancient and medieval writings from the studied region have been incorporated into the database. Such databases have to be flexible enough to serve as a valuable tool for analysts who need to store and analyze present and past ethnobotanical data from the studied location. Ethnobotanical research in Azerbaijan is advancing day by day. The database is used to inform people about national plants that grow in the different regions of Azerbaijan. This article analyzes ethnobotanical databases from different countries. Special methods such as data mining and text mining are used to compare these databases. First, suitable databases are gathered for the investigation; then the best information systems used by biologists and scientists in many countries are identified; finally, the advantages and disadvantages of all the existing ethnobotanical databases examined are noted. The features of the information systems are evaluated. The results show that each database has its own strengths, but none has become a standard for universal research. The reason is simple: none of these databases enables specialists to add their own data.
A sample structure, the main tables, and the key components of an ethnobotanical database are also illustrated. The results show that, although a few ethnobotanical databases exist, none is a satisfactory solution for worldwide work, and none enables analysts to add their own data. There is a need to bring together all the essential properties of existing databases and to create a free database that supports ethnobotanical research. Owing to the rise and rapid development of information technologies, it has now become possible to digitize, manage, and make ethnobotanical data available to a wider audience.
Is That Twitter Hashtag Worth Reading
Online social media such as Twitter, Facebook, Wikis and LinkedIn have made a
great impact on the way we consume information in our day-to-day life. It
has become increasingly important to find appropriate content on
social media to cope with the information explosion. In the case of Twitter, popular
information can be tracked using hashtags. Studying the characteristics of
tweets containing hashtags becomes important for a number of tasks, such as
breaking news detection, personalized message recommendation, friends
recommendation, and sentiment analysis among others.
In this paper, we have analyzed Twitter data based on trending hashtags,
which are widely used nowadays. We have used event-based hashtags to learn users'
thoughts on those events and to decide whether the rest of the users might find
them interesting or not. We have used topic modeling, which reveals the hidden
thematic structure of the documents (tweets in this case), in addition to
sentiment analysis to explore and summarize the content of the documents. A
technique to find the interestingness of an event-based Twitter hashtag and the
associated sentiment has been proposed. The proposed technique helps Twitter
followers to read relevant and interesting hashtags.
Comment: 10 pages, 6 figures. Presented at the Third International Symposium
on Women in Computing and Informatics (WCI-2015)
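As a rough illustration of attaching a sentiment to a hashtag, the following sketch averages a simple lexicon-based score over the tweets that carry each hashtag. It is not the technique proposed in the paper: the word lists, the sample tweets, and the `hashtag_sentiment` helper are all hypothetical, and the topic-modeling step is omitted.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny sentiment lexicon (an assumption for illustration).
POSITIVE = {"great", "love", "win"}
NEGATIVE = {"bad", "sad", "lost"}


def hashtag_sentiment(tweets):
    """Return the mean lexicon score of the tweets that carry each hashtag."""
    scores = defaultdict(list)
    for text in tweets:
        lowered = text.lower()
        words = re.findall(r"\w+", lowered)
        # Score = number of positive words minus number of negative words.
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        for tag in re.findall(r"#\w+", lowered):
            scores[tag].append(score)
    return {tag: sum(vals) / len(vals) for tag, vals in scores.items()}


# Made-up sample tweets, used only to exercise the function.
tweets = [
    "Great match, love it #WorldCup",
    "Sad we lost #WorldCup",
    "Bad traffic today #Monday",
]
result = hashtag_sentiment(tweets)
print(result)
```

A real pipeline would replace the toy lexicon with a trained sentiment model and would combine the score with topic information before judging whether a hashtag is worth reading.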