2,957 research outputs found

    WikiLinkGraphs: A Complete, Longitudinal and Multi-Language Dataset of the Wikipedia Link Networks

    Wikipedia articles contain multiple links connecting a subject to other pages of the encyclopedia. In Wikipedia parlance, these links are called internal links or wikilinks. We present a complete dataset of the network of internal Wikipedia links for the 9 largest language editions. The dataset contains yearly snapshots of the network and spans 17 years, from the creation of Wikipedia in 2001 to March 1st, 2018. While previous work has mostly focused on the complete hyperlink graph, which also includes links automatically generated by templates, we parsed each revision of each article to track links appearing in the main text. In this way we obtained a cleaner network, discarding more than half of the links and representing all and only the links intentionally added by editors. We describe in detail how the Wikipedia dumps have been processed and the challenges we have encountered, including the need to handle special pages such as redirects, i.e., alternative article titles. We present descriptive statistics of several snapshots of this network. Finally, we propose several research opportunities that can be explored using this new dataset.

    Comment: 10 pages, 3 figures, 7 tables, LaTeX. Final camera-ready version accepted at the 13th International AAAI Conference on Web and Social Media (ICWSM 2019), Munich, Germany, 11-14 June 2019.
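    The core processing step described above, parsing wikilinks out of raw wikitext and resolving redirects to canonical titles, can be illustrated in a few lines. Below is a minimal sketch in Python; the simplified regex, the sample redirect table, and the toy text are illustrative assumptions, not the authors' actual pipeline.

    ```python
    import re

    # Wikilinks look like [[Target]] or [[Target|anchor text]]; this
    # simplified pattern ignores nested links, files, and categories.
    WIKILINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")

    def extract_links(wikitext, redirects):
        """Return the set of canonical article titles linked from one revision."""
        links = set()
        for match in WIKILINK_RE.finditer(wikitext):
            title = match.group(1).strip()
            # Wikipedia titles are case-insensitive in their first letter only.
            title = title[:1].upper() + title[1:]
            # Resolve redirects (alternative titles) to the canonical article.
            links.add(redirects.get(title, title))
        return links

    # Hypothetical example: "NYC" is a redirect to "New York City".
    redirects = {"NYC": "New York City"}
    text = "[[NYC]] is linked, as is [[Central Park|the park]]."
    print(extract_links(text, redirects))  # {'New York City', 'Central Park'}
    ```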

    A Picture of Present Ubicomp Research: Exploring Publications from Important Events in the Field

    In this work we use a dataset of papers published in top conferences focused on ubiquitous computing (ubicomp) to provide an overview and analysis of recent ubicomp research performed internationally and in Brazil. The contributions of this study are twofold. First, we extracted useful information from our dataset, such as the representativeness of authors and institutions and the formation of communities. Second, we analyzed all papers published between 2010 and 2011 in all top international conferences, creating a taxonomy of recent ubicomp research performed internationally. After that, we mapped papers from SBCUP (the Brazilian ubicomp conference) onto this taxonomy, which enables the comparison of international and national research. This study is useful for guiding novices in the field, and it also provides experienced researchers with facts that enable discussion of ubicomp research.

    Key words: ubiquitous computing, scientific network, collaboration network, Pervasive, PerCom, UbiComp, SBCUP, taxonomy, characterization.

    Scientometric Research Assessment of IEEE CSCWD Conference Proceedings: An Exploratory Analysis from 2001 to 2019

    It has been a quarter of a century since the first edition of the IEEE International Conference on Computer Supported Cooperative Work in Design (CSCWD), held in 1996 in Beijing, China. Despite some attempts to empirically examine the evolution and identity of the field of CSCW and its related communities and disciplines, the scarcity of scientometric studies on IEEE CSCWD research productivity is noteworthy. To fill this gap, this study reports on an exploratory quantitative analysis of the literature published in the IEEE CSCWD conference proceedings, with the purpose of visualizing and understanding its structure and evolution over the 2001-2019 period. The findings offer valuable insights into the paper and author distribution, country- and citation-level productivity indicators, the degree of collaboration, and the collaboration index. Through this analysis, we also expect to get an initial overview of the IEEE CSCWD conference concerning the main topics presented, the most cited papers, and variances in the number of keywords, full-text views, and references.
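    The two collaboration measures named above have standard scientometric definitions: the degree of collaboration (Subramanyam) is the share of multi-authored papers, and the collaboration index (Lawani) is the mean number of authors per paper. A minimal sketch, assuming only a list of per-paper author counts as input; the toy data is hypothetical.

    ```python
    def collaboration_metrics(author_counts):
        """Compute collaboration measures from per-paper author counts."""
        n_papers = len(author_counts)
        n_multi = sum(1 for a in author_counts if a > 1)  # multi-authored papers
        dc = n_multi / n_papers             # degree of collaboration (Subramanyam)
        ci = sum(author_counts) / n_papers  # collaboration index (Lawani)
        return dc, ci

    # Hypothetical toy data: five papers with 1, 2, 3, 2, and 4 authors.
    dc, ci = collaboration_metrics([1, 2, 3, 2, 4])
    print(f"degree of collaboration = {dc:.2f}, collaboration index = {ci:.2f}")
    # degree of collaboration = 0.80, collaboration index = 2.40
    ```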

    Data Mining a Medieval Medical Text Reveals Patterns in Ingredient Choice That Reflect Biological Activity against Infectious Agents

    We used established methodologies from network science to identify patterns in medicinal ingredient combinations in a key medieval text, the 15th-century Lylye of Medicynes, focusing on recipes for topical treatments for symptoms of microbial infection. We conducted experiments screening the antimicrobial activity of selected ingredients. These experiments revealed interesting examples of ingredients that potentiated or interfered with each other’s activity and that would be useful bases for future, more detailed experiments. Our results highlight (i) the potential to use methodologies from network science to analyze medieval data sets and detect patterns of ingredient combination, (ii) the potential of interdisciplinary collaboration to reveal different aspects of the ethnopharmacology of historical medical texts, and (iii) the potential development of novel therapeutics inspired by premodern remedies in a time of increased need for new antibiotics.

    The pharmacopeia used by physicians and laypeople in medieval Europe has largely been dismissed as placebo or superstition. While we now recognize that some of the materia medica used by medieval physicians could have had useful biological properties, research in this area is limited by the labor-intensive process of searching and interpreting historical medical texts. Here, we demonstrate the potential power of turning medieval medical texts into contextualized electronic databases amenable to exploration by the use of an algorithm. We used established methodologies from network science to reveal patterns in ingredient selection and usage in a key text, the 15th-century Lylye of Medicynes, focusing on remedies to treat symptoms of microbial infection. In providing a worked example of data-driven textual analysis, we demonstrate the potential of this approach to encourage interdisciplinary collaboration and to shine a new light on the ethnopharmacology of historical medical texts.
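    The pattern-detection step described above is commonly implemented as a co-occurrence network: ingredients are nodes, and an edge's weight counts the recipes in which two ingredients appear together. Below is a minimal sketch under that assumption; the toy recipes and the networkx dependency are illustrative, not the study's actual database or code.

    ```python
    from itertools import combinations

    import networkx as nx

    # Hypothetical toy recipes standing in for entries extracted from
    # a database built out of the Lylye of Medicynes.
    recipes = [
        ["honey", "vinegar", "copper salt"],
        ["honey", "garlic", "vinegar"],
        ["garlic", "honey"],
    ]

    G = nx.Graph()
    for ingredients in recipes:
        # Each unordered pair of co-listed ingredients gets, or strengthens, an edge.
        for a, b in combinations(sorted(set(ingredients)), 2):
            weight = G.get_edge_data(a, b, {"weight": 0})["weight"]
            G.add_edge(a, b, weight=weight + 1)

    # Strongly co-occurring pairs are candidate combinations worth screening.
    for a, b, w in sorted(G.edges(data="weight"), key=lambda e: -e[2]):
        print(f"{a} + {b}: together in {w} recipe(s)")
    ```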

    Oil and Gas Flow Anomaly Detection on Offshore Naturally Flowing Wells Using Deep Neural Networks

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science.

    The Oil and Gas industry, as never before, faces multiple challenges. It is criticized for being dirty and polluting, hence the growing demand for green alternatives. Nevertheless, the world still has to rely heavily on hydrocarbons, since they remain the most traditional and stable source of energy, as opposed to the extensively promoted hydro, solar, or wind power. Major operators are challenged to produce oil more efficiently to counteract the newly arising energy sources, with a smaller climate footprint and more scrutinized expenditure, while facing high skepticism regarding the industry's future. It has to become greener, and hence to act in a manner not required previously. While most of the tools used by the hydrocarbon E&P industry are expensive and have been in use for many years, it is paramount for the industry's survival and prosperity to apply predictive maintenance technologies that can foresee potential failures, making production safer, lowering downtime, increasing productivity, and diminishing maintenance costs. Many efforts have been applied to define the most accurate and effective predictive methods; however, data scarcity limits the speed and capacity for further experimentation. While it would be highly beneficial for the industry to invest in Artificial Intelligence, this research aims at exploring, in depth, the subject of anomaly detection, using the open public data from Petrobras that was developed by experts. For this research, deep neural networks, namely Recurrent Neural Networks with LSTM and GRU backbones, were implemented for multi-class classification of undesirable events on naturally flowing wells. Further, several hyperparameter optimization tools were explored, mainly focusing on Genetic Algorithms as among the most advanced methods for such tasks. The research concluded with the best-performing algorithm using two stacked GRU layers and the following vector of hyperparameters: [1, 47, 40, 14], which stands for a timestep of 1, 47 hidden units, 40 epochs, and a batch size of 14, producing an F1 score of 0.97. As the world faces many issues, one of which is the detrimental effect of heavy industries on the environment and the resulting adverse global climate change, this project is an attempt to contribute to the field of applying Artificial Intelligence in the Oil and Gas industry, with the intention of making it more efficient, transparent, and sustainable.
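    A minimal sketch of the reported best configuration, two stacked GRU layers with the hyperparameter vector [1, 47, 40, 14], written here in Keras. The feature count, class count, and training arrays are illustrative assumptions standing in for the Petrobras well-sensor data, not the dissertation's actual pipeline.

    ```python
    import numpy as np
    import tensorflow as tf

    # Assumed dimensions for illustration: 8 sensor features per timestep,
    # 10 event classes (normal operation plus undesirable-event types).
    n_features, n_classes = 8, 10
    timestep, hidden_units, epochs, batch_size = 1, 47, 40, 14  # [1, 47, 40, 14]

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(timestep, n_features)),
        # The first GRU returns the full sequence so the second can stack on it.
        tf.keras.layers.GRU(hidden_units, return_sequences=True),
        tf.keras.layers.GRU(hidden_units),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # Dummy data standing in for windowed, expert-labeled well readings.
    X = np.random.rand(256, timestep, n_features).astype("float32")
    y = np.random.randint(0, n_classes, size=256)
    model.fit(X, y, epochs=epochs, batch_size=batch_size, verbose=0)
    ```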

    Social Knowledge Creation: Three Annotated Bibliographies

    In 2012-2013, a team led by Ray Siemens at the Electronic Textual Cultures Lab (ETCL), University of Victoria, in collaboration with Implementing New Knowledge Environments (INKE), developed three annotated bibliographies under the rubric of social knowledge creation. The items for the bibliographies were gathered and annotated by members of the ETCL to form this tripartite document as a resource for students and researchers involved in the INKE team and well beyond, including at digital humanities seminars in Bern (June 2013) and Leipzig (July 2013).

    Analysis of category co-occurrence in Wikipedia networks

    Wikipedia has seen a huge expansion of content since its inception. Pages within this online encyclopedia are organised by assigning them to one or more categories, and Wikipedia maintains a manually constructed taxonomy graph that encodes the semantic relationships between these categories. An alternative, called the category co-occurrence graph, can be produced automatically by linking together categories that have pages in common. Properties of the latter graph, and its relationship to the former, are the concern of this thesis. An analytic framework, called the t-component, is introduced to formalise the graphs and discover category clusters connecting relevant categories together. The m-core, a cohesive subgroup concept, is used as a clustering model to construct a subgraph in which the number of shared pages between categories exceeds a given threshold t. The significance of the m-core clustering results is validated using a permutation test and compared against the k-core, another clustering model. The Wikipedia category co-occurrence graphs are scale-free, with a few category hubs, and the majority of clusters are of size 2. All observed properties for the distribution of the largest clusters of the category graphs obey power laws with decay exponents averaging around 1. As the threshold t on the number of shared pages is increased, a critical threshold is eventually reached at which the largest cluster shrinks significantly in size. This phenomenon is exhibited only by the m-core, not the k-core. Lastly, the clustering in the category graph is shown to be consistent with the distance between categories in the taxonomy graph.
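    The two clustering models contrasted above are easy to state on a weighted co-occurrence graph: the m-core keeps only edges whose weight (shared pages) reaches the threshold t, while the k-core keeps vertices of degree at least k regardless of edge weights. A minimal sketch with networkx; the toy category graph is an illustrative assumption.

    ```python
    import networkx as nx

    # Toy category co-occurrence graph: edge weight = number of shared pages.
    G = nx.Graph()
    G.add_weighted_edges_from([
        ("Physics", "Mathematics", 12),
        ("Physics", "Astronomy", 8),
        ("Mathematics", "Astronomy", 5),
        ("Mathematics", "Logic", 3),
        ("Logic", "Philosophy", 2),
    ])

    def m_core(G, t):
        """Keep only edges with at least t shared pages; clusters are the
        connected components of the surviving subgraph."""
        return nx.Graph((u, v, d) for u, v, d in G.edges(data=True)
                        if d["weight"] >= t)

    # Raising t eventually makes the largest cluster collapse in size.
    for t in (2, 4, 10):
        clusters = list(nx.connected_components(m_core(G, t)))
        largest = max((len(c) for c in clusters), default=0)
        print(f"t={t}: {len(clusters)} cluster(s), largest has {largest} nodes")

    # The k-core thresholds on vertex degree instead, ignoring edge weights.
    print(sorted(nx.k_core(G, k=2).nodes()))  # the Physics/Math/Astronomy triangle
    ```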