WikiLinkGraphs: A Complete, Longitudinal and Multi-Language Dataset of the Wikipedia Link Networks
Wikipedia articles contain multiple links connecting a subject to other pages
of the encyclopedia. In Wikipedia parlance, these links are called internal
links or wikilinks. We present a complete dataset of the network of internal
Wikipedia links for the largest language editions. The dataset contains
yearly snapshots of the network and spans the years from the creation of
Wikipedia in 2001 to March 1st, 2018. While previous work has mostly focused on
the complete hyperlink graph which includes also links automatically generated
by templates, we parsed each revision of each article to track links appearing
in the main text. In this way we obtained a cleaner network, discarding more
than half of the links and representing all and only the links intentionally
added by editors. We describe in detail how the Wikipedia dumps have been
processed and the challenges we have encountered, including the need to handle
special pages such as redirects, i.e., alternative article titles. We present
descriptive statistics of several snapshots of this network. Finally, we
propose several research opportunities that can be explored using this new
dataset.
Comment: 10 pages, 3 figures, 7 tables, LaTeX. Final camera-ready version accepted at the 13th International AAAI Conference on Web and Social Media (ICWSM 2019), Munich (Germany), 11-14 June 2019
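The link-extraction step the abstract describes (parsing wikilinks out of each revision's main text and resolving redirects to their target titles) can be sketched roughly as follows; the regex, function name, and redirect table here are illustrative assumptions, not the authors' actual pipeline:

```python
import re

# Wikilinks appear in raw wikitext as [[Target]] or [[Target|label]];
# links generated by templates are absent from the raw main text, so
# parsing the text directly excludes them automatically.
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")

def extract_wikilinks(wikitext, redirects=None):
    """Return link targets found in the main text, resolving redirects
    (alternative article titles) via the `redirects` dict."""
    redirects = redirects or {}
    targets = []
    for match in WIKILINK_RE.finditer(wikitext):
        title = match.group(1).strip()
        # Normalize the first character, as MediaWiki titles do.
        title = title[:1].upper() + title[1:]
        targets.append(redirects.get(title, title))
    return targets

text = "See [[graph theory|graphs]] and [[WWW]]."
print(extract_wikilinks(text, redirects={"WWW": "World Wide Web"}))
# ['Graph theory', 'World Wide Web']
```

Edges of one yearly snapshot would then be the pairs (article, target) collected over every article's last revision in that year.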
A Picture of Present Ubicomp Research: Exploring Publications from Important Events in the Field
In this work we use a dataset of papers published in top conferences focused on ubiquitous computing (ubicomp) to provide an overview and analysis of recent ubicomp research performed internationally and in Brazil. The contributions of this study are twofold. First, we extracted useful information from our dataset, such as the representativeness of authors and institutions and the formation of communities. Second, we analyzed all papers published between 2010 and 2011 in all top international conferences, creating a taxonomy of recent ubicomp research performed internationally. After that, we mapped papers from SBCUP (the Brazilian ubicomp conference) onto this taxonomy, which enables the comparison of international and national research. This study is useful for guiding novices in the field, and it also provides experienced researchers with facts enabling the discussion of ubicomp research.
Keywords: Ubiquitous computing, scientific network, collaboration network, Pervasive, PerCom, Ubicomp, SBCUP, taxonomy, characterization
Scientometric Research Assessment of IEEE CSCWD Conference Proceedings: An Exploratory Analysis from 2001 to 2019
It has been a quarter of a century since the first edition of the IEEE International Conference on Computer Supported Cooperative Work in Design (CSCWD), held in 1996 in Beijing, China. Despite some attempts to empirically examine the evolution and identity of the field of CSCW and its related communities and disciplines, the scarcity of scientometric studies on IEEE CSCWD research productivity is noteworthy. To fill this gap, this study reports on an exploratory quantitative analysis of the literature published in the IEEE CSCWD conference proceedings, with the purpose of visualizing and understanding its structure and evolution over the 2001-2019 period. The findings offer valuable insights into the paper and author distribution, country- and citation-level productivity indicators, degree of collaboration, and collaboration index. Through this analysis, we also expect to gain an initial overview of the IEEE CSCWD conference concerning the main topics being presented, the most cited papers, and variances in the number of keywords, full-text views, and references.
Data Mining a Medieval Medical Text Reveals Patterns in Ingredient Choice That Reflect Biological Activity against Infectious Agents
We used established methodologies from network science to identify patterns in medicinal ingredient combinations in a key medieval text, the 15th-century Lylye of Medicynes, focusing on recipes for topical treatments for symptoms of microbial infection. We conducted experiments screening the antimicrobial activity of selected ingredients. These experiments revealed interesting examples of ingredients that potentiated or interfered with each other’s activity and that would be useful bases for future, more detailed experiments. Our results highlight (i) the potential to use methodologies from network science to analyze medieval data sets and detect patterns of ingredient combination, (ii) the potential of interdisciplinary collaboration to reveal different aspects of the ethnopharmacology of historical medical texts, and (iii) the potential development of novel therapeutics inspired by premodern remedies in a time of increased need for new antibiotics.
The pharmacopeia used by physicians and laypeople in medieval Europe has largely been dismissed as placebo or superstition. While we now recognize that some of the materia medica used by medieval physicians could have had useful biological properties, research in this area is limited by the labor-intensive process of searching and interpreting historical medical texts. Here, we demonstrate the potential power of turning medieval medical texts into contextualized electronic databases amenable to exploration by the use of an algorithm. We used established methodologies from network science to reveal patterns in ingredient selection and usage in a key text, the 15th-century Lylye of Medicynes, focusing on remedies to treat symptoms of microbial infection. In providing a worked example of data-driven textual analysis, we demonstrate the potential of this approach to encourage interdisciplinary collaboration and to shine a new light on the ethnopharmacology of historical medical texts.
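The core network-science step described above, linking ingredients that appear together in recipes, can be sketched as a weighted co-occurrence count; the recipe contents and ingredient names below are invented for illustration, not taken from the Lylye of Medicynes:

```python
from collections import Counter
from itertools import combinations

# Each recipe is a set of ingredients; an edge's weight counts how many
# recipes use both ingredients of the pair.
recipes = [
    {"honey", "garlic", "vinegar"},
    {"honey", "garlic", "wine"},
    {"garlic", "wine"},
]

edge_weights = Counter()
for recipe in recipes:
    for pair in combinations(sorted(recipe), 2):
        edge_weights[pair] += 1

# Frequently co-occurring pairs are the candidates the authors screened
# for antimicrobial activity (potentiation or interference).
print(edge_weights.most_common(2))
```

Pairs with unusually high weight relative to their ingredients' individual frequencies are the pattern of interest here.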
Oil and Gas flow Anomaly Detection on offshore naturally flowing wells using Deep Neural Networks
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science.
The Oil and Gas industry faces multiple challenges as never before. It is criticized as dirty and polluting, and demand for green alternatives keeps growing. Nevertheless, the world still relies heavily on hydrocarbons, since they remain the most traditional and stable source of energy compared with the extensively promoted hydro, solar, and wind power. Major operators are challenged to produce oil more efficiently in the face of newly arising energy sources, with a smaller climate footprint and more scrutinized expenditure, while confronting high skepticism about the industry's future. The industry has to become greener, and hence to act in ways not required previously.
While most of the tools used by the Hydrocarbon E&P industry are expensive and have been in use for many years, it is paramount for the industry's survival and prosperity to apply predictive maintenance technologies that can foresee potential failures, making production safer, lowering downtime, increasing productivity, and diminishing maintenance costs. Many efforts have been made to define the most accurate and effective predictive methods; however, data scarcity limits the speed and capacity for further experimentation. While it would be highly beneficial for the industry to invest in Artificial Intelligence, this research aims at exploring, in depth, the subject of Anomaly Detection, using the open public data from Petrobras that was developed by experts.
For this research, Deep Learning Neural Networks, namely Recurrent Neural Networks with LSTM and GRU backbones, were implemented for multi-class classification of undesirable events on naturally flowing wells. Furthermore, several hyperparameter optimization tools were explored, mainly focusing on Genetic Algorithms as among the most advanced methods for such tasks.
The research concluded with the best-performing algorithm using 2 stacked GRUs and the following vector of hyperparameters: [1, 47, 40, 14], which stands for a timestep of 1, 47 hidden units, 40 epochs, and a batch size of 14, producing an F1 score of 0.97.
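The best-performing architecture reported above, two stacked GRU layers with 47 hidden units fed single-timestep windows, can be sketched as follows. This is not the thesis code: the feature and class counts (`N_FEATURES`, `N_CLASSES`) are placeholders for the Petrobras dataset, and PyTorch is assumed as the framework.

```python
import torch
import torch.nn as nn

N_FEATURES, N_CLASSES = 8, 9          # placeholders for the real dataset
TIMESTEPS, HIDDEN, EPOCHS, BATCH = 1, 47, 40, 14   # the [1, 47, 40, 14] vector

class EventClassifier(nn.Module):
    """Two stacked GRU layers followed by a linear classification head."""
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(N_FEATURES, HIDDEN, num_layers=2, batch_first=True)
        self.head = nn.Linear(HIDDEN, N_CLASSES)

    def forward(self, x):              # x: (batch, TIMESTEPS, N_FEATURES)
        out, _ = self.gru(x)
        return self.head(out[:, -1])   # class logits from the last timestep

model = EventClassifier()
logits = model(torch.randn(BATCH, TIMESTEPS, N_FEATURES))
print(logits.shape)  # torch.Size([14, 9])
```

Training would then run for `EPOCHS` epochs with mini-batches of size `BATCH` under a cross-entropy loss, with the four numbers in the vector being exactly the values searched by the Genetic Algorithm.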
As the world faces many issues, one of which is the detrimental effect of heavy industries on the environment and the resulting adverse global climate change, this project is an attempt to contribute to the field of applying Artificial Intelligence in the Oil and Gas industry, with the intention of making it more efficient, transparent, and sustainable.
Social Knowledge Creation: Three Annotated Bibliographies
In 2012-2013, a team led by Ray Siemens at the Electronic Textual Cultures Lab (ETCL), University of Victoria, in collaboration with Implementing New Knowledge Environments (INKE), developed three annotated bibliographies under the rubric of social knowledge creation. The items for the bibliographies were gathered and annotated by members of the ETCL to form this tripartite document as a resource for students and researchers involved in the INKE team and well beyond, including at digital humanities seminars in Bern (June 2013) and Leipzig (July 2013).
Analysis of category co-occurrence in Wikipedia networks
Wikipedia has seen a huge expansion of content since its inception. Pages within this online
encyclopedia are organised by assigning them to one or more categories, where Wikipedia
maintains a manually constructed taxonomy graph that encodes the semantic relationship
between these categories. An alternative, called the category co-occurrence graph, can be
produced automatically by linking together categories that have pages in common. Properties
of the latter graph and its relationship to the former are the concern of this thesis.
An analytic framework, called the t-component, is introduced to formalise the graphs and
discover category clusters connecting relevant categories together. The m-core, a cohesive-subgroup
concept used as a clustering model, is used to construct a subgraph in which the
number of shared pages between categories exceeds a given threshold t. The significance
of the m-core clustering result is validated using a permutation test and is compared
to the k-core, another clustering model.
The Wikipedia category co-occurrence graphs are scale-free with a few category hubs, and
the majority of clusters are of size 2. All observed properties for the distribution of the largest
clusters of the category graphs obey power laws with decay exponents averaging around 1.
As the threshold t on the number of shared pages is increased, a critical threshold
is eventually reached at which the largest cluster shrinks significantly in size. This phenomenon is
exhibited only for the m-core, not the k-core. Lastly, the clustering in the category graph
is shown to be consistent with the distance between categories in the taxonomy graph.
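The m-core construction the thesis abstract describes, keeping only category pairs whose number of shared pages exceeds a threshold t and reading clusters off the surviving edges, can be sketched as below. The category names, page sets, and the use of connected components as clusters are illustrative assumptions:

```python
from itertools import combinations

# Toy input: each category maps to the set of pages assigned to it.
pages_per_category = {
    "Physics": {"p1", "p2", "p3"},
    "Mathematics": {"p2", "p3", "p4"},
    "History": {"p5"},
}

def m_core_clusters(cats, t):
    # Keep edges between categories sharing more than t pages.
    edges = [(a, b) for a, b in combinations(cats, 2)
             if len(cats[a] & cats[b]) > t]
    # Union-find over the surviving edges yields the clusters.
    parent = {c: c for c in cats}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for a, b in edges:
        parent[find(a)] = find(b)
    clusters = {}
    for c in cats:
        clusters.setdefault(find(c), set()).add(c)
    return [grp for grp in clusters.values() if len(grp) > 1]

print(m_core_clusters(pages_per_category, t=1))
# one cluster: {'Physics', 'Mathematics'}
```

Raising t prunes weak edges first, which is why the largest cluster collapses abruptly once the critical threshold is passed.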