Link communities reveal multiscale complexity in networks
Networks have become a key approach to understanding systems of interacting
objects, unifying the study of diverse phenomena including biological organisms
and human society. One crucial step when studying the structure and dynamics of
networks is to identify communities: groups of related nodes that correspond to
functional subunits such as protein complexes or social spheres. Communities in
networks often overlap such that nodes simultaneously belong to several groups.
Meanwhile, many networks are known to possess hierarchical organization, where
communities are recursively grouped into a hierarchical structure. However, the
fact that many real networks have communities with pervasive overlap, where
each and every node belongs to more than one group, has the consequence that a
global hierarchy of nodes cannot capture the relationships between overlapping
groups. Here we reinvent communities as groups of links rather than nodes and
show that this unorthodox approach successfully reconciles the antagonistic
organizing principles of overlapping communities and hierarchy. In contrast to
the existing literature, which has entirely focused on grouping nodes, link
communities naturally incorporate overlap while revealing hierarchical
organization. We find relevant link communities in many networks, including
major biological networks such as protein-protein interaction and metabolic
networks, and show that a large social network contains hierarchically
organized community structures spanning inner-city to regional scales while
maintaining pervasive overlap. Our results imply that link communities are
fundamental building blocks that reveal overlap and hierarchical organization
in networks to be two aspects of the same phenomenon.
Comment: Main text and supplementary information
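The abstract does not say how links are grouped. The following is a minimal sketch that assumes a Jaccard-style similarity between two links sharing a node, computed over the inclusive neighbourhoods (neighbours plus the node itself) of their non-shared endpoints, followed by a single-linkage cut at a chosen threshold. The function names and the `threshold` parameter are illustrative and are not taken from the paper.

```python
from itertools import combinations


def build_adjacency(edges):
    """Map each node to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def link_similarity(adj, e1, e2):
    """Jaccard similarity of two links that share exactly one node.

    For links (k, i) and (k, j), compare the inclusive neighbourhoods
    (neighbours plus the node itself) of the non-shared endpoints i and j.
    Links that share no node get similarity 0.
    """
    shared = set(e1) & set(e2)
    if len(shared) != 1:
        return 0.0
    (i,) = set(e1) - shared
    (j,) = set(e2) - shared
    ni = adj[i] | {i}
    nj = adj[j] | {j}
    return len(ni & nj) / len(ni | nj)


def link_communities(edges, threshold):
    """Group links connected by similarities >= threshold (single linkage)."""
    edges = [tuple(sorted(e)) for e in edges]
    adj = build_adjacency(edges)
    parent = {e: e for e in edges}

    def find(e):
        # Union-find root lookup with path halving
        while parent[e] != e:
            parent[e] = parent[parent[e]]
            e = parent[e]
        return e

    for e1, e2 in combinations(edges, 2):
        if link_similarity(adj, e1, e2) >= threshold:
            parent[find(e1)] = find(e2)

    groups = {}
    for e in edges:
        groups.setdefault(find(e), []).append(e)
    return list(groups.values())


# Two triangles joined at node 3
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]
for community in link_communities(edges, threshold=0.5):
    nodes = sorted({n for link in community for n in link})
    print(community, "-> nodes", nodes)
```

With a threshold of 0.5, the two triangles form two link communities, and node 3 belongs to both. Each link sits in exactly one group, while the nodes still overlap.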
Bibliographic Analysis on Research Publications using Authors, Categorical Labels and the Citation Network
Bibliographic analysis considers the author's research areas, the citation
network and the paper content among other things. In this paper, we combine
these three in a topic model that produces a bibliographic model of authors,
topics and documents, using a nonparametric extension of a combination of the
Poisson mixed-topic link model and the author-topic model. This gives rise to
the Citation Network Topic Model (CNTM). We propose a novel and efficient
inference algorithm for the CNTM to explore subsets of research publications
from CiteSeerX. The publication datasets are organised into three corpora,
totalling about 168k publications with about 62k authors. The queried
datasets are made available online. In three publicly available corpora in
addition to the queried datasets, our proposed model demonstrates an improved
performance in both model fitting and document clustering, compared to several
baselines. Moreover, our model allows extraction of additional useful knowledge
from the corpora, such as the visualisation of the author-topics network.
Additionally, we propose a simple method to incorporate supervision into topic
modelling to achieve further improvement on the clustering task.
Comment: Preprint for Journal Machine Learning
Text authorship identified using the dynamics of word co-occurrence networks
The identification of authorship in disputed documents still requires human
expertise, which is now infeasible for many tasks owing to the large volumes of
text and authors in practical applications. In this study, we introduce a
methodology based on the dynamics of word co-occurrence networks representing
written texts to classify a corpus of 80 texts by 8 authors. The texts were
divided into sections with an equal number of linguistic tokens, from which time
series were created for 12 topological metrics. The series were shown to be
stationary (p-value > 0.05), which permits the use of distribution moments as
learning attributes. With an optimized supervised learning procedure using a
Radial Basis Function Network, 68 out of 80 texts were correctly classified,
i.e. an 85% author matching success rate. Therefore, fluctuations in
purely dynamic network metrics were found to characterize authorship, thus
opening the way for the description of texts in terms of small evolving
networks. Moreover, the approach allows texts with diverse characteristics
to be compared in a simple, fast fashion.
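The feature-extraction steps described above can be sketched as follows: split the text into equal-sized sections, compute network metrics per section, then take moments of each resulting series. This is an illustration under stated assumptions. Words are linked when they are adjacent in the text, and four simple metrics stand in for the study's 12 topological metrics, which the abstract does not list. The Radial Basis Function Network classifier is not shown, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def cooccurrence_network(tokens):
    """Undirected network linking each word to the word that follows it."""
    adj = {t: set() for t in tokens}
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # ignore self-loops from repeated words
            adj[a].add(b)
            adj[b].add(a)
    return adj


def network_metrics(adj):
    """A few simple topological metrics (stand-ins for the study's 12)."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return {
        "nodes": n,
        "edges": m,
        "avg_degree": 2 * m / n if n else 0.0,
        "density": 2 * m / (n * (n - 1)) if n > 1 else 0.0,
    }


def moment_features(tokens, section_size):
    """Mean, standard deviation and skewness of each metric across sections."""
    # Split into sections with an equal number of tokens; drop the remainder
    sections = [tokens[i:i + section_size]
                for i in range(0, len(tokens) - section_size + 1, section_size)]
    if not sections:
        raise ValueError("text is shorter than one section")
    series = [network_metrics(cooccurrence_network(s)) for s in sections]

    features = {}
    for name in series[0]:
        values = [row[name] for row in series]
        mu, sd = mean(values), pstdev(values)
        skew = mean(((v - mu) / sd) ** 3 for v in values) if sd > 0 else 0.0
        features[f"{name}_mean"] = mu
        features[f"{name}_std"] = sd
        features[f"{name}_skew"] = skew
    return features


text = "the cat sat on the mat and the dog sat on the cat".split()
print(moment_features(text, section_size=4))
```

Each text yields one fixed-length feature vector regardless of its length. That property is what lets texts of different sizes be compared by a single classifier.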
Information Science in the web era: a term-based approach to domain mapping.
We propose a methodology for mapping research in the Information Science (IS) field based on a combined use of symbolic (linguistic) and numeric information. Using the same list of 12 IS journals as earlier studies on this topic (White & McCain 1998; Zhao & Strotmann 2008a&b), we mapped the structure of IS research for two consecutive periods: 1996-2005 and 2006-2008. We focused on mapping the content of scientific publications from their title and abstract fields. Cluster labels were derived automatically from the titles and abstracts using linguistic criteria. The results showed that while Information Retrieval (IR) and citation studies remained the two structuring poles of IS research, other prominent poles have emerged. Webometrics in the first period (1996-2005) evolved into general web studies in the second period, integrating more aspects of IR research, so that web studies and IR are now more interwoven. User studies persist in IS but are now dispersed between the web studies and IR poles. Our method also highlighted some recent trends in IR research, such as automatic summarization and the use of language models. Theoretical research on "information science" continues to occupy a smaller but persistent place. Citation studies, on the other hand, remain a monolithic block, isolated from the two other poles (IR and web studies) except for a tenuous link through user studies. Citation studies have also recently evolved internally to accommodate newcomers such as "h-index, Google scholar and the open access model". All these results were generated automatically by our method, without manual labeling of specialties or reading of the publication titles. Our results show that mapping domain knowledge structures at the term level offers a more detailed and intuitive picture of the field and captures emerging trends.
Graph Theory and Networks in Biology
In this paper, we present a survey of the use of graph theoretical techniques
in Biology. In particular, we discuss recent work on identifying and modelling
the structure of bio-molecular networks, as well as the application of
centrality measures to interaction networks and research on the hierarchical
structure of such networks and network motifs. Work on the link between
structural network properties and dynamics is also described, with emphasis on
synchronization and disease propagation.
Comment: 52 pages, 5 figures, Survey Paper
Topic Extraction and Interactive Knowledge Graphs for Learning Resources
Human development through education is an important means of sustainable development. It ensures community development in the present without negative effects in the future and provides prosperity for future generations. E-learning is a natural development of educational tools in the current era, and it has evolved impressively thanks to rapid advances in computer science and telecommunication technologies. Besides facilitating the educational process, this development has also produced a massive amount of learning resources, which makes searching for and extracting useful resources difficult. New tools are therefore needed to support this development. In this paper we present a new algorithm that extracts the main topics from textual learning resources, links related resources and generates interactive, dynamic knowledge graphs. The algorithm accomplishes these tasks accurately and efficiently regardless of text length. We used Wikipedia Miner, TextRank and Gensim within our algorithm. Evaluated against Gensim, our algorithm achieved substantially higher accuracy. This could be a step towards strengthening self-learning and supporting the sustainable development of communities, and more broadly of humanity, across generations.
The researcher was partially funded by the Egyptian Ministry of Higher Education and Minia University in the Arab Republic of Egypt. [Joint supervision mission from the fourth year missions (2015–2016) of the seventh five-year plan (2012–2017)]
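TextRank, one of the components named above, ranks words by running PageRank over a word co-occurrence graph. The sketch below is a generic TextRank-style keyword ranker and not the authors' algorithm. It assumes pre-tokenized input and skips the stop-word and part-of-speech filtering usually applied in practice. The parameter names and defaults (`window`, `damping=0.85`, `iterations`, `top_k`) are illustrative choices.

```python
def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=5):
    """Rank words by PageRank over a co-occurrence graph (TextRank-style sketch)."""
    # Link words that appear within `window` tokens of each other
    neighbors = {t: set() for t in tokens}
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Power iteration of the TextRank/PageRank update
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping) + damping * sum(
                scores[v] / len(neighbors[v]) for v in neighbors[w]
            )
            for w in neighbors
        }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


tokens = ("graph theory studies graph structure while network science "
          "applies graph models to real network data").split()
print(textrank_keywords(tokens, top_k=3))
```

Words that co-occur with many distinct neighbours accumulate higher scores. This is why a term repeated across contexts, such as "graph" here, tends to rank first.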
Fairness-aware Machine Learning in Educational Data Mining
Fairness is an essential requirement of every educational system and is reflected in a variety of educational activities. With the extensive use of Artificial Intelligence (AI) and Machine Learning (ML) techniques in education, researchers and educators can analyze educational (big) data and propose new (technical) methods to support teachers, students, or administrators of (online) learning systems in organizing teaching and learning. Educational data mining (EDM) applies and develops data mining (DM) and ML techniques to address educational problems such as student performance prediction and student grouping. However, ML-based decisions in education can be based on protected attributes, such as race or gender, leading to discrimination against individual students or subgroups of students. Ensuring fairness in ML models therefore also contributes to equity in educational systems. Moreover, bias can also appear in the data obtained from learning environments. Hence, bias-aware exploratory educational data analysis is important to support unbiased decision-making in EDM.
In this thesis, we address the aforementioned issues and propose methods that mitigate discriminatory outcomes of ML algorithms in EDM tasks. Specifically, we make the following contributions:
We perform bias-aware exploratory analysis of educational datasets using Bayesian networks to identify the relationships among attributes in order to understand bias in the datasets. We focus the exploratory data analysis on features having a direct or indirect relationship with the protected attributes w.r.t. prediction outcomes.
We perform a comprehensive evaluation of the sufficiency of various group fairness measures in predictive models for student performance prediction problems. Experiments on several educational datasets with different fairness measures give users a broad view of unfairness from diverse aspects.
We deal with the student grouping problem in collaborative learning. We introduce the fair-capacitated clustering problem that takes into account cluster fairness and cluster cardinalities. We propose two approaches, namely hierarchical clustering and partitioning-based clustering, to obtain fair-capacitated clustering.
We introduce the multi-fair capacitated (MFC) students-topics grouping problem, which satisfies students' preferences while ensuring balanced group cardinalities and maximizing the diversity of members with regard to the protected attribute. We propose three approaches: a greedy heuristic approach, a knapsack-based approach using the vanilla maximal 0-1 knapsack formulation, and an MFC knapsack approach based on a group fairness knapsack formulation.
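Both knapsack-based approaches build on the 0-1 knapsack problem. A minimal dynamic-programming sketch of the vanilla formulation follows. The abstract does not say how students, topics, preferences and group capacities are mapped onto weights and values. The example therefore makes a hypothetical assumption: each student is an item of weight 1, the value is a preference score, and the capacity is the group size.

```python
def knapsack_01(weights, values, capacity):
    """Solve the 0-1 knapsack problem by dynamic programming.

    Returns (best_value, chosen_indices), where each item is taken at most once.
    """
    n = len(weights)
    # best[i][c] = best value using only the first i items with capacity c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, v = weights[i - 1], values[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # skip item i-1
            if w <= c:
                best[i][c] = max(best[i][c], best[i - 1][c - w] + v)  # take it

    # Walk back through the table to recover which items were taken
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    return best[n][capacity], sorted(chosen)


# Hypothetical example: four students, each of weight 1, with preference
# scores for one topic; the topic group has room for two students.
preferences = [5, 3, 4, 2]
print(knapsack_01([1] * len(preferences), preferences, capacity=2))  # (9, [0, 2])
```

The vanilla formulation only maximizes total value under the capacity. The fairness and diversity constraints of the MFC variant would have to be added on top and are not modelled here.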
In short, the findings described in this thesis demonstrate the importance of fairness-aware ML in educational settings. We show that bias-aware data analysis, fairness measures, and fairness-aware ML models are essential aspects to ensure fairness in EDM and the educational environment.
Ministry of Science and Culture of Lower Saxony/LernMINT/51410078/E
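The sketch below illustrates the group fairness measures discussed in this thesis abstract by computing two widely used ones, statistical parity difference and equal opportunity difference, for binary predictions. The abstract does not list which measures the thesis evaluates, so these two are chosen for illustration only. The encoding `group == 1` for the protected group and the sample data are assumptions.

```python
def positive_rate(preds):
    """Share of positive (1) predictions; 0.0 for an empty group."""
    return sum(preds) / len(preds) if preds else 0.0


def statistical_parity_difference(y_pred, group):
    """P(pred = 1 | protected) - P(pred = 1 | non-protected)."""
    protected = [p for p, g in zip(y_pred, group) if g == 1]
    others = [p for p, g in zip(y_pred, group) if g == 0]
    return positive_rate(protected) - positive_rate(others)


def equal_opportunity_difference(y_true, y_pred, group):
    """Difference in true positive rates (protected minus non-protected)."""
    protected = [p for t, p, g in zip(y_true, y_pred, group) if g == 1 and t == 1]
    others = [p for t, p, g in zip(y_true, y_pred, group) if g == 0 and t == 1]
    return positive_rate(protected) - positive_rate(others)


# Hypothetical pass (1) / fail (0) predictions for six students;
# group = 1 marks members of the protected group.
y_true = [1, 1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1]
group = [1, 1, 1, 0, 0, 0]
print(statistical_parity_difference(y_pred, group))
print(equal_opportunity_difference(y_true, y_pred, group))
```

Values near 0 indicate parity between the groups. Negative values mean the protected group receives positive predictions (or, for equal opportunity, correct positive predictions) less often than the other group.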
Context-awareness for mobile sensing: a survey and future directions
The evolution of smartphones, together with increasing computational power, has enabled developers to create innovative context-aware applications that recognize users' social and cognitive activities in any situation and at any location. Awareness of context makes it possible to perceive the physical environment or situation around mobile device users, which allows network services to respond proactively and intelligently. The key idea behind context-aware applications is to encourage users to collect, analyze and share local sensory knowledge for large-scale community use by creating a smart network. Such a network can make autonomous logical decisions to actuate environmental objects and also assist individuals. However, many open challenges remain, mostly because the middleware services on mobile devices have limited resources in terms of power, memory and bandwidth. It is therefore critically important to study how these drawbacks can be analyzed and resolved, and at the same time to better understand the opportunities for the research community to contribute to context-awareness. To this end, this paper surveys the literature from 1991 to 2014, from the emerging concepts to the applications of context-awareness in mobile platforms, covering up-to-date research and future research directions. Moreover, it points out the challenges faced in this regard and addresses them by proposing possible solutions.