Quantifying and suppressing ranking bias in a large citation network
It is widely recognized that citation counts for papers from different fields cannot be directly compared because different scientific fields adopt different citation practices. Citation counts are also strongly biased by paper age, since older papers have had more time to attract citations. Various procedures aim at suppressing these biases and give rise to new normalized indicators, such as the relative citation count. We use a large citation dataset from Microsoft Academic Graph and a new statistical framework based on the Mahalanobis distance to show that the rankings by well-known indicators, including the relative citation count and Google's PageRank score, are significantly biased by paper field and age. Our statistical framework for assessing ranking bias allows us to quantify exactly the contribution of each individual field to the overall bias of a given ranking. We propose a general normalization procedure, motivated by the z-score, which produces much less biased rankings when applied to citation count and PageRank score.
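The z-score idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes that each paper is compared only with papers of the same field and publication year; the abstract does not specify the grouping, so this choice, the function name `zscore_by_group`, and the dictionary keys are hypothetical and not the authors' actual procedure.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_by_group(papers):
    """Rescale citation counts to z-scores within each (field, year) group.

    `papers` is a list of dicts with the hypothetical keys
    'id', 'field', 'year' and 'citations'.
    """
    groups = defaultdict(list)
    for p in papers:
        groups[(p["field"], p["year"])].append(p["citations"])

    # Mean and population standard deviation of each group.
    stats = {g: (mean(v), pstdev(v)) for g, v in groups.items()}

    scores = {}
    for p in papers:
        mu, sigma = stats[(p["field"], p["year"])]
        # A group with no spread carries no ranking information: score 0.
        scores[p["id"]] = (p["citations"] - mu) / sigma if sigma > 0 else 0.0
    return scores


papers = [
    {"id": "a1", "field": "A", "year": 2000, "citations": 10},
    {"id": "a2", "field": "A", "year": 2000, "citations": 30},
    {"id": "b1", "field": "B", "year": 2000, "citations": 1},
    {"id": "b2", "field": "B", "year": 2000, "citations": 3},
]
scores = zscore_by_group(papers)
```

In this toy input the top paper of the low-citation field B receives the same score as the top paper of the high-citation field A, which is the effect a field-aware normalization is meant to achieve.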
Research complexity of Australian universities
Strategic research direction and prioritisation are crucial for decision making in universities. Analysis of research diversification and sophistication helps differentiate universities according to their research attributes. Based on the Microsoft Academic Graph data set, this paper conducts a research complexity analysis for all Australian universities and examines the ubiquity and diversity of their research output. The paper also investigates research complexity indices of Australian universities, with further discussion of universities with research leadership, universities with technological and practical focuses, and young research universities.
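As a rough illustration of the two quantities named in this abstract, the sketch below counts diversity (how many fields a university is active in) and ubiquity (how many universities are active in a field) from a presence table. The abstract does not say how activity in a field is determined, so the input format, the university and field names, and the function name `diversity_and_ubiquity` are assumptions made for illustration only.

```python
def diversity_and_ubiquity(presence):
    """Count diversity per university and ubiquity per field.

    `presence` maps a university name to the set of fields in which it is
    considered active (a hypothetical, already-thresholded input).
    """
    # Diversity: number of fields each university is active in.
    diversity = {uni: len(fields) for uni, fields in presence.items()}

    # Ubiquity: number of universities active in each field.
    ubiquity = {}
    for fields in presence.values():
        for field in fields:
            ubiquity[field] = ubiquity.get(field, 0) + 1
    return diversity, ubiquity


presence = {
    "Uni X": {"physics", "medicine", "law"},
    "Uni Y": {"medicine"},
    "Uni Z": {"medicine", "law"},
}
diversity, ubiquity = diversity_and_ubiquity(presence)
```

A field with low ubiquity that appears mainly at highly diversified universities is the kind of signal a complexity index builds on; the index itself is not reproduced here.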
Implementation of Anomaly Based Network Intrusion Detection by Using Q-learning Technique
A Network Intrusion Detection System (NIDS) is an intrusion detection system that tries to discover malicious activity, such as service attacks, port scans, or even attempts to break into computers, by monitoring network traffic. Data mining techniques make it possible to search large amounts of data for characteristic rules and patterns. If applied to network monitoring data recorded on a host or in a network, they can be used to detect intrusions, attacks, or anomalies. We propose a machine learning method that cascades Principal Component Analysis (PCA) and Q-learning to classify anomalous and normal activities in a computer network. This paper investigates the use of PCA to reduce high-dimensional data and to improve predictive performance. On the reduced data, representing a density region of normal or anomalous instances, Q-learning strategies are applied to create agents that can adapt to unknown, complex environments. We attempted to create an agent that would learn to explore an environment and collect the malicious instances within it. We obtained interesting results in which agents were able to re-adapt their learning quickly to new traffic and network information, compared to other machine learning methods such as supervised and unsupervised learning. Keywords: Intrusion, Anomaly Detection, Data Mining, KDD Cup’99, PCA, Q-learning
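The Q-learning step described in this abstract can be sketched with the standard tabular update rule. This is a generic illustration, not the authors' implementation: the state encoding (for example, a discretized bin of PCA-reduced features), the two actions, the reward, and the parameter values are all assumptions.

```python
import random
from collections import defaultdict

# Hypothetical action set: label a traffic record as normal or anomalous.
ACTIONS = ("normal", "anomaly")


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update to the table `q` in place."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])


def choose_action(q, state, epsilon=0.1, rng=random):
    """Epsilon-greedy choice: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


# States are assumed to be integer bins of the PCA-reduced features.
q = defaultdict(float)
q_update(q, state=0, action="anomaly", reward=1.0, next_state=1)
greedy = choose_action(q, state=0, epsilon=0.0)
```

After a single rewarded update, the greedy policy in state 0 prefers the rewarded action; the agent keeps adjusting these values as new traffic arrives.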
Name Disambiguation from link data in a collaboration graph using temporal and topological features
In a social community, multiple persons may share the same name, phone number
or some other identifying attributes. This, along with other phenomena, such as
name abbreviation, name misspelling, and human error, leads to erroneous
aggregation of records of multiple persons under a single reference. Such
mistakes affect the performance of document retrieval, web search, and database
integration and, more importantly, cause improper attribution of credit (or blame).
The task of entity disambiguation partitions the records belonging to multiple
persons with the objective that each decomposed partition is composed of
records of a unique person. Existing solutions to this task use either
biographical attributes, or auxiliary features that are collected from external
sources, such as Wikipedia. However, for many scenarios, such auxiliary
features are not available, or they are costly to obtain. Besides, attempting
to collect biographical or external data carries the risk of privacy
violation. In this work, we propose a method for solving the entity disambiguation
task from link information obtained from a collaboration network. Our method
does not intrude on privacy, as it uses only the time-stamped graph topology of an
anonymized network. Experimental results on two real-life academic
collaboration networks show that the proposed method has satisfactory
performance. Comment: The short version of this paper has been accepted to ASONAM 201