493 research outputs found

Will This Paper Increase Your h-index? Scientific Impact Prediction

Author: Chawla Nitesh V.
Dong Yuxiao
Johnson Reid A.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 15/12/2014

Scientific impact plays a central role in the evaluation of the output of scholars, departments, and institutions. A widely used measure of scientific impact is citations, with a growing body of literature focused on predicting the number of citations obtained by any given publication. The effectiveness of such predictions, however, is fundamentally limited by the power-law distribution of citations, whereby publications with few citations are extremely common and publications with many citations are relatively rare. Given this limitation, in this work we instead address a related question asked by many academic researchers in the course of writing a paper, namely: "Will this paper increase my h-index?" Using a real academic dataset with over 1.7 million authors, 2 million papers, and 8 million citation relationships from the premier online academic service ArnetMiner, we formalize a novel scientific impact prediction problem to examine several factors that can drive a paper to increase the primary author's h-index. We find that the researcher's authority on the publication topic and the venue in which the paper is published are crucial factors to the increase of the primary author's h-index, while the topic popularity and the co-authors' h-indices are of surprisingly little relevance. By leveraging relevant factors, we find a greater than 87.5% potential predictability for whether a paper will contribute to an author's h-index within five years. As a further experiment, we generate a self-prediction for this paper, estimating that there is a 76% probability that it will contribute to the h-index of the co-author with the highest current h-index in five years. We conclude that our findings on the quantification of scientific impact can help researchers to expand their influence and more effectively leverage their position of "standing on the shoulders of giants."

Comment: Proc. of the 8th ACM International Conference on Web Search and Data Mining (WSDM'15)
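
For readers unfamiliar with the metric in the abstract above, here is a minimal sketch of the standard h-index definition (the largest h such that h papers have at least h citations each). The helper `contributes_to_h_index` is a hypothetical name, and its criterion (the new paper's citation count is at least the author's h-index once the paper is included) is one plausible reading of "contribute to an author's h-index", not necessarily the paper's exact formalization.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def contributes_to_h_index(existing_citations, new_paper_citations):
    """Hypothetical criterion (an assumption, not taken from the paper):
    the new paper counts toward the h-index if its citation count is at
    least the h-index computed with the paper included."""
    updated = h_index(list(existing_citations) + [new_paper_citations])
    return new_paper_citations >= updated


if __name__ == "__main__":
    record = [10, 8, 5, 4, 3]  # made-up citation counts for illustration
    print(h_index(record))
    print(contributes_to_h_index(record, 6))
    print(contributes_to_h_index(record, 2))
```

The citation counts are invented for illustration and do not come from the ArnetMiner dataset.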

arXiv.org e-Print Archive

Crossref

CleanML: A Study for Evaluating the Impact of Data Cleaning on ML Classification Tasks

Author: Blase Jennifer
Chu Xu
Li Peng
Rao Xi
Zhang Ce
Zhang Yue
Publication date: 01/01/2020

Data quality affects machine learning (ML) model performance, and data scientists spend a considerable amount of time on data cleaning before model training. However, to date, there does not exist a rigorous study on how exactly cleaning affects ML: the ML community usually focuses on developing ML algorithms that are robust to some particular noise types of certain distributions, while the database (DB) community has mostly studied the problem of data cleaning alone, without considering how data is consumed by downstream ML analytics. We propose a CleanML study that systematically investigates the impact of data cleaning on ML classification tasks. The open-source and extensible CleanML study currently includes 14 real-world datasets with real errors, five common error types, seven different ML models, and multiple cleaning algorithms for each error type (including both algorithms commonly used in practice and state-of-the-art solutions from the academic literature). We control the randomness in ML experiments using statistical hypothesis testing, and we also control the false discovery rate in our experiments using the Benjamini-Yekutieli (BY) procedure. We analyze the results in a systematic way to derive many interesting and nontrivial observations. We also put forward multiple research directions for researchers.

Comment: published in ICDE 202
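
The abstract above mentions controlling the false discovery rate with the Benjamini-Yekutieli (BY) procedure. As a rough illustration of the textbook step-up rule only (not the CleanML code), the sketch below rejects the k smallest p-values, where k is the largest rank satisfying p_(k) <= k * alpha / (m * c(m)) and c(m) = 1 + 1/2 + ... + 1/m. The function name, the default `alpha`, and the example p-values are all assumptions made for illustration.

```python
def benjamini_yekutieli(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    under the Benjamini-Yekutieli step-up procedure at level alpha."""
    m = len(p_values)
    if m == 0:
        return []
    # Harmonic correction factor c(m) = sum_{i=1..m} 1/i.
    c_m = sum(1.0 / i for i in range(1, m + 1))
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value is under its threshold.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / (m * c_m):
            k = rank
    # Reject every hypothesis ranked at or below k.
    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True
    return reject


if __name__ == "__main__":
    # Made-up p-values for illustration only.
    print(benjamini_yekutieli([0.01, 0.011, 0.5, 0.6]))
```

Because the rule is step-up, a p-value that misses its own threshold can still be rejected when a larger-ranked p-value passes, as with 0.01 in the example.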

arXiv.org e-Print Archive

Repository for Publications and Research Data

A Review of Rule Learning Based Intrusion Detection Systems and Their Prospects in Smart Grids

Author: Hagenmeyer Veit
Keller Hubert B.
Liu Qi
Publication venue: Institute of Electrical and Electronics Engineers
Publication date: 07/04/2021

KITopen

Inferring Networks of Substitutable and Complementary Products

Author: Bennett J.
Blei D.
Blei D. M.
Brody S.
Chang J.
Ganu G.
Mas-Colell A.
Moghaddam S.
Reyes A.
Titov I.
Vu D.
Publication date: 29/06/2015

In a modern recommender system, it is important to understand how products relate to each other. For example, while a user is looking for mobile phones, it might make sense to recommend other phones, but once they buy a phone, we might instead want to recommend batteries, cases, or chargers. These two types of recommendations are referred to as substitutes and complements: substitutes are products that can be purchased instead of each other, while complements are products that can be purchased in addition to each other. Here we develop a method to infer networks of substitutable and complementary products. We formulate this as a supervised link prediction task, where we learn the semantics of substitutes and complements from data associated with products. The primary source of data we use is the text of product reviews, though our method also makes use of features such as ratings, specifications, prices, and brands. Methodologically, we build topic models that are trained to automatically discover topics from text that are successful at predicting and explaining such relationships. Experimentally, we evaluate our system on the Amazon product catalog, a large dataset consisting of 9 million products, 237 million links, and 144 million reviews.

Comment: 12 pages, 6 figures

arXiv.org e-Print Archive

CiteSeerX

Crossref

Data mining applications of singular value decomposition

Author: Kurucz Miklós
Publication date: 01/01/2011

ELTE Digital Institutional Repository (EDIT)