7 research outputs found

    Using text analysis to quantify the similarity and evolution of scientific disciplines

    We use an information-theoretic measure of linguistic similarity to investigate the organization and evolution of scientific fields. An analysis of almost 20M papers from the past three decades reveals that linguistic similarity is related to, but different from, expert and citation-based classifications, leading to an improved view of the organization of science. A temporal analysis of the similarity of fields shows that some fields (e.g., computer science) are becoming increasingly central, but that on average the similarity between pairs has not changed in the last decades. This suggests that tendencies of convergence (e.g., multi-disciplinarity) and divergence (e.g., specialization) of disciplines are in balance. Comment: 9 pages, 4 figures
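
The abstract does not name the measure it uses. As a minimal sketch of what an information-theoretic comparison of two text corpora can look like, the block below computes the Jensen-Shannon divergence between word-frequency distributions. The choice of this particular divergence, the whitespace tokenization, and all function names are assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def word_distribution(text):
    """Relative word frequencies of a text (naive whitespace tokenization)."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def entropy(dist):
    """Shannon entropy in bits of a {word: probability} mapping."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence in bits between two word distributions.

    Illustrative choice only: 0 for identical distributions,
    1 for distributions with no words in common.
    """
    vocabulary = set(p) | set(q)
    mixture = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocabulary}
    return entropy(mixture) - 0.5 * (entropy(p) + entropy(q))
```

With this definition, two corpora that share no vocabulary are maximally distant, and a corpus compared with itself has divergence zero.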

    Quantifying the rise and fall of scientific fields

    Science advances by pushing the boundaries of the adjacent possible. While the global scientific enterprise grows at an exponential pace, at the mesoscopic level the exploration and exploitation of research ideas is reflected through the rise and fall of research fields. The empirical literature has largely studied such dynamics on a case-by-case basis, with a focus on explaining how and why communities of knowledge production evolve. Although fields rise and fall on different temporal and population scales, they are generally argued to pass through a common set of evolutionary stages. To understand the social processes that drive these stages beyond case studies, we need a way to quantify and compare different fields on the same terms. In this paper we develop techniques for identifying scale-invariant patterns in the evolution of scientific fields, and demonstrate their usefulness using 1.5 million preprints from the arXiv repository covering 175 research fields spanning Physics, Mathematics, Computer Science, Quantitative Biology and Quantitative Finance. We show that fields consistently follow a rise and fall pattern captured by a two-parameter right-tailed Gumbel temporal distribution. We introduce a field-specific rescaled time and explore the generic properties shared by articles and authors at the creation, adoption, peak, and decay evolutionary phases. We find that the early phase of a field is characterized by the mixing of cognitively distant fields by small teams of interdisciplinary authors, while late phases exhibit the role of specialized, large teams building on the previous works in the field. This method provides foundations to quantitatively explore the generic patterns underlying the evolution of research fields in science, with general implications in innovation studies. Comment: 18 pages, 4 figures, 8 SI figures
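
To make the "rise and fall" shape concrete, the sketch below evaluates the standard two-parameter Gumbel density, which rises quickly to a peak and then decays slowly. It assumes the abstract refers to this textbook form; the parameter names `mu` (peak location) and `beta` (time scale) are chosen here for illustration and are not taken from the paper.

```python
import math


def gumbel_pdf(t, mu, beta):
    """Density of the standard (right-skewed) Gumbel distribution.

    mu is the location of the peak and beta > 0 the scale; both names
    are illustrative assumptions, not the paper's notation.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    z = (t - mu) / beta
    if z < -700:
        # exp(-z) would overflow; the density is effectively zero here.
        return 0.0
    return math.exp(-(z + math.exp(-z))) / beta
```

The density peaks at `t = mu`, and the same distance from the peak carries far more weight after it than before it, which is the asymmetry a fast rise followed by a slow decline requires.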

    What Can Philosophers Really Learn from Science Journals?

    Philosophers of science regularly use scientific publications in their research. To make their analyses of the literature more thorough, some have begun to use computational methods from the digital humanities (DH). Yet this creates a tension: it’s become a truism in science studies that the contents of scientific publications do not accurately reflect the complex realities of scientific investigation. In this paper, we outline existing views on how scientific publications fit into the broader picture of science as a system of practices, and find that none of these views exclude articles as valuable sources for philosophical inquiry. Far from ignoring the gap between texts and practice, proper use of DH tools requires, and can even contribute to, our understanding of that gap and its implications.

    Digital Literature Analysis for Empirical Philosophy of Science

    Empirical philosophers of science aim to base their philosophical theories on observations of scientific practice. But since there is far too much science to observe it all, how can we form and test hypotheses about science that are sufficiently rigorous and broad in scope, while avoiding the pitfalls of bias and subjectivity in our methods? Part of the answer, we claim, lies in the computational tools of the digital humanities (DH), which allow us to analyze large volumes of scientific literature. Here we advocate for the use of these methods by addressing a number of large-scale, justificatory concerns, specifically about the epistemic value of journal articles as evidence for what happens elsewhere in science, and about the ability of DH tools to extract this evidence. Far from ignoring the gap between scientific literature and the rest of scientific practice, effective use of DH tools requires critical reflection about these relationships.

    Dissimilarity between scientific fields

    Datasets and supporting material used in the manuscript "Using text analysis to quantify the similarity and evolution of scientific disciplines", by L. Dias, M. Gerlach, J. Scharloth and E. G. Altmann, available at https://arxiv.org/abs/1706.08671. There are four types of information:
    1. Classification. One file (classification.csv). Provides the classification of scientific fields into domains, disciplines, and specialties, according to the ISI-Web-of-Science/OECD classification.
    2. Divergences. Seven ".csv" files named D_level_dimension.csv. Each gives the divergence between two scientific fields, as discussed in the manuscript (e.g., Fig. 1). The files correspond to the combinations of three dimensions (experts, citations, and language) and three levels of classification of scientific fields (domains, disciplines, and specialties). The first row and column in each file indicate the number of the scientific field; see the file "classification.csv" for details.
    3. Temporal evolution. One file (D_over_time.csv). The language divergence between two disciplines, D_i,j, computed for different years (y in [1991-2014]). The first two columns indicate the codes of the disciplines i and j; see the file classification.csv mentioned in point 1 above. The first row indicates the year. The entries of the table are D_i,j. The entry "nan" indicates that in that year the corpora of disciplines i and j were not long enough for the computation of D_i,j (fewer than 20,000 types); see Materials and Methods of the paper. The results of this table were used in Fig. 4 of the paper.
    4. List of words. The list of contractions was obtained from the Wikipedia List of English Contractions (http://en.wikipedia.org/wiki/Wikipedia:List_of_English_contractions). The list of stop words was constructed by combining the lists found in NLTK (http://www.nltk.org/), Gensim (http://radimrehurek.com/gensim/index.html), Mallet (http://mallet.cs.umass.edu/) and the Python Machine Learning Toolkit (http://scikit-learn.org).
List of Contractions: "she'll": 'she will', "shouldn't've": 'should not have', "she'll've": 'she will have', "don't": 'do not', "should've": 'should have', "won't": 'will not', "who'll've": 'who will have', "he's": 'he is', "when's": 'when is', "we've": 'we have', "he'd": 'he had', "ma'am": 'madam', "y'all're": 'you all are', "he'd've": 'he would have', "how'd'y": 'how do you', "shan't've": 'shall not have', "haven't": 'have not', "who's": 'who is', 'gonna': 'going to', "they'd": 'they would', "oughtn't": 'ought not', "you've": 'you have', "she'd've": 'she would have', "we'll": 'we will', "mayn't": 'may not', "they've": 'they have', "mustn't've": 'must not have', "could've": 'could have', "what've": 'what have', "mustn't": 'must not', "isn't": 'is not', "that'd've": 'that would have', "i'll": 'i will', "why's": 'why is', "you'd": 'you would', "couldn't've": 'could not have', "they'll've": 'they will have', "we'd": 'we would', "y'all'd": 'you all would', "he'll've": 'he will have', "shan't": 'shall not', "y'all'd've": 'you all would have', "there'd": 'there would', "needn't": 'need not', "where'd": 'where did', "hadn't've": 'had not have', "wouldn't've": 'would not have', "there's": 'there is', "shouldn't": 'should not', "they'll": 'they will', "needn't've": 'need not have', "mightn't": 'might not', "you're": 'you are', "so've": 'so have', "what'll": 'what will', "mightn't've": 'might not have', "hadn't": 'had not', "aren't": 'are not', "where's": 'where is', "wouldn't": 'would not', "i'd": 'i would', "weren't": 'were not', "would've": 'would have', "i'm": 'i am', "it'll": 'it will', "we'd've": 'we would have', "can't": 'cannot', "y'all": 'you all', "couldn't": 'could not', "how'll": 'how will', "doesn't": 'does not', "when've": 'when have', "how's": 'how is', "it's": 'it is', "y'all've": 'you all have', "how'd": 'how did', "we're": 'we are', "it'd": 'it would', "what're": 'what are', "i've": 'i have', "oughtn't've": 'ought not have', "what's": 'what is', "ain't": 
'am not', "who'll": 'who will', "i'd've": 'i would have', "must've": 'must have', "they're": 'they are', "you'd've": 'you would have', "wasn't": 'was not', "it'll've": 'it will have', "hasn't": 'has not', "won't've": 'will not have', "so's": 'so is', "you'll've": 'you will have', "there'd've": 'there would have', "i'll've": 'i will have', "didn't": 'did not', "where've": 'where have', "they'd've": 'they would have', "why've": 'why have', "it'd've": 'it would have', "who've": 'who have', "sha'n't": 'shall not', "to've": 'to have', "o'clock": 'of the clock', "let's": 'let us', "what'll've": 'what will have', "might've": 'might have', "he'll": 'he will', "that'd": 'that would', 'wanna': 'want to', "we'll've": 'we will have', "she'd": 'she would', "can't've": 'cannot have', "you'll": 'you will', "will've": 'will have', "she's": 'she is', "that's": 'that is' List of Stopwords: a, about, above, after, afterward, afterwards, again, against, all, almost, along, already, also, although, always, am, among, amongst, an, and, another, any, anybody, anyhow, anyone, anything, anyway, anyways, anywhere, are, around, as, aside, at, be, became, because, become, becomes, becoming, been, before, beforehand, behind, being, below, beside, besides, between, beyond, both, but, by, can, "cant", cannot, could, "couldnt", did, "didnt", do, does, "doesnt", doing, "dont", down, downwards, due, each, eg, either, else, elsewhere, enough, etc, even, ever, every, everybody, everyone, everything, everywhere, ex, except, for, former, formerly, find, found, from, further, furthermore, get, gets, getting, go, goes, going, gone, got, gotten, had, has, "hasnt", have, having, he, hence, her, here, hereafter, hereby, herein, hereupon, hers, herself, him, himself, his, hither, hitherto, how, however, i, ie, if, ii, iii, in, indeed, insofar, instead, into, inward, is, it, its, itself, iv, just, less, may, maybe, me, meanwhile, might, mine, more, moreover, most, mostly, must, my, myself, neither, 
nevertheless, new, no, non, none, nonetheless, nor, not, now, nowhere, obviously, occurs, of, off, often, on, only, onto, or, other, others, otherwise, our, ours, ourselves, out, over, own, perhaps, put, quite, rather, respectively, same, several, shall, she, should, show, showed, shown, shows, similar, since, so, some, somehow, someone, something, sometime, sometimes, somewhere, still, such, than, that, "thats", the, their, theirs, them, themselves, then, thence, thenceforth, there, "theres", thereafter, thereby, therefore, therein, theres, thereupon, these, they, this, thorough, thoroughly, those, though, through, throughout, thru, thus, to, together, too, toward, towards, under, until, unto, up, upon, upwards, us, use, used, using, various, very, was, we, well, were, what, whatever, when, whence, whenever, where, whereafter, whereas, whereby, wherein, whereupon, wherever, whether, which, while, whither, who, whoever, whole, whom, whose, why, will, with, within, without, would, yet, you, your, yours, yourself, yourselves
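
The two word lists above suggest a simple text-cleaning step. The sketch below shows one plausible way to apply them: expand contractions first, then drop stop words. The order of the steps, the tokenization regex, and the function name `preprocess` are assumptions made for illustration. The description does not say how the lists were applied, and only a small subset of each list is included here.

```python
import re

# Small subsets of the lists above, kept short for illustration.
CONTRACTIONS = {
    "don't": "do not",
    "it's": "it is",
    "can't": "cannot",
    "we've": "we have",
}
STOPWORDS = {
    "a", "about", "and", "cannot", "do", "have",
    "is", "it", "not", "of", "the", "we",
}


def preprocess(text):
    """Lowercase, expand contractions, then remove stop words.

    Assumed pipeline order; straight apostrophes only.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    expanded = []
    for token in tokens:
        # A contraction may expand to several words, e.g. "we've" -> "we have".
        expanded.extend(CONTRACTIONS.get(token, token).split())
    return [token for token in expanded if token not in STOPWORDS]
```

Expanding before filtering matters: "we've" is not itself a stop word, but both "we" and "have" are, so the contraction only disappears once it has been expanded.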