Scientific progress despite irreproducibility: A seeming paradox
It appears paradoxical that science is producing outstanding new results and
theories at a rapid rate at the same time that researchers are identifying
serious problems in the practice of science that cause many reports to be
irreproducible and invalid. Certainly the practice of science needs to be
improved and scientists are now pursuing this goal. However, in this
perspective we argue that this seeming paradox is not new, has always been part
of the way science works, and likely will remain so. We first introduce the
paradox. We then review a wide range of challenges that appear to make
scientific success difficult. Next, we describe the factors that make science
work, in the past, the present, and presumably also the future. We then suggest
that remedies for the present practice of science need to be applied
selectively so as not to slow progress, and illustrate with a few examples. We
conclude with arguments that communication of science needs to emphasize not
just problems but the enormous successes and benefits that science has brought
and is now bringing to all elements of modern society.
A Bayesian Metric for Network Similarity
Networks of every kind and in numerous fields are omnipresent in today's society (e.g., brain networks, social networks) and are an intense subject of research. It would be of great utility to have a computationally efficient and generally applicable method for assessing the similarity of networks. The field (going back to the 1950s) has not come up with such a method, albeit a few moves in this direction exist, such as Jaccard coefficients, QAP (the quadratic assignment procedure), and more recently Menezes & Roth, 2013, and Asta & Shalizi, 2014. I present a Bayesian-based metric for assessing the similarity of two networks, possibly of different size, that include nodes and links between nodes. I assume the nodes are labeled, so that the nodes and the links between nodes that are shared between the two networks can be identified.

The method calculates similarity as (a monotonic transformation of) the odds that the two observed networks, termed V and W, were produced by random sampling from a single master network, termed G, as opposed to generation by two different but similar networks, termed G_V and G_W. The simplest form of the method ignores strengths that could be assigned to nodes and links, and considers only nodes and links that are, or are not, shared by the networks. Suppose there are n_V nodes and N_V links only in V, n_W nodes and N_W links only in W, and n_c nodes and N_c links shared between the networks. Thus the number of nodes in V is n_c + n_V and the number in W is n_c + n_W; the number of unique nodes in both V and W is n_c + n_V + n_W = n. The number of links in V is N_c + N_V and the number in W is N_c + N_W; the number of unique links in both V and W is N_c + N_V + N_W = N.

The single master network, G, is assumed to consist of the union of the nodes and links in the two networks, and has n nodes and N links. The probability that a given shared node will be randomly and independently sampled twice is [(n_V + n_c)/n][(n_W + n_c)/n]. The probability that a given shared link will be randomly and independently sampled twice is [(N_V + N_c)/N][(N_W + N_c)/N]. If there are two generating networks, I assume they each have n nodes and N links. I also assume they are similar, because we would not be comparing dissimilar networks. The degree of similarity is controlled by tuning parameters: G_V and G_W are assumed to share αn nodes and βN links. The probability that a given shared node will be sampled twice is then α[(n_V + n_c)/n][(n_W + n_c)/n], and the probability that a given shared link will be sampled twice is β[(N_V + N_c)/N][(N_W + N_c)/N]. The likelihood ratio λ_js for G versus (G_V, G_W) as generator of a given shared node is then 1/α, and the likelihood ratio ρ_js for a given shared link is then 1/β. For a non-shared node, say in V, similar reasoning gives a likelihood ratio λ_kV of [1 − (n_W + n_c)/n] / [1 − α(n_W + n_c)/n], and for a non-shared link a likelihood ratio ρ_kV of [1 − (N_W + N_c)/N] / [1 − β(N_W + N_c)/N]. For a non-shared node or link in W, substitute a W subscript for the V subscript in these likelihood ratios.

Computational efficiency is a necessity if the similarity metric is to be applied to large networks. For this reason I do not calculate the exact probabilities for the numbers of shared and non-shared nodes and links that are observed (the combinatoric complexity of such calculations is enormous). Instead I make the simplifying assumption that each node and link contributes the likelihood ratios given above, and that the total odds is obtained by multiplying all the likelihood ratios together. This simplification can perhaps be justified if similar distortion is produced by this simplifying assumption for both the cases of G and (G_V, G_W) as generators. Under this simplifying assumption the overall odds (for a single generator versus two) becomes:

Ψ(1/2) = (λ_js)^(n_c) (λ_kV)^(n_V) (λ_jW)^(n_W) (ρ_js)^(N_c) (ρ_kV)^(N_V) (ρ_jW)^(N_W)

Taking the log of this product converts the calculation to sums and makes calculation highly efficient.

This abstract is too short to permit giving the different and more complex results that hold for the several cases when the nodes and/or links have associated strengths, so I give a summary of some of the results here. The results for links and nodes are similar, so consider the results for nodes. Let there be just one set of strength values, S_i for the i-th node, normed to sum to 1.0. For either generation by G or (G_V, G_W), assume sampling is made without replacement and proportional to strength. Let Z_iV and Z_iW be the probabilities that node i will be sampled by n_V + n_c samples, or n_W + n_c samples, respectively. The Z's would be difficult to obtain analytically but could be estimated by Monte Carlo sampling. Consider two possibilities for the way that G_V and G_W overlap. In Case A the probability that a node will be shared is simply α, independent of strength. In Case B, the probability that a node will be shared is an increasing function of strength, Y_i.

For Case A the likelihood ratio for a shared node i is 1/α. For a node k only in V the likelihood ratio is λ_kV = (1 − Z_kW) / [1 − α(1 − Z_kW)]; for a node only in W, exchange the V and W subscripts. Then we have for the odds due to nodes: Ψ_D = (1/α)^(n_c) Π_k(λ_kV) Π_j(λ_jW).

For Case B the likelihood ratio for a shared node i is 1/Y_i. For a node k only in V the likelihood ratio is λ_kV = (1 − Z_kW) / [1 − Y_k(1 − Z_kW)]; again, switch the V and W subscripts for a node only in W. Then we have for the odds due to nodes: Ψ_D = Π_i(1/Y_i) Π_k(λ_kV) Π_j(λ_jW). These expressions would have analogous forms for links, with different N's, Z's, and Y's, and the overall odds would, as before, be a product of the odds for nodes and the odds for links.

The critical difference between Cases A and B is the degree to which evidence based on an observed shared node or link is strength dependent: for Case B this evidence rises as strength decreases. This should raise concerns: however strengths are obtained, there is likely to be measurement noise that reduces the reliability of low strength values. This might argue in favor of using Case A, or, if one prefers Case B, restricting the Y_i values to lie above a lower bound. The idea would be to let evidence depend most on the nodes (or links) with high strength values.

It should be observed that the existence of a computationally efficient and generally applicable metric for network similarity would allow alignment of non-labeled networks: one would search for the alignment of nodes that maximizes the metric.

I have many relevant publications demonstrating some degree of expertise in Bayesian modeling (e.g., Shiffrin & Chandramouli, in press; Shiffrin, Chandramouli, & Grünwald, 2015; Chandramouli & Shiffrin, 2015; Nelson & Shiffrin, 2013; Cox & Shiffrin, 2012; Shiffrin, Lee, Kim, & Wagenmakers, 2008; Cohen, Shiffrin, Gold, Ross, & Ross, 2007; Denton & Shiffrin; Huber, Shiffrin, Lyle, & Ruys, 2001; Shiffrin & Steyvers, 1997). I note that the present results are in a vague sense an extension of the metric proposed for matching memory probes to memory traces given in Cox and Shiffrin (2012) and in the appendix of Nelson and Shiffrin (2013).
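The strength-free form of the metric lends itself to a few lines of code. The sketch below is my own minimal rendering under stated assumptions (nodes and links given as Python sets, with links as tuples; the function name and default α = β = 0.5 are illustrative, not from the abstract):

```python
import math

def log_odds_similarity(nodes_v, nodes_w, links_v, links_w, alpha=0.5, beta=0.5):
    """Log odds that labeled networks V and W were sampled from a single
    master network G rather than from two similar generators (G_V, G_W).
    Strength-free form; alpha and beta in (0, 1) are the tuning parameters
    giving the fraction of nodes and links the two generators share."""
    nc, nv, nw = len(nodes_v & nodes_w), len(nodes_v - nodes_w), len(nodes_w - nodes_v)
    Nc, Nv, Nw = len(links_v & links_w), len(links_v - links_w), len(links_w - links_v)
    n, N = nc + nv + nw, Nc + Nv + Nw   # sizes of the master network G

    # Each shared node contributes a likelihood ratio 1/alpha, each shared link 1/beta.
    total = nc * math.log(1 / alpha) + Nc * math.log(1 / beta)

    def lr(frac, share):
        # Non-shared item: [1 - frac] / [1 - share*frac], where frac is the
        # sampling probability on the other side, e.g. (n_W + n_c)/n.
        return (1 - frac) / (1 - share * frac)

    if nv: total += nv * math.log(lr((nw + nc) / n, alpha))
    if nw: total += nw * math.log(lr((nv + nc) / n, alpha))
    if Nv: total += Nv * math.log(lr((Nw + Nc) / N, beta))
    if Nw: total += Nw * math.log(lr((Nv + Nc) / N, beta))
    return total
```

Working in the log domain, as the abstract notes, turns the product of per-node and per-link likelihood ratios into a sum; identical networks get a positive score that shrinks as overlap decreases.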
Prime diagnosticity in short-term repetition priming: Is primed evidence discounted, even when it reliably indicates the correct answer?
The authors conducted 4 repetition priming experiments that manipulated prime duration and prime diagnosticity in a visual forced-choice perceptual identification task. The strength and direction of prime diagnosticity produced marked effects on identification accuracy, but those effects were resistant to subsequent changes of diagnosticity. Participants learned to associate different diagnosticities with primes of different durations but not with primes presented in different colors. Regardless of prime diagnosticity, preference for a primed alternative covaried negatively with prime duration, suggesting that even for diagnostic primes, evidence discounting remains an important factor. A computational model, with the assumption that adaptation to the statistics of the experiment modulates the level of evidence discounting, accounted for these results.
Constraints on Models of Recognition and Recall Imposed by Data on the Time Course of Retrieval
Reaction time distributions in recognition conditions were compared to those in cued recall to explore the time course of retrieval, to test current models, and to provide constraints for the development of new models (including, to take an example, the class of recurrent neural nets, since they naturally produce reaction time predictions). Two different experimental paradigms were used. Results from a free response procedure showed fundamental differences between the two test modes, both in mean reaction time and the general shape of the distributions. Analysis of data from a signal-to-respond procedure revealed large differences between recognition and recall in the rate of growth of performance. These results suggest the existence of different processes underlying retrieval in recognition and cued recall. One model posits parallel activation of separate memory traces; for recognition, the summed activation is used for a decision, but for recall a search is based on sequential probabilistic choices from the traces. Further constraining models was the observation of nearly identical reaction time distributions for positive and negative responses in recognition, suggesting a single process for recognition decisions for targets and distractors.
Confusion and Compensation in Visual Perception: Effects of Spatiotemporal Proximity and Selective Attention
The authors investigated spatial, temporal, and attentional manipulations in a short-term repetition priming paradigm. Brief primes produced a strong preference to choose the primed alternative, whereas long primes had the opposite effect. However, a 2nd brief presentation of a long prime produced a preference for the primed word despite the long total prime duration. These surprising results are explained by a computational model that posits the offsetting components of source confusion (prime features are confused with target features) and discounting (evidence from primed features is discounted). The authors obtained compelling evidence for these components by showing how they can cooperate or compete through different manipulations of prime salience. The model allows for dissociations between prime salience and the magnitude of priming, thereby providing a unified account of "subliminal" and "supraliminal" priming.
Extraordinary claims, extraordinary evidence? A discussion
Roberts (2020, Learning & Behavior, 48[2], 191-192) discussed research claiming honeybees can do arithmetic. Some readers of this research might regard such claims as unlikely. The present authors used this example as a basis for a debate on the criterion that ought to be used for publication of results or conclusions that could be viewed as unlikely by a significant number of readers, editors, or reviewers.
How should the advent of large language models affect the practice of science?
Large language models (LLMs) are being increasingly incorporated into
scientific workflows. However, we have yet to fully grasp the implications of
this integration. How should the advent of large language models affect the
practice of science? For this opinion piece, we have invited four diverse
groups of scientists to reflect on this query, sharing their perspectives and
engaging in debate. Schulz et al. make the argument that working with LLMs is
not fundamentally different from working with human collaborators, while Bender
et al. argue that LLMs are often misused and over-hyped, and that their
limitations warrant a focus on more specialized, easily interpretable tools.
Marelli et al. emphasize the importance of transparent attribution and
responsible use of LLMs. Finally, Botvinick and Gershman advocate that humans
should retain responsibility for determining the scientific roadmap. To
facilitate the discussion, the four perspectives are complemented with a
response from each group. By putting these different perspectives in
conversation, we aim to bring attention to important considerations within the
academic community regarding the adoption of LLMs and their impact on both
current and future scientific practices.
Statistics in the service of science: don't let the tail wag the dog
Statistical modeling is generally meant to describe patterns in data in service of the broader scientific goal of developing theories to explain those patterns. Statistical models support meaningful inferences when models are built so as to align parameters of the model with potential causal mechanisms and how they manifest in data. When statistical models are instead based on assumptions chosen by default, attempts to draw inferences can be uninformative or even paradoxical; in essence, the tail is trying to wag the dog. These issues are illustrated by van Doorn et al. (this issue) in the context of using Bayes Factors to identify effects and interactions in linear mixed models. We show that the problems identified in their applications (along with other problems identified here) can be circumvented by using priors over inherently meaningful units instead of default priors on standardized scales. This case study illustrates how researchers must directly engage with a number of substantive issues in order to support meaningful inferences, of which we highlight two: The first is the problem of coordination, which requires a researcher to specify how the theoretical constructs postulated by a model are functionally related to observable variables. The second is the problem of generalization, which requires a researcher to consider how a model may represent theoretical constructs shared across similar but non-identical situations, along with the fact that model comparison metrics like Bayes Factors do not directly address this form of generalization. For statistical modeling to serve the goals of science, models cannot be based on default assumptions, but should instead be based on an understanding of their coordination function and on how they represent causal mechanisms that may be expected to generalize to other related scenarios.
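The effect of choosing a prior on a meaningful scale rather than a default one can be seen in a toy calculation (my own sketch, not the authors' analysis). In a normal-normal model the marginal likelihood under H1 is analytic, so the Bayes factor reduces to a ratio of two normal densities; the numbers here are hypothetical (a 20 ms effect estimated with an 8 ms standard error):

```python
import math

def normal_pdf(x, sd):
    # Density of a mean-zero normal with standard deviation sd, evaluated at x.
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def bf10(estimate, se, prior_sd):
    """Bayes factor for H1: effect ~ Normal(0, prior_sd) against H0: effect = 0,
    given an estimate distributed Normal(effect, se). With a normal prior the
    marginal under H1 is analytic: estimate ~ Normal(0, sqrt(se^2 + prior_sd^2))."""
    return normal_pdf(estimate, math.sqrt(se**2 + prior_sd**2)) / normal_pdf(estimate, se)

# Hypothetical priming effect of 20 ms with standard error 8 ms.
informed = bf10(20, 8, 30)   # prior in meaningful units: effects of tens of ms plausible
diffuse = bf10(20, 8, 300)   # default-like diffuse prior on the raw scale
```

With these numbers the informed prior gives BF10 > 1 (evidence for an effect) while the diffuse prior gives BF10 < 1: the same data appear to favor the null purely because of an uninformative default prior choice, which is the kind of paradox the abstract warns against.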