Scientific progress despite irreproducibility: A seeming paradox
It appears paradoxical that science is producing outstanding new results and
theories at a rapid rate at the same time that researchers are identifying
serious problems in the practice of science that cause many reports to be
irreproducible and invalid. Certainly the practice of science needs to be
improved and scientists are now pursuing this goal. However, in this
perspective we argue that this seeming paradox is not new, has always been part
of the way science works, and likely will remain so. We first introduce the
paradox. We then review a wide range of challenges that appear to make
scientific success difficult. Next, we describe the factors that make science
work, in the past, the present, and presumably also the future. We then suggest
that remedies for the present practice of science need to be applied
selectively so as not to slow progress, and illustrate with a few examples. We
conclude with arguments that communication of science needs to emphasize not
just problems but the enormous successes and benefits that science has brought
and is now bringing to all elements of modern society.
A Bayesian Metric for Network Similarity
Networks of every kind and in numerous fields are omnipresent in today's society (e.g., brain networks, social networks) and are an intense subject of research. It would be of great utility to have a computationally efficient and generally applicable method for assessing the similarity of networks. The field (going back to the 1950s) has not come up with such a method, albeit a few moves in this direction exist, such as Jaccard coefficients, QAP (the quadratic assignment procedure), and more recently Menezes & Roth, 2013, and Asta & Shalizi, 2014. I present a Bayesian-based metric for assessing the similarity of two networks, possibly of different size, that include nodes and links between nodes. I assume the nodes are labeled, so that the nodes and the links between nodes that are shared between the two networks can be identified.

The method calculates similarity as (a monotonic transformation of) the odds that the two observed networks, termed V and W, were produced by random sampling from a single master network, termed G, as opposed to generation by two different but similar networks, termed G_V and G_W. The simplest form of the method ignores strengths that could be assigned to nodes and links, and considers only nodes and links that are, or are not, shared by the networks. Suppose there are n_V nodes and N_V links only in V, n_W nodes and N_W links only in W, and n_c nodes and N_c links shared between the networks. Thus the number of nodes in V is n_c + n_V and the number in W is n_c + n_W; the number of unique nodes in both V and W is n_c + n_V + n_W = n. The number of links in V is N_c + N_V and the number in W is N_c + N_W; the number of unique links in both V and W is N_c + N_V + N_W = N.

The single master network, G, is assumed to consist of the union of the nodes and links in the two networks, and has n nodes and N links. The probability that a given shared node will be randomly and independently sampled twice is [(n_V + n_c)/n][(n_W + n_c)/n]. The probability that a given shared link will be randomly and independently sampled twice is [(N_V + N_c)/N][(N_W + N_c)/N]. If there are two generating networks, I assume they each have n nodes and N links. I also assume they are similar, because we would not be comparing dissimilar networks. The degree of similarity is controlled by tuning parameters: G_V and G_W are assumed to share αn nodes and βN links. The probability that a given shared node will be sampled twice is then α[(n_V + n_c)/n][(n_W + n_c)/n], and the probability that a given shared link will be sampled twice is β[(N_V + N_c)/N][(N_W + N_c)/N]. The likelihood ratio λ_js for G versus (G_V, G_W) as generator of a given shared node is then 1/α, and the likelihood ratio ρ_js for a given shared link is then 1/β. For a non-shared node, say in V, similar reasoning gives a likelihood ratio λ_kV of [1 − (n_W + n_c)/n] / [1 − α(n_W + n_c)/n], and for a non-shared link a likelihood ratio ρ_kV of [1 − (N_W + N_c)/N] / [1 − β(N_W + N_c)/N]. For a non-shared node or link in W, substitute a W subscript for the V subscript in these likelihood ratios.

Computational efficiency is a necessity if the similarity metric is to be applied to large networks. For this reason I do not calculate the exact probabilities for the numbers of shared and non-shared nodes and links that are observed (the combinatoric complexity of such calculations is enormous). Instead I make the simplifying assumption that each node and link contributes the likelihood ratios given above, and that the total odds is obtained by multiplying all the likelihood ratios together. This simplification can perhaps be justified if similar distortion is produced by this simplifying assumption for both the cases of G and (G_V, G_W) as generators. Under this simplifying assumption the overall odds (for a single generator versus two) becomes:

Ψ(1/2) = (λ_js)^(n_c) (λ_kV)^(n_V) (λ_jW)^(n_W) (ρ_js)^(N_c) (ρ_kV)^(N_V) (ρ_jW)^(N_W)

Taking the log of this product converts the calculation to sums and makes calculation highly efficient.

This abstract is too short to permit giving the different and more complex results that hold for the several cases when the nodes and/or links have associated strengths, so I give a summary of some of the results here. The results for links and nodes are similar, so consider the results for nodes. Let there be just one set of strength values, S_i for the i-th node, normed to sum to 1.0. For either generation by G or (G_V, G_W), assume sampling is made without replacement and proportional to strength. Let Z_iV and Z_iW be the probabilities that node i will be sampled by n_V + n_c samples, or n_W + n_c samples, respectively. The Z's would be difficult to obtain analytically but could be estimated by Monte Carlo sampling. Consider two possibilities for the way that G_V and G_W overlap. In Case A the probability that a node will be shared is simply α, independent of strength. In Case B, the probability that a node will be shared is an increasing function of strength, Y_i.

For Case A the likelihood ratio for a shared node i is 1/α. For a node k only in V the likelihood ratio is λ_kV = (1 − Z_kW) / [1 − α(1 − Z_kW)]; for a node only in W, exchange the V and W subscripts. Then we have for the odds due to nodes: Ψ_D = (1/α)^(n_c) Π_k(λ_kV) Π_j(λ_jW).

For Case B the likelihood ratio for a shared node i is 1/Y_i. For a node k only in V the likelihood ratio is λ_kV = (1 − Z_kW) / [1 − Y_k(1 − Z_kW)]; again, switch the V and W subscripts for a node only in W. Then we have for the odds due to nodes: Ψ_D = Π_i(1/Y_i) Π_k(λ_kV) Π_j(λ_jW). These expressions would have analogous forms for links, with different N's, Z's, and Y's, and the overall odds would, as before, be a product of the odds for nodes and the odds for links.

The critical difference between Cases A and B is the degree to which evidence based on an observed shared node or link is strength dependent: for Case B this evidence rises as strength decreases. This should raise concerns: however strengths are obtained, there is likely to be measurement noise that reduces the reliability of low strength values. This might argue in favor of using Case A, or, if one prefers Case B, restricting the Y_i values to lie above a lower bound. The idea would be to let evidence depend most on the nodes (or links) with high strength values.

It should be observed that the existence of a computationally efficient and generally applicable metric for network similarity would allow alignment of non-labeled networks: one would search for the alignment of nodes that maximizes the metric.

I have many relevant publications demonstrating some degree of expertise in Bayesian modeling (e.g., Shiffrin & Chandramouli, in press; Shiffrin, Chandramouli, & Grünwald, 2015; Chandramouli & Shiffrin, 2015; Nelson & Shiffrin, 2013; Cox & Shiffrin, 2012; Shiffrin, Lee, Kim, & Wagenmakers, 2008; Cohen, Shiffrin, Gold, Ross, & Ross, 2007; Denton & Shiffrin; Huber, Shiffrin, Lyle, & Ruys, 2001; Shiffrin & Steyvers, 1997). I note that the present results are in a vague sense an extension of the metric proposed for matching memory probes to memory traces given in Cox and Shiffrin (2012) and in the appendix of Nelson and Shiffrin (2013).
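The strength-free form of the metric lends itself to a few lines of code. The sketch below is my own minimal rendering under stated assumptions (nodes and links given as Python sets, with links as tuples; the function name and default α = β = 0.5 are illustrative, not from the abstract):

```python
import math

def log_odds_similarity(nodes_v, nodes_w, links_v, links_w, alpha=0.5, beta=0.5):
    """Log odds that labeled networks V and W were sampled from a single
    master network G rather than from two similar generators (G_V, G_W).
    Strength-free form; alpha and beta in (0, 1) are the tuning parameters
    giving the fraction of nodes and links the two generators share."""
    nc, nv, nw = len(nodes_v & nodes_w), len(nodes_v - nodes_w), len(nodes_w - nodes_v)
    Nc, Nv, Nw = len(links_v & links_w), len(links_v - links_w), len(links_w - links_v)
    n, N = nc + nv + nw, Nc + Nv + Nw   # sizes of the master network G

    # Each shared node contributes a likelihood ratio 1/alpha, each shared link 1/beta.
    total = nc * math.log(1 / alpha) + Nc * math.log(1 / beta)

    def lr(frac, share):
        # Non-shared item: [1 - frac] / [1 - share*frac], where frac is the
        # sampling probability on the other side, e.g. (n_W + n_c)/n.
        return (1 - frac) / (1 - share * frac)

    if nv: total += nv * math.log(lr((nw + nc) / n, alpha))
    if nw: total += nw * math.log(lr((nv + nc) / n, alpha))
    if Nv: total += Nv * math.log(lr((Nw + Nc) / N, beta))
    if Nw: total += Nw * math.log(lr((Nv + Nc) / N, beta))
    return total
```

Working in the log domain, as the abstract notes, turns the product of per-node and per-link likelihood ratios into a sum; identical networks get a positive score that shrinks as overlap decreases.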
Prime diagnosticity in short-term repetition priming: Is primed evidence discounted, even when it reliably indicates the correct answer?
The authors conducted 4 repetition priming experiments that manipulated prime duration and prime diagnosticity in a visual forced-choice perceptual identification task. The strength and direction of prime diagnosticity produced marked effects on identification accuracy, but those effects were resistant to subsequent changes of diagnosticity. Participants learned to associate different diagnosticities with primes of different durations but not with primes presented in different colors. Regardless of prime diagnosticity, preference for a primed alternative covaried negatively with prime duration, suggesting that even for diagnostic primes, evidence discounting remains an important factor. A computational model, with the assumption that adaptation to the statistics of the experiment modulates the level of evidence discounting, accounted for these results.
Constraints on Models of Recognition and Recall Imposed by Data on the Time Course of Retrieval
Reaction time distributions in recognition conditions were compared to those in cued recall to explore the time course of retrieval, to test current models, and to provide constraints for the development of new models (including, to take an example, the class of recurrent neural nets, since they naturally produce reaction time predictions). Two different experimental paradigms were used. Results from a free response procedure showed fundamental differences between the two test modes, both in mean reaction time and the general shape of the distributions. Analysis of data from a signal-to-respond procedure revealed large differences between recognition and recall in the rate of growth of performance. These results suggest the existence of different processes underlying retrieval in recognition and cued recall. One model posits parallel activation of separate memory traces; for recognition, the summed activation is used for a decision, but for recall a search is based on sequential probabilistic choices from the traces. Further constraining models was the observation of nearly identical reaction time distributions for positive and negative responses in recognition, suggesting a single process for recognition decisions for targets and distractors.
Confusion and Compensation in Visual Perception: Effects of Spatiotemporal Proximity and Selective Attention
The authors investigated spatial, temporal, and attentional manipulations in a short-term repetition priming paradigm. Brief primes produced a strong preference to choose the primed alternative, whereas long primes had the opposite effect. However, a 2nd brief presentation of a long prime produced a preference for the primed word despite the long total prime duration. These surprising results are explained by a computational model that posits the offsetting components of source confusion (prime features are confused with target features) and discounting (evidence from primed features is discounted). The authors obtained compelling evidence for these components by showing how they can cooperate or compete through different manipulations of prime salience. The model allows for dissociations between prime salience and the magnitude of priming, thereby providing a unified account of "subliminal" and "supraliminal" priming.
Extraordinary claims, extraordinary evidence? A discussion
Roberts (2020, Learning & Behavior, 48[2], 191-192) discussed research claiming honeybees can do arithmetic. Some readers of this research might regard such claims as unlikely. The present authors used this example as a basis for a debate on the criterion that ought to be used for publication of results or conclusions that could be viewed as unlikely by a significant number of readers, editors, or reviewers.
How should the advent of large language models affect the practice of science?
Large language models (LLMs) are being increasingly incorporated into
scientific workflows. However, we have yet to fully grasp the implications of
this integration. How should the advent of large language models affect the
practice of science? For this opinion piece, we have invited four diverse
groups of scientists to reflect on this query, sharing their perspectives and
engaging in debate. Schulz et al. make the argument that working with LLMs is
not fundamentally different from working with human collaborators, while Bender
et al. argue that LLMs are often misused and over-hyped, and that their
limitations warrant a focus on more specialized, easily interpretable tools.
Marelli et al. emphasize the importance of transparent attribution and
responsible use of LLMs. Finally, Botvinick and Gershman advocate that humans
should retain responsibility for determining the scientific roadmap. To
facilitate the discussion, the four perspectives are complemented with a
response from each group. By putting these different perspectives in
conversation, we aim to bring attention to important considerations within the
academic community regarding the adoption of LLMs and their impact on both
current and future scientific practices.
Statistics in the service of science: don't let the tail wag the dog
Statistical modeling is generally meant to describe patterns in data in service of the broader scientific goal of developing theories to explain those patterns. Statistical models support meaningful inferences when models are built so as to align parameters of the model with potential causal mechanisms and how they manifest in data. When statistical models are instead based on assumptions chosen by default, attempts to draw inferences can be uninformative or even paradoxical; in essence, the tail is trying to wag the dog. These issues are illustrated by van Doorn et al. (this issue) in the context of using Bayes Factors to identify effects and interactions in linear mixed models. We show that the problems identified in their applications (along with other problems identified here) can be circumvented by using priors over inherently meaningful units instead of default priors on standardized scales. This case study illustrates how researchers must directly engage with a number of substantive issues in order to support meaningful inferences, of which we highlight two: The first is the problem of coordination, which requires a researcher to specify how the theoretical constructs postulated by a model are functionally related to observable variables. The second is the problem of generalization, which requires a researcher to consider how a model may represent theoretical constructs shared across similar but non-identical situations, along with the fact that model comparison metrics like Bayes Factors do not directly address this form of generalization. For statistical modeling to serve the goals of science, models cannot be based on default assumptions, but should instead be based on an understanding of their coordination function and on how they represent causal mechanisms that may be expected to generalize to other related scenarios.
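The effect of choosing a prior on a meaningful scale rather than a default one can be seen in a toy calculation (my own sketch, not the authors' analysis). In a normal-normal model the marginal likelihood under H1 is analytic, so the Bayes factor reduces to a ratio of two normal densities; the numbers here are hypothetical (a 20 ms effect estimated with an 8 ms standard error):

```python
import math

def normal_pdf(x, sd):
    # Density of a mean-zero normal with standard deviation sd, evaluated at x.
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def bf10(estimate, se, prior_sd):
    """Bayes factor for H1: effect ~ Normal(0, prior_sd) against H0: effect = 0,
    given an estimate distributed Normal(effect, se). With a normal prior the
    marginal under H1 is analytic: estimate ~ Normal(0, sqrt(se^2 + prior_sd^2))."""
    return normal_pdf(estimate, math.sqrt(se**2 + prior_sd**2)) / normal_pdf(estimate, se)

# Hypothetical priming effect of 20 ms with standard error 8 ms.
informed = bf10(20, 8, 30)   # prior in meaningful units: effects of tens of ms plausible
diffuse = bf10(20, 8, 300)   # default-like diffuse prior on the raw scale
```

With these numbers the informed prior gives BF10 > 1 (evidence for an effect) while the diffuse prior gives BF10 < 1: the same data appear to favor the null purely because of an uninformative default prior choice, which is the kind of paradox the abstract warns against.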