7 research outputs found

    A new theorem in particle physics enabled by machine discovery

    A widespread objection to research on scientific discovery is that there has been a noticeable dearth of significant novel findings in domain sciences contributed by machine discovery programs. The implication is that the essential parts of the discovery process are not captured by these programs. The aim of this note is to document for the AI audience a novel finding in particle physics that was enabled by the previously reported machine discovery program PAULI. This finding consists of a theorem that expresses the minimum number of conservation laws that are needed, mathematically speaking, to account for any consistent experimental data on particle reactions. This note also reports how a puzzle raised by the theorem (its conflict with physics practice) is resolved.

    The computer revolution in science: steps towards the realization of computer-supported discovery environments

    The tools that scientists use in their search processes together form so-called discovery environments. The promise of artificial intelligence and other branches of computer science is to radically transform conventional discovery environments by equipping scientists with a range of powerful computer tools, including large-scale, shared knowledge bases and discovery programs. We describe the future computer-supported discovery environments that may result, and illustrate by means of a realistic scenario how scientists come to new discoveries in these environments. To make the step from the current generation of discovery tools to computer-supported discovery environments like the one presented in the scenario, developers should realize that such environments are large-scale sociotechnical systems. They should not focus only on isolated computer programs, but also pay attention to the question of how these programs will be used and maintained by scientists in research practices. To help developers of discovery programs integrate their tools into discovery environments, we formulate a set of guidelines that developers could follow.

    Mind change efficient learning

    This paper studies efficient learning with respect to mind changes. Our starting point is the idea that a learner that is efficient with respect to mind changes minimizes mind changes not only globally in the entire learning problem, but also locally in subproblems after receiving some evidence. Formalizing this idea leads to the notion of uniform mind change optimality. We characterize the structure of language classes that can be identified with at most α mind changes by some learner (not necessarily effective): a language class L is identifiable with α mind changes if and only if the accumulation order of L is at most α. Accumulation order is a classic concept from point-set topology. To aid the construction of learning algorithms, we show that the characteristic property of uniformly mind change optimal learners is that they output conjectures (languages) with maximal accumulation order. We illustrate the theory by describing mind change optimal learners for various problems such as identifying linear subspaces and one-variable patterns.

    Optimizacija zaključivanja u nauci: pristup zasnovan na podacima (Optimization of scientific reasoning: a data-driven approach)

    Scientific reasoning is reflected in complex argumentation patterns that eventually lead to scientific discoveries. Social epistemology of science provides a perspective on the scientific community as a whole and on its collective knowledge acquisition. Different techniques have been employed with the goal of maximizing scientific knowledge at the group level. These techniques include formal models and computer simulations of scientific reasoning and interaction. Still, these models have mainly tested abstract hypothetical scenarios. The present thesis instead presents data-driven approaches in social epistemology of science. A data-driven approach requires data collection and curation for further use, which can include creating empirically calibrated models and simulations of scientific inquiry, performing statistical analyses, or employing data-mining techniques and other procedures. We present and analyze in detail three co-authored research projects in which the thesis author was engaged during her PhD. The first project sought to identify optimal team composition in high energy physics laboratories using data-mining techniques. The results of this project are published in (Perovic et al. 2016), and indicate that projects with smaller numbers of teams and team members outperform bigger ones. In the second project, we attempted to determine whether there is an epistemic saturation point in experimentation in high energy physics. The initial results from this project are published in (Sikimic et al. 2018). In the thesis, we expand on this topic by using computer simulations to test for biases that could induce scientists to invest in projects beyond their epistemic saturation point. Finally, in previous examples of data-driven analyses, citations are used as a measure of epistemic efficiency of projects in high energy physics. In order to additionally justify and analyze the usage of this parameter in their data-driven research, in the third project Perovic & Sikimic (under revision) analyzed and compared inductive patterns in experimental physics and biology with the reliability of citation records in these fields. They conclude that while citations are a relatively reliable measure of efficiency in high energy physics research, the same does not hold for the majority of research in experimental biology. Additionally, the author's contributions published for the first time in this thesis are: (a) an empirically calibrated model of scientific interaction of research groups in biology, (b) a case study of irregular argumentation patterns in some pathogen discoveries, and (c) an introductory discussion of the benefits and limitations of data-driven approaches to the social epistemology of science. Using computer simulations of an empirically calibrated model, we demonstrate that having several levels of hierarchy and division into smaller research sub-teams is epistemically beneficial for researchers in experimental biology. We also show that argumentation analysis in biology represents a good starting point for further data-driven analyses in the field. Finally, we conclude that a data-driven approach is informative and useful for science policy, but requires careful consideration of data collection, curation, and interpretation.

    Optimization of scientific reasoning: a data-driven approach

    Scientific reasoning is reflected in complex argumentation patterns that eventually lead to scientific discoveries. Social epistemology of science provides a perspective on the scientific community as a whole and on its collective knowledge acquisition. Different techniques have been employed with the goal of maximizing scientific knowledge at the group level. These techniques include formal models and computer simulations of scientific reasoning and interaction. Still, these models have mainly tested abstract hypothetical scenarios. The present thesis instead presents data-driven approaches in social epistemology of science. A data-driven approach requires data collection and curation for further use, which can include creating empirically calibrated models and simulations of scientific inquiry, performing statistical analyses, or employing data-mining techniques and other procedures. We present and analyze in detail three co-authored research projects in which the thesis author was engaged during her PhD. The first project sought to identify optimal team composition in high energy physics laboratories using data-mining techniques. The results of this project are published in (Perovic et al. 2016), and indicate that projects with smaller numbers of teams and team members outperform bigger ones. In the second project, we attempted to determine whether there is an epistemic saturation point in experimentation in high energy physics. The initial results from this project are published in (Sikimic et al. 2018). In the thesis, we expand on this topic by using computer simulations to test for biases that could induce scientists to invest in projects beyond their epistemic saturation point. Finally, in previous examples of data-driven analyses, citations are used as a measure of epistemic efficiency of projects in high energy physics. In order to additionally justify and analyze the usage of this parameter in their data-driven research, in the third project Perovic & Sikimic (under revision) analyzed and compared inductive patterns in experimental physics and biology with the reliability of citation records in these fields. They conclude that while citations are a relatively reliable measure of efficiency in high energy physics research, the same does not hold for the majority of research in experimental biology. Additionally, the author's contributions published for the first time in this thesis are: (a) an empirically calibrated model of scientific interaction of research groups in biology, (b) a case study of irregular argumentation patterns in some pathogen discoveries, and (c) an introductory discussion of the benefits and limitations of data-driven approaches to the social epistemology of science. Using computer simulations of an empirically calibrated model, we demonstrate that having several levels of hierarchy and division into smaller research sub-teams is epistemically beneficial for researchers in experimental biology. We also show that argumentation analysis in biology represents a good starting point for further data-driven analyses in the field. Finally, we conclude that a data-driven approach is informative and useful for science policy, but requires careful consideration of data collection, curation, and interpretation.