
    Scienceography: the study of how science is written

    Scientific literature has itself been the subject of much scientific study, for a variety of reasons: understanding how results are communicated, how ideas spread, and assessing the influence of areas or individuals. However, most prior work has focused on extracting and analyzing citation and stylistic patterns. In this work, we introduce the notion of 'scienceography', which focuses on the writing of science. We provide a first large-scale study using data derived from the arXiv e-print repository. Crucially, our data includes the "source code" of scientific papers, the LaTeX source, which enables us to study features not present in the "final product", such as the tools used and private comments between authors. Our study identifies broad patterns and trends in two example areas, computer science and mathematics, as well as highlighting key differences in the way that science is written in these fields. Finally, we outline future directions to extend the new topic of scienceography.
    Comment: 13 pages, 16 figures. Sixth International Conference on FUN WITH ALGORITHMS, 201
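
    As a rough illustration of this kind of LaTeX-source mining, the sketch below (our own example, not the authors' pipeline; the file name and regular expressions are assumptions) pulls package usage and author comments out of a .tex file:

        import re
        from pathlib import Path

        def mine_tex_source(path):
            """Return (packages, comment_lines) found in a LaTeX source file."""
            text = Path(path).read_text(errors="ignore")
            # \usepackage[opts]{pkg1,pkg2} -> ["pkg1", "pkg2"]
            packages = []
            for group in re.findall(r"\\usepackage(?:\[[^\]]*\])?\{([^}]*)\}", text):
                packages.extend(p.strip() for p in group.split(","))
            # Lines whose first non-blank character is '%' are (mostly) author comments.
            comments = [ln.strip() for ln in text.splitlines() if ln.lstrip().startswith("%")]
            return packages, comments

        if __name__ == "__main__":
            pkgs, comments = mine_tex_source("paper.tex")  # hypothetical input file
            print(len(pkgs), "packages,", len(comments), "comment lines")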

    The source-item coverage of the exponential function

    Statistical distributions in the production of information are most often studied in the framework of Lotkaian informetrics. In this article, we recall some results of the basic theory of Lotkaian informetrics, then we transpose the methods (Theorem 1) applied to Lotkaian distributions by Leo Egghe (Theorem 2) to exponential distributions (Theorem 3, Theorem 4). We give examples and compare the results (Theorem 5). Finally, we propose to widen the problem using the concept of an exponential informetric process (Theorem 6).
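
    For orientation, the two size-frequency functions contrasted here can be written, in the usual notation of Lotkaian informetrics (the exact normalisations used in the article may differ), as:

        % Lotkaian (power-law) size-frequency function: density of sources producing j items
        f_{\mathrm{Lotka}}(j) = \frac{C}{j^{\alpha}}, \qquad j \ge 1,\ \alpha > 1,\ C > 0

        % Exponential size-frequency function considered in the article
        f_{\mathrm{exp}}(j) = C\, e^{-\mu j}, \qquad j \ge 1,\ \mu > 0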

    AI for social good: social media mining of migration discourse

    The number of international migrants has steadily increased over the years, and migration has become one of the pressing issues in today's globalized world. Our bibliometric review of around 400 articles on the Scopus platform indicates increased interest in migration-related research in recent times, but the extant research is scattered at best. AI-based opinion mining research has predominantly noted negative sentiments across various social media platforms. Additionally, we note that prior studies have mostly considered social media data in the context of a particular event or a specific context. These studies offered a nuanced view of societal opinions regarding that specific event, but this approach risks missing the forest for the trees. Hence, this dissertation attempts to go beyond simplistic opinion mining to identify various latent themes of migrant-related social media discourse. The first essay draws insights from the social psychology literature to investigate two facets of Twitter discourse, i.e., perceptions about migrants and behaviors toward migrants. We identified two prevailing perceptions (i.e., sympathy and antipathy) and two dominant behaviors (i.e., solidarity and animosity) of social media users toward migrants. Additionally, this essay fine-tuned the binary hate speech detection task, specifically in the context of migrants, by highlighting the granular differences between the perceptual and behavioral aspects of hate speech. The second essay investigates the journey of migrants or refugees from their home to the host country. We draw insights from van Gennep's seminal book Les Rites de Passage to identify four phases of their journey: Arrival of Refugees, Temporary Stay at Asylums, Rehabilitation, and Integration of Refugees into the host nation. We consider multimodal tweets for this essay and find that our proposed theoretical framework was relevant for the 2022 Ukrainian refugee crisis as a use case. Our third essay points out that a limited sample of annotated data does not provide insights into prevailing societal-level opinions. Hence, this essay employs unsupervised approaches on large-scale societal datasets to explore the prevailing societal-level sentiments on the YouTube platform. Specifically, it probes whether negative comments about migrants get endorsed by other users and, if so, whether this depends on who the migrants are, especially if they are cultural others. To address these questions, we consider two datasets: YouTube comments from before the 2022 Ukrainian refugee crisis, and comments from during the crisis. The second dataset confirms the Cultural Us hypothesis, while our findings for the first dataset are inconclusive. Our fourth and final essay probes the social integration of migrants. The first part of this essay probes the unheard and faint voices of migrants to understand their struggle to settle down in the host economy. The second part explores social media platforms as a viable alternative to expensive commercial job portals for vulnerable migrants. Finally, in our concluding chapter, we elucidate the potential of explainable AI and briefly point out the inherent biases of transformer-based models in the context of migrant-related discourse. To sum up, migration is recognized as one of the essential topics in the United Nations' Sustainable Development Goals (SDGs), and this dissertation attempts to make an incremental contribution to the AI for Social Good discourse.
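
    A minimal sketch of the opinion-mining step described above, using an off-the-shelf sentiment classifier rather than the dissertation's fine-tuned models; the default model choice and the example comments are assumptions:

        from transformers import pipeline

        # Downloads a default English sentiment model on first use.
        classifier = pipeline("sentiment-analysis")

        comments = [
            "Refugees deserve our support and solidarity.",
            "They should all be sent back immediately.",
        ]
        for comment, result in zip(comments, classifier(comments)):
            print(f"{result['label']:>8}  {result['score']:.2f}  {comment}")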

    Mapping the Evolution of "Clusters": A Meta-analysis

    This paper presents a meta-analysis of the "cluster literature" contained in scientific journals from 1969 to 2007. Thanks to an original database, we study the evolution of a stream of literature that focuses on a research object which is both a theoretical puzzle and widespread empirical evidence. We identify different growth stages, from take-off to development and maturity. We test for the existence of a life-cycle within authorships and discover a substitutability relation between different collaborative behaviours. We study the relationships between a "spatial" and an "industrial" approach within the textual corpus of the cluster literature and show the existence of a "predatory" interaction. We detect the relevance of clustering behaviours in the location of authors working on clusters and measure the influence of geographical distance on co-authorship. Finally, we measure the extent of a convergence process in the vocabulary of scientists working on clusters.
    Keywords: Cluster, Life-Cycle, Cluster Literature, Textual Analysis, Agglomeration, Co-Authorship
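
    As a hedged illustration of the co-authorship side of such an analysis (our own sketch, not the paper's database or methods; the records are invented), a weighted co-authorship network can be built like this:

        import itertools
        import networkx as nx

        # Hypothetical bibliographic records: one author list per article.
        records = [
            ["Author A", "Author B"],
            ["Author B", "Author C"],
            ["Author A", "Author B", "Author C"],
        ]

        G = nx.Graph()
        for authors in records:
            for a, b in itertools.combinations(authors, 2):
                w = G.get_edge_data(a, b, {"weight": 0})["weight"]
                G.add_edge(a, b, weight=w + 1)  # one unit of weight per joint article

        print(G.number_of_nodes(), "authors;", G.number_of_edges(), "co-author ties")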

    The Interdependence of Scientists in the Era of Team Science: An Exploratory Study Using Temporal Network Analysis

    How is the rise in team science and the emergence of the research group as the fundamental unit of organization of science affecting scientists' opportunities to collaborate? Are the majority of scientists becoming dependent on a select subset of their peers to organize the intergroup collaborations that are becoming the norm in science? This dissertation set out to explore the evolving nature of scientists' interdependence in team-based research environments. The research was motivated by the desire to reconcile emerging views on the organization of scientific collaboration with the theoretical and methodological tendencies to think about and study scientists as autonomous actors who negotiate collaboration in a dyadic manner. Complex Adaptive Social Systems served as the framework for understanding the dynamics involved in the formation of collaborative relationships. Temporal network analysis at the mesoscopic level was used to study the collaboration dynamics of a specific research community, in this case the genomic research community emerging around GenBank, the international nucleotide sequence databank. The investigation into the dynamics of the mesoscopic layer of a scientific collaboration network revealed the following: (1) there is a prominent half-life to collaborative relationships; (2) the half-life can be used to construct weighted decay networks for extracting the group structure influencing collaboration; (3) scientists across all levels of status are becoming increasingly interdependent, with the qualification that interdependence is highly asymmetrical; and (4) the group structure is increasingly influential on the collaborative interactions of scientists. The results from this study advance theoretical and empirical understanding of scientific collaboration in team-based research environments and of methodological approaches to studying temporal networks at the mesoscopic level. The findings also have implications for policy researchers interested in the career cycles of scientists and in the maintenance and building of scientific capacity in research areas of national interest.
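
    A minimal sketch of the decay-weighting idea in point (2), assuming exponential decay governed by the estimated half-life; the half-life value and the collaboration records below are illustrative, not taken from the dissertation:

        from collections import defaultdict

        HALF_LIFE_YEARS = 4.0  # assumed value, not the dissertation's estimate

        def decayed_weight(years_since_collab, half_life=HALF_LIFE_YEARS):
            """Residual weight of a past collaboration after a given elapsed time."""
            return 0.5 ** (years_since_collab / half_life)

        # (author pair, year of joint paper) records, evaluated at a reference year.
        collabs = [(("A", "B"), 2005), (("A", "B"), 2010), (("B", "C"), 2003)]
        reference_year = 2012

        weights = defaultdict(float)
        for pair, year in collabs:
            weights[pair] += decayed_weight(reference_year - year)

        for pair, w in sorted(weights.items()):
            print(pair, round(w, 3))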

    Dynamical Systems on Networks: A Tutorial

    We give a tutorial for the study of dynamical systems on networks. We focus especially on "simple" situations that are tractable analytically, because they can be very insightful and provide useful springboards for the study of more complicated scenarios. We briefly motivate why examining dynamical systems on networks is interesting and important, and we then give several fascinating examples and discuss some theoretical results. We also briefly discuss dynamical systems on dynamical (i.e., time-dependent) networks, overview software implementations, and give an outlook on the field.
    Comment: 39 pages, 1 figure, submitted; more examples and discussion than the original version, some reorganization, and also more pointers to interesting directions
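
    One of the "simple" processes such tutorials typically cover is epidemic-style dynamics on a graph; the sketch below (our own illustration, with arbitrary parameters and graph model) simulates stochastic SIS dynamics:

        import random
        import networkx as nx

        random.seed(0)
        G = nx.erdos_renyi_graph(n=200, p=0.05, seed=0)
        beta, mu = 0.1, 0.05               # per-step infection and recovery probabilities
        infected = {random.choice(list(G.nodes))}

        for step in range(50):
            nxt = set(infected)
            for node in infected:
                for nbr in G.neighbors(node):   # infect susceptible neighbours
                    if nbr not in infected and random.random() < beta:
                        nxt.add(nbr)
                if random.random() < mu:        # recover back to susceptible
                    nxt.discard(node)
            infected = nxt

        print("infected nodes after 50 steps:", len(infected))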

    A Survey on Modeling Language Evolution in the New Millennium

    Language is a complex evolving system, and it is not a trivial task to model the dynamics of the processes occurring during its evolution. Modeling language evolution has therefore attracted the interest of several researchers, giving rise to a large number of models in the literature of the last millennium. This work reviews the literature devoted to computationally representing the evolution of human language through formal models and provides an analysis of the bibliographic production and scientific impact of the surveyed language evolution models, drawing some conclusions about current trends and future perspectives of this research field. The survey also provides an overview of the strategies for validating and comparing the different language evolution models and of how these techniques have been applied by the surveyed models.
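
    One classic family of models in this literature is the naming game; the minimal version below is our own sketch, offered only as a representative example, since the survey's exact model list is not reproduced here:

        import random

        random.seed(1)
        N = 50
        inventories = [set() for _ in range(N)]   # each agent's word inventory
        next_word = 0

        for _ in range(20000):
            speaker, hearer = random.sample(range(N), 2)
            if not inventories[speaker]:          # invent a new word if needed
                inventories[speaker].add(next_word)
                next_word += 1
            word = random.choice(tuple(inventories[speaker]))
            if word in inventories[hearer]:       # success: both collapse to the agreed word
                inventories[speaker] = {word}
                inventories[hearer] = {word}
            else:                                 # failure: the hearer learns the word
                inventories[hearer].add(word)

        print("distinct words remaining:", len({w for inv in inventories for w in inv}))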

    The boomerang returns? Accounting for the impact of uncertainties on the dynamics of remanufacturing systems

    Recent years have witnessed companies abandon traditional open-loop supply chain structures in favour of closed-loop variants, in a bid to mitigate environmental impacts and exploit economic opportunities. Central to the closed-loop paradigm is remanufacturing: the restoration of used products to useful life. While this operational model has huge potential to extend product life-cycles, the collection and recovery processes diminish the effectiveness of existing control mechanisms for open-loop systems. We systematically review the literature in the field of closed-loop supply chain dynamics, which explores the time-varying interactions of material and information flows in the different elements of remanufacturing supply chains. We supplement this with further reviews of what we call the three 'pillars' of such systems, i.e. forecasting, collection, and inventory and production control. This provides us with an interdisciplinary lens to investigate how a 'boomerang' effect (i.e. sale, consumption, and return processes) impacts on the behaviour of the closed-loop system and to understand how it can be controlled. To facilitate this, we contrast closed-loop supply chain dynamics research to the well-developed research in each pillar; explore how different disciplines have accommodated the supply, process, demand, and control uncertainties; and provide insights for future research on the dynamics of remanufacturing systems.
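
    A minimal sketch of the 'boomerang' feedback described above (our own illustration, not the paper's model; all parameters are assumed): units sold now return after a consumption delay and offset new production.

        RETURN_RATE = 0.6      # fraction of sales eventually returned (assumed)
        RETURN_DELAY = 4       # periods between sale and return (assumed)
        DEMAND = 100           # constant demand per period
        PERIODS = 12

        inventory = 0.0
        sales_history = []     # sales per period, used to schedule future returns

        for t in range(PERIODS):
            returns = RETURN_RATE * sales_history[t - RETURN_DELAY] if t >= RETURN_DELAY else 0.0
            remanufactured = returns                      # assume all returns are recoverable
            production = max(DEMAND - inventory - remanufactured, 0.0)
            inventory += remanufactured + production
            sales = min(DEMAND, inventory)
            inventory -= sales
            sales_history.append(sales)
            print(f"t={t:2d}  returns={returns:6.1f}  new production={production:6.1f}")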

    4D monitoring of active sinkholes with a Terrestrial Laser Scanner (TLS): A case study in the evaporite karst of the Ebro Valley, NE Spain

    This work explores, for the first time, the application of a Terrestrial Laser Scanner (TLS) and the comparison of point clouds in the 4D monitoring of active sinkholes. The approach is tested in three highly active sinkholes related to the dissolution of salt-bearing evaporites overlain by unconsolidated alluvium. The sinkholes are located in urbanized areas and have caused severe damage to critical infrastructure (a flood-control dike, a major highway). The 3D displacement models derived from the comparison of point clouds with exceptionally high spatial resolution allow complex spatial and temporal subsidence patterns within one of the sinkholes to be resolved. Detected changes in the subsidence activity (e.g., sinkhole expansion, translation of the maximum subsidence zone, development of incipient secondary collapses) are related to potential controlling factors such as floods, water table changes or remedial measures. In contrast with detailed mapping and high-precision leveling, the displacement models, covering a relatively short time span of around 6 months, do not capture the subtle subsidence (< 0.6-1 cm) that affects the marginal zones of the sinkholes, precluding precise mapping of the edges of the subsidence areas. Moreover, the performance of TLS can be adversely affected by some methodological limitations and local conditions: (1) limited accuracy in large investigation areas that require the acquisition of a high number of scans, increasing the registration error; (2) surface changes unrelated to sinkhole activity (e.g., vegetation, loose material); and (3) traffic-related vibrations and wind blast that affect the stability of the scanner.
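
    A crude sketch of point-cloud differencing for subsidence estimation (synthetic data; the study's actual cloud-comparison method may differ): each point of the later survey is matched in plan to the nearest point of the earlier one and the elevation difference is taken.

        import numpy as np
        from scipy.spatial import cKDTree

        rng = np.random.default_rng(0)
        # Synthetic stand-ins for two TLS surveys; columns are x, y, z (metres).
        n = 5000
        epoch1 = np.column_stack([rng.uniform(0, 50, n),
                                  rng.uniform(0, 50, n),
                                  rng.normal(100.0, 0.005, n)])
        epoch2 = epoch1.copy()
        bowl = np.exp(-((epoch2[:, 0] - 25) ** 2 + (epoch2[:, 1] - 25) ** 2) / 100)
        epoch2[:, 2] -= 0.05 * bowl                  # simulate a sinkhole-like depression

        tree = cKDTree(epoch1[:, :2])                # match points in plan (x, y)
        _, idx = tree.query(epoch2[:, :2], k=1)
        dz = epoch2[:, 2] - epoch1[idx, 2]           # negative values indicate subsidence

        print("maximum subsidence (m):", round(float(-dz.min()), 3))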

    Statistical Methods and Tools for Football Analytics

    Machine learning and digitization tools have grown exponentially in recent years, and their applications are reflected in the most varied areas of our life: in particular, this thesis focuses on sport analytics, and specifically on football (i.e. soccer for Americans), the most practised sport in the world. Owing to the growing needs of professional clubs, analytical tools in football are becoming a crucial asset for helping technical staff, scouting departments and management to evaluate policies and optimize strategic decisions; for this reason, several statistical applications have been developed, one per chapter, each corresponding to a scientific article that has been published or submitted to a scientific journal. The introduction lists the main activities carried out during the PhD period, followed by a first chapter dedicated to the literature review, carried out analytically through an original bibliometric analysis of football analytics research over the decade 2010-2020. The second chapter provides a methodological deep dive into Partial Least Squares Structural Equation Modeling (PLS-SEM), the statistical framework used to create composite indicators of player performance from data provided by Electronic Arts (EA) experts and available on the Kaggle data science platform; in particular, a third-order hierarchical PLS-PM model is built on the sofifa Key Performance Indices to compute a composite indicator differentiated by role. In the third chapter the model is refined and validated for each role by applying a Confirmatory Tetrad Analysis (CTA) and a Confirmatory Composite Analysis (CCA) to EA sofifa data from the most recent season (2021/2022); the results underline how the different areas and sub-areas of performance carry different weights depending on the player's role. To assess concurrent and predictive validity, the new composite Player Indicator (PI) overall is compared with a benchmark (the EA overall) and with proxy variables such as players' market value and wage, showing interesting and consistent relations. Finally, in the last chapter the composite indicators are introduced as regressors in the expected goal (xG) model with the aim of improving its predictive accuracy. The xG model is one of the emerging tools in football analytics and aims to predict goals and measure the quality of each shot; to this end, a classical logistic model and an adjusted logistic model are applied under different sample-balancing scenarios. In particular, some of the composite indicators and other new regressors (tracking variables) prove significant for the classification model, helping to improve goal-prediction accuracy compared with a benchmark.
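
    A minimal sketch of an xG-style classifier in the spirit described above: a logistic regression on shot features, with one composite performance indicator among the regressors. The feature set and the synthetic data are assumptions; the thesis uses richer indicators and tracking variables.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        n = 2000
        distance = rng.uniform(5, 35, n)        # metres from goal
        angle = rng.uniform(0.1, 1.4, n)        # shooting angle (radians)
        pi_overall = rng.normal(70, 10, n)      # stand-in for a composite performance indicator

        # Synthetic goal outcomes, only to make the example self-contained.
        logit = -1.0 - 0.12 * distance + 1.5 * angle + 0.02 * (pi_overall - 70)
        goal = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

        X = np.column_stack([distance, angle, pi_overall])
        X_tr, X_te, y_tr, y_te = train_test_split(X, goal, test_size=0.3, random_state=0)

        model = LogisticRegression().fit(X_tr, y_tr)
        xg = model.predict_proba(X_te)[:, 1]    # predicted goal probability = xG
        print("AUC:", round(roc_auc_score(y_te, xg), 3))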