
    Deep R Programming

    Deep R Programming is a comprehensive course on one of the most popular languages in data science (statistical computing, graphics, machine learning, data wrangling and analytics). It introduces the base language in-depth and is aimed at ambitious students, practitioners, and researchers who would like to become independent users of this powerful environment. This textbook is a non-profit project. Its online and PDF versions are freely available at . This early draft is distributed in the hope that it will be useful. (Comment: draft v0.2.1, 2023-04-27.)

    A Framework for Benchmarking Clustering Algorithms

    The evaluation of clustering algorithms can involve running them on a variety of benchmark problems, and comparing their outputs to the reference, ground-truth groupings provided by experts. Unfortunately, many research papers and graduate theses consider only a small number of datasets. Also, the fact that there can be many equally valid ways to cluster a given problem set is rarely taken into account. In order to overcome these limitations, we have developed a framework whose aim is to introduce a consistent methodology for testing clustering algorithms. Furthermore, we have aggregated, polished, and standardised many clustering benchmark dataset collections referred to across the machine learning and data mining literature, and included new datasets of different dimensionalities, sizes, and cluster types. An interactive datasets explorer, the documentation of the Python API, a description of the ways to interact with the framework from other programming languages such as R or MATLAB, and other details are all provided at
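    The core of such an evaluation is comparing a predicted partition against one or more reference labelings and keeping the best match. A minimal sketch in Python (the framework itself exposes a Python API, but the function names below are illustrative assumptions, not its actual interface):

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two partitions agree
    (the pair is either together in both, or apart in both)."""
    assert len(labels_a) == len(labels_b)
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)

def best_agreement(pred, references):
    """A dataset may ship several equally valid reference labelings;
    score a prediction against the best-matching one."""
    return max(rand_index(pred, ref) for ref in references)

# Label values do not matter, only the grouping: [0,0,1,1] matches [1,1,2,2].
score = best_agreement([0, 0, 1, 1], [[0, 1, 0, 1], [1, 1, 2, 2]])  # -> 1.0
```

    In practice one would use an adjusted-for-chance variant (e.g. the adjusted Rand index), but the pair-counting idea is the same.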

    stringi: Fast and Portable Character String Processing in R

    Effective processing of character strings is required at various stages of data analysis pipelines: from data cleansing and preparation, through information extraction, to report generation. Pattern searching, string collation and sorting, normalization, transliteration, and formatting are ubiquitous in text mining, natural language processing, and bioinformatics. This paper discusses and demonstrates how and why stringi, a mature R package for fast and portable handling of string data based on ICU (International Components for Unicode), should be included in each statistician's or data scientist's repertoire to complement their numerical computing and data wrangling skills.
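    stringi is an R package, so as a language-neutral illustration of the Unicode normalization problem that its ICU backend solves, here is a sketch using Python's standard-library unicodedata module:

```python
import unicodedata

# "café" can be encoded two ways: bytes differ, meaning is identical.
composed = "caf\u00e9"      # é as a single code point (U+00E9)
decomposed = "cafe\u0301"   # e followed by a combining acute accent (U+0301)

# Naive comparison sees two different code-point sequences.
print(composed == decomposed)                                 # False
# After NFC normalization the two strings compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)   # True
```

    Without such normalization, pattern searching, sorting, and deduplication silently miss matches; this is exactly the class of pitfalls a portable ICU-based toolkit handles for the user.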

    Clustering with minimum spanning trees: How good can it be?

    Minimum spanning trees (MSTs) provide a convenient representation of datasets in numerous pattern recognition activities. Moreover, they are relatively fast to compute. In this paper, we quantify the extent to which they can be meaningful in data clustering tasks. By identifying the upper bounds for the agreement between the best (oracle) algorithm and the expert labels from a large battery of benchmark data, we discover that MST methods can overall be very competitive. Next, instead of proposing yet another algorithm that performs well on a limited set of examples, we review, study, extend, and generalise the existing state-of-the-art MST-based partitioning schemes, which leads to a few new and interesting approaches. It turns out that the Genie method and the information-theoretic approaches often outperform the non-MST algorithms such as k-means, Gaussian mixtures, spectral clustering, BIRCH, and classical hierarchical agglomerative procedures.
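    The basic recipe behind many MST-based partitioning schemes is: build the tree, remove its k-1 heaviest edges, and take the connected components as the k clusters. A minimal sketch (plain Prim's algorithm plus union-find; the Genie and information-theoretic variants discussed in the paper use more refined edge-selection rules):

```python
import math

def mst_edges(points):
    """Prim's algorithm: return the n-1 edges (i, j, dist) of the
    Euclidean minimum spanning tree of a small point set."""
    n = len(points)
    dist = lambda a, b: math.dist(points[a], points[b])
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        i, j = min(((a, b) for a in in_tree for b in range(n) if b not in in_tree),
                   key=lambda e: dist(*e))
        edges.append((i, j, dist(i, j)))
        in_tree.add(j)
    return edges

def mst_clusters(points, k):
    """Drop the k-1 longest MST edges; connected components = clusters."""
    kept = sorted(mst_edges(points), key=lambda e: e[2])[:len(points) - k]
    parent = list(range(len(points)))          # union-find over kept edges
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i, j, _ in kept:
        parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]

# Two well-separated pairs split into two clusters.
labels = mst_clusters([(0, 0), (0, 1), (5, 0), (5, 1)], k=2)
```

    This naive construction is quadratic per step; practical implementations use dedicated MST algorithms, but the cut-the-longest-edges principle is the same.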

    Hierarchical Clustering with OWA-based Linkages, the Lance-Williams Formula, and Dendrogram Inversions

    Agglomerative hierarchical clustering based on Ordered Weighted Averaging (OWA) operators not only generalises the single, complete, and average linkages, but also includes intercluster distances based on a few nearest or farthest neighbours, trimmed and winsorised means of pairwise point similarities, amongst many others. We explore the relationships between the famous Lance-Williams update formula and the extended OWA-based linkages with weights generated via infinite coefficient sequences. Furthermore, we provide some conditions for the weight generators to guarantee the resulting dendrograms to be free from unaesthetic inversions.
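    An OWA operator applies a weight vector to the *sorted* values, so different weight generators recover different classical linkages as special cases. A small sketch (the function names and the scalar setting are illustrative assumptions, not the paper's notation):

```python
def owa(values, weights):
    """Ordered weighted average: weights are applied to the values
    sorted in decreasing order."""
    assert len(values) == len(weights) and abs(sum(weights) - 1.0) < 1e-9
    return sum(w * v for w, v in zip(weights, sorted(values, reverse=True)))

def owa_linkage(cluster_a, cluster_b, dist, weight_gen):
    """Intercluster distance = OWA of all pairwise point distances."""
    d = [dist(a, b) for a in cluster_a for b in cluster_b]
    return owa(d, weight_gen(len(d)))

# Classical linkages as special weight generators:
complete = lambda n: [1.0] + [0.0] * (n - 1)   # all weight on the largest distance
single   = lambda n: [0.0] * (n - 1) + [1.0]   # all weight on the smallest distance
average  = lambda n: [1.0 / n] * n             # uniform weights -> mean distance
```

    Intermediate weight vectors (e.g. mass on the few largest distances, or trimming the extremes) yield the nearest/farthest-neighbour and trimmed-mean linkages mentioned in the abstract.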

    The use of fuzzy relations in the assessment of information resources producers' performance

    The producers assessment problem has many important practical instances: it is an abstract model for intelligent systems evaluating, e.g., the quality of computer software repositories, web resources, social networking services, and digital libraries. Each producer's performance is determined not only by the overall quality of the items they have produced, but also by the number of such items (which may differ between agents). Recent theoretical results indicate that the use of aggregation operators in the process of ranking and evaluating producers may not necessarily lead to fair and plausible outcomes. Therefore, to overcome some weaknesses of the most commonly applied approach, in this preliminary study we advocate a fuzzy preference relation-based setting and indicate why it may provide better control over the assessment process.
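    One simple way to build a fuzzy preference relation over producers is from the fraction of pairwise quality comparisons that one producer's items win against another's. This is only an illustrative sketch of the general idea, not the construction studied in the paper:

```python
def preference_degree(a, b):
    """Degree (in [0, 1]) to which producer a is preferred to producer b:
    the fraction of item-quality comparisons that a wins."""
    wins = sum(x > y for x in a for y in b)
    return wins / (len(a) * len(b))

def rank_producers(producers):
    """Rank producers by total outgoing preference (a simple scoring
    of the fuzzy relation; many other exploitation rules exist)."""
    names = list(producers)
    score = {n: sum(preference_degree(producers[n], producers[m])
                    for m in names if m != n)
             for n in names}
    return sorted(names, key=score.get, reverse=True)

# Each producer = a list of per-item quality scores; lengths may differ.
ranking = rank_producers({"A": [5, 5, 4], "B": [5, 1], "C": [2, 2, 2, 2]})
```

    Unlike collapsing each producer to a single aggregated number first, the pairwise relation keeps information about *how often* and *against whom* a producer wins, which is the extra control the abstract alludes to.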

    Vector valued information measures and integration with respect to fuzzy vector capacities

    Integration with respect to vector-valued fuzzy measures is used to define and study information measuring tools. Motivated by some current developments in Information Science, we apply the integration of scalar functions with respect to vector-valued fuzzy measures, also called vector capacities. Bartle-Dunford-Schwartz integration (for the additive case) and Choquet-type integration (for the non-additive case) are considered, showing that these formalisms can be used to define and develop vector-valued impact measures. Examples related to existing bibliometric tools as well as to new measuring indices are given.
    The authors thank Prof. Dr. Olvido Delgado and the referee for their valuable comments and suggestions; the first author gratefully acknowledges the support of the Ministerio de Economía, Industria y Competitividad (Spain) under project MTM2016-77054-C2-1-P.
    Sánchez Pérez, E. A., & Szwedek, R. (2019). Vector valued information measures and integration with respect to fuzzy vector capacities. Fuzzy Sets and Systems, 355, 1-25. https://doi.org/10.1016/j.fss.2018.05.004
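    For the non-additive case, the discrete Choquet integral sorts the integrand and weights each increment by the capacity of the coalition of remaining items. A scalar-valued sketch (the paper works with vector-valued capacities; restricting to scalars here is a simplifying assumption for illustration):

```python
def choquet(values, capacity):
    """Discrete Choquet integral of `values` (indexed 0..n-1) with respect
    to a monotone set function `capacity` mapping frozensets to [0, 1]."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    total, prev = 0.0, 0.0
    for pos, i in enumerate(order):
        coalition = frozenset(order[pos:])   # items whose value is >= values[i]
        total += (values[i] - prev) * capacity(coalition)
        prev = values[i]
    return total

# With an additive capacity the Choquet integral collapses to a weighted sum;
# non-additive capacities model interaction between the criteria.
uniform_additive = lambda s: len(s) / 3
needs_all = lambda s: 1.0 if len(s) == 3 else 0.0   # pessimistic: min operator
```

    With `uniform_additive`, choquet([1, 2, 3], ...) is the plain mean; with `needs_all` it returns the minimum, a first taste of how capacities encode aggregation attitudes.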

    Mathematical properties of weighted impact factors based on measures of prestige of the citing journals

    The final publication is available at Springer via http://dx.doi.org/10.1007/s11192-015-1741-0
    An abstract construction for general weighted impact factors is introduced. We show that the classical weighted impact factors are particular cases of our model, but that it can also be used to define new impact measuring tools for other sources of information, such as repositories of datasets, providing the mathematical support for a new family of altmetrics. Our aim is to show the main mathematical properties of this class of impact measuring tools, which hold as consequences of their mathematical structure and do not depend on the definition of any given index in use today. In order to show the power of our approach in a well-known setting, we apply our construction to analyze the stability of the ordering induced in a list of journals by the 2-year impact factor (IF2). We study how this ordering changes when the criterion defining it is the numerical value of a new weighted impact factor in which IF2 is used to define the weights. We prove that, if we assume that the weight associated with a citing journal increases with its IF2, then the ordering given in the list by the new weighted impact factor coincides with the order defined by the IF2. We give a quantitative bound for the errors committed. We also show two examples of weighted impact factors defined by weights associated with the prestige of the citing journal for the fields of MATHEMATICS and MEDICINE, GENERAL AND INTERNAL, checking whether they satisfy the increasing behaviour mentioned above.
    Ferrer Sapena, A., Sánchez Pérez, E. A., González, L. M., Peset Mancebo, M. F., & Aleixandre Benavent, R. (2015). Mathematical properties of weighted impact factors based on measures of prestige of the citing journals. Scientometrics, 105(3), 2089-2108. https://doi.org/10.1007/s11192-015-1741-0