
    BIP! NDR (NoDoiRefs): A Dataset of Citations From Papers Without DOIs in Computer Science Conferences and Workshops

    Full text link
    In the field of Computer Science, conference and workshop papers are important contributions and carry substantial weight in research assessment processes, in contrast to other disciplines. However, a considerable number of these papers are not assigned a Digital Object Identifier (DOI); as a result, their citations are not reported in widely used citation datasets like OpenCitations and Crossref, limiting citation analysis. While the Microsoft Academic Graph (MAG) previously addressed this issue by providing substantial coverage, its discontinuation has created a void in the available data. BIP! NDR aims to alleviate this issue and enhance research assessment processes within the field of Computer Science. To accomplish this, it leverages a workflow that identifies and retrieves Open Science papers lacking DOIs from the DBLP corpus and, by performing text analysis, extracts citation information directly from their full text. The current version of the dataset contains more than 510K citations made by approximately 60K open access Computer Science conference or workshop papers that, according to DBLP, do not have a DOI.
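
    To make the first step of such a workflow concrete, the following sketch streams DBLP conference records from a locally downloaded dblp.xml dump and keeps those whose electronic-edition links contain no DOI. This is only an illustration, not the actual BIP! NDR pipeline; the use of lxml, the "doi.org" heuristic, and the file name are assumptions.

        from lxml import etree

        def papers_without_doi(dblp_xml_path, record_tag="inproceedings"):
            """Yield (key, title) of DBLP records whose <ee> links contain no DOI URL."""
            for _, elem in etree.iterparse(dblp_xml_path, tag=record_tag, load_dtd=True):
                ee_links = [ee.text or "" for ee in elem.findall("ee")]
                if not any("doi.org" in link for link in ee_links):  # heuristic: no DOI listed
                    yield elem.get("key"), elem.findtext("title") or ""
                elem.clear()  # keep memory bounded while streaming the large dump

        # Hypothetical usage (requires dblp.xml and its DTD for entity resolution):
        # for key, title in papers_without_doi("dblp.xml"):
        #     print(key, title)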

    Piloting topic-aware research impact assessment features in BIP! Services

    Full text link
    Various research activities rely on citation-based impact indicators. However, these indicators are usually computed globally, hindering their proper interpretation in applications like research assessment and knowledge discovery. In this work, we advocate the use of topic-aware, categorical impact indicators to alleviate the aforementioned problem. In addition, we extend BIP! Services to support these indicators and showcase their benefits in real-world research activities. (Comment: 5 pages, 2 figures)

    ATRAPOS: Evaluating Metapath Query Workloads in Real Time

    Full text link
    Heterogeneous information networks (HINs) represent different types of entities and the relationships between them. Exploring, analysing, and extracting knowledge from such networks relies on metapath queries that identify pairs of entities connected by relationships of diverse semantics. While the real-time evaluation of metapath query workloads on large, web-scale HINs is computationally very demanding, current approaches do not exploit interrelationships among the queries. In this paper, we present ATRAPOS, a new approach for the real-time evaluation of metapath query workloads that leverages a combination of efficient sparse matrix multiplication and intermediate result caching. ATRAPOS selects intermediate results to cache and reuse by detecting frequent sub-metapaths among workload queries in real time, using a tailor-made data structure, the Overlap Tree, and an associated caching policy. Our experimental study on real data shows that ATRAPOS accelerates exploratory data analysis and mining on HINs, outperforming off-the-shelf caching approaches and state-of-the-art research prototypes in all examined scenarios. (Comment: 13 pages, 19 figures)
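
    The core idea can be sketched as follows: a metapath such as Author-Paper-Venue corresponds to a chain of sparse adjacency-matrix products, and shared sub-metapaths can be cached and reused. This is a minimal illustration under assumed toy matrices and a naive cache-everything policy, not the ATRAPOS implementation or its Overlap Tree.

        import numpy as np
        from scipy.sparse import random as sparse_random

        rng = np.random.default_rng(0)
        # Toy HIN: Author->Paper, Paper->Paper (citations), Paper->Venue adjacency matrices.
        mats = {
            "AP": sparse_random(1000, 5000, density=1e-3, format="csr", random_state=rng),
            "PP": sparse_random(5000, 5000, density=5e-4, format="csr", random_state=rng),
            "PV": sparse_random(5000, 200, density=2e-3, format="csr", random_state=rng),
        }

        cache = {}  # sub-metapath -> intermediate product (the role ATRAPOS assigns to its Overlap Tree)

        def evaluate(metapath):
            """Evaluate a metapath (tuple of relation names) by chained sparse products,
            starting from the longest cached prefix and caching every new prefix (naive policy)."""
            result, start = None, 0
            for k in range(len(metapath), 0, -1):
                if metapath[:k] in cache:
                    result, start = cache[metapath[:k]], k
                    break
            for i in range(start, len(metapath)):
                step = mats[metapath[i]]
                result = step if result is None else result @ step
                cache[metapath[:i + 1]] = result
            return result

        apv = evaluate(("AP", "PV"))         # Author-Paper-Venue
        appv = evaluate(("AP", "PP", "PV"))  # reuses the cached ("AP",) prefix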

    jmelot/SoftwareImpactHackathon2023_InstitutionalOSS: Post-hackathon cleanup 2023

    No full text
    Repo as of post-hackathon cleanup. Preliminary results available, but more evaluation and result filtering needed.

    BIP! DB: A Dataset of Impact Measures for Scientific Publications

    No full text
    This dataset contains citation-based impact indicators (a.k.a. "measures") for ~153M distinct DOIs that correspond to scientific articles. In particular, for each DOI, we have calculated the following indicators (organized in categories based on the semantics of the impact aspect that they best capture):

    Influence indicators (i.e., indicators of the "total" impact of each article; how established the article is in general):

    Citation Count: The total number of citations of the article; the most well-known influence indicator.

    PageRank score: An influence indicator based on PageRank [1], a popular network analysis method. PageRank estimates the influence of each article based on its centrality in the whole citation network. It alleviates some issues of the Citation Count indicator (e.g., two articles with the same number of citations can have significantly different PageRank scores if the aggregated influence of the articles citing them is very different; the article receiving citations from more influential articles will get a larger score).

    Popularity indicators (i.e., indicators of the "current" impact of each article; how popular the article is currently):

    RAM score: A popularity indicator based on the RAM method [2]. It is essentially a Citation Count where recent citations are considered more important. This type of "time awareness" alleviates problems of methods like PageRank, which are biased against recently published articles (new articles need time to receive a number of citations that can be indicative of their impact).

    AttRank score: A popularity indicator based on the AttRank method [3]. AttRank alleviates PageRank's bias against recently published papers by incorporating an attention-based mechanism, akin to a time-restricted version of preferential attachment, to explicitly capture a researcher's preference to read papers which received a lot of attention recently.

    Impulse indicators (i.e., indicators of the initial momentum that the article receives right after its publication):

    Incubation Citation Count (3-year CC): This impulse indicator is a time-restricted version of the Citation Count, where the time window length is fixed for all papers and depends on the publication date of the paper, i.e., only citations within 3 years of each paper's publication are counted.

    More details about the aforementioned impact indicators, the way they are calculated, and their interpretation can be found at https://bip.imsi.athenarc.gr/site/indicators and in the respective references (e.g., in [5]).

    From version 5.1 onward, the impact indicators are calculated at two levels: the DOI level (assuming that each DOI corresponds to a distinct scientific article) and the OpenAIRE-id level (leveraging DOI synonyms based on OpenAIRE's deduplication algorithm [4]; each distinct article has its own OpenAIRE id). Previous versions of the dataset only provided the scores at the DOI level.

    Also, from version 7 onward, for each article we also provide an impact class, which informs the user about the percentile into which the article's score falls compared to the impact scores of the other articles in the database. The impact classes are: C1 (top 0.01%), C2 (top 0.1%), C3 (top 1%), C4 (top 10%), and C5 (bottom 90%).

    Finally, before version 10, the calculation of the impact scores (and classes) was based on a citation network with one node for each article with a distinct DOI that we could find in our input data sources. From version 10 onward, the nodes are deduplicated using the most recent version of the OpenAIRE article deduplication algorithm (https://graph.openaire.eu/docs/graph-production-workflow/deduplication/research-products). This enabled a correction of the scores (more specifically, we avoid counting citation links multiple times when they are made by multiple versions of the same article). As a result, each node in the citation network we build is a deduplicated article with a distinct OpenAIRE id. We still report the scores at the DOI level (i.e., we assign a score to each of the versions/instances of the article); however, these DOI-level scores are just the scores of the respective deduplicated nodes propagated accordingly (i.e., all versions of the same deduplicated article receive the same scores). We have removed a small number of instances (having a DOI) that were erroneously assigned to multiple deduplicated records in the OpenAIRE Graph.

    For each calculation level (DOI / OpenAIRE-id) we provide five (5) compressed CSV files (one for each measure/score) where each line follows the format "identifier <tab> score <tab> class". The parameter setting of each measure is encoded in the corresponding filename. For more details on the different measures/scores see our extensive experimental study [5] and the configuration of AttRank in the original paper [3]. Files for the OpenAIRE-id case contain the keyword "openaire_ids" in the filename.

    From version 9 onward, we also provide topic-specific impact classes for DOI-identified publications. In particular, we associated those articles with 2nd-level concepts from OpenAlex (284 in total); we chose to keep only the three most dominant concepts for each publication, based on their confidence score, and only if this score was greater than 0.3. Then, for each publication and impact measure, we compute its class within its respective concepts. Finally, we provide the "topic_based_impact_classes.txt" file, where each line follows the format "identifier <tab> concept <tab> pagerank_class <tab> attrank_class <tab> 3-cc_class <tab> cc_class".

    The data used to produce the citation network on which we calculated the provided measures have been gathered from the OpenAIRE Graph v6.0.1, including data from (a) the OpenCitations COCI dataset (Jan-2023 version), (b) a MAG [6,7] snapshot from Dec-2021, and (c) a Crossref snapshot from May-2023 (before version 10, these sources were gathered independently). The union of all distinct citations that could be found in these sources has been considered. In addition, versions later than v.10 leverage the filtering rules described at https://graph.openaire.eu/docs/graph-production-workflow/aggregation/non-compatible-sources/doiboost/#crossref-filtering to remove from the dataset DOIs with problematic metadata.

    References:
    [1] L. Page, S. Brin, R. Motwani, and T. Winograd. 1999. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report. Stanford InfoLab.
    [2] Rumi Ghosh, Tsung-Ting Kuo, Chun-Nan Hsu, Shou-De Lin, and Kristina Lerman. 2011. Time-Aware Ranking in Dynamic Citation Networks. In Data Mining Workshops (ICDMW). 373–380.
    [3] I. Kanellos, T. Vergoulis, D. Sacharidis, T. Dalamagas, Y. Vassiliou: Ranking Papers by their Short-Term Scientific Impact. CoRR abs/2006.00951 (2020).
    [4] P. Manghi, C. Atzori, M. De Bonis, A. Bardi: Entity Deduplication in Big Data Graphs for Scholarly Communication. Data Technologies and Applications (2020).
    [5] I. Kanellos, T. Vergoulis, D. Sacharidis, T. Dalamagas, Y. Vassiliou: Impact-Based Ranking of Scientific Publications: A Survey and Experimental Evaluation. TKDE 2019 (early access).
    [6] Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June (Paul) Hsu, and Kuansan Wang. 2015. An Overview of Microsoft Academic Service (MAS) and Applications. In Proceedings of the 24th International Conference on World Wide Web (WWW '15 Companion). ACM, New York, NY, USA, 243-246. DOI: 10.1145/2740908.2742839.
    [7] K. Wang et al.: A Review of Microsoft Academic Services for Science of Science Studies. Frontiers in Big Data, 2019. DOI: 10.3389/fdata.2019.00045.

    Our Academic Search Engine built on top of these data is available at https://bip.imsi.athenarc.gr/. Note that we also provide all calculated scores through the BIP! Finder API (https://bip-api.imsi.athenarc.gr/documentation).

    Terms of use: These data are provided "as is", without any warranties of any kind. The data are provided under the Creative Commons Attribution 4.0 International license.

    More details about BIP! DB can be found in our peer-reviewed publication, which we kindly request that any published research making use of BIP! DB cite: Thanasis Vergoulis, Ilias Kanellos, Claudio Atzori, Andrea Mannocci, Serafeim Chatzopoulos, Sandro La Bruzzo, Natalia Manola, Paolo Manghi: BIP! DB: A Dataset of Impact Measures for Scientific Publications. WWW (Companion Volume) 2021: 456-460.
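
    As a usage illustration, the sketch below reads one score file in the documented "identifier <tab> score <tab> class" layout and keeps the DOIs in the top impact classes. The file name is hypothetical (real filenames encode each measure's parameter setting), and the file is assumed to have been decompressed beforehand.

        import csv

        def load_scores(path):
            """Read one score file with lines of the form: identifier <tab> score <tab> class."""
            scores = {}
            with open(path, newline="", encoding="utf-8") as f:
                for identifier, score, impact_class in csv.reader(f, delimiter="\t"):
                    scores[identifier] = (float(score), impact_class)
            return scores

        pagerank = load_scores("pagerank_scores.txt")  # hypothetical file name
        top_dois = [doi for doi, (_, cls) in pagerank.items() if cls in ("C1", "C2")]  # top 0.1% and above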

    BIP! DB: A Dataset of Impact Measures for Research Products

    No full text
    This dataset contains citation-based impact indicators (a.k.a. "measures") for ~168.8M distinct PIDs (persistent identifiers) that correspond to research products (scientific publications, datasets, etc.). In particular, for each PID, we have calculated the following indicators (organized in categories based on the semantics of the impact aspect that they best capture):

    Influence indicators (i.e., indicators of the "total" impact of each research product; how established it is in general):

    Citation Count: The total number of citations of the product; the most well-known influence indicator.

    PageRank score: An influence indicator based on PageRank [1], a popular network analysis method. PageRank estimates the influence of each product based on its centrality in the whole citation network. It alleviates some issues of the Citation Count indicator (e.g., two products with the same number of citations can have significantly different PageRank scores if the aggregated influence of the products citing them is very different; the product receiving citations from more influential products will get a larger score).

    Popularity indicators (i.e., indicators of the "current" impact of each research product; how popular the product is currently):

    RAM score: A popularity indicator based on the RAM method [2]. It is essentially a Citation Count where recent citations are considered more important. This type of "time awareness" alleviates problems of methods like PageRank, which are biased against recently published products (new products need time to receive a number of citations that can be indicative of their impact).

    AttRank score: A popularity indicator based on the AttRank method [3]. AttRank alleviates PageRank's bias against recently published products by incorporating an attention-based mechanism, akin to a time-restricted version of preferential attachment, to explicitly capture a researcher's preference to examine products which received a lot of attention recently.

    Impulse indicators (i.e., indicators of the initial momentum that the research product received right after its publication):

    Incubation Citation Count (3-year CC): This impulse indicator is a time-restricted version of the Citation Count, where the time window length is fixed for all products and depends on the publication date of the product, i.e., only citations within 3 years of each product's publication are counted.

    More details about the aforementioned impact indicators, the way they are calculated, and their interpretation can be found at https://bip.imsi.athenarc.gr/site/indicators and in the respective references (e.g., in [5]).

    From version 5.1 onward, the impact indicators are calculated at two levels: the PID level (assuming that each PID corresponds to a distinct research product) and the OpenAIRE-id level (leveraging PID synonyms based on OpenAIRE's deduplication algorithm [4]; each distinct product has its own OpenAIRE id). Previous versions of the dataset only provided the scores at the PID level.

    From version 12 onward, two types of PIDs are included in the dataset: DOIs and PMIDs (before that version, only DOIs were included).

    Also, from version 7 onward, for each product we also provide an impact class, which informs the user about the percentile into which the product's score falls compared to the impact scores of the other products in the database. The impact classes are: C1 (top 0.01%), C2 (top 0.1%), C3 (top 1%), C4 (top 10%), and C5 (bottom 90%).

    Finally, before version 10, the calculation of the impact scores (and classes) was based on a citation network with one node for each product with a distinct PID that we could find in our input data sources. From version 10 onward, the nodes are deduplicated using the most recent version of the OpenAIRE article deduplication algorithm (https://graph.openaire.eu/docs/graph-production-workflow/deduplication/research-products). This enabled a correction of the scores (more specifically, we avoid counting citation links multiple times when they are made by multiple versions of the same product). As a result, each node in the citation network we build is a deduplicated product with a distinct OpenAIRE id. We still report the scores at the PID level (i.e., we assign a score to each of the versions/instances of the product); however, these PID-level scores are just the scores of the respective deduplicated nodes propagated accordingly (i.e., all versions of the same deduplicated product receive the same scores). We have removed a small number of instances (having a PID) that were erroneously assigned to multiple deduplicated records in the OpenAIRE Graph.

    For each calculation level (PID / OpenAIRE-id) we provide five (5) compressed CSV files (one for each measure/score) where each line follows the format "identifier <tab> score <tab> class". The parameter setting of each measure is encoded in the corresponding filename. For more details on the different measures/scores see our extensive experimental study [5] and the configuration of AttRank in the original paper [3]. Files for the OpenAIRE-id case contain the keyword "openaire_ids" in the filename.

    From version 9 onward, we also provide topic-specific impact classes for PID-identified products. In particular, we associated those products with 2nd-level concepts from OpenAlex; we chose to keep only the three most dominant concepts for each product, based on their confidence score, and only if this score was greater than 0.3. Then, for each product and impact measure, we compute its class within its respective concepts. Finally, we provide the "topic_based_impact_classes.txt" file, where each line follows the format "identifier <tab> concept <tab> pagerank_class <tab> attrank_class <tab> 3-cc_class <tab> cc_class".

    The data used to produce the citation network on which we calculated the provided measures have been gathered from the OpenAIRE Graph v7.0.0, including data from (a) the OpenCitations COCI & POCI datasets, (b) MAG [6,7], and (c) Crossref. The union of all distinct citations that could be found in these sources has been considered. In addition, versions later than v.10 leverage the filtering rules described at https://graph.openaire.eu/docs/graph-production-workflow/aggregation/non-compatible-sources/doiboost/#crossref-filtering to remove from the dataset PIDs with problematic metadata.

    References:
    [1] L. Page, S. Brin, R. Motwani, and T. Winograd. 1999. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report. Stanford InfoLab.
    [2] Rumi Ghosh, Tsung-Ting Kuo, Chun-Nan Hsu, Shou-De Lin, and Kristina Lerman. 2011. Time-Aware Ranking in Dynamic Citation Networks. In Data Mining Workshops (ICDMW). 373–380.
    [3] I. Kanellos, T. Vergoulis, D. Sacharidis, T. Dalamagas, Y. Vassiliou: Ranking Papers by their Short-Term Scientific Impact. CoRR abs/2006.00951 (2020).
    [4] P. Manghi, C. Atzori, M. De Bonis, A. Bardi: Entity Deduplication in Big Data Graphs for Scholarly Communication. Data Technologies and Applications (2020).
    [5] I. Kanellos, T. Vergoulis, D. Sacharidis, T. Dalamagas, Y. Vassiliou: Impact-Based Ranking of Scientific Publications: A Survey and Experimental Evaluation. TKDE 2019 (early access).
    [6] Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June (Paul) Hsu, and Kuansan Wang. 2015. An Overview of Microsoft Academic Service (MAS) and Applications. In Proceedings of the 24th International Conference on World Wide Web (WWW '15 Companion). ACM, New York, NY, USA, 243-246. DOI: 10.1145/2740908.2742839.
    [7] K. Wang et al.: A Review of Microsoft Academic Services for Science of Science Studies. Frontiers in Big Data, 2019. DOI: 10.3389/fdata.2019.00045.

    Our Academic Search Engine built on top of these data is available at https://bip.imsi.athenarc.gr/. Note that we also provide all calculated scores through the BIP! Finder API (https://bip-api.imsi.athenarc.gr/documentation).

    Terms of use: These data are provided "as is", without any warranties of any kind. The data are provided under the Creative Commons Attribution 4.0 International license.

    More details about BIP! DB can be found in our peer-reviewed publication, which we kindly request that any published research making use of BIP! DB cite: Thanasis Vergoulis, Ilias Kanellos, Claudio Atzori, Andrea Mannocci, Serafeim Chatzopoulos, Sandro La Bruzzo, Natalia Manola, Paolo Manghi: BIP! DB: A Dataset of Impact Measures for Scientific Publications. WWW (Companion Volume) 2021: 456-460.
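
    A companion sketch for the topic-based file described above, whose lines follow the documented six-column tab-separated layout: it groups products by OpenAlex concept and keeps those in the top citation-count classes. The file is assumed to be downloaded and decompressed; the choice of classes to keep is illustrative.

        import csv
        from collections import defaultdict

        def top_products_per_concept(path, wanted=("C1", "C2")):
            """Group PIDs by concept, keeping products whose citation-count class is in `wanted`.
            Expected columns: identifier, concept, pagerank_class, attrank_class, 3-cc_class, cc_class."""
            per_concept = defaultdict(list)
            with open(path, newline="", encoding="utf-8") as f:
                for pid, concept, pr_cls, att_cls, cc3_cls, cc_cls in csv.reader(f, delimiter="\t"):
                    if cc_cls in wanted:
                        per_concept[concept].append(pid)
            return per_concept

        groups = top_products_per_concept("topic_based_impact_classes.txt")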

    DIANA-mirExTra v2.0: Uncovering microRNAs and transcription factors with crucial roles in NGS expression data

    No full text
    Differential expression analysis (DEA) is one of the main instruments utilized for revealing molecular mechanisms in pathological and physiological conditions. DIANA-mirExTra v2.0 (http://www.microrna.gr/mirextrav2) performs a combined DEA of mRNAs and microRNAs (miRNAs) to uncover miRNAs and transcription factors (TFs) playing important regulatory roles between two investigated states. The web server takes as input miRNA/RNA-Seq read count data sets that can be uploaded for analysis. Users can combine their data with 350 small RNA-Seq and 65 RNA-Seq libraries analyzed in-house and provided by DIANA-mirExTra v2.0. The web server utilizes miRNA:mRNA, TF:mRNA, and TF:miRNA interactions derived from extensive experimental data sets. More than 450 000 miRNA interactions and 2 000 000 TF binding sites from specific or high-throughput techniques have been incorporated, while accurate miRNA TSS annotation is obtained from the microTSS experimental/in silico framework. These comprehensive data sets enable users to perform analyses based solely on experimentally supported information and to uncover central regulators within sequencing data: miRNAs controlling mRNAs and TFs regulating mRNA or miRNA expression. The server also supports predicted miRNA:gene interactions from DIANA-microT-CDS for 4 species (human, mouse, nematode, and fruit fly). DIANA-mirExTra v2.0 has an intuitive user interface and is freely available to all users without any login requirement.