    Semantic Modelling of Citation Contexts for Context-Aware Citation Recommendation

    Contents
    The four CSV files are the data used for the evaluation in: Saier T., Färber M. (2020) Semantic Modelling of Citation Contexts for Context-Aware Citation Recommendation. In: Advances in Information Retrieval. ECIR 2020. Lecture Notes in Computer Science, vol 12035. DOI: 10.1007/978-3-030-45439-5_15
    Code: github.com/IllDepence/ecir2020
    The evaluation was conducted in a citation re-prediction setting.
    CSV Format
    7 columns, divided by \u241E:
      1. cited document ID
         - for *_nomarker.csv: citation marker position ambiguous
         - for *_withmarker.csv: citation marker position at 'MAINCIT' in citation context
      2. adjacent cited document IDs
         - only given in citrec_unarxive_*.csv
         - divided by \u241F
         - order matches 'CIT' markers in citation context
      3. citing document ID
      4. citation context
      5. MAG field of study IDs
         - divided by \u241F
      6. predicate:argument tuples generated based on PredPatt
         - JSON
      7. noun phrases
         - for *_nomarker.csv: divided by \u241F
         - for *_withmarker.csv: divided by \u241D into (a) noun phrases and (b) the noun phrase directly preceding the citation marker
    Data Sources
      - citrec_unarxive_cs_withmarker.csv
         - data set: unarXive (Paper DOI: 10.1007/s11192-020-03382-z; Data DOI: 10.5281/zenodo.2553522)
         - filter: citing doc from computer science; cited doc is cited at least 5 times
      - citrec_mag_cs_en.csv
         - data set: Microsoft Academic Graph (MAG) (Paper DOI: 10.1145/2740908.2742839)
         - filter: citing doc from computer science and in English; citing doc abstract in MAG given; cited doc is cited at least 50 times
      - citrec_refseer.csv
         - data set: RefSeer (Paper URL: ojs.aaai.org/index.php/AAAI/article/view/9528; Data URL: psu.app.box.com/v/refseer)
         - filter: for citing and cited docs, title, venue, venuetype, abstract, and year not NULL
      - citrec_acl-arc_withmarker.csv
         - data set: ACL ARC (Paper URL: aclanthology.org/L08-1005; Data URL: acl-arc.comp.nus.edu.sg/)
         - filter: cited doc has a DBLP ID
    Paper Citation
    @inproceedings{Saier2020ECIR,
      author    = {Tarek Saier and Michael F{\"{a}}rber},
      title     = {{Semantic Modelling of Citation Contexts for Context-aware Citation Recommendation}},
      booktitle = {Proceedings of the 42nd European Conference on Information Retrieval},
      pages     = {220--233},
      year      = {2020},
      month     = apr,
      doi       = {10.1007/978-3-030-45439-5_15},
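
    The CSV format described for these files can be read with a few lines of Python. The following is a minimal sketch, not part of the released code (github.com/IllDepence/ecir2020). It assumes that \u241E, \u241F and \u241D stand for the literal Unicode characters U+241E, U+241F and U+241D, that each record occupies exactly one line, and that the noun-phrase part of a *_withmarker.csv row is itself divided by \u241F. The names CitationRecord, parse_line and the constants are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Any, List, Optional

# Assumption: the escapes in the dataset description denote these literal
# Unicode characters, and every record is a single line.
COL_SEP = "\u241e"   # SYMBOL FOR RECORD SEPARATOR: divides the 7 columns
LIST_SEP = "\u241f"  # SYMBOL FOR UNIT SEPARATOR: divides items in list-valued columns
NP_SEP = "\u241d"    # SYMBOL FOR GROUP SEPARATOR: splits the noun-phrase column (*_withmarker.csv)


@dataclass
class CitationRecord:  # hypothetical container, not part of the dataset release
    cited_id: str
    adjacent_cited_ids: List[str]    # empty outside citrec_unarxive_*.csv
    citing_id: str
    context: str
    fos_ids: List[str]               # MAG field of study IDs
    predpatt: Any                    # decoded JSON predicate:argument tuples
    noun_phrases: List[str]
    np_before_marker: Optional[str]  # only set for *_withmarker.csv


def _split(value: str, sep: str) -> List[str]:
    """Split a list-valued column, treating an empty field as an empty list."""
    return value.split(sep) if value else []


def parse_line(line: str, with_marker: bool) -> CitationRecord:
    """Parse one record line into its 7 columns."""
    cols = line.rstrip("\r\n").split(COL_SEP)
    if len(cols) != 7:
        raise ValueError(f"expected 7 columns, got {len(cols)}")
    cited, adjacent, citing, context, fos, predpatt, nps = cols
    np_before = None
    if with_marker:
        # Assumption: the part before NP_SEP holds all noun phrases (divided by
        # LIST_SEP); the part after it is the phrase preceding the marker.
        all_nps, _, np_before = nps.partition(NP_SEP)
        noun_phrases = _split(all_nps, LIST_SEP)
    else:
        noun_phrases = _split(nps, LIST_SEP)
    return CitationRecord(
        cited_id=cited,
        adjacent_cited_ids=_split(adjacent, LIST_SEP),
        citing_id=citing,
        context=context,
        fos_ids=_split(fos, LIST_SEP),
        predpatt=json.loads(predpatt) if predpatt else None,
        noun_phrases=noun_phrases,
        np_before_marker=np_before,
    )
```

    For example, calling parse_line(line, with_marker=True) on each line of citrec_unarxive_cs_withmarker.csv (opened with UTF-8 encoding) would yield one CitationRecord per citation context; for the *_nomarker.csv files, with_marker=False leaves np_before_marker as None.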

    Domain-independent Extraction of Scientific Concepts from Research Articles

    We examine the novel task of domain-independent scientific concept extraction from abstracts of scholarly articles and present two contributions. First, we suggest a set of generic scientific concepts that have been identified in a systematic annotation process. This set of concepts is used to annotate, at the phrasal level and in a joint effort with domain experts, a corpus of scientific abstracts from 10 domains of Science, Technology and Medicine. The resulting dataset is used in a set of benchmark experiments to (a) provide baseline performance for this task and (b) examine the transferability of concepts between domains. Second, we present two deep learning systems as baselines. In particular, we propose active learning to deal with different domains in our task. The experimental results show that (1) substantial agreement is achievable by non-experts after consultation with domain experts, (2) the baseline system achieves a fairly high F1 score, and (3) active learning enables us to nearly halve the amount of required training data. Comment: Accepted for publication at the 42nd European Conference on IR Research, ECIR 202
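
    The abstract does not say which query strategy the active learning uses. As an assumption, the sketch below shows pool-based uncertainty sampling, one common choice: the current model scores every unlabelled item, the most uncertain ones are sent for annotation, the model is retrained, and the cycle repeats. Only the selection step is shown; the function names are hypothetical, and for phrase-level sequence labelling a per-abstract score would first have to be aggregated from token-level predictions, which the sketch omits.

```python
import math
from typing import Callable, List, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty as 1 minus the probability of the most likely label."""
    return 1.0 - max(probs)


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of the predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_batch(pool_probs: List[Sequence[float]], batch_size: int,
                 score: Callable[[Sequence[float]], float] = least_confidence) -> List[int]:
    """Indices of the batch_size most uncertain unlabelled items, most uncertain first."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: score(pool_probs[i]), reverse=True)
    return ranked[:batch_size]


# Toy model predictions for four unlabelled abstracts (two labels each).
pool = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3], [0.5, 0.5]]
print(select_batch(pool, 2))  # prints [3, 1]
```

    Labelling only the items the model is least sure about is what lets active learning reach a given score with fewer annotations than labelling items in random order.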

    Reproducibility of experiments in recommender systems evaluation

    © IFIP International Federation for Information Processing 2018. Published by Springer International Publishing AG 2018. All Rights Reserved. Recommender systems evaluation is usually based on predictive accuracy metrics, with better scores meaning recommendations of higher quality. However, comparing results is becoming increasingly difficult, since there are different recommendation frameworks and different settings in the design and implementation of the experiments. Furthermore, there might be minor differences in algorithm implementation among the different frameworks. In this paper, we compare well-known recommendation algorithms using the same dataset, metrics and overall settings; the results point to differences across frameworks even with the exact same settings. Hence, we propose the use of standards that should be followed as guidelines to ensure the replication of experiments and the reproducibility of the results.
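
    The abstract attributes part of the discrepancy to minor implementation differences between frameworks. As a hypothetical illustration (the two conventions below are not attributed to any particular framework), two reasonable definitions of precision@k give different scores for the very same recommendation lists as soon as a user receives fewer than k items:

```python
from typing import Callable, Dict, List, Set


def precision_at_k_fixed(recs: List[str], relevant: Set[str], k: int) -> float:
    """Convention A: divide hits by k, even if fewer than k items were returned."""
    hits = sum(1 for item in recs[:k] if item in relevant)
    return hits / k


def precision_at_k_returned(recs: List[str], relevant: Set[str], k: int) -> float:
    """Convention B: divide hits by the number of items actually returned (at most k)."""
    top = recs[:k]
    if not top:
        return 0.0
    hits = sum(1 for item in top if item in relevant)
    return hits / len(top)


def mean_metric(metric: Callable[[List[str], Set[str], int], float],
                recs_by_user: Dict[str, List[str]],
                relevant_by_user: Dict[str, Set[str]], k: int) -> float:
    """Average a per-user metric over all users that received recommendations."""
    scores = [metric(recs, relevant_by_user.get(user, set()), k)
              for user, recs in recs_by_user.items()]
    return sum(scores) / len(scores)


# Toy data: u2 receives only two items (e.g. after filtering already-rated items).
recs_by_user = {"u1": ["a", "b", "c", "d", "e"], "u2": ["x", "y"]}
relevant_by_user = {"u1": {"a", "c"}, "u2": {"x"}}

score_a = mean_metric(precision_at_k_fixed, recs_by_user, relevant_by_user, k=5)
score_b = mean_metric(precision_at_k_returned, recs_by_user, relevant_by_user, k=5)
print(f"{score_a:.2f} {score_b:.2f}")  # prints "0.30 0.45"
```

    Neither convention is wrong in itself; a reported number is only comparable with another if the convention behind it is stated, which is the kind of detail that shared evaluation guidelines would need to fix.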

    Requirements Analysis for an Open Research Knowledge Graph

    Current science communication has a number of drawbacks and bottlenecks that have lately been the subject of discussion. Among others, the rising number of published articles makes it nearly impossible to get an overview of the state of the art in a certain field, and reproducibility is hampered by fixed-length, document-based publications, which normally cannot cover all details of a research work. Recently, several initiatives have proposed knowledge graphs (KGs) for organising scientific information as a solution to many of the current issues. The focus of these proposals is, however, usually restricted to very specific use cases. In this paper, we aim to transcend this limited perspective by presenting a comprehensive analysis of requirements for an Open Research Knowledge Graph (ORKG) by (a) collecting daily core tasks of a scientist, (b) establishing their consequential requirements for a KG-based system, and (c) identifying overlaps and specificities, and their coverage in current solutions. As a result, we map necessary and desirable requirements for successful KG-based science communication, derive implications and outline possible solutions. Comment: Accepted for publication at the 24th International Conference on Theory and Practice of Digital Libraries, TPDL 202

    Presenilin Controls CBP Levels in the Adult Drosophila Central Nervous System

    Background: Dominant mutations in both human Presenilin (Psn) genes have been correlated with the formation of amyloid plaques and the development of familial early-onset Alzheimer’s disease (AD). However, a definitive mechanism whereby plaque formation causes the pathology of familial and sporadic forms of AD has remained elusive. Recent discoveries of several substrates for Psn protease activity have sparked alternative hypotheses for the pathophysiology underlying AD. CBP (CREB-binding protein) is a haploinsufficient transcriptional co-activator with histone acetyltransferase (HAT) activity that has been proposed to be a downstream target of Psn signaling. Individuals with altered CBP have cognitive deficits that have been linked to several neurological disorders. Methodology/Principal Findings: Using a transgenic RNA-interference strategy to selectively silence CBP, Psn, and Notch in adult Drosophila, we provide evidence for the first time that Psn is required for normal CBP levels and for maintaining specific global acetylation at lysine 8 of histone 4 (H4K8ac) in the central nervous system (CNS). In addition, flies conditionally compromised for the adult expression of CBP display an altered geotaxis behavior that may reflect a neurological defect. Conclusions/Significance: Our data support a model in which Psn regulates CBP levels in the adult fly brain in a manner that is independent of Notch signaling. Although we do not understand the molecular mechanism underlying th

    Modulation of γ-Secretase Activity by Multiple Enzyme-Substrate Interactions: Implications in Pathogenesis of Alzheimer's Disease

    BACKGROUND: We describe molecular processes that can facilitate the pathogenesis of Alzheimer's disease (AD) by analyzing the catalytic cycle of a membrane-embedded protease, γ-secretase, from the initial interaction with its C99 substrate to the final release of toxic Aβ peptides. RESULTS: The C-terminal AICD fragment is cleaved first in a pre-steady-state burst. The lowest Aβ42/Aβ40 ratio is observed in the pre-steady state, when Aβ40 is the dominant product. Aβ42 is produced after Aβ40; therefore, Aβ42 is not a precursor of Aβ40. The longer, more hydrophobic Aβ products gradually accumulate over multiple catalytic turnovers as a result of interrupted catalytic cycles. Saturation of γ-secretase with its C99 substrate leads to a 30% decrease in Aβ40, with a concomitant increase in the longer Aβ products and in the Aβ42/Aβ40 ratio. The same changes in Aβ products can be observed, to different degrees, with two mutations that lead to early-onset AD, ΔE9 and G384A. Four different lines of evidence show that γ-secretase can bind and cleave multiple substrate molecules in one catalytic turnover. Consequently, depending on its concentration, the NotchΔE substrate can activate or inhibit γ-secretase activity on the C99 substrate. Multiple C99 molecules bound to γ-secretase can affect processive cleavages of the nascent Aβ catalytic intermediates and facilitate their premature release as toxic membrane-embedded Aβ-bundles. CONCLUSIONS: Gradual saturation of γ-secretase with its substrate can be the pathogenic process in different alleged causes of AD. Thus, competitive inhibitors of γ-secretase offer the best chance for a successful therapy, while noncompetitive inhibitors could even facilitate development of the disease by inducing enzyme saturation at otherwise sub-saturating substrate concentrations. Membrane-embedded Aβ-bundles generated by γ-secretase could be neurotoxic and thus crucial for our understanding of the amyloid hypothesis and AD pathogenesis.

    Cdx ParaHox genes acquired distinct developmental roles after gene duplication in vertebrate evolution

    BACKGROUND: The functional consequences of whole genome duplications in vertebrate evolution are not fully understood. It remains unclear, for instance, why paralogues were retained in some gene families but extensively lost in others. Cdx homeobox genes encode conserved transcription factors controlling posterior development across diverse bilaterians. These genes are part of the ParaHox gene cluster. Multiple Cdx copies were retained after genome duplication, raising questions about how functional divergence, overlap, and redundancy respectively contributed to their retention and evolutionary fate. RESULTS: We examined the degree of regulatory and functional overlap between the three vertebrate Cdx genes using single and triple morpholino knock-down in Xenopus tropicalis followed by RNA-seq. We found that one paralogue, Cdx4, has a much stronger effect on gene expression than the others, including a strong regulatory effect on FGF and Wnt genes. Functional annotation revealed distinct and overlapping roles and subtly different temporal windows of action for each gene. The data also reveal a colinear-like effect of Cdx genes on Hox genes, with repression of Hox paralogy groups 1 and 2, and activation increasing from Hox group 5 to 11. We also highlight cases in which duplicated genes regulate distinct paralogous targets, revealing pathway elaboration after whole genome duplication. CONCLUSIONS: Despite shared core pathways, Cdx paralogues have acquired distinct regulatory roles during development. This implies that the degree of functional overlap between paralogues is relatively low and that gene expression patterns alone should be used with caution when investigating the functional evolution of duplicated genes. We therefore suggest that developmental programmes were extensively rewired after whole genome duplication in the early evolution of vertebrates.

    Gene Expression Profiling in Cells with Enhanced γ-Secretase Activity

    BACKGROUND: Gamma-secretase processing of many type-I membrane protein substrates triggers signaling cascades by releasing intracellular domains (ICDs) that, following nuclear translocation, modulate the transcription of different genes regulating a diverse array of cellular and biological processes. Because the list of gamma-secretase substrates is growing quickly and this enzyme is a therapeutic target in cancer and Alzheimer's disease, mapping the gene transcription that is susceptible to gamma-secretase activity is important for sharpening our view of the specific genes, molecular functions and biological pathways affected. METHODOLOGY/PRINCIPAL FINDINGS: To identify genes and molecular functions transcriptionally affected by gamma-secretase activity, the cellular transcriptomes of Chinese hamster ovary (CHO) cells with enhanced and inhibited gamma-secretase activity were analyzed and compared by cDNA microarray. The functional clustering by FatiGO of the 1,981 identified genes revealed over- and under-represented groups with multiple activities and functions. Single genes with the most pronounced transcriptional susceptibility to gamma-secretase activity were evaluated by real-time PCR. Among the 21 validated genes, the strikingly decreased transcription of PTPRG and AMN1 and the increased transcription of UPP1 potentially support data on cell cycle disturbances relevant to cancer, stem cell, and neurodegenerative disease research. The mapping of interactions among proteins encoded by the validated genes relied exclusively on evidence-based data and revealed broad effects on Wnt pathway members, including WNT3A and DVL3. Intriguingly, the transcription of TERA, a gene of unknown function, is affected by gamma-secretase activity and was significantly altered in the analyzed human Alzheimer's disease brain cortices.
CONCLUSIONS/SIGNIFICANCE: Investigating the effects of gamma-secretase activity on gene transcription has revealed several affected clusters of molecular functions and, more specifically, 21 genes that hold significant potential for a better understanding of the biology of gamma-secretase and its roles in cancer and Alzheimer's disease pathology.

    3D genomics across the tree of life reveals condensin II as a determinant of architecture type

    We investigated genome folding across the eukaryotic tree of life. We find two types of three-dimensional (3D) genome architectures at the chromosome scale. Each type appears and disappears repeatedly during eukaryotic evolution. The type of genome architecture that an organism exhibits correlates with the absence of condensin II subunits. Moreover, condensin II depletion converts the architecture of the human genome to a state resembling that seen in organisms such as fungi or mosquitoes. In this state, centromeres cluster together at nucleoli, and heterochromatin domains merge. We propose a physical model in which lengthwise compaction of chromosomes by condensin II during mitosis determines chromosome-scale genome architecture, with effects that are retained during the subsequent interphase. This mechanism has likely been conserved since the last common ancestor of all eukaryotes.
    C.H. is supported by the Boehringer Ingelheim Fonds; C.H., Á.S.C., and B.D.R. are supported by an ERC CoG (772471, “CohesinLooping”); A.M.O.E. and B.D.R. are supported by the Dutch Research Council (NWO-Echo); and J.A.R. and R.H.M. are supported by the Dutch Cancer Society (KWF). T.v.S. and B.v.S. are supported by NIH Common Fund “4D Nucleome” Program grant U54DK107965. H.T. and E.d.W. are supported by an ERC StG (637597, “HAP-PHEN”). J.A.R., T.v.S., H.T., R.H.M., B.v.S., and E.d.W. are part of the Oncode Institute, which is partly financed by the Dutch Cancer Society. Work at the Center for Theoretical Biological Physics is sponsored by the NSF (grants PHY-2019745 and CHE-1614101) and by the Welch Foundation (grant C-1792). V.G.C. is funded by FAPESP (São Paulo State Research Foundation and Higher Education Personnel) grants 2016/13998-8 and 2017/09662-7. J.N.O. is a CPRIT Scholar in Cancer Research. E.L.A. was supported by an NSF Physics Frontiers Center Award (PHY-2019745), the Welch Foundation (Q-1866), a USDA Agriculture and Food Research Initiative grant (2017-05741), the Behavioral Plasticity Research Institute (NSF DBI-2021795), and an NIH Encyclopedia of DNA Elements Mapping Center Award (UM1HG009375). Hi-C data for the 24 species were created by the DNA Zoo Consortium (www.dnazoo.org). DNA Zoo is supported by Illumina, Inc.; IBM; and the Pawsey Supercomputing Center. P.K. is supported by the University of Western Australia. L.L.M. was supported by NIH (1R01NS114491) and NSF awards (1557923, 1548121, and 1645219) and the Human Frontiers Science Program (RGP0060/2017). The draft A. californica project was supported by NHGRI. J.L.G.-S. received funding from the ERC (grant agreement no. 740041), the Spanish Ministerio de Economía y Competitividad (grant no. BFU2016-74961-P), and the institutional grant Unidad de Excelencia María de Maeztu (MDM-2016-0687). R.D.K. is supported by NIH grant RO1DK121366. V.H. is supported by NIH grant NIH1P41HD071837. K.M. is supported by a MEXT grant (20H05936). M.C.W. is supported by the NIH grants R01AG045183, R01AT009050, R01AG062257, and DP1DK113644 and by the Welch Foundation. E.F. was supported by NHGR