41 research outputs found

    A keyquery-based classification system for CORE

    We apply keyquery-based taxonomy composition to compute a classification system for the CORE dataset, a shared crawl of about 850,000 scientific papers. Keyquery-based taxonomy composition can be understood as a two-phase hierarchical document clustering technique that uses search queries as cluster labels. In the first phase, the document collection is indexed by a reference search engine, and the documents are tagged with the search queries for which they are relevant, their so-called keyqueries. In the second phase, a hierarchical clustering is formed from the keyqueries in an iterative process. We use the explicit topic model ESA as the document retrieval model to index the CORE dataset in the reference search engine. Under the ESA retrieval model, documents are represented as vectors of similarities to Wikipedia articles, a methodology proven to be advantageous for text categorization tasks. Our paper presents the generated taxonomy and reports on quantitative properties such as document coverage and processing requirements.
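    The two phases described in this abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the four example documents, the function names `search` and `keyqueries`, the thresholds `min_hits` and `max_hits`, and above all the retrieval model, which is plain Boolean AND matching rather than the ESA model the paper uses. The grouping at the end is a flat stand-in for the hierarchical, iterative second phase.

```python
from itertools import combinations


def search(index, query):
    """Boolean AND retrieval: ids of documents containing every query term."""
    return {doc_id for doc_id, terms in index.items() if query <= terms}


def keyqueries(index, doc_id, max_terms=2, min_hits=2, max_hits=3):
    """Candidate queries built from the document's own terms.

    A candidate always retrieves the document itself (its terms come from
    that document), so it is kept when its result set is neither too
    small nor too large. The thresholds are hypothetical.
    """
    vocabulary = sorted(index[doc_id])
    found = []
    for size in range(1, max_terms + 1):
        for combo in combinations(vocabulary, size):
            hits = search(index, frozenset(combo))
            if min_hits <= len(hits) <= max_hits:
                found.append(" ".join(combo))
    return found


# Hypothetical mini-collection standing in for the indexed corpus.
docs = {
    "d1": "keyquery taxonomy clustering",
    "d2": "keyquery retrieval search",
    "d3": "colloid rheology jamming",
    "d4": "colloid glass rheology",
}
index = {doc_id: frozenset(text.split()) for doc_id, text in docs.items()}

# Phase 1: tag every document with its keyqueries.
tags = {doc_id: keyqueries(index, doc_id) for doc_id in index}

# Phase 2 (flat stand-in): documents sharing a keyquery form a cluster,
# and the keyquery serves as the cluster label.
clusters = {}
for doc_id, queries in tags.items():
    for query in queries:
        clusters.setdefault(query, []).append(doc_id)

for label, members in clusters.items():
    print(label, "->", members)
```

    Using the query itself as the cluster label is the point of the approach: every cluster can be reproduced by submitting its label to the search engine.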

    Rheology of soft colloids across the onset of rigidity: scaling behavior, thermal, and non-thermal responses

    We study the rheological behavior of colloidal suspensions composed of soft sub-micron-size hydrogel particles across the liquid-solid transition. The measured stress and strain-rate data, when normalized by thermal stress and time scales, suggest our systems reside in a regime wherein thermal effects are important. Separately, we test critical point scaling predictions for the jamming transition, which are typical of athermal systems. Near dynamic arrest, the suspensions exhibit scaling exponents similar to those reported in Nordstrom et al., Phys. Rev. Lett., 2010, 105, 175701. The observation suggests that our system exhibits a glass transition near the onset of rigidity, but it also exhibits a jamming-like scaling further from the transition point. These observations are thought-provoking in light of recent theoretical and simulation findings, which show that suspension rheology across the full range of microgel particle experiments can exhibit both thermal and athermal mechanisms.
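    The abstract does not define the thermal stress and time scales it normalizes by. A common convention, given here as an assumption and not as the authors' definition, uses the particle radius $R$, the solvent viscosity $\eta$, and the single-particle diffusion coefficient $D_0 = k_B T / (6\pi\eta R)$:

```latex
\sigma_T = \frac{k_B T}{R^3}, \qquad
\tau_T = \frac{R^2}{D_0} = \frac{6\pi\eta R^3}{k_B T}, \qquad
\mathrm{Pe} = \dot{\gamma}\,\tau_T
```

    Under this convention, thermal effects are expected to matter when the normalized stress $\sigma/\sigma_T$ and the Péclet number $\mathrm{Pe}$ are of order one or smaller.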

    Improving the Reproducibility of PAN's Shared Tasks

    This paper reports on the PAN 2014 evaluation lab, which hosts three shared tasks on plagiarism detection, author identification, and author profiling. To improve the reproducibility of shared tasks in general, and PAN's tasks in particular, the Webis group developed a new web service called TIRA, which facilitates software submissions. Unlike many other labs, PAN asks participants to submit running software instead of their run output. The TIRA experimentation platform significantly reduces the organizational overhead of handling software submissions for both participants and organizers, while the submitted software is kept in a running state. This year, we addressed the question of who is responsible for the successful execution of submitted software, putting participants back in charge of executing their software at our site. In sum, 57 pieces of software have been submitted to our lab; together with the 58 software submissions of last year, this forms the largest collection of software for our three tasks to date, all of which is readily available for further analysis. The report concludes with a brief summary of each task.
    This work was partially supported by the WIQ-EI IRSES project (Grant No. 269180) within the FP7 Marie Curie action.
    Potthast, M.; Gollub, T.; Rangel, F.; Rosso, P.; Stamatatos, E.; Stein, B. (2014). Improving the Reproducibility of PAN's Shared Tasks. In Information Access Evaluation. Multilinguality, Multimodality, and Interaction: 5th International Conference of the CLEF Initiative, CLEF 2014, Sheffield, UK, September 15-18, 2014. Proceedings. Springer Verlag (Germany). 268-299. https://doi.org/10.1007/978-3-319-11382-1_22

    Overview of the 5th International Competition on Plagiarism Detection

    This paper gives an overview of 18 plagiarism detectors that were evaluated in the fifth international competition on plagiarism detection at PAN 2013. We report on their performance in the two tasks of external plagiarism detection, source retrieval and text alignment. Furthermore, we continue last year's initiative to invite software submissions instead of run submissions, and we re-evaluate this year's submissions on last year's evaluation corpora and vice versa, thus demonstrating the benefits of software submissions in terms of reproducibility.
    Potthast, M.; Hagen, M.; Gollub, T.; Tippmann, M.; Kiesel, J.; Rosso, P.; Stamatatos, E.... (2013). Overview of the 5th International Competition on Plagiarism Detection. CLEF Conference on Multilingual and Multimodal Information Access Evaluation. 301-331. http://hdl.handle.net/10251/46635

    Report on the Evaluation-as-a-Service (EaaS) Expert Workshop

    In this report, we summarize the outcome of the "Evaluation-as-a-Service" workshop that was held on 5 and 6 March 2015 in Sierre, Switzerland. The objective of the meeting was to bring together initiatives that use cloud infrastructures, virtual machines, APIs (application programming interfaces), and related projects that provide evaluation of information retrieval or machine learning tools as a service.

    The genetic architecture of the human cerebral cortex

    The cerebral cortex underlies our complex cognitive capabilities, yet little is known about the specific genetic loci that influence human cortical structure. To identify genetic variants that affect cortical structure, we conducted a genome-wide association meta-analysis of brain magnetic resonance imaging data from 51,665 individuals. We analyzed the surface area and average thickness of the whole cortex and 34 regions with known functional specializations. We identified 199 significant loci and found significant enrichment for loci influencing total surface area within regulatory elements that are active during prenatal cortical development, supporting the radial unit hypothesis. Loci that affect regional surface area cluster near genes in Wnt signaling pathways, which influence progenitor expansion and areal identity. Variation in cortical structure is genetically correlated with cognitive function, Parkinson's disease, insomnia, depression, neuroticism, and attention deficit hyperactivity disorder.

    Information Retrieval for the Digital Humanities

    In ten chapters, this thesis presents information retrieval technology which is tailored to the research activities that arise in the context of corpus-based digital humanities projects. The presentation is structured by a conceptual research process that is introduced in Chapter 1. The process distinguishes a set of five research activities: research question generation, corpus acquisition, research question modeling, corpus annotation, and result dissemination. Each of these research activities elicits different information retrieval tasks with special challenges, for which algorithmic approaches are presented after an introduction of the core information retrieval concepts in Chapter 2. A vital concept in many of the presented approaches is the keyquery paradigm introduced in Chapter 3, which represents an operation that returns relevant search queries in response to a given set of input documents. Keyqueries are proposed in Chapter 4 for the recommendation of related work, and in Chapter 5 for improving access to aspects hidden in the long tail of search result lists. With pseudo-descriptions, a document expansion approach is presented in Chapter 6. The approach improves the retrieval performance for corpora where only bibliographic meta-data is originally available. In Chapter 7, the keyquery paradigm is employed to generate dynamic taxonomies for corpora in an unsupervised fashion. Chapter 8 turns to the exploration of annotated corpora, and presents scoped facets as a conceptual extension to faceted search systems, which is particularly useful in exploratory search settings. For the purpose of highlighting the major topical differences in a sequence of sub-corpora, an algorithm called topical sequence profiling is presented in Chapter 9. 
The thesis concludes with two pilot studies on the visualization of (re)search results as a means of successful result dissemination: a metaphoric interpretation of the information nutrition label, and the philosophical bodies, which are 3D-printed search results.