105,463 research outputs found

    A Diachronic Analysis of Paradigm Shifts in NLP Research: When, How, and Why?

    Full text link
    Understanding the fundamental concepts and trends in a scientific field is crucial for keeping abreast of its continuous advancement. In this study, we propose a systematic framework for analyzing the evolution of research topics in a scientific field using causal discovery and inference techniques. We define three variables to encompass diverse facets of the evolution of research topics within NLP and utilize a causal discovery algorithm to unveil the causal connections among these variables using observational data. Subsequently, we leverage this structure to measure the intensity of these relationships. By conducting extensive experiments on the ACL Anthology corpus, we demonstrate that our framework effectively uncovers evolutionary trends and the underlying causes for a wide range of NLP research topics. Specifically, we show that tasks and methods are primary drivers of research in NLP, with datasets following, while metrics have minimal impact.Comment: accepted at EMNLP 202

    Exploratory Analysis of Highly Heterogeneous Document Collections

    Full text link
    We present an effective multifaceted system for exploratory analysis of highly heterogeneous document collections. Our system is based on intelligently tagging individual documents in a purely automated fashion and exploiting these tags in a powerful faceted browsing framework. Tagging strategies employed include both unsupervised and supervised approaches based on machine learning and natural language processing. As one of our key tagging strategies, we introduce the KERA algorithm (Keyword Extraction for Reports and Articles). KERA extracts topic-representative terms from individual documents in a purely unsupervised fashion and is revealed to be significantly more effective than state-of-the-art methods. Finally, we evaluate our system in its ability to help users locate documents pertaining to military critical technologies buried deep in a large heterogeneous sea of information.Comment: 9 pages; KDD 2013: 19th ACM SIGKDD Conference on Knowledge Discovery and Data Minin

    The HyperBagGraph DataEdron: An Enriched Browsing Experience of Multimedia Datasets

    Full text link
    Traditional verbatim browsers give back information in a linear way according to a ranking performed by a search engine that may not be optimal for the surfer. The latter may need to assess the pertinence of the information retrieved, particularly when s⋅\cdothe wants to explore other facets of a multi-facetted information space. For instance, in a multimedia dataset different facets such as keywords, authors, publication category, organisations and figures can be of interest. The facet simultaneous visualisation can help to gain insights on the information retrieved and call for further searches. Facets are co-occurence networks, modeled by HyperBag-Graphs -- families of multisets -- and are in fact linked not only to the publication itself, but to any chosen reference. These references allow to navigate inside the dataset and perform visual queries. We explore here the case of scientific publications based on Arxiv searches.Comment: Extension of the hypergraph framework shortly presented in arXiv:1809.00164 (possible small overlaps); use the theoretical framework of hb-graphs presented in arXiv:1809.0019

    Organising the knowledge space for software components

    Get PDF
    Software development has become a distributed, collaborative process based on the assembly of off-the-shelf and purpose-built components. The selection of software components from component repositories and the development of components for these repositories requires an accessible information infrastructure that allows the description and comparison of these components. General knowledge relating to software development is equally important in this context as knowledge concerning the application domain of the software. Both form two pillars on which the structural and behavioural properties of software components can be addressed. Form, effect, and intention are the essential aspects of process-based knowledge representation with behaviour as a primary property. We investigate how this information space for software components can be organised in order to facilitate the required taxonomy, thesaurus, conceptual model, and logical framework functions. Focal point is an axiomatised ontology that, in addition to the usual static view on knowledge, also intrinsically addresses the dynamics, i.e. the behaviour of software. Modal logics are central here – providing a bridge between classical (static) knowledge representation approaches and behaviour and process description and classification. We relate our discussion to the Web context, looking at Web services as components and the Semantic Web as the knowledge representation framewor

    Resource Discovery Services versus OPACs in Information Searching

    Get PDF
    In order to learn more about the strengths and weaknesses of Resource Discovery Services (RDS) and Online Public Access Catalogues (OPAC), we selected three universities from Spain which met the following requirements: 1) they taught Library and Information Science Studies, since our expertise in this scientific field would enable us to determine the relevance of the results retrieved, and 2) they had implemented different tools. We conducted several subject and author searches in RDS and OPAC, with the overall objective of exploring the efficiency of the search process and the relevance of the results retrieved when using each tool. The specific objectives were: to determine the comprehensiveness of the relevant results retrieved in a parallel search of OPAC and Resource Discovery Services; to analyze these results; to identify the filter functions for selection in RDS and OPAC; to determine the ease with which the results could be refined and/or the search could be widened by making use of facets, indexes, or advanced search features; and to assess the implementation and usefulness of recommendations of related documents and user opinions/reviews in the form of tags, comments and ratings

    Summary of the First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE1)

    Get PDF
    Challenges related to development, deployment, and maintenance of reusable software for science are becoming a growing concern. Many scientists’ research increasingly depends on the quality and availability of software upon which their works are built. To highlight some of these issues and share experiences, the First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE1) was held in November 2013 in conjunction with the SC13 Conference. The workshop featured keynote presentations and a large number (54) of solicited extended abstracts that were grouped into three themes and presented via panels. A set of collaborative notes of the presentations and discussion was taken during the workshop. Unique perspectives were captured about issues such as comprehensive documentation, development and deployment practices, software licenses and career paths for developers. Attribution systems that account for evidence of software contribution and impact were also discussed. These include mechanisms such as Digital Object Identifiers, publication of “software papers”, and the use of online systems, for example source code repositories like GitHub. This paper summarizes the issues and shared experiences that were discussed, including cross-cutting issues and use cases. It joins a nascent literature seeking to understand what drives software work in science, and how it is impacted by the reward systems of science. These incentives can determine the extent to which developers are motivated to build software for the long-term, for the use of others, and whether to work collaboratively or separately. It also explores community building, leadership, and dynamics in relation to successful scientific software
    • 

    corecore