
    Supporting text mining for e-Science: the challenges for Grid-enabled natural language processing

    Over the last few years, language technology has moved rapidly from 'applied research' to 'engineering', and from small-scale to large-scale engineering. Applications such as advanced text mining systems are feasible, but very resource-intensive, while research seeking to address the underlying language processing questions faces very real practical and methodological limitations. The e-Science vision, and the creation of the e-Science Grid, promises the level of integrated large-scale technological support required to sustain this important and successful new technology area. In this paper, we discuss the foundations for the deployment of text mining and other language technology on the Grid - the protocols and tools required to build distributed large-scale language technology systems, meeting the needs of users, application builders and researchers

    Closing the loop: assisting archival appraisal and information retrieval in one sweep

    In this article, we examine the similarities between the concept of appraisal, a process that takes place within the archives, and the concept of relevance judgement, a process fundamental to the evaluation of information retrieval systems. More specifically, we revisit selection criteria proposed as a result of archival research and work within the digital curation communities, and compare them to relevance criteria as discussed within information retrieval's literature-based discovery. We illustrate how closely these criteria relate to each other and discuss how understanding the relationships between these disciplines could form a basis for proposing automated selection for archival processes and initiating multi-objective learning with respect to information retrieval

    Experimental Standards for Deep Learning Research: A Natural Language Processing Perspective

    The field of Deep Learning (DL) has undergone explosive growth during the last decade, with a substantial impact on Natural Language Processing (NLP) as well. Yet, compared to more established disciplines, a lack of common experimental standards remains an open challenge to the field at large. Starting from fundamental scientific principles, we distill ongoing discussions on experimental standards in NLP into a single, widely-applicable methodology. Following these best practices is crucial to strengthen experimental evidence, improve reproducibility and support scientific progress. These standards are further collected in a public repository to help them transparently adapt to future needs

    Sustainability and transparency in computational cognitive neuroscience

    In this talk, I will discuss open science practices that aim to foster sustainability and transparency in computational cognitive neuroscience. First, I will review recent community efforts that aim to ease data sharing and analytical reproducibility, such as the reports of the OHBM Committees on Best Practice in Data Analysis and Sharing (COBIDAS) and the Brain Imaging Data Structures (BIDS). Second, I will discuss neuroimaging data sharing strategies in the light of ethical and legal constraints, such as the European General Data Protection Regulation (GDPR). Finally, I will discuss some common-sense guidelines for day-to-day research practice that aim to maximize the societal impact of computational cognitive neuroscience

    Compatibility between Text Mining and Qualitative Research in the Perspectives of Grounded Theory, Content Analysis, and Reliability

    The objective of this article is to illustrate that text mining and qualitative research are epistemologically compatible. First, like many qualitative research approaches, such as grounded theory, text mining encourages open-mindedness and discourages preconceptions. Contrary to the popular belief that text mining is a linear and fully automated procedure, the text miner might add, delete, and revise the initial categories in an iterative fashion. Second, text mining is similar to content analysis, which also aims to extract common themes and threads by counting words. Although both of them utilize computer algorithms, text mining is characterized by its capability of processing natural languages. Last, the criteria of sound text mining adhere to those in qualitative research in terms of consistency and replicability

    Open Science in Software Engineering

    Open science describes the movement of making any research artefact available to the public and includes, but is not limited to, open access, open data, and open source. While open science is becoming generally accepted as a norm in other scientific disciplines, in software engineering, we are still struggling to adapt open science to the particularities of our discipline, rendering progress in our scientific community cumbersome. In this chapter, we reflect upon the essentials of open science for software engineering, including what open science is, why we should engage in it, and how we should do it. We particularly draw from our experiences as conference chairs implementing open science initiatives and as researchers actively engaging in open science to critically discuss challenges and pitfalls, and to address more advanced topics such as how and under which conditions to share preprints, what infrastructure and licence model to cover, or how to do it within the limitations of different reviewing models, such as double-blind reviewing. Our hope is to help establish a common ground and to contribute to making open science a norm also in software engineering. Comment: Camera-Ready Version of a Chapter published in the book on Contemporary Empirical Methods in Software Engineering; fixed layout issue with side-note

    Achieving Replicability: Is There Life for Our Experiments After Publication?

    Metaheuristics are algorithmic schemes that ease the derivation of novel algorithms to solve optimization problems. These algorithms are typically approximate and stochastic, leading to the preeminence of experimentation as the means of supporting claims in research and applications. However, the huge number of variants and parameters of most metaheuristics, the ambiguity of natural language used in papers, and the lack of widely accepted reporting standards threaten the replicability of those experiments. This problem, which has been identified in the literature by several authors, significantly hinders the construction of a complete and cohesive body of knowledge on the behavior of metaheuristics. This paper proposes a set of minimum information guidelines for reporting metaheuristic experiments, and an experiment description language that supports the meeting of those guidelines. By using this language, metaheuristic optimization experiments are described in a tool-independent and unambiguous way, while maintaining readability and succinctness. Those contributions pave the way for replication using different problem instances and parameters, bringing a new life to metaheuristic experiments after publication. Ministerio de Ciencia e Innovación TIN2009-07366; Ministerio de Economía y Competitividad TIN2012-32273; Junta de Andalucía P07-TIC-2533; Junta de Andalucía TIC-590

    Should I disclose my dataset? Caveats between reproducibility and individual data rights

    Natural language processing techniques have helped domain experts solve legal problems. Digital availability of court documents increases possibilities for researchers, who can access them as a source for building datasets -- whose disclosure is aligned with good reproducibility practices in computational research. Large and digitized court systems, such as the Brazilian one, are prone to be explored in that sense. However, personal data protection laws impose restrictions on data exposure and state principles about which researchers should be mindful. Special caution must be taken in cases with human rights violations, such as gender discrimination, over which we elaborate as an example of interest. We present legal and ethical considerations on the issue, as well as guidelines for researchers dealing with this kind of data and deciding whether to disclose it. Comment: 10 pages, 2 figures. To be published in the 4th Workshop on Natural Legal Language Processing (NLLP 2022), co-located with the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022)