20 research outputs found

    The linguistic patterns and rhetorical structure of citation context : an approach using n-grams

    Full text link
    Using the full-text corpus of more than 75,000 research articles published by seven PLOS journals, this paper proposes a natural language processing approach for identifying the function of citations. Citation contexts are assigned based on the frequency of n-gram co-occurrences located near the citations. Results show that the most frequent linguistic patterns found in the citation contexts of papers vary according to their location in the IMRaD structure of scientific articles. The presence of negative citations is also dependent on this structure. This methodology offers new perspectives to locate these discursive forms according to the rhetorical structure of scientific articles, and will lead to a better understanding of the use of citations in scientific articles

    Geographic information extraction from texts

    Get PDF
    A large volume of unstructured texts, containing valuable geographic information, is available online. This information – provided implicitly or explicitly – is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although large progress has been achieved in geographic information extraction from texts, there are still unsolved challenges and issues, ranging from methods, systems, and data, to applications and privacy. Therefore, this workshop will provide a timely opportunity to discuss the recent advances, new ideas, and concepts but also identify research gaps in geographic information extraction

    B!SON: A Tool for Open Access Journal Recommendation

    Get PDF
    Finding a suitable open access journal to publish scientific work is a complex task: Researchers have to navigate a constantly growing number of journals, institutional agreements with publishers, funders’ conditions and the risk of Predatory Publishers. To help with these challenges, we introduce a web-based journal recommendation system called B!SON. It is developed based on a systematic requirements analysis, built on open data, gives publisher-independent recommendations and works across domains. It suggests open access journals based on title, abstract and references provided by the user. The recommendation quality has been evaluated using a large test set of 10,000 articles. Development by two German scientific libraries ensures the longevity of the project

    Using contextual information to understand searching and browsing behavior

    Get PDF
    There is great imbalance in the richness of information on the web and the succinctness and poverty of search requests of web users, making their queries only a partial description of the underlying complex information needs. Finding ways to better leverage contextual information and make search context-aware holds the promise to dramatically improve the search experience of users. We conducted a series of studies to discover, model and utilize contextual information in order to understand and improve users' searching and browsing behavior on the web. Our results capture important aspects of context under the realistic conditions of different online search services, aiming to ensure that our scientific insights and solutions transfer to the operational settings of real world applications

    DARIAH and the Benelux

    Get PDF

    Políticas de Copyright de Publicações Científicas em Repositórios Institucionais: O Caso do INESC TEC

    Get PDF
    A progressiva transformação das práticas científicas, impulsionada pelo desenvolvimento das novas Tecnologias de Informação e Comunicação (TIC), têm possibilitado aumentar o acesso à informação, caminhando gradualmente para uma abertura do ciclo de pesquisa. Isto permitirá resolver a longo prazo uma adversidade que se tem colocado aos investigadores, que passa pela existência de barreiras que limitam as condições de acesso, sejam estas geográficas ou financeiras. Apesar da produção científica ser dominada, maioritariamente, por grandes editoras comerciais, estando sujeita às regras por estas impostas, o Movimento do Acesso Aberto cuja primeira declaração pública, a Declaração de Budapeste (BOAI), é de 2002, vem propor alterações significativas que beneficiam os autores e os leitores. Este Movimento vem a ganhar importância em Portugal desde 2003, com a constituição do primeiro repositório institucional a nível nacional. Os repositórios institucionais surgiram como uma ferramenta de divulgação da produção científica de uma instituição, com o intuito de permitir abrir aos resultados da investigação, quer antes da publicação e do próprio processo de arbitragem (preprint), quer depois (postprint), e, consequentemente, aumentar a visibilidade do trabalho desenvolvido por um investigador e a respetiva instituição. O estudo apresentado, que passou por uma análise das políticas de copyright das publicações científicas mais relevantes do INESC TEC, permitiu não só perceber que as editoras adotam cada vez mais políticas que possibilitam o auto-arquivo das publicações em repositórios institucionais, como também que existe todo um trabalho de sensibilização a percorrer, não só para os investigadores, como para a instituição e toda a sociedade. A produção de um conjunto de recomendações, que passam pela implementação de uma política institucional que incentive o auto-arquivo das publicações desenvolvidas no âmbito institucional no repositório, serve como mote para uma maior valorização da produção científica do INESC TEC.The progressive transformation of scientific practices, driven by the development of new Information and Communication Technologies (ICT), which made it possible to increase access to information, gradually moving towards an opening of the research cycle. This opening makes it possible to resolve, in the long term, the adversity that has been placed on researchers, which involves the existence of barriers that limit access conditions, whether geographical or financial. Although large commercial publishers predominantly dominate scientific production and subject it to the rules imposed by them, the Open Access movement whose first public declaration, the Budapest Declaration (BOAI), was in 2002, proposes significant changes that benefit the authors and the readers. This Movement has gained importance in Portugal since 2003, with the constitution of the first institutional repository at the national level. Institutional repositories have emerged as a tool for disseminating the scientific production of an institution to open the results of the research, both before publication and the preprint process and postprint, increase the visibility of work done by an investigator and his or her institution. The present study, which underwent an analysis of the copyright policies of INESC TEC most relevant scientific publications, allowed not only to realize that publishers are increasingly adopting policies that make it possible to self-archive publications in institutional repositories, all the work of raising awareness, not only for researchers but also for the institution and the whole society. The production of a set of recommendations, which go through the implementation of an institutional policy that encourages the self-archiving of the publications developed in the institutional scope in the repository, serves as a motto for a greater appreciation of the scientific production of INESC TEC

    Mining arguments in scientific abstracts: Application to argumentative quality assessment

    Get PDF
    Argument mining consists in the automatic identification of argumentative structures in natural language, a task that has been recognized as particularly challenging in the scientific domain. In this work we propose SciARG, a new annotation scheme, and apply it to the identification of argumentative units and relations in abstracts in two scientific disciplines: computational linguistics and biomedicine, which allows us to assess the applicability of our scheme to different knowledge fields. We use our annotated corpus to train and evaluate argument mining models in various experimental settings, including single and multi-task learning. We investigate the possibility of leveraging existing annotations, including discourse relations and rhetorical roles of sentences, to improve the performance of argument mining models. In particular, we explore the potential offered by a sequential transfer- learning approach in which supplementary training tasks are used to fine-tune pre-trained parameter-rich language models. Finally, we analyze the practical usability of the automatically-extracted components and relations for the prediction of argumentative quality dimensions of scientific abstracts.Agencia Nacional de Investigación e InnovaciónMinisterio de Economía, Industria y Competitividad (España

    Textual Analysis of Intangible Information

    Get PDF
    Traditionally, equity investors have relied upon the information reported in firms’ financial accounts to make their investment decisions. Due to the conservative nature of accounting standards, firms cannot value their intangible assets such as corporate culture, brand value and reputation. Investors’ efforts to collect such information have been hampered by the voluntary nature of Corporate Social Responsibility (CSR) reporting standards, which have resulted in the publication of inconsistent, stale and incomplete information across firms. In short, information on intangible assets is less salient to investors compared to accounting information because it is more costly to collect, process and analyse. In this thesis we design an automated approach to collect and quantify information on firms’ intangible assets by drawing upon techniques commonly adopted in the fields of Natural Language Processing (NLP) and Information Retrieval. The exploitation of unstructured data available on the Web holds promise for investors seeking to integrate a wider variety of information into their investment processes. The objectives of this research are: 1) to draw upon textual analysis methodologies to measure intangible information from a range of unstructured data sources, 2) to integrate intangible information and accounting information into an investment analysis framework, 3) evaluate the merits of unstructured data for the prediction of firms’ future earnings
    corecore