117 research outputs found

    Intertextuality in scientific publications

    No full text
    The IEEE bibliographic database contains a number of proven duplications with indication of the copied originals. This corpus is used to test an authorship attribution method. Combining intertextual distance with a sliding window and various classification techniques makes it possible to identify these duplications with a very low risk of error. This experiment also shows that several factors blur the identity of the scientific author, notably research collectives of shifting composition and a strong dose of intertextuality that is accepted or even sought after.

    Who wrote this scientific text?

    No full text
    The IEEE bibliographic database contains a number of proven duplications with indication of the original paper(s) copied. This corpus is used to test a method for the detection of hidden intertextuality (commonly named "plagiarism"). The intertextual distance, combined with a sliding window and various classification techniques, identifies these duplications with a very low risk of error. These experiments also show that several factors blur the identity of the scientific author, including variable group authorship and the high levels of intertextuality accepted, and sometimes desired, in scientific papers on the same topic.
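
    Neither abstract spells out the distance formula, so the following is a minimal Python sketch of the approach they describe, assuming the standard Labbé intertextual distance (relative word-frequency overlap, rescaled to the shorter text) applied over a fixed-size sliding window; the function names, window size and step are illustrative, not the authors' implementation.

        from collections import Counter

        def labbe_distance(tokens_a, tokens_b):
            """Labbé-style intertextual distance between two token lists.

            The longer text's frequencies are rescaled to the shorter
            text's length; the result lies in [0, 1], where 0 means
            identical vocabulary usage and 1 means no overlap at all.
            """
            if len(tokens_a) > len(tokens_b):
                tokens_a, tokens_b = tokens_b, tokens_a
            n_a, n_b = len(tokens_a), len(tokens_b)
            freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
            scale = n_a / n_b
            vocab = set(freq_a) | set(freq_b)
            diff = sum(abs(freq_a[w] - freq_b[w] * scale) for w in vocab)
            return diff / (2 * n_a)

        def sliding_window_profile(doc_tokens, ref_tokens, window=1000, step=200):
            """Distance of each window of a suspect document against a
            reference text; windows scoring below a calibrated threshold
            are candidate duplicated passages."""
            starts = range(0, max(1, len(doc_tokens) - window + 1), step)
            return [(s, labbe_distance(doc_tokens[s:s + window], ref_tokens))
                    for s in starts]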

    Concept-based Interactive Query Expansion Support Tool (CIQUEST)

    Get PDF
    This report describes a three-year project (2000-03) undertaken in the Information Studies Department at The University of Sheffield and funded by Resource, The Council for Museums, Archives and Libraries. The overall aim of the research was to provide user support for query formulation and reformulation in searching large-scale textual resources, including those of the World Wide Web. More specifically, the objectives were: to investigate and evaluate methods for the automatic generation and organisation of concepts derived from retrieved document sets, based on statistical methods for term weighting; and to conduct user-based evaluations of the understanding, presentation and retrieval effectiveness of concept structures in selecting candidate terms for interactive query expansion. The TREC test collection formed the basis for the seven evaluative experiments conducted in the course of the project, which fell into four distinct phases. In the first phase, a series of experiments investigated techniques for concept derivation and hierarchical organisation and structure. The second phase was concerned with user-based validation of the concept structures. Results of phases 1 and 2 informed the design of the test system, and the user interface was developed in phase 3. The final phase entailed a user-based summative evaluation of the CiQuest system. The main findings demonstrate that concept hierarchies can effectively be generated from sets of retrieved documents and displayed to searchers in a meaningful way. The approach provides the searcher with an overview of the contents of the retrieved documents, which in turn facilitates the viewing of documents and selection of the most relevant ones. Concept hierarchies are a good source of terms for query expansion and can improve precision. The extraction of descriptive phrases as an alternative source of terms was also effective. With respect to presentation, cascading menus were easy to browse for selecting terms and for viewing documents. In conclusion, the project dissemination programme and future work are outlined.
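
    As a rough sketch of the project's starting point, statistical term weighting over a retrieved document set to propose candidate expansion terms, the snippet below ranks terms by summed TF-IDF. It shows only flat candidate ranking, not the concept hierarchies CiQuest actually built, and every name and parameter here is illustrative rather than taken from the report.

        from sklearn.feature_extraction.text import TfidfVectorizer

        def expansion_candidates(retrieved_docs, query_terms, top_n=10):
            """Rank candidate query-expansion terms from retrieved documents
            by their summed TF-IDF weight, skipping terms already queried."""
            vectorizer = TfidfVectorizer(stop_words="english")
            matrix = vectorizer.fit_transform(retrieved_docs)
            weights = matrix.sum(axis=0).A1          # total weight per term
            terms = vectorizer.get_feature_names_out()
            ranked = sorted(zip(terms, weights), key=lambda tw: -tw[1])
            return [t for t, _ in ranked if t not in query_terms][:top_n]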

    Meta Co-Training: Two Views are Better than One

    Full text link
    In many practical computer vision scenarios, unlabeled data is plentiful, but labels are scarce and difficult to obtain. As a result, semi-supervised learning, which leverages unlabeled data to boost the performance of supervised classifiers, has received significant attention in recent literature. One major class of semi-supervised algorithms is co-training. In co-training, two different models leverage independent and sufficient "views" of the data to jointly make better predictions. During co-training, each model creates pseudo-labels on unlabeled points, which are used to improve the other model. We show that in the common case when independent views are not available, we can construct such views inexpensively using pre-trained models. Co-training on the constructed views yields a performance improvement over any of the individual views we construct, and performance comparable with recent approaches in semi-supervised learning, but has some undesirable properties. To alleviate the issues present with co-training, we present Meta Co-Training, an extension of the successful Meta Pseudo Labels approach to two views. Our method achieves new state-of-the-art performance on ImageNet-10% with very few training resources, as well as outperforming prior semi-supervised work on several other fine-grained image classification datasets. Comment: 16 pages, 14 figures, 10 tables; for implementation see https://github.com/JayRothenberger/Meta-Co-Trainin
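
    The paper's own method extends Meta Pseudo Labels, but the classic co-training loop it builds on is easy to sketch. Below is a minimal, assumption-laden version using scikit-learn-style classifiers: each round, each model pseudo-labels the unlabeled points it is most confident about on its own view, and those points grow the other model's training set. Per the abstract, the two views would in practice be embeddings from two different pre-trained models; all names and parameters are illustrative.

        import numpy as np
        from sklearn.base import clone

        def co_train(model_a, model_b, Xa, Xb, y, Ua, Ub, rounds=5, k=100):
            """Classic co-training: Xa/Xb are two views of the labeled set,
            Ua/Ub two views of the unlabeled pool. Each round, each model
            teaches the other its k most confident pseudo-labels."""
            Xa_tr, ya = Xa.copy(), y.copy()
            Xb_tr, yb = Xb.copy(), y.copy()
            pool = np.arange(len(Ua))                # indices still unlabeled
            for _ in range(rounds):
                model_a = clone(model_a).fit(Xa_tr, ya)
                model_b = clone(model_b).fit(Xb_tr, yb)
                if len(pool) == 0:
                    break
                conf_a = model_a.predict_proba(Ua[pool]).max(axis=1)
                conf_b = model_b.predict_proba(Ub[pool]).max(axis=1)
                top_a = pool[np.argsort(-conf_a)[:k]]    # a's surest points
                top_b = pool[np.argsort(-conf_b)[:k]]    # b's surest points
                # a teaches b, b teaches a
                Xb_tr = np.vstack([Xb_tr, Ub[top_a]])
                yb = np.concatenate([yb, model_a.predict(Ua[top_a])])
                Xa_tr = np.vstack([Xa_tr, Ua[top_b]])
                ya = np.concatenate([ya, model_b.predict(Ub[top_b])])
                pool = np.setdiff1d(pool, np.union1d(top_a, top_b))
            return model_a, model_b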

    Pervasive Data Access in Wireless and Mobile Computing Environments

    Get PDF
    The rapid advance of wireless and portable computing technology has brought considerable research interest and momentum to the area of mobile computing. One research focus is pervasive data access: with wireless connections, users can access information at any place, at any time. However, various constraints such as limited client capability, limited bandwidth, weak connectivity, and client mobility pose many challenging technical issues. In the past years, tremendous research effort has been put forth to address these issues, and a number of interesting research results have been reported in the literature. This survey paper reviews important work along two key dimensions of pervasive data access: data broadcast and client caching. In addition, data access techniques aimed at various application requirements (such as time, location, semantics and reliability) are covered.
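
    On the data-broadcast side, one classic result in this area is that, for equal-sized items, mean client wait time is minimized when each item's share of airtime is proportional to the square root of its access probability. Below is a minimal sketch of such a scheduler; the credit-based weighted round-robin realization and all names are assumptions for illustration, not taken from the paper.

        import math

        def broadcast_schedule(access_prob, slots):
            """Flat broadcast schedule giving each item airtime roughly
            proportional to sqrt(access probability), via a simple
            credit-based weighted round-robin."""
            weights = {item: math.sqrt(p) for item, p in access_prob.items()}
            total = sum(weights.values())
            credit = {item: 0.0 for item in access_prob}
            schedule = []
            for _ in range(slots):
                for item in credit:                  # accrue airtime credit
                    credit[item] += weights[item] / total
                nxt = max(credit, key=credit.get)    # air the most "owed" item
                credit[nxt] -= 1.0
                schedule.append(nxt)
            return schedule

        # e.g. broadcast_schedule({"stock": 0.6, "news": 0.3, "weather": 0.1}, 12)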

    Re-examining and re-conceptualising enterprise search and discovery capability: towards a model for the factors and generative mechanisms for search task outcomes.

    Get PDF
    Many organizations are trying to re-create the Google experience in order to find and exploit their own corporate information. However, there is evidence that finding information in the workplace using search engine technology has remained difficult, with socio-technical elements largely neglected in the literature. Explicating the factors and generative mechanisms (ultimate causes) behind effective search task outcomes (user satisfaction, search task performance and serendipitous encountering) may provide a first step towards making improvements. A transdisciplinary (holistic) lens was applied to Enterprise Search and Discovery capability, combining critical realism and activity theory with complexity theories, in one of the world's largest corporations. Data collection included an in-situ exploratory search experiment with 26 participants, focus groups with 53 participants and interviews with 87 business professionals. Thousands of user feedback comments and search transactions were analysed. Transferability of the findings was assessed through interviews with eight industry informants and ten organizations from a range of industries. A wide range of informational needs was identified for search filters, including a need to be intrigued. Search-term word co-occurrence algorithms facilitated serendipity to a greater extent than the existing methods deployed in the organization surveyed. No association was found between user satisfaction (or self-assessed search expertise) and search task performance; overall performance was poor, although most participants had been satisfied with their performance. Eighteen factors were identified that influence search task outcomes, ranging from user and task factors, through informational and technological artefacts, to a wide range of organizational norms. Modality Theory (Cybersearch culture, Simplicity and Loss Aversion bias) was developed to explain the study observations. It proposes that at all organizational levels there are tendencies towards reductionist (unimodal) mind-sets about search capability, leading to fixes that fail. The factors and mechanisms were identified in other industry organizations, suggesting some generalizability of the theory. This is the first socio-technical analysis of Enterprise Search and Discovery capability. The findings challenge existing orthodoxy, such as the criticality of search literacy (agency), which has been neglected in the practitioner literature in favour of structure. The resulting multifactorial causal model and strategic framework for improvement present opportunities to update existing academic models in the IR, LIS and IS literature, such as the DeLone and McLean model for information system success. There are encouraging signs that Modality Theory may enable a reconfiguration of organizational mind-sets that could transform search task outcomes and, ultimately, business performance.
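
    The thesis abstract does not specify its co-occurrence algorithm, but the underlying idea, surfacing terms that frequently appear near a query term so that searchers stumble onto adjacent vocabulary, can be sketched minimally as follows; the window size and counting scheme are assumptions, not the study's method.

        from collections import Counter

        def cooccurrence_suggestions(docs, query_term, window=10, top_n=5):
            """Suggest terms that co-occur with query_term inside a fixed
            word window across a document collection."""
            counts = Counter()
            for doc in docs:
                tokens = doc.lower().split()
                for i, tok in enumerate(tokens):
                    if tok == query_term:
                        near = tokens[max(0, i - window): i + window + 1]
                        counts.update(t for t in near if t != query_term)
            return [term for term, _ in counts.most_common(top_n)]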

    Summarization from Medical Documents: A Survey

    Full text link
    Objective: The aim of this paper is to survey recent work in medical document summarization. Background: During the last decade, document summarization has received increasing attention from the AI research community. More recently it has also attracted the interest of the medical research community, owing to the enormous growth of information available to physicians and researchers in medicine through the large and growing number of published journals, conference proceedings, medical sites and portals on the World Wide Web, electronic medical records, etc. Methodology: This survey first gives a general background on document summarization, presenting the factors that summarization depends upon, discussing evaluation issues and briefly describing the various types of summarization techniques. It then examines the characteristics of the medical domain through the different types of medical documents. Finally, it presents and discusses the summarization techniques used so far in the medical domain, referring to the corresponding systems and their characteristics. Discussion and conclusions: The paper discusses thoroughly the promising paths for future research in medical document summarization. It mainly focuses on the issue of scaling to large collections of documents in various languages and from different media, on personalization issues, on portability to new sub-domains, and on the integration of summarization technology into practical applications. Comment: 21 pages, 4 tables
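
    Among the technique families such a survey covers, the simplest baseline is frequency-based extractive summarization: score each sentence by how frequent its words are in the whole document and keep the top scorers in their original order. The sketch below illustrates that baseline only; it is not a method taken from the survey itself.

        import re
        from collections import Counter

        def extractive_summary(text, n_sentences=3):
            """Keep the n highest-scoring sentences (by average word
            frequency) in their original document order."""
            sentences = re.split(r"(?<=[.!?])\s+", text.strip())
            freq = Counter(re.findall(r"[a-z']+", text.lower()))
            def score(s):
                toks = re.findall(r"[a-z']+", s.lower())
                return sum(freq[t] for t in toks) / (len(toks) or 1)
            ranked = sorted(range(len(sentences)),
                            key=lambda i: -score(sentences[i]))
            return " ".join(sentences[i] for i in sorted(ranked[:n_sentences]))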