    IRIT at TREC Knowledge Base Acceleration 2013: Cumulative Citation Recommendation Task

    This paper describes the IRIT lab's participation in the Cumulative Citation Recommendation task of the TREC 2013 Knowledge Base Acceleration Track. In this task, we are asked to implement a system which aims to detect “Vital” documents that a human would want to cite when updating the Wikipedia article for the target entity. Our approach is built on two steps. First, for each topic (entity), we retrieve a set of potentially relevant documents containing at least one entity mention. These documents are then classified using a supervised learning algorithm to identify which ones are vital. We submitted three runs using different combinations of features. The results obtained are presented and discussed.

    INEX Tweet Contextualization Task: Evaluation, Results and Lesson Learned

    Microblogging platforms such as Twitter are increasingly used for on-line client and market analysis. This motivated the proposal of a new Tweet Contextualization track at the CLEF INEX lab. The objective of this task was to help a user understand a tweet by providing a short explanatory summary (500 words). This summary should be built automatically using resources like Wikipedia and generated by extracting relevant passages and aggregating them into a coherent summary. Over the four years the task ran, results show that the best systems combine NLP techniques with more traditional methods. More precisely, the best-performing systems combine passage retrieval, sentence segmentation and scoring, named entity recognition, part-of-speech (POS) analysis, anaphora detection, a content diversity measure, and sentence reordering. This paper provides a full summary report on the four-year task. While yearly overviews focused on system results, in this paper we provide a detailed report on the approaches proposed by the participants, which can be considered the state of the art for this task. As an important result of the four-year competition, we also describe the open-access resources that have been built and collected. The evaluation measures for automatic summarization designed in DUC or MUC were not appropriate for evaluating tweet contextualization; we explain why and describe in detail the LogSim measure used to evaluate the informativeness of the produced contexts or summaries. Finally, we also mention the lessons we learned that are worth considering when designing a task.

    Towards Population of Knowledge Bases from Conversational Sources

    With an increasing amount of data created daily, it is challenging for users to organize and discover information from massive collections of digital content (e.g., text and speech). The population of knowledge bases requires linking information from unstructured sources (e.g., news articles and web pages) to structured external knowledge bases (e.g., Wikipedia), which has the potential to advance information archiving and access, and to support knowledge discovery and reasoning. Because of the complexity of this task, knowledge base population is composed of multiple sub-tasks, including the entity linking task, defined as linking the mentions of entities (e.g., persons, organizations, and locations) found in documents to their referents in external knowledge bases, and the event task, defined as extracting related information for events that should be entered in the knowledge base. Most prior work on tasks related to knowledge base population has focused on dissemination-oriented sources written in the third person (e.g., news articles) that benefit from two characteristics: the content is written in formal language and is to some degree self-contextualized, and the entities mentioned (e.g., persons) are likely to be widely known to the public so that rich information can be found in existing general knowledge bases (e.g., Wikipedia and DBpedia). The work proposed in this thesis focuses on tasks related to knowledge base population for conversational sources written in the first person (e.g., emails and phone recordings), which pose new challenges. One challenge is that most conversations (e.g., 68% of the person names and 53% of the organization names in Enron emails) refer to entities that are known to the conversational participants but not widely known. Thus, existing entity linking techniques relying on general knowledge bases are not appropriate. Another challenge is that some of the shared context between participants in first-person conversations may be implicit and thus challenging to model, increasing the difficulty, even for human annotators, of identifying the true referents. This thesis focuses on several tasks relating to the population of knowledge bases for conversational content: the population of collection-specific knowledge bases for organization entities and meetings from email collections; the entity linking task that resolves the mentions of three types of entities (person, organization, and location) found in both conversational text (emails) and speech (phone recordings) sources to multiple knowledge bases, including a general knowledge base built from Wikipedia and collection-specific knowledge bases; the meeting linking task that links meeting-related email messages to the referenced meeting entries in the collection-specific meeting knowledge base; and speaker identification techniques to improve the entity linking task for phone recordings without known speakers. Following the model-based evaluation paradigm, three collections (namely, Enron emails, Avocado emails, and Enron phone recordings) are used as representations of conversational sources, new test collections are created for each task, and experiments are conducted for each task to evaluate the efficacy of the proposed methods and to provide a comparison to existing state-of-the-art systems. This work has implications for the research fields of e-discovery, scientific collaboration, speaker identification, speech retrieval, and privacy protection.

    Geographic information extraction from texts

    A large volume of unstructured texts, containing valuable geographic information, is available online. This information – provided implicitly or explicitly – is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although considerable progress has been made in geographic information extraction from texts, there are still unsolved challenges and issues, ranging from methods, systems, and data to applications and privacy. Therefore, this workshop will provide a timely opportunity to discuss recent advances, new ideas, and concepts, and also to identify research gaps in geographic information extraction.

    Factors Influencing Customer Satisfaction towards E-shopping in Malaysia

    Online shopping, or e-shopping, has changed the world of business, and quite a few people have decided to work with these features. Their primary concerns, and the responses to globalisation, relate to the competency of incorporation while doing business. E-shopping has also increased substantially in Malaysia in recent years. The rapid growth of the e-commerce industry in Malaysia has created a need to emphasize how to increase customer satisfaction while operating in the e-retailing environment. It is very important that customers are satisfied with the website; otherwise, they will not return. Therefore, companies must ensure that their customers are satisfied with their purchases, which is essential from the e-commerce point of view. With this in mind, this study aimed to investigate customer satisfaction towards e-shopping in Malaysia. A total of 400 questionnaires were distributed among students randomly selected from various public and private universities located within the Klang Valley area. A total of 369 questionnaires were returned, of which 341 were found usable for further analysis. Finally, structural equation modelling (SEM) was employed to test the hypotheses. This study found that customer satisfaction towards e-shopping in Malaysia is to a great extent influenced by ease of use, trust, design of the website, online security and e-service quality. Finally, recommendations and directions for future study are provided. Keywords: E-shopping, Customer satisfaction, Trust, Online security, E-service quality, Malaysia

    A Systematic Review and Meta-Analysis of the Incidence of Injury in Professional Female Soccer

    The epidemiology of injury in male professional football is well documented and has been used as a basis to monitor injury trends and implement injury prevention strategies. There are no systematic reviews that have investigated injury incidence in women’s professional football. Therefore, the extent of injury burden in women’s professional football remains unknown. PURPOSE: The primary aim of this study was to calculate an overall incidence rate of injury in senior female professional soccer. The secondary aims were to provide an incidence rate for training and match play. METHODS: PubMed, Discover, EBSCO, Embase and ScienceDirect electronic databases were searched from inception to September 2018. Two reviewers independently assessed study quality against the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement using its 22-item checklist. Seven prospective studies (n=1137 professional players) were combined in a pooled analysis of injury incidence using a mixed effects model. Heterogeneity was evaluated using the Cochran Q statistic and I². RESULTS: The epidemiological incidence proportion over one season was 0.62 (95% CI 0.59 - 0.64). Mean total incidence of injury was 3.15 (95% CI 1.54 - 4.75) injuries per 1000 hours. The mean incidence of injury during match play was 10.72 (95% CI 9.11 - 12.33) and during training was 2.21 (95% CI 0.96 - 3.45). Data analysis found a significant level of heterogeneity (total incidence, χ² = 16.57, P < 0.05; I² = 63.8%), as did the subsequent subgroup analyses of the studies reviewed (match incidence, χ² = 76.4 (d.f. = 7), P < 0.05; I² = 90.8%; training incidence, χ² = 16.97 (d.f. = 7), P < 0.05; I² = 58.8%). Appraisal of the study methodologies revealed inconsistency in the use of injury terminology, data collection procedures and calculation of exposure by researchers. Such inconsistencies likely contribute to the large variance in the incidence and prevalence of injury reported. CONCLUSIONS: The estimated risk of sustaining at least one injury over one football season is 62%. Continued reporting of heterogeneous results in population samples limits meaningful comparison of studies. Standardising the criteria used to attribute injury and activity, coupled with more accurate methods of calculating exposure, will overcome such limitations.
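
The heterogeneity figures in the abstract above can be related to one another with the commonly used definition I² = max(0, (Q − d.f.) / Q). The following is a minimal sketch, not the authors' code: the function name `i_squared` is hypothetical, and the d.f. of 6 for total incidence is an assumption (seven studies minus one), since the abstract does not state it. Under these assumptions the formula gives values matching the reported I² percentages to one decimal place.

```python
def i_squared(q: float, df: int) -> float:
    """Return the I² heterogeneity statistic as a percentage.

    Uses the common definition I² = max(0, (Q - df) / Q) * 100,
    where Q is the heterogeneity chi-square statistic.
    """
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# (Q, d.f.) pairs taken from the abstract; d.f. = 6 for total
# incidence is an assumption, not a value stated in the source.
reported = {
    "total incidence (d.f. = 6 assumed)": (16.57, 6),
    "match incidence": (76.4, 7),
    "training incidence": (16.97, 7),
}

for label, (q, df) in reported.items():
    print(f"{label}: I² = {i_squared(q, df):.1f}%")
```

Values above roughly 50% are often read as substantial heterogeneity, which is consistent with the abstract's caution about comparing the pooled studies.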