
    Explicit diversification of event aspects for temporal summarization

    During major events, such as emergencies and disasters, a large volume of information is reported on newswire and social media platforms. Temporal summarization (TS) approaches are used to automatically produce concise overviews of such events by extracting text snippets from related articles over time. Current TS approaches rely on a combination of event relevance and textual novelty for snippet selection. However, for events that span multiple days, textual novelty is often a poor criterion for selecting snippets, since many snippets are textually unique but semantically redundant or non-informative. In this article, we propose a framework for the diversification of snippets using explicit event aspects, building on recent work in search result diversification. In particular, we first propose two techniques to identify explicit aspects that a user might want to see covered in a summary for different types of events. We then extend a state-of-the-art explicit diversification framework to maximize the coverage of these aspects when selecting summary snippets for unseen events. Through experimentation on the TREC TS 2013, 2014, and 2015 datasets, we show that explicit diversification for temporal summarization significantly outperforms classical novelty-based diversification, as the use of explicit event aspects reduces the number of redundant and off-topic snippets returned while also increasing summary timeliness.
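    The greedy aspect-coverage selection that explicit diversification frameworks of this kind (e.g. xQuAD-style approaches) perform can be sketched as follows. This is an illustrative sketch only, with invented aspects and scores, not the paper's actual model or weights:

    ```python
    # Sketch of xQuAD-style explicit aspect diversification for snippet
    # selection. Aspects, relevance scores and coverage probabilities are
    # illustrative placeholders, not the paper's learned quantities.

    def select_snippets(snippets, aspects, rel, cov, k, lam=0.5):
        """Greedily pick k snippets, balancing relevance and aspect coverage.

        snippets: list of snippet ids
        aspects:  {aspect: importance weight, i.e. P(a|q)}
        rel:      {snippet: relevance score}
        cov:      {(snippet, aspect): probability the snippet covers the aspect}
        """
        selected = []
        # Probability that each aspect is still uncovered by the summary so far
        uncovered = {a: 1.0 for a in aspects}
        candidates = set(snippets)
        while candidates and len(selected) < k:
            def score(s):
                diversity = sum(
                    w * cov.get((s, a), 0.0) * uncovered[a]
                    for a, w in aspects.items()
                )
                return (1 - lam) * rel.get(s, 0.0) + lam * diversity
            best = max(candidates, key=score)
            selected.append(best)
            candidates.remove(best)
            # Discount aspects the chosen snippet already covers
            for a in aspects:
                uncovered[a] *= 1.0 - cov.get((best, a), 0.0)
        return selected

    snippets = ["s1", "s2", "s3"]
    aspects = {"casualties": 0.6, "location": 0.4}
    rel = {"s1": 0.9, "s2": 0.8, "s3": 0.5}
    cov = {("s1", "casualties"): 0.9, ("s2", "casualties"): 0.8,
           ("s3", "location"): 0.9}
    summary = select_snippets(snippets, aspects, rel, cov, k=2)
    ```

    Note how the discounting step is what suppresses semantic redundancy: once "casualties" is covered by the first snippet, a second casualties-heavy snippet scores lower than a less relevant snippet covering the untouched "location" aspect.
    
    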

    A new unsupervised feature selection method for text clustering based on genetic algorithms

    Nowadays, a vast amount of textual information is collected and stored in various databases around the world, including the Internet as the largest database of all. This rapidly increasing growth of published text means that even the most avid reader cannot hope to keep up with all the reading in a field, and consequently the nuggets of insight or new knowledge are at risk of languishing undiscovered in the literature. Text mining offers a solution to this problem by replacing or supplementing the human reader with automatic systems undeterred by the text explosion. It involves analyzing a large collection of documents to discover previously unknown information. Text clustering is one of the most important areas in text mining; it includes text preprocessing, dimension reduction by selecting some terms (features), and finally clustering using the selected terms. Feature selection appears to be the most important step in the process. Conventional unsupervised feature selection methods define a measure of the discriminating power of terms in order to select suitable terms from the corpus. However, to date, the evaluation of terms in groups has not been investigated in reported works. In this paper, a new and robust unsupervised feature selection approach is proposed that evaluates terms in groups. In addition, a new Modified Term Variance measure is proposed for evaluating groups of terms. Furthermore, a genetic algorithm is designed and implemented for finding the most valuable groups of terms based on the new measure. These terms are then used to generate the final feature vector for the clustering process. To evaluate and justify our approach, the proposed method and a conventional term variance method were implemented and tested on the Reuters-21578 corpus collection. For a more accurate comparison, the methods were tested on three corpora; for each corpus the clustering task was run ten times and the results averaged. The results of comparing the two methods are very promising and show that our method produces better average accuracy and F1-measure than the conventional term variance method.
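    The overall genetic search the abstract describes can be sketched as below. The fitness function here is plain term variance summed over the selected group, standing in for the paper's Modified Term Variance measure (which is not reproduced here); chromosome encoding, crossover and mutation rates are also invented for illustration:

    ```python
    # Toy sketch of genetic-algorithm term selection for text clustering.
    # A chromosome is a 0/1 mask over the vocabulary; fitness rewards
    # groups of high-variance (discriminative) terms. Plain term variance
    # stands in for the paper's Modified Term Variance measure.
    import random

    random.seed(0)  # deterministic for the illustration

    def term_variances(dtm):
        """Variance of each term's frequency across documents."""
        n = len(dtm)
        variances = []
        for j in range(len(dtm[0])):
            col = [row[j] for row in dtm]
            mean = sum(col) / n
            variances.append(sum((x - mean) ** 2 for x in col) / n)
        return variances

    def repair(chrom, n_select):
        """Force a chromosome back to exactly n_select selected terms."""
        ones = [j for j, b in enumerate(chrom) if b]
        zeros = [j for j, b in enumerate(chrom) if not b]
        while len(ones) > n_select:
            chrom[ones.pop(random.randrange(len(ones)))] = 0
        while len(ones) < n_select:
            j = zeros.pop(random.randrange(len(zeros)))
            chrom[j] = 1
            ones.append(j)
        return chrom

    def evolve(dtm, n_select, pop_size=20, generations=30):
        variances = term_variances(dtm)
        vocab = len(variances)
        def fitness(chrom):
            return sum(v for bit, v in zip(chrom, variances) if bit)
        pop = [repair([0] * vocab, n_select) for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=fitness, reverse=True)
            survivors = pop[: pop_size // 2]           # truncation selection
            children = []
            while len(survivors) + len(children) < pop_size:
                a, b = random.sample(survivors, 2)
                cut = random.randrange(1, vocab)       # one-point crossover
                child = a[:cut] + b[cut:]
                if random.random() < 0.2:              # occasional mutation
                    j = random.randrange(vocab)
                    child[j] = 1 - child[j]
                children.append(repair(child, n_select))
            pop = survivors + children
        best = max(pop, key=fitness)
        return sorted(j for j, bit in enumerate(best) if bit)

    # Four documents, four terms; terms 0 and 1 vary most across documents
    dtm = [[5, 0, 1, 0],
           [0, 4, 1, 0],
           [6, 0, 1, 1],
           [0, 5, 1, 0]]
    selected = evolve(dtm, n_select=2)
    ```

    The repair step keeps every chromosome at a fixed group size, so the search compares groups of terms of equal cardinality rather than trivially favouring larger selections.
    
    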

    Entity Query Feature Expansion Using Knowledge Base Links

    Recent advances in automatic entity linking and knowledge base construction have resulted in entity annotations for document and query collections, for example, annotations of entities from large general-purpose knowledge bases such as Freebase and the Google Knowledge Graph. Understanding how to leverage these entity annotations of text to improve ad hoc document retrieval is an open research area. Query expansion is a commonly used technique to improve retrieval effectiveness. Most previous query expansion approaches focus on text, mainly using unigram concepts. In this paper, we propose a new technique, called entity query feature expansion (EQFE), which enriches the query with features from entities and their links to knowledge bases, including structured attributes and text. We experiment using both explicit query entity annotations and latent entities. We evaluate our technique on TREC text collections automatically annotated with knowledge base entity links, including the Google Freebase Annotations (FACC1) data. We find that entity-based feature expansion results in significant improvements in retrieval effectiveness over state-of-the-art text expansion approaches.
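    The core idea of enriching a query with entity-derived features can be sketched as follows. The toy `kb` dictionary, the field names (`aliases`, `attributes`), and the weighting scheme are all hypothetical stand-ins for FACC1-style annotations and the paper's actual feature set:

    ```python
    # Hypothetical sketch of entity-based query expansion: enrich the query
    # with names and attribute text of linked knowledge-base entities.
    # The "kb" dict and its fields stand in for real KB annotations.

    def expand_query(query_terms, entity_links, kb, weight=0.4):
        """Return {term: weight} mixing original terms and entity features."""
        expanded = {t: 1.0 for t in query_terms}
        for entity in entity_links:
            record = kb.get(entity, {})
            features = record.get("aliases", []) + record.get("attributes", [])
            for text in features:
                for term in text.lower().split():
                    # Expansion terms get a lower weight; original query
                    # terms keep their weight of 1.0
                    if term not in expanded:
                        expanded[term] = weight
        return expanded

    # Invented example: a query annotated with one entity id
    kb = {"/m/02mjmr": {"aliases": ["Barack Obama"],
                        "attributes": ["44th president united states"]}}
    q = expand_query(["obama", "family"], ["/m/02mjmr"], kb)
    ```

    In a real ranker these weighted terms would feed a retrieval model rather than being used directly, but the sketch shows why linked entities help: the expanded query matches documents that mention "president" or "Barack" even when the original terms are absent.
    
    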

    Knowledge Discovery in Databases: An Information Retrieval Perspective

    The current trend of increasing capabilities in data generation and collection has resulted in an urgent need for data mining applications, also called knowledge discovery in databases. This paper identifies and examines the issues involved in extracting useful grains of knowledge from large amounts of data. It describes a framework to categorise data mining systems. The author also gives an overview of the issues pertaining to data pre-processing, as well as various information gathering methodologies and techniques. The paper covers some popular tools such as classification, clustering, and generalisation. A summary of statistical and machine learning techniques currently in use is also provided.

    The development of extruded meat alternatives using Maillard-reacted beef bone hydrolysate and plant proteins : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Food Technology at Massey University, Palmerston North, New Zealand

    Figures are re-used with permission. This research thesis aimed to process beef bone extract into a flavoursome protein ingredient to be added to extruded meat analogues to form meat alternatives, and to study its impact on the structural, textural, and sensory properties of the meat alternatives. The thesis consists of three main parts. In the first part, two methods, namely enzymatic hydrolysis and Maillard reaction (MR) treatments, were evaluated for their suitability for modifying the flavour character of beef bone extract into a flavoursome protein ingredient. The second part studied the effects of the soy protein concentrate (SPC) to wheat gluten (WG) ratio as a way of improving the structural and textural properties of current extruded meat analogues. The third part studied the effects of combining the flavoursome protein ingredient (i.e. Maillard-reacted beef bone hydrolysate) with plant proteins on extruded meat alternatives. It also investigated the effects of moisture content on extruded meat alternatives and their application in sausages. To begin, an experimental study was conducted on the effects of enzymatic hydrolysis treatments (i.e. single, simultaneous and sequential) on the physicochemical properties of beef bone extract using Protamex®, bromelain, and Flavourzyme®. Next, the changes in the physicochemical properties and volatile compounds of beef bone hydrolysates during heat treatment as a result of the MR were investigated. Beef bone hydrolysates were combined with ribose in aqueous solutions and heated at 113°C to produce Maillard reaction products (MRPs). Results showed that Flavourzyme® was the most effective in increasing the proportion of low Mw peptides, reducing viscosity, and enhancing the flavour intensity of beef bone extract. Concurrently, the effects of the SPC to WG ratio, at a constant combined mass of SPC and WG, on the physicochemical properties of extruded meat analogues were studied. Meat analogues containing 30% WG showed the highest degree of texturisation, fibrous structure, hardness, and chewiness in instrumental and sensory analyses. For the third part of this research thesis, the effects of the flavoursome protein ingredient (i.e. Flavourzyme®-MRP) at different concentrations (0, 10, 20, 30 and 40% wet weight) with plant proteins on extruded meat alternatives were investigated. Meat alternatives containing 20% MRP obtained the highest sensory scores for appearance, meaty aroma, meaty taste, and overall acceptability. Results showed that the addition of MRP to soy protein concentrate and wheat gluten to produce meat alternatives changed the textural, structural, and sensory properties significantly. The effects of moisture content (MC) on the physicochemical properties of extruded meat alternatives made from Flavourzyme®-MRP and plant proteins were then studied. Samples were extruded at different dry feed rates of 1.8, 2.2, 2.6 and 3.0 kg/h to obtain MCs of 60%, 56%, 52% and 49%, respectively. Meat alternatives at 49% MC were the closest, in terms of both textural and microstructural properties, to the reference sample, boiled chicken breast. Results showed that the change in MC as a process parameter played an important role in the formation of fibrous structure in extruded meat alternatives. Lastly, the physicochemical properties of sausages made from extruded meat alternatives at different MCs were evaluated. Five sausages, made from the meat alternatives (S49%MC, S52%MC, S56%MC and S60%MC) and from chicken breast (SCB) as a reference sample, were prepared. Results showed that S49%MC had the highest sensory scores among all sausages made from meat alternatives. However, SCB obtained the highest sensory scores for all attributes except appearance among all sausages, at a 95% confidence level. Overall, the present work demonstrated that a flavoursome protein ingredient (i.e. Flavourzyme®-MRP) from a low-value meat by-product (i.e. beef bone extract) can be successfully incorporated into extruded meat analogues to form meat alternatives with high aroma and taste quality while maintaining fibrous structure. However, further work is needed to improve the textural and sensory properties of sausages made from extruded meat alternatives.

    INEX Tweet Contextualization Task: Evaluation, Results and Lesson Learned

    Microblogging platforms such as Twitter are increasingly used for on-line client and market analysis. This motivated the proposal of a new track, Tweet Contextualization, at the CLEF INEX lab. The objective of this task was to help a user understand a tweet by providing a short explanatory summary (about 500 words). This summary should be built automatically using resources like Wikipedia, by extracting relevant passages and aggregating them into a coherent summary. Over the four years the task ran, results show that the best systems combine NLP techniques with more traditional methods. More precisely, the best performing systems combine passage retrieval, sentence segmentation and scoring, named entity recognition, part-of-speech (POS) analysis, anaphora detection, a diversity content measure, and sentence reordering. This paper provides a full summary report on the four-year task. While the yearly overviews focused on system results, in this paper we provide a detailed report on the approaches proposed by the participants, which can be considered the state of the art for this task. As an important outcome of the four-year competition, we also describe the open-access resources that have been built and collected. The evaluation measures for automatic summarization designed in DUC or MUC were not appropriate for evaluating tweet contextualization; we explain why, and describe in detail the LogSim measure used to evaluate the informativeness of the produced contexts or summaries. Finally, we also mention the lessons we learned that are worth considering when designing such a task.
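    The shared extractive skeleton of these systems, retrieving candidate sentences, scoring them against the tweet, and packing the best into a roughly 500-word summary, can be sketched as below. The term-overlap scoring is a deliberately simple stand-in for the retrieval, NER and POS machinery the participants actually used:

    ```python
    # Rough sketch of the extractive pipeline shared by the best systems:
    # score candidate sentences against the tweet, then greedily fill a
    # word budget. Plain term overlap replaces the participants' richer
    # scoring (NER, POS analysis, anaphora detection, diversity).

    def summarize(tweet, sentences, max_words=500):
        tweet_terms = set(tweet.lower().split())
        def score(sentence):
            terms = sentence.lower().split()
            overlap = sum(1 for t in terms if t in tweet_terms)
            return overlap / (len(terms) or 1)   # length-normalized overlap
        ranked = sorted(sentences, key=score, reverse=True)
        summary, used = [], 0
        for s in ranked:
            n = len(s.split())
            if used + n > max_words:             # respect the word budget
                break
            summary.append(s)
            used += n
        return " ".join(summary)

    # Invented example: Wikipedia-like candidate sentences for a tweet
    tweet = "eiffel tower lights"
    candidates = ["the eiffel tower is in paris", "bananas are yellow"]
    context = summarize(tweet, candidates, max_words=6)
    ```

    Real systems also reorder the selected sentences for coherence; the greedy budget fill above is only the selection stage.
    
    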

    Building realist programme theory for large complex and messy interventions

    Programme theory, that is, the specific idea about how a programme causes the intended or observed outcomes, should be the central aspect of any realist evaluation or synthesis. The methods used for explicating or building initial rough programme theories in realist research are varied and arguably often underreported. In addition, pre-existing psychological and sociological theories, at a higher level of abstraction, could be used to a greater extent to inform their development. This article illustrates a method for building initial rough programme theories for use in realist research evaluation and synthesis. This illustration involves showing how the initial rough programme theories were developed in a realist evaluation concerning sexual health services for young people. In this evaluation, a broad framework of abstract theories was constructed early in the process to support initial rough programme theory building and frame more specific programme theories as they were developed. These abstract theories were selected to support theorising at macro, meso and micro levels of social structure. The paper discusses the benefits of using this method to build initial theories for particular types of interventions which are large, complex and messy. It also addresses challenges relating to the selection of suitable theories.

    Data mining in soft computing framework: a survey

    The present article provides a survey of the available literature on data mining using soft computing. A categorization is provided based on the different soft computing tools and their hybridizations used, the data mining function implemented, and the preference criterion selected by the model. The utility of the different soft computing methodologies is highlighted. Generally, fuzzy sets are suitable for handling issues related to the understandability of patterns, incomplete/noisy data, mixed-media information, and human interaction, and can provide approximate solutions faster. Neural networks are nonparametric and robust, and exhibit good learning and generalization capabilities in data-rich environments. Genetic algorithms provide efficient search algorithms to select a model, from mixed-media data, based on some preference criterion/objective function. Rough sets are suitable for handling different types of uncertainty in data. Some challenges to data mining and the application of soft computing methodologies are indicated. An extensive bibliography is also included.
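    As a small illustration of the fuzzy-set machinery the survey covers, a membership function maps a crisp value to a degree of belonging in [0, 1], which is what makes fuzzy rules readable as linguistic patterns. The shape and the numbers below are invented for illustration:

    ```python
    # Illustrative triangular fuzzy membership function of the kind used
    # in fuzzy data mining rules (e.g. "income is MEDIUM"). The support
    # points a, b, c and the example values are invented.

    def triangular(x, a, b, c):
        """Triangular membership: 0 at/below a and at/above c, 1 at peak b."""
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)   # rising edge
        return (c - x) / (c - b)       # falling edge

    # Degree to which an income of 45 (thousand) counts as "medium",
    # with the fuzzy set MEDIUM spanning 20..80 and peaking at 50
    medium = triangular(45, 20, 50, 80)
    ```

    A crisp threshold would call 45 either medium or not; the fuzzy degree (here about 0.83) is what lets mined patterns tolerate noisy and borderline data.
    
    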
