358 research outputs found

    Relating Dependent Terms in Information Retrieval

    Search engines have become an integral part of our lives. More than one-third of the world's population uses the Internet, and most users turn to a search engine as the quickest way to find the information or product they want. Information retrieval (IR) is the foundation of modern search engines. Traditional IR approaches assume that indexing terms are independent. However, terms occurring in the same context are often dependent, and failing to recognize these dependencies leads to noise (irrelevant documents) in the results. Some studies have proposed integrating term dependencies of different types, such as proximity, co-occurrence, adjacency and grammatical dependency. In most cases, the dependency models are constructed separately and then combined with the traditional word-based (unigram) model using a fixed weight. Consequently, they cannot properly capture variable term dependency and its strength. For example, the dependency between the adjacent words “Black Friday” is more important to consider than that between “road constructions”. In this thesis, we study different approaches to capturing term relationships and their dependency strengths. We propose the following methods for monolingual IR and cross-language IR (CLIR). First, we re-examine the combination approach by using different indexing units for Chinese monolingual IR, then propose a similar method for English-Chinese CLIR. In addition to the traditional method based on words, we investigate the possibility of using Chinese bigrams and unigrams as translation units. Several translation models from English words to Chinese unigrams, bigrams and words are built from a parallel corpus. An English query is then translated in several ways, each producing a ranking score, and the final ranking score combines all these types of translation. Second, we incorporate dependencies between terms in our model using the Dempster-Shafer theory of evidence. Every occurrence of a text fragment (of several words) in a document is represented as a set that includes all its implied terms. Probability is assigned to such a set of terms rather than to individual terms. During query evaluation, the probability of the set is redistributed to the related query terms when these differ, which allows us to integrate dependency relations between terms into IR. Third, we propose a discriminative language model that integrates different term dependencies according to their strength and usefulness to IR. We consider adjacency and co-occurrence dependencies at different distances, i.e. bigrams and pairs of terms within text windows of size 2, 4, 8 and 16. The weight of a bigram or a pair of dependent terms in the final model is learnt from a set of features using SVM regression. All the proposed methods are evaluated on several English and/or Chinese collections, and experimental results show that they achieve substantial improvements over state-of-the-art baselines.
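
    As a rough illustration of the co-occurrence dependencies mentioned in this abstract, the sketch below counts unordered term pairs that fall within a text window of 2, 4, 8 or 16 words. The function name, the toy sentence, and the exact window definition (two terms co-occur when their positions differ by less than the window size) are assumptions made for this example; the thesis may define and weight these features differently.

```python
from collections import Counter


def cooccurrence_counts(tokens, window):
    """Count unordered term pairs that fit inside `window` consecutive tokens.

    Assumption for this sketch: two tokens co-occur when their positions
    differ by less than `window`, so window=2 yields adjacent pairs only.
    """
    counts = Counter()
    for i, left in enumerate(tokens):
        # Look ahead at most window - 1 positions.
        for j in range(i + 1, min(i + window, len(tokens))):
            pair = tuple(sorted((left, tokens[j])))
            counts[pair] += 1
    return counts


# Toy document (hypothetical), tokenised on whitespace.
tokens = "black friday sale starts on black friday".split()

# One count table per window size named in the abstract.
features = {w: cooccurrence_counts(tokens, w) for w in (2, 4, 8, 16)}

for w, table in features.items():
    print(w, table[("black", "friday")])
```

    Counts like these could serve as inputs to a learned weighting model; how the thesis turns them into features for SVM regression is not specified here.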

    Index ordering by query-independent measures

    There is an ever-increasing amount of data being produced from various data sources, and this data must be organised effectively if we hope to search through it. Traditional information retrieval approaches search through all available data in a collection in order to find the most suitable results; for particularly large collections, however, this may be extremely time-consuming. Our proposed solution to this problem is to search only a limited portion of the collection at query time in order to speed up retrieval, while limiting the loss in retrieval efficacy (in terms of accuracy of results). We do this by first identifying the most “important” documents within the collection and then sorting the documents in order of their “importance”. In this way we can limit the amount of information to search through by eliminating the documents of lesser importance, which should not only make the search more efficient but also limit any loss in retrieval accuracy. In this thesis we investigate various query-independent measures that may indicate the importance of a document in a collection. The more accurately a measure identifies important documents, the more effectively we can eliminate documents from the retrieval process, improving the query throughput of the system while maintaining a high level of accuracy in the returned results. The effectiveness of these approaches is evaluated using the datasets provided by the Terabyte track at the Text REtrieval Conference (TREC).
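
    The idea described in this abstract can be sketched in a few lines: order documents once by a query-independent importance score, then score only a leading fraction of that ordering at query time. Everything named here (the document ids, the importance values, the term-count scoring and the `fraction` parameter) is hypothetical and chosen only to make the trade-off visible; it is not the thesis's actual measure or ranking function.

```python
def order_by_importance(doc_ids, importance):
    """Sort document ids from most to least important (query-independent)."""
    return sorted(doc_ids, key=lambda d: importance[d], reverse=True)


def search_prefix(ordered_ids, texts, query_terms, fraction):
    """Score only the first `fraction` of the importance-ordered collection."""
    cutoff = max(1, int(len(ordered_ids) * fraction))
    hits = []
    for doc_id in ordered_ids[:cutoff]:
        words = texts[doc_id].split()
        # Toy relevance score: total occurrences of the query terms.
        score = sum(words.count(term) for term in query_terms)
        if score > 0:
            hits.append((doc_id, score))
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Hypothetical collection and importance scores.
texts = {
    "d1": "trec terabyte track",
    "d2": "terabyte terabyte data",
    "d3": "query log",
    "d4": "terabyte collection",
}
importance = {"d1": 0.9, "d2": 0.1, "d3": 0.5, "d4": 0.7}

ordered = order_by_importance(list(texts), importance)
half = search_prefix(ordered, texts, ["terabyte"], fraction=0.5)
full = search_prefix(ordered, texts, ["terabyte"], fraction=1.0)
print(half)
print(full)
```

    In this toy run the half-collection search skips `d2`, the best match under the toy score, because its importance is low. That is the accuracy loss a good importance measure is meant to keep small.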

    Advances and Applications of Dezert-Smarandache Theory (DSmT) for Information Fusion (Collected Works), Vol. 4

    The fourth volume on Advances and Applications of Dezert-Smarandache Theory (DSmT) for information fusion collects theoretical and applied contributions of researchers working in different fields of applications and in mathematics. The contributions (see List of Articles published in this book, at the end of the volume) have been published or presented after the dissemination of the third volume (2009, http://fs.unm.edu/DSmT-book3.pdf) in international conferences, seminars, workshops and journals. The first part of this book presents the theoretical advancement of DSmT, dealing with Belief functions, conditioning and deconditioning, Analytic Hierarchy Process, Decision Making, Multi-Criteria, evidence theory, combination rule, evidence distance, conflicting belief, sources of evidence with different importance and reliabilities, importance of sources, pignistic probability transformation, Qualitative reasoning under uncertainty, Imprecise belief structures, 2-Tuple linguistic label, Electre Tri Method, hierarchical proportional redistribution, basic belief assignment, subjective probability measure, Smarandache codification, neutrosophic logic, outranking methods, Dempster-Shafer Theory, Bayes fusion rule, frequentist probability, mean square error, controlling factor, optimal assignment solution, data association, Transferable Belief Model, and others. More applications of DSmT have emerged in the years since the appearance of the third DSmT book in 2009.
The second part of this volume covers applications of DSmT in connection with Electronic Support Measures, belief function, sensor networks, Ground Moving Target and Multiple target tracking, Vehicle-Borne Improvised Explosive Device, Belief Interacting Multiple Model filter, seismic and acoustic sensor, Support Vector Machines, Alarm classification, ability of human visual system, Uncertainty Representation and Reasoning Evaluation Framework, Threat Assessment, Handwritten Signature Verification, Automatic Aircraft Recognition, Dynamic Data-Driven Application System, adjustment of secure communication trust analysis, and so on. Finally, the third part presents a chronologically ordered List of References related to DSmT published or presented over the years since its inception in 2004.

    Dependence in probabilistic modeling, Dempster-Shafer theory, and probability bounds analysis.


    Groundwater level prediction using a multiple objective genetic algorithm-grey relational analysis based weighted ensemble of anfis models

    Predicting groundwater levels is critical for ensuring sustainable use of an aquifer’s limited groundwater reserves and developing a useful groundwater abstraction management strategy. The purpose of this study was to assess the predictive accuracy and estimation capability of various models based on the Adaptive Neuro Fuzzy Inference System (ANFIS). These models included Differential Evolution-ANFIS (DE-ANFIS), Particle Swarm Optimization-ANFIS (PSO-ANFIS), and traditional Hybrid Algorithm tuned ANFIS (HA-ANFIS) for the one- and multi-week forward forecast of groundwater levels at three observation wells. Model-independent partial autocorrelation functions followed by frequentist lasso regression-based feature selection approaches were used to identify appropriate input variables for the prediction models. The performances of the ANFIS models were evaluated using various statistical performance evaluation indexes. The results revealed that the optimized ANFIS models performed equally well in predicting one-week-ahead groundwater levels at the observation wells when a set of various performance evaluation indexes was used. To improve prediction accuracy, a weighted-average ensemble of ANFIS models was proposed, in which weights for the individual ANFIS models were calculated using a Multiple Objective Genetic Algorithm (MOGA). The MOGA accounts for a set of benefit (higher values indicate better model performance) and cost (smaller values indicate better model performance) performance indexes calculated on the test dataset. Grey relational analysis was used to select the best solution from a set of feasible solutions produced by the MOGA. A MOGA-based individual model ranking revealed the superiority of DE-ANFIS (weight = 0.827), HA-ANFIS (weight = 0.524), and HA-ANFIS (weight = 0.697) at observation wells GT8194046, GT8194048, and GT8194049, respectively.
Shannon’s entropy-based decision theory was utilized to rank the ensemble and individual ANFIS models using a set of performance indexes. The ranking result indicated that the ensemble model outperformed all individual models at all observation wells (ranking value = 0.987, 0.985, and 0.995 at observation wells GT8194046, GT8194048, and GT8194049, respectively). The worst performers were PSO-ANFIS (ranking value = 0.845), PSO-ANFIS (ranking value = 0.819), and DE-ANFIS (ranking value = 0.900) at observation wells GT8194046, GT8194048, and GT8194049, respectively. The generalization capability of the proposed ensemble modelling approach was evaluated for forecasting 2-, 4-, 6-, and 8-weeks-ahead groundwater levels using data from GT8194046. The evaluation results confirmed the usability of the ensemble modelling for forecasting groundwater levels at longer forecasting horizons. The study demonstrated that the ensemble approach may be successfully used to predict multi-week-ahead groundwater levels, utilizing previous lagged groundwater levels as inputs.
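
    The weighted-average ensemble named in this abstract combines the individual model forecasts using one weight per model. A minimal sketch follows, assuming the weights are normalised by their sum; the model labels, weights and forecast values are hypothetical placeholders, not figures from the study, and the MOGA step that would produce the weights is not shown.

```python
def weighted_ensemble(predictions, weights):
    """Combine per-model forecasts into one weighted-average forecast.

    predictions: dict mapping model name -> list of forecasts (same length).
    weights:     dict mapping model name -> non-negative weight.
    Assumption for this sketch: weights are normalised by their sum.
    """
    total = sum(weights[name] for name in predictions)
    horizon = len(next(iter(predictions.values())))
    return [
        sum(weights[name] * predictions[name][t] for name in predictions) / total
        for t in range(horizon)
    ]


# Hypothetical forecasts from two models over two time steps.
predictions = {"model_a": [10.0, 12.0], "model_b": [14.0, 16.0]}
weights = {"model_a": 3.0, "model_b": 1.0}

ensemble = weighted_ensemble(predictions, weights)
print(ensemble)
```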

    An Axiomatic Framework for Propagating Uncertainty in Directed Acyclic Networks

    This paper presents an axiomatic system for propagating uncertainty in Pearl's causal networks (Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, 1988 [7]). The main objective is to study all aspects of knowledge representation and reasoning in causal networks from an abstract point of view, independent of the particular theory being used to represent information (probabilities, belief functions or upper and lower probabilities). This is achieved by expressing concepts and algorithms in terms of valuations, an abstract mathematical concept representing a piece of information, introduced by Shenoy and Shafer [1, 2]. Three new axioms are added to Shenoy and Shafer's axiomatic framework [1, 2] for the propagation of general valuations in hypertrees. These axioms allow us to address from an abstract point of view concepts such as conditional information (a generalization of conditional probabilities) and give rules relating the decomposition of global information to the concept of independence (a generalization of the probability rules that allow a bidimensional distribution with independent marginals to be decomposed into the product of its two marginals). Finally, Pearl's propagation algorithms are also developed and expressed in terms of operations with valuations. Funding: Commission of the European Communities under ESPRIT BRA 3085: DRUM

    On the semantics of fuzzy logic

    This paper presents a formal characterization of the major concepts and constructs of fuzzy logic in terms of notions of distance, closeness, and similarity between pairs of possible worlds. The formalism is a direct extension (by recognition of multiple degrees of accessibility, conceivability, or reachability) of the major modal logic concepts of possible and necessary truth. Given a function that maps pairs of possible worlds into a number between 0 and 1, generalizing the conventional concept of an equivalence relation, the major constructs of fuzzy logic (conditional and unconditioned possibility distributions) are defined in terms of this similarity relation using familiar concepts from the mathematical theory of metric spaces. This interpretation is different in nature and character from the typical, chance-oriented meanings associated with probabilistic concepts, which are grounded on the mathematical notion of set measure. The similarity structure defines a topological notion of continuity in the space of possible worlds (and in that of its subsets, i.e., propositions) that allows a form of logical “extrapolation” between possible worlds. This logical extrapolation operation corresponds to the major deductive rule of fuzzy logic (the compositional rule of inference, or generalized modus ponens, of Zadeh), an inferential operation that generalizes its classical counterpart by virtue of its ability to be utilized when propositions representing available evidence match only approximately the antecedents of conditional propositions. The relations between the similarity-based interpretation of the role of conditional possibility distributions and the approximate inferential procedures of Baldwin are also discussed. A straightforward extension of the theory to the case where the similarity scale is symbolic rather than numeric is described.
The problem of generating similarity functions from a given set of possibility distributions, with the latter interpreted as defining a number of (graded) discernibility relations and the former as the result of combining them into a joint measure of distinguishability between possible worlds, is briefly discussed.
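
    For readers unfamiliar with the compositional rule of inference mentioned in this abstract, its commonly used sup-min form over finite universes is B'(y) = max over x of min(A'(x), R(x, y)). The sketch below implements only that textbook form; the membership values and labels are hypothetical, and the paper's similarity-based semantics is not reproduced here.

```python
def compose(a_prime, relation):
    """Sup-min composition: B'(y) = max_x min(A'(x), R(x, y)).

    a_prime:  dict mapping x -> membership degree of the observed fact A'.
    relation: dict mapping x -> dict mapping y -> degree R(x, y).
    Missing entries of the relation are treated as degree 0.0.
    """
    ys = {y for row in relation.values() for y in row}
    return {
        y: max(min(a_prime[x], relation.get(x, {}).get(y, 0.0)) for x in a_prime)
        for y in ys
    }


# Hypothetical observed fact and fuzzy conditional relation.
a_prime = {"x1": 0.8, "x2": 0.3}
relation = {
    "x1": {"y1": 0.5, "y2": 1.0},
    "x2": {"y1": 0.9, "y2": 0.2},
}

b_prime = compose(a_prime, relation)
print(sorted(b_prime.items()))
```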

    Corporate Sustainability Reporting: Investigation of Assurance Process, Assurance Characteristics and Assurance Frameworks Used

    This dissertation is on assured sustainability reporting. It has three parts, titled as follows: Part 1. Planning Assurance Services for Sustainability Reporting: An Analysis of Cost versus Assurance in Audit Evidence; Part 2. The Development of Worldwide Assured Sustainability Reporting; and Part 3. Assurance on Sustainability Reports: A Study of Factors Influencing the Selection of an Assurance Framework. Of the above, Part 1 is complete and ready for submission to a journal, and Part 2 has been accepted for publication in Australian Accounting Review. Part 1 investigates providing assurance on sustainability reporting and demonstrates how an evidential reasoning framework can enhance the provision of such a service. It develops a framework based on the Dempster-Shafer theory of belief functions for the purpose of audit program planning and cost analysis. A sensitivity analysis is used to demonstrate the value of the model based on seven scenarios. The cost to perform an audit procedure is assumed to increase exponentially with the targeted level of assurance, and audit procedures are assumed to exhibit inherent limitations as to the maximum level of assurance they can be expected to provide. Results demonstrate the following: i. the importance of the assurance provider selecting audit procedures that directly relate to the highest-level assertions, ii. the effects of discovering during the audit that certain audit tests are less diagnostic than anticipated, iii. the effects of obtaining mixed audit evidence, iv. the effects of obtaining strong evidence that implies that certain assertions are not fairly stated, and v. the effects of planning to provide different levels of assurance across assertions. Each of these findings demonstrates the value of utilizing a formal evidential reasoning and cost minimization approach in providing assurance on sustainability reports.
Part 2 investigates the development of assured sustainability reports (SRs) during this century's first decade. More specifically, it presents basic descriptive data on a sample of 148 SRs published in 2006 and 2007 and contrasts this sample with the sample discussed in Mock, Strohm, and Swartz (MSS 2007). The prior study examined a sample of 130 assured SRs issued between 2002 and 2004. Both samples provide information about the nature of sustainability reports, allowing us to investigate important questions such as which countries and industries are more likely to have an assurance statement, what levels of assurance are provided, and what factors affect the level of assurance provided. In addition to providing descriptive data on the above questions, we run logistic regressions for both periods in which the dependent variable is whether a Big4 firm provided the assurance. Some important differences are observed related to whether the assurance provided applies to both the quantitative and qualitative assertions made in the report (significantly negatively associated with Big4 in the 2002-2004 period, but not significant in 2006-2007), whether the report uses symbols to identify assured statements (significantly positively associated with Big4 in the 2002-2004 period, but not significant in 2006-2007), and whether the procedures used are disclosed (not significant in 2002-2004, but significantly positively associated with Big4 in 2006-2007). Part 3 examines the factors that influence the assurance provider in the selection of an assurance framework for the purpose of assuring sustainability reports where assurance is voluntarily sought by the organization issuing the sustainability report. These frameworks are not generally accepted and no authority mandates them. Audit-firm-specific, client-company-specific and country-level factors are considered as explanatory variables.
Multi-level modeling is used for analysis since companies are nested within countries. Analysis suggests that the following country-level factors have a significant impact on the selection of the type of assurance framework (i.e. international or regional frameworks): level of disclosure, market capitalization and the level of carbon dioxide emissions. Further, analysis suggests that two client-company characteristics also have a significant impact: whether a company has foreign operations and the level of growth opportunities. One of the important ways of adding credibility to sustainability reports published by companies is obtaining assurance on them (Simnett, Vanstraelen and Chua 2009). Hence, the type of assurance framework used (international versus regional) may indicate assurance provider preferences. Use of international frameworks (ISAE3000 and AA1000AS) may indicate a trend towards standardization of assurance frameworks and ease of comparison. On the other hand, use of regional assurance frameworks may indicate a possible country-of-origin effect. The factors that influence the selection of assurance frameworks and the type of assurance framework selected are important because they offer insights into trends and opportunities that shape the growing assurance market in the sustainability area. This could help companies, assurance providers, standard-setting bodies and investors respond to a changing environment in a meaningful way.