
    Input variable selection in time-critical knowledge integration applications: A review, analysis, and recommendation paper

    The purpose of this research is twofold: first, to undertake a thorough appraisal of existing Input Variable Selection (IVS) methods within the context of time-critical and computation-resource-limited dimensionality reduction problems; second, to demonstrate improvements to, and the application of, a recently proposed time-critical sensitivity analysis method called EventTracker in an environmental-science industrial use case, i.e., sub-surface drilling. Producing time-critical, accurate knowledge about the state of a system (effect) under computational and data-acquisition (cause) constraints is a major challenge, especially if the knowledge required is critical to system operation, where the safety of operators or the integrity of costly equipment is at stake. Understanding and interpreting a chain of interrelated events, predicted or unpredicted, that may or may not result in a specific state of the system is the core challenge of this research. The main objective is then to identify which set of input data signals has a significant impact on the set of system state information (i.e., output). Through a cause-effect analysis technique, the proposed method supports the filtering of unsolicited data that can otherwise clog up the communication and computational capabilities of a standard supervisory control and data acquisition (SCADA) system. The paper analyzes the performance of input variable selection techniques from a series of perspectives. It then expands the categorization and assessment of sensitivity analysis methods into a structured framework that takes into account the relationship between inputs and outputs, the nature of their time series, and the computational effort required. The outcome of this analysis is that established methods have limited suitability for time-critical variable selection applications. By way of a geological drilling monitoring scenario, the suitability of the proposed EventTracker sensitivity analysis method for high-volume and time-critical input variable selection problems is demonstrated.
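
    To make the event-driven idea concrete, the following is a minimal Python sketch of an event-based sensitivity index in the spirit of EventTracker. The change thresholds, time slotting, and coincidence score are illustrative assumptions rather than the authors' exact formulation, and the channel names are hypothetical.

        import numpy as np

        def events(signal, threshold):
            # Mark a slot as an "event" when the slot-to-slot change exceeds the threshold.
            return np.abs(np.diff(signal)) > threshold

        def sensitivity(input_signal, output_signal, in_thresh=1.0, out_thresh=1.0):
            # Fraction of output events that coincide with an input event in the
            # same time slot: a crude cause-effect score in [0, 1].
            e_in = events(input_signal, in_thresh)
            e_out = events(output_signal, out_thresh)
            return (e_in & e_out).sum() / max(e_out.sum(), 1)

        rng = np.random.default_rng(0)
        drill_torque = rng.normal(size=1000).cumsum()                    # hypothetical input channel
        bit_vibration = drill_torque + rng.normal(scale=0.1, size=1000)  # output driven by that input
        unrelated = rng.normal(size=1000).cumsum()                       # input with no causal link

        print(sensitivity(drill_torque, bit_vibration))  # noticeably higher
        print(sensitivity(unrelated, bit_vibration))     # markedly lower

    In a SCADA setting, input channels scoring below some cutoff could then be dropped from acquisition to conserve communication and computational capacity.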

    Higher-Order Improvements of the Sieve Bootstrap for Fractionally Integrated Processes

    This paper investigates the accuracy of bootstrap-based inference in the case of long-memory fractionally integrated processes. The re-sampling method is based on the semi-parametric sieve approach, whereby the dynamics in the process used to produce the bootstrap draws are captured by an autoregressive approximation. Application of the sieve method to data pre-filtered by a semi-parametric estimate of the long-memory parameter is also explored. Higher-order improvements yielded by both forms of re-sampling are demonstrated using Edgeworth expansions for a broad class of statistics that includes first- and second-order moments, the discrete Fourier transform and regression coefficients. The methods are then applied to the problem of estimating the sampling distributions of the sample mean and of selected sample autocorrelation coefficients, in experimental settings. In the case of the sample mean, the pre-filtered version of the bootstrap is shown to avoid the distinct underestimation of the sampling variance of the mean which the raw sieve method exhibits in finite samples, the higher-order accuracy of the latter notwithstanding. Pre-filtering also produces gains in the accuracy with which the sampling distributions of the sample autocorrelations are reproduced, most notably in the part of the parameter space in which asymptotic normality does not obtain. Most importantly, the sieve bootstrap is shown to reproduce the (empirically infeasible) Edgeworth expansion of the sampling distribution of the autocorrelation coefficients in the part of the parameter space in which the expansion is valid.
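
    As a concrete illustration, here is a minimal Python sketch of the AR-sieve resampling step, applied to the sample mean. The AR order, burn-in length, and demonstration series are assumptions chosen for illustration; the paper's semi-parametric pre-filtering step is not shown.

        import numpy as np

        def ar_sieve_bootstrap(x, p=4, reps=500, burn=100, seed=0):
            # Sieve step: capture the dependence in x with an AR(p) approximation,
            # fitted here by ordinary least squares.
            rng = np.random.default_rng(seed)
            mu = x.mean()
            xc = x - mu
            Y = xc[p:]
            X = np.column_stack([xc[p - k:len(xc) - k] for k in range(1, p + 1)])
            phi, *_ = np.linalg.lstsq(X, Y, rcond=None)
            resid = Y - X @ phi
            resid -= resid.mean()
            stats = np.empty(reps)
            for r in range(reps):
                # Rebuild a series recursively from i.i.d. redraws of the residuals.
                e = rng.choice(resid, size=len(x) + burn)
                xb = np.zeros(len(x) + burn)
                for t in range(p, len(xb)):
                    xb[t] = phi @ xb[t - p:t][::-1] + e[t]
                stats[r] = xb[burn:].mean() + mu   # bootstrap draw of the sample mean
            return stats

        rng = np.random.default_rng(1)
        x = np.zeros(500)
        for t in range(1, 500):                    # persistent AR(1) stand-in for a long-memory series
            x[t] = 0.7 * x[t - 1] + rng.normal()
        draws = ar_sieve_bootstrap(x)
        print(draws.std())   # bootstrap estimate of the sampling std of the mean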

    Using Search Term Positions for Determining Document Relevance

    The technological advancements in computer networks and the substantial reduction of their production costs have caused a massive explosion of digitally stored information. In particular, textual information is becoming increasingly available in electronic form. Finding text documents dealing with a certain topic is not a simple task. Users need tools to sift through non-relevant information and retrieve only pieces of information relevant to their needs. The traditional methods of information retrieval (IR) based on search-term frequency have reached their limitations, and novel ranking methods based on hyperlink information are not applicable to unlinked documents. The retrieval of documents based on the positions of search terms in a document has the potential to yield improvements, because the other terms in the environment where a search term appears (i.e., its neighborhood) are considered. That is to say, the grammatical type, position and frequency of other words help to clarify and specify the meaning of a given search term. However, the required additional analysis makes position-based methods slower than methods based on term frequency and requires more storage to save the positions of terms. These drawbacks directly affect the performance of the most user-critical phase of the retrieval process, namely query evaluation, which explains the scarce use of positional information in contemporary retrieval systems. This thesis explores the possibility of extending traditional information retrieval systems with positional information in an efficient manner that permits optimizing retrieval performance by handling term positions at query evaluation time. To this end, several abstract representations of term positions that allow positional data to be stored and operated on efficiently are investigated. In the Gauss model, descriptive statistics are used to estimate term positional information, because they minimize outliers and irregularities in the data. The Fourier model is based on Fourier series to represent positional information. In the Hilbert model, functional analysis methods are used to provide reliable term position estimates and simple mathematical operators to handle positional data. The proposed models are experimentally evaluated using standard resources of the IR research community (Text REtrieval Conference). All experiments demonstrate that the use of positional information can enhance the quality of search results, and the suggested models outperform state-of-the-art retrieval utilities. The term position models open new possibilities for analyzing and handling textual data; for instance, document clustering and compression of positional data based on these models could be interesting topics for future research.
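
    As an illustration of the first of these, a minimal Python sketch of the Gauss-model idea follows: a term's occurrence positions are compressed to a mean and standard deviation, and query terms whose position distributions overlap are rewarded. The scoring formula is an assumption for illustration, not the thesis's exact model.

        import numpy as np

        def position_stats(doc_tokens, term):
            # Summarize a term's occurrence positions by mean and spread,
            # with positions normalized to [0, 1].
            pos = np.array([i for i, t in enumerate(doc_tokens) if t == term], dtype=float)
            if pos.size == 0:
                return None
            pos /= max(len(doc_tokens) - 1, 1)
            return pos.mean(), pos.std() + 1e-3   # small floor avoids zero spread

        def proximity_score(doc_tokens, query_terms):
            stats = [position_stats(doc_tokens, t) for t in query_terms]
            stats = [s for s in stats if s is not None]
            if len(stats) < 2:
                return 0.0
            # Pairwise Gaussian overlap: close means and tight spreads score higher.
            score = 0.0
            for i in range(len(stats)):
                for j in range(i + 1, len(stats)):
                    (m1, s1), (m2, s2) = stats[i], stats[j]
                    score += np.exp(-(m1 - m2) ** 2 / (2 * (s1 ** 2 + s2 ** 2)))
            return score

        doc = "the drill bit position sensor reports bit depth and drill torque".split()
        print(proximity_score(doc, ["drill", "bit"]))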

    Query Expansion Strategy based on Pseudo Relevance Feedback and Term Weight Scheme for Monolingual Retrieval

    Query expansion using pseudo relevance feedback is a useful and popular technique for reformulating the query. In our proposed query expansion method, we assume that relevant information can be found within a document near its central idea. A document is normally divided into sections, paragraphs and lines. The proposed method tries to extract keywords that are closer to the central theme of the document. The expansion terms are obtained by equi-frequency partitioning of the documents obtained from pseudo relevance feedback and by using tf-idf scores, where the idf factor is calculated over the number of partitions in the documents. The group of words for query expansion is selected using the following approaches: the highest score, the average score, and the group of words with the maximum number of keywords. As each query behaved differently under the different methods, the effect of these methods on selecting the words for query expansion is investigated. Building on this initial study, we intend to develop a rule-based statistical model that automatically selects the best group of words by incorporating tf-idf scoring and the three approaches described here. The experiments were performed on the FIRE 2011 Adhoc Hindi and English test collections, with 50 queries each, using Terrier as the retrieval engine.
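
    A minimal Python sketch of the partition-based term scoring described here follows; the partition count, whitespace tokenization, and selection rule (highest score only) are simplifying assumptions.

        import math
        from collections import Counter

        def expansion_terms(feedback_docs, n_parts=4, top_n=5):
            # Split each pseudo-relevant document into roughly equal partitions,
            # then score terms by tf-idf with idf computed over partitions.
            partitions = []
            for doc in feedback_docs:
                tokens = doc.split()
                size = max(len(tokens) // n_parts, 1)
                partitions += [tokens[i:i + size] for i in range(0, len(tokens), size)]
            tf = Counter(t for part in partitions for t in part)
            df = Counter(t for part in partitions for t in set(part))
            n = len(partitions)
            scores = {t: tf[t] * math.log(n / df[t]) for t in tf}
            return sorted(scores, key=scores.get, reverse=True)[:top_n]

        docs = ["query expansion adds related terms to the query",
                "relevance feedback reuses top ranked documents for expansion"]
        print(expansion_terms(docs))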

    Maximum Entropy for Gravitational Wave Data Analysis: Inferring the Physical Parameters of Core-Collapse Supernovae

    The gravitational wave signal arising from the collapsing iron core of a Type II supernova progenitor star carries with it the imprint of the progenitor's mass, rotation rate, degree of differential rotation, and the bounce depth. Here, we show how to infer the gravitational radiation waveform of a core collapse event from noisy observations in a network of two or more LIGO-like gravitational wave detectors and, from the recovered signal, constrain these source properties. Using these techniques, predictions from recent core collapse modeling efforts, and the LIGO performance during its S4 science run, we also show that gravitational wave observations by LIGO might have been sufficient to provide reasonable estimates of the progenitor mass, angular momentum and differential angular momentum, and depth of the core at bounce, for a rotating core collapse event at a distance of a few kpc.

    Comment: 44 pages, 12 figures; accepted version scheduled to appear in Ap J 1 April 200
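
    For orientation, a maximum entropy recovery of this kind typically minimizes a regularized misfit of the following form (our notation; the paper's exact functional and priors may differ):

        \[
        \hat{h} \;=\; \arg\min_{h}\Big[\tfrac{1}{2}\,\chi^{2}(h) \;-\; \alpha\,S(h)\Big],
        \qquad
        \chi^{2}(h) \;=\; \sum_{k}\frac{\big(d_{k}-(Rh)_{k}\big)^{2}}{\sigma_{k}^{2}},
        \]

    where d_k are the noisy data from the detector network, R maps a candidate waveform h to the detector responses, σ_k² is the noise variance, S(h) is an entropy prior on the waveform, and α sets the strength of the regularization.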

    Evaluation of an evaluation list for model complexity

    This study (‘WOt-werkdocument’) builds on the project ‘Evaluation model complexity’, in which a checklist was developed to assess the ‘equilibrium’ of a model or database. The list compares the complexity of a model or database with the availability and quality of data and with the intended application area. A model or database is said to be in equilibrium if the uncertainty in its predictions is appropriately small for the intended application, while the data availability supports its complexity. In this study the prototype of the list is reviewed and tested by applying it to test cases. The review was performed by modelling experts from within and outside Wageningen University & Research centre (Wageningen UR). The test cases were selected from the scientific literature in order to evaluate the various elements of the list. The results are used to update the list to a new version.

    Convective instability and transient growth in flow over a backward-facing step

    Transient energy growths of two- and three-dimensional optimal linear perturbations to two-dimensional flow in a rectangular backward-facing-step geometry with expansion ratio two are presented. Reynolds numbers based on the step height and peak inflow speed are considered in the range 0–500, which is below the value for the onset of three-dimensional asymptotic instability. As is well known, the flow has a strong local convective instability, and the maximum linear transient energy growth values computed here are of order 80×10³ at Re = 500. The critical Reynolds number below which there is no growth over any time interval is determined to be Re = 57.7 in the two-dimensional case. The centroidal location of the energy distribution for maximum transient growth is typically downstream of all the stagnation/reattachment points of the steady base flow. Sub-optimal transient modes are also computed and discussed. A direct study of weakly nonlinear effects demonstrates that nonlinearity is stabilizing at Re = 500. The optimal three-dimensional disturbances have spanwise wavelengths of order ten step heights. Though they have slightly larger growths than the two-dimensional cases, they are broadly similar in character. When the inflow of the full nonlinear system is perturbed with white noise, narrowband random velocity perturbations are observed in the downstream channel at locations corresponding to maximum linear transient growth. The centre frequency of this response matches that computed from the streamwise wavelength and mean advection speed of the predicted optimal disturbance. Linkage between the response of the driven flow and the optimal disturbance is further demonstrated by a partition of the response energy into velocity components.
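
    For reference, the optimal transient energy growth quoted above is the standard quantity (standard definition; the notation is ours, not taken from the paper):

        \[
        G(\tau) \;=\; \max_{u(0)\neq 0}\,\frac{\|u(\tau)\|^{2}}{\|u(0)\|^{2}},
        \]

    where u evolves under the equations linearized about the steady base flow and ‖·‖ is the kinetic-energy norm; "of order 80×10³" thus means perturbation energy amplified by a factor of roughly 8×10⁴ at the optimal time horizon τ.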