10 research outputs found

    User Preference Web Search -- Experiments with a System Connecting Web and User

    We present models, methods, implementations and experiments with a system enabling personalized web search for many users with different preferences. The system consists of a web information extraction part, a text search engine, a middleware supporting top-k answers and a user interface for querying and evaluating search results. We integrate several tools (implementing our models and methods) into one framework connecting the user with the web. The model represents user preferences with fuzzy sets and fuzzy logic, here understood as a scoring that describes user satisfaction. This model can be acquired with explicit or implicit methods. The model-theoretic semantics is based on the fuzzy description logic f-EL. User preference learning is based on our model of fuzzy inductive logic programming. Our system works for both English and Slovak resources. The primary application domain is job offers and job search; however, we show an extension to mutual investment fund search and a possibility of extension to other application domains. Our top-k search is optimized with our own heuristics and a repository with special indexes. Our model was experimentally implemented, the integration was tested, and the system is web accessible. We focus on experiments with several users and measure their satisfaction using correlation coefficients.
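    A minimal sketch of the kind of fuzzy-preference top-k aggregation the abstract describes: each attribute score is a fuzzy degree of satisfaction in [0, 1], the degrees are combined with a monotone aggregation function, and the k best items are returned. The attribute names, weights and scoring functions below are hypothetical illustrations, not the system's actual preference model.

```python
import heapq

# Hypothetical fuzzy scoring functions: map raw attribute values to
# degrees of user satisfaction in [0, 1].
def salary_score(salary, ideal=3000.0):
    return min(salary / ideal, 1.0)

def distance_score(km, max_km=50.0):
    return max(0.0, 1.0 - km / max_km)

def aggregate(scores, weights):
    # Weighted average as a simple monotone aggregation function.
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

def top_k(job_offers, weights, k=3):
    """Return the k offers with the highest aggregated fuzzy score."""
    scored = (
        (aggregate((salary_score(o["salary"]), distance_score(o["distance_km"])), weights), o["id"])
        for o in job_offers
    )
    return heapq.nlargest(k, scored)

offers = [
    {"id": "job-1", "salary": 2800, "distance_km": 5},
    {"id": "job-2", "salary": 3500, "distance_km": 40},
    {"id": "job-3", "salary": 2000, "distance_km": 2},
]
print(top_k(offers, weights=(0.7, 0.3), k=2))
```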

    Investigation into the use of chicken manure to enhance the biodegradation of total petroleum hydrocarbons.

    The use of chicken manure to enhance the biodegradation of total petroleum hydrocarbons (TPH) in composting bioremediation was investigated to help develop an improved understanding of the chemical, biological and toxicological processes involved. Treatability studies combined with an extensive suite of laboratory analyses were designed and undertaken whereby naturally contaminated oil refinery sludge was either amended with chicken manure or left unamended for a total duration of 90 days. Over the duration of the treatability studies, a combination of chemical, toxicity and microbial laboratory analyses was used to monitor the effects of chicken manure on the biodegradation of fractionated aliphatic and aromatic hydrocarbons, to differentiate between its biostimulation and bioaugmentation effects, and to assess its potentially detrimental effects on the bioremediation process through the introduction and adverse proliferation of non-hydrocarbon-degrading microorganisms and the potential introduction of compounds that may elicit toxic effects on hydrocarbon-degrading microorganisms. This study found that the addition of chicken manure enhanced the degradation of C9-C12 aliphatic hydrocarbons. It was found that this reflects a combination of biostimulation and bioaugmentation effects and that volatilisation was minimal. This investigation also found that the addition of chicken manure can have positive effects on bioremediation, as evidenced by the enhancement of conditions for microbial growth and/or activity, the introduction and enhanced growth of potential hydrocarbon-degrading bacterial populations, and the enhanced reduction in toxicity of methanol-extractable hydrocarbons. However, the addition of chicken manure was also seen to cause an increase in toxicity of total leachable compounds, which may present a risk to TPH biodegradation through potential toxic effects on hydrocarbon-degrading microorganisms. It is concluded from this study that there is potential for the use of chicken manure to enhance TPH biodegradation, but that this is likely restricted to low molecular weight hydrocarbons.

    Constructive Reasoning for Semantic Wikis

    One of the main design goals of social software, such as wikis, is to support and facilitate interaction and collaboration. This dissertation explores challenges that arise from extending social software with advanced facilities such as reasoning and semantic annotations, and presents tools in the form of a conceptual model, structured tags, a rule language, and a set of novel forward chaining and reason maintenance methods for processing such rules that help to overcome the challenges. Wikis and semantic wikis were usually developed in an ad-hoc manner, without much thought about the underlying concepts. A conceptual model suitable for a semantic wiki that takes advanced features such as annotations and reasoning into account is proposed. Moreover, so-called structured tags are proposed as a semi-formal knowledge representation step between informal and formal annotations. The focus of rule languages for the Semantic Web has been predominantly on expert users and on the interplay of rule languages and ontologies. KWRL, the KiWi Rule Language, is proposed as a rule language for a semantic wiki that is easily understandable for users, as it is aware of the conceptual model of a wiki and is inconsistency-tolerant, and that can be efficiently evaluated, as it builds upon Datalog concepts. The requirement for fast response times of interactive software translates in our work to bottom-up evaluation (materialization) of rules (views) ahead of time, that is, when rules or data change, not when they are queried. Materialized views have to be updated when data or rules change. While incremental view maintenance was intensively studied in the past and literature on the subject is abundant, the existing methods have surprisingly many disadvantages: they do not provide all the information desirable for the explanation of derived information, they require evaluation of possibly substantially larger Datalog programs with negation, they recompute the whole extension of a predicate even if only a small part of it is affected by a change, and they require adaptation for handling general rule changes. A particular contribution of this dissertation consists in a set of forward chaining and reason maintenance methods with a simple declarative description that are efficient and that derive and maintain the information necessary for reason maintenance and explanation. The reasoning methods and most of the reason maintenance methods are described in terms of a set of extended immediate consequence operators whose properties are proven in the classical logic programming framework. In contrast to existing methods, the reason maintenance methods in this dissertation work by evaluating the original Datalog program (they do not introduce negation if it is not present in the input program), and only the affected part of a predicate's extension is recomputed. Moreover, our methods directly handle changes in both data and rules; a rule change does not need to be handled as a special case. A framework of support graphs, a data structure inspired by the justification graphs of classical reason maintenance, is proposed. Support graphs enable a unified description and a formal comparison of the various reasoning and reason maintenance methods, and define a notion of a derivation such that the number of derivations of an atom is always finite, even in the recursive Datalog case. A practical approach to implementing reasoning, reason maintenance, and explanation in the KiWi semantic platform is also investigated. It is shown how an implementation may benefit from using a graph database instead of, or along with, a relational database.
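    A minimal sketch, in Python, of the general idea behind forward chaining with support bookkeeping: derived facts record the body facts that supported them, which is the kind of information reason maintenance and explanation need. The tiny rule format, the ancestor example, and the naive retraction step (full recomputation rather than the dissertation's incremental methods) are illustrative assumptions, not KWRL or the actual consequence operators.

```python
# Facts are tuples like ("parent", "a", "b"); rules are (head, [body_atoms])
# with variables written as strings starting with "?".
RULES = [
    (("ancestor", "?x", "?y"), [("parent", "?x", "?y")]),
    (("ancestor", "?x", "?z"), [("parent", "?x", "?y"), ("ancestor", "?y", "?z")]),
]

def unify(pattern, fact, env):
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    env = dict(env)
    for p, f in zip(pattern[1:], fact[1:]):
        if p.startswith("?"):
            if env.get(p, f) != f:
                return None
            env[p] = f
        elif p != f:
            return None
    return env

def substitute(atom, env):
    return tuple(env.get(t, t) for t in atom)

def forward_chain(base_facts):
    """Compute the least fixpoint; record one supporting body per derived fact."""
    facts, supports = set(base_facts), {}
    changed = True
    while changed:
        changed = False
        for head, body in RULES:
            envs = [{}]
            for atom in body:
                envs = [e2 for e in envs for f in facts
                        if (e2 := unify(atom, f, e)) is not None]
            for env in envs:
                new = substitute(head, env)
                if new not in facts:
                    facts.add(new)
                    supports[new] = [substitute(a, env) for a in body]
                    changed = True
    return facts, supports

def retract(base_facts, removed):
    # Naive maintenance for illustration only: drop the base fact and recompute
    # from the remaining base facts (the dissertation's methods instead
    # recompute only the affected part of a predicate's extension).
    return forward_chain(base_facts - {removed})

base = {("parent", "a", "b"), ("parent", "b", "c")}
facts, supports = forward_chain(base)
print(("ancestor", "a", "c") in facts)       # True
print(supports[("ancestor", "a", "c")])      # one recorded support (explanation)
facts2, _ = retract(base, ("parent", "b", "c"))
print(("ancestor", "a", "c") in facts2)      # False after retraction
```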

    An efficient approach for high-fidelity modeling incorporating contour-based sampling and uncertainty

    During the design process for an aerospace vehicle, decision-makers must have an accurate understanding of how each choice will affect the vehicle and its performance. This understanding is based on experiments and, increasingly often, computer models. In general, as a computer model captures a greater number of phenomena, its results become more accurate for a broader range of problems. This improved accuracy typically comes at the cost of significantly increased computational expense per analysis. Although rapid analysis tools have been developed that are sufficient for many design efforts, those tools may not be accurate enough for revolutionary concepts subject to grueling flight conditions such as transonic or supersonic flight and extreme angles of attack. At such conditions, the simplifying assumptions of the rapid tools no longer hold. Accurate analysis of such concepts would require models that do not make those simplifying assumptions, with the corresponding increases in computational effort per analysis. As computational costs rise, exploration of the design space can become exceedingly expensive. If this expense cannot be reduced, decision-makers would be forced to choose between a thorough exploration of the design space using inaccurate models, or the analysis of a sparse set of options using accurate models. This problem is exacerbated as the number of free parameters increases, limiting the number of trades that can be investigated in a given time. In the face of limited resources, it can become critically important that only the most useful experiments be performed, which raises multiple questions: how can the most useful experiments be identified, and how can experimental results be used in the most effective manner? This research effort focuses on identifying and applying techniques which could address these questions. The demonstration problem for this effort was the modeling of a reusable booster vehicle, which would be subject to a wide range of flight conditions while returning to its launch site after staging. Contour-based sampling, an adaptive sampling technique, seeks cases that will improve the prediction accuracy of surrogate models for particular ranges of the responses of interest. In the case of the reusable booster, contour-based sampling was used to emphasize configurations with small pitching moments; the broad design space included many configurations which produced uncontrollable aerodynamic moments for at least one flight condition. By emphasizing designs that were likely to trim over the entire trajectory, contour-based sampling improves the predictive accuracy of surrogate models for such designs while minimizing the number of analyses required. The simplified models mentioned above, although less accurate for extreme flight conditions, can still be useful for analyzing performance at more common flight conditions. The simplified models may also offer insight into trends in the response behavior. Data from these simplified models can be combined with more accurate results to produce useful surrogate models with better accuracy than the simplified models but at less cost than if only expensive analyses were used. Of the data fusion techniques evaluated, Ghoreyshi cokriging was found to be the most effective for the problem at hand. Lastly, uncertainty present in the data was found to negatively affect predictive accuracy of surrogate models. Most surrogate modeling techniques neglect uncertainty in the data and treat all cases as deterministic. 
This is plausible, especially for data produced by computer analyses which are assumed to be perfectly repeatable and thus truly deterministic. However, a number of sources of uncertainty, such as solver iteration or surrogate model prediction accuracy, can introduce noise to the data. If these sources of uncertainty could be captured and incorporated when surrogate models are trained, the resulting surrogate models would be less susceptible to that noise and correspondingly have better predictive accuracy. This was accomplished in the present effort by capturing the uncertainty information via nuggets added to the Kriging model. By combining these techniques, surrogate models could be created which exhibited better predictive accuracy while selecting the most informative experiments possible. This significantly reduced the computational effort expended compared to a more standard approach using space-filling samples and data from a single source. The relative contributions of each technique were identified, and observations were made pertaining to the most effective way to apply the separate and combined methods.
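    A minimal sketch of the two ideas that can be expressed compactly: a Kriging (Gaussian-process) surrogate whose nugget term absorbs noise in the training data, and a contour-based selection rule that prefers candidate points whose predicted response lies near a target contour (here, pitching moment near zero) and whose prediction is uncertain. The kernel, nugget value and selection score are illustrative assumptions, not the dissertation's actual settings.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Toy "pitching moment" data over one design variable, with added noise.
X = rng.uniform(-1.0, 1.0, size=(20, 1))
y = 0.8 * X[:, 0] ** 3 - 0.2 * X[:, 0] + rng.normal(0.0, 0.05, size=20)

# alpha acts as a nugget: it is added to the diagonal of the kernel matrix,
# so the surrogate does not interpolate the noise exactly.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3), alpha=0.05**2)
gp.fit(X, y)

# Contour-based sampling: among candidates, favour points predicted to lie
# near the target contour (moment == 0) with large predictive uncertainty.
candidates = np.linspace(-1.0, 1.0, 201).reshape(-1, 1)
mean, std = gp.predict(candidates, return_std=True)
target = 0.0
score = std / (np.abs(mean - target) + 1e-6)   # high std, close to contour
next_x = candidates[np.argmax(score)]
print("next design point to analyse:", next_x)
```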

    Measurement and modeling of advanced coal conversion processes, Volume III

    Fusion of Remote Sensing Images and Social Media Text Messages for Building Function Classification

    Accurate data on building functions are important for local governments in order to plan resources. Satellite imagery may be too coarsely resolved to determine these functions. In this work, individual buildings from 42 cities are therefore additionally classified using multilingual tweets and fused with high-resolution aerial images. This achieves an accuracy of 75%. Image and text features appear to be complementary; the fusion can therefore improve the results.
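    A minimal sketch of late feature fusion for building-function classification: per-building text features (e.g. TF-IDF over geotagged tweets) are concatenated with image features and fed to one classifier. The feature extraction, the toy data and the function classes are hypothetical placeholders, not the paper's actual pipeline.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical per-building data: tweets aggregated per building, plus an
# image feature vector (e.g. from a CNN applied to the aerial image patch).
tweets = [
    "great pizza and coffee here",            # commercial
    "lecture starts at 9, see you in class",  # public / institutional
    "finally home, time to cook dinner",      # residential
    "new shoes on sale this weekend",         # commercial
]
image_features = np.random.default_rng(0).normal(size=(4, 16))
labels = ["commercial", "public", "residential", "commercial"]

# Text branch: TF-IDF over the aggregated tweets of each building.
text_features = TfidfVectorizer().fit_transform(tweets).toarray()

# Late fusion: concatenate the two modalities into one feature vector.
fused = np.hstack([text_features, image_features])

clf = LogisticRegression(max_iter=1000).fit(fused, labels)
print(clf.predict(fused[:1]))   # predicted function of the first building
```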

    Multimodal Approach for Big Data Analytics and Applications

    The thesis presents multimodal conceptual frameworks and their applications in improving the robustness and the performance of big data analytics through cross-modal interaction or integration. A joint interpretation of several knowledge renderings such as stream, batch, linguistics, visuals and metadata creates a unified view that can provide a more accurate and holistic approach to data analytics compared to a single standalone knowledge base. Novel approaches in the thesis involve integrating the multimodal framework with state-of-the-art computational models for big data, cloud computing, natural language processing, image processing, video processing, and contextual metadata. The integration of these disparate fields has the potential to improve computational tools and techniques dramatically. Thus, the contributions place multimodality at the forefront of big data analytics; the research aims at mapping and understanding multimodal correspondence between different modalities. The primary contribution of the thesis is the Multimodal Analytics Framework (MAF), a collaborative ensemble framework for stream and batch processing along with cues from multiple input modalities like language, visuals and metadata, combining the benefits of both low latency and high throughput. The framework is a five-step process. (1) Data ingestion: as a first step towards big data analytics, a high-velocity, fault-tolerant streaming data acquisition pipeline is proposed through a distributed big data setup, followed by mining and searching patterns in the data while it is still in transit. The data ingestion methods are demonstrated using Hadoop ecosystem tools like Kafka and Flume as sample implementations. (2) Decision making on the ingested data to use the best-fit tools and methods: in big data analytics, the primary challenges often remain in processing heterogeneous data pools with a one-method-fits-all approach. The research introduces a decision-making system to select the best-fit solutions for the incoming data stream; this is the second step towards building the data processing pipeline presented in the thesis. The decision-making system introduces a fuzzy graph-based method to provide real-time and offline decision making. (3) Lifelong incremental machine learning: in the third step, the thesis describes a lifelong learning model at the processing layer of the analytical pipeline, following the data acquisition and decision making at step two, for downstream processing. Lifelong learning iteratively increments the training model using a proposed Multi-agent Lambda Architecture (MALA), a collaborative ensemble architecture between the stream and batch data. As part of the proposed MAF, MALA is one of the primary contributions of the research. The work introduces a general-purpose and comprehensive approach to hybrid learning of batch and stream processing to achieve lifelong learning objectives. (4) Improving machine learning results through ensemble learning: as an extension of the lifelong learning model, the thesis proposes a boosting-based ensemble method as the fourth step of the framework, improving lifelong learning results by reducing the learning error in each iteration of a streaming window. The strategy is to incrementally boost the learning accuracy on each iterated mini-batch, enabling the model to accumulate knowledge faster. The base learners adapt more quickly in smaller intervals of a sliding window, improving the machine learning accuracy rate by countering concept drift. (5) Cross-modal integration between text, image, video and metadata for more comprehensive data coverage than a text-only dataset: the final contribution of this thesis is a new multimodal method where three different modalities (text, visuals, i.e. image and video, and metadata) are intertwined along with real-time and batch data for more comprehensive input data coverage than text-only data. The model is validated through a detailed case study on the contemporary and relevant topic of the COVID-19 pandemic. While the remainder of the thesis deals with text-only input, the COVID-19 case study analyses textual and visual information in an integrated way. Following completion of this research work, and as an extension to the current framework, multimodal machine learning is identified as a future research direction.
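    A minimal sketch of the incremental, mini-batch style of learning the framework describes: a linear model is updated with partial_fit on each window of a stream and evaluated prequentially (test-then-train), which is one way to track accuracy while a concept drifts. The synthetic stream and the single SGD learner are illustrative assumptions, not MALA or the thesis's boosting ensemble.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier()
classes = np.array([0, 1])

def stream(n_batches=20, batch_size=200):
    """Synthetic binary stream whose decision boundary drifts over time."""
    for t in range(n_batches):
        X = rng.normal(size=(batch_size, 2))
        drift = 0.1 * t                      # gradual concept drift
        y = (X[:, 0] + drift * X[:, 1] > 0).astype(int)
        yield X, y

for t, (X, y) in enumerate(stream()):
    if t > 0:
        # Prequential evaluation: test on the new mini-batch first ...
        print(f"window {t:2d} accuracy = {model.score(X, y):.3f}")
    # ... then train incrementally on it.
    model.partial_fit(X, y, classes=classes)
```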

    Cyclic Analysis of Laterally Loaded Pile Foundations

    A three-dimensional numerical analysis, based on the indirect boundary element method, is developed to model the cyclic behaviour of laterally loaded pile foundations embedded in cohesive soils. Phenomena observed in cyclic pile-load tests, such as gapping, backsliding and soil strength degradation effects, are accounted for in the analysis. The analysis is capable of solving one-way and two-way cyclic loading problems subjected to load-controlled and displacement-controlled conditions.
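    The cyclic degradation idea can be illustrated compactly, though the sketch below is a simplified spring (p-y style) model with an empirical degradation factor, not the thesis's indirect boundary element formulation; the exponent, stiffness, capacity and gapping rule are all hypothetical values chosen for illustration.

```python
import numpy as np

def degraded_resistance(p_static, n_cycles, t=0.1):
    """Empirical cyclic strength degradation: resistance decays as N**(-t)."""
    return p_static * n_cycles ** (-t)

def cyclic_pile_response(n_cycles, y, p_ult=100.0, k=50.0, gap=0.0):
    """One-way, displacement-controlled cycling of a single nonlinear spring.

    y   : imposed lateral displacement per cycle
    gap : accumulated gap; the soil only reacts once the gap is closed
    Returns the mobilised soil reaction for each cycle.
    """
    reactions = []
    for n in range(1, n_cycles + 1):
        p_cap = degraded_resistance(p_ult, n)      # strength degradation
        y_eff = max(y - gap, 0.0)                  # gapping / backsliding
        p = min(k * y_eff, p_cap)                  # elastic response, capped
        gap += 0.05 * y_eff                        # gap grows with each cycle
        reactions.append(p)
    return np.array(reactions)

print(cyclic_pile_response(n_cycles=5, y=1.0))
```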

    Observation of Higgs Boson Decays to WW* with Dilepton and Missing Transverse Momentum Events in the ATLAS Detector

    This thesis presents a wide range of studies, from energy reconstruction at the cell level in TileCal, through the performance of the E/T reconstruction in ATLAS, to the discovery of the Higgs boson decaying into a pair of W bosons. The reconstruction of the energy, as well as the time, in the TileCal cells is provided by the OF algorithm. These measurements are the inputs for object reconstruction algorithms which base their logic on the signal-over-noise ratio. In this light, the determination of the cell noise is crucial for an accurate event reconstruction in ATLAS. The impact of the TileCal noise constants has been evaluated through the performance of topoclusters, since they clearly allow the identification of anomalies in the TileCal noise. The results show evidence of a coherent source of noise which is not properly described by the One-Gaussian approach used so far. This effect produces larger and wider structures for topoclusters in the first ATLAS data collected during 2008 and 2009 compared with the expectations from MC. The results motivated a new description of the TileCal noise constants using a Two-Gaussian method instead. The improvement using the Two-Gaussian description reduces the number of large topoclusters by a factor of about 10 in randomly triggered ATLAS events. The better description of the noise also reduces the discrepancies between data and simulation. During these investigations, extremely energetic areas were also found in TileCal. These hot spots mainly originate from cells, probably affected by electronic damage, which poorly reconstruct the energy of the genuine signal. Bad cells producing hot spots were identified and properly treated at the detector operation level in order to ensure the quality of the TileCal reconstruction. In addition to the improvements on topoclusters, the Two-Gaussian method also leads to a better missing transverse momentum measurement. The RMS of the spectrum in data is reduced by a factor of 2 and the contribution in the tails decreases strongly, from 16% to 0.1%. All these results confirm and validate the Two-Gaussian description of the TileCal noise, which has been used for the whole Run I ATLAS data reconstruction. The E/T measurement relies on momentum conservation in the plane transverse to the beam axis. For a specific process, it measures the unbalanced transverse momentum from all the particles in the final state, so it is sensitive to the presence of undetectable particles, such as neutrinos. The results on the performance of the different E/T reconstructions developed in ATLAS are crucial, as this measurement plays an important role in many analyses and searches. In the ETmiss measurement, the energy measurements of the particles produced in the LHC collisions are taken from the ATLAS calorimetric system. The ETmiss measurement depends on the number of pile-up interactions, since their final products may also deposit energy in the calorimeters. These extra energy contributions are included in the ETmiss computation, degrading the genuine measurement. The increasing pile-up environment at the LHC during 2012 motivated investigations of new approaches for improving the E/T reconstruction in ATLAS. Two pile-up-suppressed alternatives based on track information and vertex association were developed: ETmiss,STVF and ETmiss,track, respectively. The former follows the calorimeter-based approach of the ETmiss but scales down the soft term and rejects pile-up jets. 
The latter relies on the energy measured from well-reconstructed tracks in the inner detector which are associated with the primary vertex of the event. The performance of the several approaches, based on the object information used to build the unbalanced transverse momentum, has been evaluated in terms of resolution, scale and linearity. Results show that, besides the better stability against pile-up of the ETmiss,STVF and ETmiss,track reconstructions, these approaches come with new features. For the ETmiss,STVF, the poor modelling of tracks coming from pile-up interactions produces an under-calibrated soft term in MC. This results in discrepancies between data and simulation, especially in events without jets. In events with jets, the ETmiss and ETmiss,STVF perform very similarly because the dominant component is the jet term. The ETmiss,track measurement is very robust against extra interactions since only tracks associated with the vertex of the hardest process are included. However, limited ID coverage and missing high-pT neutral particles lead to a large degradation in the ETmiss,track linearity and scale, especially in event topologies with high jet activity. Given the variety of E/T reconstructions and their behaviour depending on the event topology, the optimal measurement may differ according to the characteristics of the physics process under study. The second part of the thesis describes the strategy of the H→WW(*)→lνlν analysis and reports the results using the complete ATLAS Run I data. This corresponds to about 25 fb⁻¹ at √s = 7 and 8 TeV collected with the ATLAS detector at the LHC. The Higgs boson decaying into a pair of W bosons benefits from a larger BR compared with other final states for a wide range of the Higgs boson mass. This makes the H→WW(*)→lνlν analysis one of the most important channels for the Higgs boson search. However, this analysis suffers from high background contamination, which complicates the distinction between the Higgs boson signal and other processes that may have the same reconstructed final state. In addition, the analysis is not sensitive to the Higgs boson mass due to the presence of the two neutrinos coming from the W bosons. These two facts define the strategy of the H→WW(*)→lνlν analysis. The selection criteria should find an optimal compromise: hard enough to reject as many background contributions as possible and, at the same time, soft enough to still keep the Higgs boson signal. The analysis selects events with exactly two high-pT, well-reconstructed, oppositely charged leptons (electrons or muons) and with an E/T measurement originating from the final-state neutrinos. In order to deal with the different background contributions, the events are divided by the number of jets as well as by the flavour of the two leptons. This separation allows the selection to be adapted, since the background composition is different in each category. In general, final states with same-flavour leptons are mostly populated by Z/γ* background, while events with different-flavour leptons mainly originate from top quark processes. For the former, the analysis applies a combined requirement using several E/T reconstructions in order to further suppress Z/γ* contributions, for which no genuine E/T is expected. For the latter, jets identified by reconstruction algorithms as produced by a b quark are vetoed. 
In addition, the division by jet multiplicity also allows Higgs candidates to be distinguished as originating from the gluon-gluon fusion (zero or one jet) or vector boson fusion (at least two jets) production mechanisms. This distinction leads to a better separation of the Higgs signal from the remaining backgrounds in each case by exploiting the differences in dilepton kinematics and, when relevant, in jet-based quantities. After the full selection is applied, the transverse mass built from the dilepton system and the E/T of the Higgs candidate events is used as the final discriminant in a statistical test. Given the importance of simulating all background processes correctly, the analysis builds different control regions to check the agreement between data and MC. The differences in the control regions are also inputs to the statistical procedure. This ensures that the likelihood fit includes them properly as associated uncertainties in the final results. The first results of the H→WW(*)→lνlν analysis showed an excess of events over the expected background observed for mH = 125 GeV with a signal significance of 3.8 σ, for which the expectation is 3.7 σ. The best-fit signal strength at that mass is μ = 1.01 ± 0.31. The expected VBF signal significance at mH = 125 GeV is 1.6 σ and the observation results in 2.5 σ. The first H→WW(*)→lνlν results are consistent with the measurements from the H→γγ and H→ZZ→4l searches. All ATLAS measurements from searches for the Higgs boson decaying into boson pairs are combined, allowing an excess over the expectation to be observed with a local significance of 5 σ. After these first results, several studies focussed on optimising the selection of the H→WW(*)→lνlν analysis in order to enhance the sensitivity of the search. The final optimised results mainly benefit from the development of a new E/T reconstruction, denoted ETmiss,track,jetCorr or pTmiss. This new reconstruction is based on the ETmiss,track approach but replaces tracks by the calorimetric measurements of the objects associated with them and adds jets which are missing in the original ETmiss,track computation. Although this may create a higher dependence on pile-up, the new approach still profits from the pile-up rejection of the original track-based selection and provides a much more accurate measurement in topologies with neutral particles in the final state. The results show that the pTmiss is able to recover the resolution in events with jets while still maintaining good stability against pile-up and smaller tails in the Z→ll process. Additional investigations using event topologies with genuine E/T also point to a more reliable measurement of the expected E/T when using the pTmiss reconstruction. The strategy for optimising the E/T criteria in the H→WW(*)→lνlν analysis is based on simulated final candidate events evaluated through the statistical likelihood fit. Given that the composition and contribution of the different backgrounds depend on the final state, the E/T optimisation is evaluated in each analysis category. The different E/T measurements perform very similarly at the end of the event selection for eμ+μe final states. The low region of the spectra is almost unpopulated, since the main backgrounds, as well as the Higgs boson signal, are expected to have genuine E/T. In addition, the analysis requirements on mll and pTll, which are correlated with the E/T measurement, sculpt the E/T shapes at the end of the selection. 
Hence, there are almost no differences in the expected significance values using any of the E/T reconstructions. However, the pTmiss is preferred because of its better performance and resolution. A conservative threshold of 20 GeV is used to deal with possible mis-measurements from the multi-jet background in the H+0j and H+1j analyses. Since the Higgs boson produced via VBF is typically characterised by two emerging quarks, the E/T measurement for the Higgs signal is expected to be smaller than in the ggF production mode. In this light, the VBF strategy does not apply any threshold on the E/T measurement, since the low region of the spectrum is mainly populated by signal events. Final states with same-flavour leptons are affected by a huge Z/γ* contribution, so combining several E/T reconstructions achieves further Z/γ* rejection. In this case, when the final state contains up to one jet, the requirement is applied using the projected E/T,Rel magnitude for the ETmiss and ETmiss,track measurements. Investigations of the direction of the new pTmiss conclude that the rejection power of the original ETmiss,track is still higher. This is due to the fact that the latter tends to point towards the mismeasured jets, hence the E/T,Rel computation using ETmiss,track is particularly effective at rejecting the Z/γ* contribution. E/T,Rel computed with the ETmiss,track still provides the best significance. For the VBF-enriched analysis, however, the E/T,Rel magnitude may be biased because of the probability of randomly projecting the nominal measurement onto any reconstructed jet. This points back to the usage of the pTmiss measurement, complemented with a purely calorimeter-based ETmiss threshold. The better pTmiss performance can also be exploited to benefit other E/T-dependent quantities used in the H→WW(*)→lνlν analysis, such as the mT. Results show a better resolution of the mT measurement obtained by using the pTmiss in the computation. The usage of the pTmiss in the mT leads to a better separation between the Higgs signal and the remaining backgrounds, especially for multi-jet and non-WW diboson processes. The optimal thresholds using the pTmiss measurement increase the expected significance by 7% in the ggF-enriched analysis of the H→WW(*)→lνlν search. The introduction of the pTmiss in the transverse mass computation enhances the expected significance by 9%. For the VBF-enriched search, the overall improvement due to the optimised E/T selection reaches up to 14% in the expected significance. Finally, the optimisation of the H→WW(*)→lνlν analysis has been developed using the complete 8 TeV data sample. The main improvements rely on the introduction of more performant variables such as the pTmiss, new techniques for background estimation, and extensions of the Higgs signal phase space to enhance the sensitivity of the search. After the whole optimisation, the expected significance of the ggF production mode increases from 2.8 σ to 4.36 σ in eμ+μe final states alone. For the VBF production mode of the Higgs boson, the overall gain is up to 70%, due to the BDT technique now applied in the H+2j category. The final H→WW(*)→lνlν results using Run I ATLAS data are reported at mH = 125.36 GeV. An excess over background of 6.1 σ is observed for the H→WW(*)→lνlν analysis, for which the SM expectation is 5.8 σ. Evidence of the VBF production mode is also obtained, with a significance of 3.2 σ. 
All the measurements are consistent with SM Higgs boson expectations and establish the observation of the Higgs boson decaying to WW* in ATLAS.
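    Two quantities at the heart of the analysis can be written down compactly: the missing transverse momentum as the negative vector sum of the visible transverse momenta, and the transverse mass built from the dilepton system and the E/T. The sketch below uses the standard definitions with made-up four-momenta; it is an illustration of the formulas, not ATLAS reconstruction code.

```python
import math

def met(visible):
    """Missing transverse momentum: minus the vector sum of visible (px, py)."""
    mex = -sum(px for px, py in visible)
    mey = -sum(py for px, py in visible)
    return mex, mey, math.hypot(mex, mey)

def transverse_mass(ll_px, ll_py, ll_mass, mex, mey):
    """mT of the dilepton + ETmiss system, as commonly used in H->WW->lvlv:
    mT^2 = (ET_ll + ETmiss)^2 - |pT_ll + ETmiss_vec|^2,
    with ET_ll = sqrt(pT_ll^2 + m_ll^2)."""
    et_ll = math.hypot(math.hypot(ll_px, ll_py), ll_mass)
    etmiss = math.hypot(mex, mey)
    mt2 = (et_ll + etmiss) ** 2 - ((ll_px + mex) ** 2 + (ll_py + mey) ** 2)
    return math.sqrt(max(mt2, 0.0))

# Made-up transverse momenta (GeV) of the two leptons and one jet.
leptons = [(35.0, 10.0), (-20.0, 25.0)]
jets = [(-5.0, -40.0)]
mex, mey, etmiss = met(leptons + jets)
ll_px, ll_py = sum(px for px, _ in leptons), sum(py for _, py in leptons)
mt = transverse_mass(ll_px, ll_py, 30.0, mex, mey)
print(f"ETmiss = {etmiss:.1f} GeV, mT = {mt:.1f} GeV")
```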