
    Colombus: providing personalized recommendations for drifting user interests

    The query formulation process is often problematic due to the cognitive load it imposes on users. This issue is further amplified by searchers' uncertainty about their information needs and their lack of training in effective searching techniques. Moreover, given the tremendous growth of the World Wide Web, the amount of information users encounter during their daily search episodes is often overwhelming. Unfortunately, web search engines have not kept pace with the trends and advancements in this area, and real personalization features have yet to appear. As a result, keeping up to date with recent information about our personal interests is a time-consuming task. These information requirements also change over time, sliding into new topics; the rate of change can be sudden and abrupt, or more gradual. Taking all these aspects into account, we believe that an information assistant, a profile-aware tool capable of adapting to users' evolving needs and helping them keep track of their personal data, can greatly assist them in this endeavor. Gathering information from a combination of explicit and implicit feedback could allow such a system to detect users' search requirements and present additional information with the least possible effort on their part. In this paper, we describe the design, development and evaluation of Colombus, a system that aims to meet the individual needs of searchers. The system's goal is to pro-actively fetch and present relevant, high-quality documents on a regular basis. Based entirely on implicit feedback gathering, our system concentrates on detecting drifts in user interests and accommodating them effectively in user profiles with no additional interaction from their side. Current methodologies in information retrieval do not support the evaluation of such systems and techniques. Lab-based experiments can be carried out in large batches, but their accuracy is often questioned. On the other hand, user studies are much more accurate, but setting up a user base for large-scale experiments is often not feasible. We have designed a hybrid evaluation methodology that combines large sets of lab experiments based on searcher simulations with user experiments, in which fifteen searchers used the system regularly for 15 days. In the first stage, the simulation experiments aimed at tuning Colombus, while the evaluation of the various components and the gathering of results were carried out in the second stage, throughout the user study. A baseline system was also employed in order to directly compare Colombus against a current web search engine. The evaluation results illustrate that the Personalized Information Assistant is effective in capturing and satisfying users' evolving information needs and in providing additional information on their behalf
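    The abstract distinguishes sudden (abrupt) from gradual drift in user interests and describes simulation-based tuning. As a purely illustrative sketch (an assumption on my part, not the paper's actual searcher simulator), a stream of simulated interest topics with either kind of drift could be generated like this:

```python
import random

def simulate_interest_stream(n, old_topic, new_topic, drift="abrupt",
                             change_point=None, seed=0):
    """Yield one topic per simulated search episode.

    'abrupt' switches topics at change_point; 'gradual' shifts the
    probability of emitting the new topic linearly across the stream.
    """
    rng = random.Random(seed)
    change_point = change_point if change_point is not None else n // 2
    for i in range(n):
        if drift == "abrupt":
            yield new_topic if i >= change_point else old_topic
        else:  # gradual drift
            p_new = i / (n - 1) if n > 1 else 1.0
            yield new_topic if rng.random() < p_new else old_topic
```

    A drift-aware profiler can then be tested against both regimes, e.g. `simulate_interest_stream(100, "football", "travel", drift="gradual")`.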

    Performance Envelopes of Adaptive Ensemble Data Stream Classifiers

    This dissertation documents a study of the performance characteristics of algorithms designed to mitigate the effects of concept drift on online machine learning. Several supervised binary classifiers were evaluated on their performance when applied to an input data stream with a non-stationary class distribution. The selected classifiers included ensembles that combine the contributions of their member algorithms to improve overall performance. These ensembles adapt to changing class definitions, known as "concept drift," often present in real-world situations, by adjusting the relative contributions of their members. Three stream classification algorithms and three adaptive ensemble algorithms were compared to determine the capabilities of each in terms of accuracy and throughput. For each run of the experiment, the percentage of correct classifications was measured using prequential analysis, a well-established methodology in the evaluation of streaming classifiers. Throughput was measured in classifications performed per second as timed by the CPU clock. Two main experimental variables were manipulated to investigate and compare the range of accuracy and throughput exhibited by each algorithm under various conditions. The number of attributes in the instances to be classified and the speed at which the definitions of labeled data drifted were varied across six total combinations of drift speed and dimensionality. The implications of the results are used to recommend improved methods for working with stream-based data sources. The typical approach to counteract concept drift is to update the classification models with new data. In the stream paradigm, classifiers are continuously exposed to new data that may serve as representative examples of the current situation. However, updating the ensemble classifier in order to maintain or improve accuracy can be computationally costly and will negatively impact throughput. 
In a real-time system, this could lead to an unacceptable slow-down. The results of this research showed that, among several algorithms for reducing the effect of concept drift, adaptive decision trees maintained the highest accuracy without slowing down with respect to the no-drift condition. Adaptive ensemble techniques were also able to maintain reasonable accuracy in the presence of drift without much change in throughput. However, the overall throughput of the adaptive methods is low and may be unacceptable for extremely time-sensitive applications. The performance visualization methodology utilized in this study gives a clear and intuitive visual summary that allows system designers to evaluate candidate algorithms with respect to their performance needs
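    Prequential (interleaved test-then-train) analysis, mentioned above, evaluates a stream classifier by testing it on each arriving instance before training on it. A minimal sketch of the protocol, with a deliberately trivial illustrative learner (not any of the dissertation's algorithms), looks like this:

```python
class MajorityClassLearner:
    """Trivial online learner: predicts the most frequent label seen so far."""
    def __init__(self):
        self.counts = {}

    def predict(self, x):
        # Fall back to label 0 before any training data has arrived.
        if not self.counts:
            return 0
        return max(self.counts, key=self.counts.get)

    def learn(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1

def prequential_accuracy(stream, learner):
    """Test-then-train over (instance, label) pairs; return overall accuracy."""
    correct = total = 0
    for x, y in stream:
        if learner.predict(x) == y:   # test first...
            correct += 1
        learner.learn(x, y)           # ...then train on the same instance
        total += 1
    return correct / total if total else 0.0
```

    Feeding this a stream whose class distribution drifts midway (e.g. 50 instances of class 0 followed by 50 of class 1) shows why non-adaptive learners degrade: the majority-class learner keeps predicting the stale class through the second half.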

    Using contextual information to understand searching and browsing behavior

    There is great imbalance in the richness of information on the web and the succinctness and poverty of search requests of web users, making their queries only a partial description of the underlying complex information needs. Finding ways to better leverage contextual information and make search context-aware holds the promise to dramatically improve the search experience of users. We conducted a series of studies to discover, model and utilize contextual information in order to understand and improve users' searching and browsing behavior on the web. Our results capture important aspects of context under the realistic conditions of different online search services, aiming to ensure that our scientific insights and solutions transfer to the operational settings of real world applications

    A Hierarchical Temporal Memory Sequence Classifier for Streaming Data

    Real-world data streams often contain concept drift and noise. Additionally, it is often the case that, due to their very nature, these real-world data streams also include temporal dependencies between data. Classifying data streams with one or more of these characteristics is exceptionally challenging. Classification of data within data streams is currently a primary focus of research efforts in many fields (e.g., intrusion detection, data mining, machine learning). Hierarchical Temporal Memory (HTM) is a type of sequence memory that exhibits some of the predictive and anomaly detection properties of the neocortex. HTM algorithms conduct training through exposure to a stream of sensory data and are thus suited for continuous online learning. This research developed an HTM sequence classifier aimed at classifying streaming data containing concept drift, noise, and temporal dependencies. The HTM sequence classifier was fed both artificial and real-world data streams and evaluated using the prequential evaluation method. Cost measures for accuracy, CPU time, and RAM usage were calculated for each data stream and compared against a variety of modern classifiers (e.g., Accuracy Weighted Ensemble, Adaptive Random Forest, Dynamic Weighted Majority, Leverage Bagging, Online Boosting ensemble, and Very Fast Decision Tree). The HTM sequence classifier performed well when the data streams contained concept drift, noise, and temporal dependencies, but was not the most suitable of the compared classifiers when the data streams did not include temporal dependencies. Finally, this research explored the suitability of the HTM sequence classifier for detecting stalling code within evasive malware. The results were promising, as they showed the HTM sequence classifier capable of predicting coding sequences of an executable file by learning the sequence patterns of the x86 EFLAGS register. 
The HTM classifier plotted these predictions in a cardiogram-like graph for quick analysis by reverse engineers of malware. This research highlights the potential of HTM technology for application in online classification problems and the detection of evasive malware
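    When a stream has temporal dependencies, the standard sanity-check baseline in the evaluation literature (not part of the HTM work itself, and included here only as an illustrative assumption) is the "no-change" classifier, which predicts that each label equals the previous one; any sequence-aware model such as an HTM classifier should beat it:

```python
def no_change_predictions(labels, default=0):
    """Predict each label as the previous true label seen in the stream."""
    preds = []
    prev = default
    for y in labels:
        preds.append(prev)  # prediction is made before the label is revealed
        prev = y
    return preds

# On a temporally dependent stream, this naive baseline is hard to beat:
labels = [0, 0, 0, 1, 1, 1, 0, 0]
preds = no_change_predictions(labels)
accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
```

    A classifier that cannot exceed this baseline on such a stream is arguably not exploiting the temporal structure at all.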

    A Survey on Concept Drift Adaptation

    Concept drift primarily refers to an online supervised learning scenario in which the relation between the input data and the target variable changes over time. Assuming a general knowledge of supervised learning, in this paper we characterize the adaptive learning process, categorize existing strategies for handling concept drift, discuss the most representative, distinct and popular techniques and algorithms, discuss the evaluation methodology of adaptive algorithms, and present a set of illustrative applications. This introduction to concept drift adaptation presents the state-of-the-art techniques and a collection of benchmarks for researchers, industry analysts and practitioners. The survey aims at covering the different facets of concept drift in an integrated way to reflect on the existing scattered state of the art
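    One family of adaptation strategies the survey categorizes is explicit drift detection: monitor the classifier's online error rate and signal drift when it rises significantly above its best observed level. The sketch below is a simplified detector in the spirit of the well-known Drift Detection Method (DDM), not a faithful reimplementation; the warning/drift thresholds of two and three standard deviations follow the usual DDM convention:

```python
import math

class SimpleDriftDetector:
    """Flags 'warning'/'drift' when the running error rate degrades."""
    MIN_INSTANCES = 30  # DDM convention: wait for a stable estimate

    def __init__(self):
        self.n = 0
        self.errors = 0
        self.p_min = math.inf  # error rate at the best point seen
        self.s_min = math.inf  # its standard deviation

    def update(self, error):
        """error: 1 if the classifier misclassified this instance, else 0."""
        self.n += 1
        self.errors += error
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if self.n >= self.MIN_INSTANCES and p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s   # new best operating point
        if self.n < self.MIN_INSTANCES:
            return "stable"
        if p + s >= self.p_min + 3 * self.s_min:
            return "drift"
        if p + s >= self.p_min + 2 * self.s_min:
            return "warning"
        return "stable"
```

    Fed a stream of 0/1 misclassification flags, the detector stays "stable" while errors hover around their historical minimum and reports "drift" once the error rate climbs well past it, at which point the learner would typically be retrained.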

    Performance-based Seismic Design of Multi-Story Light-frame Wood Buildings using Adaptive Displacement-based Design Procedure

    Light-frame wood construction is one of the most common types of construction in North America, particularly for low-rise residential dwellings and apartment buildings. Light-frame wood buildings were found to perform well during recent earthquakes. However, past earthquake events also revealed a common deficiency in many light-frame wood buildings, namely soft first-story damage, and, in some extreme cases, pancake collapse. Many buildings have a soft first story because of an open-space floor plan used for retail or parking with minimal partition walls while the upper stories are apartment units. Typically, partition walls are considered non-structural elements; however, they add strength to the overall lateral load resisting system. When both the structural elements (prescribed by engineers) and non-structural elements (partition walls sheathed with gypsum) are considered, vertical irregularities in strength and stiffness often occur in buildings with an open floor plan in the first story. The current force-based design procedure, namely the Equivalent Lateral Force (ELF) procedure, does not explicitly consider the contribution of non-structural elements. This research (1) studied soft-story deficiency in light-frame wood buildings due to unintended stiffness and strength contributions from non-structural elements and (2) developed a strategy through the use of an adaptive displacement-based design (ADD) method in which the demand (required story shears) of the as-designed building is revised continually as the design progresses from one story to another. Nonlinear time history and incremental dynamic analyses were performed for the as-designed buildings using both ELF and ADD methods. 
The seismic performance in terms of (1) collapse probability at the Risk-targeted Maximum Considered Earthquake (MCER) level, and (2) peak median story drift ratios at various hazard levels was used to evaluate the overall performance of a soft-story building designed using both the ELF and the ADD procedure. It was observed that for a building designed using the ELF procedure, the collapse probability increased upon the inclusion of non-structural elements in the model, signaling the detrimental effects of non-structural elements due to the inability of the ELF procedure to quantify the contribution of these elements. In contrast, the ADD procedure took into account the contribution of these elements and was able to provide a structural design for which the collapse probability actually decreased upon the inclusion of non-structural elements. In addition, a parametric study was carried out to compare the differences in MCER collapse probabilities obtained using a 3D building model with biaxial ground motions and an equivalent 2D building model with uniaxial ground motion. The result of this parametric study was a factor that can be used to relate the MCER collapse probabilities between the 3D and 2D models, referred to as the 3D factor. The study confirmed that if the collapse results from both directions were used in calculating the overall collapse probability for a 2D building model, the 3D factor is 1.2 whether the building is designed for equal strengths or unequal strengths in its two lateral directions
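    The demand quantity the ADD method revises story by story is the required story shear. In force-based procedures such as ELF, the shear at any story is simply the sum of the lateral forces applied at and above that level; the tiny helper below illustrates that accumulation only (it is not an implementation of the ADD procedure, and the force values are hypothetical):

```python
def story_shears(lateral_forces_top_down):
    """Given lateral forces from roof down to the first story (same units),
    return the story shear at each level: the running sum of forces above
    and at that level."""
    shears = []
    running = 0.0
    for f in lateral_forces_top_down:
        running += f
        shears.append(running)
    return shears

# e.g. forces of 30, 20, 10 kN from roof down give story shears of
# 30, 50, 60 kN; the first story carries the full base shear, which is
# why a soft first story is the critical weak link.
```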

    Dynamic user profiles for web personalisation

    Web personalisation systems are used to enhance the user experience by providing tailor-made services based on the user's interests and preferences, which are typically stored in user profiles. For such systems to remain effective, the profiles need to be able to adapt and reflect the users' changing behaviour. In this paper, we introduce a set of methods designed to capture and track user interests and maintain dynamic user profiles within a personalisation system. User interests are represented as ontological concepts which are constructed by mapping web pages visited by a user to a reference ontology and are subsequently used to learn short-term and long-term interests. A multi-agent system facilitates and coordinates the capture, storage, management and adaptation of user interests. We propose a search system that utilises our dynamic user profile to provide a personalised search experience. We present a series of experiments that show how our system can effectively model a dynamic user profile and is capable of learning and adapting to different user browsing behaviours
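    The profile described above maintains separate short-term and long-term interests over ontology concepts. A minimal sketch of one common way to realise this, offered as an illustration rather than the paper's actual method, is to keep two exponentially decayed weight maps with different decay rates: the short-term profile forgets quickly, the long-term profile slowly. The concept names and decay values below are hypothetical:

```python
class DynamicProfile:
    """Two decayed weight maps over concepts: fast-forgetting short-term
    interests and slow-forgetting long-term interests."""
    def __init__(self, short_decay=0.5, long_decay=0.95):
        self.short = {}
        self.long = {}
        self.short_decay = short_decay
        self.long_decay = long_decay

    def visit(self, concepts):
        """Record one browsing session mapped to reference-ontology concepts."""
        for profile, decay in ((self.short, self.short_decay),
                               (self.long, self.long_decay)):
            for c in profile:
                profile[c] *= decay      # older interests fade
            for c in concepts:
                profile[c] = profile.get(c, 0.0) + 1.0  # reinforce current ones

    def top_interest(self, horizon="short"):
        weights = self.short if horizon == "short" else self.long
        return max(weights, key=weights.get) if weights else None
```

    After five visits mapped to "Football" followed by two mapped to "Travel", the short-term profile already ranks "Travel" first while the long-term profile still ranks "Football" first, which is exactly the separation of horizons the abstract describes.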