2,598 research outputs found

    Computer-language based data prefetching techniques

    Data prefetching has long been used as a technique to improve access times to persistent data. It is based on retrieving data records from persistent storage into main memory before the records are needed. Data prefetching has been applied to a wide variety of persistent storage systems, from file systems to Relational Database Management Systems and NoSQL databases, with the aim of reducing access times to the data maintained by the system and thus improving the execution times of the applications using this data. However, most existing solutions to data prefetching have been based on information that can be retrieved from the storage system itself, whether in the form of heuristics based on the data schema or data access patterns detected by monitoring access to the system. These approaches have multiple disadvantages in terms of the rigidity of the heuristics they use, the accuracy of the predictions they make and/or the time they need to make these predictions, a process often performed while the applications are accessing the data and therefore causing considerable overhead. In light of the above, this thesis proposes two novel approaches to data prefetching based on predictions made by analyzing the instructions and statements of the computer languages used to access persistent data. The proposed approaches take into consideration how the data is accessed by the higher-level applications, make accurate predictions and are performed without causing any additional overhead. The first of the proposed approaches analyzes instructions of applications written in object-oriented languages in order to prefetch data from Persistent Object Stores. The approach is based on static code analysis performed prior to the application execution and hence adds no overhead. It also includes various strategies to deal with cases that require runtime information unavailable prior to the execution of the application. We integrate this analysis approach into an existing Persistent Object Store and run a series of extensive experiments to measure the improvement obtained by prefetching the objects predicted by the approach. The second approach analyzes statements and historic logs of the declarative query language SPARQL in order to prefetch data from RDF Triplestores. The approach measures two types of similarity between SPARQL queries in order to detect recurring query patterns in the historic logs. It then uses the detected patterns to predict subsequent queries and launch them before they are requested, prefetching the data they need. Our evaluation of the proposed approach shows that it makes high-accuracy predictions and can achieve a high cache hit rate when caching the results of the predicted queries.
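
    As a rough illustration of the second approach, the sketch below detects recurring patterns in a SPARQL query log and predicts the query likely to follow the current one. The Jaccard similarity over crudely extracted triple patterns, the 0.8 threshold and the helper names are illustrative assumptions, not the similarity measures or implementation used in the thesis.

```python
import re
from collections import defaultdict

def triple_patterns(query):
    """Crudely extract triple patterns from a SPARQL WHERE clause.
    (Illustrative only; a real implementation would use a SPARQL parser.)"""
    body = re.search(r"\{(.*)\}", query, re.S)
    if not body:
        return frozenset()
    return frozenset(p.strip() for p in body.group(1).split(".") if p.strip())

def similarity(q1, q2):
    """Jaccard similarity over triple patterns: a stand-in for the two
    similarity measures described in the thesis."""
    a, b = triple_patterns(q1), triple_patterns(q2)
    return len(a & b) / len(a | b) if a | b else 0.0

def predict_next(current_query, historic_log, threshold=0.8):
    """Predict which query is most likely to follow `current_query`, based on
    what followed similar queries in the historic log."""
    followers = defaultdict(int)
    for prev, nxt in zip(historic_log, historic_log[1:]):
        if similarity(prev, current_query) >= threshold:
            followers[nxt] += 1
    return max(followers, key=followers.get) if followers else None
```

    In such a setup, the predicted query could be launched against the triplestore ahead of time and its results cached, so that the client's actual request can be answered from the cache.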

    User-oriented recommender systems in retail

    User satisfaction is considered a key objective for all service provider platforms, regardless of the nature of the service, encompassing domains such as media, entertainment, retail, and information. While the goal of satisfying users is the same across different domains and services, considering domain-specific characteristics is of paramount importance to ensure users have a positive experience with a given system. User interaction data with a system is one of the main sources of data that facilitates achieving this goal. In this thesis, we investigate how to learn from domain-specific user interactions. We focus on recommendation as our main task, and retail as our main domain. We further explore the finance domain and the demand forecasting task as additional directions to understand whether our methodology and findings generalize to other tasks and domains. The research in this thesis is organized around the following dimensions: 1) Characteristics of multi-channel retail: we consider a retail setting where interaction data comes from both digital (i.e., online) and in-store (i.e., offline) shopping; 2) From user behavior to recommendation: we conduct extensive descriptive studies on user interaction log datasets that inform the design of recommender systems in two domains, retail and finance. Our key contributions in characterizing multi-channel retail are two-fold. First, we propose a neural model that makes use of sales in multiple shopping channels in order to improve the performance of demand forecasting in a target channel. Second, we provide the first study of user behavior in a multi-channel retail setting, which results in insights about the channel-specific properties of user behavior, and their effects on the performance of recommender systems. We make three main contributions in designing user-oriented recommender systems. First, we provide a large-scale user behavior study in the finance domain, targeted at understanding financial information seeking behavior in user interactions with company filings. We then propose domain-specific user-oriented filing recommender systems that are informed by the findings of the user behavior analysis. Second, we analyze repurchasing behavior in retail, specifically in the grocery shopping domain. We then propose a repeat consumption-aware neural recommender for this domain. Third, we focus on scalable recommendation in retail and propose an efficient recommender system that explicitly models users' personal preferences that are reflected in their purchasing history.
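
    To make the repeat-consumption idea concrete, here is a minimal sketch that ranks grocery items for a single user by a recency-weighted purchase count. The frequency/recency heuristic, the decay parameter and the function names are illustrative assumptions; the thesis proposes a neural model, not this heuristic.

```python
from collections import Counter
from math import exp

def recommend(user_history, current_time, k=5, decay=0.01):
    """Rank items by how often and how recently the user bought them.
    `user_history` is a list of (item_id, purchase_timestamp) pairs.
    Illustrative stand-in for modelling repeat consumption."""
    scores = Counter()
    for item, ts in user_history:
        scores[item] += exp(-decay * (current_time - ts))  # recency-weighted count
    return [item for item, _ in scores.most_common(k)]

# Example: a grocery purchase history with day-granularity timestamps.
history = [("milk", 1), ("bread", 2), ("milk", 8), ("eggs", 9), ("milk", 15)]
print(recommend(history, current_time=16))  # 'milk' ranks first
```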

    Imputing or smoothing? Modelling the missing online customer journey transitions for purchase prediction

    Online customer journeys are at the core of e-commerce systems, and it is therefore important to model and understand this online customer behaviour. Clickstream data from online journeys can be modelled using Markov Chains. This study investigates two different approaches to handling missing transition probabilities when constructing Markov Chain models for purchase prediction. Imputing the transition probabilities using the Chapman-Kolmogorov (CK) equation addresses this issue and achieves high prediction accuracy by approximating them with the one-step-ahead probability. However, it comes with a high computational burden, and some probabilities remain zero after imputation. An alternative approach is to smooth the transition probabilities using Bayesian techniques. This ensures non-zero probabilities, but the approach has been criticized as less accurate than the CK method, although this claim has not been fully evaluated in the literature on realistic, commercial data. We compare the purchase prediction accuracy of the CK and Bayesian methods and evaluate them on commercial web server data from a major European airline.
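
    A minimal sketch of the two strategies, assuming a first-order Markov chain over page states: `bayesian_smoothing` adds a Dirichlet/Laplace pseudo-count so that no transition probability is zero, while `ck_imputation` fills missing one-step probabilities with the corresponding entry of P squared, in the spirit of the Chapman-Kolmogorov approach. The state set, pseudo-count value and toy sessions are illustrative, not the paper's setup.

```python
import numpy as np

def transition_counts(sessions, states):
    """Count page-to-page transitions over a list of clickstream sessions."""
    idx = {s: i for i, s in enumerate(states)}
    C = np.zeros((len(states), len(states)))
    for session in sessions:
        for a, b in zip(session, session[1:]):
            C[idx[a], idx[b]] += 1
    return C

def bayesian_smoothing(C, alpha=1.0):
    """Dirichlet/Laplace smoothing: add a pseudo-count so that no transition
    probability is exactly zero."""
    S = C + alpha
    return S / S.sum(axis=1, keepdims=True)

def ck_imputation(C):
    """Replace missing one-step probabilities with the corresponding entry of
    P @ P, in the spirit of the Chapman-Kolmogorov approach (rows are not
    renormalised in this simple sketch, and some entries may remain zero)."""
    rows = C.sum(axis=1, keepdims=True)
    P = np.divide(C, rows, out=np.zeros_like(C), where=rows > 0)
    return np.where(P == 0, P @ P, P)

states = ["home", "product", "cart", "purchase"]
sessions = [["home", "product", "cart", "purchase"],
            ["home", "product", "home"],
            ["product", "cart"]]
C = transition_counts(sessions, states)
print(bayesian_smoothing(C)[states.index("cart"), states.index("purchase")])  # 0.4 with alpha=1
print(ck_imputation(C)[states.index("home"), states.index("cart")])  # 0 under MLE, 2/3 after CK imputation
```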

    Web Mining for Web Personalization

    Web personalization is the process of customizing a Web site to the needs of specific users, taking advantage of the knowledge acquired from the analysis of the user's navigational behavior (usage data) in correlation with other information collected in the Web context, namely, structure, content, and user profile data. Due to the explosive growth of the Web, the domain of Web personalization has gained great momentum both in the research and commercial areas. In this article we present a survey of the use of Web mining for Web personalization. More specifically, we introduce the modules that comprise a Web personalization system, emphasizing the Web usage mining module. A review of the most common methods that are used as well as technical issues that occur is given, along with a brief overview of the most popular tools and applications available from software vendors. Moreover, the most important research initiatives in the Web usage mining and personalization areas are presented.
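
    A common preprocessing step in the Web usage mining module is sessionization of the clickstream. The sketch below groups click events into sessions with the widely used 30-minute inactivity heuristic; the event tuple layout and the timeout value are assumptions for illustration, not taken from the article.

```python
from datetime import timedelta

def sessionize(events, timeout=timedelta(minutes=30)):
    """Group (user_id, timestamp, url) click events into sessions using the
    common 30-minute inactivity heuristic from web usage mining."""
    events = sorted(events, key=lambda e: (e[0], e[1]))
    sessions, current, last = [], [], None
    for user, ts, url in events:
        # Close the current session on a user change or a long gap.
        if current and (user != last[0] or ts - last[1] > timeout):
            sessions.append(current)
            current = []
        current.append((user, ts, url))
        last = (user, ts)
    if current:
        sessions.append(current)
    return sessions
```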

    Instruction prefetching techniques for ultra low-power multicore architectures

    As the gap between processor and memory speeds increases, memory latencies have become a critical bottleneck for computing performance. To reduce this bottleneck, designers have been working on techniques to hide these latencies. On the other hand, the design of embedded processors typically targets low cost and low power consumption, so techniques that can satisfy these constraints are more desirable for embedded domains. While out-of-order execution, aggressive speculation, and complex branch prediction algorithms can help hide the memory access latency in high-performance systems, they cost a heavy power budget and are not suitable for embedded systems. Prefetching is another popular method for hiding the memory access latency and has been studied extensively for high-performance processors. For embedded processors with strict power requirements, however, the application of complex prefetching techniques is greatly limited, and a low-power, low-energy solution is desired in this context. In this work, we focus on instruction prefetching for ultra-low-power processing architectures and aim to reduce the energy overhead of this operation by proposing a combination of simple, low-cost, and energy-efficient prefetching techniques. We study a wide range of applications, from cryptography to computer vision, and show that our proposed mechanisms can effectively improve the hit rate of almost all of them to above 95%, achieving an average performance improvement of more than 2X. Moreover, by synthesizing our designs using state-of-the-art technologies, we show that the prefetchers increase the system's power consumption by less than 15% and the total silicon area by less than 1%. Altogether, a total energy reduction of 1.9X is achieved thanks to the proposed schemes, enabling significantly longer battery life.
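
    As an illustration of how even a simple, low-cost scheme can raise the instruction hit rate, the sketch below simulates a tiny direct-mapped instruction cache over an address trace, with and without a next-line prefetcher. The cache geometry, the trace and the next-line policy are illustrative assumptions and not the mechanisms proposed in this work.

```python
def simulate(trace, cache_lines=64, line_size=16, prefetch_next=False):
    """Very small direct-mapped I-cache model over an instruction address
    trace. With `prefetch_next`, each miss also installs the following line
    (a next-line prefetcher, one of the simple low-cost schemes; not
    necessarily the mechanism proposed in this work)."""
    tags = [None] * cache_lines
    hits = 0
    for addr in trace:
        line = addr // line_size
        idx, tag = line % cache_lines, line // cache_lines
        if tags[idx] == tag:
            hits += 1
        else:
            tags[idx] = tag
            if prefetch_next:  # install the sequentially next line as well
                nxt = line + 1
                tags[nxt % cache_lines] = nxt // cache_lines
    return hits / len(trace)

# A mostly sequential trace with a short loop, typical of small embedded kernels.
trace = list(range(0, 4096, 4)) + list(range(0, 256, 4)) * 10
print(simulate(trace), simulate(trace, prefetch_next=True))  # hit rate without vs. with next-line prefetching
```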

    A Work-Pattern Centric Approach to Building a Personal Knowledge Advantage Machine

    A work pattern, also known as a usage pattern, can be broadly defined as the methods by which a user typically utilizes a particular system. Data mining has been applied to web usage patterns for a variety of purposes. This thesis presents a framework by which data mining techniques could be used to extract patterns from an individual's work flow data in order to facilitate a new type of architecture known as a knowledge advantage machine. This knowledge advantage machine is a type of semantic desktop and semantic web application that would assist people in constructing their own personal knowledge networks, as well as sharing that information in an efficient manner with colleagues using the same system. A knowledge advantage machine would be capable of automatically discovering new knowledge that is relevant to the user's personal ontology. Through experimentation, we demonstrate that a user's file usage patterns can be utilized by software in order to automatically and seamlessly learn what is important as defined by the user. Further research is necessary to apply this principle to a more fully realized knowledge advantage machine such that decisions can be fueled by work patterns as well as semantic or contextual information.
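
    As a toy example of mining work patterns from file-usage data, the sketch below counts which file tends to be opened after which and uses those counts to suggest likely next files. The log format, the bigram counting and the function names are illustrative assumptions, not the framework described in the thesis.

```python
from collections import defaultdict

def learn_patterns(access_log):
    """Count which file tends to be opened after which, from a chronological
    file-access log (a simple stand-in for work-pattern mining)."""
    follows = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(access_log, access_log[1:]):
        follows[prev][nxt] += 1
    return follows

def suggest(follows, current_file, k=3):
    """Suggest the k files most often used right after `current_file`."""
    ranked = sorted(follows[current_file].items(), key=lambda kv: -kv[1])
    return [f for f, _ in ranked[:k]]

log = ["notes.md", "paper.tex", "refs.bib", "paper.tex", "refs.bib", "figures.py"]
patterns = learn_patterns(log)
print(suggest(patterns, "paper.tex"))  # ['refs.bib']
```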

    Web page access prediction using hierarchical clustering based on modified Levenshtein distance and higher-order Markov model

    Web page access prediction is a challenging task that has drawn the attention of many researchers. Prediction requires keeping track of historical data in order to analyze the usage behavior of users. The usage behavior of a user can be analyzed from the web log file of a specific website by observing the user's navigation patterns. This approach requires identifying user sessions, grouping similar sessions into clusters, and developing a prediction model that uses the current and earlier accesses. Most previous works in this field have used the K-Means clustering technique with Euclidean distance. The drawbacks of K-Means are that deciding on the number of clusters and choosing the initial random centers are difficult, and that the order of page visits is not considered. The proposed research work uses a hierarchical clustering technique with a modified Levenshtein distance, PageRank based on access time length and frequency, and a higher-order Markov model for prediction. Experimental results show that the proposed approach gives better prediction accuracy than the existing techniques.
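
    A minimal sketch of the clustering step: sessions are treated as sequences of page IDs, pairwise edit distances are computed, and SciPy's agglomerative (hierarchical) clustering groups similar navigation paths. The textbook Levenshtein distance and average linkage here stand in for the modified distance and the exact clustering configuration used in the paper.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def levenshtein(s, t):
    """Standard edit distance between two page-visit sequences (the paper
    uses a modified variant; this is the textbook version)."""
    d = np.zeros((len(s) + 1, len(t) + 1), dtype=int)
    d[:, 0] = np.arange(len(s) + 1)
    d[0, :] = np.arange(len(t) + 1)
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1, d[i, j - 1] + 1, d[i - 1, j - 1] + cost)
    return d[len(s), len(t)]

# Sessions as sequences of page IDs.
sessions = [["P1", "P2", "P3"], ["P1", "P2", "P4"], ["P5", "P6"], ["P5", "P6", "P7"]]
n = len(sessions)
dist = np.array([[levenshtein(sessions[i], sessions[j]) for j in range(n)] for i in range(n)], dtype=float)
clusters = fcluster(linkage(squareform(dist), method="average"), t=2, criterion="maxclust")
print(clusters)  # the first two sessions form one cluster, the last two another
```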

    Modeling usage of an online research community

    Although online communities have been thought of as a new way to collaborate across geographic boundaries in the scientific world, they have difficulty attracting people to keep visiting. The main purpose of this study is to understand how people behave in such communities, and to build and evaluate tools to stimulate engagement in a research community. These tools were designed based on a research framework of factors that influence online participation and relationship development. There are two main objectives for people to join an online community: information sharing and interpersonal relationship development, for example with friends or colleagues. The tools designed in this study serve both information sharing and relationship development needs. The awareness tool is designed to increase the sense of community and the degree of social presence of members in the community. The recommender system is designed to help provide higher-quality and personalized information to community members. It also helps to match community members into subgroups based on their interests. The designed tools were implemented at a field site - the Asynchronous Learning Networks (ALN) Research community. A longitudinal field study was used to evaluate the effectiveness of the designed tools. This research explored people's behavior inside a research community by analyzing web server logs. The results show that although there are not many interactions in the community space, the WebCenter has been visited extensively by its members. There are over 2,000 hits per day on average and over 5,000 article accesses during the observation period. This research also provided a framework to identify factors that affect people's engagement in an online community. The research framework was tested using the PLS modeling method with online survey responses. The results show that perceived usefulness plays a very significant role in members' intention to continue using the system and in their perceived preliminary networking. The results also show that the quality of the content of the system is a strong indicator of both the perceived usefulness of the community space and the perceived ease of use of the community system. Perceived ease of use did not show a strong correlation with intention to continue use, which is consistent with other studies of the Technology Acceptance Model (TAM). For the ALN research community, this online community helps its members to broaden their contacts, improve the quality and quantity of their research, and increase the dissemination of knowledge among community members.