Analysing Compression Techniques for In-Memory Collaborative Filtering
Following the recent trend of in-memory data processing, it is common practice to keep collaborative filtering data in main memory when generating recommendations in both academic and industrial recommender systems.
In this paper, we study the impact of integer compression techniques on in-memory collaborative filtering data in terms of space and time efficiency. Our results provide relevant observations about when and how to compress collaborative filtering data. First, we observe that, depending on the memory constraints, compression techniques may speed up or slow down the performance of state-of-the-art collaborative filtering algorithms. Second, after comparing different compression techniques, we find the Frame of Reference (FOR) technique to be the best option in terms of space and time efficiency under different memory constraints.
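The core idea behind FOR can be sketched as follows: each block of integers is stored as a reference value (the block minimum) plus small offsets that need far fewer bits. This is a minimal illustration with a simplified bit-width computation, not the authors' actual implementation.

```python
# Minimal sketch of Frame of Reference (FOR) compression for one block of
# integers (e.g. sorted item ids). Real codecs bit-pack the offsets; here we
# only compute the per-offset bit width to show where the saving comes from.

def for_compress(values):
    """Encode a block as (reference, bits_per_offset, offsets)."""
    ref = min(values)
    offsets = [v - ref for v in values]
    bits = max(1, max(offsets).bit_length())  # bits needed per offset
    return ref, bits, offsets

def for_decompress(ref, bits, offsets):
    """Recover the original values from the reference and offsets."""
    return [ref + off for off in offsets]

block = [1000, 1003, 1007, 1012, 1015]
ref, bits, offs = for_compress(block)
assert for_decompress(ref, bits, offs) == block
assert bits < 32  # offsets fit in far fewer bits than raw 32-bit ids
```

Because the offsets are bounded by the block's value range rather than the absolute ids, bit-packing them is where the space saving (and the decoding cost the paper measures) comes from.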
Recommended from our members
Logging versus Soft Updates: Asynchronous Meta-data Protection in File Systems
The UNIX Fast File System (FFS) is probably the most widely used file system for performance comparisons. However, such comparisons frequently overlook many of the performance enhancements that have been added over the past decade. In this paper, we explore the two most commonly used approaches for improving the performance of meta-data operations and recovery: logging and Soft Updates. The commercial sector has moved en masse to logging file systems, as evidenced by their presence on nearly every server platform available today: Solaris, AIX, Digital UNIX, HP-UX, Irix, and Windows NT. On all but Solaris, the default file system uses logging. In the meantime, Soft Updates holds the promise of providing stronger reliability guarantees than logging, with faster recovery and superior performance in certain boundary cases. In this paper, we explore the benefits of both Soft Updates and logging, comparing their behavior on both microbenchmarks and workload-based macrobenchmarks. We find that logging alone is not sufficient to “solve” the meta-data update problem. If synchronous semantics are required (i.e., meta-data operations are durable once the system call returns), then the logging systems cannot realize their full potential. Only when this synchronicity requirement is relaxed can logging systems approach the performance of systems like Soft Updates. Our asynchronous logging and Soft Updates systems perform comparably in most cases. While Soft Updates excels in some meta-data intensive microbenchmarks, it outperforms logging on only two of the four workloads we examined and performs less well on one.
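The synchronous-versus-asynchronous distinction the abstract draws can be illustrated with a toy write-ahead log for metadata operations. This is a hypothetical sketch, not FFS or any real file system: in the synchronous case the log record is forced to stable storage before the call returns, which is exactly the cost the paper says prevents logging systems from realizing their full potential.

```python
# Toy write-ahead log for metadata operations. With synchronous=True, every
# record is fsync'd before the call returns (durable-on-return semantics);
# with synchronous=False, records may sit in buffers, trading durability for
# speed. Record format and file layout are illustrative only.

import json
import os
import tempfile

class MetadataLog:
    def __init__(self, path, synchronous=True):
        self.f = open(path, "a+")
        self.synchronous = synchronous

    def record(self, op, **fields):
        """Append one metadata operation to the log."""
        self.f.write(json.dumps({"op": op, **fields}) + "\n")
        if self.synchronous:
            self.f.flush()
            os.fsync(self.f.fileno())  # force the record to stable storage

    def replay(self):
        """Read back all logged operations (as crash recovery would)."""
        self.f.seek(0)
        return [json.loads(line) for line in self.f]

path = os.path.join(tempfile.mkdtemp(), "meta.log")
log = MetadataLog(path, synchronous=True)
log.record("create", name="foo.txt", inode=42)
log.record("rename", src="foo.txt", dst="bar.txt")
assert [r["op"] for r in log.replay()] == ["create", "rename"]
```

Soft Updates avoids the forced write by carefully ordering in-memory block updates instead, which is why the two approaches only converge in performance once the logging system is allowed to run asynchronously.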
Active caching for recommender systems
Web users are often overwhelmed by the amount of information available while carrying out browsing and searching tasks. Recommender systems substantially reduce the information overload by suggesting a list of similar documents that users might find interesting. However, generating these ranked lists requires an enormous amount of resources, which often results in access latency. Caching frequently accessed data has been a useful technique for reducing stress on limited resources and improving response time. Traditional passive caching techniques, where the focus is on answering queries based on temporal locality or popularity, achieve a very limited performance gain. In this dissertation, we propose an ‘active caching’ technique for recommender systems as an extension of the caching model. In this approach, estimation is used to generate an answer for queries whose results are not explicitly cached, where the estimation makes use of the partial order lists cached for related queries. By answering non-cached queries along with cached queries, the active caching system acts as a form of query processor and offers substantial improvement over traditional caching methodologies. Test results for several data sets and recommendation techniques show substantial improvement in the cache hit rate, byte hit rate and CPU costs, while achieving reasonable recall rates. To improve the performance of the proposed active caching solution, a shared-neighbor similarity measure is introduced, which improves the recall rates by eliminating the dependence on monotonicity in the partial order lists. Finally, a greedy balancing cache selection policy is also proposed to select the most appropriate data objects for the cache, further improving the cache hit rate and recall.
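The active-caching idea can be sketched as follows: ranked result lists are cached per query, and a non-cached query is answered by aggregating the partial order lists of related cached queries. The relatedness test (shared query terms) and the rank-aggregation weights below are illustrative assumptions, not the dissertation's actual estimation scheme.

```python
# Minimal sketch of active caching: a cache miss is answered by estimating a
# ranking from the cached lists of overlapping queries, instead of returning
# nothing and recomputing from scratch.

from collections import defaultdict

class ActiveCache:
    def __init__(self):
        self.cache = {}  # frozenset of query terms -> ranked list of doc ids

    def put(self, terms, ranked_docs):
        self.cache[frozenset(terms)] = ranked_docs

    def get(self, terms):
        key = frozenset(terms)
        if key in self.cache:          # classic (passive) cache hit
            return self.cache[key]
        # active path: estimate from cached queries sharing at least one term
        scores = defaultdict(float)
        for q, docs in self.cache.items():
            if q & key:
                for rank, doc in enumerate(docs):
                    scores[doc] += 1.0 / (rank + 1)  # higher rank, more weight
        return [d for d, _ in sorted(scores.items(), key=lambda x: -x[1])]

cache = ActiveCache()
cache.put({"jazz"}, ["d1", "d2"])
cache.put({"blues"}, ["d2", "d3"])
assert cache.get({"jazz", "blues"})[0] == "d2"  # appears in both cached lists
```

Because the estimated answer is assembled from already-cached partial lists, a miss costs a merge rather than a full recommendation computation, which is where the hit-rate and CPU improvements come from.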
TOWARDS ARTIFICIAL NEURAL NETWORK MODEL TO DIAGNOSE THYROID PROBLEMS
Medical diagnosis can be viewed as a pattern classification problem: based on a set of input features, the goal is to classify a patient as having a particular disorder or as not having it. Thyroid hormone disorders are among the most prevalent medical problems today. In this paper, an artificial neural network approach is developed using a back-propagation algorithm to diagnose thyroid problems. It takes a number of factors as input and produces an output indicating whether a person has the disorder or is healthy. The back-propagation network is found to have high sensitivity and specificity.
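The kind of network the paper describes can be sketched as a single-hidden-layer sigmoid network trained by back-propagation. The data below is synthetic (the abstract does not list the actual input factors), with one hypothetical normalised feature whose high values indicate a disorder; the architecture and learning rate are illustrative choices.

```python
# Minimal back-propagation sketch: 1 input -> 4 sigmoid hidden units ->
# 1 sigmoid output, full-batch gradient descent on mean-squared error.
# Synthetic data: label 1 when the (hypothetical) feature exceeds 0.5.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = rng.random((64, 1))                        # hypothetical feature values
y = (X[:, 0] > 0.5).astype(float).reshape(-1, 1)

W1, b1 = rng.normal(size=(1, 4)), np.zeros(4)  # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)  # hidden -> output
lr = 1.0

for _ in range(2000):
    h = sigmoid(X @ W1 + b1)                   # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)        # output delta (MSE loss)
    d_h = (d_out @ W2.T) * h * (1 - h)         # back-propagated hidden delta
    W2 -= lr * (h.T @ d_out) / len(X)
    b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * (X.T @ d_h) / len(X)
    b1 -= lr * d_h.mean(axis=0)

out = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)  # final predictions
accuracy = float(((out > 0.5) == (y > 0.5)).mean())
```

Sensitivity and specificity, the measures the paper reports, would then be computed from this network's true-positive and true-negative rates on held-out patients.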
Handling Information Overload on Usenet: Advanced Caching Methods for News
Usenet is the name of a worldwide network of servers for group communication between people. From 1979 onwards, it has seen near-exponential growth in the amount of data transported, which has been a strain on bandwidth and storage. There has been a wide range of academic research focused on the WWW, but Usenet has been neglected; instead, Usenet's evolution has been dominated by practical solutions. This thesis describes the history of Usenet from a growth perspective, and introduces methods for collecting and analysing statistical data to test the usefulness of various caching strategies. A set of different caching strategies is proposed and examined in light of bandwidth and storage demands as well as user-perceived performance. I have shown that advanced caching methods for news offer relief for reading servers' storage and bandwidth capacity by exploiting usage patterns when fetching or pre-fetching articles the users may want to read, but they will not solve the problem of near-exponential growth nor the problems of Usenet's backbone peers.
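The usage-pattern-driven caching the thesis argues for can be sketched as a simple policy: newsgroups a site's readers visit often have their new articles pre-fetched, while others are fetched only on demand. The threshold and data structures below are illustrative, not the thesis's actual strategies.

```python
# Toy news cache: track per-group read counts as the usage pattern, cache
# on-demand fetches, and pre-fetch new articles only for groups that local
# users actually read.

from collections import Counter

class NewsCache:
    def __init__(self, prefetch_threshold=2):
        self.reads = Counter()   # per-group read counts (the usage pattern)
        self.cache = set()       # article ids held locally
        self.threshold = prefetch_threshold

    def read(self, group, article):
        """Serve a read; return True on a cache hit."""
        self.reads[group] += 1
        hit = article in self.cache
        self.cache.add(article)  # fetch on demand and keep the copy
        return hit

    def on_new_article(self, group, article):
        """Pre-fetch only for groups this server's users read often."""
        if self.reads[group] >= self.threshold:
            self.cache.add(article)

cache = NewsCache()
cache.read("comp.os.minix", "a1")
cache.read("comp.os.minix", "a2")
cache.on_new_article("comp.os.minix", "a3")    # popular group: pre-fetched
cache.on_new_article("alt.rarely.read", "b1")  # unpopular group: skipped
assert cache.read("comp.os.minix", "a3") is True
assert cache.read("alt.rarely.read", "b1") is False
```

This captures the thesis's point: hit rate improves for the groups people actually read, but the total feed volume arriving at backbone peers is untouched.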
From Frequency to Meaning: Vector Space Models of Semantics
Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.
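The first of the three matrix classes, the term-document matrix, can be sketched directly: each document becomes a column of term counts, and similarity is the cosine between columns. Raw counts are used here for brevity; the survey discusses weightings such as tf-idf.

```python
# Minimal term-document VSM: build count vectors over a shared vocabulary
# and compare documents by cosine similarity.

import math
from collections import Counter

docs = {
    "d1": "cats chase mice",
    "d2": "dogs chase cats",
    "d3": "stocks rise on earnings",
}
vocab = sorted({w for text in docs.values() for w in text.split()})

def vector(text):
    """One column of the term-document matrix: term counts over vocab."""
    counts = Counter(text.split())
    return [counts[w] for w in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

v1, v2, v3 = (vector(docs[d]) for d in ("d1", "d2", "d3"))
assert cosine(v1, v2) > cosine(v1, v3)  # shared terms -> higher similarity
```

Word-context and pair-pattern matrices follow the same pattern with different rows and columns (words against context words, and word pairs against the patterns that join them).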
Cluster based collaborative filtering with inverted indexing
Collectively, a population contains vast amounts of knowledge, and modern communication technologies increase the ease of communication. However, it is not feasible for a single person to aggregate the knowledge of thousands or millions of people and extract useful information from it. Collaborative information systems are attempts to harness the knowledge of a population and to present it in a simple, fast and fair manner. Collaborative filtering has been successfully used in domains where the information content is not easily parseable and traditional information filtering techniques are difficult to apply. Collaborative filtering works over a database of ratings for items that are rated by users. The computational complexity of these methods grows linearly with the number of customers, which can reach several million in typical commercial applications. To address this scalability concern, we have developed an efficient collaborative filtering technique by applying user clustering and using a specific inverted index structure (the so-called cluster-skipping inverted index) that is tailored for clustered environments. We show that the predictive accuracy of the system is comparable with that of collaborative filtering algorithms without clustering, whereas the efficiency is greatly improved. (Subakan, Özlem Nurcan, M.S.)
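The clustered-index idea can be sketched as follows: users are pre-grouped into clusters, and each item's posting list is partitioned by cluster, so a prediction scans only the entries of the active user's cluster and skips the rest. The clustering assignment and the averaging predictor below are simplified placeholders, not the thesis's exact structures.

```python
# Minimal sketch of collaborative filtering over a cluster-partitioned
# inverted index: item -> cluster -> [(user, rating)], so prediction touches
# only same-cluster postings.

from collections import defaultdict

ratings = {  # user -> {item: rating}
    "u1": {"i1": 5, "i2": 3},
    "u2": {"i1": 4, "i3": 2},
    "u3": {"i2": 1, "i3": 5},
}
cluster_of = {"u1": 0, "u2": 0, "u3": 1}  # assumed precomputed clustering

index = defaultdict(lambda: defaultdict(list))
for user, items in ratings.items():
    for item, r in items.items():
        index[item][cluster_of[user]].append((user, r))

def predict(user, item):
    """Average same-cluster ratings for the item; other clusters are skipped."""
    postings = index[item][cluster_of[user]]
    relevant = [r for u, r in postings if u != user]
    return sum(relevant) / len(relevant) if relevant else None

assert predict("u2", "i1") == 5.0  # only u1 shares u2's cluster and rated i1
```

The payoff is the same as in clustered text retrieval: work per prediction scales with the size of one cluster's postings rather than the whole user base, which is where the claimed efficiency gain over unclustered collaborative filtering comes from.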
Personalised online sales using web usage data mining
Practically every major company with a retail operation has its own web site and online sales facilities. This paper describes a toolset that exploits web usage data mining techniques to identify customer Internet browsing patterns. These patterns are then used to underpin a personalised product recommendation system for online sales. Within the architecture, a Kohonen neural network or self-organizing map (SOM) has been trained for use both offline, to discover user group profiles, and in real time, to examine active-user click-stream data, match it to a specific user group, and recommend a unique set of product browsing options appropriate to an individual user. Our work demonstrates that this approach can overcome the scalability problem that is common among this type of system. Our results also show that a personalised recommender system powered by the SOM predictive model is able to produce consistent recommendations.
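The SOM training the paper relies on can be sketched briefly: unit weights are pulled toward input vectors with a neighbourhood influence that shrinks over time, so similar click-stream vectors end up mapped to nearby units, which serve as user group profiles. The data, map size, and schedules below are synthetic illustrations; the toolset's real features are not given in the abstract.

```python
# Minimal 1-D self-organizing map: find the best-matching unit (BMU) for each
# input and pull it and its neighbours toward the input, with learning rate
# and neighbourhood radius decaying over epochs.

import numpy as np

rng = np.random.default_rng(1)

def train_som(data, grid=4, epochs=200, lr0=0.5, radius0=2.0):
    weights = rng.random((grid, data.shape[1]))
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)
        radius = max(radius0 * (1 - t / epochs), 0.5)
        for x in data:
            bmu = int(np.argmin(((weights - x) ** 2).sum(axis=1)))
            for j in range(grid):
                h = np.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                weights[j] += lr * h * (x - weights[j])  # pull toward input
    return weights

def assign(weights, x):
    """Map a user vector to its group profile (nearest unit)."""
    return int(np.argmin(((weights - x) ** 2).sum(axis=1)))

# two synthetic user groups with distinct browsing profiles
group_a = rng.normal(0.2, 0.05, size=(20, 3))
group_b = rng.normal(0.8, 0.05, size=(20, 3))
w = train_som(np.vstack([group_a, group_b]))
assert assign(w, group_a[0]) != assign(w, group_b[0])  # groups map apart
```

At recommendation time, the `assign` step is a single nearest-unit lookup per click-stream vector regardless of how many users were used in training, which is the source of the scalability the paper claims.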