Analysing Compression Techniques for In-Memory Collaborative Filtering
Following the recent trend of in-memory data processing, it is common practice to keep collaborative filtering data in main memory when generating recommendations in both academic and industrial recommender systems.
In this paper, we study the impact of integer compression techniques on in-memory collaborative filtering data in terms of space and time efficiency. Our results provide relevant observations about when and how to compress collaborative filtering data. First, we observe that, depending on the memory constraints, compression techniques may speed up or slow down the performance of state-of-the-art collaborative filtering algorithms. Second, after comparing different compression techniques, we find the Frame of Reference (FOR) technique to be the best option in terms of space and time efficiency under different memory constraints.
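The core idea behind FOR can be sketched as follows: each block of integers is stored as a reference value (the block minimum) plus small offsets that need far fewer bits. This is a minimal illustration with a simplified bit-width computation, not the authors' actual implementation.

```python
# Minimal sketch of Frame of Reference (FOR) compression for one block of
# integers (e.g. sorted item ids). Real codecs bit-pack the offsets; here we
# only compute the per-offset bit width to show where the saving comes from.

def for_compress(values):
    """Encode a block as (reference, bits_per_offset, offsets)."""
    ref = min(values)
    offsets = [v - ref for v in values]
    bits = max(1, max(offsets).bit_length())  # bits needed per offset
    return ref, bits, offsets

def for_decompress(ref, bits, offsets):
    """Recover the original values from the reference and offsets."""
    return [ref + off for off in offsets]

block = [1000, 1003, 1007, 1012, 1015]
ref, bits, offs = for_compress(block)
assert for_decompress(ref, bits, offs) == block
assert bits < 32  # offsets fit in far fewer bits than raw 32-bit ids
```

Because the offsets are bounded by the block's value range rather than the absolute ids, bit-packing them is where the space saving (and the decoding cost the paper measures) comes from.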
Recommended from our members
Logging versus Soft Updates: Asynchronous Meta-data Protection in File Systems
The UNIX Fast File System (FFS) is probably the most widely used file system for performance comparisons. However, such comparisons frequently overlook many of the performance enhancements that have been added over the past decade. In this paper, we explore the two most commonly used approaches for improving the performance of meta-data operations and recovery: logging and Soft Updates. The commercial sector has moved en masse to logging file systems, as evidenced by their presence on nearly every server platform available today: Solaris, AIX, Digital UNIX, HP-UX, Irix, and Windows NT. On all but Solaris, the default file system uses logging. In the meantime, Soft Updates holds the promise of providing stronger reliability guarantees than logging, with faster recovery and superior performance in certain boundary cases. In this paper, we explore the benefits of both Soft Updates and logging, comparing their behavior on both microbenchmarks and workload-based macrobenchmarks. We find that logging alone is not sufficient to “solve” the meta-data update problem. If synchronous semantics are required (i.e., meta-data operations are durable once the system call returns), then the logging systems cannot realize their full potential. Only when this synchronicity requirement is relaxed can logging systems approach the performance of systems like Soft Updates. Our asynchronous logging and Soft Updates systems perform comparably in most cases. While Soft Updates excels in some meta-data intensive microbenchmarks, it outperforms logging on only two of the four workloads we examined and performs less well on one.
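The synchronous-versus-asynchronous distinction the abstract draws can be illustrated with a toy write-ahead log for metadata operations. This is a hypothetical sketch, not FFS or any real file system: in the synchronous case the log record is forced to stable storage before the call returns, which is exactly the cost the paper says prevents logging systems from realizing their full potential.

```python
# Toy write-ahead log for metadata operations. With synchronous=True, every
# record is fsync'd before the call returns (durable-on-return semantics);
# with synchronous=False, records may sit in buffers, trading durability for
# speed. Record format and file layout are illustrative only.

import json
import os
import tempfile

class MetadataLog:
    def __init__(self, path, synchronous=True):
        self.f = open(path, "a+")
        self.synchronous = synchronous

    def record(self, op, **fields):
        """Append one metadata operation to the log."""
        self.f.write(json.dumps({"op": op, **fields}) + "\n")
        if self.synchronous:
            self.f.flush()
            os.fsync(self.f.fileno())  # force the record to stable storage

    def replay(self):
        """Read back all logged operations (as crash recovery would)."""
        self.f.seek(0)
        return [json.loads(line) for line in self.f]

path = os.path.join(tempfile.mkdtemp(), "meta.log")
log = MetadataLog(path, synchronous=True)
log.record("create", name="foo.txt", inode=42)
log.record("rename", src="foo.txt", dst="bar.txt")
assert [r["op"] for r in log.replay()] == ["create", "rename"]
```

Soft Updates avoids the forced write by carefully ordering in-memory block updates instead, which is why the two approaches only converge in performance once the logging system is allowed to run asynchronously.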
Active caching for recommender systems
Web users are often overwhelmed by the amount of information available while carrying out browsing and searching tasks. Recommender systems substantially reduce the information overload by suggesting a list of similar documents that users might find interesting. However, generating these ranked lists requires an enormous amount of resources, which often results in access latency. Caching frequently accessed data has been a useful technique for reducing stress on limited resources and improving response time. Traditional passive caching techniques, where the focus is on answering queries based on temporal locality or popularity, achieve a very limited performance gain. In this dissertation, we propose an ‘active caching’ technique for recommender systems as an extension of the caching model. In this approach, estimation is used to generate an answer for queries whose results are not explicitly cached, where the estimation makes use of the partial order lists cached for related queries. By answering non-cached queries along with cached queries, the active caching system acts as a form of query processor and offers substantial improvement over traditional caching methodologies. Test results for several data sets and recommendation techniques show substantial improvement in the cache hit rate, byte hit rate and CPU costs, while achieving reasonable recall rates. To improve the performance of the proposed active caching solution, a shared-neighbor similarity measure is introduced, which improves the recall rates by eliminating the dependence on monotonicity in the partial order lists. Finally, a greedy balancing cache selection policy is also proposed to select the most appropriate data objects for the cache, further improving the cache hit rate and recall.
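The active-caching idea can be sketched as follows: ranked result lists are cached per query, and a non-cached query is answered by aggregating the partial order lists of related cached queries. The relatedness test (shared query terms) and the rank-aggregation weights below are illustrative assumptions, not the dissertation's actual estimation scheme.

```python
# Minimal sketch of active caching: a cache miss is answered by estimating a
# ranking from the cached lists of overlapping queries, instead of returning
# nothing and recomputing from scratch.

from collections import defaultdict

class ActiveCache:
    def __init__(self):
        self.cache = {}  # frozenset of query terms -> ranked list of doc ids

    def put(self, terms, ranked_docs):
        self.cache[frozenset(terms)] = ranked_docs

    def get(self, terms):
        key = frozenset(terms)
        if key in self.cache:          # classic (passive) cache hit
            return self.cache[key]
        # active path: estimate from cached queries sharing at least one term
        scores = defaultdict(float)
        for q, docs in self.cache.items():
            if q & key:
                for rank, doc in enumerate(docs):
                    scores[doc] += 1.0 / (rank + 1)  # higher rank, more weight
        return [d for d, _ in sorted(scores.items(), key=lambda x: -x[1])]

cache = ActiveCache()
cache.put({"jazz"}, ["d1", "d2"])
cache.put({"blues"}, ["d2", "d3"])
assert cache.get({"jazz", "blues"})[0] == "d2"  # appears in both cached lists
```

Because the estimated answer is assembled from already-cached partial lists, a miss costs a merge rather than a full recommendation computation, which is where the hit-rate and CPU improvements come from.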
TOWARDS ARTIFICIAL NEURAL NETWORK MODEL TO DIAGNOSE THYROID PROBLEMS
Medical diagnosis can be viewed as a pattern classification problem: based on a set of input features, the goal is to classify a patient as having a particular disorder or as not having it. Thyroid hormone disorders are among the most prevalent medical problems today. In this paper, an artificial neural network approach is developed using a back-propagation algorithm to diagnose thyroid problems. It takes a number of factors as input and produces an output indicating whether a person has the disorder or is healthy. The back-propagation network is found to have high sensitivity and specificity.
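The kind of network the paper describes can be sketched as a single-hidden-layer sigmoid network trained by back-propagation. The data below is synthetic (the abstract does not list the actual input factors), with one hypothetical normalised feature whose high values indicate a disorder; the architecture and learning rate are illustrative choices.

```python
# Minimal back-propagation sketch: 1 input -> 4 sigmoid hidden units ->
# 1 sigmoid output, full-batch gradient descent on mean-squared error.
# Synthetic data: label 1 when the (hypothetical) feature exceeds 0.5.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = rng.random((64, 1))                        # hypothetical feature values
y = (X[:, 0] > 0.5).astype(float).reshape(-1, 1)

W1, b1 = rng.normal(size=(1, 4)), np.zeros(4)  # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)  # hidden -> output
lr = 1.0

for _ in range(2000):
    h = sigmoid(X @ W1 + b1)                   # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)        # output delta (MSE loss)
    d_h = (d_out @ W2.T) * h * (1 - h)         # back-propagated hidden delta
    W2 -= lr * (h.T @ d_out) / len(X)
    b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * (X.T @ d_h) / len(X)
    b1 -= lr * d_h.mean(axis=0)

out = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)  # final predictions
accuracy = float(((out > 0.5) == (y > 0.5)).mean())
```

Sensitivity and specificity, the measures the paper reports, would then be computed from this network's true-positive and true-negative rates on held-out patients.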
Handling Information Overload on Usenet: Advanced Caching Methods for News
Usenet is the name of a worldwide network of servers for group communication between people. From 1979 onwards, it has seen near-exponential growth in the amount of data transported, which has been a strain on bandwidth and storage. There has been a wide range of academic research focused on the WWW, but Usenet has been neglected; instead, Usenet's evolution has been dominated by practical solutions. This thesis describes the history of Usenet from a growth perspective, and introduces methods for collecting and analysing statistical data to test the usefulness of various caching strategies. A set of different caching strategies is proposed and examined in light of bandwidth and storage demands as well as user-perceived performance. I have shown that advanced caching methods for news offer relief for reading servers' storage and bandwidth capacity by exploiting usage patterns when fetching or pre-fetching articles the users may want to read, but they will not solve the problem of near-exponential growth nor the problems of Usenet's backbone peers.
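The usage-pattern-driven caching the thesis argues for can be sketched as a simple policy: newsgroups a site's readers visit often have their new articles pre-fetched, while others are fetched only on demand. The threshold and data structures below are illustrative, not the thesis's actual strategies.

```python
# Toy news cache: track per-group read counts as the usage pattern, cache
# on-demand fetches, and pre-fetch new articles only for groups that local
# users actually read.

from collections import Counter

class NewsCache:
    def __init__(self, prefetch_threshold=2):
        self.reads = Counter()   # per-group read counts (the usage pattern)
        self.cache = set()       # article ids held locally
        self.threshold = prefetch_threshold

    def read(self, group, article):
        """Serve a read; return True on a cache hit."""
        self.reads[group] += 1
        hit = article in self.cache
        self.cache.add(article)  # fetch on demand and keep the copy
        return hit

    def on_new_article(self, group, article):
        """Pre-fetch only for groups this server's users read often."""
        if self.reads[group] >= self.threshold:
            self.cache.add(article)

cache = NewsCache()
cache.read("comp.os.minix", "a1")
cache.read("comp.os.minix", "a2")
cache.on_new_article("comp.os.minix", "a3")    # popular group: pre-fetched
cache.on_new_article("alt.rarely.read", "b1")  # unpopular group: skipped
assert cache.read("comp.os.minix", "a3") is True
assert cache.read("alt.rarely.read", "b1") is False
```

This captures the thesis's point: hit rate improves for the groups people actually read, but the total feed volume arriving at backbone peers is untouched.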
From Frequency to Meaning: Vector Space Models of Semantics
Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.
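The first of the three matrix classes, the term-document matrix, can be sketched directly: each document becomes a column of term counts, and similarity is the cosine between columns. Raw counts are used here for brevity; the survey discusses weightings such as tf-idf.

```python
# Minimal term-document VSM: build count vectors over a shared vocabulary
# and compare documents by cosine similarity.

import math
from collections import Counter

docs = {
    "d1": "cats chase mice",
    "d2": "dogs chase cats",
    "d3": "stocks rise on earnings",
}
vocab = sorted({w for text in docs.values() for w in text.split()})

def vector(text):
    """One column of the term-document matrix: term counts over vocab."""
    counts = Counter(text.split())
    return [counts[w] for w in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

v1, v2, v3 = (vector(docs[d]) for d in ("d1", "d2", "d3"))
assert cosine(v1, v2) > cosine(v1, v3)  # shared terms -> higher similarity
```

Word-context and pair-pattern matrices follow the same pattern with different rows and columns (words against context words, and word pairs against the patterns that join them).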
Cluster based collaborative filtering with inverted indexing
Collectively, a population contains vast amounts of knowledge, and modern communication technologies increase the ease of communication. However, it is not feasible for a single person to aggregate the knowledge of thousands or millions of people and extract useful information from it. Collaborative information systems are attempts to harness the knowledge of a population and to present it in a simple, fast and fair manner. Collaborative filtering has been successfully used in domains where the information content is not easily parseable and traditional information filtering techniques are difficult to apply. Collaborative filtering works over a database of ratings for items that are rated by users. The computational complexity of these methods grows linearly with the number of customers, which can reach several million in typical commercial applications. To address this scalability concern, we have developed an efficient collaborative filtering technique by applying user clustering and using a specific inverted index structure (the so-called cluster-skipping inverted index) that is tailored for clustered environments. We show that the predictive accuracy of the system is comparable with that of collaborative filtering algorithms without clustering, whereas the efficiency is greatly improved. (Subakan, Özlem Nurcan, M.S.)
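The clustered-index idea can be sketched as follows: users are pre-grouped into clusters, and each item's posting list is partitioned by cluster, so a prediction scans only the entries of the active user's cluster and skips the rest. The clustering assignment and the averaging predictor below are simplified placeholders, not the thesis's exact structures.

```python
# Minimal sketch of collaborative filtering over a cluster-partitioned
# inverted index: item -> cluster -> [(user, rating)], so prediction touches
# only same-cluster postings.

from collections import defaultdict

ratings = {  # user -> {item: rating}
    "u1": {"i1": 5, "i2": 3},
    "u2": {"i1": 4, "i3": 2},
    "u3": {"i2": 1, "i3": 5},
}
cluster_of = {"u1": 0, "u2": 0, "u3": 1}  # assumed precomputed clustering

index = defaultdict(lambda: defaultdict(list))
for user, items in ratings.items():
    for item, r in items.items():
        index[item][cluster_of[user]].append((user, r))

def predict(user, item):
    """Average same-cluster ratings for the item; other clusters are skipped."""
    postings = index[item][cluster_of[user]]
    relevant = [r for u, r in postings if u != user]
    return sum(relevant) / len(relevant) if relevant else None

assert predict("u2", "i1") == 5.0  # only u1 shares u2's cluster and rated i1
```

The payoff is the same as in clustered text retrieval: work per prediction scales with the size of one cluster's postings rather than the whole user base, which is where the claimed efficiency gain over unclustered collaborative filtering comes from.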
Personalised online sales using web usage data mining
Practically every major company with a retail operation has its own web site and online sales facilities. This paper describes a toolset that exploits web usage data mining techniques to identify customer Internet browsing patterns. These patterns are then used to underpin a personalised product recommendation system for online sales. Within the architecture, a Kohonen neural network or self-organizing map (SOM) has been trained for use both offline, to discover user group profiles, and in real time, to examine active-user click-stream data, match it to a specific user group, and recommend a unique set of product browsing options appropriate to an individual user. Our work demonstrates that this approach can overcome the scalability problem that is common among this type of system. Our results also show that a personalised recommender system powered by the SOM predictive model is able to produce consistent recommendations.
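The SOM training the paper relies on can be sketched briefly: unit weights are pulled toward input vectors with a neighbourhood influence that shrinks over time, so similar click-stream vectors end up mapped to nearby units, which serve as user group profiles. The data, map size, and schedules below are synthetic illustrations; the toolset's real features are not given in the abstract.

```python
# Minimal 1-D self-organizing map: find the best-matching unit (BMU) for each
# input and pull it and its neighbours toward the input, with learning rate
# and neighbourhood radius decaying over epochs.

import numpy as np

rng = np.random.default_rng(1)

def train_som(data, grid=4, epochs=200, lr0=0.5, radius0=2.0):
    weights = rng.random((grid, data.shape[1]))
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)
        radius = max(radius0 * (1 - t / epochs), 0.5)
        for x in data:
            bmu = int(np.argmin(((weights - x) ** 2).sum(axis=1)))
            for j in range(grid):
                h = np.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                weights[j] += lr * h * (x - weights[j])  # pull toward input
    return weights

def assign(weights, x):
    """Map a user vector to its group profile (nearest unit)."""
    return int(np.argmin(((weights - x) ** 2).sum(axis=1)))

# two synthetic user groups with distinct browsing profiles
group_a = rng.normal(0.2, 0.05, size=(20, 3))
group_b = rng.normal(0.8, 0.05, size=(20, 3))
w = train_som(np.vstack([group_a, group_b]))
assert assign(w, group_a[0]) != assign(w, group_b[0])  # groups map apart
```

At recommendation time, the `assign` step is a single nearest-unit lookup per click-stream vector regardless of how many users were used in training, which is the source of the scalability the paper claims.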