1,457 research outputs found

    An Optimal Scaling Approach to Collaborative Filtering using Categorical Principal Component Analysis and Neighborhood Formation

    Abstract. Collaborative Filtering (CF) is a popular technique employed by Recommender Systems, a term used to describe intelligent methods that generate personalized recommendations. The most common and accurate approaches to CF are based on latent factor models. Latent factor models can tackle two fundamental problems of CF, data sparsity and scalability, and have received considerable attention in recent literature. In this work, we present an optimal scaling approach to address both of these problems using Categorical Principal Component Analysis for the low-rank approximation of the user-item ratings matrix, followed by a neighborhood formation step. The optimal scaling approach has the advantage that it can be easily extended to the case of missing data, and restrictions for ordinal and numerical variables can be easily imposed. We considered different measurement levels for the user ratings on items, starting with multiple nominal and successively applying nominal, ordinal and numeric levels. Experiments were executed on the MovieLens dataset to evaluate these options in terms of accuracy. Results indicated that a combined approach (multiple nominal measurement level, "passive" missing data strategy) clearly outperformed the other tested options.

    A Comparative Study of Dimensionality Reduction Techniques to Enhance Trace Clustering Performances

    Technology Management / Information System / Entrepreneurship. Process mining aims at extracting useful information from event logs. Recently, several organizations such as high-tech companies, hospitals, and municipalities have adopted process mining techniques to improve their processes. Real-life process logs from such organizations are usually very large and complicated, since they generally contain numerous activities executed by many employees. Furthermore, many real-life process logs generate spaghetti-like process models due to the complexity of the processes. Traditional process mining techniques have problems with discovering and analyzing real-life process logs that come from less structured processes. To overcome these weaknesses, trace clustering has been developed. Trace clustering splits an event log into several subsets, each containing homogeneous cases. Even though trace clustering is useful for handling complex process logs, it is time-consuming and computationally expensive due to the large number of features generated from complex logs. In this thesis, we applied dimensionality reduction (preprocessing) techniques to trace clustering in order to reduce the number of features. To validate our approach, we conducted experiments to discover relationships between dimensionality reduction techniques and clustering algorithms, and we performed a case study involving the patient treatment processes of a hospital. Among the many dimensionality reduction techniques, we used three: singular value decomposition (SVD), random projection, and principal components analysis (PCA). The results show that trace clustering with dimensionality reduction techniques produces higher average fitness values. Furthermore, the processing time of trace clustering is effectively reduced with dimensionality reduction techniques. 
Moreover, we measured the similarity between clustering results to observe how much the results change when dimensionality reduction techniques are applied. The similarity differs according to the clustering algorithm used.

    Nowcasting for a high-resolution weather radar network

    2010 Fall. Includes bibliographical references. Short-term prediction (nowcasting) of high-impact weather events can lead to significant improvement in warnings and advisories and is of great practical importance. Nowcasting using weather radar reflectivity data has been shown to be particularly useful. The Collaborative Adaptive Sensing of the Atmosphere (CASA) radar network provides high-resolution reflectivity data amenable to producing valuable nowcasts. The high-resolution nature of CASA data requires the use of an efficient nowcasting approach, which necessitated the development of the Dynamic Adaptive Radar Tracking of Storms (DARTS) and sinc kernel-based advection nowcasting methodology. This methodology was implemented operationally in the CASA Distributed Collaborative Adaptive Sensing (DCAS) system in a robust and efficient manner, as required by the high-resolution nature of CASA data and the distributed nature of the environment in which the nowcasting system operates. The system currently provides nowcasts of up to 10 min to support emergency manager decision-making, and of 1-5 min to steer the CASA radar nodes to better observe the advecting storm patterns for forecasters and researchers. Results of nowcasting performance during the 2009 CASA IP experiment are presented. Additionally, current state-of-the-art scale-based filtering methods were adapted and evaluated for use in the CASA DCAS to provide a scale-based analysis of nowcasting. DARTS was also incorporated in the Weather Support to Deicing Decision Making system to provide more accurate and efficient snow water equivalent nowcasts for aircraft deicing decision support relative to the radar-based nowcasting method currently used in the operational system. 
Results of an evaluation using data collected from 2007-2008 by the Weather Service Radar-1988 Doppler (WSR-88D) located near Denver, Colorado, and the National Center for Atmospheric Research Marshall Test Site near Boulder, Colorado, are presented. DARTS was also used to study the short-term predictability of precipitation patterns depicted by high-resolution reflectivity data observed at microalpha (0.2-2 km) to mesobeta (20-200 km) scales by the CASA radar network. Additionally, DARTS was used to investigate the performance of nowcasting rainfall fields derived from specific differential phase estimates, which have been shown to provide more accurate and robust rainfall estimates than those made from radar reflectivity data.

    Cross domain recommender systems using matrix and tensor factorization

    Today, the amount and importance of available data on the internet are growing exponentially. These digital data have become a primary source of information, and people's lives are tightly bound to them. The data come in diverse shapes and from various resources, and users rely on them in almost all their personal or social activities. However, selecting a desirable option from a huge list of available options can be frustrating and time-consuming. Recommender systems aim to ease this process by finding the items that are most likely to interest users. Arguably, no social media or online service can continue its work properly without using recommender systems. On the other hand, almost all available recommendation techniques suffer from some common issues: the data sparsity, the cold-start, and the new-user problems. This thesis tackles these problems using different methods. While most recommender methods rely on single-domain information, the main focus of this thesis is on using multi-domain information to create cross-domain recommender systems. A cross-domain recommender system not only handles the cold-start and new-user situations much better, but also helps to combine different features exposed in diverse domains and capture a better understanding of the users' preferences, which means producing more accurate recommendations. In this thesis, a pre-clustering stage is also proposed to reduce the data sparsity. Various cross-domain knowledge-based recommender systems are suggested to recommend items in two popular social media, Twitter and LinkedIn, by using different information available in both domains. The state-of-the-art techniques in this field, namely matrix factorization and tensor decomposition, are implemented to develop cross-domain recommender systems. 
The presented recommender systems, based on coupled nonnegative matrix factorization and PARAFAC-style tensor decomposition, are evaluated using real-world datasets, and it is shown that they are superior to the baseline matrix factorization collaborative filtering. In addition, network analysis is performed on the data extracted from Twitter and LinkedIn.

    Network-guided data integration and gene prioritization


    Exploratory Cluster Analysis from Ubiquitous Data Streams using Self-Organizing Maps

    This thesis addresses the use of Self-Organizing Maps (SOM) for exploratory cluster analysis over ubiquitous data streams, where two complementary problems arise: first, to generate (local) SOM models over potentially unbounded multi-dimensional non-stationary data streams; second, to extrapolate these capabilities to ubiquitous environments. To address these problems, original contributions are made in terms of algorithms and methodologies. Two different methods are proposed for the first problem. By focusing on visual knowledge discovery, these methods fill an existing gap among current methods for cluster analysis over data streams. Moreover, the original SOM capability of clustering both observations and features is carried over to data streams, making these contributions more versatile than existing methods, which target an individual clustering problem. For the second problem, additional methodologies that tackle the ubiquitous aspect of data streams are proposed, allowing distributed and collaborative learning strategies. Experimental evaluations attest to the effectiveness of the proposed methods, and real-world applications are exemplified, namely electric consumption data, air quality monitoring networks and financial data, motivating their practical use. This research study is the first to clearly address the use of the SOM for ubiquitous data streams and opens several other research opportunities for the future.

    Next Generation of Product Search and Discovery

    Online shopping has become an important part of people's daily life with the rapid development of e-commerce. In some domains such as books, electronics, and CD/DVDs, online shopping has surpassed or even replaced the traditional shopping method. Compared with traditional retailing, e-commerce is information intensive. One of the key factors for success in e-business is how to help consumers discover a product. Conventionally, a product search engine based on keyword search or a category browser is provided to help users find the product information they need. The general goal of a product search system is to enable users to quickly locate information of interest and to minimize users' efforts in search and navigation. In this process human factors play a significant role. Finding product information can be a tricky task and may require an intelligent use of search engines and a non-trivial navigation of multilayer categories. Searching for useful product information can be frustrating for many users, especially inexperienced ones. This dissertation focuses on developing a new visual product search system that effectively extracts the properties of unstructured products and presents the items most likely to attract users, so that they can quickly locate the ones they would be most interested in. We designed and developed a feature extraction algorithm that retains product color and local pattern features, and the experimental evaluation on the benchmark dataset demonstrated that it is robust against common geometric and photometric visual distortions. In addition, instead of ignoring product text information, we investigated and developed a ranking model learned via a unified probabilistic hypergraph that is capable of capturing correlations among product visual content and textual content. 
Moreover, we proposed and designed a fuzzy hierarchical co-clustering algorithm for collaborative filtering product recommendation. Via this method, users can be automatically grouped into different interest communities based on their behaviors. Then, a customized recommendation can be performed according to these implicitly detected relations. In summary, the developed search system performs much better in visual unstructured product search than state-of-the-art approaches. With the comprehensive ranking scheme and the collaborative filtering recommendation module, the user's overhead in locating the information of value is reduced, and the user's experience of seeking useful product information is optimized.