Ultra-Scalable Spectral Clustering and Ensemble Clustering
This paper focuses on scalability and robustness of spectral clustering for
extremely large-scale datasets with limited resources. Two novel algorithms are
proposed, namely, ultra-scalable spectral clustering (U-SPEC) and
ultra-scalable ensemble clustering (U-SENC). In U-SPEC, a hybrid representative
selection strategy and a fast approximation method for K-nearest
representatives are proposed for the construction of a sparse affinity
sub-matrix. By interpreting the sparse sub-matrix as a bipartite graph, the
transfer cut is then utilized to efficiently partition the graph and obtain the
clustering result. In U-SENC, multiple U-SPEC clusterers are further integrated
into an ensemble clustering framework to enhance the robustness of U-SPEC while
maintaining high efficiency. Based on the ensemble generated by multiple
U-SPEC clusterers, a new bipartite graph is constructed between objects and base
clusters and then efficiently partitioned to achieve the consensus clustering
result. It is noteworthy that both U-SPEC and U-SENC have nearly linear time
and space complexity, and are capable of robustly and efficiently partitioning
ten-million-level nonlinearly-separable datasets on a PC with 64GB memory.
Experiments on various large-scale datasets have demonstrated the scalability
and robustness of our algorithms. The MATLAB code and experimental data are
available at https://www.researchgate.net/publication/330760669.
Comment: To appear in IEEE Transactions on Knowledge and Data Engineering, 201
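The sparse affinity sub-matrix described in this abstract can be illustrated with a minimal sketch. It is not the authors' U-SPEC code: plain random sampling stands in for the hybrid representative selection, exact distances replace the fast K-nearest-representative approximation, and the Gaussian bandwidth is an assumed choice. The function name and parameters are hypothetical, and a dense array is used for brevity where a real implementation would use a sparse matrix.

```python
import numpy as np

def knn_representative_affinity(X, p, K, seed=0):
    """Return an N x p affinity sub-matrix with K nonzeros per row, plus the representatives."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Assumption: random sampling stands in for the hybrid selection strategy.
    reps = X[rng.choice(n, size=p, replace=False)]
    # Exact squared Euclidean distances (N x p); the paper approximates this step.
    d2 = ((X[:, None, :] - reps[None, :, :]) ** 2).sum(axis=2)
    # Column indices of the K nearest representatives for every object.
    nearest = np.argpartition(d2, K - 1, axis=1)[:, :K]
    rows = np.repeat(np.arange(n), K)
    cols = nearest.ravel()
    kept = d2[rows, cols]
    # Gaussian kernel; bandwidth from the mean kept distance (an assumption).
    sigma2 = kept.mean() + 1e-12
    B = np.zeros((n, p))
    B[rows, cols] = np.exp(-kept / (2.0 * sigma2))
    return B, reps

X = np.random.default_rng(1).normal(size=(200, 2))
B, reps = knn_representative_affinity(X, p=20, K=3)
print(B.shape, np.count_nonzero(B, axis=1)[:5])
```

Each row of `B` links one object to its K nearest representatives, so `B` can be read as the edge weights of a bipartite graph between objects and representatives. Partitioning that graph (by transfer cut in the paper) is not shown here.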
On Recommendation of Learning Objects using Felder-Silverman Learning Style Model
The file attached to this record is the author's final peer-reviewed version. The publisher's final version can be found by following the DOI link.
E-learning recommender systems are increasingly becoming the preferred mode of delivery in learning institutions, as they enable learning anytime, anywhere. However, delivering personalised course learning objects based on learner preferences is still a challenge. Current mainstream recommendation algorithms, such as Collaborative Filtering (CF) and Content-Based Filtering (CBF), deal with only two types of entities, namely users and items with their ratings. These methods do not take into account student preferences, such as learning styles, which are especially important for accurately predicting or recommending course learning objects. Moreover, several recommendation techniques suffer from cold-start and rating sparsity problems. To improve the quality of recommender systems, this paper proposes a novel recommender algorithm for machine learning that combines students' actual ratings with their learning styles to recommend Top-N course learning objects (LOs). Various recommendation techniques are considered in an experimental study investigating the best technique for predicting student ratings in e-learning recommender systems. We use the Felder-Silverman Learning Styles Model (FSLSM) to represent both the student learning styles and the learning object profiles. The predicted ratings are compared with the actual student ratings. The approach was tested with 80 students on an online course created in the MOODLE Learning Management System, and the experiments were evaluated with the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE). The results verify that the proposed approach improves rating prediction and significantly increases the accuracy of the recommendations.
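The two evaluation metrics named in this abstract have standard definitions, sketched below. The ratings are made-up illustrative values, not data from the study, and the function names are hypothetical.

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error: average absolute gap between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Square Error: square root of the average squared gap."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical ratings on a 1-5 scale.
actual = [4, 3, 5]
predicted = [3, 3, 4]
print(mae(actual, predicted), rmse(actual, predicted))
```

Lower values of either metric mean the predicted ratings are closer to the actual ones. RMSE penalises large individual errors more heavily than MAE.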
On the Evaluation of RDF Distribution Algorithms Implemented over Apache Spark
Querying very large RDF data sets in an efficient manner requires a
sophisticated distribution strategy. Several innovative solutions have recently
been proposed for optimizing data distribution with predefined query workloads.
This paper presents an in-depth analysis and experimental comparison of five
representative and complementary distribution approaches. To obtain fair
experimental results, we use Apache Spark as a common parallel computing
framework and rewrite the algorithms concerned using the Spark API. Spark
provides guarantees in terms of fault tolerance, high availability and
scalability which are essential in such systems. Our different implementations
aim to highlight the fundamental implementation-independent characteristics of
each approach in terms of data preparation, load balancing, data replication,
and, to some extent, query answering cost and performance. The presented
measures are obtained by testing each system on one synthetic and one
real-world data set over query workloads with differing characteristics and
different partitioning constraints.
Comment: 16 pages, 3 figures
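The abstract does not name the five distribution approaches it compares. As an assumed illustration only, the sketch below hashes RDF triples by subject, a common baseline, and computes a simple load-balance ratio of the kind the abstract lists among its measures. It is plain Python rather than the Spark API, and the function names and sample triples are hypothetical.

```python
import zlib

def partition_by_subject(triples, n_partitions):
    """Assign each (subject, predicate, object) triple to a partition by hashing its subject."""
    parts = [[] for _ in range(n_partitions)]
    for s, p, o in triples:
        # crc32 is deterministic across runs, unlike Python's built-in str hash.
        idx = zlib.crc32(s.encode("utf-8")) % n_partitions
        parts[idx].append((s, p, o))
    return parts

def load_imbalance(parts):
    """Largest partition size divided by the mean size; 1.0 means perfectly balanced."""
    sizes = [len(part) for part in parts]
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean if mean else 0.0

# Hypothetical triples for illustration.
triples = [
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:a", "ex:name", "A"),
    ("ex:b", "ex:knows", "ex:c"),
    ("ex:c", "ex:name", "C"),
]
parts = partition_by_subject(triples, 2)
print([len(part) for part in parts], load_imbalance(parts))
```

Hashing by subject keeps all triples about one resource on the same partition, so queries that share a subject need no data exchange between partitions. A subject with very many triples can still skew the load.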