
78 research outputs found

DHLP 1&2: Giraph based distributed label propagation algorithms on heterogeneous drug-related networks

Authors: Ghadiri Nasser; Maleki Erfan Farhangi; Maleki Zeinab; Shahreza Maryam Lotfi
Publication venue: 'Elsevier BV'
Publication date: 01/01/2020

Background and Objective: Heterogeneous complex networks are large graphs consisting of different types of nodes and edges. Extracting knowledge from these networks is complicated, and their scale is steadily increasing, so scalable methods are required. Methods: In this paper, two distributed label propagation algorithms for heterogeneous networks, DHLP-1 and DHLP-2, are introduced. Biological networks are one type of heterogeneous complex network. As a case study, we measured the efficiency of the proposed DHLP-1 and DHLP-2 algorithms on a biological network consisting of drugs, diseases, and targets. The task studied on this network is drug repositioning, but the algorithms can be used as general methods for heterogeneous networks other than biological ones. Results: We compared the proposed algorithms with their similar non-distributed versions, namely MINProp and Heter-LP. The experiments showed that the algorithms perform well in terms of running time and accuracy. Comment: Source code available for Apache Giraph on Hadoo
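
To make the idea of label propagation in the abstract above more concrete, the following is a minimal single-machine sketch of a generic iterative propagation scheme on a small drug-target-disease graph. It is not DHLP-1, DHLP-2, MINProp, or Heter-LP, and it is not distributed; the function name `propagate_labels`, the node names, the edge weights, and the parameter `alpha` are all assumptions made for illustration only.

```python
def propagate_labels(edges, seeds, alpha=0.5, iterations=50):
    """Generic iterative label propagation (illustrative, not the paper's method).

    edges: dict mapping (u, v) -> non-negative weight of an undirected edge
    seeds: dict mapping node -> initial label score
    alpha: share of a node's score taken from its neighbours at each step
    """
    # Build a symmetric weighted adjacency structure.
    neighbors = {}
    for (u, v), w in edges.items():
        neighbors.setdefault(u, {})[v] = w
        neighbors.setdefault(v, {})[u] = w

    # Weighted degree of each node, used for symmetric normalisation.
    degree = {n: sum(ws.values()) for n, ws in neighbors.items()}

    scores = {n: seeds.get(n, 0.0) for n in neighbors}
    for _ in range(iterations):
        updated = {}
        for n, ws in neighbors.items():
            # Neighbour contribution, each edge scaled by 1 / sqrt(d_n * d_m).
            spread = sum(
                w * scores[m] / (degree[n] * degree[m]) ** 0.5
                for m, w in ws.items()
            )
            # Mix the propagated score with the node's original seed score.
            updated[n] = alpha * spread + (1 - alpha) * seeds.get(n, 0.0)
        scores = updated
    return scores


# Hypothetical toy network: two drugs, one target, one disease.
edges = {
    ("drug:A", "disease:D1"): 1.0,
    ("drug:A", "target:T1"): 1.0,
    ("drug:B", "target:T1"): 1.0,
}
seeds = {"disease:D1": 1.0}  # start from the disease of interest

scores = propagate_labels(edges, seeds)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 4))
```

In this toy graph the drug linked directly to the seed disease ends up with a higher score than the drug that only shares a target with it, which is the kind of ranking a drug repositioning study would inspect.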

arXiv.org e-Print Archive

Western Sydney ResearchDirect

Towards platforms for improved recommender systems at social media scale

Author: Sowinski Christina Diedhiou
Publication date: 01/09/2019

Portsmouth University Research Portal (Pure)

Scalable Topic Detection Approaches from Twitter Streams

Author: Ibrahim Rania
Publication venue: 'University of Waterloo'
Publication date: 29/04/2016

Real-time topic detection in Twitter streams is an important task that helps discover natural disasters in real time from users’ posts and helps political parties and companies understand users’ opinions and needs. In 2014 the number of active users on Twitter was reported to be more than 288 million, posting around 500 million tweets daily. Detecting topics from Twitter streams in real time is therefore a challenging task that needs scalable and efficient techniques to handle this large amount of data. In this work, we scale an Exemplar-based technique that detects topics from Twitter streams, where each detected topic is represented by one tweet (i.e., an exemplar). Using exemplar tweets to represent the detected topics makes these topics easier to interpret than representing them by uncorrelated terms, as other topic detection algorithms do. The approach is implemented using Apache Giraph and is extended here to efficiently support sliding windows. Experimental results on four datasets show that the optimized Giraph implementation achieves a speedup of up to nineteen times over the native implementation, while maintaining good quality of the detected topics. In addition, the Giraph Exemplar-based approach achieves the best topic recall and term precision against K-means, Latent Dirichlet Allocation (LDA), Non-negative Matrix Factorization (NMF) and Latent Semantic Analysis (LSA), while maintaining good term recall and running time. The approach is also deployed for detecting topics from real-time Twitter streams, and its scalability is demonstrated. Moreover, another clustering technique called Local Variance-based Clustering (LVC) is proposed in this thesis for detecting topics from Twitter streams. LVC defines the densities of data points based on their similarities.
The proposed local variance measure is calculated from the variance of the data points' similarity histogram and is shown to distinguish well between core, border, connecting and outlier points. Experimental results show that LVC outperforms spectral clustering and affinity propagation in clustering quality on the control charts, Ecoli and images datasets, while maintaining a good running time. In addition, results show that LVC can detect topics from Twitter with 15% higher topic recall and 3% higher term precision than DBSCAN

University of Waterloo's Institutional Repository

Stress-testing clouds for big data applications

Author: Sutii A.
Publication date: 01/01/2013

Repository TU/e

Pure OAI Repository

ADDRESSING GEOGRAPHICAL CHALLENGES IN THE BIG DATA ERA UTILIZING CLOUD COMPUTING

Author: Lan Hai
Publication date: 01/01/2020

Processing, mining and analyzing big data adds significant value towards solving previously unverified research questions and improving our ability to understand problems in the geographical sciences. This dissertation contributes to developing a solution that supports researchers who may not otherwise have access to traditional high-performance computing resources, so that they can benefit from the “big data” era and conduct big geographical research in ways that have not previously been possible. Using approaches from the fields of geographic information science, remote sensing and computer science, this dissertation addresses three major challenges in big geographical research: 1) how to exploit cloud computing to implement a universal scalable solution to classify multi-sourced remotely sensed imagery datasets with high efficiency; 2) how to overcome the missing data issue in land use land cover studies with a high-performance framework on the cloud through the use of available auxiliary datasets; and 3) the design considerations underlying a universal massive scale voxel geographical simulation model to implement complex geographical systems simulation using a three-dimensional spatial perspective. This dissertation implements an in-memory distributed remotely sensed imagery classification framework on the cloud using both unsupervised and supervised classifiers, and classifies remotely sensed imagery datasets of the Suez Canal area, Egypt and Inner Mongolia, China under different cloud environments. This dissertation also implements and tests a cloud-based gap filling model with eleven auxiliary biophysical and socioeconomic datasets in Inner Mongolia, China. This research also extends a voxel-based Cellular Automata model using graph theory and develops this model as a massive scale voxel geographical simulation framework to simulate dynamic processes, such as air pollution particles dispersal on cloud

Digital Repository at the University of Maryland