29 research outputs found

    Temporal workload-aware replicated partitioning for social networks

    Get PDF
    The most frequent and expensive queries in social networks involve multi-user operations such as requesting the latest tweets or news feeds of friends. The performance of such queries depends heavily on the data partitioning and replication methodologies adopted by the underlying systems. Existing solutions for data distribution in these systems involve hash- or graph-based approaches that ignore the multi-way relations among data. In this work, we propose a novel data partitioning and selective replication method that utilizes the temporal information in prior workloads to predict future query patterns. Our method utilizes the social network structure and the temporality of the interactions among its users to construct a hypergraph that correctly models multi-user operations. It then performs simultaneous partitioning and replication of this hypergraph to reduce the query span while respecting load-balance and I/O-load constraints under replication. To test our model, we enhance the Cassandra NoSQL system to support selective replication and implement a social network application (a Twitter clone) on top of the enhanced Cassandra. We conduct experiments in a cloud computing environment (Amazon EC2) to test the developed systems. Comparison of the proposed method with hash- and enhanced graph-based schemes indicates that it significantly improves latency and throughput.
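The query-span objective described above can be sketched in a few lines; this is a minimal illustration with toy users, a toy two-server partition, and hyperedges standing in for news-feed queries — none of it is the paper's actual implementation:

```python
# Minimal sketch (illustrative data, not the paper's system): each
# multi-user query is a hyperedge over the users it reads, and its
# "span" is the number of partitions it touches.

def query_span(partition, queries):
    """partition: dict user -> part id; queries: list of user sets."""
    return [len({partition[u] for u in q}) for q in queries]

# Toy social network: 6 users placed on 2 servers.
partition = {"u0": 0, "u1": 0, "u2": 0, "u3": 1, "u4": 1, "u5": 1}

# Each news-feed request reads the data of a user's friends -> one hyperedge.
queries = [
    {"u0", "u1", "u2"},   # served entirely by part 0 -> span 1
    {"u2", "u3"},         # crosses both parts        -> span 2
    {"u3", "u4", "u5"},   # span 1
]

spans = query_span(partition, queries)
print(spans)                     # [1, 2, 1]
print(sum(spans) / len(spans))   # average span the partitioner tries to reduce
```

A partitioner with selective replication would then move or copy users (vertices) between servers so that this average span, and hence the number of servers each query touches, shrinks.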

    Detecting and date-stamping bubbles in fan tokens

    No full text
    We focus on the existence of bubbles in fan tokens, utilizing the Supremum Augmented Dickey-Fuller (SADF) and Generalized Supremum Augmented Dickey-Fuller (GSADF) tests. We use daily closing prices of the top 20 fan tokens by market capitalization, along with Bitcoin, Ethereum, and Chiliz. The evidence from the GSADF test indicates that the prices of 13 of the 20 fan tokens and the three cryptocurrencies have explosive periods associated with bubbles. Our results also show that the percentage of bubble days is between 0% and 5% for all fan tokens. Among the 13 fan tokens exhibiting bubble behavior in their prices, nine have multiple sub-periods associated with bubbles, while only four have a single sub-period with explosive prices. Bubbles in token prices are short-lived; most last for a few days. As a robustness analysis, we also perform the Log-Periodic Power Law Singularity (LPPLS) analysis, which provides similar results. Further analysis shows that trading volume, fan token return, Economic Policy Uncertainty (EPU), and Daily Infectious Disease Equity Market Volatility (EMVID) are positively associated with the presence of bubbles in fan token prices, while oil return is negatively associated with bubbles.
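As a rough illustration of the SADF idea — the supremum of forward-expanding ADF statistics, where GSADF additionally varies the window's start point — the sketch below runs a lag-free ADF regression on synthetic series. The series, window sizes, and the omission of lag augmentation and critical-value simulation are all simplifying assumptions, not the paper's procedure:

```python
import numpy as np

def adf_tstat(y):
    """t-statistic on b in the regression dy_t = a + b*y_{t-1} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    s2 = resid @ resid / (len(dy) - 2)          # residual variance
    se_b = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se_b

def sadf(y, min_window=30):
    """Supremum ADF statistic over forward-expanding windows."""
    return max(adf_tstat(y[:n]) for n in range(min_window, len(y) + 1))

rng = np.random.default_rng(1)
random_walk = np.cumsum(rng.normal(size=200))             # no bubble
bubble = np.cumprod(1.02 + 0.002 * rng.normal(size=200))  # explosive price path

# The explosive series yields a far larger supremum statistic.
print(sadf(random_walk), sadf(bubble))
```

In the actual tests, the statistic is compared against simulated right-tail critical values, and the dates where it crosses them give the bubble's origination and collapse (the "date-stamping" in the title).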

    Connectedness among fan tokens and stocks of football clubs

    No full text
    This paper examines the dynamic connectedness among fan tokens and their corresponding stocks using the TVP-VAR approach. We use daily data from December 11, 2020, to January 31, 2022, for the Juventus FC, AS Roma, Galatasaray, and Trabzonspor tokens and stocks. Our results indicate that shocks transmitted to any token are larger than those transmitted to the stocks, with the tokens being the net transmitters of shocks to both the tokens and the stocks. Our results further indicate that the two asset classes can be considered independent of each other, with the total connectedness decreasing over time and less than 10% of the contributions to any token (stock) coming from the stocks (remaining stocks). This implies that the idiosyncratic contributions to the variations in the utilized group of assets are considerably low compared to the system contributions. Finally, we provide some implications for investment and portfolio management.
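For intuition, the total connectedness index in the Diebold-Yilmaz framework that TVP-VAR connectedness builds on can be computed from a forecast-error variance decomposition matrix; the matrix below is toy data, not the paper's estimates:

```python
import numpy as np

# Hedged sketch: D[i, j] is the share of asset i's forecast-error
# variance explained by shocks to asset j (rows sum to 1). Total
# connectedness is the average off-diagonal mass, in percent.

def total_connectedness(D):
    D = np.asarray(D, dtype=float)
    off_diag = D.sum() - np.trace(D)
    return 100.0 * off_diag / D.shape[0]

# 4 assets: two tokens, two stocks (illustrative numbers only).
D = [[0.94, 0.03, 0.02, 0.01],
     [0.04, 0.92, 0.02, 0.02],
     [0.02, 0.02, 0.93, 0.03],
     [0.01, 0.02, 0.03, 0.94]]
print(total_connectedness(D))  # low value -> largely independent assets
```

A mostly diagonal decomposition like this one yields a low index, which is the kind of evidence behind the paper's conclusion that cross-asset contributions stay under 10%.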

    Locality-aware and load-balanced static task scheduling for MapReduce

    No full text
    Task scheduling for MapReduce jobs has been an active area of research, with the objective of decreasing the amount of data transferred during the shuffle phase by exploiting data locality. In the literature, generally only the scheduling of reduce tasks is considered, with the assumption that the scheduling of map tasks is already determined by the input data placement. However, in cloud or HPC deployments of MapReduce, the input data is located in remote storage and scheduling map tasks gains importance. Here, we propose models for the simultaneous scheduling of map and reduce tasks in order to improve data locality and balance the processors' loads in both map and reduce phases. Our approach is based on graph and hypergraph models that correctly encode the interactions between map and reduce tasks. Partitions produced by these models are decoded to schedule map and reduce tasks. A two-constraint formulation utilized in these models enables balancing the processors' loads in both map and reduce phases. The partitioning objective in the hypergraph models correctly encapsulates the minimization of data transfer when a local combine step is performed prior to the shuffle, whereas the partitioning objective in the graph models achieves the same feat when a local combine is not performed. We show the validity of our scheduling on the MapReduce parallelizations of two important kernel operations – sparse matrix–vector multiplication (SpMV) and generalized sparse matrix–matrix multiplication (SpGEMM) – that are widely encountered in big data analytics and scientific computations. Compared to random scheduling, our models lead to tremendous savings in data transfer, reducing data traffic from several hundred megabytes to just a few megabytes in the shuffle phase and consequently yielding up to 2.6x and 4.2x speedups for SpMV and SpGEMM, respectively. Supported by the Scientific and Technological Research Council of Turkey (TUBITAK).
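The benefit of locality-aware placement can be illustrated with a toy shuffle-volume calculation; the task names, traffic volumes, and processor assignments are hypothetical and stand in for what the paper's graph/hypergraph partitions would produce:

```python
# Toy sketch (assumed data, not the paper's models): the shuffle
# volume of a schedule is the data sent between map and reduce
# tasks that were placed on different processors.

def shuffle_volume(traffic, map_proc, reduce_proc):
    """traffic: dict (map_task, reduce_task) -> bytes transferred."""
    return sum(v for (m, r), v in traffic.items()
               if map_proc[m] != reduce_proc[r])

traffic = {("m0", "r0"): 100, ("m0", "r1"): 5,
           ("m1", "r1"): 80, ("m2", "r0"): 90, ("m2", "r1"): 10}

# Locality-aware: colocate each map task with the reducer it feeds most.
good = shuffle_volume(traffic, {"m0": 0, "m1": 1, "m2": 0}, {"r0": 0, "r1": 1})
# A poor schedule separates heavily communicating pairs.
bad = shuffle_volume(traffic, {"m0": 1, "m1": 0, "m2": 1}, {"r0": 0, "r1": 1})
print(good, bad)  # 15 270
```

Minimizing exactly this cross-processor traffic, under load-balance constraints on both phases, is what the partitioning objectives in the paper's models encode.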

    Diagnosing Coronary Artery Disease on the Basis of Hard Ensemble Voting Optimization

    No full text
    Background and Objectives: Recently, many studies have focused on the early diagnosis of coronary artery disease (CAD), which is one of the leading causes of cardiac-associated death worldwide. The effectiveness of the most important features influencing disease diagnosis determines the performance of machine learning systems that can allow for timely and accurate treatment. We developed a hybrid ML framework based on hard ensemble voting optimization (HEVO) to classify patients with CAD using the Z-Alizadeh Sani dataset. All categorical features were converted to numerical form, the synthetic minority oversampling technique (SMOTE) was employed to overcome the imbalanced distribution between the two classes in the dataset, and recursive feature elimination (RFE) with random forest (RF) was used to obtain the best subset of features. Materials and Methods: After resolving the biased distribution in the CAD dataset using the SMOTE method and identifying the highly correlated features that affect the classification of CAD patients, the performance of the proposed model was evaluated using grid-search optimization, and the best hyperparameters were identified for developing four models, namely RF, AdaBoost, gradient boosting, and extra trees, based on an HEV classifier. Results: Five-fold cross-validation experiments with the HEV classifier showed excellent prediction performance with the 10 best balanced features obtained using SMOTE and feature selection. All evaluation metrics reached > 98% with the HEV classifier, and the gradient-boosting model was the second-best classification model with accuracy = 97% and F1-score = 98%. Conclusions: Compared to modern methods, the proposed method performs well in diagnosing coronary artery disease; therefore, it can be used by medical personnel as a supplementary tool for the timely, accurate, and efficient identification of CAD cases in suspected patients.
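The combination rule at the heart of the HEV classifier — hard (majority) voting over base-model predictions — can be sketched as follows; the base predictions here are made-up stand-ins, not outputs of the tuned RF, AdaBoost, gradient-boosting, and extra-trees models of the study:

```python
import numpy as np
from collections import Counter

# Minimal sketch of hard (majority) voting: each sample gets the
# class label most of the base classifiers agree on.

def hard_vote(predictions):
    """predictions: (n_models, n_samples) array of class labels."""
    return np.array([Counter(col).most_common(1)[0][0]
                     for col in np.asarray(predictions).T])

# Three base classifiers predicting CAD (1) / no CAD (0) for 5 patients.
preds = [[1, 0, 1, 1, 0],
         [1, 1, 1, 0, 0],
         [0, 0, 1, 1, 0]]
print(hard_vote(preds))  # majority label per patient
```

With an odd number of base models, as here, a binary vote can never tie; the study's grid search then tunes each base model before the votes are combined.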

    SMOP: A semantic web and service driven information gathering environment for mobile platforms

    No full text
    On the Move Confederated International Conferences CoopIS/DOA/GADA and ODBASE, Oct 29 - Nov 3, 2006, Montpellier, France. WOS: 000243131600058. In this paper, we introduce a mobile services environment, namely SMOP, in which semantic-web-based service capability matching and location-aware information gathering are both used to develop mobile applications. Domain independence and support for semantic matching of mobile service capabilities are the innovative features of the proposed environment. The environment's built-in semantic matching engine supports the addition of new service domain ontologies, which is critical for system extensibility. The environment is therefore generic in terms of developing various mobile applications and provides the most relevant services for mobile users by applying semantic capability matching in service lookups. GPS (Global Positioning System) and map-service utilization make it possible to find nearby services in addition to capability-relevant ones. The software architecture and system extensibility support of the environment are discussed in the paper. A real-life implementation of the environment for the estate domain is also given as a case study in the evaluation section.
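Ontology-based capability matching of the kind SMOP's engine performs is commonly graded by degree of match; the tiny is-a hierarchy and the degree names below (exact/plugin/subsumes/fail) are illustrative assumptions, not SMOP's API:

```python
# Hedged sketch of graded semantic capability matching over a small
# is-a ontology (child -> parent). The estate domain echoes the
# paper's case study, but the concepts are invented for illustration.

ontology = {
    "Apartment": "Estate", "House": "Estate",
    "Estate": "Thing", "Restaurant": "Thing",
}

def ancestors(concept):
    out = []
    while concept in ontology:
        concept = ontology[concept]
        out.append(concept)
    return out

def match(advertised, requested):
    if advertised == requested:
        return "exact"
    if requested in ancestors(advertised):
        return "plugin"    # service offers something more specific
    if advertised in ancestors(requested):
        return "subsumes"  # service offers something more general
    return "fail"

print(match("Apartment", "Estate"))   # plugin
print(match("Restaurant", "Estate"))  # fail
```

Adding a new domain is then just a matter of registering another ontology fragment, which mirrors the extensibility argument made for the environment.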

    Hybrid Feature Selection Framework for the Parkinson Imbalanced Dataset Prediction Problem

    No full text
    Background and Objectives: Recently, many studies have focused on the early detection of Parkinson’s disease (PD). This disease belongs to a group of neurological disorders that directly affect brain cells and influence movement, hearing, and various cognitive functions. Medical datasets are often not equally distributed across their classes, which biases the classification of patients. We developed a hybrid feature selection framework that can deal with imbalanced datasets such as the PD dataset: the SMOTE algorithm handles the class imbalance, while Recursive Feature Elimination (RFE) and Principal Component Analysis (PCA) remove redundant features and decrease processing time. Materials and Methods: PD acoustic datasets and the characteristics of control subjects were used to construct classification models such as bagging, K-nearest neighbour (KNN), multilayer perceptron, and the support vector machine (SVM). In the preprocessing stage, the synthetic minority over-sampling technique (SMOTE) was used with the two feature selection methods RFE and PCA. The PD dataset exhibits a large difference between the numbers of infected and uninfected patients, which causes a classification bias problem; therefore, SMOTE was used to resolve it. Results: For model evaluation, the train–test split technique was used. All models were grid-search tuned; the SVM model showed the highest accuracy of 98.2%, and the KNN model exhibited the highest specificity of 99%. Conclusions: The proposed method was compared with current modern methods for detecting Parkinson’s disease and with other methods for medical diseases; our developed system can treat data bias and reach a high prediction performance for PD, which can be beneficial for health organizations to properly prioritize assets.
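The SMOTE step used in the preprocessing stage can be sketched as interpolation between minority samples and their nearest minority neighbors; the parameter names and toy data below are assumptions for illustration, not imblearn's implementation:

```python
import numpy as np

# Minimal SMOTE sketch: each synthetic sample lies on the segment
# between a random minority point and one of its k nearest minority
# neighbors, so new points stay inside the minority region.

def smote(X_min, n_new, k=3, seed=0):
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        dists = np.linalg.norm(X_min - X_min[i], axis=1)
        nbrs = np.argsort(dists)[1:k + 1]   # skip the point itself
        j = rng.choice(nbrs)
        lam = rng.random()                  # interpolation factor in [0, 1)
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)

X_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
X_synth = smote(X_minority, n_new=4)
print(X_synth.shape)  # (4, 2)
```

Because every synthetic point is a convex combination of two minority samples, oversampling cannot leave the minority class's convex hull, which is why SMOTE balances the classes without inventing outliers.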