20 research outputs found

    Cohesive subgraph identification in large graphs

    Graph data is ubiquitous in real-world applications, as the relationships among entities can be naturally captured by the graph model. Finding cohesive subgraphs is a fundamental problem in graph mining with diverse applications. Given the important role of cohesive subgraphs, this thesis focuses on cohesive subgraph identification in large graphs. Firstly, we study the size-bounded community search problem, which aims to find a subgraph with the largest min-degree among all connected subgraphs that contain the query vertex q and have at least l and at most h vertices, where q, l, and h are specified by the query. As the problem is NP-hard, we propose a branch-reduce-and-bound algorithm, SC-BRB, by developing nontrivial reducing, upper bounding, and branching techniques. Secondly, we formulate the notion of a similar-biclique in bipartite graphs, a special kind of biclique in which all vertices from a designated side are similar to each other, and aim to enumerate all maximal similar-bicliques. We propose a backtracking algorithm, MSBE, to directly enumerate maximal similar-bicliques, and strengthen it with vertex reduction and optimization techniques. In addition, we design a novel index structure to speed up a time-critical operation of MSBE as well as vertex reduction, and we develop efficient index construction algorithms. Thirdly, we consider balanced cliques in signed graphs (a clique is balanced if its vertex set can be partitioned into C_L and C_R such that all negative edges are between C_L and C_R) and study the problem of maximum balanced clique computation. We propose techniques to transform the maximum balanced clique problem over G into a series of maximum dichromatic clique problems over small subgraphs of G. The transformation not only removes edge signs but also sparsifies the edge set.
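To make the size-bounded community search definition concrete, here is a minimal brute-force sketch in Python. It is not SC-BRB: it simply enumerates every candidate vertex set, so it is only usable on tiny graphs. The function names and the dictionary-of-sets graph representation are assumptions made for illustration.

```python
from itertools import combinations


def is_connected(adj, nodes):
    """Check that the subgraph induced by `nodes` is connected (DFS)."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def min_degree(adj, nodes):
    """Smallest degree of any vertex inside the induced subgraph."""
    nodes = set(nodes)
    return min(len(adj[u] & nodes) for u in nodes)


def size_bounded_community(adj, q, l, h):
    """Exhaustive search: among connected vertex sets that contain q and
    have between l and h vertices, return one with the largest min-degree."""
    others = sorted(v for v in adj if v != q)
    best, best_deg = None, -1
    for size in range(l, h + 1):
        for rest in combinations(others, size - 1):
            cand = (q,) + rest
            if is_connected(adj, cand):
                d = min_degree(adj, cand)
                if d > best_deg:
                    best, best_deg = set(cand), d
    return best, best_deg


# Hypothetical example: a 4-clique {0, 1, 2, 3} with a tail 3 - 4 - 5.
adj = {
    0: {1, 2, 3},
    1: {0, 2, 3},
    2: {0, 1, 3},
    3: {0, 1, 2, 4},
    4: {3, 5},
    5: {4},
}
community, degree = size_bounded_community(adj, q=0, l=3, h=4)
print(community, degree)
```

With h = 4 the search can return the whole 4-clique; tightening the bound to h = 3 forces it to settle for a triangle, which shows how the size constraint changes the answer.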

    Incremental and parallel algorithms for dense subgraph mining

    Maintaining densely connected subgraphs of a continuously evolving graph is important because many practical problems require constant monitoring of a continuous stream of linked data, which is often represented as a graph. For example, continuously maintaining a group of closely connected nodes can reveal unusual activity in a transaction network, or identify active groups in a social network and track their evolution. On the other hand, mining these structures from graph data is often expensive because of the complexity of the computation and the volume of the structures (the number of densely connected structures can be exponential in the number of vertices in the graph). One way to deal with the expensive computations is to consider parallel computation. In this thesis, we advance the state of the art by developing provably efficient algorithms for mining maximal cliques and maximal bicliques, two fundamental dense structures. First, we consider the design of efficient algorithms for the maintenance of maximal cliques and maximal bicliques in an evolving network. We observe that it is important to locate the region of the graph affected by an update, so that we can maintain the structures by computing the changes exactly where they occur. Following this observation, we design efficient techniques that find appropriate subgraphs for identifying the changes in the structures. We prove that our algorithms can maintain dense structures efficiently. More specifically, we show that our algorithms can quickly compute the changes when they are small, irrespective of the size of the graph. We empirically evaluate our algorithms and show that they significantly outperform the state-of-the-art algorithms.
Next, we consider parallel computation to make efficient use of the multiple cores in a multi-core computing system, so that the expensive mining tasks can be eased and we can achieve better speedup over efficient sequential counterparts. We design shared-memory parallel algorithms for mining maximal cliques and maximal bicliques, and we prove the efficiency of the parallel algorithms by showing that the total work performed by each parallel algorithm is equivalent to the time complexity of the best sequential algorithm for the same task. Our experimental study shows that we achieve good speedup over the prior state-of-the-art parallel algorithms and significant speedup over the state-of-the-art sequential algorithms. We also show that our parallel algorithms scale almost linearly with the number of processor cores.
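For readers unfamiliar with maximal clique enumeration, the sequential baseline is commonly the classic Bron-Kerbosch algorithm with pivoting. The sketch below shows that textbook algorithm in Python; it is not the incremental or parallel method of the thesis, and the function name and dictionary-of-sets graph representation are assumptions made for illustration.

```python
def maximal_cliques(adj):
    """Enumerate all maximal cliques with Bron-Kerbosch plus pivoting.

    adj maps each vertex to the set of its neighbours (undirected graph).
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that can extend r,
        # x: vertices already explored (used to reject non-maximal cliques).
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pick the pivot with the most neighbours in p, so that the fewest
        # candidates need to be branched on.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical example: a triangle {0, 1, 2} with a path 2 - 3 - 4 attached.
adj = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2, 4},
    4: {3},
}
for clique in maximal_cliques(adj):
    print(sorted(clique))
```

Each recursive call works only on the common neighbourhood of the current clique, which hints at why an update to a single edge can be handled by re-examining a small subgraph rather than the whole graph.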

    Real-time analytics on large dynamic graphs

    In today's fast-paced and interconnected digital world, the data generated by an increasing number of applications is being modeled as dynamic graphs. The graph structure encodes relationships among data items, while the structural changes to the graphs, as well as the continuous stream of information produced by the entities in these graphs, make them dynamic in nature. Examples include social networks where users post status updates, images, videos, etc.; phone call networks where nodes may send text messages or place phone calls; road traffic networks where the traffic behavior of the road segments changes constantly; and so on. There is tremendous value in storing, managing, and analyzing such dynamic graphs and deriving meaningful insights in real time. However, a majority of the work in graph analytics assumes a static setting, and there is a lack of systematic study of the various dynamic scenarios, the complexity they impose on the analysis tasks, and the challenges in building efficient systems that can support such tasks at a large scale. In this dissertation, I design a unified streaming graph data management framework, and develop prototype systems to support increasingly complex tasks on dynamic graphs. In the first part, I focus on the management and querying of distributed graph data. I develop a hybrid replication policy that monitors the read-write frequencies of the nodes to decide dynamically what data to replicate, and whether to do eager or lazy replication, in order to minimize network communication and support low-latency querying. In the second part, I study parallel execution of continuous neighborhood-driven aggregates, where each node aggregates the information generated in its neighborhood.
I build my system around the notion of an aggregation overlay graph, a pre-compiled data structure that enables sharing of partial aggregates across different queries, and also allows partial pre-computation of the aggregates to minimize query latencies and increase throughput. Finally, I extend the framework to support continuous detection and analysis of activity-based subgraphs, where subgraphs can be specified using both graph structure and activity conditions on the nodes. The query specification tasks in my system are expressed using a set of active structural primitives, which allow the query evaluator to use a set of novel optimization techniques, thereby achieving high throughput. Overall, in this dissertation, I define and investigate a set of novel tasks on dynamic graphs, design scalable optimization techniques, build prototype systems, and show the effectiveness of the proposed techniques through extensive evaluation using large-scale real and synthetic datasets.

    The Role of Financial Ratios in Determining the Stock Prices

    In this study, the financial ratios that are effective in determining stock prices are investigated using panel data analysis. For this purpose, a data set covering 73 companies listed on the Istanbul Stock Exchange (ISE) and operating in the manufacturing sector over the period 1990-2009 is used. The empirical results suggest that profitability and liquidity ratios have a positive effect on stock returns. Moreover, the leverage ratio, taken as an indicator of indebtedness, has a similar effect. However, operating ratios are found to have no impact on stock returns. Consequently, it may be said that the role of financial ratios in determining stock returns is low.

    Evaluation on rapid profiling with clustering algorithms for plantation stocks on Bursa Malaysia

    Building a stock portfolio often requires extensive financial knowledge and Herculean effort, given the amount of financial data to analyse. In this study, we used the Expectation Maximization (EM), K-Means (KM), and Hierarchical Clustering (HC) algorithms to cluster the 38 plantation stocks listed on Bursa Malaysia using 14 financial ratios derived from fundamental analysis. The clustering allows investors to profile each resulting cluster statistically and helps them select stocks for their portfolios rapidly. The performance of each cluster was then assessed using 1-year stock price movement. The results showed that a cluster produced by EM had a better profile and obtained a higher average capital gain than the other clusters.

    The Effect of Financial Risks of Companies Listed in the BIST Sustainability Index on Stock Prices

    This study aims to investigate the impact of the financial risks of the enterprises in the sustainability index on stock value. In line with this objective, financial risk values were calculated with the Altman Z-score model using the financial statement data of the enterprises for the 2011-2020 period. The impact of these financial risks on stock value was then examined by panel data analysis. The econometric analysis showed that the financial risk values of the enterprises affect stock return rates negatively. In other words, as the financial risks of the enterprises decreased, the stock return rate increased.
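As a rough illustration of the risk measure mentioned above, the following Python sketch computes the original five-ratio Altman Z-score for publicly traded manufacturing firms. The abstract does not say which variant of the model the study used, so the coefficients, the conventional 1.81 and 2.99 cutoffs, and all field names here are assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Financials:
    """Hypothetical container for the balance-sheet inputs (same currency)."""
    working_capital: float
    retained_earnings: float
    ebit: float
    market_value_equity: float
    total_liabilities: float
    sales: float
    total_assets: float


def altman_z(f: Financials) -> float:
    """Original Altman Z-score: a weighted sum of five financial ratios."""
    x1 = f.working_capital / f.total_assets
    x2 = f.retained_earnings / f.total_assets
    x3 = f.ebit / f.total_assets
    x4 = f.market_value_equity / f.total_liabilities
    x5 = f.sales / f.total_assets
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


def risk_zone(z: float) -> str:
    """Conventional interpretation: a lower Z means higher financial risk."""
    if z > 2.99:
        return "safe"
    if z < 1.81:
        return "distress"
    return "grey"


# Made-up figures, purely for demonstration.
firm = Financials(
    working_capital=200.0,
    retained_earnings=300.0,
    ebit=150.0,
    market_value_equity=800.0,
    total_liabilities=500.0,
    sales=1200.0,
    total_assets=1000.0,
)
z = altman_z(firm)
print(round(z, 3), risk_zone(z))
```

Because a higher Z-score indicates lower distress risk, a study of this kind has to be explicit about how the score is mapped to a "financial risk value" before interpreting the sign of the estimated effect on returns.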

    FACTORS AFFECTING THE BUSINESS LEVEL OF STOCK PRICES: APPLICATION IN ISTANBUL STOCK EXCHANGE

    Stock prices are undoubtedly one of the most important factors affecting the decisions of stock investors. Investors can make sound decisions only if the factors affecting stock prices are identified correctly and meaningfully. The aim of this study is to determine the firm-level factors that affect stock prices. The stock prices of the firms traded in the BIST Technology Index between 2009:1 and 2015:2 are taken as the dependent variable. The independent variables are the leverage ratio, dividend payout ratio, earnings per share, return on assets, price/earnings ratio, net profit growth rate, equity growth rate, trading ratio, and market-to-book ratio. A multiple regression model estimated by ordinary least squares is used. According to the results, it can be suggested that the most important firm-level factors affecting stock prices are the market-to-book ratio and earnings per share.

    Mining low-diameter clusters conserved in graph collections

    The analysis of social and biological networks often involves modeling clusters of interest as cliques or their graph-theoretic generalizations. The k-club model, which relaxes the requirement of pairwise adjacency in a clique to length-bounded paths inside the cluster, has been used to model cohesive subgroups in social networks and functional modules/complexes in biological networks. However, if the graphs are time-varying, or if they change under different (experimental) conditions, we may be interested in clusters that preserve their property over time or under changes in conditions. To model such clusters that are conserved in a collection of graphs, we consider a cross-graph k-club model, a subset of nodes that forms a k-club in every graph in the collection. In this dissertation, we consider the canonical optimization problem of finding a cross-graph k-club of maximum cardinality. The overall goal of this dissertation is to develop integer programming approaches to solve the problem. We establish the computational complexity of the problem and its related problems. We introduce a naive extension of the cut-like formulation for the maximum k-club problem and offer ideas to strengthen it. We introduce valid inequalities for the problem and extend existing inequalities valid for the single-graph problem to the cross-graph setting. We introduce algorithmic ideas to solve this problem using a decomposition branch-and-cut algorithm. For scale reduction, we explore preprocessing procedures and extended formulations. We assess the computational effectiveness of the techniques we propose and evaluate their performance on benchmark instances. We also introduce and study the maximum k-club signature problem, which aims to find a maximum cardinality cross-graph k-club in T consecutive graphs in a sequence of graphs, where the parameter T is specified by the user.
We propose a 'moving window' method that solves a sequence of maximum cross-graph k-club problems, and assess the performance of the approaches we propose in solving the signature variant.
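The cross-graph k-club definition can be checked directly with breadth-first search. The Python sketch below only verifies whether a given node set is a k-club in every graph of a collection; it does not solve the maximization problem and is unrelated to the integer programming approach of the dissertation. The function names and the dictionary-of-sets graph representation are assumptions made for illustration.

```python
from collections import deque


def is_k_club(adj, nodes, k):
    """True if the subgraph induced by `nodes` has diameter at most k.

    Paths must stay inside `nodes`, which is what separates a k-club
    from a set whose members are merely close in the full graph.
    """
    nodes = set(nodes)
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == k:
                continue  # do not explore beyond distance k
            for v in adj[u]:
                if v in nodes and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(nodes):
            return False  # some node is farther than k from source
    return True


def is_cross_graph_k_club(graphs, nodes, k):
    """True if `nodes` forms a k-club in every graph of the collection."""
    return all(is_k_club(adj, nodes, k) for adj in graphs)


# Two hypothetical graphs on the same vertex set {0, 1, 2, 3}.
g1 = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}   # star centred at 0
g2 = {0: {1, 2}, 1: {0}, 2: {0, 3}, 3: {2}}   # path 1 - 0 - 2 - 3

print(is_cross_graph_k_club([g1, g2], {0, 1, 2}, k=2))
print(is_cross_graph_k_club([g1, g2], {0, 1, 2, 3}, k=2))
```

In this example the set {0, 1, 2, 3} is a 2-club in the star but not in the path, so it is not conserved across the collection, whereas {0, 1, 2} is a 2-club in both graphs.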

    Hybrid intelligence for data mining

    Today, enormous amounts of data are being recorded in all kinds of activities. This sheer size provides an excellent opportunity for data scientists to retrieve valuable information using data mining techniques. Due to the complexity of data in many neoteric problems, one-size-fits-all solutions are seldom able to provide satisfactory answers. Although data mining has been an active research area, hybrid techniques are rarely scrutinized in detail. Currently, few techniques can handle time-varying properties while performing their core functions, and few retrieve and combine information from heterogeneous dimensions, e.g., textual and numerical ones. This thesis summarizes our investigations of hybrid methods that provide data mining solutions to problems involving non-trivial datasets, such as trajectories, microblogs, and financial data. First, time-varying dynamic Bayesian networks are extended to consider both causal and dynamic regularization requirements. Combined with density-based clustering, the enhancements overcome the difficulties in modeling spatial-temporal data, where heterogeneous patterns, data sparseness, and distribution skewness are common. Secondly, topic-based methods are proposed for emerging outbreak and virality predictions on microblogs. Complicated models that consider structural details are popular, while others make overly simplified assumptions and sacrifice accuracy for efficiency. Our proposed virality prediction solution delivers the benefits of both worlds: it considers the important characteristics of a structure without the burden of fine details, which reduces complexity. Thirdly, the proposed topic-based approach for microblog mining is extended to sentiment prediction problems in finance. Sentiment-of-topic models are learned from both commentaries and prices for better risk management.
Moreover, the previously proposed supervised topic model provides an avenue to associate market volatility with financial news, yet it displays poor resolution at extreme regions. To overcome this problem, an extreme topic model is proposed to predict volatility in financial markets using supervised learning. By mapping extreme events into Poisson point processes, volatile regions are magnified to reveal their hidden volatility-topic relationships. Lastly, some of the proposed hybrid methods are applied to service computing to verify that they are sufficiently generic for wider applications.