Modeling the Relationship Patterns Between the Competencies of Universitas Lancang Kuning Graduates and the Needs of the Business and Industrial World
The rapid growth of data warehouses has created a condition of rich data but poor information. Data mining is the discovery of new information by searching for patterns or rules in large volumes of data, with the aim of extracting interesting patterns or important information from that data. By combining graduate tracer-study data with data from the users of graduates, namely businesses and industry, this study aims to produce information about the pattern of their relationship through the association-rule data mining technique. Graduate ability categories are measured on a scale of less necessary, reasonably necessary, needed, and greatly needed in the business and industrial world. The algorithm used is the Apriori algorithm, and the resulting information is presented as the support and confidence values of each category of graduate ability.
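To make the support and confidence measures concrete, the following is a minimal sketch of how these values are computed for an association rule. The transaction records and category labels are hypothetical illustrations, not the study's actual tracer data:

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) over the transactions."""
    return (support(set(antecedent) | set(consequent), transactions)
            / support(antecedent, transactions))

# Hypothetical tracer-study records: each transaction pairs a graduate
# ability category with the employer's sector.
transactions = [
    frozenset({"english_skill:needed", "industry:manufacturing"}),
    frozenset({"english_skill:needed", "industry:services"}),
    frozenset({"english_skill:less_necessary", "industry:services"}),
    frozenset({"english_skill:needed", "industry:manufacturing"}),
]

print(support({"english_skill:needed"}, transactions))       # 0.75
print(confidence({"english_skill:needed"},
                 {"industry:manufacturing"}, transactions))   # ≈ 0.667
```

Apriori then enumerates itemsets level by level, keeping only those whose support exceeds a chosen minimum before computing rule confidences.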
Marketing relations and communication infrastructure development in the banking sector based on big data mining
Purpose: The article aims to study the methodological tools for applying big data mining technologies in the modern digital space, whose further implementation can become the basis for realizing the relationship-marketing concept in the banking sector of the Russian Federation's economy. Structure/Methodology/Approach: For the development of marketing relations in the banking sector of the digital economy, it seems necessary: first, to identify the opportunities and advantages of big data mining in banking marketing; second, to identify the sources and methods of processing big data; third, to study examples of the successful use of big data mining by Russian banks and to formulate recommendations on implementing big data technologies in a digital banking marketing strategy. Findings: The authors' analysis showed that big data processing of open online and offline information sources significantly increases the amount of data available for intelligent analysis, as a result of which the interaction between a bank and its target client reaches a new level of partnership. Practical Implications: The conclusions and generalizations of the study can be applied in the practice of managing financial institutions, and the results can be used by bank management to form a digital marketing strategy for long-term communication. Originality/Value: The main contribution of this study is that the authors have identified the main directions of using big data in relationship marketing to generate additional profit, as well as the possibilities of intelligent analysis of the client base aimed at expanding market share and retaining customers in the banking sector of the economy.
Structural Deep Embedding for Hyper-Networks
Network embedding has recently attracted a lot of attention in data mining. Existing network embedding methods mainly focus on networks with pairwise relationships. In the real world, however, relationships among data points can go beyond pairwise, i.e., three or more objects may be involved in each relationship, represented by a hyperedge, thus forming hyper-networks. These hyper-networks pose great challenges to existing network embedding methods when the hyperedges are indecomposable, that is, when no subset of the nodes in a hyperedge forms another hyperedge. Such indecomposable hyperedges are especially common in heterogeneous networks. In this paper, we propose a novel Deep Hyper-Network Embedding (DHNE) model to embed hyper-networks with indecomposable hyperedges. More specifically, we theoretically prove that any linear similarity metric in embedding space, as commonly used in existing methods, cannot maintain the indecomposability property in hyper-networks, and we thus propose a new deep model that realizes a non-linear tuplewise similarity function while preserving both local and global proximities in the resulting embedding space. We conduct extensive experiments on four different types of hyper-networks, including a GPS network, an online social network, a drug network, and a semantic network. The empirical results demonstrate that our method significantly and consistently outperforms state-of-the-art algorithms. Comment: Accepted by AAAI 2018.
Data mining Twitter for cancer, diabetes, and asthma insights
Twitter may serve as a data resource to support healthcare research, but the literature on the potential of Twitter data in healthcare is still limited. The purpose of this study was to contrast the processes by which a large collection of unstructured disease-related tweets could be converted into structured data for further analysis, with the objective of gaining insights into the content and behavioral patterns associated with disease-specific communication on Twitter. Twelve months of Twitter data related to cancer, diabetes, and asthma were collected to form a baseline dataset containing over 34 million tweets. As Twitter data in its raw form would have been difficult to manage, three separate data reduction methods were contrasted to identify a method for generating analysis files that maximizes classification precision and data retention. Each disease file was then run through a CHAID (chi-square automatic interaction detector) analysis to demonstrate how user behavior insights vary by disease. CHAID, introduced by Gordon V. Kass in 1980, is a technique for discovering relationships between variables. This study followed the standard CRISP-DM data mining approach and demonstrates how the practice of mining Twitter data fits into this six-stage iterative framework. The study produced insights that offer a new lens on the potential of Twitter as a valuable healthcare data source, as well as the nuances involved in working with the data.
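At the heart of CHAID is the chi-square test of independence, used to pick the predictor most strongly associated with the outcome at each split. The sketch below computes the Pearson chi-square statistic on hypothetical tweet counts (the variables and figures are illustrative, not from the study); real CHAID compares Bonferroni-adjusted p-values rather than raw statistics:

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical tweet counts: rows = user type, cols = tweet sentiment.
by_user_type = [[30, 70],   # patients:  negative, positive
                [60, 40]]   # marketers: negative, positive
by_platform  = [[45, 55],   # mobile:    negative, positive
                [48, 52]]   # desktop:   negative, positive

# CHAID would split on the predictor with the most significant association.
print(chi_square(by_user_type) > chi_square(by_platform))  # True
```

CHAID applies this test recursively, merging categories that are not significantly different before splitting on the best remaining predictor.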
Applied information retrieval and multidisciplinary research: new mechanistic hypotheses in Complex Regional Pain Syndrome
Background: Collaborative efforts of physicians and basic scientists are often necessary in the investigation of complex disorders. Difficulties can arise, however, when large amounts of information need to be reviewed. Advanced information retrieval can be beneficial in combining and reviewing data obtained from the various scientific fields. In this paper, a team of investigators with varying backgrounds has applied advanced information retrieval methods, in the form of text mining and entity-relationship tools, to review the current literature, with the intention of generating new insights into the molecular mechanisms underlying a complex disorder. Complex Regional Pain Syndrome (CRPS) was chosen as an example of such a disorder. CRPS is a painful and debilitating syndrome with a complex etiology that remains largely unresolved, resulting in suboptimal diagnosis and treatment. Results: A text mining based approach combined with a simple network analysis identified Nuclear Factor kappa B (NFκB) as a possible central mediator in both the initiation and progression of CRPS. Conclusion: The result shows the added value of a multidisciplinary approach combined with information retrieval for hypothesis discovery in biomedical research. The new hypothesis, which was derived in silico, provides a framework for further mechanistic studies into the underlying molecular mechanisms of CRPS and requires evaluation in clinical and epidemiological studies.
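The "simple network analysis" pattern described above can be sketched as a co-occurrence network over extracted entities, where a high-degree node suggests a candidate central mediator. The entity sets below are hypothetical stand-ins; the actual study used dedicated text-mining and entity-relationship tools over the literature:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical entity mentions extracted per abstract.
abstracts = [
    {"NFkB", "TNF-alpha", "inflammation"},
    {"NFkB", "IL-6", "pain"},
    {"NFkB", "TNF-alpha", "pain"},
    {"IL-6", "inflammation"},
]

# Build a co-occurrence network: entities are nodes, and an edge links
# two entities mentioned in the same abstract.
neighbors = defaultdict(set)
for entities in abstracts:
    for a, b in combinations(sorted(entities), 2):
        neighbors[a].add(b)
        neighbors[b].add(a)

# Degree centrality flags candidate central mediators.
central = max(neighbors, key=lambda e: len(neighbors[e]))
print(central)  # NFkB
```

More elaborate analyses would weight edges by co-occurrence frequency or use betweenness centrality, but even raw degree highlights well-connected hubs.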
Scalable supergraph search in large graph databases
© 2016 IEEE. Supergraph search is a fundamental problem in graph databases that is widely applied in many application scenarios. Given a graph database and a query-graph, supergraph search retrieves all data-graphs contained in the query-graph from the graph database. Most existing solutions for supergraph search follow the pruning-and-verification framework, which prunes false answers based on features in the pruning phase and performs subgraph isomorphism tests on the remaining graphs in the verification phase. However, these solutions do not scale to large data-graphs and query-graphs, due to three drawbacks. First, they rely on a frequent subgraph mining algorithm to select features, which is expensive and cannot generate large features. Second, they require a costly verification phase. Third, they process features in a fixed order without considering their relationship to the query-graph. In this paper, we address these three drawbacks and propose new indexing and query processing algorithms. In indexing, we select features directly from the data-graphs without expensive frequent subgraph mining. The features form a feature-tree that contains features of all sizes, and both the cost sharing and the pruning power of the features are considered. In query processing, we propose a verification-free algorithm, in which the order of processing features is query-dependent, again considering both cost sharing and pruning power. We explore two optimization strategies to further improve efficiency: the first applies a lightweight graph compression technique, and the second optimizes the inclusion of answers. Finally, we conduct extensive performance studies on two large real datasets to demonstrate the high scalability of our algorithms.
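The feature-based pruning idea can be illustrated in miniature: if a data-graph contains a feature that the query-graph lacks, the data-graph cannot be contained in the query and is pruned. The sketch below uses single labeled edges as features for simplicity, whereas the paper's index organizes subgraph features of all sizes in a feature-tree; the graphs are hypothetical:

```python
def features(graph):
    """Feature set of a graph given as an iterable of labeled edges."""
    return frozenset(graph)

# Hypothetical graph database; each graph is a list of labeled edges.
database = {
    "g1": [("A", "B"), ("B", "C")],
    "g2": [("A", "B"), ("C", "D")],
    "g3": [("A", "B")],
}
query = [("A", "B"), ("B", "C"), ("A", "C")]

# Prune any data-graph with a feature missing from the query-graph.
query_features = features(query)
candidates = [gid for gid, g in database.items()
              if features(g) <= query_features]
print(sorted(candidates))  # ['g1', 'g3']
```

With single-edge features this containment check is only a necessary condition, which is why classical methods need a verification phase; the paper's contribution is an index whose features are rich enough to make verification unnecessary.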
Recommended from our members
Distributionally Robust Performance Analysis with Applications to Mine Valuation and Risk
We consider several problems motivated by issues faced in the mining industry. In recent years, it has become clear that mines carry substantial tail risk in the form of environmental disasters, and this tail risk is not incorporated into common pricing and risk models. However, the data sets of extremal climate behavior that drive this risk are very small and generally inadequate for properly estimating tail behavior. We propose a data-driven methodology that produces reasonable worst-case scenarios given the data size constraints, and we incorporate this into a real-options-based model for the valuation of mines. We propose several iterations of the model, allowing the end user to choose the degree to which they wish to specify the financial consequences of the disaster scenario. Next, in order to perform a risk analysis on a portfolio of mines, we propose a method of estimating the correlation structure of high-dimensional max-stable processes. Using the techniques of (Liu et al., 2017) to map the relationship between normal correlations and max-stable correlations, we can then use techniques inspired by (Bickel et al., 2008; Liu et al., 2014; Rothman et al., 2009) to estimate the underlying correlation matrix while preserving a sparse, positive-definite structure. The correlation matrices are then used in the calculation of model-robust risk metrics (VaR, CVaR) using the Sample-Out-of-Sample methodology (Blanchet and Kang, 2017). We conclude with several new techniques developed in the field of robust performance analysis that, while not directly applied to mining, were motivated by our studies of distributionally robust optimization to address these problems.
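For readers unfamiliar with the risk metrics mentioned, the following is a minimal empirical sketch of VaR and CVaR on a hypothetical loss sample (the figures are illustrative; the thesis computes model-robust versions of these metrics, not this plain estimator):

```python
def var_cvar(losses, alpha=0.95):
    """Empirical Value-at-Risk and Conditional VaR at level alpha.

    VaR is the alpha-quantile of the loss distribution; CVaR is the
    mean loss at or beyond VaR, capturing tail severity.
    """
    ordered = sorted(losses)
    k = int(alpha * len(ordered))   # index of the alpha-quantile
    var = ordered[k]
    tail = ordered[k:]
    cvar = sum(tail) / len(tail)
    return var, cvar

# Hypothetical yearly losses (in $M) for a mine portfolio, including a
# rare disaster outcome in the tail.
losses = [1, 2, 2, 3, 3, 4, 4, 5, 6, 50]
var, cvar = var_cvar(losses, alpha=0.80)
print(var, cvar)  # 6 28.0
```

Note how CVaR, unlike VaR, is pulled up by the magnitude of the disaster loss in the tail, which is why it is favored for tail-risk-heavy portfolios.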
Realistic Modeling of Handover Events in a Multi-Carrier 5G Network: A Preliminary Step Towards COP-KPI Relationship Realization
The ever-increasing demand for mobile data traffic, along with new use cases, is set to make current cellular network technology obsolete and give rise to a newer and better one in the form of 5G. This arising technology promises massive capacity, ultra-high reliability, and close-to-zero latency; however, it also brings additional complexity. 5G is expected to carry more than 5,000 configuration and optimization parameters (COPs). These COPs are the backbone of a network, as most Key Performance Indicators (KPIs) rely on their proper settings. To set these parameters optimally, it is imperative that the relationship between COPs and KPIs be understood. To date, however, this relationship is known only to some extent and is not fully realized. But mining the COP-KPI relationship is not a dead end: Machine Learning (ML) can be leveraged to learn KPI behavior as COPs change. Yet ML's full potential is limited by the lack of representative data in the wireless community with which to effectively train these models, and gathering such data is, in itself, a challenge. Real data from live networks is abundant, yet not representative. Although a simulator is a promising source of data, its usefulness depends on how realistic and detailed the modeling and implementation of its functions are. In this thesis, we present a realistic and comprehensive model of one of the most important functions of a wireless network: the handover function. In line with 3GPP standards, we have modeled and implemented more than 20 handover-related COPs. The model is incorporated into a Python-based simulator to generate data. Validation and evaluation are performed to demonstrate the model's accuracy and its effectiveness in capturing the real handover procedure. Use cases are also presented to show its capability to simulate different COP settings and show their effects on KPIs. This thesis is presented as an initial step toward generating a representative dataset for training machine learning models of the COP-KPI relationship.
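As a flavor of the kind of handover COPs involved, the sketch below implements a simplified version of the 3GPP A3 measurement event, in which a handover is triggered when a neighbor cell's measurement exceeds the serving cell's by an offset plus hysteresis for a time-to-trigger window. The parameter values and RSRP traces are hypothetical, and the real A3 condition also includes cell- and frequency-specific offsets:

```python
def a3_triggered(serving_rsrp, neighbor_rsrp, offset_db=3.0,
                 hysteresis_db=1.0, ttt_samples=3):
    """Simplified 3GPP A3 entry condition: trigger when the neighbor
    exceeds the serving cell by offset + hysteresis for time-to-trigger
    consecutive measurement samples."""
    streak = 0
    for s, n in zip(serving_rsrp, neighbor_rsrp):
        if n > s + offset_db + hysteresis_db:
            streak += 1
            if streak >= ttt_samples:
                return True
        else:
            streak = 0  # condition broken; restart the TTT window
    return False

# Hypothetical RSRP traces (dBm), one value per measurement period:
# the UE moves away from the serving cell and toward the neighbor.
serving  = [-95, -96, -97, -98, -99, -100]
neighbor = [-97, -95, -93, -93, -92, -91]

print(a3_triggered(serving, neighbor))  # True
```

Each of the knobs above (offset, hysteresis, time-to-trigger) is itself a COP whose setting shifts KPIs such as handover failure and ping-pong rates, which is precisely the COP-KPI relationship the thesis aims to expose through simulation.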