178,733 research outputs found
Link Mining for Kernel-based Compound-Protein Interaction Predictions Using a Chemogenomics Approach
Virtual screening (VS) is widely used during computational drug discovery to
reduce costs. Chemogenomics-based virtual screening (CGBVS) can be used to
predict new compound-protein interactions (CPIs) from known CPI network data
using several methods, including machine learning and data mining. Although
CGBVS facilitates highly efficient and accurate CPI prediction, it has poor
performance for prediction of new compounds for which CPIs are unknown. The
pairwise kernel method (PKM) is a state-of-the-art CGBVS method and shows high
accuracy for prediction of new compounds. In this study, on the basis of link
mining, we improved the PKM by combining link indicator kernel (LIK) and
chemical similarity and evaluated the accuracy of these methods. The proposed
method obtained an average area under the precision-recall curve (AUPR) value
of 0.562, which was higher than that achieved by the conventional Gaussian
interaction profile (GIP) method (0.425), and the calculation time was only
increased by a few percent
Recommended from our members
Visualizing latent domain knowledge
Knowledge discovery and data mining commonly rely on finding salient patterns of association from a vast amount of data. Traditional citation analysis of scientific literature draws insights from strong citation patterns. Latent domain knowledge, in contrast to the mainstream domain knowledge, often consists of highly relevant but relatively infrequently cited scientific works. Visualizing latent domain knowledge presents a significant challenge to knowledge discovery and quantitative studies of science. We build upon a citation-based knowledge visualization procedure and develop an approach that not only captures knowledge structures from prominent and highly cited works, but also traces latent domain knowledge through low-frequency citation chains. We apply this approach to two cases: (1) identifying cross-domain applications of Pathfinder networks (PFNETs) and (2) clarifying the current status of scientific inquiry of a possible link between Bovine spongiform encephalopathy (BSE), also known as mad cow disease, and a new variant Creutzfeldt-Jakob disease (vCJD), a type of brain disease in human
Market Basket Analysis with Shortened Web Link Click Data
Market research is an indispensable part of an organization\u27s ability to understand market dynamics in an area. Over the past 20 years data collection and analysis through Knowledge Discovery through Databases (KDD) has arisen to supplement the traditional methods of surveys and focus groups. Market Basket Analysis is an area of KDD that identifies associations between commonly purchased items. As social media use has grown, link shortening companies help users share links in a constrained space environment and, in exchange, collect data about each user when a link is clicked. This research applies market basket analysis techniques with graph mining to shortened web link data to identify communities of co-visited websites to help analysts better understand web traffic for an area during a time range. Patterns within clusters of web domains regarding hardware platforms, operating systems, or referral sources are then identified and used to gain a better understanding of an area
Allele mining strategies: principles and utilisation for blast resistance genes in rice (Oryza sativa L.)
Allele mining is a promising way to dissect naturally occurring allelic variants of candidate genes with essential agronomic qualities. With the identification, isolation and characterisation of blast resistance genes in rice, it is now possible to dissect the actual allelic variants of these genes within an array of rice cultivars via allele mining. Multiple alleles from the complex locus serve as a reservoir of variation to generate functional genes. The routine sequence exchange is one of the main mechanisms of R gene evolution and development. Allele mining for resistance genes can be an important method to identify additional resistance alleles and new haplotypes along with the development of allele-specific markers for use in marker-assisted selection. Allele mining can be visualised as a vital link between effective utilisation of genetic and genomic resources in genomics-driven modern plant breeding. This review studies the actual concepts and potential of mining approaches for the discovery of alleles and their utilisation for blast resistance genes in rice. The details provided here will be important to provide the rice breeder with a worthwhile introduction to allele mining and its methodology for breakthrough discovery of fresh alleles hidden in hereditary diversity, which is vital for crop improvement
An Efficient Learning of Constraints For Semi-Supervised Clustering using Neighbour Clustering Algorithm
Data mining is the process of finding the previously unknown and potentially interesting patterns and relation in database. Data mining is the step in the knowledge discovery in database process (KDD) .The structures that are the outcome of the data mining process must meet certain condition so that these can be considered as knowledge. These conditions are validity, understandability, utility, novelty, interestingness. Researcher identifies two fundamental goals of data mining: prediction and description. The proposed research work suggests the semi-supervised clustering problem where to know (with varying degree of certainty) that some sample pairs are (or are not) in the same class. A probabilistic model for semi-supervised clustering based on Shared Semi-supervised Neighbor clustering (SSNC) that provides a principled framework for incorporating supervision into prototype-based clustering. Semi-supervised clustering that combines the constraint-based and fitness-based approaches in a unified model. The proposed method first divides the Constraint-sensitive assignment of instances to clusters, where points are assigned to clusters so that the overall distortion of the points from the cluster centroids is minimized, while a minimum number of must-link and cannot-link constraints are violated. Experimental results across UCL Machine learning semi-supervised dataset results show that the proposed method has higher F-Measures than many existing Semi-Supervised Clustering methods
WormBase 2012: more genomes, more data, new website
Since its release in 2000, WormBase (http://www.wormbase.org) has grown from a small resource focusing on a single species and serving a dedicated research community, to one now spanning 15 species essential to the broader biomedical and agricultural research fields. To enhance the rate of curation, we have automated the identification of key data in the scientific literature and use similar methodology for data extraction. To ease access to the data, we are collaborating with journals to link entities in research publications to their report pages at WormBase. To facilitate discovery, we have added new views of the data, integrated large-scale datasets and expanded descriptions of models for human disease. Finally, we have introduced a dramatic overhaul of the WormBase website for public beta testing. Designed to balance complexity and usability, the new site is species-agnostic, highly customizable, and interactive. Casual users and developers alike will be able to leverage the public RESTful application programming interface (API) to generate custom data mining solutions and extensions to the site. We report on the growth of our database and on our work in keeping pace with the growing demand for data, efforts to anticipate the requirements of users and new collaborations with the larger science community
Web structure mining of dynamic pages
Web structure mining in static web contents decreases the accuracy of mined outcomes and affects the quality of decision making activity. By structure mining in web hidden data, the accuracy ratio of mined outcomes can be improved, thus enhancing the reliability and quality of decision making activity. Data Mining is an automated or semi automated exploration and analysis of large volume of data in order to reveal meaningful patterns. The term web mining is the discovery and analysis of useful information from World Wide Web that helps web search engines to find high quality web pages and enhances web click stream analysis. One branch of web mining is web structure mining. The goal of which is to generate structural summary about the Web site and Web pages. Web structure mining tries to discover the link structure of the hyperlinks at the inter-document level. In recent years, Web link structure mining has been widely used to infer important information about Web pages. But a major part of the web is in hidden form, also called Deep Web or Hidden Web that refers to documents on the Web that are dynamic and not accessible by general search engines; most search engine spiders can access only publicly index able Web (or the visible Web). Most documents in the hidden Web, including pages hidden behind search forms, specialized databases, and dynamically generated Web pages, are not accessible by general Web mining applications. Dynamic content generation is used in modern web pages and user forms are used to get information from a particular user and stored in a database. The link structure lying in these forms can not be accessed during conventional mining procedures. To access these links, user forms are filled automatically by using a rule based framework which has robust ability to read a web page containing dynamic contents as activeX controls like input boxes, command buttons, combo boxes, etc. After reading these controls dummy values are filled in the available fields and the doGet or doPost methods are automatically executed to acquire the link of next subsequent web page. The accuracy ratio of web page hierarchical structures can phenomenally be improved by including these hidden web pages in the process of Web structure mining. The designed system framework is adequately strong to process the dynamic Web pages along with static ones
Transforming Graph Representations for Statistical Relational Learning
Relational data representations have become an increasingly important topic
due to the recent proliferation of network datasets (e.g., social, biological,
information networks) and a corresponding increase in the application of
statistical relational learning (SRL) algorithms to these domains. In this
article, we examine a range of representation issues for graph-based relational
data. Since the choice of relational data representation for the nodes, links,
and features can dramatically affect the capabilities of SRL algorithms, we
survey approaches and opportunities for relational representation
transformation designed to improve the performance of these algorithms. This
leads us to introduce an intuitive taxonomy for data representation
transformations in relational domains that incorporates link transformation and
node transformation as symmetric representation tasks. In particular, the
transformation tasks for both nodes and links include (i) predicting their
existence, (ii) predicting their label or type, (iii) estimating their weight
or importance, and (iv) systematically constructing their relevant features. We
motivate our taxonomy through detailed examples and use it to survey and
compare competing approaches for each of these tasks. We also discuss general
conditions for transforming links, nodes, and features. Finally, we highlight
challenges that remain to be addressed
- …