33,935 research outputs found
Application of data mining techniques in bioinformatics
With the widespread use of databases and the explosive growth in their sizes, there is a need to effectively utilize these massive volumes of data. This is where data mining comes in handy, as it scours the databases for extracting hidden patterns, finding hidden information, decision making and hypothesis testing. Bioinformatics, an upcoming field in today’s world, which involves use of large databases can be effectively searched through data mining techniques to derive useful rules. Based on the type of knowledge that is mined, data mining techniques [1] can be mainly classified into association rules, decision trees and clustering. Until recently, biology lacked the tools to analyze massive repositories of information such as the human genome database [3]. The data mining techniques are effectively used to extract meaningful relationships from these data.Data mining is especially used in microarray analysis which is used to study the activity of different cells under different conditions. Two algorithms under each mining techniques were implemented for a large database and
compared with each other.
1. Association Rule Mining: - (a) a priori (b) partition
2. Clustering: - (a) k-means (b) k-medoids
3. Classification Rule Mining:- Decision tree generation using (a) gini index (b) entropy value. Genetic algorithms were applied to association and classification techniques. Further, kmeans and Density Based Spatial Clustering of Applications of Noise (DBSCAN) clustering techniques [1] were applied to a microarray dataset and compared. The microarray dataset was downloaded from internet using the Gene Array Analyzer Software(GAAS).The clustering was done on the basis of the signal color intensity of the genes in the microarray experiment. The following results were obtained:-
1. Association:- For smaller databases, the a priori algorithm works better than partition algorithm and for larger databases partition works better.
2. Clustering:- With respect to the number of interchanges, k-medoids algorithm works better than k-means algorithm.
3. Classification:- The results were similar for both the indices (gini index and entropy value). The application of genetic algorithm improved the efficiency of the association and classification techniques. For the microarray dataset, it was found that DBSCAN is less efficient than k-means when the database is small but for larger database DBSCAN is more accurate and efficient in terms of no. of clusters and time of execution. DBSCAN execution time increases linearly with the increase in database and was much lesser than that of k-means for larger database. Owing to the involvement of large datasets and the need to derive results from them, data mining techniques can be effectively put in use in the field of Bio-informatics [2]. The techniques can be applied to find associations among the genes, cluster similar gene and protein sequences and draw decision trees to classify the genes. Further, the data mining techniques can be made more efficient by applying genetic algorithms which greatly improves the search procedure and reduces the execution time
Image mining: trends and developments
[Abstract]: Advances in image acquisition and storage technology have led to tremendous growth in very large and detailed image databases. These images, if analyzed, can reveal useful information to the human users. Image mining deals with the extraction of implicit knowledge, image data relationship, or other patterns not explicitly stored in the images. Image mining is more than just an extension of data mining to image domain. It is an interdisciplinary endeavor that draws upon expertise in computer vision, image processing, image retrieval, data mining, machine learning, database, and artificial intelligence. In this paper, we will examine the research issues in image mining, current developments in image mining, particularly, image mining frameworks, state-of-the-art techniques and systems. We will also identify some future research directions for image mining
Image mining: issues, frameworks and techniques
[Abstract]: Advances in image acquisition and storage technology have led to tremendous growth in significantly large and detailed image databases. These images, if analyzed, can reveal useful information to the human users. Image mining deals with the extraction of implicit knowledge, image data relationship, or other patterns not explicitly stored in the images. Image mining is more than just an extension of data mining to image domain. It is an
interdisciplinary endeavor that draws upon expertise in
computer vision, image processing, image retrieval, data
mining, machine learning, database, and artificial
intelligence. Despite the development of many
applications and algorithms in the individual research
fields cited above, research in image mining is still in its infancy. In this paper, we will examine the research issues in image mining, current developments in image mining, particularly, image mining frameworks, state-of-the-art techniques and systems. We will also identify some future research directions for image mining at the end of this paper
ADBSCAN: Adaptive Density-Based Spatial Clustering of Applications with Noise for Identifying Clusters with Varying Densities
Density-based spatial clustering of applications with noise (DBSCAN) is a
data clustering algorithm which has the high-performance rate for dataset where
clusters have the constant density of data points. One of the significant
attributes of this algorithm is noise cancellation. However, DBSCAN
demonstrates reduced performances for clusters with different densities.
Therefore, in this paper, an adaptive DBSCAN is proposed which can work
significantly well for identifying clusters with varying densities.Comment: To be published in the 4th IEEE International Conference on
Electrical Engineering and Information & Communication Technology (iCEEiCT
2018
Data Management and Mining in Astrophysical Databases
We analyse the issues involved in the management and mining of astrophysical
data. The traditional approach to data management in the astrophysical field is
not able to keep up with the increasing size of the data gathered by modern
detectors. An essential role in the astrophysical research will be assumed by
automatic tools for information extraction from large datasets, i.e. data
mining techniques, such as clustering and classification algorithms. This asks
for an approach to data management based on data warehousing, emphasizing the
efficiency and simplicity of data access; efficiency is obtained using
multidimensional access methods and simplicity is achieved by properly handling
metadata. Clustering and classification techniques, on large datasets, pose
additional requirements: computational and memory scalability with respect to
the data size, interpretability and objectivity of clustering or classification
results. In this study we address some possible solutions.Comment: 10 pages, Late
- …