771 research outputs found
Mining frequent biological sequences based on bitmap without candidate sequence generation
Biological sequences carry much of an organism's important genetic information. Furthermore, there is a law of inheritance related to protein function and structure that is useful for applications such as disease prediction. Frequent sequence mining is a core technique for association rule discovery, but existing algorithms suffer from low efficiency or high error rates because biological sequences have more characteristics than general sequences. In this paper, an algorithm for mining Frequent Biological Sequences based on Bitmap, FBSB, is proposed. FBSB uses bitmaps as its simple data structure and transforms each row into a quicksort list (QS-list) for sequence growth. To meet the continuity and accuracy requirements of biological sequence mining, the sequences tested during FBSB's mining process are real ones rather than generated candidates, so all frequent sequences can be mined without any errors. Experimental results show that, compared with other algorithms, FBSB achieves better performance in both run time and scalability.
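The abstract does not describe FBSB's internals, so the following Python sketch only illustrates the general idea under assumptions of our own: patterns are contiguous (reflecting the continuity requirement), support is the number of input sequences that contain a pattern, the set of containing sequences is kept as a bitmap (a Python integer with one bit per sequence), and a pattern is grown only by symbols that actually follow one of its occurrences, so no unseen candidate is generated. The function name `mine_frequent_substrings` is hypothetical, and the occurrence lists stand in for, but are not, the paper's QS-list.

```python
from collections import defaultdict


def mine_frequent_substrings(seqs, min_sup):
    """Return {pattern: support} for contiguous patterns found in >= min_sup sequences.

    Hypothetical sketch: support counts sequences, not occurrences. Each
    occurrence is tracked as (sequence index, end position), and a pattern is
    extended only by the symbol that really follows one of its occurrences,
    so no candidate that is absent from the data is ever generated.
    """
    # Level 1: record every occurrence of every single symbol.
    frontier = defaultdict(list)
    for i, s in enumerate(seqs):
        for j, ch in enumerate(s):
            frontier[ch].append((i, j))

    result = {}
    while frontier:
        next_frontier = defaultdict(list)
        for pattern, places in frontier.items():
            # Bitmap over sequences: bit i is set if seqs[i] contains the pattern.
            bitmap = 0
            for i, _ in places:
                bitmap |= 1 << i
            support = bin(bitmap).count("1")
            if support < min_sup:
                continue  # extensions can only have equal or lower support
            result[pattern] = support
            # Grow the pattern with the actual next symbol at each occurrence.
            for i, j in places:
                if j + 1 < len(seqs[i]):
                    next_frontier[pattern + seqs[i][j + 1]].append((i, j + 1))
        frontier = next_frontier
    return result
```

For example, `mine_frequent_substrings(["ACGT", "ACGA", "TACG"], 3)` returns `{'A': 3, 'C': 3, 'G': 3, 'AC': 3, 'CG': 3, 'ACG': 3}`: `T`, `GT` and `ACGT` are dropped because they occur in fewer than three sequences.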
ARM-AMO: An Efficient Association Rule Mining Algorithm Based on Animal Migration Optimization
The file attached to this record is the author's final peer-reviewed version. The publisher's final version can be found by following the DOI link.
Association rule mining (ARM) aims to find association rules that satisfy predefined minimum support and confidence thresholds in a given database. However, ARM often generates an extremely large number of association rules, which are impossible for end users to comprehend or validate, limiting the usefulness of the data mining results. In this paper, we propose a new mining algorithm based on Animal Migration Optimization (AMO), called ARM-AMO, to reduce the number of association rules. It is based on the idea that rules with low support, as well as unnecessary rules, are deleted from the data. First, the Apriori algorithm is applied to generate frequent itemsets and association rules. Then, AMO is used to reduce the number of association rules with a new fitness function that incorporates frequent rules. Experiments show that, compared with other relevant techniques, ARM-AMO greatly reduces the computational time for frequent itemset generation, the memory required for association rule generation, and the number of rules generated.
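The first stage described above (Apriori frequent itemsets, then rules filtered by minimum support and confidence) can be sketched as a minimal, unoptimized textbook Apriori in Python. It assumes transactions are given as sets of item names and that support is a fraction of all transactions; the function names `apriori` and `association_rules` are hypothetical, and the AMO rule-reduction stage is not shown.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support}, support being a fraction of transactions."""
    tx = [frozenset(t) for t in transactions]
    n = len(tx)

    def support(itemset):
        return sum(1 for t in tx if itemset <= t) / n

    # Start with all 1-itemsets.
    level = {frozenset([item]) for t in tx for item in t}
    frequent = {}
    k = 1
    while level:
        # Keep only candidates that reach the minimum support.
        current = {s: support(s) for s in level}
        current = {s: v for s, v in current.items() if v >= min_support}
        frequent.update(current)
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = {c for c in joined
                 if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
    return frequent


def association_rules(frequent, min_confidence):
    """Return a list of (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is frequent, so lhs is in `frequent`.
                confidence = sup / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, sup, confidence))
    return rules
```

With the four transactions `{bread, milk}`, `{bread, butter}`, `{bread, milk, butter}` and `{milk}`, a minimum support of 0.5 yields five frequent itemsets, and a minimum confidence of 0.6 yields four rules, including `butter → bread` with confidence 1.0. Even on such small data, the rule count grows quickly as thresholds are lowered, which is the problem ARM-AMO targets.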
Colossal Trajectory Mining: A unifying approach to mine behavioral mobility patterns
Spatio-temporal mobility patterns are at the core of strategic applications such as urban planning and monitoring. Depending on the strength of the spatio-temporal constraints, different mobility patterns can be defined. While existing approaches work well at extracting groups of objects that share fine-grained paths, the huge volume of large-scale data calls for coarse-grained solutions. In this paper, we introduce Colossal Trajectory Mining (CTM) to efficiently extract heterogeneous mobility patterns from a multidimensional space that, along with the space and time dimensions, can consider additional trajectory features (e.g., means of transport or activity) to characterize behavioral mobility patterns. The algorithm is natively designed in a distributed fashion, and the experimental evaluation shows its scalability with respect to the involved features and the cardinality of the trajectory dataset.
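One way to picture the multidimensional space described above is to discretize each trajectory into items that combine a spatial cell, a time bin and an extra feature such as the means of transport; a group of trajectories then shares a pattern when they have items in common. The Python sketch below shows only this data representation, under assumptions of our own: the grid cell size, the time-bin width, the `(x, y, t, mode)` point tuple and all helper names are hypothetical. It is not the CTM algorithm and it is not distributed.

```python
def to_items(points, cell_size, time_bin):
    """Discretize a trajectory into (cell_x, cell_y, time_bin, mode) items."""
    return frozenset(
        (int(x // cell_size), int(y // cell_size), int(t // time_bin), mode)
        for x, y, t, mode in points
    )


def shared_items(transactions, group):
    """Items common to every trajectory in `group`: the pattern the group shares."""
    ids = list(group)
    common = set(transactions[ids[0]])
    for tid in ids[1:]:
        common &= transactions[tid]
    return frozenset(common)


def support_set(transactions, itemset):
    """Identifiers of all trajectories that contain every item in `itemset`."""
    return {tid for tid, items in transactions.items() if itemset <= items}
```

Because the mode is part of each item, two objects that pass through the same cell at the same time but by different means of transport (say, bike and car) do not share that item, which is how an extra feature can separate behavioral patterns that space and time alone would merge.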
Corporate Smart Content Evaluation
Nowadays, a wide range of information sources is available due to the evolution of the web and of data collection. Much of this information is consumable and usable by humans but not understandable and processable by machines. Some data may be directly accessible in web pages or via data feeds, but most of the meaningful existing data is hidden within deep web databases and enterprise information systems. Besides the inability to access a wide range of data, manual processing by humans is effortful, error-prone and outdated. Semantic web technologies deliver capabilities for machine-readable, exchangeable content and metadata for automatic processing of content. Enriching heterogeneous data with background knowledge described in ontologies promotes reusability and supports automatic processing of data. The main focus of this study is the establishment of “Corporate Smart Content” (CSC): semantically enriched data with high information content that offers sufficient benefits in economic areas. We describe three current research areas in the field of CSC, covering scenarios and datasets applicable to corporate applications, algorithms and research. Aspect-oriented Ontology Development advances modular ontology development and partial reuse of existing ontological knowledge. Complex Entity Recognition enhances traditional entity recognition techniques to recognize clusters of related textual information about entities. Semantic Pattern Mining combines semantic web technologies with pattern learning to mine for complex models by attaching background knowledge. This study introduces the aforementioned topics by analyzing applicable scenarios with an economic and industrial focus, as well as a research emphasis. Furthermore, a collection of existing datasets for the given areas of interest is presented and evaluated. The target audience includes researchers and developers of CSC technologies: people interested in semantic web features, ontology development, automation, and extracting and mining valuable information in corporate environments. The aim of this study is to provide a comprehensive and broad overview of the three topics, to assist decision making in relevant scenarios, and to help choose practical datasets for evaluating custom problem statements. Detailed descriptions of the datasets' attributes and metadata should serve as a starting point for individual ideas and approaches.
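Semantic Pattern Mining, as summarized above, attaches background knowledge to the data before mining. One simple, well-known way to do this is generalized (multi-level) itemset counting: each transaction is extended with the ancestors of its items in an is-a taxonomy, so that patterns can emerge at a more abstract level. The Python sketch below assumes the taxonomy is a child-to-parent dictionary and only counts item pairs; the helper names and example items are hypothetical, and this is not a method taken from the study.

```python
from collections import Counter
from itertools import combinations


def extend_with_ancestors(transaction, parent):
    """Add every ancestor of each item, following child -> parent links in `parent`."""
    items = set(transaction)
    for item in transaction:
        node = item
        while node in parent:
            node = parent[node]
            items.add(node)
    return items


def pair_support(transactions, parent):
    """Count, for each sorted item pair, how many extended transactions contain it."""
    counts = Counter()
    for t in transactions:
        extended = extend_with_ancestors(t, parent)
        for pair in combinations(sorted(extended), 2):
            counts[pair] += 1
    return counts
```

With a taxonomy in which `BMW` and `Audi` are both a `Car` and `Tire` is a `Part`, the transactions `{BMW, Tire}` and `{Audi, Tire}` each support the raw pair with `Tire` only once, while the generalized pair `(Car, Tire)` reaches a support of 2, a pattern that is invisible without the background knowledge.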
Analyze Large Multidimensional Datasets Using Algebraic Topology
This paper presents an efficient algorithm to extract knowledge from high-dimensionality, high-complexity datasets using algebraic topology, namely simplicial complexes. Based on the concept of isomorphism of relations, our method turns a relational table into a geometric object (a simplicial complex, which is a polyhedron). Conceptually, association rule searching thus becomes a geometric traversal problem. By leveraging the core concepts behind simplicial complexes, we use a technique that is new in computer science, improves performance over existing methods and uses far less memory. It was designed and developed with a strong emphasis on scalability, reliability, and extensibility. This paper also investigates the possibility of Hadoop integration and the challenges that come with the framework.
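A simplified illustration of the table-to-geometry idea: each row of a relational table becomes a simplex whose vertices are its (attribute, value) pairs, and the support of an itemset is the number of row simplices that contain the corresponding face. The Python sketch below assumes rows are dictionaries and counts faces by brute-force enumeration; the function names are hypothetical, and it does not reproduce the paper's traversal technique or its memory savings.

```python
from collections import Counter
from itertools import combinations


def row_simplices(rows, attributes):
    """Turn each row into a simplex whose vertices are (attribute, value) pairs."""
    return [frozenset((a, row[a]) for a in attributes) for row in rows]


def frequent_faces(simplices, max_dim, min_count):
    """Count faces of dimension <= max_dim across all row simplices.

    A face with k vertices has dimension k - 1; its count is the number of
    rows that contain the corresponding itemset.
    """
    counts = Counter()
    for simplex in simplices:
        for k in range(1, max_dim + 2):
            for face in combinations(simplex, k):
                counts[frozenset(face)] += 1
    return {face: c for face, c in counts.items() if c >= min_count}
```

For three rows `(red, S)`, `(red, M)` and `(red, S)`, with `max_dim=1` and `min_count=2`, the vertex `color=red` has count 3 and the edge `{color=red, size=S}` has count 2, so the confidence of the rule `color=red → size=S` is read off the complex as 2/3.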
Bibliometric of Feature Selection Using Optimization Techniques in Healthcare using Scopus and Web of Science Databases
Feature selection is an important step in prediction and classification, particularly in data mining and in the medical field. It is concerned with the task of choosing a subset of relevant features that can be used to develop a prototype. Medical datasets are huge, so effective optimization techniques are required to produce accurate results. Optimization algorithms play a critical role in medical data mining, particularly in identifying diseases, since they offer high effectiveness at minimal computational cost and time. Classification algorithms also produce better outcomes when an objective function is built using the feature selection algorithm. The sole aim of this analysis is to understand the reach and utility of optimization algorithms such as the Genetic Algorithm (GA), Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO) in the field of health care.
The aim is to improve efficiency and optimization in the health care sector using the vast amount of information already available in these fields. Using the datasets available for health care analysis, our focus is to extract the most important features with optimization techniques and to work with different algorithms to obtain the most optimized result.
Precision depends largely on the usefulness of the features under consideration and on finding useful patterns in those features that characterize the main problem. The optimized algorithm finds the overall optimum with fewer function evaluations. The principal goal of this study is to optimize the feature selection technique to produce an optimized and efficient model that addresses various health issues.
This paper uses the Scopus and Web of Science databases for bibliometric analysis. The analysis considers important keywords, datasets, and the significance of the reviewed research papers. It also reports publication types and sources, yearly publication trends, and significant countries from Scopus and Web of Science. In addition, network diagrams capture co-occurring keywords, authors, and source titles. This paper can be useful to researchers who want to contribute to the area of feature selection and optimization in health care. The analysis shows that there is considerable scope for research in this area. This kind of research will also help in analyzing pandemic scenarios such as COVID-19.
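As an illustration of the wrapper-style feature selection discussed in this abstract, the following Python sketch runs a small genetic algorithm over feature masks (1 = feature kept). It is a generic textbook GA with binary tournament selection, uniform crossover, bit-flip mutation and elitism, not a method from the surveyed papers. The fitness function is supplied by the caller; in practice it would typically be something like the cross-validated accuracy of a classifier minus a penalty on the number of kept features. The function name and default parameters are hypothetical.

```python
import random


def ga_feature_selection(n_features, fitness, pop_size=30, generations=60,
                         mutation_rate=None, seed=0):
    """Search feature masks with a simple genetic algorithm; return (best_mask, best_fitness).

    `fitness(mask)` is supplied by the caller and should be higher for better masks,
    e.g. cross-validated accuracy minus a penalty on the number of kept features.
    """
    rng = random.Random(seed)
    if mutation_rate is None:
        mutation_rate = 1.0 / n_features  # on average one bit flip per child
    pop = [tuple(rng.randint(0, 1) for _ in range(n_features)) for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def tournament():
        # Binary tournament: the fitter of two random individuals becomes a parent.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best]  # elitism: the best mask found so far always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            # Uniform crossover, then independent bit-flip mutation.
            child = tuple(rng.choice((g1, g2)) for g1, g2 in zip(p1, p2))
            child = tuple(1 - g if rng.random() < mutation_rate else g for g in child)
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best, fitness(best)
```

Because the search only calls `fitness`, PSO or ACO could replace the GA loop while keeping the same fitness interface, which is what makes such optimizers interchangeable in comparisons like the ones this bibliometric study surveys.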
- …