
    Extracting Functional Modules from Biological Pathways

    It has been proposed that functional modules are the fundamental units of cellular function. Methods to identify these modules have thus far relied on gene expression data or protein-protein interaction (PPI) data, but have a few limitations. We propose a new method, using biological pathway data to identify functional modules, that can potentially overcome these limitations. We also construct a network of these modules using functionally relevant PPI data. This network displays the flow and integration of information between modules and can be used to map cellular function

    Improving the Performance of Heterogeneous Hadoop Clusters Using Map Reduce

    The key issue emerging from the tremendous growth of connectivity among devices and systems is that data is generated at an exponential rate, and finding a feasible way to process it is becoming harder by the day. Building a platform for data processing at this scale therefore requires advances in both hardware and software. To improve the efficiency of Hadoop clusters in storing and analyzing big data, we propose an algorithmic approach that caters to the needs of heterogeneous data stored on Hadoop clusters and improves both performance and efficiency. This paper aims to assess the effectiveness of the new algorithm through comparisons and recommendations, taking a competitive approach to finding the best solution for improving the big-data scenario. Hadoop's MapReduce model helps keep a close watch over unstructured or heterogeneous Hadoop clusters, with insights drawn directly from the algorithm's results. The new algorithm proposed in this paper, applicable to commercial as well as non-commercial uses, can support community development and improve data-indexing with MapReduce on heterogeneous Hadoop clusters. The experimental work and analyses conducted in this study yielded promising results, among them the selection of schedulers for job scheduling, the arrangement of data in a similarity matrix, clustering before scheduling queries, and combining iterative mapping and reducing while binding internal dependencies together to avoid query stalling and long execution times.
The experiments also establish that if a procedure is defined to handle the different use-case scenarios, one can substantially reduce the cost of computation and benefit from relying on distributed systems for fast execution
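The MapReduce model the abstract relies on can be illustrated with a minimal in-process sketch (plain Python rather than Hadoop; the word-count job and the log lines are hypothetical examples, not from the paper):

```python
from collections import defaultdict

def map_phase(records, map_fn):
    """Apply the map function to every input record, emitting (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(map_fn(record))
    return pairs

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reduce_fn):
    """Apply the reduce function to each key group."""
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# Word count: the canonical MapReduce example.
logs = ["error disk full", "warn disk slow", "error net down"]
mapped = map_phase(logs, lambda line: [(w, 1) for w in line.split()])
counts = reduce_phase(shuffle(mapped), lambda k, vs: sum(vs))
print(counts["error"])  # 2
```

On a real cluster the shuffle step is what the framework distributes across nodes; the map and reduce functions are the only parts the user writes.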

    Predicting User's Future Requests Using Frequent Patterns

    In this research, we predict a user's future requests using a data mining algorithm. Usage of the World Wide Web has produced a huge amount of data, and handling this data is getting harder day by day. All this data is stored as web logs, and each web log is stored in a different format with different field names, such as the search string, the URL with its corresponding timestamp, user IDs that help with session identification, the status code, etc. Whenever a user requests a URL there is a delay in getting the requested page, and sometimes the request is denied. Our goal is to generate a frequent pattern itemset on the web log we have chosen; after analyzing and processing the data, we apply the AprioriAll algorithm with a minimum support threshold to prune and refine the frequent patterns and thereby predict the user's future requests, which helps the user successfully reach the URL pages he has requested
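A simplified illustration of mining frequent patterns from web-log sessions and using them for prediction (a toy ordered-pair miner, not the AprioriAll implementation from the paper; the sessions and URLs are invented):

```python
from collections import Counter
from itertools import combinations

def frequent_patterns(sessions, min_support):
    """Count URL pairs that occur in order within a session; keep the pairs
    whose support (fraction of sessions containing them) meets the threshold."""
    pair_counts = Counter()
    for session in sessions:
        seen = set()
        for i, j in combinations(range(len(session)), 2):
            pair = (session[i], session[j])
            if pair not in seen:          # count each ordered pair once per session
                seen.add(pair)
                pair_counts[pair] += 1
    n = len(sessions)
    return {p: c / n for p, c in pair_counts.items() if c / n >= min_support}

def predict_next(current_url, patterns):
    """Predict the most likely next request given the current page."""
    candidates = {b: s for (a, b), s in patterns.items() if a == current_url}
    return max(candidates, key=candidates.get) if candidates else None

sessions = [
    ["/home", "/search", "/results"],
    ["/home", "/search", "/help"],
    ["/home", "/search", "/results"],
]
patterns = frequent_patterns(sessions, min_support=0.5)
print(predict_next("/search", patterns))  # /results
```

AprioriAll additionally prunes candidate sequences level by level using the downward-closure property; the sketch above keeps only the pair-counting and minimum-support ideas.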

    Using Attribute Oriented Induction High Level Emerging Pattern (AOI-HEP) to Mine Frequent Patterns

    Frequent patterns in Attribute Oriented Induction High Level Emerging Pattern (AOI-HEP) mining are recognized when they have maximal subsumption of the target (superset) dataset into the contrasting (subset) dataset (contrasting ⊂ target) and a large High Emerging Pattern (HEP) growth rate and support in the target dataset. HEP frequent patterns were successfully mined with AOI-HEP on four UCI machine learning datasets, namely adult, breast cancer, census, and IPUMS, with 48842, 569, 2458285, and 256932 instances respectively; for each dataset, concept hierarchies were built from its five chosen attributes. Two frequent patterns were found in the adult dataset and one in the breast cancer dataset, while no frequent patterns were found in the census and IPUMS datasets. The HEP frequent patterns found in the adult dataset are adults who have a government workclass with an intermediate education (80.53%) and with America as native country (33%). Meanwhile, the single HEP frequent pattern from the breast cancer dataset is breast cancer with clump thickness of type AboutAverClump and cell size of VeryLargeSize (3.56%). Finding HEP frequent patterns with AOI-HEP is influenced by learning on a high-level concept in one of the chosen attributes: an extended experiment on the adult dataset that learned on the marital-status attribute showed no frequent pattern
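The growth-rate and support measures behind emerging patterns can be sketched as follows (the generalized records and attribute values below are hypothetical, not drawn from the UCI datasets):

```python
def support(pattern, dataset):
    """Fraction of records in the dataset that contain the pattern,
    where a pattern is a set of attribute=value generalizations."""
    hits = sum(1 for record in dataset if pattern.issubset(record))
    return hits / len(dataset)

def growth_rate(pattern, target, contrasting):
    """Growth rate of a candidate emerging pattern: how much more frequent
    it is in the target dataset than in the contrasting dataset."""
    s_t = support(pattern, target)
    s_c = support(pattern, contrasting)
    return float("inf") if s_c == 0 else s_t / s_c

# Hypothetical records already generalized to high-level concepts.
target = [{"workclass=government", "education=intermediate"},
          {"workclass=government", "education=basic"},
          {"workclass=private", "education=intermediate"}]
contrasting = [{"workclass=private", "education=basic"},
               {"workclass=government", "education=basic"}]

p = {"workclass=government"}
print(round(support(p, target), 3))              # 0.667
print(round(growth_rate(p, target, contrasting), 3))  # 1.333
```

A pattern qualifies as emerging when its growth rate and its support in the target dataset both exceed chosen thresholds, which is the filtering criterion the abstract refers to.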

    An Efficient Itemset Representation for Mining Frequent Patterns in Transactional Databases

    In this paper we propose a very efficient itemset representation for frequent itemset mining from transactional databases. The combinatorial number system is used to uniquely represent a frequent k-itemset with just one integer value, for any k ≥ 2. Experiments show that memory requirements can be reduced by up to 300 %, especially for very low minimal support thresholds. Further, we exploit the combinatorial number schema for representing candidate itemsets during the iterative join-based approach. The novel algorithm maintains a one-dimensional rank array, starting from the k = 2nd iteration. At index r of the array, the proposed algorithm stores the unique integer representation of the r-th candidate in lexicographic order. The rank array makes joining two candidate k-itemsets an O(1) instead of an O(k) operation. Additionally, the rank array allows faster determination of which candidates are contained in a given transaction during the support count-and-test phase. Finally, we believe that itemset ranking by the combinatorial number system can be effectively integrated into pattern-growth algorithms, the state of the art in frequent itemset mining, and further improve their performance
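The combinatorial number system encoding the paper builds on can be sketched directly (the greedy decoder is the standard construction; item ids are assumed here to be 0-based integers):

```python
from math import comb

def itemset_rank(itemset):
    """Encode a k-itemset of integer item ids as one integer using the
    combinatorial number system: rank = C(c_k, k) + ... + C(c_1, 1),
    where c_1 < c_2 < ... < c_k are the sorted item ids."""
    return sum(comb(c, i) for i, c in enumerate(sorted(itemset), start=1))

def itemset_unrank(rank, k):
    """Decode the integer back into the k-itemset (greedy inverse):
    repeatedly take the largest c with C(c, i) <= remaining rank."""
    items = []
    for i in range(k, 0, -1):
        c = i - 1
        while comb(c + 1, i) <= rank:
            c += 1
        items.append(c)
        rank -= comb(c, i)
    return sorted(items)

pair = [2, 5]                # item ids
r = itemset_rank(pair)
print(r)                     # C(5,2) + C(2,1) = 12
print(itemset_unrank(r, 2))  # [2, 5]
```

Because the rank is exactly the itemset's position in the lexicographic enumeration of all k-subsets, comparing or joining candidates reduces to integer operations, which is the source of the O(1) join the abstract claims.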

    DMC-GRASP: A continuous GRASP hybridized with data mining

    The hybridization of metaheuristics with data mining techniques has been successfully applied to combinatorial optimization problems. Examples of this type of strategy are DM-GRASP and MDM-GRASP, hybrid versions of the Greedy Randomized Adaptive Search Procedure (GRASP) metaheuristic that incorporate data mining techniques. This type of hybrid method is called a data-driven metaheuristic and aims at extracting useful knowledge from the data generated by metaheuristics during their search process. Despite success on combinatorial problems such as the set packing problem and the maximum diversity problem, proposals of this type for continuous optimization problems are still scarce in the literature. This work presents a data mining hybrid version of C-GRASP, an adaptation of GRASP for problems with continuous variables. We call this new version DMC-GRASP; it identifies patterns in high-quality solutions and generates new solutions guided by these patterns. We performed computational experiments with DMC-GRASP on a set of well-known mathematical benchmark functions, and the results showed that metaheuristics for continuous optimization can also benefit from using patterns to guide the search for better solutions
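One way such pattern guidance might look is sketched below (a loose illustration, not the DMC-GRASP algorithm itself; the spread threshold, perturbation width, and elite solutions are all invented for the example):

```python
import random
import statistics

def mine_pattern(elite, spread_threshold=0.1):
    """From a pool of elite solutions (lists of floats), extract a 'pattern':
    for each variable whose elite values have low spread, record the elite
    mean; variables with high spread stay free."""
    pattern = {}
    for j in range(len(elite[0])):
        values = [sol[j] for sol in elite]
        if statistics.pstdev(values) <= spread_threshold:
            pattern[j] = statistics.fmean(values)
    return pattern

def guided_solution(pattern, dim, lo, hi, rng):
    """Build a new starting solution: pattern variables are sampled near
    their elite mean, the rest uniformly from the box [lo, hi]."""
    sol = []
    for j in range(dim):
        if j in pattern:
            sol.append(pattern[j] + rng.uniform(-0.05, 0.05))
        else:
            sol.append(rng.uniform(lo, hi))
    return sol

rng = random.Random(0)
elite = [[1.02, -3.9], [0.98, 2.5], [1.01, 4.7]]  # variable 0 has converged
pattern = mine_pattern(elite)
print(sorted(pattern))  # [0]  -- only variable 0 forms a pattern
new = guided_solution(pattern, dim=2, lo=-5, hi=5, rng=rng)
```

The idea mirrors the abstract: stable structure in the elite set is frozen, so the randomized construction phase spends its effort on the variables that are still undecided.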

    Wissensentdeckung im Kontext der Produktionssimulation (Knowledge Discovery in the Context of Production Simulation)

    Discrete simulation is an important and established method for investigating the dynamic behavior of complex production and logistics systems. Simulation is therefore an essential tool for planning, operating, and controlling those systems, for example in the automotive or semiconductor industries. In this context, typical simulation studies aim at answering pre-defined questions about these systems, often by simulating and analyzing a few pre-defined scenarios. Relations and effects outside those predefined project scopes may therefore remain undiscovered. On the other hand, with increasing computing power and the general availability of big-data infrastructures, new possibilities arise for carrying out very large bandwidths of simulation experiments in order to cover the behavior of the model as completely as possible and analyze the output data in an automated way. This is generally referred to as data farming.
The goal of this work was to transfer and enhance the concept of data farming for application to knowledge discovery in manufacturing simulations. For this purpose, a holistic concept was created for finding unknown, hidden, and useful knowledge in massive amounts of simulation data. The concept comprises the selection of suitable experiment design methods, the application and elaboration of suitable data mining methods in an appropriate and targeted analysis process, as well as the definition of suitable visualization and interaction methods for an iterative and user-focused analysis of large amounts of simulation output data. Furthermore, the concept was prototypically implemented in an integrated software framework. The applicability of the concept was shown and validated in four case studies, two academic and two real-world
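A data-farming experiment design can be sketched with a simple Latin hypercube, one common design method for covering a model's input space with few runs (the simulation parameters and bounds below are hypothetical):

```python
import random

def latin_hypercube(n_samples, bounds, rng):
    """Simple Latin hypercube design: each parameter's range is cut into
    n_samples strata and every stratum is hit exactly once, so even a small
    number of runs spreads across the whole input space."""
    columns = []
    for lo, hi in bounds:
        strata = list(range(n_samples))
        rng.shuffle(strata)                 # random pairing of strata across parameters
        width = (hi - lo) / n_samples
        columns.append([lo + (s + rng.random()) * width for s in strata])
    return [list(row) for row in zip(*columns)]  # one row per simulation run

rng = random.Random(42)
# Hypothetical simulation parameters: number of machines, buffer capacity.
runs = latin_hypercube(5, bounds=[(1, 10), (0, 100)], rng=rng)
print(len(runs), len(runs[0]))  # 5 2
```

Each row would then parameterize one simulation run; the resulting output table is what the data mining and visualization stages of the concept analyze.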