10 research outputs found

    Biclustering of gene expression data based on scatter search (original title: Biclustering sobre datos de expresión génica basado en búsqueda dispersa)

    Gene expression data, with their particular nature and importance, motivate not only the development of new techniques but also the formulation of new problems, such as biclustering. Biclustering is an unsupervised learning technique that groups both genes and conditions. This two-way grouping distinguishes it from traditional clustering on this type of data, which groups either genes or conditions but not both. This thesis presents a new biclustering algorithm that allows different search criteria to be studied. The algorithm uses a scatter search scheme, which decouples the search mechanism from the criterion employed. Three different search criteria have been studied, and they motivate the three main contributions of the thesis. First, the linear correlation among genes is studied and integrated into the objective function used by the biclustering algorithm. Linear correlation makes it possible to find biclusters with shifting and scaling patterns, which improves on earlier proposals. Second, motivated by the biological significance of activation-inhibition patterns among genes, the linear correlation is modified so that these patterns are also taken into account. Finally, the information on genes available in public repositories, such as the Gene Ontology (GO), is incorporated into the search criterion. An extra term is added that reflects, for each bicluster evaluated, the quality of that group of genes according to the information stored in GO. Two options for this biological-information term are studied and compared with each other, and the results are shown to be better when biological information is used in the biclustering algorithm.
The three contributions described, together with a series of intermediate steps, have led to results published in journals and in national and international conferences.
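The abstract's claim that linear correlation captures shifting and scaling patterns can be illustrated with a minimal sketch. The function names, the toy matrix, and the use of an absolute value to also reward inverted profiles are illustrative assumptions; this is not the objective function used in the thesis.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_correlation(matrix, genes, conditions, use_abs=False):
    """Mean pairwise correlation of the chosen gene rows over the chosen conditions.

    use_abs=True is a hypothetical stand-in for a criterion that also accepts
    negatively correlated (inverted) profiles.
    """
    rows = [[matrix[g][c] for c in conditions] for g in genes]
    scores = []
    for r1, r2 in combinations(rows, 2):
        rho = pearson(r1, r2)
        scores.append(abs(rho) if use_abs else rho)
    return sum(scores) / len(scores)


# Toy expression matrix with made-up values: rows are genes, columns are conditions.
expression = [
    [1.0, 2.0, 4.0, 3.0],      # base profile
    [3.0, 4.0, 6.0, 5.0],      # base + 2 (shifting)
    [2.0, 4.0, 8.0, 6.0],      # base * 2 (scaling)
    [-1.0, -2.0, -4.0, -3.0],  # base * -1 (inverted profile)
]

all_conditions = [0, 1, 2, 3]
shift_scale = average_correlation(expression, [0, 1, 2], all_conditions)
with_inverted = average_correlation(expression, [0, 1, 2, 3], all_conditions)
with_inverted_abs = average_correlation(expression, [0, 1, 2, 3], all_conditions, use_abs=True)
print(shift_scale, with_inverted, with_inverted_abs)
```

Shifted and scaled rows are perfectly correlated with the base profile, so the first score is at its maximum. Adding the inverted row lowers the plain average, while the absolute-value variant still scores the group highly.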

    Mathematical Methods and Operation Research in Logistics, Project Planning, and Scheduling

    In the last decade, the Industrial Revolution 4.0 brought flexible supply chains and flexible design projects to the forefront. The recent pandemic, the accompanying economic problems, and the resulting supply problems have further increased the role of logistics and supply chains. Planning and scheduling procedures that can respond flexibly to changed circumstances have therefore become more valuable in both logistics and projects. There are already several competing criteria of project and logistic process planning and scheduling that need to be reconciled. At the same time, the COVID-19 pandemic has shown that even more emphasis needs to be placed on taking potential risks into account. Flexibility and resilience are emphasized in all decision-making processes, including the scheduling of logistic processes, activities, and projects.

    Optimization-Based Network Analysis with Applications in Clustering and Data Mining

    In this research we develop theoretical foundations and efficient solution methods for two classes of cluster-detection problems from an optimization point of view. In particular, the s-club model and the biclique model are considered because of their many application areas. An analytical review of the optimization problems is followed by the theoretical results and algorithmic solution methods developed in this research. The maximum s-club problem has applications in graph-based data mining and robust network design, where high reachability is often considered a critical property. The massive size of real-life instances makes a scalable solution method necessary for practical purposes. Moreover, the lack of the heredity property in s-clubs poses challenges for the design of optimization algorithms. Motivated by these properties, a sufficient condition for checking maximality, by inclusion, of a given s-club is proposed. The sufficient condition can be employed in the design of optimization algorithms to reduce the computational effort. A variable neighborhood search algorithm is proposed for the maximum s-club problem to facilitate the solution of large instances with reasonable computational effort. In addition, a hybrid exact algorithm has been developed for the problem. Inspired by the wide use of bipartite graphs in modeling and data mining, we consider three classes of the maximum biclique problem: the maximum edge biclique, the maximum vertex biclique, and the maximum balanced biclique problems. Asymptotic lower and upper bounds on the size of these structures in uniform random graphs are developed. These bounds give insight into the evolution and growth rate of bicliques in large-scale graphs. To overcome the computational difficulty of solving large instances, a scale-reduction technique for the maximum vertex and maximum edge biclique problems in general graphs is proposed.
The procedure shrinks the underlying network by identifying and removing edges that cannot be in the optimal solution, thus enabling the exact solution methods to solve large-scale sparse instances to optimality. A combinatorial branch-and-bound algorithm is also developed; it is best suited to dense instances, where the scale-reduction method might be less effective. The proposed algorithms are flexible and, with small modifications, can solve the weighted versions of the problems.
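The lack of heredity mentioned in this abstract can be seen in a small example. The sketch below uses the standard definition of an s-club (a vertex subset whose induced subgraph has diameter at most s). The function name and the star graph are illustrative assumptions and are unrelated to the algorithms developed in the work.

```python
from collections import deque


def is_s_club(adjacency, vertices, s):
    """Return True if the subgraph induced by `vertices` has diameter at most s."""
    vertices = set(vertices)
    for source in vertices:
        # Breadth-first search restricted to the induced subgraph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v in vertices and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # Unreachable vertices or a path longer than s both violate the definition.
        if len(dist) < len(vertices) or max(dist.values()) > s:
            return False
    return True


# Star graph: vertex 0 is the hub, vertices 1 to 3 are leaves.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

whole_star = is_s_club(star, [0, 1, 2, 3], 2)   # every pair is within two hops
leaves_only = is_s_club(star, [1, 2, 3], 2)     # hub removed: leaves are disconnected
print(whole_star, leaves_only)
```

The whole star is a 2-club, but the subset of its leaves is not, because distances are measured inside the induced subgraph. A subset of an s-club is therefore not guaranteed to be an s-club, which is why pruning arguments that work for cliques do not carry over directly.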

    Design and Management of Manufacturing Systems

    Although the design and management of manufacturing systems have been explored in the literature for many years, they remain topical problems in current scientific research. Changing market trends, globalization, the constant pressure to reduce production costs, and technical and technological progress make it necessary to search for new manufacturing methods and ways of organizing them, and to modify manufacturing system design paradigms. This book presents current research in different areas connected with the design and management of manufacturing systems and covers the following subject areas: methods supporting the design of manufacturing systems; methods of improving maintenance processes in companies; the design and improvement of manufacturing processes; the control of production processes in modern manufacturing systems; production methods and techniques used in modern manufacturing systems; and environmental aspects of production and their impact on the design and management of manufacturing systems. The wide range of research findings reported in this book confirms that the design of manufacturing systems is a complex problem and that achieving the goals set for modern manufacturing systems requires interdisciplinary knowledge and the simultaneous design of the product, process and system, as well as knowledge of modern manufacturing and organizational methods and techniques.

    Big data techniques for processing massive data streams in real time (original title: Técnicas big data para el procesamiento de flujos de datos masivos en tiempo real)

    Doctoral Program in Biotechnology, Engineering and Chemical Technology. Research Line: Engineering, Data Science and Bioinformatics. Program Code: DBI. Line Code: 111. Machine learning techniques have become one of the resources most in demand by companies because of the large volume of data that surrounds us today. The main objective of these technologies is to solve complex problems in an automated way using data. One current focus of machine learning is the analysis of continuous flows of data, or data streaming. This approach is increasingly requested by enterprises as a result of the large number of information sources producing time-indexed data at high frequency, such as sensors, Internet of Things devices and social networks. However, research is currently more focused on the study of historical data than on data received in streaming. One of the main reasons for this is the enormous challenge that this type of data presents for the modeling of machine learning algorithms. This Doctoral Thesis is presented as a compendium of publications, with a total of 10 scientific contributions in international conferences and in journals with a high impact index in the Journal Citation Reports (JCR). The research developed during the PhD Program focuses on the study and analysis of real-time or streaming data through the development of new machine learning algorithms. Machine learning algorithms for real-time data require a different type of modeling from the traditional one: the model is updated online to provide accurate responses in the shortest possible time. The main objective of this Doctoral Thesis is to contribute research value to the scientific community through three new machine learning algorithms. These algorithms are big data techniques, and two of them work with online or streaming data. In this way, contributions are made to the development of one of the current trends in Artificial Intelligence.
With this purpose, algorithms are developed for descriptive and predictive tasks, i.e., unsupervised and supervised learning, respectively. Their common idea is the discovery of patterns in the data. The first technique developed during the dissertation is a triclustering algorithm that produces three-dimensional data clusters in offline or batch mode. This big data algorithm is called bigTriGen. In general terms, an evolutionary metaheuristic is used to search for groups of data with similar patterns. The model uses genetic operators such as selection, crossover, mutation or evaluation operators at each iteration. The goal of bigTriGen is to optimize the evaluation function to obtain triclusters of the highest possible quality. It is used as the basis for the second technique implemented during the Doctoral Thesis. The second algorithm focuses on the creation of groups over three-dimensional data received in real time or in streaming. It is called STriGen. Streaming modeling starts from an offline or batch model built with historical data. As soon as this model is created, it starts receiving data in real time. The model is updated in an online or streaming manner to adapt to new streaming patterns. In this way, STriGen is able to detect concept drifts and incorporate them into the model as quickly as possible, thus producing good-quality triclusters in real time. The last algorithm developed in this dissertation follows a supervised learning approach for time series forecasting in real time. It is called StreamWNN. A model is created with historical data based on the k-nearest neighbor (KNN) algorithm. Once the model is created, data starts to be received in real time. The algorithm provides real-time predictions of future data, keeping the model updated incrementally and incorporating streaming patterns identified as novelties.
StreamWNN also identifies anomalous data in real time, which allows this feature to be used as a security measure during its application. The developed algorithms have been evaluated with real data from devices and sensors. These new techniques have proven very useful, providing meaningful triclusters and accurate predictions in real time. Universidad Pablo de Olavide de Sevilla. Departamento de Deporte e informátic
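The general idea of nearest-neighbor forecasting with an incrementally updated model, as described for the last algorithm, can be sketched as follows. This is not StreamWNN itself: the class name, the window length, the inverse-distance weighting and the toy series are all illustrative assumptions, and novelty and anomaly detection are omitted.

```python
from math import dist


class IncrementalKNNForecaster:
    """Hypothetical one-step-ahead forecaster based on nearest past windows."""

    def __init__(self, window, k):
        self.window = window   # number of past values that form a pattern
        self.k = k             # number of neighbors used for each forecast
        self.series = []       # every observation seen so far
        self.patterns = []     # (window of values, value that followed it)

    def update(self, value):
        """Add one observation and store the pattern it completes."""
        if len(self.series) >= self.window:
            self.patterns.append((tuple(self.series[-self.window:]), value))
        self.series.append(value)

    def predict(self):
        """Forecast the next value from the k stored windows closest to the latest one."""
        if not self.patterns:
            raise ValueError("not enough data to forecast")
        query = tuple(self.series[-self.window:])
        nearest = sorted(self.patterns, key=lambda p: dist(p[0], query))[: self.k]
        # Closer windows get larger weights; the small constant avoids division by zero.
        weights = [1.0 / (dist(w, query) + 1e-9) for w, _ in nearest]
        total = sum(wt * nxt for wt, (_, nxt) in zip(weights, nearest))
        return total / sum(weights)


# Build the initial model from made-up historical data, then keep updating it.
model = IncrementalKNNForecaster(window=2, k=1)
for value in [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0]:
    model.update(value)

first_forecast = model.predict()   # latest window is (1.0, 2.0)
model.update(3.0)                  # a new observation arrives from the stream
second_forecast = model.predict()  # latest window is now (2.0, 3.0)
print(first_forecast, second_forecast)
```

Each call to `update` costs constant time, so the stored patterns stay current without retraining. The linear scan in `predict` is kept for clarity, and a production system would need an index or a bounded pattern store.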