
    A Novel Method for Seismogenic Zoning Based on Triclustering: Application to the Iberian Peninsula

    A previous definition of seismogenic zones is required to perform a probabilistic seismic hazard analysis in areas of spread, low seismic activity. Traditional zoning methods are based on the available seismic catalog and the geological structures. It is generally accepted that thermal and strength parameters of the crust provide better criteria for zoning; nonetheless, deriving the rheological profiles introduces great uncertainty. This has generated inconsistencies, as different zonings have been proposed for the same area. A new method for seismogenic zoning by means of triclustering is proposed in this research. Its main advantage is that it is based solely on seismic data; almost no human decision is involved, so the method is nearly unbiased. To assess its performance, the method has been applied to the Iberian Peninsula, which is characterized by the occurrence of small- to moderate-magnitude earthquakes. The catalog of the National Geographic Institute of Spain has been used. The output map is checked for validity against the geology. Moreover, a geographic information system has been used for two purposes: first, to depict the obtained zones; second, to calculate the seismic parameters (b-value, annual rate) from the data. Finally, the results have been compared to Kohonen’s self-organizing maps.
    Ministerio de Economía y Competitividad TIN2014-55894-C2-R; Junta de Andalucía P12-TIC-1728; Universidad Pablo de Olavide APPB81309
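The abstract mentions computing a b-value and an annual rate per zone. As a minimal, hypothetical sketch (not the paper's implementation), the b-value of the Gutenberg-Richter law can be estimated with the classical Aki (1965) maximum-likelihood formula, and the annual rate normalized by area is a simple ratio; the catalog values below are synthetic, not from the paper:

```python
import math

def b_value_aki(magnitudes, m_min):
    """Maximum-likelihood b-value (Aki, 1965): b = log10(e) / (mean(M) - Mmin)."""
    mags = [m for m in magnitudes if m >= m_min]
    mean_m = sum(mags) / len(mags)
    return math.log10(math.e) / (mean_m - m_min)

def annual_rate(n_events, years, area_km2):
    """Events per year per km^2 for a zone."""
    return n_events / (years * area_km2)

# Illustrative catalog slice (synthetic magnitudes, not IGN data)
mags = [3.1, 3.4, 3.0, 3.8, 4.2, 3.3, 3.6, 3.0, 3.2, 4.9]
b = b_value_aki(mags, m_min=3.0)
rate = annual_rate(n_events=100, years=50.0, area_km2=10000.0)
```

For binned catalog magnitudes, practical codes often replace `m_min` with `m_min - bin_width / 2` in the denominator; the sketch above omits that correction.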

    Research project grouping and ranking by using adaptive Mahalanobis clustering

    The paper discusses the problem of grouping and ranking research projects submitted for a call. The projects are grouped into clusters based on the assessments obtained in the review procedure, using the adaptive Mahalanobis clustering method as a special case of the Expectation-Maximization algorithm. The cluster of projects assessed as best is then analyzed and ranked in detail. The paper outlines several possibilities for using the data obtained in the review procedure, and the proposed method is illustrated with the example of internal research projects at the University of Osijek.
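The core ingredient of the method above is the Mahalanobis distance, which measures distance from a cluster while accounting for the cluster's own covariance ("adaptive" because the covariance is re-estimated at each EM iteration). A minimal sketch with hypothetical review scores (two criteria per project; not the Osijek data):

```python
import numpy as np

def mahalanobis(x, mean, cov):
    """Mahalanobis distance of point x from a cluster with given mean and covariance."""
    diff = x - mean
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

# Hypothetical cluster of project assessments (score on two review criteria)
points = np.array([[8.0, 7.5], [7.8, 7.9], [8.3, 7.2], [7.9, 7.6]])
mean = points.mean(axis=0)
cov = np.cov(points, rowvar=False)  # re-estimated each EM iteration in the adaptive variant

d_in = mahalanobis(np.array([8.0, 7.6]), mean, cov)   # project similar to the cluster
d_out = mahalanobis(np.array([3.0, 2.0]), mean, cov)  # clearly different project
```

Unlike the Euclidean distance, this metric stretches or shrinks each direction according to how much the cluster varies along it, so elongated clusters are handled naturally.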

    Effect of different distance measures in result of cluster analysis

    The objective of this master’s thesis was to explore different distance measures that can be used in clustering and to evaluate how they affect the output of the K-medoids clustering method. The distance measures used in this research include the Euclidean, squared Euclidean, Manhattan, Chebyshev and Mahalanobis distances. To achieve the research objective, the K-medoids method with each distance measure was applied to a spatial dataset to explore the relative information revealed by each measure. The effect of each distance measure on the output is documented, and the outputs are compared with each other to reveal the differences between the measures. The study starts with a literature review of the cluster analysis process, where the necessary steps for performing cluster analysis are explained. In the literature section, different clustering methods and the particular characteristics of each are described, serving as the basis for the choice of clustering method. Data description and data analysis follow, together with the interpretation of the clustering result and its use for terrain analysis. Terrain analysis is significant in the forest industry, the military and crisis management, and is usually concerned with the off-road mobility of a vehicle or a group of vehicles between given locations. In this context, clustering can be used to group similar areas and determine the off-road mobility of a particular vehicle. The result can be further categorized according to the suitability of the items in each cluster and interpreted using expert evaluation in order to reveal useful information about mobility in a terrain. Cluster validation measures were applied to the clustering output to determine the differences between the distance measures.
    The findings of this study indicate that, in the study area, there is some level of difference in the clustering results when different distance measures are used. This difference is interpreted with the help of the input dataset and expert opinion to understand the effect of each distance measure on the dataset. Finally, the study provides a basis for mobility analysis based on the clustering output.
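Four of the five compared measures have simple closed forms; a minimal sketch (the Mahalanobis distance is omitted here because it additionally requires a covariance matrix estimated from the data):

```python
import math

def euclidean(p, q):
    """Straight-line distance: sqrt of sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def squared_euclidean(p, q):
    """Euclidean without the square root; same ranking, cheaper to compute."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def manhattan(p, q):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))

def chebyshev(p, q):
    """Largest single coordinate difference."""
    return max(abs(a - b) for a, b in zip(p, q))

p, q = (1.0, 2.0), (4.0, 6.0)
# euclidean: 5.0, squared: 25.0, manhattan: 7.0, chebyshev: 4.0
```

K-medoids only needs pairwise dissimilarities, so swapping the metric changes which points end up as medoids without changing the algorithm itself.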

    Statistical analysis of different seismogenic zonings of the Iberian Peninsula and adjacent areas through a Geographic Information System

    The knowledge of the seismic hazard in the Iberian Peninsula (IP) and its neighboring area is important to address the mitigation of the damage that earthquakes could cause there. Earthquakes are quite frequent in the area because it lies in the contact zone between the Eurasian and African plates. The general objective of this thesis is the calculation, representation and analysis of a set of seismic parameters (b-value, maximum magnitude and annual rate of earthquakes per unit area) of the Iberian Peninsula and its adjacent area, using geographic information systems (GIS) as a basic working tool. These systems allow the integration of data from different sources, as well as rigorous, high-quality analyses and graphical representations. To achieve this goal, a quality seismic catalog is essential. Therefore, one has been compiled for the area that is as complete, rigorous and extensive in time as possible and, moreover, revised, homogeneous in size (magnitude) and containing only independent events. This catalog has served as the basis for the work presented here. For its generation, the earthquake database of the National Geographic Institute of Spain was taken as the starting point, revised (especially the magnitudes) and completed with other databases and specific studies. In addition, the working catalog includes earthquakes for which only macroseismic (but reliable) information is available, as well as those recorded during the instrumental period according to the scientific advances of each moment. Then, the size of all the events was converted to moment magnitude (Mw) to make them comparable, retaining only the events with Mw greater than or equal to 3.0. Subsequently, non-main shocks (foreshocks, aftershocks and swarms) were removed. Finally, a completeness date was established for each magnitude.
    In this thesis, the b-value, the annual rate of earthquakes per unit area and the maximum magnitude have been calculated, represented and analyzed through two approaches. The first deals with zonings related to the Spanish seismic regulations, based on both the geological characteristics and the seismicity of the area, together with others that start from objective, mathematically robust criteria and consider only the seismicity. In the second approach, a set of multiresolution grids has been established, in which the zones are defined according to a purely geographic criterion. The cell (zone) size is 0.5º x 0.5º for the calculation of the maximum recorded magnitude, and 1º x 1º and 2º x 2º for the b-value and the annual rate normalized by area. For both types of zoning, after the calculation and representation of the seismic parameters, they were analyzed. From this analysis it can be deduced that in some zones there have not been enough events to derive statistically solid seismic parameters. It can also be concluded that the earthquakes with the maximum recorded magnitudes have marine epicenters and are located SW of the IP. Moreover, the b-value is 1.0 or somewhat lower in the contact zone between the Eurasian and African plates (decreasing further to the east), while on the mainland 1.2 can be considered an approximate value, with somewhat higher values in some areas.
    Finally, regarding the annual rate, the highest values (close to 1E-3 events/km2) appear in the Granada basin and the Pyrenees region and, to a lesser extent, SW of Cabo de San Vicente, in Galicia and in a large part of the southeast of the IP, where values greater than 1E-4 are reached.
    Premio Extraordinario de Doctorado U
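The multiresolution-grid approach described above assigns each epicenter to a fixed-size cell and computes a parameter per cell (e.g., the maximum recorded Mw on the 0.5º x 0.5º grid). A minimal, hypothetical sketch of that binning step, with synthetic epicenters rather than IGN catalog data:

```python
import math
from collections import defaultdict

def cell_key(lon, lat, cell_deg):
    """Index of the cell_deg x cell_deg grid cell containing an epicenter."""
    return (math.floor(lon / cell_deg), math.floor(lat / cell_deg))

def max_magnitude_per_cell(events, cell_deg=0.5):
    """Maximum recorded Mw in each cell, as in the 0.5 x 0.5 degree zoning."""
    grid = defaultdict(lambda: float("-inf"))
    for lon, lat, mw in events:
        k = cell_key(lon, lat, cell_deg)
        grid[k] = max(grid[k], mw)
    return dict(grid)

# Synthetic epicenters (lon, lat, Mw); the first two fall in the same cell
events = [(-8.9, 36.1, 6.2), (-8.7, 36.3, 5.0), (-3.6, 37.2, 4.1)]
grid = max_magnitude_per_cell(events)
```

The b-value and area-normalized annual rate follow the same pattern on the coarser 1º and 2º grids, accumulating event counts and magnitudes per cell instead of a maximum.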

    Técnicas big data para el procesamiento de flujos de datos masivos en tiempo real

    Programa de Doctorado en Biotecnología, Ingeniería y Tecnología Química. Línea de Investigación: Ingeniería, Ciencia de Datos y Bioinformática. Clave Programa: DBI. Código Línea: 111.
    Machine learning techniques have become one of the resources most demanded by companies due to the large volume of data that surrounds us nowadays. The main objective of these technologies is to solve complex problems in an automated way using data. One current perspective of machine learning is the analysis of continuous data flows, or data streaming. This approach is increasingly requested by enterprises as a result of the large number of information sources producing time-indexed data at high frequency, such as sensors, Internet of Things devices, social networks, etc. However, research currently focuses more on the study of historical data than on data received in streaming. One of the main reasons for this is the enormous challenge this type of data presents for the modeling of machine learning algorithms. This Doctoral Thesis is presented as a compendium of publications with a total of 10 scientific contributions in international conferences and journals with a high impact index in the Journal Citation Reports (JCR). The research developed during the PhD program focuses on the study and analysis of real-time or streaming data through the development of new machine learning algorithms. Machine learning algorithms for real-time data require a different type of modeling from the traditional one, in which the model is updated online to provide accurate responses in the shortest possible time. The main objective of this Doctoral Thesis is to contribute research value to the scientific community through three new machine learning algorithms. These algorithms are big data techniques, and two of them work with online or streaming data. In this way, contributions are made to the development of one of the current trends in Artificial Intelligence.
    With this purpose, algorithms are developed for descriptive and predictive tasks, i.e., unsupervised and supervised learning, respectively. Their common idea is the discovery of patterns in the data. The first technique developed during the dissertation is a triclustering algorithm that produces three-dimensional data clusters in offline or batch mode. This big data algorithm is called bigTriGen. In general terms, an evolutionary metaheuristic is used to search for groups of data with similar patterns. The model applies genetic operators such as selection, crossover, mutation and evaluation at each iteration. The goal of bigTriGen is to optimize the evaluation function to achieve triclusters of the highest possible quality. It serves as the basis for the second technique implemented during the Doctoral Thesis. The second algorithm focuses on the creation of groups over three-dimensional data received in real time, or streaming. It is called STriGen. Streaming modeling starts from an offline or batch model built on historical data. As soon as this model is created, it starts receiving data in real time, and the model is updated in an online or streaming manner to adapt to new streaming patterns. In this way, STriGen is able to detect concept drifts and incorporate them into the model as quickly as possible, producing good-quality triclusters in real time. The last algorithm developed in this dissertation follows a supervised learning approach for time series forecasting in real time. It is called StreamWNN. A model is created from historical data based on the k-nearest neighbors (KNN) algorithm. Once the model is created, data starts to be received in real time. The algorithm provides real-time predictions of future data, keeping the model always updated incrementally and incorporating streaming patterns identified as novelties.
    StreamWNN also identifies anomalous data in real time, allowing this feature to be used as a security measure during its application. The developed algorithms have been evaluated with real data from devices and sensors. These new techniques have proved to be very useful, providing meaningful triclusters and accurate predictions in real time.
    Universidad Pablo de Olavide de Sevilla. Departamento de Deporte e informátic
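The KNN-with-incremental-updates idea behind a streaming forecaster like StreamWNN can be illustrated with a minimal, hypothetical sketch (not the thesis implementation): keep a bounded history of (recent-window, next-value) pairs, predict from the k most similar windows, and fold each new observation back into the model:

```python
from collections import deque

class StreamingKNNForecaster:
    """Sketch of KNN-style streaming forecasting; hypothetical, not StreamWNN itself."""

    def __init__(self, window=3, k=2, max_history=100):
        self.window, self.k = window, k
        self.history = deque(maxlen=max_history)  # (pattern, next_value) pairs
        self.buffer = deque(maxlen=window)        # most recent observations

    def predict(self):
        """Average the successors of the k historical windows closest to the current one."""
        if len(self.buffer) < self.window or not self.history:
            return None
        pattern = tuple(self.buffer)
        dist = lambda p: sum((a - b) ** 2 for a, b in zip(p, pattern))
        nearest = sorted(self.history, key=lambda pv: dist(pv[0]))[: self.k]
        return sum(v for _, v in nearest) / len(nearest)

    def update(self, value):
        """Incremental model update: the just-completed window becomes a training pair."""
        if len(self.buffer) == self.window:
            self.history.append((tuple(self.buffer), value))
        self.buffer.append(value)

f = StreamingKNNForecaster()
for v in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]:
    f.update(v)
pred = f.predict()  # forecast of the value following 5, 6, 7
```

The bounded `deque` is one simple way to keep the model current as the stream drifts; an anomaly flag could be derived from the distance to the nearest neighbor, in the spirit of the novelty detection described above.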

    Smart Urban Water Networks

    This book presents the papers of the Special Issue (SI) on Smart Urban Water Networks. The number and topics of the papers in the SI confirm the growing interest of operators and researchers in the new paradigm of smart networks, as part of the more general smart city. The SI shows that digital information and communication technology (ICT), with the implementation of smart meters and other digital devices, can significantly improve the modelling and management of urban water networks, contributing to a radical transformation of the traditional paradigm of water utilities. The paper collection in this SI covers crucial topics such as the reliability, resilience, and performance of water networks, innovative demand management, and the novel challenge of real-time control and operation, along with their implications for cyber-security. The SI collects fourteen papers that provide a wide perspective on solutions, trends, and challenges in the context of smart urban water networks. Some solutions have already been implemented in pilot sites (i.e., water network partitioning, cyber-security, and water demand disaggregation and forecasting), while further investigation is required for other methods, e.g., data-driven approaches for real-time control. In all cases, a new deal between academia, industry, and governments must be embraced to start the new era of smart urban water systems.

    A data-based approach for dynamic classification of functional scenarios oriented to industrial process plants

    The main objective of this thesis is to propose a dynamic clustering algorithm that can handle not only dynamic data but also evolving distributions. The algorithm is particularly suited to the monitoring of processes generating massive data streams, but its application is not limited to that domain. The main contributions of this thesis are: 1. A contribution to dynamic clustering through an approach that uses distance- and density-based analyses to cluster non-linear, non-convex, overlapping data distributions with varied densities. The algorithm, which works online, fuses the learning and classification stages, allowing it to continuously detect and characterize new concepts while classifying the input samples, i.e., recognizing the current state of the system in a supervision application. 2. A contribution to feature extraction through a novel approach for extracting dynamic features. This approach, based on piecewise polynomial approximation, represents dynamic behaviors without losing magnitude-related information while reducing the algorithm’s sensitivity to noise corrupting the signals. 3. A contribution to automatic discrete-event modeling of evolving systems by exploiting the information provided by the clustering. The generated model is a timed automaton whose states represent the process states learned by the clustering up to the current time and whose transitions express the reachability between those states. The model is adaptive in the sense that its construction follows the discovery of new concepts by the clustering algorithm.
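The piecewise polynomial feature extraction described in contribution 2 can be sketched as follows: split the signal into segments, fit a low-order polynomial to each, and use the coefficients as dynamic features. This is a minimal, hypothetical illustration of the general technique, not the thesis implementation:

```python
import numpy as np

def piecewise_poly_features(signal, n_segments=4, degree=2):
    """Fit a low-order polynomial to each segment of the signal; the stacked
    coefficients keep magnitude information while smoothing out noise."""
    segments = np.array_split(np.asarray(signal, dtype=float), n_segments)
    feats = []
    for seg in segments:
        t = np.arange(len(seg))
        feats.extend(np.polyfit(t, seg, degree))  # degree+1 coefficients per segment
    return np.array(feats)

# Noisy synthetic signal (illustrative only)
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 80)
signal = np.sin(2 * np.pi * t) + 0.05 * rng.standard_normal(80)
feats = piecewise_poly_features(signal, n_segments=4, degree=2)
# 4 segments x 3 coefficients each -> a 12-dimensional feature vector
```

The resulting fixed-length vector can then feed the distance- and density-based clustering of contribution 1, regardless of the original signal length.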

    SIS 2017. Statistics and Data Science: new challenges, new generations

    The 2017 SIS Conference aims to highlight the crucial role of Statistics in Data Science. In this new domain of ‘meaning’ extracted from data, the increasing amount of data produced and available in databases has brought new challenges that involve different fields: statistics, machine learning, information and computer science, optimization, and pattern recognition. Together, these make a considerable contribution to the analysis of big data, open data, and relational and complex data, both structured and unstructured. The aim is to collect contributions from the different domains of Statistics on high-dimensional data quality validation, sample extraction, dimensionality reduction, pattern selection, data modelling, hypothesis testing, and confirming conclusions drawn from the data.