5 research outputs found

    QUANTUM COMPUTING AND HPC TECHNIQUES FOR SOLVING MICRORHEOLOGY AND DIMENSIONALITY REDUCTION PROBLEMS

    Doctoral thesis in public exhibition period. Doctorate in Computer Science (RD99/11)(8908

    Preface

    DAMSS-2018 is the jubilee 10th international workshop on data analysis methods for software systems, held in Druskininkai, Lithuania, at the end of the year, in the same place and at the same time every year. Ten years have passed since the first workshop, which took place in 2009 with 16 presentations. The idea for such a workshop came up at the Institute of Mathematics and Informatics and was supported by the Lithuanian Academy of Sciences and the Lithuanian Computer Society. It won approval both in the Lithuanian research community and abroad. This year there are 81 presentations and 113 registered participants from 13 countries. In 2010, the Institute of Mathematics and Informatics became a member of Vilnius University, the largest university in Lithuania. In 2017, the institute changed its name to the Institute of Data Science and Digital Technologies, a name that reflects its recent activities. The renewed institute has eight research groups: Cognitive Computing, Image and Signal Analysis, Cyber-Social Systems Engineering, Statistics and Probability, Global Optimization, Intelligent Technologies, Education Systems, and Blockchain Technologies. The main goal of the workshop is to introduce the research undertaken at Lithuanian and foreign universities in the fields of data science and software engineering. Holding the workshop annually allows the fast exchange of new ideas among the research community. As many as 11 companies supported the workshop this year, which shows that its topics are relevant to business, too. Topics of the workshop cover big data, bioinformatics, data science, blockchain technologies, deep learning, digital technologies, high-performance computing, visualization methods for multidimensional data, machine learning, medical informatics, ontological engineering, optimization in data science, business rules, and software engineering. 
Seeking to facilitate relations between science and business, a special session and panel discussion is organized this year about topical business problems that may be solved together with the research community. This book gives an overview of all presentations of DAMSS-2018.

    MDS-Based Multiresolution Nonlinear Dimensionality Reduction Model for Color Image Segmentation


    Novel methods for multi-view learning with applications in cyber security

    Modern data is complex. It exists in many different forms, shapes and kinds. Vectors, graphs, histograms, sets, intervals, etc.: they each have distinct and varied structural properties. Tailoring models to the characteristics of various feature representations has been the subject of considerable research. In this thesis, we address the challenge of learning from data that is described by multiple heterogeneous feature representations. This situation arises often in cyber security contexts. Data from a computer network can be represented by a graph of user authentications, a time series of network traffic, a tree of process events, etc. Each representation provides a complementary view of the holistic state of the network, and so data of this type is referred to as multi-view data. Our motivating problem in cyber security is anomaly detection: identifying unusual observations in a joint feature space, which may not appear anomalous marginally. Our contributions include the development of novel supervised and unsupervised methods, which are applicable not only to cyber security but to multi-view data in general. We extend the generalised linear model to operate in a vector-valued reproducing kernel Hilbert space implied by an operator-valued kernel function, which can be tailored to the structural characteristics of multiple views of data. This is a highly flexible algorithm, able to predict a wide variety of response types. A distinguishing feature is the ability to simultaneously identify outlier observations with respect to the fitted model. Our proposed unsupervised learning model extends multidimensional scaling to directly map multi-view data into a shared latent space. This vector embedding captures both commonalities and disparities that exist between multiple views of the data. Throughout the thesis, we demonstrate our models using real-world cyber security datasets.
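
    The abstract above builds on multidimensional scaling (MDS). For orientation, here is a minimal sketch of classical single-view MDS, which recovers point coordinates from a matrix of pairwise distances. It is not the multi-view extension proposed in the thesis; the function name `classical_mds` and the toy data are assumptions made for illustration.

```python
import numpy as np

def classical_mds(D, k=2):
    """Embed n points in k dimensions from an n x n Euclidean distance matrix D."""
    n = D.shape[0]
    # Double-center the squared distances to obtain a Gram (inner-product) matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # Eigendecomposition of the symmetric Gram matrix (eigenvalues ascending).
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:k]
    # Clip tiny negative eigenvalues caused by floating-point error.
    w_top = np.clip(w[top], 0.0, None)
    return V[:, top] * np.sqrt(w_top)

# Toy data (hypothetical): six points in the plane.
rng = np.random.default_rng(0)
points = rng.normal(size=(6, 2))
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

Y = classical_mds(D, k=2)
D_embedded = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
```

    Because the toy distances are exactly Euclidean in two dimensions, the embedding `Y` reproduces them up to rotation, reflection, and translation of the original points.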

    Low-Dimensional Representations of Earth System Processes

    In times of global change, we must closely monitor the state of our planet in order to understand gradual or abrupt changes early on. In fact, each of the Earth's subsystems (i.e. the biosphere, atmosphere, hydrosphere, cryosphere, and anthroposphere) can be analyzed from a multitude of data streams. However, since it is very hard to jointly interpret multiple monitoring data streams in parallel, one often aims for some summarizing indicator. Climate indices, for example, summarize the state of atmospheric circulation in a region, e.g. the Multivariate ENSO (El Niño-Southern Oscillation) Index. Indicator approaches have been used extensively to describe socioeconomic data too, and a range of indices have been proposed to synthesize and interpret this information. For instance, the "Human Development Index" (HDI) by the United Nations Development Programme was designed to capture specific aspects of development. "Dimensionality reduction" (DR) is a widely used approach to find low-dimensional and interpretable representations of data that are natively embedded in high-dimensional spaces. Here, we propose a robust method to create indicators using dimensionality reduction to better represent the terrestrial biosphere and the global socioeconomic system. We aim to explore the performance of the approach and to interpret the resulting indicators. For biosphere indicators, the concept was tested using 12 explanatory variables representing the biophysical states of ecosystems and land-atmosphere water, energy, and carbon fluxes. We find that two indicators account for 73% of the variance of the state of the biosphere in space and time. While the first indicator summarizes productivity patterns, the second indicator summarizes variables representing water and energy availability. 
Anomalies in the indicators clearly identify extreme events, such as the Amazon droughts (2005 and 2010) and the Russian heatwave (2010); they also allow us to interpret the impacts of these events. The indicators also reveal changes in the seasonal cycle, e.g. increasing seasonal amplitudes of productivity in agricultural areas and in arctic regions. We also apply the method to the "World Development Indicators" (WDIs), a database with more than 1500 variables, to track socioeconomic development at a country level. The aim was to extract the core dimensions of development in a highly efficient way, using a method of nonlinear dimensionality reduction. We find that over 90% of variance in the WDIs can be represented by five uncorrelated and nonlinear dimensions. The first dimension (explaining 74%) represents the state of education, health, income, infrastructure, trade, population, and pollution. The second dimension (explaining 10%) differentiates countries by gender ratios, labor market, and energy production patterns. Overall, we find that the data contained in the WDIs are highly nonlinear, therefore requiring nonlinear methods to extract the main patterns of development. Globally, most countries show rather consistent temporal trends towards wealthier and aging societies. Deviations from the long-term trajectories are detected with our approach during warfare, environmental disasters, or fundamental political changes. In general, we find that the indicator approach is able to extract general patterns from complex databases and that it can be applied to databases of varying characteristics. We also find that indicators can detect different kinds of changes occurring in the system, such as extreme events, permanent changes, or trends. Therefore, it is a useful tool for general monitoring and exploratory data analysis. 
The approach is flexible and can be applied to complex datasets, such as large data, nonlinear data, as well as data with many missing values.
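
    The variance shares quoted in the abstract above (for example, two indicators accounting for 73% of the variance) are the kind of figure a dimensionality reduction reports per component. As a rough illustration only, the sketch below computes explained-variance ratios with plain linear PCA on synthetic data. PCA, the synthetic data, and the function name `explained_variance_ratio` are assumptions made here; they are not the method or data of the thesis, which describes a nonlinear approach.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component."""
    Xc = X - X.mean(axis=0)  # center each variable
    # Squared singular values of the centered data are proportional
    # to the variances along the principal components.
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    return var / var.sum()

rng = np.random.default_rng(0)
# Synthetic stand-in (hypothetical): 500 observations of 12 variables
# driven by 2 latent factors plus a little noise.
latent = rng.normal(size=(500, 2))
loadings = rng.normal(size=(2, 12))
X = latent @ loadings + 0.1 * rng.normal(size=(500, 12))

ratio = explained_variance_ratio(X)
cumulative = np.cumsum(ratio)
print(cumulative[:2])  # cumulative share held by the first one and two components
```

    Reading off `cumulative[k - 1]` gives the share of variance retained by the first `k` components, which is how a statement such as "five dimensions represent over 90% of variance" would be checked in the linear case.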