9 research outputs found

    ALGORITHM FOR PROCESSING DATA OF GEOLOGICAL SURVEYS USING GIS TECHNOLOGIES (ON THE EXAMPLE OF THE MATERIALS OF DRILLING STUDY OF BREST REGION TERRITORY)

    Get PDF
    The problem of "big data" is considered, together with approaches to its classification and the main methods used in its processing. An analysis of the most common techniques for the preliminary statistical analysis of spatial data is given. Using information obtained from geological surveys carried out in the Brest region as an example, an algorithm was developed that makes it possible to process geological drilling data with the help of geoinformation technologies. The presented algorithm comprises several sequential stages and takes existing approaches to spatial data analysis into account. To automate the data processing steps, the «processing of geological data» toolset was created
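
    As a rough illustration of the preliminary statistical screening of spatial drilling data described above, the following Python sketch uses pandas and geopandas. It is not the «processing of geological data» toolset itself; the column names, coordinate system and statistics are assumptions made for the example.

        # Minimal sketch (not the authors' toolset) of preliminary statistical
        # screening of borehole data before GIS processing; columns are assumed.
        import pandas as pd
        import geopandas as gpd

        # Hypothetical drilling records: coordinates, hole depth, lithological layer.
        records = pd.DataFrame({
            "x": [23.65, 23.70, 23.80, 23.90, 24.05],
            "y": [52.10, 52.12, 52.08, 52.15, 52.11],
            "depth_m": [12.5, 18.0, 9.7, 22.3, 15.1],
            "layer": ["sand", "clay", "sand", "clay", "sand"],
        })

        # Attach point geometry so the table can be handed to GIS tools later.
        gdf = gpd.GeoDataFrame(
            records,
            geometry=gpd.points_from_xy(records["x"], records["y"]),
            crs="EPSG:4326",  # assumed coordinate reference system
        )

        # Stage 1 of a typical pipeline: descriptive statistics per layer.
        print(gdf.groupby("layer")["depth_m"].describe())

        # Stage 2: a crude spatial check - the distance (in degrees here) from each
        # borehole to its nearest neighbour, useful for spotting isolated points.
        nearest = gdf.geometry.apply(
            lambda p: gdf.geometry[~gdf.geometry.geom_equals(p)].distance(p).min())
        print(nearest.describe())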

    Converting a Water Pressurized Network in a Small Town into a Solar Power Water System

    Get PDF
    The efficient management of water and energy is a key challenge for managers of pressurized water systems. Under high environmental pressure, solar power appears as an opportunity to reduce nonrenewable energy expenditure and eliminate emissions. In Spain, new legislation eliminating old taxes associated with solar energy production, a drop in the cost of solar photovoltaic modules, and higher values of irradiance have turned solar-powered water systems into one of the trendiest topics in the water industry. One way to store energy (compulsory in standalone photovoltaic systems) when managing pressurized urban water networks is the use of head tanks, which accumulate water during the day and release it at night. This work compares the pressurized network running as a standalone system with a hybrid solution that combines a solar energy supply and the electricity grid. The indicator used for finding the best choice is the net present value over the solar-powered water system's lifespan. This study also analyzed the possibility of transferring the energy surplus obtained at midday to the electricity grid, an option introduced into Spanish legislation in April 2019. We developed a real case study in a small town in the Alicante Province, whose findings provide planning policymakers with very useful information for this and similar case studies. Antonio Jodar-Abellán acknowledges financial support received from the Spanish FPU scholarship for the training of university teachers. This work has also been partially funded by the Cátedra del Agua of the University of Alicante and the Diputación Provincial de Alicante (https://catedradelaguaua.org/)
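
    Since the selection criterion in the study is the net present value over the system lifespan, a minimal Python sketch of that comparison is given below. The discount rate, lifespan and all cash flow figures are invented placeholders, not data or results from the paper.

        # Hedged sketch of an NPV comparison between a standalone photovoltaic
        # water system and a hybrid PV + grid solution. All figures are
        # illustrative placeholders, not values from the study.

        def npv(rate, cash_flows):
            """Net present value of yearly cash flows; cash_flows[0] is year 0."""
            return sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows))

        LIFESPAN_YEARS = 25
        DISCOUNT_RATE = 0.04

        # Standalone PV: larger investment (panels plus head-tank storage), no grid bill.
        standalone = [-250_000.0] + [18_000.0] * LIFESPAN_YEARS

        # Hybrid PV + grid: smaller investment, yearly savings plus revenue from
        # selling the midday surplus to the grid (possible in Spain since April 2019).
        hybrid = [-150_000.0] + [14_000.0 + 3_000.0] * LIFESPAN_YEARS

        for name, flows in [("standalone", standalone), ("hybrid", hybrid)]:
            print(f"{name:>10}: NPV = {npv(DISCOUNT_RATE, flows):,.0f} EUR")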

    DENCAST: distributed density-based clustering for multi-target regression

    Get PDF
    Recent developments in sensor networks and mobile computing have led to a huge increase in generated data that needs to be processed and analyzed efficiently. In this context, many distributed data mining algorithms have recently been proposed. Following this line of research, we propose the DENCAST system, a novel distributed algorithm implemented in Apache Spark, which performs density-based clustering and exploits the identified clusters to solve both single- and multi-target regression tasks (and thus solves complex tasks such as time series prediction). Contrary to existing distributed methods, DENCAST does not require a final merging step (usually performed on a single machine) and is able to handle large-scale, high-dimensional data by taking advantage of locality-sensitive hashing. Experiments show that DENCAST performs clustering more efficiently than a state-of-the-art distributed clustering algorithm, especially when the number of objects increases significantly. The quality of the extracted clusters is confirmed by the predictive capabilities of DENCAST on several datasets: it is able to significantly outperform (p-value < 0.05) state-of-the-art distributed regression methods, in both single- and multi-target settings
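
    DENCAST itself is not reproduced here, but the locality-sensitive hashing step it relies on to find neighbours without a single-machine merge can be sketched with the PySpark API. The data, bucket length and distance threshold below are arbitrary assumptions for illustration.

        # Sketch of LSH-based neighbour search in Apache Spark (a building block
        # of density-based clustering), not the DENCAST implementation itself.
        from pyspark.sql import SparkSession
        from pyspark.ml.feature import BucketedRandomProjectionLSH
        from pyspark.ml.linalg import Vectors

        spark = SparkSession.builder.appName("lsh-neighbours-sketch").getOrCreate()

        # Tiny synthetic dataset of feature vectors (ids are arbitrary).
        data = [(0, Vectors.dense([0.0, 0.1])),
                (1, Vectors.dense([0.1, 0.0])),
                (2, Vectors.dense([5.0, 5.1])),
                (3, Vectors.dense([5.1, 5.0]))]
        df = spark.createDataFrame(data, ["id", "features"])

        # Hash vectors into buckets so that close points tend to collide.
        lsh = BucketedRandomProjectionLSH(inputCol="features", outputCol="hashes",
                                          bucketLength=1.0, numHashTables=3)
        model = lsh.fit(df)

        # Approximate self-join: candidate neighbour pairs within distance 1.0.
        # A density-based step would then connect points with enough close
        # neighbours, and the clusters could feed per-cluster regression models.
        pairs = model.approxSimilarityJoin(df, df, threshold=1.0, distCol="dist")
        pairs.filter("datasetA.id < datasetB.id") \
             .select("datasetA.id", "datasetB.id", "dist").show()

        spark.stop()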

    Anomaly Detection and Repair for Accurate Predictions in Geo-distributed Big Data

    No full text
    The increasing presence of geo-distributed sensor networks implies the generation of huge volumes of data from multiple geographical locations at an increasing rate. This raises important issues which become more challenging when the final goal is the analysis of the data for forecasting purposes or, more generally, for predictive tasks. This paper proposes a framework which supports predictive modeling tasks on streaming data coming from multiple geo-referenced sensors. In particular, we propose a distance-based anomaly detection strategy which considers objects described by embedding features learned via a stacked auto-encoder. We then devise a repair strategy which repairs the data detected as anomalous by exploiting non-anomalous data measured by sensors in nearby spatial locations. Subsequently, we adopt Gradient Boosted Trees (GBTs) to predict/forecast the values assumed by a target variable of interest for the repaired, newly arriving (unlabeled) data, using either the original feature representation or the embedding feature representation learned via the stacked auto-encoder. The workflow is implemented with distributed Apache Spark programming primitives and tested in a cluster environment. We perform experiments to assess the performance of each module, separately and in combination, considering the predictive modeling of one-day-ahead energy production for multiple renewable energy sites. Accuracy results show that the proposed framework reduces the error by up to 13.56%. Moreover, scalability results demonstrate the efficiency of the proposed framework in terms of speedup, scaleup and execution time under a stress test
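
    A heavily compressed sketch of the detect-then-repair idea is shown below. The paper learns embeddings with a stacked auto-encoder and runs on Apache Spark; here plain NumPy arrays of raw readings stand in for both, and the threshold and neighbour count are arbitrary assumptions.

        # Hypothetical sketch: distance-based anomaly detection on sensor readings,
        # followed by repair from spatially nearby non-anomalous sensors.
        import numpy as np

        rng = np.random.default_rng(0)

        # Hourly production readings for 6 geo-referenced sites (rows = sites).
        readings = rng.normal(50.0, 5.0, size=(6, 24))
        readings[2] += 60.0                          # inject an anomalous site
        coords = rng.uniform(0, 10, size=(6, 2))     # site locations (arbitrary units)

        # Detection: a site is anomalous if its readings are unusually far, on
        # average, from every other site's readings (the 2-sigma rule is ad hoc).
        dists = np.linalg.norm(readings[:, None, :] - readings[None, :, :], axis=2)
        mean_dist = dists.sum(axis=1) / (len(readings) - 1)
        anomalous = mean_dist > mean_dist.mean() + 2 * mean_dist.std()

        # Repair: replace an anomalous site's readings with the mean of its two
        # spatially nearest non-anomalous neighbours.
        for i in np.where(anomalous)[0]:
            space = np.linalg.norm(coords - coords[i], axis=1)
            space[anomalous] = np.inf                # never repair from an anomaly
            neighbours = np.argsort(space)[:2]
            readings[i] = readings[neighbours].mean(axis=0)

        print("anomalous sites:", np.where(anomalous)[0])
        # A GBT regressor would then be trained on the repaired data to forecast
        # one-day-ahead production for each site.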

    Intelligence Analysis and Big Data. Big-Data-Driven ACH – Better Analyses or Just Another Time Sink?

    Get PDF
    By adopting new technologies related to big data analytics, this thesis argues that there is significant potential to make intelligence analysis faster and to enable the individual analyst to cover a larger volume of information with higher precision. This is made possible by technological developments in data processing and big data systems, which allow information to be transferred, analyzed, and combined faster and more efficiently, and thereby make it possible to interpret large volumes of data. The starting point for this thesis was the observation that big data analytics and intelligence analysis share a number of common features, and that big data analytics may therefore have the potential to contribute to more efficient intelligence analysis. The objective was therefore to explore whether, and in what way, big data analytics can support intelligence analysis. To do this, a method combining big data and intelligence analysis is proposed. The method has been named big-data-driven ACH. It uses big data analytics to uncover patterns and suggest conclusions, while ACH (Analysis of Competing Hypotheses) is used as a framework for developing hypotheses, assessing context, and making the final decisions. The method is evaluated through an experiment, which tests a hypothesis that illegal, unreported, or unregulated (IUU) fishing is taking place in Norwegian waters. Based on three indicators of IUU fishing, algorithms were specified that could answer those indicators. To carry out the experiment, a big data infrastructure was established, and Kystverket's (the Norwegian Coastal Administration's) open AIS stream was used as the big data source. During the experiment, more than 5,000,000 AIS messages were analyzed, and answers to the various indicators were presented in real time. The result of the experiment was the identification of one fishing vessel for which an increased likelihood of IUU fishing can be argued. A key finding is that establishing big data solutions is time-consuming and complicated. Big data infrastructure requires specific expertise to set up, configure, and operate, so close cooperation between the intelligence analyst and data engineer(s) is important when such solutions are built. The study shows that big-data-driven ACH is a powerful tool when applied correctly. The final conclusion is therefore that big-data-driven ACH can increase the scope and applicability of ACH in particular and of intelligence analysis in general, but this presupposes that it is applied to the right problems. It must be seen as a supplement to existing processes and systems, not a replacement
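
    As a concrete illustration of what one of the indicator algorithms might look like, the Python sketch below flags fishing vessels whose AIS positions fall inside a closed area. The bounding box, message fields and vessels are invented for the example; the thesis itself ran such checks against Kystverket's live AIS stream on a big data infrastructure rather than on an in-memory list.

        # Hypothetical indicator check: fishing vessels reporting positions
        # inside a closed area (all values below are invented).
        from dataclasses import dataclass

        @dataclass
        class AisMessage:
            mmsi: int          # vessel identifier
            lat: float
            lon: float
            ship_type: int     # 30 = fishing vessel in the AIS standard

        # Invented closed area (lat/lon bounding box) for illustration only.
        CLOSED_AREA = {"lat_min": 70.0, "lat_max": 71.0,
                       "lon_min": 18.0, "lon_max": 20.0}

        def in_closed_area(msg):
            return (CLOSED_AREA["lat_min"] <= msg.lat <= CLOSED_AREA["lat_max"]
                    and CLOSED_AREA["lon_min"] <= msg.lon <= CLOSED_AREA["lon_max"])

        def indicator_hits(messages):
            """Return MMSIs of fishing vessels observed inside the closed area."""
            return {m.mmsi for m in messages
                    if m.ship_type == 30 and in_closed_area(m)}

        messages = [
            AisMessage(mmsi=257000001, lat=70.5, lon=19.0, ship_type=30),  # inside
            AisMessage(mmsi=257000002, lat=69.0, lon=17.0, ship_type=30),  # outside
            AisMessage(mmsi=257000003, lat=70.6, lon=19.5, ship_type=70),  # cargo ship
        ]
        print(indicator_hits(messages))   # -> {257000001}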