244 research outputs found

    PlantES: A plant electrophysiological multi-source data online analysis and sharing platform

    Get PDF
    At present, plant electrophysiological data volumes and complexity are increasing rapidly. It causes the demand for efficient management of big data, data sharing among research groups, and fast analysis. In this paper, we proposed PlantES (Plant Electrophysiological Data Sharing), a distributed computing-based prototype system that can be used to store, manage, visualize, analyze, and share plant electrophysiological data. We deliberately designed a storage schema to manage the multi-source plant electrophysiological data by integrating distributed storage systems HDFS and HBase to access all kinds of files efficiently. To improve the online analysis efficiency, parallel computing algorithms on Spark were proposed and implemented, e.g., plant electrical signals extraction method, the adaptive derivative threshold algorithm, and template matching algorithm. The experimental results indicated that Spark efficiently improves the online analysis. Meanwhile, the online visualization and sharing of multiple types of data in the web browser were implemented. Our prototype platform provides a solution for web-based sharing and analysis of plant electrophysiological multi-source data and improves the comprehension of plant electrical signals from a systemic perspective

    SAT-hadoop-processor: a distributed remote sensing big data processing software for earth observation applications

    Get PDF
    Nowadays, several environmental applications take advantage of remote sensing techniques. A considerable volume of this remote sensing data occurs in near real-time. Such data are diverse and are provided with high velocity and variety, their pre-processing requires large computing capacities, and a fast execution time is critical. This paper proposes a new distributed software for remote sensing data pre-processing and ingestion using cloud computing technology, specifically OpenStack. The developed software discarded 86% of the unneeded daily files and removed around 20% of the erroneous and inaccurate datasets. The parallel processing optimized the total execution time by 90%. Finally, the software efficiently processed and integrated data into the Hadoop storage system, notably the HDFS, HBase, and Hive.This research was funded by Erasmus+ KA 107 program, and the UPC funded the APC. This work has received funding from the Spanish Government under contracts PID2019-106774RBC21, PCI2019-111851-2 (LeadingEdge CHIST-ERA), PCI2019-111850-2 (DiPET CHIST-ERA).Peer ReviewedPostprint (published version

    An Optimized Kappa Architecture for IoT Data Management in Smart Farming

    Full text link
    peer reviewedAgriculture 4.0 is a domain of IoT in full growth which produces large amounts of data from machines, robots, and sensors networks. This data must be processed very quickly, especially for the systems that need to make real-time decisions. The Kappa architecture provides a way to process Agriculture 4.0 data at high speed in the cloud, and thus meets processing requirements. This paper presents an optimized version of the Kappa architecture allowing fast and efficient data management in Agriculture. The goal of this optimized version of the classical Kappa architecture is to improve memory management and processing speed. the Kappa architecture parameters are fine tuned in order to process data from a concrete use case. The results of this work have shown the impact of parameters tweaking on the speed of treatment. We have also proven that the combination of Apache Samza with Apache Druid offers the better performances

    Low-latency, query-driven analytics over voluminous multidimensional, spatiotemporal datasets

    Get PDF
    2017 Summer.Includes bibliographical references.Ubiquitous data collection from sources such as remote sensing equipment, networked observational devices, location-based services, and sales tracking has led to the accumulation of voluminous datasets; IDC projects that by 2020 we will generate 40 zettabytes of data per year, while Gartner and ABI estimate 20-35 billion new devices will be connected to the Internet in the same time frame. The storage and processing requirements of these datasets far exceed the capabilities of modern computing hardware, which has led to the development of distributed storage frameworks that can scale out by assimilating more computing resources as necessary. While challenging in its own right, storing and managing voluminous datasets is only the precursor to a broader field of study: extracting knowledge, insights, and relationships from the underlying datasets. The basic building block of this knowledge discovery process is analytic queries, encompassing both query instrumentation and evaluation. This dissertation is centered around query-driven exploratory and predictive analytics over voluminous, multidimensional datasets. Both of these types of analysis represent a higher-level abstraction over classical query models; rather than indexing every discrete value for subsequent retrieval, our framework autonomously learns the relationships and interactions between dimensions in the dataset (including time series and geospatial aspects), and makes the information readily available to users. This functionality includes statistical synopses, correlation analysis, hypothesis testing, probabilistic structures, and predictive models that not only enable the discovery of nuanced relationships between dimensions, but also allow future events and trends to be predicted. This requires specialized data structures and partitioning algorithms, along with adaptive reductions in the search space and management of the inherent trade-off between timeliness and accuracy. The algorithms presented in this dissertation were evaluated empirically on real-world geospatial time-series datasets in a production environment, and are broadly applicable across other storage frameworks

    Topic-based classification and identification of global trends for startup companies

    Get PDF
    Altres ajuts: Acord transformatiu CRUE-CSICUnidad de excelencia María de Maeztu CEX2019-000940-MTo foresee global economic trends, one needs to understand the present startup companies that soon may become new market leaders. In this paper, we explore textual descriptions of more than 250 thousand startups in the Crunchbase database. We analyze the 2009-2019 period by using topic modeling. We propose a novel classification of startup companies free from expert bias that contains 38 topics and quantifies the weight of each of these topics for all the startups. Taking the year of establishment and geographical location of the startups into account, we measure which topics were increasing or decreasing their share over time, and which of them were predominantly present in Europe, North America, or other regions. We find that the share of startups focused on data analytics, social platforms, and financial transfers, and time management has risen, while an opposite trend is observed for mobile gaming, online news, and online social networks as well as legal and professional services. We also identify strong regional differences in topic distribution, suggesting certain concentration of the startups. For example, sustainable agriculture is presented stronger in South America and Africa, while pharmaceutics, in North America and Europe. Furthermore, we explore which pairs of topics tend to co-occur more often together, quantify how multisectoral the startups are, and which startup classes attract more investments. Finally, we compare our classification to the one existing in the Crunchbase database, demonstrating how we improve it

    Power extraction circuits for piezoelectric energy harvesters and time series data in water supply systems

    Get PDF
    This thesis investigates two fundamental technological challenges that prevent water utilities from deploying infrastructure monitoring apparatus with high spatial and temporal resolution: providing sufficient power for sensor nodes by increasing the power output from a vibration-driven energy harvester based on piezoelectric transduction, and the processing and storage of large volumes of data resulting from the increased level of pressure and flow rate monitoring. Piezoelectric energy harvesting from flow-induced vibrations within a water main represents a potential source of power to supply a sensor node capable of taking high- frequency measurements. A main factor limiting the amount of power from a piezoelectric device is the damping force that can be achieved. Electronic interface circuits can modify this damping in order to increase the power output to a reasonable level. A unified analytical framework was developed to compare circuits able to do this in terms of their power output. A new circuit is presented that out-performs existing circuits by a factor of 2, which is verified experimentally. The second problem concerns the management of large data sets arising from resolving challenges with the provision of power to sensor devices. The ability to process large data volumes is limited by the throughput of storage devices. For scientists to execute queries in a timely manner, query execution must be performant. The large volume of data that must be gathered to extract information from historic trends mandates a scalable approach. A scalable, durable storage and query execution framework is presented that is able to significantly improve the execution time of user-defined queries. A prototype database was implemented and validated on a cluster of commodity servers using live data gathered from a London pumping station and transmission mains. Benchmark results and reliability tests are included that demonstrate a significant improvement in performance over a traditional database architecture for a range of frequently-used operations, with many queries returning results near-instantaneously

    The role of big data in smart city

    No full text
    The expansion of big data and the evolution of Internet of Things (IoT) technologies have played an important role in the feasibility of smart city initiatives. Big data offer the potential for cities to obtain valuable insights from a large amount of data collected through various sources, and the IoT allows the integration of sensors, radio-frequency identification, and Bluetooth in the real-world environment using highly networked services. The combination of the IoT and big data is an unexplored research area that has brought new and interesting challenges for achieving the goal of future smart cities. These new challenges focus primarily on problems related to business and technology that enable cities to actualize the vision, principles, and requirements of the applications of smart cities by realizing the main smart environment characteristics. In this paper, we describe the existing communication technologies and smart-based applications used within the context of smart cities. The visions of big data analytics to support smart cities are discussed by focusing on how big data can fundamentally change urban populations at different levels. Moreover, a future business model that can manage big data for smart cities is proposed, and the business and technological research challenges are identified. This study can serve as a benchmark for researchers and industries for the future progress and development of smart cities in the context of big data

    Big Data Management for Cloud-Enabled Geological Information Services

    Get PDF

    A Machine Learning Framework to Predict Determinant Factors of Seeds

    Get PDF
    In this paper, we audit the machine learning apparatuses for foreseeing determinant components of seeds. We depict this issue regarding Big Data, ANN, Hadoop and R. We consider Machine-learning techniques especially suited to forecasts dependent on existing information, yet exact expectations about the far off future are frequently on a very basic level unthinkable. Farming is an industry where recorded and current information flourish. This survey researches the various information sources accessible in the horticultural field and dissects them for utilization in Seed determinant factor Predictions. We recognized certain relevant information and researched techniques for utilizing this information to improve forecast inside the farming action
    corecore