23 research outputs found

    Communication Theoretic Data Analytics

    Widespread use of the Internet and social networks drives the generation of big data, which is proving useful in a number of applications. To deal with explosively growing amounts of data, data analytics has emerged as a critical technology related to computing, signal processing, and information networking. In this paper, a formalism is considered in which data is modeled as a generalized social network, and communication theory and information theory are thereby extended to data analytics. First, the creation of an equalizer to optimize information transfer between two data variables is considered, and financial data is used to demonstrate the advantages. Then, an information coupling approach based on information geometry is applied for dimensionality reduction, with a pattern recognition example to illustrate its effectiveness. These initial trials suggest the potential of communication theoretic data analytics for a wide range of applications. Comment: Published in IEEE Journal on Selected Areas in Communications, Jan. 201

    Using Pattern Recognition for Investment Decision Support in Taiwan Stock Market

    The Taiwan stock market has accumulated large amounts of stock time series data and successful investment strategies. The stock price, which is affected by various factors, is the result of buyers' and sellers' investment strategies. Since the stock price reflects numerous factors, its pattern can be read as an expression of investors' strategies. In this paper, the concept of pattern recognition is adapted to match the current stock price trend against recurring patterns in past price data. To this end, a new method is introduced that quickly extracts features from a stock time series chart to find the most critical feature points. Matching is then performed using the corresponding information of these feature points. In other words, the goal is to find recurring historical patterns, namely similar trends, to help investors formulate investment strategies.

    Discovering System Health Anomalies Using Data Mining Techniques

    We present a data mining framework for the analysis and discovery of anomalies in high-dimensional time series of sensor measurements that would be found in an Integrated System Health Monitoring (ISHM) system. We specifically treat the problem of discovering anomalous features in the time series that may be indicative of a system anomaly or, in the case of a manned system, an anomaly due to the human. Identification of these anomalies is crucial to building stable, reusable, and cost-efficient systems. The framework consists of an analysis platform and new algorithms that can scale to thousands of sensor streams to discover temporal anomalies. We discuss the mathematical framework that underlies the system and describe in detail how this framework is general enough to encompass both discrete and continuous sensor measurements. We also describe a new set of data mining algorithms based on kernel methods and hidden Markov models that allow for the rapid assimilation, analysis, and discovery of system anomalies. We then describe the performance of the system on a real-world problem in the aircraft domain, where we analyze cockpit data from aircraft as well as data from the aircraft propulsion, control, and guidance systems. These data are discrete and continuous sensor measurements and are dealt with seamlessly in order to discover anomalous flights. We conclude with recommendations that describe the tradeoffs in building an integrated scalable platform for robust anomaly detection in ISHM applications.

    Forecasting model for the change in stage of reservoir water level

    Reservoirs are one of the major structural approaches to flood mitigation. During floods, early release of reservoir water is one of the actions taken by the reservoir operator to accommodate incoming heavy rainfall. Late water release might damage the reservoir structure and cause flooding in the downstream area. However, current rainfall may not directly influence the change in reservoir water level: a delay may occur because the streamflow that carries the water takes some time to reach the reservoir. This study aims to develop a forecasting model for the change in stage of reservoir water level. The model takes the changes in reservoir water level and its stage as input and produces the future change in stage of reservoir water level as output. In this study, the Timah Tasoh reservoir operational data was obtained from the Perlis Department of Irrigation and Drainage (DID). The reservoir water level was categorised into stages based on the DID manual. A modified sliding window algorithm was used to segment the data into temporal patterns. Based on the patterns, three models were developed: the reservoir water level model, the change of reservoir water level and stage of reservoir water level model, and the combination of the change of reservoir water level and stage of reservoir water level model. All models were simulated using a neural network, and their performances were compared using mean square error (MSE) and percentage of correctness. The results show that the change of reservoir water level and stage of reservoir water level model produces the lowest MSE and the highest percentage of correctness of the three models. The findings also show that the two previous days affect the change in stage of reservoir water level. The model can be applied to support early reservoir water release decision making and thus reduce the impact of flooding in the downstream area.
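The segmentation step described in this abstract can be illustrated with a minimal sketch. It uses a plain fixed-size sliding window, not the authors' modified algorithm, whose details the abstract does not give. The stage thresholds, the function names `stage_of` and `sliding_windows`, and the sample levels are all hypothetical; the default window of two days is only an assumption that mirrors the two-day delay reported in the findings.

```python
from bisect import bisect_right

# Hypothetical stage thresholds in metres; the real values come from the DID manual.
STAGE_THRESHOLDS = [28.0, 29.0, 29.4]


def stage_of(level):
    """Map a water level to a stage index (0 is the lowest stage)."""
    return bisect_right(STAGE_THRESHOLDS, level)


def sliding_windows(levels, window=2):
    """Segment daily water levels into (inputs, target) temporal patterns.

    inputs: for each of the `window` most recent days, the change in water
            level from the previous day and the stage on that day
    target: the change in stage on the following day
    """
    samples = []
    for t in range(window, len(levels) - 1):
        inputs = []
        for d in range(t - window + 1, t + 1):
            inputs.append(round(levels[d] - levels[d - 1], 3))
            inputs.append(stage_of(levels[d]))
        target = stage_of(levels[t + 1]) - stage_of(levels[t])
        samples.append((inputs, target))
    return samples


if __name__ == "__main__":
    # Made-up daily levels, for illustration only.
    daily_levels = [27.5, 27.9, 28.2, 29.1, 29.5]
    for inputs, target in sliding_windows(daily_levels):
        print(inputs, "->", target)
```

Each resulting pair could then serve as one training example for a neural network, with the window length varied to probe how many previous days carry useful information.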

    T-Patterns Revisited: Mining for Temporal Patterns in Sensor Data

    The trend toward using large numbers of simple sensors, as opposed to a few complex sensors, to monitor places and systems creates a need for temporal pattern mining algorithms that work on such data. Existing methods that try to discover reusable and interpretable patterns in temporal event data have several shortcomings. We contrast several recent approaches to the problem and extend the T-Pattern algorithm, which was previously applied to the detection of sequential patterns in the behavioural sciences. The temporal complexity of the T-Pattern approach is prohibitive in the scenarios we consider. We remedy this with a statistical model to obtain a fast and robust algorithm for finding patterns in temporal data. We test our algorithm on a recent database of millions of events collected with passive infrared sensors.

    Data mining and fuzzy logic. An application to the study of the economic profitability of agri-food companies in Andalusia

    This paper studies the profitability of agri-food companies in Andalusia (Spain) using a set of ratios compiled by the Instituto de Estadística de Andalucía from the Central de Balances de Actividades Empresariales de Andalucía. The aim is to identify the accounting characteristics of the most profitable companies. The methodological aspects covered in the application include some advanced statistical techniques and new methods for extracting knowledge from large databases (knowledge discovery and data mining). The conclusions reached, expressed as fuzzy rules obtained from the database by means of the "computational theory of perception" (Zadeh, 2001; Last, Klein and Kandel, 2001), appear fully consistent with the postulates of financial analysis. If asset turnover is low, there are no high returns. Conversely, substantial asset turnover, accompanied by an acceptable liquidity position, is what characterizes the most profitable companies. Keywords: data mining, knowledge discovery, exploratory data analysis, neural networks, decision trees, accounting ratios, return on assets.