Communication Theoretic Data Analytics
Widespread use of the Internet and social networks invokes the generation of
big data, which is proving to be useful in a number of applications. To deal
with explosively growing amounts of data, data analytics has emerged as a
critical technology related to computing, signal processing, and information
networking. In this paper, a formalism is considered in which data is modeled
as a generalized social network and communication theory and information theory
are thereby extended to data analytics. First, the creation of an equalizer to
optimize information transfer between two data variables is considered, and
financial data is used to demonstrate the advantages. Then, an information
coupling approach based on information geometry is applied for dimensionality
reduction, with a pattern recognition example to illustrate the effectiveness.
These initial trials suggest the potential of communication theoretic data
analytics for a wide range of applications.
Comment: Published in IEEE Journal on Selected Areas in Communications, Jan. 201
Multidimensional Time Series Fuzzy Association Rules Mining
In this paper, we present a new solution to the problem of mining multidimensional time series fuzzy association rules, in which the fuzziness of both the subsequences and the subsequence intervals is taken into consideration. To handle this new concept, the paper puts forward some key algorithms of the solution. Finally, an application example of multidimensional time series fuzzy association rules mining is illustrated. The results show that rules with fuzzy intervals can be mined only by the proposed method.
Using Pattern Recognition for Investment Decision Support in Taiwan Stock Market
The Taiwan stock market has accumulated large amounts of time series stock data and many successful investment strategies. The stock price, which is affected by various factors, is the result of the investment strategies of buyers and sellers. Since the stock price reflects numerous factors, its pattern can be described in terms of the strategies of investors.
In this paper, the concept of pattern recognition is adapted to match the current stock price trend against recurring patterns in past price data. Accordingly, a new method is introduced that quickly extracts features from a stock time series chart to find the most critical feature points. The matching is then carried out using the corresponding information of the feature points. In other words, the goal is to find recurring historical patterns, namely similar trends, that investors can use to form investment strategies.
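The abstract does not specify how feature points are extracted or matched. As a rough illustration only, the sketch below assumes that feature points are local peaks and troughs (turning points) and that two windows match when their up/down sequences between feature points agree. The function names and the price data are hypothetical and are not taken from the paper.

```python
def turning_points(prices):
    """Return indices of local peaks and troughs, plus both endpoints.

    Assumption: "critical feature points" are approximated here by
    points where the price direction reverses.
    """
    points = [0]
    for i in range(1, len(prices) - 1):
        left = prices[i] - prices[i - 1]
        right = prices[i + 1] - prices[i]
        if left * right < 0:  # direction reverses at index i
            points.append(i)
    points.append(len(prices) - 1)
    return points


def trend_signature(prices):
    """Describe a window as the sign of each move between feature points."""
    idx = turning_points(prices)
    signature = []
    for a, b in zip(idx, idx[1:]):
        diff = prices[b] - prices[a]
        signature.append((diff > 0) - (diff < 0))  # 1 = up, -1 = down, 0 = flat
    return signature


def find_matches(history, current):
    """Return start indices of historical windows whose signature equals
    the signature of the current window."""
    target = trend_signature(current)
    width = len(current)
    return [
        start
        for start in range(len(history) - width + 1)
        if trend_signature(history[start:start + width]) == target
    ]


# Hypothetical closing prices, for illustration only.
history = [10, 12, 11, 13, 12, 14, 13, 15, 14]
current = [20, 23, 21, 24]
matches = find_matches(history, current)
```

Comparing signatures rather than raw prices makes the match independent of the price level, which is one plausible reading of "similar trend"; a practical system would also compare the size and duration of each move.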
Discovering System Health Anomalies Using Data Mining Techniques
We present a data mining framework for the analysis and discovery of anomalies in high-dimensional time series of sensor measurements that would be found in an Integrated System Health Monitoring (ISHM) system. We specifically treat the problem of discovering anomalous features in the time series that may be indicative of a system anomaly, or in the case of a manned system, an anomaly due to the human. Identification of these anomalies is crucial to building stable, reusable, and cost-efficient systems. The framework consists of an analysis platform and new algorithms that can scale to thousands of sensor streams to discover temporal anomalies. We discuss the mathematical framework that underlies the system and also describe in detail how this framework is general enough to encompass both discrete and continuous sensor measurements. We also describe a new set of data mining algorithms based on kernel methods and hidden Markov models that allow for the rapid assimilation, analysis, and discovery of system anomalies. We then describe the performance of the system on a real-world problem in the aircraft domain where we analyze the cockpit data from aircraft as well as data from the aircraft propulsion, control, and guidance systems. These data are discrete and continuous sensor measurements and are dealt with seamlessly in order to discover anomalous flights. We conclude with recommendations that describe the tradeoffs in building an integrated scalable platform for robust anomaly detection in ISHM applications.
Forecasting model for the change in stage of reservoir water level
Reservoirs are one of the major structural approaches to flood mitigation. During floods, early release of reservoir water is one of the actions taken by the reservoir operator to accommodate incoming heavy rainfall. Late water release might damage the reservoir structure and cause flooding in the downstream area. However, current rainfall may not directly influence the change in reservoir water level. A delay may occur because the streamflow that carries the water can take some time to reach the reservoir. This study aims to develop a forecasting model for the change in stage of reservoir water level. The model takes the changes in reservoir water level and its stage as the input and the future change in stage of reservoir water level as the output. In this study, the Timah Tasoh reservoir operational data was obtained from the Perlis Department of Irrigation and Drainage (DID). The reservoir water level was categorised into stages based on the DID manual. A modified sliding window algorithm was deployed to segment the data into temporal patterns. Based on the patterns, three models were developed: the reservoir water level model, the change of reservoir water level and stage of reservoir water level model, and the combination of the change of reservoir water level and stage of reservoir water level model. All models were simulated using a neural network and their performances were compared using mean square error (MSE) and percentage of correctness. The results show that the change of reservoir water level and stage of reservoir water level model produces the lowest MSE and the highest percentage of correctness when compared to the other two models. The findings also show that the changes over the two previous days affect the change in stage of reservoir water level. The model can be applied to support decision making on early reservoir water release, thereby reducing the impact of flooding in the downstream area.
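The abstract does not describe the modifications made to the sliding window algorithm, so the following is only a minimal sketch of a plain sliding window over day-to-day changes in water level, together with the two evaluation measures named above. The window length of two days is an assumption suggested by the reported two-day delay, and the water levels are hypothetical.

```python
def sliding_windows(levels, window=2):
    """Segment a water-level series into (past changes, next change) pairs.

    Each pattern holds `window` consecutive day-to-day changes as the
    input and the change that follows them as the target.
    """
    changes = [b - a for a, b in zip(levels, levels[1:])]
    patterns = []
    for i in range(len(changes) - window):
        patterns.append((changes[i:i + window], changes[i + window]))
    return patterns


def mean_square_error(predicted, actual):
    """Average of the squared differences between predictions and targets."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def percentage_correct(predicted_stages, actual_stages):
    """Share of stage labels predicted exactly, as a percentage."""
    hits = sum(p == a for p, a in zip(predicted_stages, actual_stages))
    return 100.0 * hits / len(actual_stages)


# Hypothetical daily water levels in metres, for illustration only.
levels = [28.0, 28.5, 29.5, 29.0, 29.0]
patterns = sliding_windows(levels, window=2)
```

The resulting input and target pairs could then be fed to any regression or classification model; the neural network used in the study is not reproduced here.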
T-Patterns Revisited: Mining for Temporal Patterns in Sensor Data
The trend towards using large numbers of simple sensors, as opposed to a few complex sensors, to monitor places and systems creates a need for temporal pattern mining algorithms that work on such data. The methods that try to discover re-usable and interpretable patterns in temporal event data have several shortcomings. We contrast several recent approaches to the problem, and extend the T-Pattern algorithm, which was previously applied for detection of sequential patterns in behavioural sciences. The temporal complexity of the T-pattern approach is prohibitive in the scenarios we consider. We remedy this with a statistical model to obtain a fast and robust algorithm to find patterns in temporal data. We test our algorithm on a recent database collected with passive infrared sensors with millions of events.
Data mining and fuzzy logic: An application to the study of the economic profitability of agri-food companies in Andalusia
This paper studies the profitability of agri-food companies in Andalusia (Spain) using a set of ratios compiled by the Instituto de Estadística de Andalucía from the Central de Balances de Actividades Empresariales de Andalucía. The aim is to identify the accounting characteristics of the most profitable companies. The methodological aspects covered in the application include some advanced statistical techniques and new methods for extracting knowledge from large databases (knowledge discovery and data mining). The conclusions reached, expressed as fuzzy rules obtained from the database by means of the "computational theory of perception" (Zadeh, 2001; Last, Klein and Kandel, 2001), appear fully consistent with the postulates of financial analysis. If asset turnover is low, there are no high returns. Conversely, substantial asset turnover, accompanied by an acceptable liquidity position, is what characterizes the most profitable companies.
Keywords: data mining, knowledge discovery, exploratory data analysis, neural networks, decision trees, accounting ratios, return on assets.
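To illustrate how a fuzzy rule of the kind reported in this abstract (low asset turnover rules out high profitability) can be scored against data, here is a minimal sketch. The linear membership functions, their thresholds, and the firm data are all hypothetical; the abstract does not give the rule extraction procedure, and this is not the authors' method.

```python
def low(x, full=0.5, zero=1.5):
    """Membership in 'low': 1 at or below `full`, 0 at or above `zero`,
    linear in between. Thresholds are illustrative assumptions."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def high(x, zero=0.05, full=0.15):
    """Membership in 'high': 0 at or below `zero`, 1 at or above `full`,
    linear in between. Thresholds are illustrative assumptions."""
    if x <= zero:
        return 0.0
    if x >= full:
        return 1.0
    return (x - zero) / (full - zero)


def rule_confidence(firms):
    """Fuzzy confidence of 'IF turnover is low THEN return is not high'.

    `firms` is a list of (asset_turnover, return_on_assets) pairs.
    Uses min for fuzzy AND and 1 - membership for fuzzy NOT.
    """
    antecedent = sum(low(t) for t, _ in firms)
    both = sum(min(low(t), 1.0 - high(r)) for t, r in firms)
    return both / antecedent if antecedent else 0.0


# Hypothetical (asset turnover, return on assets) pairs.
firms = [(0.4, 0.02), (1.0, 0.10), (2.0, 0.20)]
confidence = rule_confidence(firms)
```

A confidence close to 1 means that, in this toy data set, firms that belong to the "low turnover" set rarely belong to the "high return" set, which is the pattern the abstract's rule describes.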