15 research outputs found

    Various aspects of vehicles image data-streams reduction for road traffic sufficient description

    On-line image processing was implemented for video-camera-based traffic control. To reduce the immense dimension of the data sets, various data-sampling methods were introduced. First, the required sampling ratio was determined; then simple but effective image-processing algorithms had to be chosen; finally, hardware solutions for parallel processing are discussed. A PLA computing engine was employed to cope with this task and to fulfil the assumed characteristics. The developer has to consider several restrictions and preferences, and no universal algorithm is available so far. The reported work concerns the development of vehicle-stream recorders that must perform all recording and computing procedures within strictly defined time limits

    Discretization of Continuous Attributes

    No full text
    7 pages. In the data mining field, many learning methods, such as association rules, Bayesian networks, and rule induction (Grzymala-Busse & Stefanowski, 2001), can handle only discrete attributes. Therefore, before the machine learning process, each continuous attribute must be re-encoded as a discrete attribute constituted by a set of intervals; for example, the age attribute can be transformed into two discrete values representing two intervals: less than 18 (a minor) and 18 or more (of age). This process, known as discretization, is an essential data-preprocessing task, not only because some learning methods do not handle continuous attributes, but also for other important reasons: data transformed into a set of intervals are more cognitively relevant for human interpretation (Liu, Hussain, Tan & Dash, 2002); computation is faster on reduced data, particularly when attributes are suppressed from the representation space of the learning problem because no relevant cut can be found (Mittal & Cheong, 2002); and discretization can capture non-linear relations, e.g., infants and elderly people being more sensitive to illness
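    The age example in this abstract can be sketched as a minimal interval lookup; the cut points and labels below are illustrative, not taken from the paper:

```python
from bisect import bisect_right

def discretize(value, cut_points, labels):
    """Map a continuous value to the label of the interval it falls into.

    With cut_points = [18] and labels = ["minor", "of age"], this reproduces
    the age example: (-inf, 18) -> "minor", [18, inf) -> "of age".
    """
    return labels[bisect_right(cut_points, value)]

# Hypothetical usage for the age attribute:
discretize(17, [18], ["minor", "of age"])  # -> "minor"
discretize(21, [18], ["minor", "of age"])  # -> "of age"
```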

    A Comparison of Four Approaches to Discretization Based on Entropy †

    We compare four discretization methods, all based on entropy: the original C4.5 approach to discretization; two globalized methods, known as equal interval width and equal frequency per interval; and a relatively new discretization method called multiple scanning, applied with the C4.5 decision-tree generation system. The main objective of our research is to compare the quality of these four methods using two criteria: the error rate evaluated by ten-fold cross-validation and the size of the decision tree generated by C4.5. Our results show that multiple scanning is the best discretization method in terms of error rate, and that decision trees generated from datasets discretized by multiple scanning are simpler than decision trees generated directly by C4.5 or from datasets discretized by either of the two globalized discretization methods
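    The entropy-based cut selection that C4.5-style discretization relies on can be sketched as choosing the single binary split that minimizes the weighted class entropy. This is a minimal illustration, not the authors' implementation; function names and data are made up:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_cut(values, labels):
    """Return the cut point (midpoint between distinct sorted values)
    that minimizes the weighted entropy of the two resulting blocks."""
    pairs = sorted(zip(values, labels))
    best, best_e = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # a cut must fall between two distinct values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        e = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if e < best_e:
            best, best_e = (pairs[i - 1][0] + pairs[i][0]) / 2, e
    return best
```

    For a perfectly separable attribute such as values [1, 2, 3, 10, 11, 12] with classes ["a", "a", "a", "b", "b", "b"], the selected cut is 6.5, which yields zero weighted entropy.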

    Merging of Numerical Intervals in Entropy-Based Discretization

    As previous research indicates, a multiple-scanning methodology for discretization of numerical datasets, based on entropy, is very competitive. Discretization is the process of converting the numerical values of data records into discrete values associated with numerical intervals defined over the domains of those records. In multiple-scanning discretization, the last step is the merging of neighboring intervals in the discretized datasets, as a kind of postprocessing. Our objective is to check how the error rate, measured by ten-fold cross-validation within the C4.5 system, is affected by such merging. We conducted experiments on 17 numerical datasets, using the same setup of multiple scanning, with three different options for merging: no merging at all, merging based on the smallest entropy, and merging based on the largest entropy. From the Friedman rank sum test (5% significance level) we concluded that the differences between the three approaches are statistically insignificant; there is no universally best approach. We then repeated all experiments 30 times, recording averages and standard deviations. A test of the difference between averages shows that, when comparing no merging with merging based on the smallest entropy, there are statistically highly significant differences (at the 1% significance level): in some cases the smaller error rate is associated with no merging, in others with merging based on the smallest entropy. A comparison of no merging with merging based on the largest entropy showed similar results. Our final conclusion is that there are highly significant differences between no merging and merging, depending on the dataset; the best approach should be chosen by trying all three
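    The merging postprocessing can be sketched as follows, under the assumption that each interval carries its class-label counts and that "merging based on the smallest entropy" means fusing the adjacent pair whose union has the lowest class entropy. This is an illustrative sketch, not the authors' implementation:

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (in bits) of a Counter of class-label counts."""
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values() if c)

def merge_least_entropy(intervals):
    """intervals: list of ((low, high), Counter) pairs, sorted by bounds.
    Merge the one adjacent pair whose merged interval has the smallest
    class entropy, and return the new interval list."""
    best_i, best_e = None, float("inf")
    for i in range(len(intervals) - 1):
        e = entropy(intervals[i][1] + intervals[i + 1][1])
        if e < best_e:
            best_i, best_e = i, e
    (lo, _), c1 = intervals[best_i]
    (_, hi), c2 = intervals[best_i + 1]
    return intervals[:best_i] + [((lo, hi), c1 + c2)] + intervals[best_i + 2:]

# Hypothetical discretized attribute with three intervals:
ivs = [((0, 5), Counter(a=3)), ((5, 10), Counter(a=2)), ((10, 15), Counter(b=4))]
merged = merge_least_entropy(ivs)  # fuses the two pure-"a" neighbors
```

    Merging based on the largest entropy would simply flip the comparison; repeating the step until no merge improves a chosen criterion gives the full postprocessing pass.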

    Sustainability in Peripheral and Ultra-Peripheral Rural Areas through a Multi-Attribute Analysis: The Case of the Italian Insular Region

    Italy has adopted a strategy for inner areas, based mainly on physical distance from public services. The strategy promotes a multi-level, multi-fund governance approach and a local partnership of mayors. Our paper focuses on the rural areas identified by the national strategy of inner areas as peripheral and ultra-peripheral in the Italian insular region (Sicily and Sardinia). It analyzes, at the municipality level, socio-demographic, economic, and environmental sustainability using appropriate indicators. Aiming to discover the underlying relationships portrayed by multi-attribute data in an information system, we applied rough set theory. The inductive decision rules obtained through this data mining methodology reveal the simultaneous presence or absence of characteristics important for reaching different levels of sustainability. Without requiring statistical assumptions about the data distribution, or structures for collecting data such as functions or equations, this method ensures the description of patterns exhibited by the data. Of particular interest is the assessment of the conditional attributes (i.e., the selected indicators) and the information connecting them to sustainability as a decision attribute. The most important result is rule generation, specifically decision rules able to suggest tools for policy makers at different levels
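    The rough-set machinery mentioned here rests on indiscernibility classes: municipalities with identical values on the conditional attributes are indistinguishable, and a decision rule is certain only on classes that agree on the decision attribute (the lower approximation). A minimal sketch, with entirely illustrative attribute names and values:

```python
from collections import defaultdict

def lower_approximation(objects, condition_attrs, decision_attr, target):
    """Return the objects that certainly belong to the target decision class:
    those whose indiscernibility class (identical values on condition_attrs)
    contains only objects with decision_attr == target."""
    classes = defaultdict(list)
    for obj in objects:
        classes[tuple(obj[a] for a in condition_attrs)].append(obj)
    certain = []
    for group in classes.values():
        if all(o[decision_attr] == target for o in group):
            certain.extend(group)
    return certain

# Hypothetical information system (attribute names invented for illustration):
municipalities = [
    {"services": "near", "income": "high", "sustainable": "yes"},
    {"services": "near", "income": "high", "sustainable": "yes"},
    {"services": "far",  "income": "low",  "sustainable": "no"},
    {"services": "far",  "income": "low",  "sustainable": "yes"},  # inconsistent
]
certain = lower_approximation(
    municipalities, ["services", "income"], "sustainable", "yes")
```

    Here only the two "near/high" records are in the lower approximation of "sustainable = yes"; the "far/low" class is inconsistent, so rough set theory would cover it only by a possible (not certain) rule.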

    Discretisation of conditions in decision rules induced for continuous

    Typically, discretisation procedures are implemented as part of the initial pre-processing of data, before knowledge mining is employed. This means that conclusions and observations are based on reduced data, as discretisation usually discards some information. The paper presents a different approach, taking advantage of discretisation executed after data mining. In the described study, decision rules were first induced from real-valued features. Secondly, the data sets were discretised. Using the categories found for the attributes, in the third step the conditions included in the inferred rules were translated into the discrete domain. The properties and performance of the rule classifiers were tested in the domain of stylometric analysis of texts, where writing styles were defined through quantitative attributes of a continuous nature. The performed experiments show that the proposed processing leads to sets of rules with significantly reduced sizes while maintaining prediction quality, and allows many data discretisation methods to be tested at acceptable computational cost
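    The third step, translating a continuous rule condition into the discrete domain, might be sketched as mapping the condition's threshold onto the intervals induced by the discretisation cut points. The operators, names, and the covering-interval interpretation below are assumptions for illustration, not the paper's algorithm:

```python
from bisect import bisect_right

def translate_condition(attribute, op, threshold, cut_points):
    """Rewrite a continuous condition such as (attribute >= threshold) as the
    list of discrete intervals, induced by cut_points, that it can intersect.

    Supports the two illustrative operators ">=" and "<"."""
    bounds = [float("-inf")] + list(cut_points) + [float("inf")]
    intervals = list(zip(bounds, bounds[1:]))
    idx = bisect_right(cut_points, threshold)  # interval containing threshold
    if op == ">=":
        return [(attribute, iv) for iv in intervals[idx:]]
    if op == "<":
        return [(attribute, iv) for iv in intervals[: idx + 1]]
    raise ValueError("unsupported operator: " + op)

# Hypothetical stylometric rule condition, with made-up cut points:
translated = translate_condition("comma_freq", ">=", 0.37, [0.2, 0.5])
```

    Because the threshold 0.37 falls inside the interval (0.2, 0.5), the translated condition covers (0.2, 0.5) and (0.5, inf); several continuous conditions on the same attribute can then collapse into one discrete condition, which is one way rule sets shrink after translation.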