
    Rough sets approach to symbolic value partition

    In data mining, searching for simple representations of knowledge is an important issue. Attribute reduction, continuous attribute discretization, and symbolic value partition are three preprocessing techniques used in this regard. This paper investigates the symbolic value partition technique, which divides each attribute domain of a data table into a family of disjoint subsets and constructs a new data table with fewer attributes and smaller attribute domains. Specifically, we investigate the optimal symbolic value partition (OSVP) problem for supervised data, where the optimality metric is the cardinality sum of the new attribute domains. We propose the concept of partition reducts for this problem; an optimal partition reduct is the solution to the OSVP problem. We develop a greedy algorithm to search for a suboptimal partition reduct and analyze its major properties. Empirical studies on various datasets from the UCI library show that our algorithm effectively reduces the size of attribute domains. Furthermore, it helps compute smaller rule sets with better coverage than the attribute reduction approach.
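
    To make the flavor of symbolic value partition concrete, here is a minimal sketch, not the paper's algorithm, that greedily merges two values of an attribute whenever the merged table stays consistent with the decision attribute; the toy data and helper names are illustrative assumptions.

```python
from itertools import combinations

def is_consistent(rows, decisions):
    """A table is consistent if equal condition tuples never carry different decisions."""
    seen = {}
    for row, d in zip(rows, decisions):
        if seen.setdefault(tuple(row), d) != d:
            return False
    return True

def greedy_partition(rows, decisions):
    """Greedily merge attribute values while consistency is preserved.

    The number of distinct values left per attribute approximates the
    OSVP objective (the cardinality sum of the new attribute domains).
    """
    rows = [list(r) for r in rows]
    for a in range(len(rows[0])):
        merged = True
        while merged:
            merged = False
            domain = sorted(set(r[a] for r in rows))
            for v, w in combinations(domain, 2):
                trial = [[w if (i == a and x == v) else x
                          for i, x in enumerate(r)] for r in rows]
                if is_consistent(trial, decisions):
                    rows = trial          # keep the merge v -> w
                    merged = True
                    break
    return rows

# Toy table: two symbolic attributes, binary decision.
rows = [("red", "s"), ("blue", "s"), ("red", "m"), ("green", "l")]
decisions = [0, 0, 1, 1]
print(greedy_partition(rows, decisions))
```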

    Identifying Effective Features and Classifiers for Short Term Rainfall Forecast Using Rough Sets Maximum Frequency Weighted Feature Reduction Technique

    Precise rainfall forecasting is a common challenge in meteorological prediction across the globe. Because rainfall forecasting involves rather complex, dynamic parameters, demand for novel approaches that improve forecasting accuracy has grown. Recently, Rough Set Theory (RST) has attracted a wide variety of scientific applications and is extensively adopted in decision support systems. Although several weather prediction techniques exist in the literature, identifying significant inputs for modelling effective rainfall prediction is not addressed by present mechanisms. Therefore, this investigation examines the feasibility of using rough set based feature selection together with data mining methods, namely Naïve Bayes (NB), Bayesian Logistic Regression (BLR), Multi-Layer Perceptron (MLP), J48, Classification and Regression Tree (CART), Random Forest (RF), and Support Vector Machine (SVM), to forecast rainfall. Feature selection (or reduction) identifies a significant feature subset that characterizes the information system as completely as the full feature set. This paper introduces a novel rough set based Maximum Frequency Weighted (MFW) feature reduction technique for finding an effective feature subset for modelling an efficient rainfall forecast system. The experimental analysis indicates substantial improvements in prediction models trained on the selected feature subset: the CART and J48 classifiers achieved improved accuracies of 83.42% and 89.72%, respectively. From the experimental study, relative humidity 2 (a4) and solar radiation (a6) were identified as the most effective parameters for modelling rainfall prediction.
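
    The MFW technique itself is defined in the paper; as a sketch of the evaluation step it feeds, the following compares classifiers trained on the full feature set against the reduced subset. The dataset file, column names, and the selected subset {a4, a6} are placeholders, not the study's data.

```python
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("rainfall.csv")          # hypothetical file
X_full = df.drop(columns=["rain"])
X_red = df[["a4", "a6"]]                  # relative humidity 2, solar radiation
y = df["rain"]

# 10-fold cross-validated accuracy, full vs. reduced feature set.
for name, X in [("full", X_full), ("reduced", X_red)]:
    for clf in (DecisionTreeClassifier(), RandomForestClassifier()):
        acc = cross_val_score(clf, X, y, cv=10).mean()
        print(name, type(clf).__name__, round(acc, 4))
```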

    Enhancing Big Data Feature Selection Using a Hybrid Correlation-Based Feature Selection

    This study proposes an alternative data extraction method that combines three well-known feature selection methods for handling large and problematic datasets: correlation-based feature selection (CFS), best-first search (BFS), and the dominance-based rough set approach (DRSA). The aim is to enhance classifier performance in decision analysis by eliminating uncorrelated and inconsistent data values. The proposed method, named CFS-DRSA, comprises several phases executed in sequence, the main ones being two crucial feature extraction tasks: first, data reduction, which applies CFS with a BFS algorithm; second, data selection, which applies DRSA to generate the optimized dataset. The method thereby targets both computational time complexity and classification accuracy. Several datasets with various characteristics and volumes were used in the experiments to evaluate the method's credibility. Its performance was validated using standard evaluation measures and benchmarked against established methods such as deep learning (DL). Overall, the proposed method helped the classifier return significant results, with an accuracy of 82.1% for the neural network (NN) classifier, compared with 66.5% for the support vector machine (SVM) and 49.96% for DL. A one-way analysis of variance (ANOVA) indicates that the proposed method is a viable extraction tool for those who cannot afford expensive big data analysis tools and those new to the data analysis field. Funding: Ministry of Higher Education under the Fundamental Research Grant Scheme (FRGS/1/2018/ICT04/UTM/01/1); Universiti Teknologi Malaysia (UTM) under Research University Grant Vot-20H04; Malaysia Research University Network (MRUN) Vot 4L876; SPEV project "Smart Solutions in Ubiquitous Computing Environments", University of Hradec Kralove, Faculty of Informatics and Management, Czech Republic (ID: 2102-2021).
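
    As a sketch of the first stage, the following scores candidate subsets with the standard CFS merit, using a greedy forward search standing in for best-first search; plain Pearson correlation is a simplifying assumption (CFS implementations often use symmetrical uncertainty), and the DRSA stage is not reproduced.

```python
import numpy as np

def cfs_merit(X, y, subset):
    """Merit = k*mean|corr(f,y)| / sqrt(k + k(k-1)*mean|corr(f,f')|)."""
    k = len(subset)
    rcf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return rcf
    rff = np.mean([abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
                   for i in subset for j in subset if i < j])
    return k * rcf / np.sqrt(k + k * (k - 1) * rff)

def greedy_cfs(X, y):
    remaining, chosen, best = set(range(X.shape[1])), [], -np.inf
    while remaining:
        merit, j = max((cfs_merit(X, y, chosen + [j]), j) for j in remaining)
        if merit <= best:
            break                     # no candidate improves the merit
        best, chosen = merit, chosen + [j]
        remaining.remove(j)
    return chosen

# Synthetic data: only features 0 and 2 drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(float)
print(greedy_cfs(X, y))
```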

    Rough Sets: a Bibliometric Analysis from 2014 to 2018

    Over almost forty years, considerable research has been undertaken on rough set theory as a way to deal with vague information. Rough sets have proven extremely helpful for a diversity of computer science problems (e.g., knowledge discovery, computational logic, machine learning) and numerous application domains (e.g., business economics, telecommunications, neurosciences). Accordingly, the literature on rough sets has grown without ceasing and is nowadays immense. This paper provides a comprehensive overview of the research published over the last five years. To do so, it analyzes 4,038 records retrieved from the Clarivate Web of Science database, identifying (i) the most prolific authors and their collaboration networks, (ii) the countries and organizations leading research on rough sets, (iii) the journals publishing the most papers, (iv) the topics being researched most, and (v) the principal application domains.
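
    A hedged sketch of the kind of counting behind items (i) to (iii): tallying authors and co-authorship pairs from exported Web of Science records. The field name "AU" follows the WoS export convention, and the input file is hypothetical.

```python
from collections import Counter
from itertools import combinations
import csv

authors, pairs = Counter(), Counter()
with open("wos_records.csv", newline="", encoding="utf-8") as f:
    for rec in csv.DictReader(f):
        # WoS exports list authors in "AU", separated by semicolons.
        names = [a.strip() for a in rec["AU"].split(";") if a.strip()]
        authors.update(names)
        pairs.update(combinations(sorted(names), 2))

print(authors.most_common(10))   # most prolific authors
print(pairs.most_common(10))     # strongest collaboration links
```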

    Attributes and weights in health care priority setting: a systematic review of what counts and to what extent

    In most societies, resources are insufficient to provide everyone with all the health care they want. In practice, this means that some people are given priority over others. On what basis should priority be given? In this paper we are interested in the general public's views on this question. We set out to synthesize what the literature as a whole has found regarding which attributes or factors the general public think should count in priority setting, and what weight they should receive. A systematic review was undertaken (in August 2014) to address these questions, based on empirical studies that elicited stated preferences from the general public. Sixty-four studies, applying eight methods and spanning five continents, met the inclusion criteria. Discrete choice experiments (DCE) and the person trade-off (PTO) were the most popular standard methods for preference elicitation, but only 34% of all studies calculated distributional weights, mainly using PTO. While there is heterogeneity, results suggest the young are favoured over the old, the more severely ill are favoured over the less severely ill, and people with self-induced illness or high socioeconomic status tend to receive lower priority. In the studies that considered health gain, larger gain is universally preferred, but at a diminishing rate. Evidence from the small number of studies that explored preferences over different components of health gain suggests life extension is favoured over quality-of-life enhancement; however, this may be reversed at the end of life. The majority of studies that investigated end-of-life care found weak or no support for providing a premium for such care. The review highlights considerable heterogeneity in both methods and results. Further methodological work is needed to achieve the goal of deriving robust distributional weights for use in health care priority setting.
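
    For concreteness, the person trade-off converts equivalence judgments into distributional weights: if respondents judge that treating 100 severely ill patients does as much good as treating 250 mildly ill patients (stylized numbers, not taken from any reviewed study), the implied weight on severe illness relative to mild illness is 250/100 = 2.5.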

    Role based behavior analysis

    Master's thesis, Information Security, Universidade de Lisboa, Faculdade de Ciências, 2009. Nowadays, the success of a corporation hinges on its agility and ability to adapt to fast-changing conditions. Proactive workers, and an agile IT/IS infrastructure that can support them, are requirements for this success. Unfortunately, this is not always the case. The users' network requirements may not be fully understood, which slows down relocation and reorganization. Also, without a grasp of the real requirements, the IT/IS infrastructure may not be used efficiently, with waste in some areas and deficiencies in others. Finally, enabling proactivity does not mean full, unrestricted access, since this may leave systems vulnerable to outsider and insider threats. The purpose of the work described in this thesis is to develop a system that can characterize user network behavior. We propose a modular system architecture to extract information from tagged network flows. The process begins by creating user profiles from the users' network flow information. Then, similar profiles are automatically grouped into clusters, creating role profiles. Finally, the individual profiles are compared against the roles, and those that differ significantly are flagged as anomalies for further inspection. Considering this architecture, we propose a model to describe user and role network behavior, together with visualization methods to quickly inspect all the information contained in the model. The system and model were evaluated using a real dataset from a large telecommunications operator. The results confirm that the roles accurately capture similar behavior, and the anomalies found were as expected given the underlying population. With the knowledge the system extracts from raw data, users' network needs can be better fulfilled and anomalous users flagged for inspection, giving any company that uses the system an edge in agility.
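
    A minimal sketch of the profile-to-role-to-anomaly pipeline described above, assuming numeric per-user profiles; the feature construction, cluster count, and anomaly threshold are assumptions, not the thesis's exact model.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# One row per user: e.g. mean bytes up/down, flows per day, distinct ports.
profiles = np.loadtxt("user_profiles.csv", delimiter=",")   # hypothetical file
X = StandardScaler().fit_transform(profiles)

# Group similar user profiles into role profiles.
roles = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)

# Distance of each user to the centroid of its assigned role.
dist = np.linalg.norm(X - roles.cluster_centers_[roles.labels_], axis=1)

# Flag profiles that differ significantly from their role.
threshold = dist.mean() + 3 * dist.std()
anomalies = np.flatnonzero(dist > threshold)
print("users flagged for inspection:", anomalies)
```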

    Designing IS service strategy: an information acceleration approach

    Information technology-based innovation involves considerable risk that requires insight and foresight. Yet our understanding of how managers develop the insight to support new breakthrough applications is limited and remains obscured by high levels of technical and market uncertainty. This paper applies a new experimental method based on “discrete choice analysis” and “information acceleration” to directly examine how decisions are made in a way that is behaviourally sound. The method is highly applicable for information systems researchers because it provides relative importance measures on a common scale, greater control over alternative explanations, and stronger evidence of causality. The practical implication is that information acceleration reduces levels of uncertainty and generates a more accurate rationale for IS service strategy decisions.
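
    At the core of discrete choice analysis is a choice model such as the multinomial logit; a small sketch with stylized service attributes and taste weights follows (illustrative assumptions, not the paper's experimental design).

```python
import numpy as np

def choice_probabilities(X, beta):
    """P(choose i) = exp(x_i . beta) / sum_j exp(x_j . beta)."""
    u = X @ beta
    u -= u.max()                      # shift for numerical stability
    e = np.exp(u)
    return e / e.sum()

# Three service alternatives described by (cost, response_time, coverage).
X = np.array([[3.0, 2.0, 0.8],
              [5.0, 1.0, 0.9],
              [2.0, 4.0, 0.6]])
beta = np.array([-0.4, -0.3, 2.0])    # illustrative taste weights
print(choice_probabilities(X, beta))
```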

Analysis of alternative manufacturing processes for lightweight BIW designs using the Analytic Hierarchy Process

    The main objective of the analysis was to investigate the forming of Body in White (BIW) panels using alternative processes most suitable for replacing the conventional press-working process, in order to reduce the total mass of the vehicle body structure. The selection of alternatives was guided by a multi-criteria decision-making tool, the Analytic Hierarchy Process (AHP), with alternatives selected on the basis of their relative importance with respect to the manufacturing attributes considered. The selected processes were applied to the manufacturing of the different BIW parts indicated in the bill of materials (BOM), along with suggestions for the appropriate material to be used.
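
    The AHP weighting step the abstract refers to can be sketched compactly: derive priority weights from a pairwise comparison matrix via its principal eigenvector and check consistency. The matrix below is illustrative, not the study's data.

```python
import numpy as np

# Pairwise comparisons of three hypothetical attributes on Saaty's 1-9 scale,
# e.g. cost vs. formability vs. mass saving.
A = np.array([[1,   3,   5  ],
              [1/3, 1,   3  ],
              [1/5, 1/3, 1  ]])

eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)
w = np.abs(eigvecs[:, k].real)
w /= w.sum()                          # priority weights, summing to 1

n = A.shape[0]
ci = (eigvals.real[k] - n) / (n - 1)  # consistency index
cr = ci / 0.58                        # random index RI = 0.58 for n = 3
print("weights:", w.round(3), "CR:", round(cr, 3))   # CR < 0.1 is acceptable
```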

    Air traffic flow management regulations: big data analytics

    Air traffic in Europe is constantly increasing. As a result, air traffic management (ATM) is becoming more complex, and all stakeholders are affected; among them, air traffic controllers suffer the biggest impact in terms of work overload. Every day, a set of regulations is issued in the regions controlled by these operators, provoking delays on the ground and rerouting in mid-air. All of these variations directly affect the entire ATM network and translate into large costs for passengers and airlines. The aim of this project is to predict these daily contingencies using big data analysis models, so that the associated costs are reduced. Most of the information needed to run the analysis was very complicated to extract, process, and correlate because the data sources are not open to researchers; consequently, the number of instances available for prediction is very low (only 18 months of data). Working within this limitation, a Naive Bayes classifier was chosen as the analytical algorithm. In terms of results, the work does not reveal a high predictive capability, owing to the amount of data acquired and the simplicity of the temporal variables. This suggests that future research should take in a broader historical record (more years) and could implement more complex predictive models using variables such as the weather or the number of flights.
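
    A hedged sketch of the prediction setup: a Naive Bayes classifier over simple temporal features, one instance per day. The file and column names are assumptions; the project's real features come from ATM data.

```python
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

df = pd.read_csv("regulations_daily.csv")       # hypothetical, ~18 months
X = df[["day_of_week", "month", "is_holiday"]]  # simple temporal variables
y = df["regulation_issued"]                     # 1 if a regulation occurred

# 5-fold cross-validated accuracy of the Naive Bayes baseline.
print(cross_val_score(GaussianNB(), X, y, cv=5).mean())
```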