244 research outputs found

    Discovering Knowledge through Highly Interactive Information Based Systems

    Get PDF
    The new Internet era has increased the production of digital data. Humankind has never had easier access to knowledge, but at the same time the rapidly increasing rate of new data, the ease of duplicating and transmitting these data across the Net, the new channels available for information dissemination, the large amounts of historical data, the questionable quality of existing data, and so on, all contribute to an information overload that makes it harder to take decisions based on the right data. Soft-computing techniques for decision support systems and business intelligence systems offer interesting and necessary solutions for data management and for supporting decision-making processes, but the last step in the decision chain is usually carried out by a human agent who has to process the system outcomes in the form of reports or visualizations. These kinds of information representations are not enough to make decisions, because hidden behind them may be information patterns that are not obvious to automatic data processing, and humans must interact with these data representations in order to discover knowledge. Accordingly, this special issue presents nine experiences that combine visualization and visual analytics techniques, data mining methods, intelligent recommendation agents, user-centered evaluation and usability patterns, etc., in interactive systems as a key issue for knowledge discovery in advanced and emerging information systems.

    An Adaptive Flex-Deluge Approach to University Exam Timetabling

    Get PDF

    Binary Black Widow Optimization Algorithm for Feature Selection Problems

    Get PDF
    This thesis addresses the feature selection (FS) problem, a primary stage in data mining. FS is a significant pre-processing stage that improves computation cost and accuracy and offers a better comprehension of stored data by removing unnecessary and irrelevant features from the base dataset. Because of its size, however, the FS problem is known to be very challenging and has been classified as NP-hard. Traditional methods can only solve small instances, so metaheuristic algorithms (MAs) have become powerful methods for addressing FS problems. Recently, a new metaheuristic algorithm, the Black Widow Optimization (BWO) algorithm, produced strong results on a range of daunting engineering design problems, but it had not yet been applied to FS. In this thesis, we propose a modified Binary Black Widow Optimization (BBWO) algorithm to solve FS problems. The FS evaluation method used in this study is the wrapper method, designed to keep a balance between two significant objectives: (i) minimize the number of selected features, and (ii) maintain a high level of accuracy. To achieve this, we use the k-nearest-neighbor (KNN) machine learning algorithm in the learning stage to evaluate the accuracy of the solutions generated by the BBWO. The proposed method is applied to twenty-eight public datasets provided by UCI, and the results are compared with up-to-date FS algorithms. Our results show that the BBWO performs as well as, and in some cases better than, those FS algorithms. However, the results also show that the BBWO suffers from slow convergence due to its use of a population of solutions and the lack of local exploitation. To further improve the exploitation process and enhance the BBWO's performance, we propose an improvement that combines the BBWO with a local metaheuristic based on the hill-climbing algorithm (HCA). This improved method (IBBWO) is also tested on the twenty-eight UCI datasets, and the results are compared with the basic BBWO and the up-to-date FS algorithms. The results show that IBBWO produces better results than the basic BBWO in most cases and outperforms the best-known FS algorithms in many cases.
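    As an illustration of the wrapper evaluation described above, the following is a minimal sketch, assuming a KNN evaluator and a weighted objective that trades off error rate against subset size; the weights, fold count and dataset handling are illustrative assumptions, not taken from the thesis.

```python
# Hedged sketch of a wrapper-style fitness function for binary feature selection,
# assuming a KNN evaluator and a weighted trade-off between error rate and
# subset size (alpha/beta values are illustrative, not taken from the thesis).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def wrapper_fitness(mask, X, y, alpha=0.99, beta=0.01, k=5):
    """Lower is better: alpha * (1 - accuracy) + beta * (selected / total)."""
    selected = np.flatnonzero(mask)
    if selected.size == 0:                     # empty subsets are penalised outright
        return 1.0
    knn = KNeighborsClassifier(n_neighbors=k)
    acc = cross_val_score(knn, X[:, selected], y, cv=5).mean()
    return alpha * (1.0 - acc) + beta * (selected.size / X.shape[1])

# Usage: score a random binary mask on any (X, y) dataset, e.g. one loaded from UCI.
# rng = np.random.default_rng(0)
# mask = rng.integers(0, 2, size=X.shape[1])
# print(wrapper_fitness(mask, X, y))
```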

    Soil Property and Class Maps of the Conterminous US at 100 meter Spatial Resolution based on a Compilation of National Soil Point Observations and Machine Learning

    Full text link
    With growing concern over the depletion of soil resources, conventional soil data must be updated to support spatially explicit human-landscape models. Three US soil point datasets were combined with a stack of over 200 environmental datasets to generate complete-coverage gridded predictions at 100 m spatial resolution of soil properties (percent organic C, total N, bulk density, pH, and percent sand and clay) and US soil taxonomic classes (291 great groups and 78 modified particle size classes) for the conterminous US. Models were built using parallelized random forest and gradient boosting algorithms. Soil property predictions were generated at seven standard soil depths (0, 5, 15, 30, 60, 100 and 200 cm). Prediction probability maps for US soil taxonomic classifications were also generated. Model validation results indicate an out-of-bag classification accuracy of 60 percent for great groups and 66 percent for modified particle size classes; for soil properties, cross-validated R-squared ranged from 62 percent for total N to 87 percent for pH. Nine independent validation datasets were used to assess prediction accuracies for the soil class models, with results ranging between 24-58 percent and 24-93 percent for great group and modified particle size class prediction accuracies, respectively. The hybrid "SoilGrids+" modeling system, which incorporates remote sensing data, local predictions of soil properties, conventional soil polygon maps, and machine learning, opens the possibility of updating conventional soil survey data with machine learning technology to make soil information easier to integrate with spatially explicit models, compared to multi-component map units. Comment: Submitted to Soil Science Society of America Journal, 40 pages, 12 figures, 3 tables.
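    A minimal sketch of the random-forest modelling step described above, assuming a flat table of soil point observations already joined to environmental covariates; the file name, column names and hyperparameters are placeholders rather than the study's actual SoilGrids+ configuration.

```python
# Sketch: predict one soil property (here pH) from stacked environmental covariates
# at point locations with a random forest, reporting out-of-bag and cross-validated R^2.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

points = pd.read_csv("soil_points_with_covariates.csv")   # hypothetical extract
covariate_cols = [c for c in points.columns if c.startswith("cov_")]

rf = RandomForestRegressor(n_estimators=500, n_jobs=-1, oob_score=True)
rf.fit(points[covariate_cols], points["ph"])

print("OOB R^2:", rf.oob_score_)
r2 = cross_val_score(rf, points[covariate_cols], points["ph"], cv=10, scoring="r2")
print("10-fold CV R^2:", r2.mean())
```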

    A hybrid kidney algorithm strategy for combinatorial interaction testing problem

    Get PDF
    Combinatorial Interaction Testing (CIT) generates a sampled test case set (the Final Test Suite, FTS) instead of all possible test cases. Generating an FTS of optimum size is a computational optimization problem (COP) as well as a Non-deterministic Polynomial-hard (NP-hard) problem. Recent studies have implemented hybrid metaheuristic algorithms as the basis for CIT strategies. Although existing hybrid metaheuristic-based CIT strategies generate competitive FTS sizes, no single CIT strategy outperforms all others in every case. In addition, hybrid metaheuristic-based CIT strategies require more execution time than strategies based on their original algorithms. The Kidney Algorithm (KA) is a recent metaheuristic with high efficiency and performance in solving different optimization problems compared with most state-of-the-art metaheuristic algorithms. However, KA has limitations in its exploitation and exploration processes, and its balancing control process needs improvement. These shortcomings cause KA to fall easily into local optima. This study proposes a low-level hybridization of KA with a mutation operator, together with an improved filtration process, to form a new Hybrid Kidney Algorithm (HKA). HKA addresses the limitations of KA by improving the algorithm's exploration and exploitation processes through the mutation operator, and improves the balancing control process by enhancing the filtration process. HKA improves efficiency in terms of generating an optimum FTS size and enhances performance in terms of execution time. HKA has been adopted into a CIT strategy, the HKA-based CIT Strategy (HKAS), to generate the most optimum FTS size. The results show that HKAS generates the optimum FTS size in more than 67% of the benchmarking experiments and contributes 34 new optimum FTS sizes. HKAS also has better efficiency and performance than KAS. HKAS is the first hybrid metaheuristic-based CIT strategy that generates an optimum FTS size with less execution time than the original algorithm-based CIT strategy. Apart from supporting different CIT features (uniform/VS CIT, IOR CIT, and interaction strengths up to 6), this study also introduces two further variants of KA, the Improved KA (IKA) and the Mutation KA (MKA), as well as the corresponding CIT strategies, IKA-based (IKAS) and MKA-based (MKAS).
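    To illustrate the general setting, the sketch below shows a one-test-at-a-time pairwise (t = 2) CIT strategy in which each candidate test is refined by a simple mutation operator. It only conveys the idea of adding mutation-based exploitation to a CIT strategy; it is not the Kidney Algorithm or HKA itself, and all parameters are illustrative.

```python
# Sketch: greedy one-test-at-a-time pairwise test-suite generation with
# mutation-based refinement of each candidate test (illustrative only).
import random
from itertools import combinations

def all_pairs(levels):
    # every 2-way (parameter, value) interaction, parameter i taking values 0..levels[i]-1
    return {((i, a), (j, b))
            for i, j in combinations(range(len(levels)), 2)
            for a in range(levels[i])
            for b in range(levels[j])}

def pairs_of(test):
    return {((i, test[i]), (j, test[j])) for i, j in combinations(range(len(test)), 2)}

def generate_suite(levels, candidates=30, mutations=50):
    remaining, suite = all_pairs(levels), []
    while remaining:
        # pick the best of several random candidate tests ...
        best = max((tuple(random.randrange(v) for v in levels) for _ in range(candidates)),
                   key=lambda t: len(pairs_of(t) & remaining))
        # ... then refine it with a simple mutation operator (exploitation step)
        for _ in range(mutations):
            m = list(best)
            p = random.randrange(len(levels))
            m[p] = random.randrange(levels[p])
            if len(pairs_of(tuple(m)) & remaining) > len(pairs_of(best) & remaining):
                best = tuple(m)
        suite.append(best)
        remaining -= pairs_of(best)
    return suite

# Example: 4 parameters with 3 values each (a 3^4 configuration).
print(len(generate_suite([3, 3, 3, 3])))
```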

    Decision support continuum paradigm for cardiovascular disease: Towards personalized predictive models

    Get PDF
    Clinical decision making is a ubiquitous and frequent task that physicians perform in their daily practice. Conventionally, physicians adopt a cognitive predictive modelling process (i.e. knowledge and experience learnt from past lectures, research, literature, patients, etc.) to anticipate or ascertain clinical problems based on the clinical risk factors they deem most salient. However, with the inundation of health data and the confounding characteristics of diseases, more effective clinical prediction approaches are required to address these challenges. Approximately a century ago, the first major transformation of medical practice took place as science-based approaches emerged with compelling results. Now, in the 21st century, new advances in science will once again transform healthcare. Data science has been postulated as an important component in this healthcare reform and has received escalating interest for its potential to 'personalize' medicine. The key advantages of personalized medicine include, but are not limited to, (1) more effective methods for disease prevention, management and treatment, (2) improved accuracy of clinical diagnosis and prognosis, (3) patient-oriented personal health plans, and (4) cost containment. In view of the paramount importance of personalized predictive models, this thesis proposes two novel learning algorithms (an immune-inspired algorithm called the Evolutionary Data-Conscious Artificial Immune Recognition System, and a neural-inspired algorithm called the Artificial Neural Cell System for classification) and three continuum-based paradigms (the biological, time and age continua) for enhancing clinical prediction. Cardiovascular disease has been selected as the disease under investigation as it is an epidemic and a major health concern in today's world. We believe that our work has a meaningful and significant impact on the development of future healthcare systems, and we look forward to the wide adoption of advanced medical technologies by all care centres in the near future.

    Systems Engineering

    Get PDF
    The book "Systems Engineering: Practice and Theory" is a collection of articles written by developers and researches from all around the globe. Mostly they present methodologies for separate Systems Engineering processes; others consider issues of adjacent knowledge areas and sub-areas that significantly contribute to systems development, operation, and maintenance. Case studies include aircraft, spacecrafts, and space systems development, post-analysis of data collected during operation of large systems etc. Important issues related to "bottlenecks" of Systems Engineering, such as complexity, reliability, and safety of different kinds of systems, creation, operation and maintenance of services, system-human communication, and management tasks done during system projects are addressed in the collection. This book is for people who are interested in the modern state of the Systems Engineering knowledge area and for systems engineers involved in different activities of the area. Some articles may be a valuable source for university lecturers and students; most of case studies can be directly used in Systems Engineering courses as illustrative materials

    Machine learning in astronomy

    Get PDF
    The search for answers to the deepest questions we have about the Universe has fueled the collection of data from ever larger volumes of our cosmos. The field of supernova cosmology, for example, is seeing continuous development, with upcoming surveys set to produce a vast amount of data that will require new statistical inference and machine learning techniques for processing and analysis. Distinguishing between real objects and artefacts is one of the first steps in any transient science pipeline and is currently still carried out by humans, often leaving hand scanners to sort hundreds or thousands of images per night. This is a time-consuming activity that introduces human biases which are extremely hard to characterise. To meet the objectives of future transient surveys, the successful substitution of human hand scanners with machine learning techniques for this artefact-transient classification therefore represents a vital frontier. In this thesis we test various machine learning algorithms and show that many of them can match human hand-scanner performance in classifying transient difference g, r and i-band imaging data from the SDSS-II SN Survey into real objects and artefacts. Using principal component analysis and linear discriminant analysis, we construct a total of 56 feature sets with which to train, optimise and test a Minimum Error Classifier (MEC), a naive Bayes classifier, a k-Nearest Neighbours (kNN) algorithm, a Support Vector Machine (SVM) and the SkyNet artificial neural network.
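    As a rough illustration of the classification pipeline described above, the sketch below reduces difference-image features with PCA and cross-validates a kNN and an SVM; the feature files, labels and hyperparameters are placeholders rather than the SDSS-II setup or the thesis's exact feature sets.

```python
# Sketch: PCA-reduced image features fed to two of the classifiers named above,
# compared by 5-fold cross-validation (placeholder inputs, illustrative settings).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X = np.load("difference_image_features.npy")   # hypothetical (n_samples, n_features)
y = np.load("real_or_artefact_labels.npy")      # 1 = real transient, 0 = artefact

for name, clf in [("kNN", KNeighborsClassifier(n_neighbors=5)),
                  ("SVM", SVC(kernel="rbf", C=1.0))]:
    pipe = make_pipeline(StandardScaler(), PCA(n_components=20), clf)
    scores = cross_val_score(pipe, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```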

    Restoration and Domain Adaptation for Unconstrained Face Recognition

    Get PDF
    Face recognition (FR) has received great attention, and tremendous progress has been made during the past two decades. While FR at close range under controlled acquisition conditions has achieved a high level of performance, FR at a distance in unconstrained environments remains a largely unsolved problem. This is because images collected from a distance usually suffer from blur, poor illumination, pose variation, etc. In this dissertation, we present models and algorithms that compensate for these variations to improve the performance of FR at a distance. Blur is a common factor contributing to the degradation of images collected from a distance, e.g., defocus blur due to long-range acquisition and motion blur due to the movement of subjects. For this purpose, we study the image deconvolution problem. This is an ill-posed problem, and solutions are usually obtained by exploiting prior information about the desired output image to reduce ambiguity, typically through the Bayesian framework. In this dissertation, we consider the role of an example-driven manifold prior in addressing the deconvolution problem. Specifically, we incorporate unlabeled image data of the object class in the form of a patch manifold to effectively regularize the inverse problem. We propose both parametric and non-parametric approaches to implicitly estimate the manifold prior from the given unlabeled data. Extensive experiments show that our method performs better than many competitive image deconvolution methods. More often, variations in images collected at a distance are difficult to address through physical models of individual degradations. For this problem, we utilize domain adaptation methods to adapt recognition systems to the test data. Domain adaptation addresses the problem where data instances of a source domain have different distributions from those of a target domain. We focus on the unsupervised domain adaptation problem, where labeled data are not available in the target domain. We propose to interpolate subspaces through dictionary learning to link the source and target domains. These subspaces are able to capture the intrinsic domain shift and form a shared feature representation for cross-domain recognition. Experimental results on publicly available datasets demonstrate the effectiveness of our approach for face recognition across pose, blur and illumination variations, and for cross-dataset object classification. Most existing domain adaptation methods assume a homogeneous source domain, which is usually modeled by a single subspace. Yet in practice, we are often given mixed source data with different inner characteristics. Modeling these source data as a single domain would potentially deteriorate the adaptation performance, as the adaptation procedure needs to account for the large within-class variations in the source domain. For this problem, we propose two approaches to mitigate the heterogeneity in the source data. We first present an approach for selecting a subset of source samples that is more similar to the target domain, to avoid negative knowledge transfer. We then consider the scenario in which the heterogeneous source data are due to multiple latent domains. For this purpose, we derive a domain clustering framework to recover the latent domains for improved adaptation. Moreover, we formulate submodular objective functions which can be solved by an efficient greedy method. Experimental results show that our approaches compare favorably with the state of the art.
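    The following is a minimal sketch of greedy selection under a submodular, facility-location-style objective, picking source samples whose features best "cover" the target domain. It only illustrates the greedy-on-submodular idea mentioned above; the dissertation's exact objective, features and selection criterion are not reproduced here.

```python
# Sketch: greedy maximisation of a facility-location objective for selecting
# source samples similar to the target domain (illustrative, cosine similarity).
import numpy as np

def greedy_source_selection(source, target, k):
    """Select k source rows maximising sum_j max_{i in S} sim(target_j, source_i)."""
    s = source / np.linalg.norm(source, axis=1, keepdims=True)
    t = target / np.linalg.norm(target, axis=1, keepdims=True)
    sim = t @ s.T                                   # shape (n_target, n_source)
    selected, best_cover = [], np.zeros(len(target))
    for _ in range(k):
        # marginal gain of adding each source sample to the current selection
        gains = np.maximum(sim, best_cover[:, None]).sum(axis=0) - best_cover.sum()
        gains[selected] = -np.inf                   # never re-pick a selected sample
        i = int(np.argmax(gains))
        selected.append(i)
        best_cover = np.maximum(best_cover, sim[:, i])
    return selected

# Usage with random stand-in features:
# src, tgt = np.random.randn(500, 128), np.random.randn(100, 128)
# print(greedy_source_selection(src, tgt, k=50))
```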