17 research outputs found

    The V- and W-operators in inverse resolutions

    This article gives algorithms for the V- and W-operators in inverse resolution. It also discusses the completeness of these algorithms.

    Global discretization of continuous attributes as preprocessing for machine learning

    Real-life data are usually presented in databases as real numbers. On the other hand, most inductive learning methods require a small number of attribute values. Thus it is necessary to convert input data sets with continuous attributes into input data sets with discrete attributes. Methods of discretization restricted to single continuous attributes will be called local, while methods that simultaneously convert all continuous attributes will be called global. In this paper, a method of transforming any local discretization method into a global one is presented. A global discretization method, based on cluster analysis, is presented and compared experimentally with three known local methods, transformed into global. Experiments include tenfold cross-validation and leaving-one-out methods for ten real-life data sets.

    Adaptive Language-based Mental Health Assessment with Item-Response Theory

    Mental health issues vary widely across individuals: the manifestations of signs and symptoms can be fairly heterogeneous. Recently, language-based depression and anxiety assessments have shown promise for capturing this heterogeneous nature by evaluating a patient's own language, but such approaches require a large sample of words per person to be accurate. In this work, we introduce adaptive language-based assessment: the task of iteratively estimating an individual's psychological score based on limited language responses to questions that the model also decides to ask. To this end, we explore two statistical learning-based approaches for measurement/scoring: classical test theory (CTT) and item response theory (IRT). We find that using adaptive testing in general can significantly reduce the number of questions required to achieve high validity (r ~ 0.7) with standardized tests, from 11 total questions to 3 for depression and 5 for anxiety. Given the combinatorial nature of the problem, we empirically evaluate multiple strategies for both the ordering and scoring objectives, introducing two new methods: a semi-supervised item response theory based method (ALIRT), and a supervised actor-critic based model. While both of the models achieve significant improvements over random and fixed orderings, we find ALIRT to be a scalable model that achieves the highest accuracy with lower numbers of questions (e.g. achieves Pearson r ~ 0.93 after only 3 questions versus asking all 11 questions). Overall, ALIRT allows prompting a reduced number of questions without compromising accuracy or overhead computational costs.

    Classifiers for modeling of mineral potential

    [Extract] Classification and allocation of land-use is a major policy objective in most countries. Such an undertaking, however, in the face of competing demands from different stakeholders, requires reliable information on resource potential. This type of information enables policy decision-makers to estimate socio-economic benefits from different possible land-use types and then to allocate the most suitable land-use. The potential for several types of resources occurring on the earth's surface (e.g., forest, soil, etc.) is generally easier to determine than those occurring in the subsurface (e.g., mineral deposits, etc.). In many situations, therefore, information on potential for subsurface occurring resources is not among the inputs to land-use decision-making [85]. Consequently, many potentially mineralized lands are alienated usually to, say, further exploration and exploitation of mineral deposits. Areas with mineral potential are characterized by geological features associated genetically and spatially with the type of mineral deposits sought. The term 'mineral deposits' means accumulations or concentrations of one or more useful naturally occurring substances, which are otherwise usually distributed sparsely in the earth's crust. The term 'mineralization' refers to collective geological processes that result in formation of mineral deposits. The term 'mineral potential' describes the probability or favorability for occurrence of mineral deposits or mineralization. The geological features characteristic of mineralized land, which are called recognition criteria, are spatial objects indicative of or produced by individual geological processes that acted together to form mineral deposits.
Recognition criteria are sometimes directly observable; more often, their presence is inferred from one or more geographically referenced (or spatial) datasets, which are processed and analyzed appropriately to enhance, extract, and represent the recognition criteria as spatial evidence or predictor maps. Mineral potential mapping then involves integration of predictor maps in order to classify areas of unique combinations of spatial predictor patterns, called unique conditions [51], as either barren or mineralized with respect to the mineral deposit-type sought.

    Classification Based on both Attribute Value Weight and Tuple Weight under the Cloud Computing

    In recent years, more and more people have paid attention to cloud computing. Users need to deal with massive data in the cloud computing environment. Classification can predict the needs of users from large data in the cloud computing environment. Traditional classification methods frequently adopt one of two approaches. One is to remove an instance after it is covered by a rule; the other is to decrease the tuple weight of an instance after it is covered by a rule. The quality of these traditional classifiers may not be high. As a result, they cannot achieve high classification accuracy on some data. In this paper, we present a new classification approach, called classification based on both attribute value weight and tuple weight (CATW). CATW is distinguished from some traditional classifiers in two aspects. First, CATW uses both attribute value weight and tuple weight. Second, CATW proposes a new measure to select the best attribute values and generate a high quality classification rule set. Our experimental results indicate that CATW can achieve higher classification accuracy than some traditional classifiers.

    AFRANCI : multi-layer architecture for cognitive agents

    Doctoral thesis. Electrical and Computer Engineering. Faculdade de Engenharia. Universidade do Porto. 201

    Computer aided identification of biological specimens using self-organizing maps

    For scientific or socio-economic reasons it is often necessary or desirable that biological material be identified. Given that there are an estimated 10 million living organisms on Earth, the identification of biological material can be problematic. Consequently the services of taxonomist specialists are often required. However, if such expertise is not readily available it is necessary to attempt an identification using an alternative method. Some of these alternative methods are unsatisfactory or can lead to a wrong identification. One of the most common problems encountered when identifying specimens is that important diagnostic features are often not easily observed, or may even be completely absent. A number of techniques can be used to try to overcome this problem, one of which, the Self Organizing Map (or SOM), is a particularly appealing technique because of its ability to handle missing data. This thesis explores the use of SOMs as a technique for the identification of indigenous trees of the Acacia species in KwaZulu-Natal, South Africa. The ability of the SOM technique to perform exploratory data analysis through data clustering is utilized and assessed, as is its usefulness for visualizing the results of the analysis of numerical, multivariate botanical data sets. The SOM’s ability to investigate, discover and interpret relationships within these data sets is examined, and the technique’s ability to identify tree species successfully is tested. These data sets are also tested using the C5 and CN2 classification techniques. Results from both these techniques are compared with the results obtained by using a SOM commercial package. These results indicate that the application of the SOM to the problem of biological identification could provide the start of the long-awaited breakthrough in computerized identification that biologists have eagerly been seeking. Dissertation (MSc), University of Pretoria, 2011. Computer Science.

    Collective Machine Learning: Team Learning and Classification in Multi-Agent Systems

    This dissertation focuses on multiple heterogeneous, intelligent agents (hardware or software) that collaborate to learn a task and are capable of sharing knowledge. The concept of collaborative learning in multi-agent and multi-robot systems is largely understudied, and represents an area where further research is needed to gain a deeper understanding of team learning. This work presents experimental results which illustrate the importance of heterogeneous teams of collaborative learning agents, and outlines heuristics which govern successful construction of teams of classifiers. A number of application domains are studied in this dissertation. One approach is focused on the effects of sharing knowledge and collaboration of multiple heterogeneous, intelligent agents (hardware or software) which work together to learn a task. As each agent employs a different machine learning technique, the system consists of multiple knowledge sources and their respective heterogeneous knowledge representations. Collaboration between agents involves sharing knowledge both to speed up team learning and to refine the team's overall performance and group behavior. Experiments have been performed that vary the team composition in terms of machine learning algorithms, learning strategies employed by the agents, and sharing frequency for a predator-prey cooperative pursuit task. For lifelong learning, heterogeneous learning teams were more successful compared to homogeneous learning counterparts. Interestingly, sharing increased the learning rate, but sharing with higher frequency showed diminishing results. Lastly, knowledge conflicts are reduced over time, as more sharing takes place. These results support further investigation of the merits of heterogeneous learning.
This dissertation also focuses on discovering heuristics for constructing successful teams of heterogeneous classifiers, including many aspects of team learning and collaboration. In one application, multi-agent machine learning and classifier combination are utilized to learn rock facies sequences from wireline well log data. Gas and oil reservoirs have been the focus of modeling efforts for many years as an attempt to locate zones with high volumes. Certain subsurface layers and layer sequences, such as those containing shale, are known to be impermeable to gas and/or liquid. Oil and natural gas then become trapped by these layers, making it possible to drill wells to reach the supply, and extract for use. The drilling of these wells, however, is costly. Here, the focus is on how to construct a successful set of classifiers, which periodically collaborate, to increase the classification accuracy. Utilizing multiple, heterogeneous collaborative learning agents is shown to be successful for this classification problem. We were able to obtain 84.5% absolute accuracy using the Multi-Agent Collaborative Learning Architecture, an improvement of about 6.5% over the best results achieved by Kansas Geological Survey with the same data set. Several heuristics are presented for constructing teams of multiple collaborative classifiers for predicting rock facies. Another application utilizes multi-agent machine learning and classifier combination to learn water presence using airborne polar radar data acquired from Greenland in 1999 and 2007. Ground and airborne depth-soundings of the Greenland and Antarctic ice sheets have been used for many years to determine characteristics such as ice thickness, subglacial topography, and mass balance of large bodies of ice. Ice coring efforts have supported these radar data to provide ground truth for validation of the state (wet or frozen) of the interface between the bottom of the ice sheet and the underlying bedrock. 
Subglacial state governs the friction, flow speed, transport of material, and overall change of the ice sheet. In this dissertation, we focus on how to construct a successful set of classifiers which periodically collaborate to increase classification accuracy. The underlying method results in radar independence, allowing model transfer from 1999 to 2007 to produce water presence maps of the Greenland ice sheet with differing radars. We were able to obtain 86% accuracy using the Multi-Agent Collaborative Learning Architecture with this data set. Utilizing multiple, heterogeneous collaborative learning agents is shown to be successful for this classification problem as well. Several heuristics, some of which agree with those found in the other applications, are presented for constructing teams of multiple collaborative classifiers for predicting subglacial water presence. General findings from these different experiments suggest that constructing a team of classifiers using a heterogeneous mixture of homogeneous teams is preferred. Larger teams generally perform better, as decisions from multiple learners can be combined to arrive at a consensus decision. Employing heterogeneous learning algorithms integrates different error models to arrive at higher accuracy classification from complementary knowledge bases. Collaboration, although not found to be universally useful, offers certain team configurations an advantage. Collaboration with low to medium frequency was found to be beneficial, while high frequency collaboration was found to be detrimental to team classification accuracy. Full mode learning, where each learner receives the entire training set for the learning phase, consistently outperforms independent mode learning, where the training set is distributed to all learners in a team in a non-overlapping fashion. 
Results presented in this dissertation support the application of multi-agent machine learning and collaboration to current challenging, real-world classification problems.

    Language-independent pre-processing of large document bases for text classification

    Text classification is a well-known topic in the research of knowledge discovery in databases. Algorithms for text classification generally involve two stages. The first is concerned with identification of textual features (i.e. words and/or phrases) that may be relevant to the classification process. The second is concerned with classification rule mining and categorisation of "unseen" textual data. The first stage is the subject of this thesis and often involves an analysis of text that is both language-specific (and possibly domain-specific), and that may also be computationally costly especially when dealing with large datasets. Existing approaches to this stage are not, therefore, generally applicable to all languages. In this thesis, we examine a number of alternative keyword selection methods and phrase generation strategies, coupled with two potential significant word list construction mechanisms and two final significant word selection mechanisms, to identify such words and/or phrases in a given textual dataset that are expected to serve to distinguish between classes, by simple, language-independent statistical properties. We present experimental results, using common (large) textual datasets presented in two distinct languages, to show that the proposed approaches can produce good performance with respect to both classification accuracy and processing efficiency. In other words, the study presented in this thesis demonstrates the possibility of efficiently solving the traditional text classification problem in a language-independent (also domain-independent) manner.