8 research outputs found

    Distance Functions and Attribute Weighting in a K-Nearest Neighbors Classifier

    Get PDF
    To assess the environmental health of a stream, field, or other ecological object, characteristics of that object should be compared to a set of reference objects known to be healthy. Using streams as objects, we propose a k-nearest neighbors algorithm (Bates Prins and Smith, 2006) to find the appropriate set of reference streams to use as a comparison set for any given test stream. Previous investigations of the k-nearest neighbors algorithm have utilized a variety of distance functions, the best of which has been the Interpolated Value Difference Metric (IVDM), proposed by Wilson and Martinez (1997). We propose two alternatives to the IVDM: Wilson and Martinez's Windowed Value Difference Metric (WVDM) and the Density-Based Value Difference Metric (DBVDM) developed by Wojna (2005). We extend the WVDM and DBVDM to handle continuous response variables and compare these distance measures to the IVDM within the ecological k-nearest neighbors context. Additionally, we compare two existing attribute weighting schemes (Wojna 2005) when applied to the IVDM, WVDM, and DBVDM, and we propose a new attribute weighting method for use with these distance functions as well. In assessing environmental impairment, the WVDM and DBVDM were slight improvements over the IVDM. Attribute weighting also increased the effectiveness of the k-nearest neighbors algorithm in this ecological setting. This research was supported by NSF grant NSF-DMS 0552577 and was conducted during an 8-week summer research experience for undergraduates (REU).
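    As a minimal illustration of the classification framework (not the paper's own code), the sketch below shows attribute-weighted k-nearest neighbors with a pluggable distance function. The weighted Euclidean distance, function names and toy data are placeholder assumptions; in the study itself a value difference metric (IVDM, WVDM or DBVDM) would take the place of the distance function.

    import numpy as np

    def weighted_distance(x, y, weights):
        # Placeholder metric: attribute-weighted Euclidean distance.
        # A value difference metric (IVDM/WVDM/DBVDM) would be substituted here.
        return np.sqrt(np.sum(weights * (x - y) ** 2))

    def k_nearest_reference(test_obj, reference_objs, k, weights):
        # Indices of the k reference objects most similar to the test object.
        dists = [weighted_distance(test_obj, r, weights) for r in reference_objs]
        return np.argsort(dists)[:k]

    # Hypothetical usage: 6 reference streams described by 3 attributes.
    rng = np.random.default_rng(0)
    references = rng.random((6, 3))
    test_stream = rng.random(3)
    weights = np.array([0.5, 0.3, 0.2])  # illustrative attribute weights
    print(k_nearest_reference(test_stream, references, k=3, weights=weights))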

    Survey of Rough and Fuzzy Hybridization

    Get PDF
    In this research, existing barriers to, and the influence of a product's functional lifecycle on, the adoption of circular revenue models in the civil and non-residential building sector were investigated. A revenue model, i.e. how revenues are generated in a business model, becomes circular if it is used to extend producer responsibility and create financial incentives for producers to benefit from making their product more circular. For example, leasing or a buy-back scheme in theory creates an incentive for producers to, amongst other things, make the product last longer, easier to maintain and easier to return. In Dutch national policy documents there is a call for the development of circular revenue models to extend producer responsibility in the construction sector, as the construction sector is highlighted as a key sector in terms of environmental impact. The adoption of circular revenue models in the construction sector has so far not been researched; however, expectations about barriers to their adoption can be derived from related literature. The civil and non-residential building sub-sector of the construction sector is of special interest, as this sub-sector has specific characteristics that were expected to create barriers to adopting circular revenue models: ownership rights and the long functional lifecycle of products (e.g. buildings). This led to the main research question: "What are the barriers to the adoption of circular revenue models in the civil and non-residential building sector?" The long functional lifecycle of buildings is of special interest, as literature suggests that buildings are made from products with different functional lifecycles. This led to an additional sub-question: "What is the influence of a product's functional lifecycle on the adoption of circular revenue models in the civil and non-residential building sector?" To answer both research questions, the research was split into three phases. First, semi-structured interviews were held with practitioners, i.e. companies that have adopted, or are working on adopting, circular revenue models. Based upon the results, a second round of interviews was held with experts to better understand the barriers and gather more in-depth insights; the topics for this round were chosen based on the results from the practitioners. The third research phase was a focus group session held primarily with respondents from the expert and practitioner interviews, during which preliminary results were presented and several topics were discussed. In this research, 25 barriers to adopting circular revenue models in the civil and non-residential building sector were found, such as a maximum duration for contracts, short-term thinking and the adoption of measurement methods. These fit under five main categories, in order of importance: financial, sector-specific, regulatory, organisational and technical barriers. Furthermore, seven additional barriers were found when adopting circular revenue models in which producers retain ownership. This shows that there are many barriers that hinder the adoption of circular revenue models in the civil and non-residential building sector, especially when adopting circular revenue models where producers retain ownership.
    Furthermore, it was found that the shorter the functional lifecycle of a building layer, the easier the adoption of circular revenue models becomes, because, amongst other things, financing for longer than 15 years is difficult and two parties do not like to be mutually dependent upon each other over long periods of time. In increasing order of difficulty, circular revenue models can be applied to the building layers with longer functional lifecycles: space plan, services, skin and structure. During the research, a consensus amongst respondents was identified that circular revenue models should not be applied to the structure, as its functional lifecycle is too long. In addition to the functional lifecycle, four further variables were identified that emphasise why the adoption of circular revenue models for building layers with shorter functional lifecycles is more attractive: the CAPEX/OPEX ratio, the flexibility of products, the focus on investor or user, and the complexity of products.

    Algorithms for Learning Similarity Relations from Multidimensional Data Sets

    Get PDF
    The notion of similarity plays an important role in machine learning and artificial intelligence. It is widely used in tasks related to supervised classification, clustering, outlier detection and planning. Moreover, in domains such as information retrieval or case-based reasoning, the concept of similarity is essential, as it is used at every phase of the reasoning cycle. Similarity itself, however, is a very complex concept that eludes formal definition. The similarity of two objects can differ depending on the context under consideration. In many practical situations it is difficult even to evaluate the quality of similarity assessments without considering the task for which they were performed. For this reason, similarity should be learnt from data, specifically for the task at hand. In this dissertation a similarity model called Rule-Based Similarity is described, and an algorithm for constructing this model from available data is proposed. The model utilizes notions from rough set theory to derive a similarity function that approximates the similarity relation in a given context. The construction of the model starts from the extraction of sets of higher-level features, which can be interpreted as important aspects of the similarity. Having defined such features, it is possible to utilize the idea of Tversky's feature contrast model to design an accurate and psychologically plausible similarity function for a given problem. Additionally, the dissertation presents two extensions of Rule-Based Similarity designed to deal efficiently with high-dimensional data by incorporating a broader array of similarity aspects into the model. The first does so by constructing many heterogeneous sets of features from multiple decision reducts; to ensure their diversity, a randomized reduct computation heuristic is proposed. This approach is particularly well suited to the few-objects-many-attributes problem, e.g. the analysis of DNA microarray data. A similar idea can be utilized in the text mining domain, and the second proposed extension serves this purpose. It uses a combination of a semantic indexing method and an information bireducts computation technique to represent texts by sets of meaningful concepts. The similarity function of the proposed model can be used to perform accurate classification of previously unseen objects in a case-based fashion or to facilitate clustering of textual documents into semantically homogeneous groups. Experiments, whose results are also presented in the dissertation, show that the proposed models can successfully compete with state-of-the-art algorithms.
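    For illustration only, the sketch below implements Tversky's feature contrast model with set cardinality as the salience function; the feature sets, names and parameter values are assumptions, and the rule-based construction of higher-level features described in the dissertation is not reproduced here.

    def tversky_similarity(features_a, features_b, theta=1.0, alpha=0.5, beta=0.5):
        # Feature contrast model: shared features raise similarity,
        # distinctive features of either object lower it.
        common = len(features_a & features_b)
        only_a = len(features_a - features_b)
        only_b = len(features_b - features_a)
        return theta * common - alpha * only_a - beta * only_b

    # Hypothetical higher-level features, e.g. rules matched by each object.
    object_1 = {"rule_1", "rule_2", "rule_5"}
    object_2 = {"rule_2", "rule_5", "rule_7"}
    print(tversky_similarity(object_1, object_2))  # 2*1.0 - 1*0.5 - 1*0.5 = 1.0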

    Survey of Rough and Fuzzy Hybridization

    Full text link

    Arabic goal-oriented conversational agents using semantic similarity techniques

    Get PDF
    Conversational agents (CAs) are computer programs used to interact with humans in conversation. Goal-oriented conversational agents (GO-CAs) are programs that interact with humans to serve a specific domain of interest; their importance has increased recently, covering fields of technology, science and marketing. There are several types of CAs used in industry; some are simple with limited usage, others are sophisticated. Generally, most CAs were built to serve English-language speakers and only a few were built for the Arabic language, due to the complexity of Arabic and a lack of researchers in both linguistics and computing. This thesis covered two types of GO-CAs. The first is the traditional pattern-matching goal-oriented CA (PMGO-CA), and the other is the semantic goal-oriented CA (SGO-CA). Pattern-matching (PMGO-CA) techniques are widely used in industry due to their flexibility and high performance. However, they are labour-intensive, difficult to maintain or update, and need continuous housekeeping to manage users' utterances (especially when instructions or knowledge change). In addition, they lack any machine intelligence. Semantic (SGO-CA) techniques utilise humanly constructed knowledge bases such as WordNet to measure word and sentence similarity. Such measures have seen much research for the English language and very little for Arabic. In this thesis, the researcher developed a new methodology for Arabic conversational agents (covering both pattern-matching and semantic CAs), spanning scripting, knowledge engineering, architecture, implementation and evaluation. New tools to measure word and sentence similarity were also constructed. To test the performance of these CAs, a domain representing the Iraqi passport services was built. Both CAs were evaluated and tested by domain experts using special evaluation metrics. The evaluation showed very promising results and the viability of the system for real-life use.
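    As a rough illustration of the general WordNet-based approach (using the English WordNet through NLTK rather than the Arabic resources and tools built in the thesis), the sketch below scores word similarity over synset pairs and averages the best matches to score sentences; the function names and example sentences are assumptions.

    from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

    def word_similarity(w1, w2):
        # Best path-based similarity over all synset pairs; 0.0 if nothing matches.
        scores = [s1.path_similarity(s2) or 0.0
                  for s1 in wn.synsets(w1) for s2 in wn.synsets(w2)]
        return max(scores, default=0.0)

    def sentence_similarity(sent1, sent2):
        # For every word of the first sentence, take its best match in the second,
        # then average the scores.
        words1, words2 = sent1.lower().split(), sent2.lower().split()
        best = [max(word_similarity(a, b) for b in words2) for a in words1]
        return sum(best) / len(best)

    print(sentence_similarity("renew my passport", "extend my travel document"))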

    Analogy-based reasoning in classifier construction

    No full text
    Analogy-based reasoning methods in machine learning make it possible to reason about properties of objects on the basis of similarities between objects. A specific similarity-based method is the k nearest neighbors (k-nn) classification algorithm. In the k-nn algorithm, a decision about a new object x is inferred on the basis of a fixed number k of the objects most similar to x in a given set of examples. The primary contribution of the dissertation is the introduction of two new classification models based on the k-nn algorithm. The first model is a hybrid combination of the k-nn algorithm with rule induction. The proposed combination uses minimal consistent rules defined by local reducts of a set of examples. To make this combination possible, the model of minimal consistent rules is generalized to a metric-dependent form. An effective polynomial algorithm implementing the classification model based on minimal consistent rules has been proposed by Bazan. We modify this algorithm in such a way that afte