    Stakeholders’ impact on the reuse potential of structural elements at the end-of-life of a building: A machine learning approach

    The construction industry, and at its core the building sector, is the largest consumer of non-renewable resources and produces the highest amount of waste and greenhouse gas emissions worldwide. Since most of the embodied energy and CO2 emissions during the construction and demolition phases of a building are related to its structure, measures to extend the service life of these components should be prioritised. This study develops a set of easy-to-understand instructions to help practitioners assess the social sustainability and responsibility of reusing load-bearing structural components within the building sector. The results, derived by developing and then employing advanced machine learning techniques, indicate that the most significant social factor is the perception of the regulatory authorities. The second and third ranks among the social reusability factors belong to risks. Since there is a strong correlation between perception and risk, the potential risks associated with reusing structural elements affect the stakeholders’ perception of reuse. The Bayesian network developed in this study unveils the complex and non-linear correlations between variables, meaning that no single factor can determine the reusability of an element on its own. This paper shows that, by using the basics of probability theory and combining them with advanced supervised machine learning techniques, it is possible to develop tools that reliably estimate the social reusability of these elements from the influencing variables. Therefore, the authors propose using the approach developed in this study to promote material circularity in different construction industry sub-sectors.
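
    The abstract does not include the authors' actual model, so the following is only a minimal sketch of the kind of discrete Bayesian-network inference it describes, written with pgmpy; the variable names, states, network structure, and toy survey data are assumptions made for illustration, not the study's factors.

        import pandas as pd
        from pgmpy.models import BayesianNetwork
        from pgmpy.estimators import MaximumLikelihoodEstimator
        from pgmpy.inference import VariableElimination

        # Hypothetical survey responses; variable names and states are illustrative only.
        data = pd.DataFrame({
            "risk": ["low", "low", "high", "high", "low", "high", "low", "high"],
            "regulatory_perception": ["positive", "positive", "negative", "negative",
                                      "negative", "positive", "positive", "negative"],
            "reusable": ["yes", "yes", "no", "no", "yes", "no", "yes", "no"],
        })

        # Structure reflecting the reported correlation: risk shapes perception,
        # and both feed into the reuse outcome.
        model = BayesianNetwork([
            ("risk", "regulatory_perception"),
            ("risk", "reusable"),
            ("regulatory_perception", "reusable"),
        ])
        model.fit(data, estimator=MaximumLikelihoodEstimator)

        # Probability of reuse under favourable evidence.
        inference = VariableElimination(model)
        print(inference.query(variables=["reusable"],
                              evidence={"risk": "low", "regulatory_perception": "positive"}))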

    Validation of prediction models for tooth loss in a long-term cohort of periodontitis patients: influence of development and validation strategies.

    Aim: The aim of this dissertation was to validate three prediction models of differing complexity for tooth loss in periodontitis patients while taking known methodological shortcomings into account. To address the problem of class imbalance between tooth retention and tooth loss in clinical samples, an oversampling technique was applied to tooth-loss data for the first time. Material and methods: A long-term cohort (390 patients, 7,518 teeth) received supportive periodontal therapy at a university dental clinic over 18.2 ± 5.6 years. Tooth loss as well as three patient-level and five tooth-level predictors were documented. The models were validated internally by cross-validation and externally on a separate data set in order to determine the influence of model complexity (logistic regression, random forest, gradient boosting machine) and of the Synthetic Minority Oversampling Technique (SMOTE, a technique for multiplying the minority class) on the results, and to compare the models and validation strategies. Results: All models showed low sensitivity with high specificity. The more complex models (random forest, gradient boosting machine) did not achieve significantly higher area under the curve (AUC) values than logistic regression (logR) in either internal or external validation. With SMOTE, the discriminative power of the more complex models was overestimated in internal validation, whereas their AUC in external validation was significantly lower because sensitivity increased at the expense of specificity. Logistic regression showed no significant difference and remained robust. Conclusion: In summary, none of the models appears clinically useful, and SMOTE appears unsuitable for these data. Further studies with rigorous model development addressing class imbalance and additional predictors must follow.
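
    As an illustration of the comparison described above, here is a minimal sketch, assuming synthetic data in place of the cohort, of how logistic regression, random forest, and gradient boosting can be compared by cross-validated AUC with and without SMOTE; placing SMOTE inside an imbalanced-learn pipeline keeps the oversampling restricted to the training folds.

        from sklearn.datasets import make_classification
        from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score
        from imblearn.over_sampling import SMOTE
        from imblearn.pipeline import Pipeline

        # Synthetic stand-in for the tooth-level data: roughly 4 % minority class (tooth lost).
        X, y = make_classification(n_samples=5000, n_features=8, weights=[0.96, 0.04],
                                   random_state=0)

        models = {
            "logR": LogisticRegression(max_iter=1000),
            "RF": RandomForestClassifier(n_estimators=200, random_state=0),
            "GBM": GradientBoostingClassifier(random_state=0),
        }

        for name, clf in models.items():
            plain = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
            # SMOTE is fitted only on the training folds via the pipeline, so the
            # internal validation is not inflated by synthetic minority samples.
            smoted = cross_val_score(Pipeline([("smote", SMOTE(random_state=0)), ("clf", clf)]),
                                     X, y, cv=5, scoring="roc_auc")
            print(f"{name}: AUC {plain.mean():.3f} | with SMOTE {smoted.mean():.3f}")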

    Empirical study of dimensionality reduction methodologies for classification problems

    When we speak about dimensionality reduction in informatics or big data, we refer to the process of reducing the number of random variables under consideration in order to obtain a smaller set of principal variables that allows us to build a data model with the same or similar accuracy from a smaller amount of data. For this purpose, feature selection and feature extraction techniques are applied: with feature selection we select a subset of the original feature set using machine learning techniques, while with feature extraction we build a new set of features derived from the original ones. In this final-degree project we carry out an empirical study of the different methodologies for classification problems using a medical dataset called NCS-1, consisting of clinical patients with different medical pathologies; we study the different algorithms that can be applied to each case with this dataset, and finally, with the data obtained, we build a benchmark that allows us to better understand the models studied.
    Grado en Ingeniería Informática
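
    A minimal sketch of the distinction drawn above, using scikit-learn on a public dataset in place of NCS-1 (which is not available here): SelectKBest keeps a subset of the original features (selection), PCA builds new features from them (extraction), and both are compared against the full feature set by cross-validated accuracy.

        from sklearn.datasets import load_breast_cancer
        from sklearn.decomposition import PCA
        from sklearn.feature_selection import SelectKBest, f_classif
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        # Public dataset standing in for NCS-1; 30 original features.
        X, y = load_breast_cancer(return_X_y=True)

        pipelines = {
            "all 30 features": make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000)),
            "feature selection (SelectKBest, k=10)": make_pipeline(
                StandardScaler(), SelectKBest(f_classif, k=10), LogisticRegression(max_iter=5000)),
            "feature extraction (PCA, 10 components)": make_pipeline(
                StandardScaler(), PCA(n_components=10), LogisticRegression(max_iter=5000)),
        }

        for name, pipe in pipelines.items():
            accuracy = cross_val_score(pipe, X, y, cv=5, scoring="accuracy").mean()
            print(f"{name}: accuracy {accuracy:.3f}")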

    A priori synthetic sampling for increasing classification sensitivity in imbalanced data sets

    Building accurate classifiers for predicting group membership is made difficult when the data are skewed or imbalanced, which is typical of real-world data sets. As a result, the classifier tends to be biased towards the over-represented group. This is known as the class imbalance problem, and it induces bias in the classifier, particularly when the imbalance is high. Class-imbalanced data usually suffer from intrinsic data properties beyond the imbalance alone, and the problem is intensified at the larger levels of imbalance most commonly found in observational studies. Extreme cases of class imbalance are common in many domains, including fraud detection, cancer mammography, and post-term births. These rare events are usually the most costly or carry the highest level of risk and are therefore of most interest. To combat class imbalance, the machine learning community has relied on embedded, data-preprocessing, and ensemble learning approaches. Exploratory research has linked several factors that perpetuate the issue of misclassification in class-imbalanced data; however, the relationship between the learner and imbalanced data remains poorly understood across the competing approaches. Data-preprocessing approaches are appealing because they divide the problem space in two, which allows for simpler models, but most of them have little theoretical basis, although in some cases there is empirical evidence supporting the improvement. The main goal of this research is to introduce newly proposed a priori based re-sampling methods that improve concept learning within class-imbalanced data. The results highlight the robustness of these techniques' performance on publicly available data sets from different domains containing various levels of imbalance, and the theoretical and empirical reasons for it are explored and discussed.
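
    The a priori re-sampling methods themselves are not specified in the abstract, so the sketch below only illustrates the trade-off this work targets, assuming synthetic data and a generic random oversampler: a classifier trained on a heavily imbalanced set tends toward low sensitivity and high specificity, and data-level re-sampling shifts that balance toward the minority class.

        from imblearn.over_sampling import RandomOverSampler
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import confusion_matrix
        from sklearn.model_selection import train_test_split

        # Skewed synthetic data: 5 % positives, standing in for a rare-event domain.
        X, y = make_classification(n_samples=4000, n_features=10, weights=[0.95, 0.05],
                                   random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

        def sensitivity_specificity(X_fit, y_fit):
            # Fit on the given training data, evaluate on the held-out test split.
            clf = LogisticRegression(max_iter=1000).fit(X_fit, y_fit)
            tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()
            return tp / (tp + fn), tn / (tn + fp)

        # Baseline: trained on the raw, imbalanced training split.
        print("raw training set:   sensitivity=%.2f specificity=%.2f"
              % sensitivity_specificity(X_tr, y_tr))

        # Data-level preprocessing: oversample the minority class before fitting.
        X_os, y_os = RandomOverSampler(random_state=0).fit_resample(X_tr, y_tr)
        print("after oversampling: sensitivity=%.2f specificity=%.2f"
              % sensitivity_specificity(X_os, y_os))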