480 research outputs found

    Prediction of Biochemical Oxygen Demand Using Radial Basis Function Network

    Biochemical oxygen demand is the amount of oxygen that microorganisms need to decompose organic substances dissolved or suspended in water. It is an indicator of water quality: the higher the value, the lower the quality. Measuring it takes a lengthy procedure of five days in a typical laboratory. This paper proposes predicting biochemical oxygen demand with a radial basis function network whose centroids are set by an improved relational fuzzy c-means clustering, using 11 parameters taken from water quality records. The test dataset consists of weekly parameters recorded between 2014 and 2019. In testing, the mean absolute error, mean square error, root mean square error, mean absolute percentage error, and accuracy were 0.15016, 0.3677, 0.19082, 21.64490, and 78.35510 with centroids from the improved relational fuzzy c-means, compared with 0.16002, 0.04021, 0.19963, 22.83184, and 77.16816 with centroids from fuzzy c-means
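
The five measures named in the abstract above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the function name `regression_metrics` and the sample values are hypothetical, and treating accuracy as 100 minus MAPE is an assumption, suggested only by the fact that each reported MAPE and accuracy pair sums to 100.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, MAPE (in percent) and accuracy (100 - MAPE)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    # Assumption: accuracy is reported as 100 - MAPE (not confirmed by the source).
    return {"mae": mae, "mse": mse, "rmse": rmse, "mape": mape,
            "accuracy": 100.0 - mape}


# Hypothetical values, not taken from the paper's dataset.
metrics = regression_metrics([2.0, 4.0], [1.0, 5.0])
print(metrics)
```

By definition RMSE is the square root of MSE, so the two should always agree when computed from the same predictions.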

    Similarity, Retrieval, and Classification of Motion Capture Data

    Three-dimensional motion capture data is a digital representation of the complex spatio-temporal structure of human motion. Mocap data is widely used for the synthesis of realistic computer-generated characters in data-driven computer animation and also plays an important role in motion analysis tasks such as activity recognition. Both for efficiency and cost reasons, methods for the reuse of large collections of motion clips are gaining in importance in the field of computer animation. Here, an active field of research is the application of morphing and blending techniques for the creation of new, realistic motions from prerecorded motion clips. This requires the identification and extraction of logically related motions scattered within some data set. Such content-based retrieval of motion capture data, which is a central topic of this thesis, constitutes a difficult problem due to possible spatio-temporal deformations between logically related motions. Recent approaches to motion retrieval apply techniques such as dynamic time warping, which, however, are not applicable to large data sets due to their quadratic space and time complexity. In our approach, we introduce various kinds of relational features describing boolean geometric relations between specified body points and show how these features induce a temporal segmentation of motion capture data streams. By incorporating spatio-temporal invariance into the relational features and induced segments, we are able to adopt indexing methods allowing for flexible and efficient content-based retrieval in large motion capture databases. As a further application of relational motion features, a new method for fully automatic motion classification and retrieval is presented. We introduce the concept of motion templates (MTs), by which the spatio-temporal characteristics of an entire motion class can be learned from training data, yielding an explicit, compact matrix representation. 
The resulting class MT has a direct, semantic interpretation, and it can be manually edited, mixed, combined with other MTs, extended, and restricted. Furthermore, a class MT exhibits the characteristic as well as the variational aspects of the underlying motion class at a semantically high level. Classification is then performed by comparing a set of precomputed class MTs with unknown motion data and labeling matching portions with the respective motion class label. Here, the crucial point is that the variational (hence uncharacteristic) motion aspects encoded in the class MT are automatically masked out in the comparison, which can be thought of as locally adaptive feature selection
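
To make the idea of a boolean relational feature concrete, here is a minimal Python sketch. It assumes one particular kind of relation (whether a body point lies in front of the oriented plane through three other body points) and assumes that a segment is a maximal run of frames with identical feature vectors. The function names, the coordinates, and both assumptions are illustrative and are not taken from the thesis.

```python
from itertools import groupby


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def in_front_of_plane(p1, p2, p3, p4):
    """True if p4 lies on the side of the plane through p1, p2, p3
    that the normal (p2 - p1) x (p3 - p1) points to."""
    normal = _cross(_sub(p2, p1), _sub(p3, p1))
    return _dot(normal, _sub(p4, p1)) > 0


def segment(feature_vectors):
    """Group consecutive frames with identical feature vectors
    into (first_frame, last_frame, vector) triples."""
    segments = []
    start = 0
    for vector, run in groupby(feature_vectors):
        length = len(list(run))
        segments.append((start, start + length - 1, vector))
        start += length
    return segments


# Hypothetical data: a fixed plane and a fourth point moving over five frames.
plane = ((0, 0, 0), (1, 0, 0), (0, 1, 0))
frames = [(0, 0, 1), (0, 0, 2), (0, 0, -1), (0, 0, -2), (0, 0, 3)]
features = [(in_front_of_plane(*plane, p),) for p in frames]
segments = segment(features)
print(segments)
```

Because the segments depend only on where the feature values change, not on how long each value is held, a representation like this is insensitive to differences in timing between two performances of the same motion.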

    Selected Papers from IEEE ICASI 2019

    The 5th IEEE International Conference on Applied System Innovation 2019 (IEEE ICASI 2019, https://2019.icasi-conf.net/), which was held in Fukuoka, Japan, on 11–15 April, 2019, provided a unified communication platform for a wide range of topics. This Special Issue entitled “Selected Papers from IEEE ICASI 2019” collected nine excellent papers presented on the applied sciences topic during the conference. Mechanical engineering and design innovations are academic and practical engineering fields that involve systematic technological materialization through scientific principles and engineering designs. Technological innovation by mechanical engineering includes information technology (IT)-based intelligent mechanical systems, mechanics and design innovations, and applied materials in nanoscience and nanotechnology. These new technologies that implant intelligence in machine systems represent an interdisciplinary area that combines conventional mechanical technology and new IT. The main goal of this Special Issue is to provide new scientific knowledge relevant to IT-based intelligent mechanical systems, mechanics and design innovations, and applied materials in nanoscience and nanotechnology

    Reliable statistical modeling of weakly structured information

    The statistical analysis of "real-world" data is often confronted with the fact that most standard statistical methods were developed under some kind of idealization of the data that is often not adequate in practical situations. This concerns, among others, i) the potentially deficient quality of the data, which can arise for example from measurement error, non-response in surveys, or data processing errors, and ii) the scale quality of the data, which is idealized as "the data have some clear scale of measurement that can be uniquely located within the scale hierarchy of Stevens (or that of Narens and Luce or Orth)". Modern statistical methods such as correction techniques for measurement error or robust methods cope with issue i). In the context of missing or coarsened data, imputation techniques and methods that explicitly model the missing/coarsening process are nowadays well-established tools of refined data analysis. Concerning ii), the typical statistical viewpoint is more pragmatic: in case of doubt, one simply presumes the strongest scale of measurement that is clearly "justified". In more complex situations, for example in the analysis of ranking data, statisticians often do not worry too much about purely measurement-theoretic reservations, but instead embed the data structure in an appropriate, easy-to-handle space, such as a metric space, and then use all statistical tools available for this space. Against this background, the present cumulative dissertation tries to contribute from different perspectives to the appropriate handling of data that challenge the above-mentioned idealizations. The focus is, on the one hand, on the analysis of interval-valued and set-valued data within the methodology of partial identification and, on the other hand, on the analysis of data with values in a partially ordered set (poset-valued data). 
Further tools of statistical modeling treated in the dissertation are necessity measures in the context of possibility theory and concepts of stochastic dominance for poset-valued data. The present dissertation consists of 8 contributions, which are discussed in detail in the following sections: Contribution 1 analyzes different identification regions for partially identified linear models under interval-valued responses and develops a further kind of identification region (as well as a corresponding estimator). Estimates for the identification regions are compared to each other and also to classical statistical approaches for a data set on wine quality. Contribution 2 deals with logistic regression under coarsened responses, analyzes point-identifying assumptions, and develops likelihood-based estimators for the identified set. The methods are illustrated with data from one wave of the panel study "Labor Market and Social Security" (PASS). Contribution 3 analyzes the combinatorial structure of the extreme points and the edges of a polytope (called credal set or core in the literature) that plays a crucial role in imprecise probability theory. Furthermore, an efficient algorithm for enumerating all extreme points is given and compared to existing standard methods. Contribution 4 develops a quantile concept for data or random variables with values in a complete lattice, which is applied in Contribution 5 to ranking data in the context of a data set on the wisdom of the crowd phenomenon. In Contribution 6, a framework for evaluating the quality of different aggregation functions of Social Choice Theory is developed, which enables an analysis of quality as a function of group-specific homogeneity. In a simulation study, selected aggregation functions, including an aggregation function based on the concepts of Contribution 4 and Contribution 5, are analyzed. 
Contribution 7 supplies a linear program that allows for detecting stochastic dominance for poset-valued random variables, gives proposals for inference and regularization, and generalizes the approach to the general task of optimizing a linear function on a closure system. The generality of the developed methods is illustrated with data examples from multivariate inequality analysis, item impact and differential item functioning in item response theory, the analysis of distributional differences in spatial statistics, and guided regularization in cognitive diagnosis models. Contribution 8 uses concepts of stochastic dominance to establish a descriptive approach for a relational analysis of person ability and item difficulty in the context of multidimensional item response theory. All developed methods have been implemented in the language R ([R Development Core Team, 2014]) and are available from the author upon request. The application examples corroborate the usefulness of the weak types of statistical modeling examined in this thesis, which, beyond their flexibility in dealing with many kinds of data deficiency, can still lead to informative substantive conclusions that are then more reliable because of the weak modeling. The statistical analysis of real-world data is often confronted with the fact that common standard statistical methods were developed under a strong idealization of the data situation that is often not appropriate in practice. This concerns i) the possibly deficient quality of the data, caused for example by measurement errors, by systematic non-response in social science surveys, or by errors during data processing, and ii) the scale quality of the data itself: many data situations cannot be placed within the simple scale hierarchies of Stevens (or those of Narens and Luce or Orth). 
More modern statistical procedures, such as measurement error correction methods or robust methods, try to account for the idealization of data quality after the fact. For missing or interval-censored data, imputation methods for completing missing values and methods that explicitly model the process generating the coarsened data have become established. With regard to scale quality, statistics usually proceeds rather pragmatically: in case of doubt, the lowest scale level that is clearly justified is chosen. In more complex multivariate situations, such as the analysis of ranking data, which can hardly be forced into Stevens' "corset", one often resorts to the simple idea of embedding the data in a suitable metric space in order to then use all the tools of metric modeling. Against this background, the cumulative dissertation presented here aims to contribute, from different perspectives, to the adequate handling of data that challenge those idealizations. On the side of deficient data quality, the focus is above all on the analysis of interval-valued and set-valued data by means of partial identification, while with regard to scale quality the case of lattice-valued data is treated. As further tools of statistical modeling, necessity measures in the framework of imprecise probabilities and concepts of stochastic dominance for random variables with values in a partially ordered set are considered in particular. The present dissertation comprises 8 contributions, which are discussed in more detail in the following chapters: Contribution 1 analyzes different identification regions for partially identified linear models with an interval-valued observed response variable and proposes a new identification region (including an estimator). 
For a data set that describes the quality of various red wines, given by expert judgments, as a function of various physicochemical properties, estimates of the identification regions are analyzed. The results are also compared with the results of classical methods for interval data. Contribution 2 treats logistic regression with a coarsened response variable, analyzes point-identifying assumptions, and develops likelihood-based estimators for the corresponding identification regions. The method is illustrated with data from one wave of the panel study "Labor Market and Social Security" (PASS). Contribution 3 analyzes the combinatorial structure of the extreme points and edges of a polytope (the so-called structure or core of an interval probability or of a non-additive set function) that is of essential importance in many areas of imprecise probability theory. An efficient algorithm for enumerating all extreme points is also given and compared with existing standard enumeration methods. Contribution 4 presents a quantile concept for lattice-valued data or random variables. In Contribution 5, this quantile concept is applied to ranking data in connection with a data set that investigates the "wisdom of the crowd" phenomenon. Contribution 6 develops a method for the probabilistic analysis of the "quality" of different aggregation functions of Social Choice Theory. The analysis is carried out as a function of the homogeneity of the groups considered. In a simulation-based study, various classical aggregation functions and a new aggregation function based on Contributions 4 and 5 are compared by way of example. Contribution 7 presents an approach for checking whether stochastic dominance holds between two random variables. The approach uses linear programming techniques. 
Furthermore, proposals for statistical inference and regularization are made. The method is then extended to the more general case of optimizing a linear function on a closure system. Its flexible applicability is illustrated by various application examples. Contribution 8 uses ideas of stochastic dominance to analyze data sets from multidimensional item response theory relationally, by developing pairs of mutually empirically supporting ability relations among persons and difficulty relations among items. All developed methods were implemented in R ([R Development Core Team, 2014]). The application examples show the flexibility of the methods of relational or "weak" modeling considered here, in particular for treating deficient data, and underline the fact that non-trivial substantive conclusions are often still possible with methods of weak modeling, conclusions that are then also much more robust because of the more cautious modeling

    Analysis of Microarray Data using Machine Learning Techniques on Scalable Platforms

    Microarray-based gene expression profiling has emerged as an efficient technique for the classification, diagnosis, prognosis, and treatment of cancer. Frequent changes in the behavior of this disease generate a huge volume of data. Microarray data are characterized by their veracity and by the changes observed over time (velocity). They are also high-dimensional, with far more features than samples, so analyzing a high-dimensional microarray dataset in a short period is essential. Such a dataset is large, but only a fraction of it comprises significantly expressed genes. Identifying the precise genes of interest that are responsible for causing cancer is imperative in microarray data analysis. Most existing schemes employ a two-phase process: feature selection/extraction followed by classification. Our investigation starts with the analysis of microarray data using kernel-based classifiers, followed by feature selection using a statistical t-test. In this work, several kernel-based classifiers are implemented: extreme learning machine (ELM), relevance vector machine (RVM), and a newly proposed method called kernel fuzzy inference system (KFIS). The proposed models are investigated on three microarray datasets: leukemia, breast cancer, and ovarian cancer. Finally, the performance of these classifiers is measured and compared with that of the support vector machine (SVM). The results show that the proposed models classify the datasets efficiently, with performance comparable to existing kernel-based classifiers. As the data size increases, handling and processing these datasets becomes a bottleneck. Hence, a distributed, scalable cluster such as Hadoop is needed to store (HDFS) and process (MapReduce as well as Spark) the datasets efficiently. 
The next contribution of this thesis is the implementation of feature selection methods that can process the data in a distributed manner. Statistical tests such as ANOVA, Kruskal-Wallis, and Friedman are implemented with the MapReduce and Spark frameworks, which run on top of a Hadoop cluster. The performance of these scalable models is measured and compared with that of the conventional system. The results show that the proposed scalable models process data of larger dimensions (GBs, TBs, etc.) very efficiently, which is not possible with the traditional implementations of those algorithms. After the relevant features are selected, the next contribution of this thesis is a scalable implementation of the proximal support vector machine classifier, an efficient variant of SVM. The proposed classifier is implemented on two scalable frameworks, MapReduce and Spark, and executed on the Hadoop cluster. The results are compared with those obtained on the conventional system and show that the scalable cluster is well suited to Big data. Furthermore, it is concluded that Spark is more efficient than both MapReduce and the conventional system for analyzing Big datasets, owing to its intelligent handling of the datasets through resilient distributed datasets (RDDs) and in-memory processing. Therefore, the next contribution of the thesis is the implementation of various scalable classifiers based on Spark. In this work, classifiers such as logistic regression (LR), support vector machine (SVM), naive Bayes (NB), k-nearest neighbor (KNN), artificial neural network (ANN), and radial basis function network (RBFN) with two variants, hybrid and gradient descent learning algorithms, are proposed and implemented using the Spark framework. The proposed scalable models are executed on the Hadoop cluster as well as on the conventional system, and the results are investigated. 
The results show that the scalable algorithms run far more efficiently than the conventional system when processing Big datasets. The efficacy of the proposed scalable algorithms in handling Big datasets is investigated and compared with the conventional system (where data are not distributed but kept on a standalone machine and processed in a traditional manner). The comparative analysis shows that the scalable algorithms process Big datasets much more efficiently on the Hadoop cluster than on the conventional system
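
The t-test feature selection mentioned in the abstract above can be illustrated with a short single-machine Python sketch. It assumes Welch's unequal-variance form of the t statistic and a ranking by absolute value, and the abstract confirms neither. The function names and the toy expression values are hypothetical, and the sketch does not attempt the distributed MapReduce or Spark implementation described in the thesis.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples (each needs at least two values)."""
    diff = mean(a) - mean(b)
    denom = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if denom == 0:
        # Both samples are constant: no spread to scale the difference by.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / denom


def rank_features(samples, labels, top_k):
    """Return the indices of the top_k features with the largest |t|
    between class 0 and class 1."""
    group0 = [row for row, y in zip(samples, labels) if y == 0]
    group1 = [row for row, y in zip(samples, labels) if y == 1]
    scores = []
    for j in range(len(samples[0])):
        t = welch_t([r[j] for r in group0], [r[j] for r in group1])
        scores.append((abs(t), j))
    # Largest |t| first; ties broken by the lower feature index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:top_k]]


# Hypothetical expression values: rows are samples, columns are genes.
samples = [[1.0, 5.0], [1.2, 5.1], [0.9, 4.9],
           [3.0, 5.0], [3.1, 5.2], [2.9, 4.8]]
labels = [0, 0, 0, 1, 1, 1]
selected = rank_features(samples, labels, top_k=1)
print(selected)
```

Each feature is scored independently of the others, which is what makes this kind of filter straightforward to split across workers.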

    Evaluation of optimal solutions in multicriteria models for intelligent decision support

    This dissertation is set within optimization and its use in decision making. The logical sequence has been modeling, implementation, solution, and validation leading to a decision. For this, we used tools from multicriteria analysis, multiobjective optimization, and artificial intelligence techniques. The work is structured in two parts (each divided into three chapters), corresponding to the theoretical part and the experimental part. The first part analyzes the context of the field of study with an analysis of the historical framework, and then devotes a chapter to multicriteria optimization that gathers known models together with original contributions of this work. The third chapter, devoted to artificial intelligence, presents the foundations of statistical learning and the machine learning and deep learning techniques needed for the contributions in the second part. The second part contains seven real cases to which the described techniques have been applied. The first chapter studies two cases: the academic performance of students at the Universidad Industrial de Santander (Colombia) and an objective system for awarding the NBA MVP prize. The following chapter applies artificial intelligence techniques to musical similarity (plagiarism detection on YouTube), the prediction of the closing price of a company on the New York stock market, and the automatic classification of spatial acoustic signals in immersive environments. In the last chapter, the power of artificial intelligence is combined with multicriteria analysis techniques to detect university academic failure early (at the Universidad Industrial de Santander), and multicriteria methods are used to establish a ranking of artificial intelligence models. 
To close the dissertation, although each chapter contains a partial conclusion, chapter 8 gathers the main conclusions of the whole dissertation and a fairly exhaustive bibliography of the topics covered. In addition, the work ends with three appendices containing the programs and tools, which, although useful for understanding the dissertation, have been placed separately so that the chapters read more fluidly

    Fuzzy Logic

    Fuzzy Logic is becoming an essential method of solving problems in all domains. It has a tremendous impact on the design of autonomous intelligent systems. The purpose of this book is to introduce Hybrid Algorithms, Techniques, and Implementations of Fuzzy Logic. The book consists of thirteen chapters highlighting models and principles of fuzzy logic and issues concerning its techniques and implementations. The intended readers of this book are engineers, researchers, and graduate students interested in fuzzy logic systems

    Variable illumination and invariant features for detecting and classifying varnish defects

    This work presents a method to detect and classify varnish defects on wood surfaces. Since these defects are only partially visible under certain illumination directions, one image doesn't provide enough information for a recognition task. A classification requires inspecting the surface under different illumination directions, which results in image series. The information is distributed along this series and can be extracted by merging the knowledge about the defect shape and light direction

    Fundamentals

    Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, their summary and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to resource requirements and how to enhance scalability on diverse computing architectures ranging from embedded systems to large computing clusters