2,862 research outputs found

    Holistic corpus-based dialectology

    This paper is concerned with sketching future directions for corpus-based dialectology. We advocate a holistic approach to the study of geographically conditioned linguistic variability, and we present a suitable methodology, 'corpus-based dialectometry', in exactly this spirit. Specifically, we argue that in order to live up to the potential of the corpus-based method, practitioners need to (i) abandon their exclusive focus on individual linguistic features in favor of the study of feature aggregates, (ii) draw on computationally advanced multivariate analysis techniques (such as multidimensional scaling, cluster analysis, and principal component analysis), and (iii) aid interpretation of empirical results by marshalling state-of-the-art data visualization techniques. To exemplify this line of analysis, we present a case study which explores joint frequency variability of 57 morphosyntactic features in 34 dialects from all over Great Britain.
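The aggregate step described above can be made concrete with a small sketch: each dialect is represented by a vector of feature frequencies, and a pairwise distance matrix over those vectors is the input to multidimensional scaling or cluster analysis. The dialect names and frequencies below are hypothetical stand-ins, not data from the paper's case study.

```python
from math import sqrt

# Hypothetical per-dialect frequencies (per 10,000 words) of three
# morphosyntactic features; real studies aggregate dozens of features.
profiles = {
    "Northumberland": [12.0, 3.5, 7.1],
    "Yorkshire":      [11.2, 4.0, 6.8],
    "Cornwall":       [ 2.3, 9.8, 1.4],
}

def euclidean(a, b):
    """Aggregate distance over the whole feature vector."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Pairwise distance matrix: the input to MDS or cluster analysis.
names = sorted(profiles)
dist = {(p, q): euclidean(profiles[p], profiles[q])
        for p in names for q in names if p < q}

# With aggregate distances in hand, the most similar dialect pair
# falls out immediately.
closest = min(dist, key=dist.get)
```

Measuring dialect distance over the whole vector, rather than one feature at a time, is precisely the shift from single-feature dialectology to dialectometry that the paper advocates.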

    Density-based algorithms for active and anytime clustering

    Data-intensive fields such as biology, medicine, and neuroscience require effective and efficient data mining technologies. Advanced data acquisition methods produce data of constantly increasing volume and complexity. As a consequence, the need for new data mining technologies to deal with complex data has emerged during the last decades. In this thesis, we focus on the data mining task of clustering, in which objects are separated into groups (clusters) such that objects inside a cluster are more similar to each other than to objects in different clusters. In particular, we consider density-based clustering algorithms and their applications in biomedicine. The core idea of the density-based clustering algorithm DBSCAN is that each object within a cluster must have a certain number of other objects inside its neighborhood. Compared with other clustering algorithms, DBSCAN has many attractive properties: for example, it can detect clusters of arbitrary shape and is robust to outliers. DBSCAN has therefore attracted considerable research interest during the last decades, with many extensions and applications. In the first part of this thesis, we aim at developing new algorithms based on the DBSCAN paradigm to deal with the challenges of complex data, particularly expensive distance measures and incomplete availability of the distance matrix. Like many other clustering algorithms, DBSCAN suffers from poor performance when facing expensive distance measures for complex data. To tackle this problem, we propose a new algorithm based on the DBSCAN paradigm, called Anytime Density-based Clustering (A-DBSCAN), that works in an anytime scheme: in contrast to the original batch scheme of DBSCAN, A-DBSCAN first produces a quick approximation of the clustering result and then continuously refines it as the algorithm runs further.
    Experts can interrupt the algorithm, examine the results, and choose between (1) stopping the algorithm at any time, whenever they are satisfied with the result, to save runtime, and (2) continuing the algorithm to achieve better results. Such an anytime scheme has proven very useful in the literature when dealing with time-consuming problems. We also introduce an extended version of A-DBSCAN, called A-DBSCAN-XS, which is more efficient and effective than A-DBSCAN when dealing with expensive distance measures. Since DBSCAN relies on the cardinality of object neighborhoods, it requires the full distance matrix. For complex data, these distances are usually expensive or time-consuming to acquire, or even impossible to acquire due to high cost, high time complexity, or noisy and missing data. Motivated by these difficulties of acquiring the distances among objects, we propose another approach to DBSCAN, called Active Density-based Clustering (Act-DBSCAN). Given a budget limitation B, Act-DBSCAN is allowed to use only up to B pairwise distances, while ideally producing the same result as if it had the entire distance matrix at hand. The general idea of Act-DBSCAN is to actively select the most promising pairs of objects, calculate the distances between them, and approximate the desired clustering result as closely as possible with each distance calculation. This scheme provides an efficient way to reduce the total cost of performing the clustering, and thus limits a potential weakness of DBSCAN when dealing with the distance sparseness problem of complex data. As a fundamental data clustering technique, density-based clustering has applications in diverse fields. In the second part of this thesis, we focus on an application of density-based clustering in neuroscience: the segmentation of white matter fiber tracts in the human brain acquired from Diffusion Tensor Imaging (DTI).
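The density-based core idea that the thesis builds on (a point is a core point if at least min_pts points fall within its eps-neighborhood, and clusters grow by chaining core points) can be sketched in a few lines. This is a generic, unoptimized DBSCAN sketch on made-up 2D points, not the thesis's algorithms:

```python
def dbscan(points, eps, min_pts):
    """Plain DBSCAN: label each point with a cluster id, or -1 for noise."""
    n = len(points)

    def neighbors(i):
        # eps-neighborhood of point i (includes i itself), squared distances
        return [j for j in range(n)
                if (points[i][0] - points[j][0]) ** 2
                 + (points[i][1] - points[j][1]) ** 2 <= eps ** 2]

    labels = [None] * n          # None = unvisited, -1 = noise
    cid = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1       # provisionally noise; may become a border point
            continue
        labels[i] = cid          # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cid  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cid
            nb = neighbors(j)
            if len(nb) >= min_pts:   # j is itself a core point: keep expanding
                queue.extend(nb)
        cid += 1
    return labels

# Two dense blobs and one far-away outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10),
          (50, 50)]
labels = dbscan(points, eps=2.0, min_pts=3)
```

The batch nature of this loop is exactly what A-DBSCAN relaxes: instead of computing every neighborhood at full precision up front, an anytime variant starts from cheap distance approximations and refines them while the expert watches intermediate results.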
    We propose a model that evaluates the similarity between two fibers as a combination of structural similarity and connectivity-related similarity of fiber tracts. Various distance measure techniques from fields such as time-sequence mining are adapted to calculate the structural similarity of fibers. Density-based clustering is used as the segmentation algorithm. We show how A-DBSCAN and A-DBSCAN-XS are used as novel solutions for the segmentation of massive fiber datasets, and provide unique features to assist experts during the fiber segmentation process.
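The combination of structural and connectivity-related similarity can be illustrated with a toy formulation. Everything here is our own illustrative stand-in (the mean-closest-point structural term, the binary endpoint-region term, and the weight alpha), not the model from the thesis:

```python
from math import dist  # Euclidean distance between points, Python 3.8+

def mean_closest_point(f1, f2):
    """Symmetric mean closest-point distance between two polyline fibers."""
    def one_way(a, b):
        return sum(min(dist(p, q) for q in b) for p in a) / len(a)
    return (one_way(f1, f2) + one_way(f2, f1)) / 2.0

def fiber_distance(f1, f2, regions1, regions2, alpha=0.5):
    """Weighted mix of a structural term and a connectivity term.

    The connectivity term is a crude placeholder: 0 if both fibers
    connect the same pair of brain regions, 1 otherwise.
    """
    structural = mean_closest_point(f1, f2)
    connectivity = 0.0 if set(regions1) == set(regions2) else 1.0
    return alpha * structural + (1 - alpha) * connectivity

# Two short parallel fibers, one unit apart, linking the same regions.
f1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
f2 = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
d = fiber_distance(f1, f2, regions1=("A", "B"), regions2=("A", "B"))
```

A pairwise distance of this kind is exactly what a density-based segmentation algorithm consumes; since each evaluation scans both polylines, it is also the sort of expensive measure that motivates A-DBSCAN-XS.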

    Proposal of an adaptive infotainment system depending on driving scenario complexity

    The PhD research project is framed within the industrial doctorates plan of the Generalitat de Catalunya. During the investigation, most of the work was carried out at the facilities of the vehicle manufacturer SEAT, specifically in the information and entertainment (infotainment) department, in continuous cooperation with the telematics department of the UPC. The main objective of the project was the design and validation of an adaptive infotainment system that responds to driving complexity. The system was created to improve the driver's experience while guaranteeing a proper level of road safety. Given the increasing number of applications and services available in current infotainment systems, it becomes necessary to devise a system capable of balancing these two concerns. The most relevant parameters for balancing them while driving are: the type of services offered, the interfaces available for interacting with those services, the complexity of the driving situation, and the profile of the driver. The study can be divided into two main development phases, each of which produced a physical block that became part of the final system; the final system was integrated into a vehicle and validated in real driving conditions. The first phase consisted in creating a model capable of estimating driving complexity from a set of driving-related variables. The model was built with machine learning methods, and the dataset needed to create it was collected from several driving routes carried out by different participants. This phase yielded a model that estimates road complexity with satisfactory accuracy using variables that are easily extractable in any modern vehicle, which simplifies implementing the algorithm in current vehicles. 
    The second phase consisted in the classification of a set of principles that allow the design of an adaptive infotainment system based on road complexity. These principles are defined on the basis of previous research in the usability and user experience of graphical interfaces. Following these principles, a real adaptive infotainment system offering the most commonly used functionalities (navigation, radio, and media) was designed and integrated into a real vehicle. The developed system adapts the presentation of its content according to the driving-complexity estimate produced by the block developed in phase one. The adaptive system was validated in real driving scenarios by several participants, and the results showed a high level of acceptance of, and satisfaction with, the adaptive infotainment. As a starting point for future research, a proof of concept was carried out to integrate new interfaces into a vehicle. The reference interface was a head-mounted display that offered information redundant with the instrument cluster. Tests with participants served to understand how users perceive the introduction of new technologies and how objective benefits can be blurred by initial biases.
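The first phase, a model that estimates driving complexity from vehicle signals, can be sketched with a tiny nearest-neighbour classifier. The signal names, training values, and class labels below are all made up for illustration; the thesis trains its model on real driving data with proper machine learning methods:

```python
from math import sqrt

# Hypothetical training samples:
# (speed_kmh, steering_activity_deg_s, vehicles_nearby) -> complexity label
training = [
    ((120.0,  2.0,  1), "low"),    # calm motorway driving
    ((100.0,  4.0,  3), "low"),
    (( 35.0, 25.0, 12), "high"),   # dense urban traffic
    (( 20.0, 30.0, 15), "high"),
]

def predict(sample, k=3):
    """Classify a driving snapshot by majority vote of its k nearest samples."""
    dists = sorted(
        (sqrt(sum((a - b) ** 2 for a, b in zip(sample, x))), label)
        for x, label in training
    )
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)
```

An infotainment layer would poll such an estimator and, on a "high" result, simplify the content it presents; for example, `predict((30.0, 28.0, 14))` falls near the urban samples and comes back "high".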

    Smart Energy and Intelligent Transportation Systems

    With the Internet of Things and various information and communication technologies, a city can manage its assets in a smarter way, realizing the urban development vision of a smart city. This facilitates a more efficient use of physical infrastructure and encourages citizen participation. Smart energy and smart mobility are among the key aspects of the smart city, in which the electric vehicle (EV) is expected to play a key role. EVs are powered by various energy sources or the electricity grid. With proper scheduling, a large fleet of EVs can be charged from charging stations and parking infrastructures. Although the battery capacity of a single EV is small, an aggregation of EVs can act as a significant power source or load, constituting a vehicle-to-grid (V2G) system. Besides acquiring energy from the grid, in V2G, EVs can also support the grid by providing various demand response and auxiliary services. Thanks to this, we can reduce our reliance on fossil fuels and utilize renewable energy more effectively. This Special Issue, "Smart Energy and Intelligent Transportation Systems", addresses existing knowledge gaps and advances smart energy and mobility. It consists of five peer-reviewed papers that cover a range of subjects and applications related to smart energy and transportation.
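The scheduling idea behind V2G aggregation can be sketched as a greedy plan: the fleet's combined energy need is filled in the cheapest hours first, capped by the grid connection limit. The function name, prices, and demands are illustrative assumptions, not material from the Special Issue:

```python
def schedule_fleet(demands_kwh, slot_prices, grid_limit_kw, slot_hours=1.0):
    """Greedy charging plan: fill the cheapest hour slots first,
    never drawing more than the grid connection allows per slot."""
    remaining = sum(demands_kwh)            # fleet-wide energy need (kWh)
    plan = [0.0] * len(slot_prices)         # kWh drawn in each slot
    for slot in sorted(range(len(slot_prices)), key=lambda s: slot_prices[s]):
        take = min(grid_limit_kw * slot_hours, remaining)
        plan[slot] = take
        remaining -= take
        if remaining == 0:
            break
    return plan, remaining

plan, unmet = schedule_fleet(
    demands_kwh=[8.0, 10.0, 6.0],           # three EVs' needs overnight
    slot_prices=[0.30, 0.10, 0.12, 0.25],   # price per kWh in each hour slot
    grid_limit_kw=15.0,
)
```

Individually, each battery here is small, but the aggregated 24 kWh behaves as one flexible load that can be shifted into the two cheapest hours; the same aggregation viewpoint underlies the demand-response services mentioned above.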

    Algorithm Selection in Auction-based Allocation of Cloud Computing Resources


    Proceedings of the 18th Irish Conference on Artificial Intelligence and Cognitive Science

    These proceedings contain the papers that were accepted for publication at AICS-2007, the 18th Annual Conference on Artificial Intelligence and Cognitive Science, which was held at the Technological University Dublin, Dublin, Ireland, from 29 to 31 August 2007. AICS is the annual conference of the Artificial Intelligence Association of Ireland (AIAI).

    Machine learning in critical care: state-of-the-art and a sepsis case study

    Background: Like other scientific fields, such as cosmology, high-energy physics, or even the life sciences, medicine and healthcare face the challenge of an extremely quick transformation into data-driven sciences. This challenge entails the daunting task of extracting usable knowledge from these data using algorithmic methods. In the medical context this may, for instance, be realized through the design of medical decision support systems for diagnosis, prognosis, and patient management. The intensive care unit (ICU), and by extension the whole area of critical care, is becoming one of the most data-driven clinical environments. Results: The increasing availability of complex and heterogeneous data at the point of patient attention in critical care environments makes the development of fresh approaches to data analysis almost compulsory. Computational Intelligence (CI) and Machine Learning (ML) methods can provide such approaches and have already shown their usefulness in addressing problems in this context. The current study has a dual goal. The first is a review of the state of the art on the use and application of such methods in the field of critical care, presented both from the viewpoint of the different subfields of critical care and from the viewpoint of the available ML and CI techniques. The second is to present a collection of results that illustrate the breadth of possibilities opened by ML and CI methods using a single problem: the investigation of septic shock at the ICU. Conclusion: We have presented a structured state of the art that illustrates the broad-ranging ways in which ML and CI methods can make a difference in problems affecting the manifold areas of critical care. The potential of ML and CI has been illustrated in detail through an example concerning the sepsis pathology. 
    The new definitions of sepsis and the relevance of using the systemic inflammatory response syndrome (SIRS) in its diagnosis have been considered. Conditional independence models have been used to address this problem, showing that SIRS depends on both organ dysfunction, measured through the Sequential Organ Failure Assessment (SOFA) score, and the ICU outcome; we thus conclude that SIRS should still be considered in the study of the pathophysiology of sepsis. Current assessment of the risk of death at the ICU lacks specificity. ML and CI techniques are shown to improve this assessment using both indicators already in place and other routinely measured clinical variables. Kernel methods in particular are shown to provide the best performance balance while being amenable to representation through graphical models, which increases their interpretability and, with it, their likelihood of being accepted in medical practice.
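The kind of conditional-independence question raised above ("does SIRS still carry information about organ dysfunction once the ICU outcome is accounted for?") can be phrased as checking whether a conditional mutual information is near zero. The counts below are entirely fabricated toy data, used only to show the machinery on discretized binary variables:

```python
from math import log2
from collections import Counter

# Fabricated (sirs, sofa_high, died) records for illustration only.
records = ([(1, 1, 1)] * 30 + [(1, 1, 0)] * 10 + [(1, 0, 0)] * 15
           + [(0, 0, 0)] * 35 + [(0, 1, 1)] * 5 + [(1, 0, 1)] * 5)

def cond_mutual_info(data):
    """I(X; Y | Z) in bits for triples (x, y, z) of discrete values.

    Zero iff X and Y are conditionally independent given Z.
    """
    n = len(data)
    c_xyz = Counter(data)
    c_xz = Counter((x, z) for x, _, z in data)
    c_yz = Counter((y, z) for _, y, z in data)
    c_z = Counter(z for _, _, z in data)
    mi = 0.0
    for (x, y, z), cnt in c_xyz.items():
        p_xyz = cnt / n
        mi += p_xyz * log2((p_xyz * (c_z[z] / n))
                           / ((c_xz[(x, z)] / n) * (c_yz[(y, z)] / n)))
    return mi

cmi = cond_mutual_info(records)  # clearly above 0: dependence remains given outcome
```

A value well above zero (as with these toy counts) is the information-theoretic analogue of the paper's finding that SIRS and organ dysfunction remain linked even after conditioning; the actual study uses proper conditional independence models on clinical data rather than this raw plug-in estimate.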