199 research outputs found

    EFFICIENT LAYOUTS AND ALGORITHMS FOR MANAGING VERSIONED DATASETS

    Version Control Systems were primarily designed to keep track of and provide control over changes to source code, and have since provided an excellent way to combat the problem of sharing and editing files in a collaborative setting. The recent surge in data-driven decision making has resulted in a proliferation of datasets, elevating them to the level of source code, which in turn has led data analysts to resort to version control systems for storing and managing datasets and their versions over time. Unfortunately, existing version control systems are poor at handling large datasets, primarily due to the underlying assumption that the stored files are relatively small text files with localized changes. Moreover, the algorithms used by these systems tend to be fairly simple, leading to suboptimal performance when applied to large datasets. To address these shortcomings, a key requirement is a Dataset Version Control System (DVCS) that serves as a common platform enabling data analysts to efficiently store and query dataset versions, track changes to datasets, and share datasets between users with ease. Towards this goal, we address the fundamental problem of designing storage layouts for a wide range of datasets to serve as the primary building block for an efficient and scalable DVCS. The key problem in this setting is to compactly store a large number of dataset versions and efficiently retrieve any specific version (or a collection of partial versions). We initiate our study by considering storage-retrieval trade-offs for versions of unstructured datasets such as text files, blobs, etc., where the notion of a partial version is not well-defined. Next, we consider array datasets, i.e., a collection of temporal snapshots (or versions) of multi-dimensional arrays, where the data is predominantly represented in single precision or double precision format. 
The primary challenge here is to develop efficient compression techniques for floating-point data, which is hard to compress because of its high entropy. We observe that the underlying techniques developed for unstructured or array datasets are not well suited for more structured dataset versions, where a version is defined by a collection of records, each of which is uniquely addressable. We carefully explore the design space for building such a system and the various storage-retrieval trade-offs, and discuss how different storage layouts influence those trade-offs. Next, we formulate several problems trading off the version storage and retrieval cost in various ways, and design several offline storage layout algorithms that effectively minimize the storage costs while keeping the retrieval costs low. In addition to version retrieval queries, our system also provides support for record provenance queries. Through extensive experiments on large datasets, we demonstrate that our proposed designs can operate at the scale required in most practical scenarios.
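
The storage-retrieval trade-off described in this abstract can be illustrated with a minimal sketch. This is not the authors' layout algorithm; the `StoredVersion` type, the function names, and all sizes are hypothetical, and retrieval cost is assumed to equal the number of bytes read along the delta chain.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StoredVersion:
    parent: Optional[str]  # None: stored in full; otherwise a delta against parent
    size: int              # bytes kept on disk (full size or delta size)


def total_storage(layout: Dict[str, StoredVersion]) -> int:
    """Bytes needed to keep every version in this layout."""
    return sum(entry.size for entry in layout.values())


def retrieval_cost(layout: Dict[str, StoredVersion], version: str) -> int:
    """Bytes read to rebuild one version by walking its delta chain to a full copy."""
    cost = 0
    current: Optional[str] = version
    while current is not None:
        entry = layout[current]
        cost += entry.size
        current = entry.parent
    return cost


# Hypothetical sizes: three versions of roughly 100 bytes with 10-byte deltas.
delta_chain = {
    "v1": StoredVersion(parent=None, size=100),
    "v2": StoredVersion(parent="v1", size=10),
    "v3": StoredVersion(parent="v2", size=10),
}
all_full = {
    "v1": StoredVersion(parent=None, size=100),
    "v2": StoredVersion(parent=None, size=105),
    "v3": StoredVersion(parent=None, size=110),
}

print(total_storage(delta_chain), retrieval_cost(delta_chain, "v3"))
print(total_storage(all_full), retrieval_cost(all_full, "v3"))
```

Under these assumed numbers, the delta chain stores far fewer bytes, while the fully materialized layout reads fewer bytes to retrieve the latest version. A layout algorithm of the kind the abstract mentions would choose between such options per version.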

    How does rumination impact cognition? A first mechanistic model.

    Rumination is a process of uncontrolled, narrowly focused negative thinking that is often self-referential, and that is a hallmark of depression. Despite its importance, little is known about its cognitive mechanisms. Rumination can be thought of as a specific, constrained form of mind-wandering. Here, we introduce a cognitive model of rumination that we developed on the basis of our existing model of mind-wandering. The rumination model implements the hypothesis that rumination is caused by maladaptive habits of thought. These habits of thought are modelled by adjusting the number of memory chunks and their associative structure, which changes the sequence of memories that are retrieved during mind-wandering, such that during rumination the same set of negative memories is retrieved repeatedly. The implementation of habits of thought was guided by empirical data from an experience sampling study in healthy and depressed participants. On the basis of this empirically derived memory structure, our model naturally predicts the declines in cognitive task performance that are typically observed in depressed patients. This study demonstrates how we can use cognitive models to better understand the cognitive mechanisms underlying rumination and depression.

    Recent Trends in Computational Intelligence

    Traditional models struggle to cope with complexity, noise, and changing environments, while Computational Intelligence (CI) offers solutions to complicated problems as well as inverse problems. The main feature of CI is adaptability, spanning the fields of machine learning and computational neuroscience. CI also comprises biologically inspired technologies, such as swarm intelligence as part of evolutionary computation, and encompasses wider areas such as image processing, data collection, and natural language processing. This book discusses the use of CI for solving a variety of applications, demonstrating its wide reach and relevance. The combination of optimization methods and data mining strategies makes a strong and reliable prediction tool for handling real-life applications.

    Emergent relational schemas for RDF


    Emergent phonology

    To what extent do complex phonological patterns require the postulation of universal mechanisms specific to language? In this volume, we explore the Emergent Hypothesis, that the innate language-specific faculty driving the shape of adult grammars is minimal, with grammar development relying instead on cognitive capacities of a general nature. Generalisations about sounds, and about the way sounds are organised into meaningful units, are constructed in a bottom-up fashion: As such, phonology is emergent. We present arguments for considering the Emergent Hypothesis, both conceptually and by working through an extended example in order to demonstrate how an adult grammar might emerge from the input encountered by a learner. Developing a concrete, data-driven approach, we argue that the conventional, abstract notion of unique underlying representations is unmotivated; such underlying representations would require some innate principle to ensure their postulation by a learner. We review the history of the concept and show that such postulated forms result in undesirable phonological consequences. We work through several case studies to illustrate how various types of phonological patterns might be accounted for in the proposed framework. The case studies illustrate patterns of allophony, of productive and unproductive patterns of alternation, and cases where the surface manifestation of a feature does not seem to correspond to its morphological source. We consider cases where a phonetic distinction that is binary seems to manifest itself in a way that is morphologically ternary, and we consider cases where underlying representations of considerable abstractness have been posited in previous frameworks. We also consider cases of opacity, where observed phonological properties do not neatly map onto the phonological generalisations governing patterns of alternation

    Workload-aware systems and interfaces for cognitive augmentation

    In today's society, our cognition is constantly influenced by information intake, attention switching, and task interruptions. This increases the difficulty of a given task, adding to the existing workload and leading to compromised cognitive performance. The human body expresses the use of cognitive resources through physiological responses when confronted with high cognitive workload. This temporarily mobilizes additional resources to deal with the workload at the cost of accelerated mental exhaustion. We predict that recent developments in physiological sensing will increasingly create user interfaces that are aware of the user's cognitive capacities and hence able to intervene when high or low states of cognitive workload are detected. In this thesis, we initially focus on determining opportune moments for cognitive assistance. Subsequently, we investigate, in a user-centric design process, which feedback modalities are desirable for cognitive assistance. We present design requirements for how cognitive augmentation can be achieved using interfaces that sense cognitive workload. We then investigate different physiological sensing modalities to enable suitable real-time assessments of cognitive workload. We provide empirical evidence that the human brain is sensitive to fluctuations in cognitive resting states, hence making cognitive effort measurable. Firstly, we show that electroencephalography is a reliable modality to assess the mental workload generated during user interface operation. Secondly, we use eye tracking to evaluate changes in eye movements and pupil dilation to quantify different workload states. The combination of machine learning and physiological sensing resulted in suitable real-time assessments of cognitive workload. The use of physiological sensing enables us to derive when cognitive augmentation is suitable. Based on our inquiries, we present applications that regulate cognitive workload in home and work settings. 
We deployed an assistive system in a field study to investigate the validity of our derived design requirements. Finding that workload is mitigated, we investigated how cognitive workload can be visualized to the user. We present an implementation of a biofeedback visualization that helps to improve the understanding of brain activity. A final study shows how cognitive workload measurements can be used to predict the efficiency of information intake through reading interfaces. Here, we conclude with use cases and applications which benefit from cognitive augmentation. This thesis investigates how assistive systems can be designed to implicitly sense and utilize cognitive workload for input and output. To do so, we measure cognitive workload in real time by collecting behavioral and physiological data from users, and analyze this data to support users through assistive systems that adapt their interface according to the currently measured workload. Our overall goal is to extend new and existing context-aware applications with cognitive workload as a factor. We envision Workload-Aware Systems and Workload-Aware Interfaces as an extension of the context-aware paradigm. To this end, we conducted eight research inquiries during this thesis to investigate how to design and create workload-aware systems. Finally, we present our vision of future workload-aware systems and workload-aware interfaces. Due to the scarce availability of open physiological data sets, reference implementations, and methods, previous context-aware systems were limited in their ability to utilize cognitive workload for user interaction. Together with the collected data sets, we expect this thesis to pave the way for methodical and technical tools that integrate workload-awareness as a factor for context-aware systems. Every day, our cognitive abilities are taxed by the processing of countless pieces of information. 
This can influence the difficulty of a task through more or less workload. The human body expresses the use of cognitive resources through physiological responses when it is confronted with or overwhelmed by cognitive workload. This mobilizes additional resources to cope with the workload temporarily. We predict that the current development of physiological sensing methods will make cognitive performance measurements possible at all times, so that the user's cognitive workload can be measured at any moment. These measurements make it possible to intervene when a cognitive load that is too high or too low is detected. We first focus on detecting suitable moments for cognitive assistance that is aware of the current cognitive workload. Subsequently, we investigate, in a user-centered design process, suitable feedback mechanisms that contribute to cognitive assistance. We present design requirements that show how interfaces can achieve cognitive augmentation by measuring cognitive workload. We then investigate different physiological sensing modalities that enable real-time assessments of cognitive workload. First, we empirically validate that the human brain responds to cognitive workload. We show that deriving cognitive workload via electroencephalography is a suitable method for evaluating the cognitive demand of novel assistive systems. We then use eye tracking to assess changes in eye movements and pupil diameter under different intensities of cognitive workload. Applying machine learning leads to reliable real-time assessments of cognitive workload. 
Based on the research so far, we present applications that support cognition in home and work settings. The physiological measurements determine when cognitive augmentation is favorable. In a field study, we deploy an assistive system to validate the derived design requirements for reducing cognitive workload. Our results show that workload is reduced by the use of assistive systems. We then investigate how cognitive workload can be visualized. We present an implementation of a biofeedback visualization that supports users' understanding of how cognitive workload develops and arises. A final study shows how measurements of cognitive workload can be used to predict current reading efficiency. We conclude with a set of applications that make use of cognitive workload as input. This thesis deals with the design of assistive systems that implicitly sense users' cognitive workload and support them in carrying out everyday tasks. Physiological data are collected to allow real-time inferences about the current cognitive workload. These data are then analyzed to assist the user strategically. The goal of this thesis is to extend new and existing context-aware user interfaces with cognitive workload as a factor. Therefore, this thesis presents workload-aware systems and workload-aware user interfaces as an additional dimension within the paradigm of context-aware systems. We present eight research studies to investigate the design requirements and the implementation of cognitive workload-aware systems. 
Finally, we present our vision of future cognitive workload-aware systems and user interfaces. Due to the scarce availability of publicly accessible data sets, reference implementations, and methods, context-aware systems were limited in evaluating cognitive workload with respect to user interaction. Complemented by the data sets collected in this thesis, we expect this work to pave the way for methodical and technical tools that integrate cognitive workload as a factor into the context-awareness of computer systems.

    Computing resources sensitive parallelization of neural networks for large scale diabetes data modelling, diagnosis and prediction

    Diabetes has become one of the most severe diseases due to an increasing number of diabetes patients globally. A large amount of digital data on diabetes has been collected through various channels. How to utilize these data sets to help doctors make decisions on the diagnosis, treatment and prediction of diabetic patients poses many challenges to the research community. The thesis investigates mathematical models, with a focus on neural networks, for large scale diabetes data modelling and analysis by utilizing modern computing technologies such as grid computing and cloud computing. These computing technologies provide users with an inexpensive way to access extensive computing resources over the Internet for solving data and computationally intensive problems. This thesis evaluates the performance of seven representative machine learning techniques in the classification of diabetes data, and the results show that the neural network produces the best classification accuracy but incurs high overhead in data training. As a result, the thesis develops MRNN, a parallel neural network model based on the MapReduce programming model, which has become an enabling technology in support of data intensive applications in the clouds. By partitioning the diabetic data set into a number of equally sized data blocks, the training workload is distributed among a number of computing nodes to speed up data training. MRNN is first evaluated in small scale experimental environments using 12 mappers and subsequently in large scale simulated environments using up to 1000 mappers. Both the experimental and simulation results have shown the effectiveness of MRNN in classification, and its high scalability in data training. MapReduce does not have a sophisticated job scheduling scheme for heterogeneous computing environments in which the computing nodes may have varied computing capabilities. 
For this purpose, this thesis develops a load balancing scheme based on genetic algorithms with the aim of balancing the training workload among heterogeneous computing nodes. Nodes with more computing capacity receive more MapReduce jobs for execution. Divisible load theory is employed to guide the evolutionary process of the genetic algorithm with the aim of achieving fast convergence. The proposed load balancing scheme is evaluated in large scale simulated MapReduce environments with varied levels of heterogeneity using different sizes of data sets. All the results show that the genetic algorithm based load balancing scheme significantly reduces the makespan in job execution in comparison with the time consumed without load balancing. EThOS - Electronic Theses Online Service; EPSRC; China Market Association; GB; United Kingdo
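
The effect of giving faster nodes more work can be shown with a small sketch. It is not the thesis's genetic algorithm: it only compares an equal split with a split proportional to node speed, which is the simplest divisible-load-style allocation. The function names, node speeds, and record count are all hypothetical, and processing time is assumed to be load divided by speed with no communication cost.

```python
from typing import List


def equal_split(total: int, nodes: int) -> List[int]:
    """Equally sized blocks, with any remainder spread over the first nodes."""
    base, extra = divmod(total, nodes)
    return [base + (1 if i < extra else 0) for i in range(nodes)]


def proportional_split(total: int, speeds: List[float]) -> List[int]:
    """Blocks sized in proportion to each node's relative speed."""
    speed_sum = sum(speeds)
    shares = [int(total * speed / speed_sum) for speed in speeds]
    # Rounding down can leave a few records unassigned; give them to the fastest nodes.
    leftover = total - sum(shares)
    fastest_first = sorted(range(len(speeds)), key=lambda i: speeds[i], reverse=True)
    for i in fastest_first[:leftover]:
        shares[i] += 1
    return shares


def makespan(shares: List[int], speeds: List[float]) -> float:
    """Finish time of the slowest node, assuming time = load / speed."""
    return max(load / speed for load, speed in zip(shares, speeds))


# Hypothetical cluster: three nodes with relative speeds 1, 2 and 5.
speeds = [1.0, 2.0, 5.0]
records = 8000

equal = equal_split(records, len(speeds))
weighted = proportional_split(records, speeds)
print(equal, makespan(equal, speeds))
print(weighted, makespan(weighted, speeds))
```

Under these assumptions the equal split is held back by the slowest node, while the proportional split lets all nodes finish at about the same time. A search-based scheme such as the one the abstract describes would be needed once costs like data transfer make the simple proportional rule inaccurate.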