
    Complex queries and complex data

    With the widespread availability of wearable computers equipped with sensors such as GPS or cameras, and the ubiquitous presence of micro-blogging platforms, social media sites, and digital marketplaces, data can be collected and shared on a massive scale. Efficient and effective similarity search algorithms, which find the objects in a database that are similar to a query object, are a necessary building block for taking advantage of this vast amount of information. Because similarity search applies generally across data types and applications, formalizing this concept and developing strategies for evaluating similarity queries has evolved into an important field of research in the database and spatio-temporal database communities, as well as in others such as information retrieval and computer vision. This thesis concentrates on a special instance of similarity queries, namely k-Nearest Neighbor (kNN) queries and their close relative, Reverse k-Nearest Neighbor (RkNN) queries. As a first contribution, we provide an in-depth analysis of the RkNN join. While reverse nearest neighbor queries have received a great deal of research interest, performing such queries in bulk has not been analyzed in depth so far. We first formalize the RkNN join, identifying its monochromatic and bichromatic versions and their self-join variants. After pinpointing the monochromatic RkNN join as an important and interesting instance, we develop solutions for this class, including a self-pruning and a mutual pruning algorithm. We then evaluate these algorithms extensively on a variety of synthetic and real datasets. From this starting point of similarity queries on certain data, we shift our focus to uncertain data, addressing nearest neighbor queries in uncertain spatio-temporal databases.
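To make the join concrete, here is a minimal brute-force sketch of a monochromatic RkNN self-join. It assumes Euclidean points and assumes the join pairs each point p with every other point q that has p among its k nearest neighbors; the thesis's exact formal definition may differ. The helpers `knn` and `rknn_self_join` are hypothetical names, and this is only the naive baseline, not the self-pruning or mutual pruning algorithms described above.

```python
import math
from collections import defaultdict


def knn(points: list[tuple[float, float]], i: int, k: int) -> list[int]:
    """Indices of the k nearest neighbors of points[i] among all other points.

    Brute force; ties are broken by index so results are deterministic.
    """
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: (math.dist(points[i], points[j]), j))
    return others[:k]


def rknn_self_join(points: list[tuple[float, float]], k: int) -> dict[int, set[int]]:
    """Naive monochromatic RkNN self-join (hypothetical helper).

    Maps each index p to the set of indices q != p that have p among their
    k nearest neighbors, i.e. the RkNN result of p. Runs one kNN query per
    point, so the overall cost is O(n^2 log n).
    """
    reverse: defaultdict[int, set[int]] = defaultdict(set)
    for q in range(len(points)):
        for p in knn(points, q, k):
            reverse[p].add(q)
    # Include points with an empty RkNN result so every point appears.
    return {p: reverse[p] for p in range(len(points))}
```

For example, with points `[(0, 0), (1, 0), (3, 0)]` and `k=1`, points 0 and 2 both have point 1 as their nearest neighbor, so the RkNN result of point 1 is `{0, 2}`, while point 2 has an empty result. This asymmetry of the kNN relation is why RkNN results cannot simply be read off a single kNN query.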
Starting from the traditional definition of nearest neighbor queries and a data model for uncertain spatio-temporal data, we develop efficient query mechanisms that consider temporal dependencies during query evaluation. We define intuitive query semantics that aim to return not only the objects closest to the query but also their probability of being a nearest neighbor. After evaluating these query predicates theoretically, we develop efficient algorithms for them. Building on these findings for nearest neighbor queries, we extend the results to reverse nearest neighbor queries. Finally, we address the problem of querying large datasets containing set-based objects, namely image databases, where images are represented by (multi-)sets of vectors and additional metadata describing the position of features in the image. We aim to reduce the number of kNN queries performed during query processing and evaluate a modified pipeline that aims to optimize query accuracy at a small number of kNN queries. Additionally, as feature representations in object recognition are increasingly moving from the real-valued domain to the binary domain, we evaluate efficient indexing techniques for binary feature vectors.

Users publish an enormous amount of data, driven not only by the spread of wearable computers equipped with a variety of sensors such as GPS or cameras, but also by the widespread use of micro-blogging platforms, social media websites, and digital marketplaces such as Amazon and eBay. Extracting value from this data requires efficient and effective similarity search algorithms that identify the objects in a database that are similar to a given query object.
Because this concept of similarity applies generally across different data types and applications, similarity search has developed into an important field of research, not only in the database domain and in spatio-temporal databases, but also in other research areas such as information retrieval and computer vision. In this thesis, we deal with a special query predicate within similarity queries: k-nearest neighbor (kNN) queries and their relative, reverse k-nearest neighbor (RkNN) queries. In a first contribution, we analyze the RkNN join. Although reverse nearest neighbor queries have received broad attention in the research community in recent years, the problem of executing a set of RkNN queries simultaneously has not been sufficiently analyzed. For this reason, we formalize the RkNN join with its monochromatic and bichromatic variants. We identify the monochromatic RkNN join as an important and interesting case and develop corresponding query algorithms. In a detailed evaluation, we compare the developed methods on a variety of synthetic and real datasets. After this chapter on similarity search on certain data, we focus on uncertain data, specifically in spatio-temporal databases. Starting from the traditional definition of nearest neighbor queries and a data model for uncertain spatio-temporal data, we develop efficient query methods that take temporal dependencies into account during query processing. To this end, we define query predicates that return not only the objects closest to the query object, but also the probability with which each is a nearest neighbor.
We evaluate the defined query predicates theoretically and develop efficient query strategies that keep query processing times acceptable. Building on the results for nearest neighbor queries, we extend our results to reverse nearest neighbor queries. Finally, we address query processing on set-based objects, as used for example in image databases, where images are often represented by a set of feature vectors and additional metadata (for example, the position of the features in the image). We evaluate a modified pipeline that aims to maximize query accuracy at a small number of kNN queries. Since real-valued feature vectors in object recognition are increasingly being replaced by bit vectors, which require less storage and offer higher runtime efficiency, we also evaluate indexing methods for binary vectors.
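The abstract does not name the indexing techniques evaluated for binary feature vectors. As one illustration, here is a minimal sketch of multi-index hashing, a known technique for Hamming-space search that is swapped in here as an assumption, for binary codes stored as Python ints. The class `MultiIndexHash` and its parameters `bits` and `m` are hypothetical. It relies on the pigeonhole principle: if a code differs from the query in at most r < m bits and every code is split into m disjoint chunks, at least one chunk matches the query exactly, so exact hash lookups per chunk produce a candidate set containing every true result.

```python
from collections import defaultdict


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")


class MultiIndexHash:
    """Hypothetical multi-index hash for `bits`-bit codes split into `m` chunks."""

    def __init__(self, codes: list[int], bits: int, m: int):
        if bits % m != 0:
            raise ValueError("this sketch assumes bits is divisible by m")
        self.codes = codes
        self.m = m
        self.chunk_bits = bits // m
        # One hash table per chunk position: chunk value -> list of code indices.
        self.tables = [defaultdict(list) for _ in range(m)]
        for idx, code in enumerate(codes):
            for t, key in enumerate(self._chunks(code)):
                self.tables[t][key].append(idx)

    def _chunks(self, code: int) -> list[int]:
        """Split a code into m disjoint chunks of chunk_bits bits each."""
        mask = (1 << self.chunk_bits) - 1
        return [(code >> (t * self.chunk_bits)) & mask for t in range(self.m)]

    def range_query(self, q: int, r: int) -> list[int]:
        """Sorted indices of all codes within Hamming distance r of q (needs r < m)."""
        if not 0 <= r < self.m:
            raise ValueError("exact chunk lookups only guarantee full recall for 0 <= r < m")
        candidates: set[int] = set()
        for t, key in enumerate(self._chunks(q)):
            # .get avoids inserting empty lists into the defaultdict.
            candidates.update(self.tables[t].get(key, []))
        # Verify each candidate with the full Hamming distance.
        return sorted(i for i in candidates if hamming(self.codes[i], q) <= r)
```

Candidates are always re-checked with the full Hamming distance, so the result is exact; supporting radii of m or more would require probing neighboring chunk keys, which this sketch omits.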

    Information Theory and Machine Learning

    The recent successes of machine learning, especially of systems based on deep neural networks, have encouraged further research and raised a new set of challenges in understanding and designing complex machine learning algorithms. New applications require learning algorithms that are distributed, have transferable learning results, use computational resources efficiently, converge quickly in online settings, have performance guarantees, satisfy fairness or privacy constraints, incorporate domain knowledge on model structures, etc. A new wave of developments in statistical learning theory and information theory has set out to address these challenges. This Special Issue, "Machine Learning and Information Theory", aims to collect recent results in this direction, reflecting a diverse spectrum of visions and efforts to extend conventional theories and develop analysis tools for these complex machine learning systems.

    Automatic Speech Recognition for Documenting Endangered First Nations Languages

    Automatic speech recognition (ASR) for low-resource languages is an active field of research. In recent years, with the advent of deep learning, impressive results have been reported using minimal resources. Many of the world's languages become extinct every year, and with every dying language we lose knowledge, culture, values, and traditions that have typically been passed down over many generations. Linguists throughout the world have already initiated many language documentation projects to preserve such endangered languages. Automatic speech recognition can accelerate the documentation process by reducing the annotation time for field linguists as well as the overall cost of a project. A traditional speech recognizer is trained on thousands of hours of acoustic data and a phonetic dictionary that includes all words of the language. End-to-end ASR systems have shown dramatic improvements for major languages. In particular, self-supervised representation learning, which takes advantage of large corpora of untranscribed speech, has recently become the state of the art in speech recognition. However, the technology has not been tested in depth for resource-constrained languages. In this thesis, we explore both traditional ASR methods and state-of-the-art end-to-end systems for modeling Upper Tanana, a critically endangered Athabascan language. In our first approach, we investigate traditional models with a comparative study of feature selection and a performance comparison with deep hybrid models. With the limited resources at our disposal, we build a working ASR system based on a grapheme-to-phoneme (G2P) phonetic dictionary. The acoustic model can also be used as a standalone forced alignment tool for automatically aligning training data. The results show that GMM-HMM methods outperform deep hybrid models in low-resource acoustic modeling.
In our second approach, we propose Domain-adapted Cross-lingual Speech Recognition (DA-XLSR), an ASR system built on the wav2vec 2.0 framework, which uses pretrained transformer models that leverage cross-lingual data to build an acoustic representation. The proposed system uses a multistage transfer learning process to fine-tune the final model. To supplement the limited data, we compile a data augmentation strategy that combines six augmentation techniques. The speech model uses Connectionist Temporal Classification (CTC) for alignment-free training and requires neither a pronunciation dictionary nor a language model. Experiments with the second approach demonstrate that it can outperform the best traditional and end-to-end models in terms of word error rate (WER) and produce strong utterance-level transcriptions. In addition, the augmentation strategy was tested on several end-to-end models and consistently improved their performance. While the best proposed model can currently reduce the WER significantly, further research may still be required before it can completely replace human transcribers.
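WER is the metric used above to compare the traditional and end-to-end systems. Here is a minimal sketch of its standard definition, (substitutions + deletions + insertions) divided by the number of reference words, computed with a word-level Levenshtein distance. The helper `word_error_rate` is hypothetical and is not part of DA-XLSR or the wav2vec 2.0 tooling; it assumes plain whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words processed so far
    # and the first j hypothesis words (one row of the Levenshtein table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For instance, the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" has one substitution and one deletion, giving a WER of 2/6. Because insertions are counted, WER can exceed 1.0.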

    Application of knowledge management principles to support maintenance strategies in healthcare organisations

    Healthcare is a vital service that touches people's lives daily, providing treatment and resolving patients' health problems through its staff. Human lives ultimately depend on the skilled hands of the staff and of those who manage the infrastructure that supports the daily operations of the service, which makes it a compelling subject for dedicated research. At the same time, the UK healthcare sector is undergoing rapid change, driven by rising costs, technological advancements, changing patient expectations, and increasing pressure to deliver sustainable healthcare. With the global rise in healthcare challenges, the need for sustainable healthcare delivery has become imperative. Sustainable healthcare delivery requires the integration of various practices that enhance the efficiency and effectiveness of healthcare infrastructural assets. One critical area that requires attention is the management of healthcare facilities. Healthcare facilities are considered one of the core elements in the delivery of effective healthcare services, since shortcomings in the provision of facilities management (FM) services in hospitals may have far more drastic negative effects than in other types of buildings. An essential element in healthcare FM is the relationship between action and knowledge. With a full understanding of infrastructural assets, it is possible to improve and manage buildings, make them suitable for the needs of users, and ensure the functionality of the structure and its processes. The premise of FM is that an organisation's effectiveness and efficiency are linked to the physical environment in which it operates, and that improving the environment can bring direct benefits in operational performance. The goal of healthcare FM is to support the achievement of the organisation's mission and goals by designing and managing space and infrastructural assets in the best combination of suitability, efficiency, and cost.
In operational terms, performance refers to how well a building contributes to fulfilling its intended functions. Comprehensive deployment of efficient FM approaches is therefore essential for ensuring quality healthcare provision while positively impacting overall patient experience. In this regard, incorporating knowledge management (KM) principles into hospitals' FM processes contributes significantly to sustainable healthcare provision and better patient experiences. Organisations that implement KM principles are better positioned to navigate a constantly evolving business ecosystem. Furthermore, KM is vital for process and service improvement, strategic decision-making, and organisational adaptation and renewal. KM principles can therefore be applied to improve hospital FM and thereby support sustainable healthcare delivery. Knowledge management assumes that organisations that manage their organisational and individual knowledge more effectively will cope more successfully with the challenges of the new business ecosystem. The goal of KM is to aid action, providing "a knowledge pull" rather than the information overload most people experience in healthcare FM. Other motivations for seeking better KM in healthcare FM include patient safety, evidence-based care, and cost efficiency, which are the dominant drivers. The strongest evidence for the success of such approaches exists at knowledge bottlenecks, such as infection prevention and control, working safely, compliance, automated systems and reminders, and recall based on best practices. The ability to cultivate, nurture, and maximise knowledge at multiple levels and in multiple contexts is one of the most significant challenges for those responsible for KM.
However, despite the potential benefits, the application of KM principles in hospital facilities remains limited. There is a lack of understanding of how KM can be applied effectively in this context, and few studies have explored the challenges and opportunities associated with implementing KM principles in hospital facilities for sustainable healthcare delivery. This study explores the application of KM principles to support maintenance strategies in healthcare organisations. It also explores the challenges and opportunities, for healthcare organisations and FM practitioners, of operationalising a framework that draws out the interconnectedness between healthcare FM and KM. The study begins by defining healthcare FM and its importance in the healthcare industry. It then discusses the concept of KM and the different types of knowledge that are relevant in the healthcare FM sector. The study also examines the challenges that healthcare FM faces in managing knowledge and how the application of KM principles can help to overcome them. It then explores the different KM strategies that can be applied in healthcare FM. The benefits of KM include improved patient outcomes, reduced costs, increased efficiency, and enhanced collaboration among healthcare professionals. Issues such as creating a culture of innovation, technology, and benchmarking are also considered. In addition, a framework that integrates the essential concepts of KM in healthcare FM is presented and discussed. The field of KM is introduced as a complex adaptive system with numerous possibilities and challenges. In this context, and in consideration of healthcare FM, five objectives have been formulated to achieve the research aim.
As part of the research, a number of objectives are evaluated, including appraising the concept of KM and how knowledge is created, stored, transferred, and utilised in healthcare FM; evaluating the impact of organisational structure on job satisfaction; and exploring how cultural differences affect knowledge sharing and performance in healthcare FM organisations. The study uses a combination of qualitative data collection methods, namely meetings, observations, internal and external document analysis, and semi-structured interviews, to discover the subjective experiences and attitudes of healthcare FM employees and to understand the phenomenon within a real-world context. The interviews used open questions to allow probing where appropriate, with the aim of facilitating KM development in the delivery and practice of healthcare FM. The research methodology is described using the theoretical concept of the "research onion". The qualitative research was conducted in NHS acute and non-acute hospitals in Northwest England. Findings from the study revealed that while the concept of KM has grown significantly in recent years, KM in healthcare FM has received little or no attention. The target population was fifty people (five FM directors, five academics, five industry experts, ten managers, ten supervisors, five team leaders, and ten operatives). These seven groups were purposively selected because they play a crucial role in KM enhancement in healthcare FM. Face-to-face interviews were conducted with all participants based on their pre-determined availability. Of the target population of 50, only 25 were successfully interviewed, by which point saturation had been reached. Data collected from the interviews were coded and analysed using NVivo to identify themes and patterns related to KM in healthcare FM. The study is divided into eight major sections.
First, it discusses literature findings on healthcare FM and KM, including underlying trends in FM, KM in general, and KM in healthcare FM. Second, it establishes the study's methodology, introducing the five research objectives, questions, and hypothesis. This chapter also reviews the literature on methodological elements, including philosophical views and inquiry strategies. The interview and data analysis sections then examine the feedback from the interviews. Lastly, a conclusion and recommendations summarise the research objectives and suggest further research. Overall, this study highlights the importance of KM in healthcare FM and provides insights for healthcare FM directors, managers, supervisors, operatives, academics, and researchers on effectively leveraging knowledge to improve patient care and organisational effectiveness.

    Meaning Behind the Metrics of Misery: Understanding Prevalence Estimates of Poor Mental Health in Two Samples of Older Rural Indonesians

    Background: Late life is typically accompanied by unique physical and mental health challenges. Fewer older people than younger people are diagnosed with mood- or anxiety-specific disorders. However, older people score higher than younger people on symptom screens, indicating high levels of clinically relevant depressive, anxiety, and nonspecific psychological distress symptoms, which cause high morbidity, mortality, disability, and poor quality of life. The unique presentation of late-life psychiatric syndromes such as depression and anxiety remains largely unaddressed in existing psychiatric nosology and measurement techniques, as do depictions of depression and anxiety across diverse cultural contexts. Very few studies have investigated either the descriptive epidemiology of depression and anxiety among older adults living in low- and middle-income countries (LMIC) or the unique challenges of mental health measurement in LMIC contexts. This dissertation contributes to this developing evidence base by providing a critical analysis of point prevalence estimates of depression, anxiety, and nonspecific psychological distress (distress) symptoms in two samples of rural older Indonesians. Methods: We enumerated people aged 60 years or older in 12 rural Indonesian villages as part of the Ageing in Rural Indonesia Study in 2015/16 (N=2526; sample 1). In 2017, we re-enumerated two of the 12 villages surveyed in 2015 (N=536; sample 2). Depressive symptoms were measured using three scales (PHQ-8/9, CES-D, GDS), and distress symptoms using three further scales (K6, DQ5, SRQ-20). Anxiety symptoms were evaluated with the GAD-7. Classical Test Theory and Item Response Theory were used to investigate the psychometric properties of the symptom screens. We also undertook mixed effects modelling and Moderated Nonlinear Factor Analysis to identify sources of variability in prevalence estimates.
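The abstract does not list which Classical Test Theory statistics were computed. As an illustration only, here is a minimal sketch of Cronbach's alpha, a standard CTT internal-consistency coefficient, assumed here rather than taken from the dissertation: alpha = k/(k-1) × (1 - sum of item variances / variance of the total score), for k items. The helper `cronbach_alpha` and the demo responses are hypothetical, not study data.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items score matrix.

    responses[i][j] is respondent i's score on item j. Sample variances
    (n - 1 denominator) are used for both the items and the total score.
    """
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("need at least two items")
    # Variance of each item across respondents.
    item_vars = [variance([row[j] for row in responses]) for j in range(n_items)]
    # Variance of the summed scale score.
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total score has zero variance; alpha is undefined")
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 4-item screen (0-3 per item) answered by 5 respondents; not study data.
demo = [[0, 1, 0, 1], [1, 1, 2, 1], [2, 2, 2, 3], [3, 2, 3, 3], [0, 0, 1, 0]]
print(round(cronbach_alpha(demo), 3))
```

Perfectly parallel items give an alpha of 1.0, while items with zero covariance give 0.0; Item Response Theory, also used in the study, models item responses directly instead of relying on such summary coefficients.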
Results: Commonly used cut points on short symptom screens, applied to approximate diagnostic depressive disorders, produced estimates that typically lacked comparability (e.g., sample 2 point prevalence of 3.2%-39.9%). Psychometric analysis further identified mental health scales with better (PHQ-8/9, GAD-7, K6, DQ5) and poorer (GDS, SRQ) construct validity. Sources of variability in point prevalence estimates of depression, anxiety, and distress symptoms were identified; these related to study design, cognitive ability, marital status, financial means, level of social support, lifestyle, and health-related status. Pervasive non-invariance related to gender, literacy, and ethnicity was identified in participant responses to scale items. However, when modelled, measurement non-invariance did not substantially modify means. Females, respondents with lower literacy levels, and respondents in the Batak and Sundanese sample villages had significantly higher levels of depression, anxiety, and distress symptoms. Conclusion: The practice of using existing mental health symptom screens with commonly used cut points as proxies for depression and anxiety in older rural Indonesians and other diverse populations should be avoided. Rigorous psychometric and diagnostic validation evidence should first be obtained. In the interim, better-performing symptom screening tools (i.e., PHQ-8/9, GAD-7, K6, DQ5) may be used as measures of continuous symptom severity. Future research should focus on evaluating the distinctive and overlapping features of mental ill-health in specific subpopulations of Indonesians.
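The dependence of point prevalence on the chosen cut point can be illustrated with a short sketch. The scores below are hypothetical PHQ-9-style totals (the PHQ-9 total ranges from 0 to 27), not data from either sample, and `point_prevalence` is a hypothetical helper that counts a respondent as a case when the total score is at or above the cut point.

```python
def point_prevalence(scores: list[int], cut_point: int) -> float:
    """Proportion of respondents whose total score is at or above cut_point."""
    if not scores:
        raise ValueError("scores must be non-empty")
    return sum(s >= cut_point for s in scores) / len(scores)


# Hypothetical PHQ-9-style totals for ten respondents; NOT data from the study.
scores = [0, 2, 3, 5, 6, 8, 9, 10, 12, 15]
for cut in (5, 10, 15):
    print(cut, f"{point_prevalence(scores, cut):.0%}")
```

The same ten hypothetical respondents yield 70%, 30%, or 10% prevalence depending only on the cut point, which echoes the abstract's caution against treating screen-plus-cut-point estimates as diagnostic prevalence and its suggestion to use the better screens as continuous severity measures instead.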