399 research outputs found

    A sample of the multimodal neuroimaging dataset from a single participant from MND-MFHC.

    The sharing of multimodal magnetic resonance imaging (MRI) data is of utmost importance in the field, as it enables a deeper understanding of facial nerve-related pathologies. However, there is a significant lack of multimodal neuroimaging databases specifically focused on these conditions, which hampers comprehensive knowledge of the neural foundations of facial paralysis. To address this critical gap and propel advancements in this area, we have released the Multimodal Neuroimaging Dataset of Meige Syndrome, Facial Paralysis, and Healthy Controls (MND-MFHC). This dataset includes detailed clinical assessments of 53 individuals with facial paralysis (FP), 31 patients with Meige syndrome (MS), and 102 healthy controls (HC). To promote open access, the BIDS-formatted data and associated quality control reports can be accessed through the Science Data Bank (SciDB) as part of the Chinese Color Nest Community (https://ccnp.scidb.cn/en): the FP dataset (https://doi.org/10.57760/sciencedb.09677), the MEIGE dataset (https://doi.org/10.57760/sciencedb.10796), and the HC dataset (https://doi.org/10.57760/sciencedb.10872). By sharing this comprehensive dataset, we aim to facilitate further research into the intricate neural mechanisms underlying facial nerve-related pathologies.

    Transparent Forecasting Strategies in Database Management Systems

    Whereas traditional data warehouse systems assume that data is complete or has been carefully preprocessed, increasingly more data is imprecise, incomplete, and inconsistent. This is especially true in the context of big data, where massive amounts of data arrive continuously in real time from vast numbers of sources. At the same time, modern data analysis involves sophisticated statistical algorithms that go well beyond traditional BI and, additionally, is increasingly performed by non-expert users. Both trends require transparent data mining techniques that efficiently handle missing data and present a complete view of the database to the user. Time series forecasting estimates future, not yet available, data of a time series and represents one way of dealing with missing data. Moreover, it enables queries that retrieve a view of the database at any point in time - past, present, and future. This article presents an overview of forecasting techniques in database management systems. After discussing possible application areas for time series forecasting, we give a short mathematical background of the main forecasting concepts. We then outline various general strategies for integrating time series forecasting inside a database and discuss individual techniques from the database community. We conclude by introducing a novel forecasting-enabled database management architecture that natively and transparently integrates forecast models.
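    As a minimal sketch of the kind of forecast model such a system might maintain, the following implements simple exponential smoothing, one of the basic forecasting concepts the article's mathematical background covers; the series values and smoothing factor here are illustrative, not taken from the article.

```python
# Simple exponential smoothing: a flat forecast equal to the final
# smoothed level. A forecast-enabled DBMS could answer queries about
# future time points from a model like this instead of stored rows.

def ses_forecast(series, alpha=0.5, horizon=3):
    """Fit simple exponential smoothing and forecast `horizon` steps ahead."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level  # update smoothed level
    # SES yields a constant forecast for every future step
    return [level] * horizon

history = [10.0, 12.0, 11.0, 13.0, 12.5]
print(ses_forecast(history, alpha=0.5, horizon=2))  # [12.25, 12.25]
```

    A query touching a not-yet-available time point would transparently read from the model's forecast rather than from base data.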

    Uncertainty support in the spectral information system SPECCHIO

    The spectral information system SPECCHIO was updated to support the generic handling of uncertainty information in the form of uncertainty tree diagrams. The updates involve changes to the relational database model as well as dedicated methods provided by the SPECCHIO application programming interface. A case study selected from classic field spectroscopy demonstrates the use of the functionality. In conclusion, database-centric automated uncertainty propagation in combination with measurement protocol standardization will provide a crucial step toward spectroscopy data accompanied by propagated, traceable uncertainty information.
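    The core operation behind automated uncertainty propagation can be sketched as follows; this is not the SPECCHIO API, just a hypothetical illustration of combining independent contributions from an uncertainty tree, with made-up component values.

```python
import math

# Combine independent standard uncertainties in quadrature
# (root-sum-of-squares), as when several leaf nodes of an
# uncertainty tree feed a single measured quantity.

def combine(uncertainties):
    """Return the combined standard uncertainty of independent terms."""
    return math.sqrt(sum(u * u for u in uncertainties))

# e.g. a calibration term and a noise term contributing to one spectrum
u_total = combine([0.03, 0.04])
print(round(u_total, 4))  # 0.05
```

    A database-centric design would evaluate such combinations automatically as data moves through each node of the tree, so that stored spectra always carry traceable, propagated uncertainties.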

    Topology-aware optimization of big sparse matrices and matrix multiplications on main-memory systems

    Since data sizes of analytical applications are continuously growing, many data scientists are switching from customized micro-solutions to scalable alternatives, such as statistical and scientific databases. However, many algorithms in data mining and science are expressed in terms of linear algebra, which is barely supported by major database vendors and big data solutions. On the other hand, conventional linear algebra algorithms and legacy matrix representations are often not suitable for very large matrices. We propose a strategy for large matrix processing on modern multicore systems that is based on a novel, adaptive tile matrix representation (AT MATRIX). Our solution utilizes multiple techniques inspired by database technology, such as multidimensional data partitioning, cardinality estimation, indexing, and dynamic rewrites, in order to optimize the execution time. Based thereon, we present a matrix multiplication operator ATMULT, which outperforms alternative approaches. The aim of our solution is to relieve data scientists of the burden of selecting appropriate algorithms and matrix storage representations. We evaluated AT MATRIX together with ATMULT on several real-world and synthetic random matrices.
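    The blocking idea underlying a tile matrix representation can be sketched as below. This is only an illustration with a fixed tile size; AT MATRIX, by contrast, chooses tile sizes adaptively and mixes sparse and dense tile formats based on estimated cardinalities.

```python
import numpy as np

# Tiled (blocked) matrix multiplication: the result is accumulated
# tile by tile, which improves cache locality and lets each tile
# use the storage format best suited to its density.

def tiled_matmul(A, B, tile=2):
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m))
    for i in range(0, n, tile):
        for j in range(0, m, tile):
            for p in range(0, k, tile):
                # accumulate one tile-product into the output tile
                C[i:i+tile, j:j+tile] += (
                    A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
                )
    return C

A = np.arange(16.0).reshape(4, 4)
B = np.eye(4)
assert np.allclose(tiled_matmul(A, B), A @ B)
```

    NumPy slicing clamps at array bounds, so the sketch also handles dimensions that are not multiples of the tile size.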

    Harnessing the Power of Distributed Computing: Advancements in Scientific Applications, Homomorphic Encryption, and Federated Learning Security

    Data explosion poses many challenges to state-of-the-art systems, applications, and methodologies. It has been reported that 181 zettabytes of data are expected to be generated in 2025, an increase of over 150% compared to the volume expected in 2023. However, while system manufacturers are consistently developing devices with larger storage spaces and providing alternative storage capacities in the cloud at affordable rates, another key challenge is how to effectively process large fractions of that stored data in time-critical conventional systems. One transformative paradigm revolutionizing the processing and management of such large-scale data is distributed computing, whose application requires deep understanding. This dissertation explores the potential impact of applying efficient distributed computing concepts to long-standing challenges in (i) a widely used data-intensive scientific application, (ii) applying homomorphic encryption (HE) to data-intensive workloads found in outsourced databases, and (iii) the security of tokenized incentive mechanisms for federated learning (FL) systems.
    The first part of the dissertation tackles the microelectrode array (MEA) parameterization problem from an orthogonal viewpoint enlightened by algebraic topology, which allows us to algebraically parametrize MEAs whose structure and intrinsic parallelism are hard to identify otherwise. We implement a new paradigm, namely Parma, to demonstrate the effectiveness of the proposed approach and report how it outperforms the state of the practice in time, scalability, and memory usage.
    The second part discusses our work on introducing the concept of parallel caching of secure aggregation to mitigate the performance overhead incurred by the HE module in outsourced databases. The key idea of this optimization approach is caching selected radix-ciphertexts in parallel without violating the existing security guarantees of the primitive/base HE scheme. A new radix HE algorithm was designed and applied to both batch and incremental HE schemes, and experiments carried out on six workloads show that the proposed caching boosts state-of-the-art HE schemes by orders of magnitude.
    In the third part, I discuss our work on leveraging the security benefits of blockchains to protect the fairness and reliability of tokenized incentive mechanisms for FL systems. We designed a blockchain-based auditing protocol to mitigate Gaussian attacks and carried out experiments with multiple FL aggregation algorithms, popular datasets, and a variety of scales to validate its effectiveness.
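    To make the outsourced-database setting concrete, the following is a toy version of the Paillier cryptosystem, a classic additively homomorphic scheme: a server can sum ciphertexts without ever decrypting them. This is only an illustration of why HE suits secure aggregation; the dissertation's radix-ciphertext caching is not reproduced here, and the tiny fixed primes are for demonstration only.

```python
import math
import random

# Toy Paillier: Enc(m1) * Enc(m2) mod n^2 decrypts to m1 + m2 mod n,
# so an untrusted server can aggregate encrypted values blindly.

def keygen(p=293, q=433):  # demo-sized primes; real keys use ~2048-bit n
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    n2 = n * n
    g = n + 1
    # mu = (L(g^lam mod n^2))^-1 mod n, where L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n2) - 1) // n, -1, n)
    return (n, g), (lam, mu)

def encrypt(pub, m):
    n, g = pub
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:  # r must be invertible mod n
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    n2 = n * n
    return ((pow(c, lam, n2) - 1) // n) * mu % n

pub, priv = keygen()
c = encrypt(pub, 20) * encrypt(pub, 22) % (pub[0] ** 2)  # homomorphic add
print(decrypt(pub, priv, c))  # 42
```

    Every homomorphic operation costs modular exponentiations over n^2, which is exactly the overhead that caching strategies such as the one described above aim to amortize.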

    Tackling Lyme Disease by Identifying Potential Subunit Vaccine Candidates and Defining the Microbiota of Ixodes ricinus Ticks

    Lyme disease (LD), one of the most common tick-borne diseases in the Northern Hemisphere, is caused by Borreliella (Borrelia) burgdorferi (B. burgdorferi). The lack of a vaccine, increases in LD cases, and the wide distribution of Ixodes ticks necessitate the development of an LD vaccine for humans. Identification of protection-associated (PA) epitopes that could lead to the development of a second-generation vaccine against LD is, therefore, needed. Despite the presence of numerous antigenic proteins on the borrelial surface, VlsE is the only variable protein that undergoes rigorous antigenic variation. Although VlsE-mediated shielding of a surface antigen was recently demonstrated, it is unlikely that VlsE covers the entire surface of B. burgdorferi. Thus, it is hypothesized that only dominant epitopes are masked by VlsE, whereas subdominant epitopes remain exposed. Moreover, some exposed epitopes may induce protection when made dominant. The first study presented herein focused on the identification of surface epitopes that could provide protection despite VlsE. For that, immunocompetent mice were repeatedly immunized with VlsE-deficient B. burgdorferi and then challenged with wild-type (VlsE-expressing) B. burgdorferi. As a result, 50% of mice became protected due to the repeated exposure of surface epitopes in the absence of VlsE. Subsequently, antibody repertoires identified by random phage display libraries were defined and compared between protected and non-protected mice, which allowed us to pinpoint putative PA epitopes. The second study examined the protective antibody response in the New Zealand White (NZW) rabbit model. In contrast to mice, NZW rabbits were previously shown to mount a protective antibody response against wild-type B. burgdorferi. A series of passive immunization experiments demonstrated that anti-B. burgdorferi rabbit antibodies provided 100% protection in mice against the VlsE-expressing wild type despite the fully functional VlsE system. In addition to protecting against homologous and heterologous challenges, anti-B. burgdorferi rabbit antibodies significantly reduced LD-induced arthritis in actively B. burgdorferi-infected mice. Lastly, the third study analyzed the microbiota of Ixodes ricinus, the tick species that is critical in maintaining the enzootic cycle of LD in Europe. The results demonstrated extensive sex-specific and region-specific variations in the bacterial flora of adult ticks.

    Navigating Diverse Datasets in the Face of Uncertainty

    When exploring big volumes of data, one of the challenging aspects is their diversity of origin. Multiple files that have not yet been ingested into a database system may contain information of interest to a researcher, who must curate, understand and sieve their content before being able to extract knowledge. Performance is one of the greatest difficulties in exploring these datasets. On the one hand, examining non-indexed, unprocessed files can be inefficient. On the other hand, any processing before their understanding introduces latency and potentially unnecessary work if the chosen schema matches the data poorly. We have surveyed the state of the art and, fortunately, there exist multiple proposed solutions for handling data in situ performantly. Another major difficulty is matching files from multiple origins, since their schema and layout may not be compatible or properly documented. Most surveyed solutions overlook this problem, especially for numeric, uncertain data, as is typical in fields like astronomy. The main objective of our research is to assist data scientists during the exploration of unprocessed, numerical, raw data distributed across multiple files based solely on its intrinsic distribution. In this thesis, we first introduce the concept of Equally-Distributed Dependencies (EDDs), which provides the foundations to match this kind of dataset. We propose PresQ, a novel algorithm that finds quasi-cliques on hypergraphs based on their expected statistical properties. The probabilistic approach of PresQ can be successfully exploited to mine EDDs between diverse datasets when the underlying populations can be assumed to be the same. Finally, we propose a two-sample statistical test based on Self-Organizing Maps (SOM). This method can outperform, in terms of power, other classifier-based two-sample tests, being in some cases comparable to kernel-based methods, with the advantage of being interpretable. Both PresQ and the SOM-based statistical test can provide insights that drive serendipitous discoveries.
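    The general recipe behind classifier- or SOM-based two-sample tests can be sketched with a simpler statistic: pick a quantity sensitive to distributional differences, then calibrate it by permuting the pooled sample. The sketch below uses the energy distance rather than the thesis's SOM-based statistic, and the sample sizes and distributions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Permutation two-sample test: compare the observed statistic against
# its distribution under random relabelings of the pooled sample.

def energy_stat(x, y):
    """Energy-distance statistic: 2 E|X-Y| - E|X-X'| - E|Y-Y'|."""
    d = lambda a, b: np.abs(a[:, None] - b[None, :]).mean()
    return 2 * d(x, y) - d(x, x) - d(y, y)

def perm_test(x, y, n_perm=200):
    obs = energy_stat(x, y)
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of the pooled sample
        if energy_stat(pooled[:len(x)], pooled[len(x):]) >= obs:
            count += 1
    return (count + 1) / (n_perm + 1)  # permutation p-value

x = rng.normal(0.0, 1.0, 60)
y = rng.normal(1.0, 1.0, 60)
print(perm_test(x, y) < 0.05)  # a clear shift in location is detected
```

    A SOM-based variant replaces the raw statistic with one computed on the map's projection, which is what makes the resulting test interpretable.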