
    TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark

    The TPC-D benchmark was developed almost 20 years ago, and even though its current incarnation, TPC-H, could be considered superseded by TPC-DS, one can still learn from it. We focus on the technical level, summarizing the challenges posed by the TPC-H workload as we now understand them.

    Query-Time Data Integration

    Today, data is collected at ever-increasing scale and variety, opening up enormous potential for new insights and data-centric products. However, in many cases the volume and heterogeneity of new data sources preclude up-front integration using traditional ETL processes and data warehouses. In some cases, it is even unclear if and in what context the collected data will be utilized. Therefore, there is a need for agile methods that defer the effort of integration until the usage context is established. This thesis introduces Query-Time Data Integration as an alternative concept to traditional up-front integration. It aims at enabling users to issue ad-hoc queries on their own data as if all potential other data sources were already integrated, without declaring specific sources and mappings to use. Automated data search and integration methods are then coupled directly with query processing on the available data. The ambiguity and uncertainty introduced through fully automated retrieval and mapping methods is compensated for by answering those queries with ranked lists of alternative results. Each result is then based on different data sources or query interpretations, allowing users to pick the result most suitable to their information need. To this end, this thesis makes three main contributions. Firstly, we introduce a novel method for Top-k Entity Augmentation, which is able to construct a top-k list of consistent integration results from a large corpus of heterogeneous data sources. It improves on the state of the art by producing a set of individually consistent but mutually diverse alternative solutions, while minimizing the number of data sources used. Secondly, based on this novel augmentation method, we introduce the DrillBeyond system, which is able to process Open World SQL queries, i.e., queries referencing arbitrary attributes not defined in the queried database.
The original database is then augmented at query time with Web data sources providing those attributes. Its hybrid augmentation/relational query processing enables the use of ad-hoc data search and integration in data analysis queries, and improves both performance and quality when compared to using separate systems for the two tasks. Finally, we study the management of large-scale dataset corpora such as data lakes or Open Data platforms, which are used as data sources for our augmentation methods. We introduce Publish-time Data Integration as a new technique for data curation systems managing such corpora, which aims at improving the individual reusability of datasets without requiring up-front global integration. This is achieved by automatically generating metadata and format recommendations, allowing publishers to enhance their datasets with minimal effort. Collectively, these three contributions form the foundation of a Query-time Data Integration architecture that enables ad-hoc data search and integration queries over large heterogeneous dataset collections.
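The top-k entity augmentation idea above can be sketched as a greedy search. The sketch below is an illustrative assumption — the function name, scoring scheme, and data layout are invented for this example and are not the thesis's actual algorithm. It builds k augmentations that each cover all entities, prefers reusing sources already chosen (to minimize the number of sources), and penalizes sources used by earlier results (to keep the alternatives diverse).

```python
def augment_top_k(entities, sources, k=3):
    """Greedily build k alternative augmentations.

    sources: dict source_name -> dict entity -> value
    Returns a list of (augmentation, sources_used) pairs.
    """
    results = []
    used_source_sets = []  # sources used by earlier results, for diversity
    for _ in range(k):
        augmentation, used = {}, set()
        for e in entities:
            candidates = [s for s in sources if e in sources[s]]
            if not candidates:
                continue  # no source can provide this entity's value
            def score(s):
                reuse = 0 if s in used else 1            # prefer sources already used here
                overlap = sum(s in prev for prev in used_source_sets)  # avoid earlier results' sources
                return (reuse, overlap)
            best = min(candidates, key=score)
            augmentation[e] = sources[best][e]
            used.add(best)
        results.append((augmentation, used))
        used_source_sets.append(used)
    return results
```

In this toy form, each returned result is internally consistent (drawn from few sources) while successive results are pushed toward different sources, mirroring the consistency/diversity trade-off described above.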

    Representational Similarity Analysis – Connecting the Branches of Systems Neuroscience

    A fundamental challenge for systems neuroscience is to quantitatively relate its three major branches of research: brain-activity measurement, behavioral measurement, and computational modeling. Using measured brain-activity patterns to evaluate computational network models is complicated by the need to define the correspondence between the units of the model and the channels of the brain-activity data, e.g., single-cell recordings or voxels from functional magnetic resonance imaging (fMRI). Similar correspondence problems complicate relating activity patterns between different modalities of brain-activity measurement (e.g., fMRI and invasive or scalp electrophysiology), and between subjects and species. In order to bridge these divides, we suggest abstracting from the activity patterns themselves and computing representational dissimilarity matrices (RDMs), which characterize the information carried by a given representation in a brain or model. Building on a rich psychological and mathematical literature on similarity analysis, we propose a new experimental and data-analytical framework called representational similarity analysis (RSA), in which multi-channel measures of neural activity are quantitatively related to each other and to computational theory and behavior by comparing RDMs. We demonstrate RSA by relating representations of visual objects as measured with fMRI in early visual cortex and the fusiform face area to computational models spanning a wide range of complexities. The RDMs are simultaneously related via second-level application of multidimensional scaling and tested using randomization and bootstrap techniques. We discuss the broad potential of RSA, including novel approaches to experimental design, and argue that these ideas, which have deep roots in psychology and neuroscience, will allow the integrated quantitative analysis of data from all three branches, thus contributing to a more unified systems neuroscience.
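As a concrete illustration, the RDM construction and second-level comparison described above can be sketched in a few lines of NumPy on synthetic data. Variable names are illustrative, and for simplicity the comparison uses Pearson correlation on the RDM upper triangles, whereas RSA commonly uses Spearman rank correlation.

```python
import numpy as np

def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson correlation
    between activity patterns (one row per condition/stimulus)."""
    return 1.0 - np.corrcoef(patterns)

def compare_rdms(rdm_a, rdm_b):
    """Correlate the upper triangles of two RDMs (Pearson here;
    RSA commonly uses Spearman rank correlation instead)."""
    iu = np.triu_indices_from(rdm_a, k=1)
    return np.corrcoef(rdm_a[iu], rdm_b[iu])[0, 1]

rng = np.random.default_rng(0)
brain = rng.normal(size=(6, 50))                          # 6 conditions x 50 voxels
model = brain + rng.normal(scale=0.1, size=brain.shape)   # a closely matching "model"
r = compare_rdms(rdm(brain), rdm(model))
```

Because the comparison happens between RDMs rather than between raw patterns, the brain data and the model need not share channels, units, or even dimensionality — which is exactly the correspondence problem RSA sidesteps.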

    Learning Multi-dimensional Indexes

    Scanning and filtering over multi-dimensional tables are key operations in modern analytical database engines. To optimize the performance of these operations, databases often create clustered indexes over a single dimension or multi-dimensional indexes such as R-trees, or use complex sort orders (e.g., Z-ordering). However, these schemes are often hard to tune and their performance is inconsistent across different datasets and queries. In this paper, we introduce Flood, a multi-dimensional in-memory index that automatically adapts itself to a particular dataset and workload by jointly optimizing the index structure and data storage. Flood performs range scans with predicates up to three orders of magnitude faster than state-of-the-art multi-dimensional indexes or sort orders on real-world datasets and workloads. Our work serves as a building block towards an end-to-end learned database system.
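For readers unfamiliar with the Z-ordering mentioned above: it interleaves the bits of each dimension's coordinate so that points close in space tend to land close together in a one-dimensional sort order. A minimal 2-D sketch follows (Flood itself learns a layout rather than using a fixed curve; the function name is illustrative):

```python
def morton2(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order key."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # even bit positions <- x
        z |= ((y >> i) & 1) << (2 * i + 1)   # odd bit positions  <- y
    return z

# Sorting points by their Morton code clusters spatially nearby points
# together, which is why databases use it as a clustered sort order.
points = [(3, 5), (0, 0), (2, 2), (7, 1)]
points.sort(key=lambda p: morton2(*p))
```

The tuning difficulty the abstract alludes to is visible even here: a fixed bit-interleaving treats all dimensions equally, regardless of which ones the workload actually filters on.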

    Comprehensive Modernization of Firearm Discharge Residue Analysis: Advanced Analytical Techniques, Complexing Agents, and Tandem Mass Spectrometry

    The use of firearm discharge residue (FDR) evidence has been on the decline as a result of instrumental and analytical limitations and the inability to evaluate and assign evidentiary value. To utilize FDR evidence to its fullest extent, detection methods exploiting modern advancements in instrumentation must be explored and developed. Research has been performed in an effort to modernize FDR analysis, but to date nothing has been implemented or found widespread use in forensic laboratories. This research investigated three analytical techniques for the detection of FDR: (1) ion mobility spectrometry (IMS), (2) thermal desorption gas chromatography mass spectrometry (TD-GC/MS), and (3) electrospray ionization tandem mass spectrometry (ESI-MSn). An IMS method for organic gunshot residues was validated and then employed in a population study to distinguish shooters from non-shooters by analyzing samples taken from a subject's hands. Peaks corresponding to three organic gunshot residue (OGSR) compounds were detected in approximately 70% of shooter samples. Matrix issues associated with the swab material and the hands of subjects inherently complicated spectra. The results show a need for a pattern-based analysis rather than relying on peak identification for characterizing shooters' vs. non-shooters' hand swabs. The next phase of this research was prompted by the need to develop confirmatory detection methods and reach lower limits of detection. A thermal separation probe was affixed to a GC/MS, allowing direct analysis of hand swabs without any prior sample preparation. A method was developed and authentic shooter swabs were analyzed. Although three OGSR compounds were detected in 14-81% of authentic samples, additional work remains before the technique can be implemented. Finally, experiments on detecting gunshot residue with ESI-MSn via complexing with a macrocyclic host were performed.
The macrocyclic host, 15-crown-5, was evaluated for complexation with known GSR metals. Foundational parameters were established, and single and double ligand complexes were identified using isotopic ratios and fragment ions. Mass spectral intensities were used to determine the binding selectivities of the metals to the crown ether and, in turn, the preferential binding of the target metals. Additionally, preliminary molecular modeling provided insight into some experimental observations. Overall, three methods were evaluated in an effort to modernize the analysis of firearm discharge residues and, in doing so, increase its evidentiary value. IMS and thermal desorption GC/MS proved adequate as screening methods for OGSR, and while additional work is required, ESI-MSn proved promising for detecting complexed GSR metals. The advantage of coupling ESI-MSn and complexation is that it allows for the dual detection of OGSR and GSR. While modernizing analysis is key to increasing the evidentiary value, it is apparent that coupling the detection of OGSR and GSR is the future of FDR analysis.

    Novel functionalized fillers for mixed matrix membranes for CO₂/CH₄ separation

    There are natural gas reservoirs around the world that are not exploited because of their high CO₂ content. It would therefore be valuable to improve the technology for natural gas purification. The vast majority of commercial membrane gas separation systems use polymers because of their compactness, ease of use, and low cost. However, polymeric membranes designed for gas separation are known to exhibit a trade-off between permeability and selectivity, represented by the Robeson upper bound curves. The search for membrane materials that transcend the Robeson upper bound has been a critical issue in membrane-based gas separation research over the past decade. Thus, many researchers have explored the idea of mixed matrix membranes (MMMs) to overcome these limitations. These membranes combine a polymer matrix with an inorganic molecular sieve such as a zeolite. This work presents a study of the synthesis and characterization of novel fillers that can be used in mixed matrix membranes (MMMs) for CO₂/CH₄ separation. In the first part of this thesis, we developed a strategy to overcome previous, problematic approaches to grafting zeolite fillers. We synthesized and characterized the FAU/EMT zeolite and studied the effects of solvent polarity and of the nature of the aminosilanes on the physicochemical properties of the fillers, as well as on their CO₂ adsorption properties. Subsequently, with the help of a Taguchi experimental design, we optimized the parameters of the grafting reaction of the FAU/EMT zeolite with 3-aminopropylmethyldiethoxysilane (APMDES) to prepare well-grafted fillers for use in MMMs.
Thereafter, the fillers prepared under the optimized conditions were grafted and incorporated into a polyimide matrix to fabricate MMMs for CO₂/CH₄ separation. The results showed that at 25 wt%, the grafted fillers dispersed in the polymer increased both permeability and selectivity compared with the neat polyimide membrane. In the second part of this thesis, we developed the preparation, characterization, and CO₂/CH₄ gas separation properties of MMMs comprising different MOFs and the polyimide 6FDA-ODA, in order to study the effect of ligand functionalization (-NH₂) on the CO₂/CH₄ separation performance of the MMMs. For the first time, we selected novel Zr-based MOFs (UiO-66, NH₂-UiO-66, UiO-67), as well as MOF-199 (HKUST-1) with a functionalized mixed ligand (NH₂-MOF-199), based on simulation calculations for CO₂/CH₄ separation building on previously reported experimental results. The results showed an increase in selectivity for the MMMs, except for the UiO-67 filler. The presence of amine functional groups in NH₂-UiO-66 increased both the selectivity and the CO₂ permeability. On the other hand, an MMM made with UiO-66 significantly increased the CO₂ permeability compared with the neat 6FDA-polyimide membrane without any loss of ideal selectivity.

    Generative models for group fMRI data

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (p. 151-174). In this thesis, we develop an exploratory framework for design and analysis of fMRI studies. In our framework, the experimenter presents subjects with a broad set of stimuli/tasks relevant to the domain under study. The analysis method then automatically searches for likely patterns of functional specificity in the resulting data. This is in contrast to the traditional confirmatory approaches that require the experimenter to specify a narrow hypothesis a priori and aim to localize areas of the brain whose activation pattern agrees with the hypothesized response. To validate the hypothesis, it is usually assumed that detected areas should appear in consistent anatomical locations across subjects. Our approach relaxes the conventional anatomical consistency constraint to discover networks of functionally homogeneous but anatomically variable areas. Our analysis method relies on generative models that explain fMRI data across the group as collections of brain locations with similar profiles of functional specificity. We refer to each such collection as a functional system and model it as a component of a mixture model for the data. The search for patterns of specificity corresponds to inference on the hidden variables of the model based on the observed fMRI data. We also develop a nonparametric hierarchical Bayesian model for group fMRI data that integrates the mixture model prior over activations with a model for fMRI signals. We apply the algorithms in a study of high-level vision where we consider a large space of patterns of category selectivity over 69 distinct images.
The analysis successfully discovers previously characterized face, scene, and body selective areas, among a few others, as the most dominant patterns in the data. This finding suggests that our approach can be employed to search for novel patterns of functional specificity in high-level perception and cognition. by Danial Lashkari. Ph.D.
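The mixture-model inference described above can be illustrated with a toy E-step: each brain location's profile of responses across stimuli is scored against each candidate system's prototype profile, and the scores are normalized into posterior responsibilities. This is only a minimal sketch under assumed spherical-Gaussian components and uniform priors — not the thesis's nonparametric hierarchical Bayesian model — and all names are illustrative.

```python
import math

def responsibilities(profile, prototypes, sigma=1.0):
    """P(system | voxel profile) under spherical Gaussian components
    with uniform priors; `prototypes` is one mean profile per system."""
    logs = []
    for mu in prototypes:
        d2 = sum((p - m) ** 2 for p, m in zip(profile, mu))
        logs.append(-d2 / (2 * sigma ** 2))
    mx = max(logs)                            # log-sum-exp for numerical stability
    ws = [math.exp(l - mx) for l in logs]
    total = sum(ws)
    return [w / total for w in ws]
```

For instance, a voxel responding strongly to faces but not scenes would receive most of its posterior mass from a face-selective prototype, which is the sense in which inference on the hidden assignment variables "searches" for patterns of specificity.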

    Process intensification of oxidative coupling of methane

    No full text

    Graph Processing in Main-Memory Column Stores

    Increasingly, both novel and traditional business applications leverage the advantages of a graph data model, such as the offered schema flexibility and an explicit representation of relationships between entities. As a consequence, companies are confronted with the challenge of storing, manipulating, and querying terabytes of graph data for enterprise-critical applications. Although these business applications operate on graph-structured data, they still require direct access to the relational data and typically rely on an RDBMS to keep a single source of truth and access. Existing solutions performing graph operations on business-critical data either use a combination of SQL and application logic or employ a graph data management system. For the first approach, relying solely on SQL results in poor execution performance caused by the functional mismatch between typical graph operations and the relational algebra. Worse still, graph algorithms expose a tremendous variety in structure and functionality caused by their often domain-specific implementations and therefore can hardly be integrated into a database management system other than with custom coding. Since the majority of these enterprise-critical applications exclusively run on relational DBMSs, employing a specialized system for storing and processing graph data is typically not sensible. Besides the maintenance overhead for keeping the systems in sync, combining graph and relational operations is hard to realize, as it requires data transfer across system boundaries. Traversal operations are a basic ingredient of graph queries and algorithms, and a fundamental component of any database management system that aims at storing, manipulating, and querying graph data. Well-established graph traversal algorithms are standalone implementations relying on optimized data structures.
The integration of graph traversals as an operator into a database management system requires a tight integration into the existing database environment and the development of new components, such as a graph topology-aware optimizer and accompanying graph statistics, graph-specific secondary index structures to speed up traversals, and an accompanying graph query language. In this thesis, we introduce and describe GRAPHITE, a hybrid graph-relational data management system. GRAPHITE is a performance-oriented graph data management system built as part of an RDBMS, allowing processing of graph data to be seamlessly combined with processing of relational data in the same system. We propose a columnar storage representation for graph data to leverage the already existing and mature data management and query processing infrastructure of relational database management systems. At the core of GRAPHITE we propose an execution engine based solely on set operations and graph traversals. Our design is driven by the observation that different graph topologies expose different algorithmic requirements to the design of a graph traversal operator. We derive two graph traversal implementations targeting the most common graph topologies and demonstrate how graph-specific statistics can be leveraged to select the optimal physical traversal operator. To accelerate graph traversals, we devise a set of graph-specific, updateable secondary index structures to improve the performance of vertex neighborhood expansion. Finally, we introduce a domain-specific language with an intuitive programming model to extend graph traversals with custom application logic at runtime. We use the LLVM compiler framework to generate efficient code that tightly integrates the user-specified application logic with our highly optimized built-in graph traversal operators.
Our experimental evaluation shows that GRAPHITE can outperform native graph management systems by several orders of magnitude while providing all the features of an RDBMS, such as transaction support, backup and recovery, security and user management, effectively providing a promising alternative to specialized graph management systems that lack many of these features and require expensive data replication and maintenance processes.
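A columnar graph representation of the kind the storage discussion above suggests can be sketched with two flat arrays — a CSR-style layout with one array of edge targets and one of per-vertex offsets into it — so that the traversal operator becomes a loop over contiguous array slices. Names and layout details here are illustrative assumptions, not GRAPHITE's actual implementation.

```python
from collections import deque

def bfs(offsets, targets, start):
    """Breadth-first traversal over a CSR adjacency; returns visit order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        # vertex v's outgoing edges occupy a contiguous slice of `targets`
        for w in targets[offsets[v]:offsets[v + 1]]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return order

# Graph with edges 0->1, 0->2, 1->3, 2->3 in columnar form:
offsets = [0, 2, 3, 4, 4]   # offsets[v]..offsets[v+1] indexes v's neighbours
targets = [1, 2, 3, 3]
```

Keeping each vertex's neighbours contiguous is what makes neighbourhood expansion a sequential scan, which is the access pattern that column stores are already optimized for.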

    Membrane processes for the dehydration of organic compounds

    [no abstract]