63 research outputs found

    Unsupervised Anomaly Detection in Data Quality Control

    Context-Aware Notebook Search in a Jupyter-Based Virtual Research Environment

    Computational notebook environments such as Jupyter play an increasingly important role in data-centric research for prototyping computational experiments, documenting code implementations, and sharing scientific results. Effectively discovering and reusing notebooks available on the web can reduce repetitive work and facilitate scientific innovation. However, general-purpose web search engines (e.g., Google Search) do not explicitly index the contents of notebooks, and notebook repositories (e.g., Kaggle and GitHub) require users to create domain-specific queries based on the metadata in the notebook catalogs, which fail to capture the working contexts in the notebook environment. This poster presents a Context-aware Notebook Search Framework (CANSF) that enables a researcher to seamlessly discover external notebooks based on the semantic contexts of the literate programming activities in the Jupyter environment.

    A decision model for decentralized autonomous organization platform selection: Three industry case studies

    Context: Decentralized autonomous organizations are a new form of smart contract-based governance. Decentralized autonomous organization platforms, such as Aragon and Colony, support the creation of such organizations and are becoming increasingly popular. Selecting the best fitting platform is challenging for organizations, as a significant number of decision criteria, such as popularity, developer availability, governance issues, and consistent documentation of such platforms, should be considered. Additionally, decision-makers at the organizations are not experts in every domain, so they must continuously acquire volatile knowledge regarding such platforms. Objective: The main objective of this study is to support decision-makers in selecting the right decentralized autonomous organization platforms by designing an effective decision model. We aim to provide more insight into their selection process and to reduce time and effort significantly. Method: This study presents a decision model for the decentralized autonomous organization platform selection problem. The decision model systematically captures knowledge regarding such platforms and concepts. The model is based on an existing theoretical framework that assists software engineers with a set of multi-criteria decision-making problems in software production. Results: We conducted three industry case studies in the context of three decentralized autonomous organizations to evaluate the effectiveness and efficiency of the decision model in assisting decision-makers. The case study participants declared that the decision model provides significantly more insight into their selection process and reduces time and effort. Conclusion: The empirical evidence from the case studies indicates that decision-makers can make more rational, efficient, and effective decisions with the decision model. Furthermore, the reusable form of the captured knowledge regarding decentralized autonomous organization platforms can be employed by other researchers in their future investigations.
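
As a rough illustration of multi-criteria selection, the sketch below ranks candidates with a simple weighted sum. This is an assumed, generic technique and not the decision model or framework from the paper. The criterion names are taken from the abstract, while the weights, the candidate names (`platform_a`, `platform_b`), and all scores are hypothetical values invented for the example.

```python
# Hypothetical criterion weights (assumed for illustration; they sum to 1).
WEIGHTS = {
    "popularity": 0.2,
    "developer_availability": 0.3,
    "governance": 0.3,
    "documentation": 0.2,
}

# Hypothetical candidates with made-up scores on a 0-10 scale.
CANDIDATES = {
    "platform_a": {"popularity": 8, "developer_availability": 6,
                   "governance": 7, "documentation": 5},
    "platform_b": {"popularity": 5, "developer_availability": 8,
                   "governance": 6, "documentation": 9},
}


def weighted_score(scores, weights):
    """Return the weighted sum of a candidate's criterion scores."""
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


def rank(candidates, weights):
    """Return candidate names ordered from highest to lowest weighted score."""
    return sorted(
        candidates,
        key=lambda name: weighted_score(candidates[name], weights),
        reverse=True,
    )


for name in rank(CANDIDATES, WEIGHTS):
    print(name, round(weighted_score(CANDIDATES[name], WEIGHTS), 2))
```

A weighted sum is only one of many aggregation schemes; it assumes the criteria are independent and measured on comparable scales.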

    Notebook-as-a-VRE (NaaVRE): From private notebooks to a collaborative cloud virtual research environment

    Virtual Research Environments (VREs) provide user-centric support in the lifecycle of research activities, e.g., discovering and accessing research assets, or composing and executing application workflows. A typical VRE is often implemented as an integrated environment, which includes a catalog of research assets, a workflow management system, a data management framework, and tools for enabling collaboration among users. Notebook environments, such as Jupyter, allow researchers to rapidly prototype scientific code and share their experiments as online accessible notebooks. Jupyter can support several popular languages that are used by data scientists, such as Python, R, and Julia. However, such notebook environments do not have seamless support for running heavy computations on remote infrastructure or finding and accessing software code inside notebooks. This paper investigates the gap between a notebook environment and a VRE and proposes an embedded VRE solution for the Jupyter environment called Notebook-as-a-VRE (NaaVRE). The NaaVRE solution provides functional components via a component marketplace and allows users to create a customized VRE on top of the Jupyter environment. From the VRE, a user can search for research assets (data, software, and algorithms), compose workflows, manage the lifecycle of an experiment, and share the results among users in the community. We demonstrate how such a solution can enhance a legacy workflow that uses Light Detection and Ranging (LiDAR) data from country-wide airborne laser scanning surveys for deriving geospatial data products of ecosystem structure at high resolution over broad spatial extents. This enables users to scale out the processing of multi-terabyte LiDAR point clouds for ecological applications to more data sources in a distributed cloud environment. Comment: A revised version has been published in the journal Software: Practice and Experience.

    The relationship between a plant-based diet and mental health: Evidence from a cross-sectional multicentric community trial (LIPOKAP study)

    BACKGROUND: Dietary patterns emphasizing plant foods might be neuroprotective and exert health benefits on mental health. However, there is a paucity of evidence on the association between a plant-based dietary index and mental health measures. OBJECTIVE: This study sought to examine the association between plant-based dietary indices, depression, and anxiety in a large multicentric sample of Iranian adults. METHODS: This cross-sectional study was performed in a sample of 2,033 participants. A validated food frequency questionnaire was used to evaluate the dietary intakes of participants. Three versions of the plant-based dietary index (PDI) were created: an overall PDI, a healthy PDI (hPDI), and an unhealthy PDI (uPDI). The presence of anxiety and depression was examined via a validated Iranian version of the Hospital Anxiety and Depression Scale (HADS). RESULTS: PDI and hPDI were not associated with depression and anxiety after adjustment for potential covariates (age, sex, energy, marital status, physical activity level, and smoking). However, in the crude model, the highest consumption of uPDI approximately doubled the risk of depression (OR = 2.07, 95% CI: 1.49, 2.87; P < 0.0001) and increased the risk of anxiety by almost 50% (OR = 1.56, 95% CI: 1.14, 2.14; P = 0.001). Adjustment for potential confounders changed the associations only slightly (OR for depression in the fourth quartile = 1.96; 95% CI: 1.34, 2.85, and OR for anxiety in the fourth quartile = 1.53; 95% CI: 1.07, 2.19). CONCLUSIONS: An unhealthy plant-based dietary index is associated with a higher risk of depression and anxiety, while the overall plant-based dietary index and the healthy plant-based dietary index were not associated with depression and anxiety.
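
For readers unfamiliar with how figures such as "OR = 2.07, 95% CI: 1.49, 2.87" are formed, the sketch below computes a crude odds ratio and a Wald-type (Woolf) confidence interval from a 2x2 table. It is an assumed textbook calculation, not the study's analysis, which reports covariate-adjusted models as well. The function name `odds_ratio_ci` and the counts in the usage lines are hypothetical and are not data from the study.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with a Wald (Woolf) confidence interval.

    a: exposed with outcome,   b: exposed without outcome
    c: unexposed with outcome, d: unexposed without outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under Woolf's approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, invented purely to exercise the function.
or_value, ci_low, ci_high = odds_ratio_ci(a=40, b=60, c=20, d=80)
print(f"OR = {or_value:.2f}, 95% CI: {ci_low:.2f}, {ci_high:.2f}")
```

An interval whose lower bound lies above 1 indicates higher odds of the outcome in the exposed group at the chosen confidence level.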

    Acoustical Excitation for Damping Estimation in Rotating Machinery

    In experimental modal analysis, a structure is excited with a force in order to estimate the frequency response function. Typically, this force is generated by a shaker or a hammer impact. Both methods have proven their usefulness but have some well-known disadvantages. A main disadvantage of the shaker is that it has to be fixed to the structure, whereas with a hammer it is not possible to excite a specific frequency. To overcome these disadvantages, alternative non-contact methods can be used. There are several non-contact techniques, e.g., pressurized air, lasers, and acoustics. With acoustics as the excitation technique, it is easy to select an excitation signal ranging from random noise to a simple sine. The equipment needed to produce the acoustic excitation is also rather cheap. However, the state of the art does not offer a straightforward technique to estimate the excitation force, which makes acoustic excitation difficult to use for applications such as experimental modal analysis. In this research, acoustic excitation is compared with hammer excitation to estimate the frequency response function of two shafts. In particular, a method to validate the force induced by the acoustics is derived. The final purpose of this research is to estimate the damping properties of rotating machinery.
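
Once a frequency response function is available, one common way to extract damping is the half-power bandwidth method. The sketch below is an assumption made for illustration: the abstract does not state which damping estimator the authors use. It generates the FRF magnitude of an idealized single-degree-of-freedom system and recovers the damping ratio from the two frequencies at which the magnitude falls to the peak value divided by the square root of two. The function names and the values of `fn` and `zeta` are hypothetical.

```python
import math


def frf_magnitude(f, fn, zeta):
    """|H(f)| of a unit-stiffness single-degree-of-freedom system."""
    r = f / fn
    return 1.0 / math.sqrt((1 - r * r) ** 2 + (2 * zeta * r) ** 2)


def half_power_damping(freqs, mags):
    """Estimate the damping ratio as (f2 - f1) / (2 * f_peak)."""
    peak = max(range(len(mags)), key=mags.__getitem__)
    target = mags[peak] / math.sqrt(2)

    def crossing(i, step):
        # Walk away from the peak while the magnitude stays above target,
        # then interpolate linearly between the two bracketing samples.
        while 0 <= i + step < len(mags) and mags[i + step] >= target:
            i += step
        j = i + step
        if not 0 <= j < len(mags):
            raise ValueError("half-power point lies outside the frequency range")
        t = (mags[i] - target) / (mags[i] - mags[j])
        return freqs[i] + t * (freqs[j] - freqs[i])

    f1 = crossing(peak, -1)
    f2 = crossing(peak, +1)
    return (f2 - f1) / (2 * freqs[peak])


# Synthetic example: 100 Hz resonance with 2% damping (assumed values).
freqs = [50 + 0.01 * k for k in range(10001)]
mags = [frf_magnitude(f, fn=100.0, zeta=0.02) for f in freqs]
print(f"estimated damping ratio: {half_power_damping(freqs, mags):.4f}")
```

The approximation is accurate for lightly damped, well-separated modes; measured FRFs with noise or closely spaced modes usually call for curve-fitting methods instead.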

    An Adaptable Indexing Pipeline for Enriching Meta Information of Datasets from Heterogeneous Repositories

    Dataset repositories continuously publish a significant number of datasets in a variety of domains, such as biodiversity and oceanography. To conduct multidisciplinary research, scientists and practitioners must discover datasets from various disciplines with which they are unfamiliar. Well-known search engines, such as Google Dataset Search and Mendeley Data, try to support researchers with cross-domain dataset discovery based on dataset contents. However, as datasets typically contain scientific observations or data collected from service providers, their contextual information is limited. Accordingly, it can be impossible to index datasets effectively enough to increase their Findability, Accessibility, Interoperability, and Reusability (FAIRness) based on their contextual information. This paper presents an indexing pipeline that extends the contextual information of datasets based on their scientific domains by using topic modeling together with a set of rules and domain keywords (such as essential variables in environmental science) suggested by domain experts. The pipeline relies on an open ecosystem, where dataset providers publish semantically enhanced metadata on their data repositories. We aggregate, normalize, and reconcile such metadata, providing a dataset search engine that enables research communities to find, access, integrate, and reuse datasets. We evaluated our approach on a manually created gold standard and in a user study.

    A Decision Model for Programming Language Ecosystem Selection: Seven Industry Case Studies

    Software development is a continuous decision-making process that mainly relies on the software engineer's experience and intuition. One of the essential decisions in the early stages of the process is selecting the best fitting programming language based on the project requirements. A significant number of criteria, such as developer availability and consistent documentation, together with the many potential programming languages on the market, lead to a challenging decision-making process. A decision model is required to analyze the selection problem through systematic identification and evaluation of potential alternatives for a development project. Method: Recently, we introduced a framework to build decision models for technology selection problems in software production. Furthermore, we designed and implemented a decision support system that uses such decision models to support software engineers with their decision-making problems. This study presents a decision model based on the framework for the programming language selection problem. Results: The decision model has been evaluated through seven real-world case studies at seven software development companies. The case study participants declared that the approach provides significantly more insight into the programming language selection process and decreases the time and cost of the decision-making process. Conclusion: With the knowledge available through the decision model, software engineers can more rapidly evaluate programming languages. Having this knowledge readily available supports software engineers in making more efficient and effective decisions that meet their requirements and priorities.