33 research outputs found

Identifying Relationships between Scientific Datasets

Author: Alawini Abdussalam
Publication venue: PDXScholar
Publication date: 01/01/2016

Scientific datasets associated with a research project can proliferate over time as a result of activities such as sharing datasets among collaborators, extending existing datasets with new measurements, and extracting subsets of data for analysis. As such datasets begin to accumulate, it becomes increasingly difficult for a scientist to keep track of their derivation history, which complicates data sharing, provenance tracking, and scientific reproducibility. Understanding what relationships exist between datasets can help scientists recall their original derivation history. For instance, if dataset A is contained in dataset B, then the connection between A and B could be that A was extended to create B. We present a relationship-identification methodology as a solution to this problem. To examine the feasibility of our approach, we articulated a set of relevant relationships, developed algorithms for efficient discovery of these relationships, and organized these algorithms into a new system called ReConnect to assist scientists in relationship discovery. We also evaluated existing alternative approaches that rely on flagging differences between two spreadsheets and found that they were impractical for many relationship-discovery tasks. Additionally, we conducted a user study, which showed that relationships do occur in real-world spreadsheets, and that ReConnect can improve scientists' ability to detect such relationships between datasets. The promising results of ReConnect's evaluation encouraged us to explore a more automated approach for relationship discovery. In this dissertation, we introduce an automated end-to-end prototype system, ReDiscover, that identifies, from a collection of datasets, the pairs that are most likely related, and the relationship between them.
Our experimental results demonstrate the overall effectiveness of ReDiscover in predicting relationships in a scientist's or a small group of researchers' collections of datasets, and the sensitivity of the overall system to the performance of its various components.
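
The containment example in the abstract (dataset A contained in dataset B) can be illustrated with a minimal Python sketch. This is not the ReConnect or ReDiscover algorithm, which the abstract does not describe; it is a naive row-level check written only for illustration. The function names `contained_in` and `describe_relationship`, the representation of a dataset as a list of tuples, and the sample rows are all assumptions.

```python
from collections import Counter


def contained_in(a, b):
    """Return True if every row of `a` occurs in `b` at least as many times.

    Rows must be hashable (for example, tuples). Row order is ignored.
    """
    # Counter subtraction keeps only positive counts, so an empty result
    # means `b` covers every row of `a`.
    return not (Counter(a) - Counter(b))


def describe_relationship(a, b):
    """Name the row-level containment relationship between two datasets.

    Hypothetical helper for illustration; the labels are not from the source.
    """
    a_in_b = contained_in(a, b)
    b_in_a = contained_in(b, a)
    if a_in_b and b_in_a:
        return "same rows"
    if a_in_b:
        return "A contained in B"
    if b_in_a:
        return "B contained in A"
    return "no containment"


# Made-up sample data: B extends A with one additional measurement.
dataset_a = [("site1", 1.0), ("site2", 2.5)]
dataset_b = dataset_a + [("site3", 3.1)]
relationship = describe_relationship(dataset_a, dataset_b)
```

A check like this compares whole rows only; it would miss cases where columns were renamed, reordered, or dropped.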

PDXScholar (Portland State University)

ProQuest OAI Repository

Data Citation: A New Provenance Challenge

Author: Alawini Abdussalam
Davidson Susan B
Silvello Gianmaria
Tannen Val
Wu Yinjun
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2018

Archivio istituzionale della ricerca - Università di Padova

Curriculum analysis for data systems education.

Author: Ajanovski Vangel V.
Alawini Abdussalam
Goodfellow Martin
Liut Michael
Miedema Daphne
Peltsverger Svetlana
Taipalus Toni
Young Tiffany
Publication venue: Association for Computing Machinery (ACM)
Publication date: 08/07/2024

The field of data systems has seen rapid advances due to the popularization of data science, machine learning, and real-time analytics. In industry contexts, system features such as recommendation systems, chatbots, and reverse image search require efficient infrastructure and data management solutions. Due to recent advances, it remains unclear (i) which topics are recommended to be included in data systems studies in higher education, (ii) which topics are a part of data systems courses and how they are taught, and (iii) which data-related skills are valued for roles such as software developers, data engineers, and data scientists. This working group aims to answer these points to explain the state of data systems education today and to uncover knowledge gaps and possible discrepancies between recommendations, course implementations, and industry needs. We expect the results to be applicable in tailoring various data systems courses to better cater to the needs of industry, and for teachers to share best practices.

Open Access Institutional Repository at Robert Gordon University

Data systems education : curriculum recommendations, course syllabi, and industry needs

Author: Ajanovski Vangel V.
Alawini Abdussalam
Goodfellow Martin
Liut Michael
Miedema Daphne
Peltsverger Svetlana
Taipalus Toni
Young Tiffany
Publication venue: Association for Computing Machinery (ACM)
Publication date: 22/01/2025

Data systems have been an important part of computing curricula for decades, and an integral part of data-focused industry roles such as software developers, data engineers, and data scientists. However, the field of data systems encompasses a large number of topics ranging from data manipulation and database distribution to creating data pipelines and data analytics solutions. Due to the slow nature of curriculum development, it remains unclear (i) which data systems topics are recommended across diverse higher education curriculum guidelines, (ii) which topics are taught in higher education data systems courses, and (iii) which data systems topics are actually valued in data-focused industry roles. In this study, we analyzed computing curriculum guidelines, course contents, and industry needs regarding data systems to uncover discrepancies between them. Our results show, for example, that topics such as data visualization, data warehousing, and semi-structured data models are valued in industry, yet seldom taught in courses. This work allows professionals to further align curriculum guidelines, higher education, and the data systems industry to better prepare students for their working life by focusing on relevant skills in data systems education.

University of Strathclyde Institutional Repository

Data systems education: curriculum recommendations, course syllabi, and industry needs.

Author: Ajanovski Vangel V.
Alawini Abdussalam
Goodfellow Martin
Liut Michael
Miedema Daphne
Peltsverger Svetlana
Taipalus Toni
Young Tiffany
Publication venue: Association for Computing Machinery (ACM)
Publication date: 23/01/2025

Open Access Institutional Repository at Robert Gordon University

Identifying Relationships between Scientific Datasets

Author: Abdussalam Alawini
Publication venue: Portland State University Library
Publication date: 01/01/2000

Crossref

Green BIM Adoption, an Agile Approach

Author: Alawini Abdussalam
Tanatammatorn Napong
Tucker David
Publication venue: PDXScholar
Publication date: 01/04/2011

The energy consumption issues of the United States cannot be discussed without including the energy needs of the building sector. Currently there are approximately 76 million residential structures and 5 million commercial structures in the United States [1]. As the population grows beyond 311 million people, the need for additional buildings will correspondingly increase [2]. Currently, buildings account for approximately 40% of total energy usage and 70% of electricity usage [4]. Additionally, the cost of energy in the United States has been increasing. As the rest of the world develops and industrializes, the demand for energy is going to increase due to the economic elasticity in the energy sector.

PDXScholar (Portland State University)

DBLP-NSF dataset SQL dump

Author: Alawini A (via Mendeley Data)
Publication date: 07/04/2018

DBLP-NSF is a PostgreSQL database dump file that connects computer science publications (extracted from DBLP) to their NSF funding grants (extracted from the National Science Foundation grant dataset). This dataset was used in an NSF-funded research project on data citation as an example of extending bibliographic citations to include funding information (NSF IIS 1302212, URLs: https://alliance.seas.upenn.edu/~citation/wiki/, https://www.researchgate.net/project/CiteDB). It is not a complete dataset (not all publications or all grants are included) and is not intended as an authoritatively complete data set to be used for data mining. Special thanks to Shivendra Pandey for his work on developing this dataset.

Electronic Archiving System

DBLP-NSF dataset SQL dump

Author: Alawini A (via Mendeley Data)
Publication date: 27/03/2018

Electronic Archiving System

Insights from Student Solutions to MongoDB Homework Problems

Author: Abdussalam Alawini
Mei Chen
Ridha Alkhabaz
Seth Poulsen
Publication venue: Association for Computing Machinery (ACM)
Publication date: 26/06/2021

Crossref