
    bdbms -- A Database Management System for Biological Data

    Biologists are increasingly using databases for storing and managing their data. Biological databases typically consist of a mixture of raw data, metadata, sequences, annotations, and related data obtained from various sources. Current database technology lacks several functionalities that are needed by biological databases. In this paper, we introduce bdbms, an extensible prototype database management system for supporting biological data. bdbms extends the functionalities of current DBMSs to include: (1) annotation and provenance management, including storage, indexing, manipulation, and querying of annotation and provenance as first-class objects in bdbms; (2) local dependency tracking, to track the dependencies and derivations among data items; (3) update authorization, to support data curation via content-based authorization, in contrast to identity-based authorization; and (4) new access methods and their supporting operators for pattern matching on various types of compressed biological data. This paper presents the design of bdbms along with the techniques proposed to support these functionalities, including an extension to SQL. We also outline some open issues in building bdbms.

    Comment: This article is published under a Creative Commons License Agreement (http://creativecommons.org/licenses/by/2.5/). You may copy, distribute, display, and perform the work, make derivative works, and make commercial use of the work, but you must attribute the work to the author and CIDR 2007. 3rd Biennial Conference on Innovative Data Systems Research (CIDR), January 7-10, 2007, Asilomar, California, USA.
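
    The abstract treats annotations and provenance as first-class objects that can be stored and queried alongside the data. The following Python sketch illustrates that idea in miniature; all names (Annotation, AnnotatedTable, and so on) are illustrative assumptions, not bdbms's actual SQL extension or API.

```python
# Minimal sketch (not bdbms itself): annotations as first-class objects
# attached to individual table cells, queryable alongside the base values.
from dataclasses import dataclass, field

@dataclass
class Annotation:
    author: str          # who attached the annotation (supports curation)
    text: str            # free-form note, e.g. lab conditions
    provenance: str      # where the value came from, e.g. a source database

@dataclass
class AnnotatedCell:
    value: object
    annotations: list = field(default_factory=list)

class AnnotatedTable:
    def __init__(self, columns):
        self.columns = columns
        self.rows = []

    def insert(self, values):
        self.rows.append({c: AnnotatedCell(v) for c, v in zip(self.columns, values)})
        return len(self.rows) - 1  # row id

    def annotate(self, row_id, column, ann):
        self.rows[row_id][column].annotations.append(ann)

    def select(self, column, with_annotations=False):
        # Analogous in spirit to bdbms's SQL extension, where a query can
        # return annotations together with the base values.
        for row in self.rows:
            cell = row[column]
            yield (cell.value, cell.annotations) if with_annotations else cell.value

# Usage: store a sequence, attach provenance, and query both together.
genes = AnnotatedTable(["gene", "sequence"])
rid = genes.insert(["BRCA1", "ATGCTTAG"])
genes.annotate(rid, "sequence", Annotation("curator1", "re-sequenced 2006", "GenBank"))
for value, anns in genes.select("sequence", with_annotations=True):
    print(value, [(a.author, a.provenance) for a in anns])
```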

    Record Linkage Based on Entities' Behavior

    Record linkage is the problem of identifying similar records across different data sources. Traditional record linkage techniques focus on comparing simple database attributes by textual similarity to decide whether records match. Recently, record linkage techniques have incorporated extracted knowledge and domain information to help enhance matching accuracy. In this paper, we present a new record linkage technique based on entities' behavior, which can be extracted from a transaction log. In the matching process, we measure how much merging two entities' transaction logs improves the identifiability of a behavior. To do so, we use two matching phases: first, a candidate generation phase, which is fast and produces almost no false negatives, but with low precision; second, an accurate matching phase, which enhances the precision of the matching at a high runtime cost. In the candidate generation phase, behavior is represented by points in the complex plane, where we perform approximate evaluations. In the accurate matching phase, we use a heuristic called compressibility, whereby identified behaviors are more compressible. Our experiments show that the proposed technique enhances record linkage quality while remaining practical for large logs. We also perform an extensive sensitivity analysis of the technique's accuracy and performance.
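
    As a rough illustration of the two-phase scheme described above, the sketch below pairs a cheap signature comparison (candidate generation) with a compression-based second phase. The signature construction, the zlib proxy for compressibility, and all thresholds are assumptions, not the paper's exact formulation.

```python
# Two-phase record linkage sketch: fast candidate filter, then a
# compressibility check on the merged transaction logs.
import zlib
from collections import Counter

def signature(log, dims=8):
    # Phase 1 helper: cheap fixed-size summary of an entity's transaction log
    # (a hashed event histogram, normalized to a distribution).
    vec = [0.0] * dims
    for event, n in Counter(log).items():
        vec[hash(event) % dims] += n
    total = sum(vec) or 1.0
    return [x / total for x in vec]

def candidate_pair(log_a, log_b, tol=0.2):
    # Fast filter: keep pairs whose signatures are close. Like the abstract's
    # candidate generation phase, it yields few false negatives, low precision.
    sa, sb = signature(log_a), signature(log_b)
    return sum(abs(x - y) for x, y in zip(sa, sb)) < tol

def compressibility_gain(log_a, log_b):
    # Phase 2: if two logs record the same entity's behavior, the merged log
    # should compress better than the two logs compressed separately.
    a = zlib.compress(" ".join(log_a).encode())
    b = zlib.compress(" ".join(log_b).encode())
    merged = zlib.compress(" ".join(log_a + log_b).encode())
    return (len(a) + len(b)) - len(merged)

def match(log_a, log_b, gain_threshold=10):
    return candidate_pair(log_a, log_b) and \
           compressibility_gain(log_a, log_b) > gain_threshold

# Usage: two logs exhibiting the same repeating behavior should match.
print(match(["login", "buy", "logout"] * 20, ["login", "buy", "logout"] * 20))
```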

    Syndromic surveillance: STL for modeling, visualizing, and monitoring disease counts

    Background: Public health surveillance is the monitoring of data to detect and quantify unusual health events. Monitoring pre-diagnostic data, such as emergency department (ED) patient chief complaints, enables rapid detection of disease outbreaks. There are many sources of variation in such data; statistical methods need to model them accurately as a basis for timely and accurate outbreak detection.

    Methods: Our new methods for modeling daily chief-complaint counts are based on a seasonal-trend decomposition procedure based on loess (STL) and were developed using data from the 76 EDs of the Indiana surveillance program from 2004 to 2008. Square-root counts are decomposed into inter-annual, yearly-seasonal, day-of-the-week, and random-error components. Using this decomposition method, we develop a new synoptic-scale (days to weeks) outbreak detection method and carry out a simulation study to compare its detection performance with four well-known methods for nine outbreak scenarios.

    Results: The components of the STL decomposition reveal insights into the variability of the Indiana ED data. Day-of-the-week components tend to peak on Sunday or Monday, fall steadily to a minimum on Thursday or Friday, and then rise back to the peak. Yearly-seasonal components show seasonal influenza, some with bimodal peaks. Some inter-annual components increase slightly due to increasing patient populations. A new outbreak detection method based on the decomposition modeling performs well with 90 days or more of data. Control limits were set empirically so that all methods had a specificity of 97%. STL had the largest sensitivity in all nine outbreak scenarios. The STL method also exhibited a well-behaved false-positive rate when run on the data with no outbreaks injected.

    Conclusion: The STL decomposition method for chief-complaint counts leads to a rapid and accurate detection method for disease outbreaks, and requires only 90 days of historical data to be put into operation. The visualization tools that accompany the decomposition and outbreak methods provide much insight into patterns in the data, which is useful for surveillance operations.
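
    The decomposition described above lends itself to a compact sketch. The code below uses statsmodels' MSTL (a multi-period STL variant) as a stand-in for the authors' procedure, decomposes square-root counts of synthetic ED-like data into weekly and yearly seasonal components, and flags large residuals. The 3-sigma control limit is an assumption, unlike the paper's empirically tuned 97%-specificity limits.

```python
# STL-style decomposition and residual-based alerting on synthetic daily
# counts with day-of-week and yearly-seasonal structure.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import MSTL

rng = np.random.default_rng(0)
days = pd.date_range("2004-01-01", periods=5 * 365, freq="D")
t = np.arange(len(days))
counts = rng.poisson(
    50                                       # baseline ED volume
    + 10 * np.sin(2 * np.pi * t / 365.25)    # yearly-seasonal (flu) component
    + 5 * np.sin(2 * np.pi * t / 7)          # day-of-the-week component
)
series = pd.Series(np.sqrt(counts), index=days)  # variance-stabilizing sqrt

res = MSTL(series, periods=(7, 365)).fit()   # day-of-week + yearly seasonals
resid = res.resid
limit = 3 * resid.std()                      # illustrative control limit
alarms = resid[resid > limit]
print(f"{len(alarms)} days flagged out of {len(series)}")
```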

    PhDAY 2020 -FOO (Facultad de Óptica y Optometría)

    For the fourth consecutive year, the doctoral students of the Facultad de Óptica y Optometría of the Universidad Complutense de Madrid have a congress of their own, organized by and for them: the 4th PhDAY-FOO. It is a free, open congress at which these young scientists can present their research to their fellow predoctoral students and to any member of the university community who wishes to enjoy the event. Mark your calendar: October 15, 2020. On this occasion it will be an online congress, so that the uncertainty associated with the Covid-19 pandemic cannot jeopardize its celebration.

    Discovering Consensus Patterns in Biological Databases

    Consensus patterns, like motifs and tandem repeats, are highly conserved patterns with very few substitutions, where no gaps are allowed. In this paper, we present a progressive hierarchical clustering technique for discovering consensus patterns in biological databases over a given length range. This technique can discover consensus patterns under various requirements by applying a post-processing phase. The progressive nature of the hierarchical clustering algorithm makes it scalable and efficient. Experiments to discover motifs and tandem repeats on real biological databases show significant performance gains over non-progressive clustering techniques.
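
    To make the notion concrete, here is a toy sketch of substitution-only pattern discovery: extract all substrings of a given length, greedily cluster those within a small Hamming distance, and report each well-supported cluster's consensus. It is an assumption-laden simplification; the paper's algorithm is hierarchical and progressive rather than this single greedy pass.

```python
# Toy consensus-pattern discovery: cluster k-mers by Hamming distance
# (substitutions only, no gaps) and report majority-vote consensi.
from collections import Counter

def hamming(a, b):
    # Substitution-only distance between equal-length strings.
    return sum(x != y for x, y in zip(a, b))

def consensus(patterns):
    # Column-wise majority vote over equal-length patterns.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*patterns))

def find_consensus_patterns(sequences, length, max_subs=1, min_support=3):
    kmers = [s[i:i + length] for s in sequences
             for i in range(len(s) - length + 1)]
    clusters = []  # greedy single pass; the paper's method is hierarchical
    for kmer in kmers:
        for cluster in clusters:
            if hamming(kmer, cluster[0]) <= max_subs:
                cluster.append(kmer)
                break
        else:
            clusters.append([kmer])
    return [consensus(c) for c in clusters if len(c) >= min_support]

# Usage: a motif conserved up to one substitution is recovered; overlapping
# windows of the same motif region may each form a cluster.
seqs = ["AAGATTGCA", "TTGATTGCT", "CCGATTACAGG"]
print(find_consensus_patterns(seqs, length=6))  # ['GATTGC', 'ATTGCA']
```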

    Supporting Real-world Activities in Database Management Systems

    The cycle of processing the data in many application domains is complex and may involve real-world activities that are external to the database, e.g., wet-lab experiments, instrument readings, and manual measurements. These real-world activities may take a long time to prepare for and to perform, and hence introduce inherently long delays between updates in the database. The presence of these long delays between updates, along with the need for intermediate results to be instantly available, makes supporting real-world activities in the database engine a challenging task. In this paper, we address these challenges through a system that enables users to reflect their updates immediately into the database while keeping track of the dependent and potentially invalid data items until they are revalidated. The proposed system includes: (1) semantics and syntax for interfaces through which users can express the dependencies among data items, (2) new operators to alert users when the returned query results contain potentially invalid or out-of-date data, and to enable evaluating queries on either valid data only, or on both valid and potentially invalid data, and (3) mechanisms for data invalidation and revalidation. The proposed system is being realized via extensions to PostgreSQL.
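
    A conceptual sketch of this invalidation model appears below. The real system extends PostgreSQL; all names here are illustrative. When a data item is updated, items derived from it are marked potentially invalid, but keep their old values, until a real-world activity (say, re-running a wet-lab experiment) revalidates them.

```python
# Dependency tracking with invalidation and revalidation of derived items.
class DependencyStore:
    def __init__(self):
        self.values = {}      # item -> current value
        self.deps = {}        # item -> set of items it was derived from
        self.invalid = set()  # items awaiting revalidation

    def put(self, key, value, derived_from=()):
        self.values[key] = value
        self.deps[key] = set(derived_from)
        self.invalid.discard(key)
        self._invalidate_dependents(key)

    def _invalidate_dependents(self, key):
        for item, sources in self.deps.items():
            if key in sources and item not in self.invalid:
                self.invalid.add(item)             # mark, but keep old value
                self._invalidate_dependents(item)  # cascade transitively

    def revalidate(self, key):
        # Called once the corresponding real-world activity has confirmed
        # the derived value still holds.
        self.invalid.discard(key)

    def query(self, key, valid_only=False):
        # Mirrors the proposed operators: either hide potentially invalid
        # items, or return them with an out-of-date alert.
        if key in self.invalid:
            return None if valid_only else (self.values[key], "POTENTIALLY INVALID")
        return (self.values[key], "VALID")

# Usage: updating a raw measurement flags the result derived from it.
db = DependencyStore()
db.put("sample_ph", 7.1)
db.put("growth_model", "model-v1", derived_from=["sample_ph"])
db.put("sample_ph", 6.8)         # new instrument reading arrives
print(db.query("growth_model"))  # ('model-v1', 'POTENTIALLY INVALID')
db.revalidate("growth_model")
print(db.query("growth_model"))  # ('model-v1', 'VALID')
```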

    Duplicate Elimination in Space-partitioning Tree Indexes

    Space-partitioning trees, like the disk-based trie, quadtree, kd-tree, and their variants, are a family of access methods that index multi-dimensional objects. When indexing objects with non-zero extent, e.g., line segments and rectangles, space-partitioning trees may replicate objects over multiple space partitions, e.g., the PMR quadtree, expanded MX-CIF quadtree, and extended kd-tree. As a result, the answer to a query over these indexes may include duplicates that need to be eliminated, i.e., the same object may be reported more than once. In this paper, we propose generic duplicate elimination techniques for the class of space-partitioning trees in the context of SP-GiST, an extensible indexing framework for realizing space-partitioning trees. The proposed techniques are embedded inside the INDEX-SCAN operator. Therefore, duplicate copies of the same object do not propagate in the query plan, and the elimination process is transparent to end-users. Two cases for the index structures are considered, based on whether or not the objects' coordinates are stored inside the index tree. The theoretical and experimental analyses illustrate that the proposed techniques achieve savings in storage requirements, I/O operations, and processing time when compared to adding a separate duplicate-elimination operator in the query plan.
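
    One classic way to avoid duplicates inside the scan itself, consistent with the abstract's goal, is to let a replicated object be reported only by the partition that owns a canonical point of it. The sketch below is an illustrative assumption, not SP-GiST's actual mechanism: rectangles are (x1, y1, x2, y2) tuples, partitions are half-open, and the canonical corner is clamped into the query window.

```python
# Duplicate-free index scan: each replicated object is emitted by exactly
# one partition, so no separate de-duplication operator is needed.

def overlaps(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    return ax1 <= bx2 and bx1 <= ax2 and ay1 <= by2 and by1 <= ay2

def contains_point(part, point):
    # Half-open containment, so a point on a shared boundary belongs to
    # exactly one partition.
    x1, y1, x2, y2 = part
    px, py = point
    return x1 <= px < x2 and y1 <= py < y2

def index_scan(partitions, query):
    # partitions: list of (partition_rect, [object_rect, ...]); an object
    # with non-zero extent may be replicated under several partitions.
    for part, objects in partitions:
        if not overlaps(part, query):
            continue
        for obj in objects:
            if not overlaps(obj, query):
                continue
            # Report the object only from the partition owning its canonical
            # corner, clamped into the query window so that some examined
            # partition always owns it.
            anchor = (max(obj[0], query[0]), max(obj[1], query[1]))
            if contains_point(part, anchor):
                yield obj  # each object emitted exactly once

# Usage: a rectangle replicated in two adjacent partitions is reported once.
parts = [((0, 0, 5, 5), [(4, 4, 6, 6)]),
         ((5, 0, 10, 5), [(4, 4, 6, 6)])]
print(list(index_scan(parts, (0, 0, 10, 10))))  # [(4, 4, 6, 6)]
```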