559 research outputs found
A Rule-Based Approach to Analyzing Database Schema Objects with Datalog
Database schema elements such as tables, views, triggers and functions are
typically defined with many interrelationships. In order to support database
users in understanding a given schema, a rule-based approach for analyzing the
respective dependencies is proposed using Datalog expressions. We show that
many interesting properties of schema elements can be systematically determined
this way. The expressiveness of the proposed analysis is exemplarily shown with
the problem of computing induced functional dependencies for derived relations.
The propagation of functional dependencies plays an important role in data
integration and query optimization but represents an undecidable problem in
general. And yet, our rule-based analysis covers all relational operators as
well as linear recursive expressions in a systematic way showing the depth of
analysis possible by our proposal. The analysis of functional dependencies is
well-integrated in a uniform approach to analyzing dependencies between schema
elements in general.Comment: Pre-proceedings paper presented at the 27th International Symposium
on Logic-Based Program Synthesis and Transformation (LOPSTR 2017), Namur,
Belgium, 10-12 October 2017 (arXiv:1708.07854
Modelos de compressão e ferramentas para dados ómicos
The ever-increasing growth of the development of high-throughput sequencing
technologies and as a consequence, generation of a huge volume of data,
has revolutionized biological research and discovery. Motivated by that, we
investigate in this thesis the methods which are capable of providing an
efficient representation of omics data in compressed or encrypted manner,
and then, we employ them to analyze omics data.
First and foremost, we describe a number of measures for the purpose
of quantifying information in and between omics sequences. Then, we
present finite-context models (FCMs), substitution-tolerant Markov models
(STMMs) and a combination of the two, which are specialized in modeling
biological data, in order for data compression and analysis.
To ease the storage of the aforementioned data deluge, we design two lossless
data compressors for genomic and one for proteomic data. The methods
work on the basis of (a) a combination of FCMs and STMMs or (b) the mentioned
combination along with repeat models and a competitive prediction
model. Tested on various synthetic and real data showed their outperformance
over the previously proposed methods in terms of compression ratio.
Privacy of genomic data is a topic that has been recently focused by developments
in the field of personalized medicine. We propose a tool that is
able to represent genomic data in a securely encrypted fashion, and at the
same time, is able to compact FASTA and FASTQ sequences by a factor
of three. It employs AES encryption accompanied by a shuffling mechanism
for improving the data security. The results show it is faster than
general-purpose and special-purpose algorithms.
Compression techniques can be employed for analysis of omics data. Having
this in mind, we investigate the identification of unique regions in a species
with respect to close species, that can give us an insight into evolutionary
traits. For this purpose, we design two alignment-free tools that can accurately
find and visualize distinct regions among two collections of DNA or
protein sequences. Tested on modern humans with respect to Neanderthals,
we found a number of absent regions in Neanderthals that may express new
functionalities associated with evolution of modern humans.
Finally, we investigate the identification of genomic rearrangements, that
have important roles in genetic disorders and cancer, by employing a compression
technique. For this purpose, we design a tool that is able to accurately
localize and visualize small- and large-scale rearrangements between
two genomic sequences. The results of applying the proposed tool on several
synthetic and real data conformed to the results partially reported by
wet laboratory approaches, e.g., FISH analysis.O crescente crescimento do desenvolvimento de tecnologias de sequenciamento
de alto rendimento e, como consequência, a geração de um enorme
volume de dados, revolucionou a pesquisa e descoberta biológica. Motivados
por isso, nesta tese investigamos os métodos que fornecem uma
representação eficiente de dados ómicros de maneira compactada ou criptografada
e, posteriormente, os usamos para análise.
Em primeiro lugar, descrevemos uma série de medidas com o objetivo de
quantificar informação em e entre sequencias ómicas. Em seguida, apresentamos
modelos de contexto finito (FCMs), modelos de Markov tolerantes
a substituição (STMMs) e uma combinação dos dois, especializados na
modelagem de dados biológicos, para compactação e análise de dados.
Para facilitar o armazenamento do dilúvio de dados acima mencionado, desenvolvemos
dois compressores de dados sem perda para dados genómicos e
um para dados proteómicos. Os métodos funcionam com base em (a) uma
combinação de FCMs e STMMs ou (b) na combinação mencionada, juntamente
com modelos de repetição e um modelo de previsão competitiva.
Testados em vários dados sintéticos e reais mostraram a sua eficiência sobre
os métodos do estado-de-arte em termos de taxa de compressão.
A privacidade dos dados genómicos é um tópico recentemente focado nos
desenvolvimentos do campo da medicina personalizada. Propomos uma
ferramenta capaz de representar dados genómicos de maneira criptografada
com segurança e, ao mesmo tempo, compactando as sequencias FASTA e
FASTQ para um fator de três. Emprega criptografia AES acompanhada de
um mecanismo de embaralhamento para melhorar a segurança dos dados.
Os resultados mostram que ´e mais rápido que os algoritmos de uso geral e
específico.
As técnicas de compressão podem ser exploradas para análise de dados
ómicos. Tendo isso em mente, investigamos a identificação de regiões
únicas em uma espécie em relação a espécies próximas, que nos podem
dar uma visão das características evolutivas. Para esse fim, desenvolvemos
duas ferramentas livres de alinhamento que podem encontrar e visualizar
com precisão regiões distintas entre duas coleções de sequências de DNA
ou proteínas. Testados em humanos modernos em relação a neandertais,
encontrámos várias regiões ausentes nos neandertais que podem expressar
novas funcionalidades associadas à evolução dos humanos modernos.
Por último, investigamos a identificação de rearranjos genómicos, que têm
papéis importantes em desordens genéticas e cancro, empregando uma
técnica de compressão. Para esse fim, desenvolvemos uma ferramenta capaz
de localizar e visualizar com precisão os rearranjos em pequena e grande
escala entre duas sequências genómicas. Os resultados da aplicação da ferramenta
proposta, em vários dados sintéticos e reais, estão em conformidade
com os resultados parcialmente relatados por abordagens laboratoriais, por
exemplo, análise FISH.Programa Doutoral em Engenharia Informátic
Diseño, implementación y optimización del sistema de compresión de imágenes sobre el ordenador de a bordo del proyecto de nanosátelite Eye-Sat
Eye-Sat es un Proyecto de nano satélites, dirigido por el CNES (Centre National d’Etudes Spatiales) y desarrollado principalmente por estudiantes de varias escuelas de ingeniería del territorio francés. El objetivo de este pequeño telescopio no solo radica en la oportunidad de realizar la demostración de distintos dispositivos tecnológicos, sino que también tiene como misión la adquisición de fotografías en la bandas de color e infrarrojo de la vía Láctea, así como el estudio de la intensidad y polarización de la luz Zodiacal. Los requerimientos de la misión exigen el desarrollo de un algoritmo de compresión de imágenes sin pérdidas para las imágenes “Color Filter Array” CFA (Bayer) e infrarrojas adquiridas por el satélite. Como miembro de la comisión consultativa para los sistemas espaciales, CNES ha seleccionado el estándar CCSDS-123.0-B como algoritmo base para cumplir los requerimientos de la misión. A este algoritmo se le añadirán modificaciones o mejoras, adaptadas a las imágenes tipo, con el fin de mejorar las prestaciones de compresión y de complejidad. La implementación y la optimización del algoritmo será desarrollada sobre la plataforma Xilinx Zynq® All Programmable SoC, el cual incluye una FPGA y un Dual-core ARM® Cortex™-A9 processor with NEONTM DSP/FPU Engine
Web Engineering for Workflow-based Applications: Models, Systems and Methodologies
This dissertation presents novel solutions for the construction of Workflow-based Web applications: The Web Engineering DSL Framework, a stakeholder-oriented Web Engineering methodology based on Domain-Specific Languages; the Workflow DSL for the efficient engineering of Web-based Workflows with strong stakeholder involvement; the Dialog DSL for the usability-oriented development of advanced Web-based dialogs; the Web Engineering Reuse Sphere enabling holistic, stakeholder-oriented reuse
Model-based programming environments for spreadsheets
Spreadsheets can be seen as a flexible programming environment. However, they lack some of the concepts of
regular programming languages, such as structured data types. This can lead the user to edit the spreadsheet in a
wrong way and perhaps cause corrupt or redundant data.
We devised a method for extraction of a relational model from a spreadsheet and the subsequent embedding of
the model back into the spreadsheet to create a model-based spreadsheet programming environment. The extraction
algorithm is specific for spreadsheets since it considers particularities such as layout and column arrangement. The
extracted model is used to generate formulas and visual elements that are then embedded in the spreadsheet helping
the user to edit data in a correct way.
We present preliminary experimental results from applying our approach to a sample of spreadsheets from the
EUSES Spreadsheet Corpus.
Finally, we conduct the first systematic empirical study to assess the effectiveness and efficiency of this approach.
A set of spreadsheet end users worked with two different model-based spreadsheets, and we present and analyze here
the results achieved.This work is funded by ERDF European Regional Development Fund through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the FCT Fundacao para a Ciencia e a Tecnologia (Portuguese Foundation for Science and Technology) within project FCOMP-01-0124-FEDER-010048. The first author is supported by the FCT grant SFRH/BPD/73358/2010
The SP theory of intelligence: benefits and applications
This article describes existing and expected benefits of the "SP theory of
intelligence", and some potential applications. The theory aims to simplify and
integrate ideas across artificial intelligence, mainstream computing, and human
perception and cognition, with information compression as a unifying theme. It
combines conceptual simplicity with descriptive and explanatory power across
several areas of computing and cognition. In the "SP machine" -- an expression
of the SP theory which is currently realized in the form of a computer model --
there is potential for an overall simplification of computing systems,
including software. The SP theory promises deeper insights and better solutions
in several areas of application including, most notably, unsupervised learning,
natural language processing, autonomous robots, computer vision, intelligent
databases, software engineering, information compression, medical diagnosis and
big data. There is also potential in areas such as the semantic web,
bioinformatics, structuring of documents, the detection of computer viruses,
data fusion, new kinds of computer, and the development of scientific theories.
The theory promises seamless integration of structures and functions within and
between different areas of application. The potential value, worldwide, of
these benefits and applications is at least $190 billion each year. Further
development would be facilitated by the creation of a high-parallel,
open-source version of the SP machine, available to researchers everywhere.Comment: arXiv admin note: substantial text overlap with arXiv:1212.022
- …