LEAN DATA ENGINEERING. COMBINING STATE OF THE ART PRINCIPLES TO PROCESS DATA EFFICIENTLY
The present work was developed during an internship, under the Erasmus+ Traineeship
program, at Fieldwork Robotics, a Cambridge-based company that develops robots to
operate in agricultural fields. The company collects data from commercial greenhouses with
sensors and RealSense cameras, as well as with gripper cameras mounted on the robotic arms.
This data is recorded mainly in bag files containing unstructured data, such as images, and
semi-structured data, such as metadata about both the conditions under which the images
were taken and the robot itself.
Data was uploaded, extracted, cleaned and labelled manually before being used to
train Artificial Intelligence (AI) algorithms to identify raspberries during the harvesting
process. The amount of available data grows quickly with every trip to the fields, creating
an ever-growing need for an automated process.
This problem was addressed through the creation of a data engineering platform
encompassing a data lake, a data warehouse and the required processing capabilities. The
platform was built following a series of principles entitled Lean Data Engineering Principles
(LDEP); systems that follow them are called Lean Data Engineering Systems (LDES).
These principles urge starting with the end in mind: processing incoming batch or
real-time data without wasting resources, limiting costs to what is strictly necessary to
complete the job; in other words, being as lean as possible.
The LDEP combine state-of-the-art ideas from several fields, such as data engineering,
software engineering and DevOps, with cloud technologies at their core.
The proposed custom-made solution enabled the company to scale its data operations,
labelling images almost ten times faster while cutting the associated costs by over 99.9%
compared with the previous process. In addition, the data lifecycle time was reduced
from weeks to hours while maintaining consistent data quality, correctly identifying, for
instance, 94% of the labels produced by a human counterpart.
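The batch workflow described above (upload, extract, clean, label) can be sketched as a sequence of pipeline stages. The following is a minimal illustration only: the `Record` fields, stage names and the trivial labelling rule are assumptions for the sketch, not the company's actual schema or AI model.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record type: these field names are illustrative,
# not Fieldwork Robotics' actual schema.
@dataclass
class Record:
    image: bytes
    metadata: dict
    label: Optional[str] = None

def extract(raw_files):
    """Turn raw bag-file entries into structured records."""
    return [Record(image=f["image"], metadata=f["meta"]) for f in raw_files]

def clean(records):
    """Drop records with empty images or missing metadata."""
    return [r for r in records if r.image and r.metadata]

def label(records):
    """Stand-in for the AI labelling step (here: a trivial rule)."""
    for r in records:
        r.label = "raspberry" if r.metadata.get("ripe") else "background"
    return records

def pipeline(raw_files):
    """Run the batch stages end to end, as the platform automates them."""
    return label(clean(extract(raw_files)))
```

The point of composing the stages as plain functions is that each one can be moved to a managed cloud service independently, which is what keeps the pipeline lean.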
Development of a centralized building management system
A building management system has user comfort and convenience, as well as
reduced energy consumption, as its main goals. To accomplish this, it is
necessary to integrate sensors and actuators to control and retrieve information
about the physical processes of a building. These processes include
control over the illumination and temperature of a room, and even access control.
The information, after being processed, allows the electronic and mechanical
systems of a building, such as HVAC and lighting, to be controlled in a more
intelligent and efficient way, while also reducing energy expenditure. The emergence
of the IoT has made it possible to increase the number of low-level devices in these
systems, thanks to their cost reduction, increased performance and improved
connectivity. To make the best use of this new paradigm, a modern system
with multi-protocol capabilities is required, together with tools for data processing and
presentation. Therefore, the most relevant industrial and building automation
technologies were studied in order to define a modern, IoT-compatible architecture
and choose its constituent software platforms. InfluxDB, EdgeX Foundry
and Node-RED were the technologies selected for the database, gateway and
dashboard, respectively, as they most closely align with the requirements set. A
demonstrator was then developed using these technologies to assess the system's
operation and to evaluate EdgeX's performance in terms of jitter
and latency. The results showed that, although versatile
and complete, this platform underperforms for real-time applications and
high sensor reading rate workloads.
Mestrado em Engenharia Eletrónica e Telecomunicações
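A jitter and latency evaluation of the kind described can be approximated with a fixed-period sampling loop that times each read and reports the spread of the latencies. This is a minimal sketch: `read_fn` is a stand-in for whatever the gateway exposes, not EdgeX's actual API.

```python
import statistics
import time

def measure(read_fn, period_s, samples):
    """Poll read_fn at a fixed period and record per-sample latencies.

    Jitter is reported here as the standard deviation of the latencies;
    a real-time evaluation would also look at worst-case (max) latency.
    """
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        read_fn()
        latencies.append(time.perf_counter() - start)
        time.sleep(period_s)
    return {
        "mean_latency": statistics.mean(latencies),
        "jitter": statistics.stdev(latencies) if len(latencies) > 1 else 0.0,
        "max_latency": max(latencies),
    }
```

Reporting both the mean and the maximum matters for the conclusion above: a platform can have an acceptable average latency yet still fail real-time workloads because of occasional long outliers.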
Abstraction-Based Program Specialization
Abstraction-based program specialization (ABPS) was investigated so that it could be applied to Java and make automated improvements to help with finite-state verification. Research was conducted on partial evaluation and abstract interpretation. A prototype for abstraction-based program specialization was constructed by Hatcliff, Dwyer, and Laubach. This work scaled the prototype to a subset of Java and made some general improvements. Today's software is large and complex. Because of this complexity, traditional validation and program testing techniques are hard to apply. One method in use is finite-state verification (FSV). FSV requires a program to be modeled as a finite-state transition system. Currently, the modeling is done by hand, an error-prone process. Also, the state space of a non-trivial program is extremely large (potentially infinite). This thesis created an ABPS tool that uses partial evaluation and abstract interpretation to reduce a program model's state space. Partial evaluation performs symbolic execution; it specializes programs by folding constants and pruning infeasible branches from the computation tree. The abstract interpretation component replaces program data types with small sets of abstract tokens that capture information relevant to the properties being verified. This can dramatically reduce a program's state space. Abstraction-based program specialization is a viable option for improving code and automating the use of finite-state verifiers. Much work still needs to be done to completely scale abstraction-based program specialization to include all of Java and to make the process more automatic. Finally, several examples illustrate how ABPS can be applied to automatically create models of simple software systems.
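The two mechanisms the abstract names, folding constants and pruning infeasible branches, can be illustrated on a toy expression language. This is an illustrative sketch only, not the Hatcliff, Dwyer, and Laubach prototype; the tuple-based expression encoding is an assumption made for the example.

```python
# Expressions: ("const", n) | ("var", name) | ("add", e1, e2)
#            | ("if", cond, then, else)
def specialize(expr, env):
    """Partially evaluate expr given the statically known bindings in env."""
    kind = expr[0]
    if kind == "const":
        return expr
    if kind == "var":
        # Fold variables whose values are statically known.
        return ("const", env[expr[1]]) if expr[1] in env else expr
    if kind == "add":
        left, right = specialize(expr[1], env), specialize(expr[2], env)
        if left[0] == "const" and right[0] == "const":
            return ("const", left[1] + right[1])  # constant folding
        return ("add", left, right)
    if kind == "if":
        cond = specialize(expr[1], env)
        if cond[0] == "const":
            # Prune the branch that can never execute.
            return specialize(expr[2] if cond[1] else expr[3], env)
        return ("if", cond, specialize(expr[2], env), specialize(expr[3], env))
    raise ValueError(f"unknown expression kind: {kind}")
```

Pruning is what shrinks the model for verification: a branch that is statically infeasible contributes no states, so the finite-state verifier never has to explore it.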
The Reflex Sandbox: an experimentation environment for an aspect-oriented Kernel
Reflex is a versatile kernel for aspect-oriented programming in Java. It provides the basic structural and behavioral abstractions that make it possible to implement a variety of aspect-oriented techniques. This thesis studies two fundamental topics. The first is the formal development, using the Haskell language, of the fundamental constructs of the Reflex model of partial behavioral reflection. This development includes the design of a language, called Kernel, which is a reflective extension of a simple object-oriented language. The operational semantics of the Kernel language is presented by means of an abstract execution machine. The other fundamental topic of this thesis is validating that the partial behavioral reflection model is expressive enough to give semantics to a subset of the AspectJ language. To this end, the Reflex Sandbox was developed: an experimentation environment in Haskell for the Reflex model. Both the formal development of the partial behavioral reflection model and the validation of the AspectJ support are studied in the context of the Reflex Sandbox. The validation covers the definition of an aspect-oriented language that characterizes the AspectJ approach to aspect-oriented programming, as well as the definition of its abstract execution machine. A compiler is also presented that transforms programs written in this language into the Kernel language. This compilation process provides the foundations for understanding how such a transformation can be carried out. The compilation process was also implemented in Java, transforming AspectJ programs into Reflex programs. Preliminary measurements are also presented of the performance of a program compiled for and executed on Reflex versus one compiled and executed with the AspectJ compiler.
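The core idea of partial behavioral reflection, selecting specific operations, reifying them, and delegating them to a metaobject, can be caricatured in Python with a decorator. This is a rough sketch of the idea only, not Reflex's actual Java model; the `Account` class and the tracing metaobject are invented for the example.

```python
import functools

def intercept(metaobject):
    """Route calls of the decorated method through a metaobject hook."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Reify the operation (base function + arguments) and delegate.
            return metaobject(fn, args, kwargs)
        return wrapper
    return decorator

trace = []

def tracing_metaobject(fn, args, kwargs):
    """A behavioral metaobject: record the call, then proceed normally."""
    trace.append(fn.__name__)
    return fn(*args, **kwargs)

class Account:
    def __init__(self):
        self.balance = 0

    # Only this operation is subject to reflection; the rest of the
    # class runs at full speed, which is the "partial" in partial
    # behavioral reflection.
    @intercept(tracing_metaobject)
    def deposit(self, amount):
        self.balance += amount
```

Swapping `tracing_metaobject` for another hook changes the behavior of the intercepted operation without touching `Account`, which is how aspect-like semantics can be layered over a base program.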
Abstract machine design for increasingly more powerful ALGOL-languages
This thesis presents the work and results of an investigation into language implementation. Some work on language design has also been undertaken. Three languages have been implemented which may be described as members of the Algol family, with features and constructs typical of that family. These include block structure, nested routines, variables, and dynamic allocation of data structures such as vectors and user-defined structures. The underlying technique behind these implementations has been that of abstract machine modelling. For each language an abstract intermediate code has been designed. Unlike other such codes, the level of abstraction has been raised so that the code lies closer to the language than to the real machine on which the language may be implemented. Each successive language is more powerful than the previous by the addition of constructs which were felt to be useful: routines as assignable values, dynamically initialised constant locations, types as assignable values, and lists. The three languages were: Algol R, a "typical" Algol based on Algol W; h, an Algol with routines as assignable values, enumerated types, restriction of pointers to sets of user-defined structures, and constant locations; and nsl, a polymorphic Algol with types as assignable values, routines as assignable values, lists, and type- and value-constant locations. The intermediate code for Algol R was based on an existing abstract machine. The code level was raised and the code designed to be used as the input to a code generator. Such a code generator was written, improving a technique called simulated evaluation. The language h was designed and a recursive-descent compiler written for it which produced an intermediate code similar in level to the previous one. Again a simulated-evaluation code generator was written, this time generating code for an interpreted abstract machine which implemented routines as assignable and storable values.
Finally the language nsl was designed. The compiler for it produced code for an orthogonal, very high-level, tagged-architecture abstract machine which was implemented by interpretation. This machine implemented polymorphism, assignable routine values and type- and value-constancy. Descriptions of the intermediate codes/abstract machines are given in appendices.
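An interpreted abstract machine of this kind, executing an intermediate code that sits above the level of the real hardware, can be illustrated with a tiny stack machine. This is an illustrative sketch; the opcodes below are invented for the example and do not correspond to any of the thesis's actual instruction sets.

```python
def run(code):
    """Interpret a list of (opcode, operand) pairs on a value stack."""
    stack = []
    pc = 0  # program counter into the intermediate code
    while pc < len(code):
        op, arg = code[pc]
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "JZ":
            # Conditional jump: branch if the popped value is zero.
            if stack.pop() == 0:
                pc = arg
                continue
        elif op == "HALT":
            break
        pc += 1
    return stack[-1] if stack else None
```

A compiler targeting such a machine only needs to emit these abstract instructions; porting the language then means reimplementing the small interpreter loop, not the compiler.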
Automated Design Simplification of Quantum Program Synthesis
Quantum computers are a new and emerging technology that promises to outperform classical machines. However, they differ from classical machines so much that they pose unique development challenges. Working on quantum machines is currently very difficult, requiring substantial expertise in many areas. In order to enable practical software engineering methods, this process will need to be greatly simplified. To provide this simplification, we identify automation methods and approaches that can perform steps of quantum program compilation, greatly reducing the need for human expertise.
The first contribution looks at integrating an existing classical method into the quantum model. This is done through the application of a Genetic Improvement algorithm. The second contribution looks at modelling the quantum compilation problem in a way compatible with a classical model. This is done through the generation of a Planning Domain Definition Language (PDDL) model. The third and final contribution looks at simplifying the building of a compilation stack. This is done by using a neural network to make decisions about what steps to add to the compilation stack.
The results show a set of automated methods that produce error rates competitive with standard quantum compilation methods. In addition, these methods require much less expertise about specific quantum hardware or the quantum compilation stack, and are built to be compatible with the current IBM Quantum Experience software stack.
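The first contribution, applying Genetic Improvement to quantum programs, can be caricatured as mutate-and-keep-if-better search over a gate list. This is a toy sketch under stated assumptions: the cost function (circuit length) and the single safe mutation (deleting an adjacent self-cancelling X pair) are invented for the example, and none of this reflects the thesis's actual algorithm or the IBM software stack.

```python
import random

def genetic_improve(circuit, cost, mutate, generations=200, seed=0):
    """Hill-climbing flavour of genetic improvement: keep mutants whose
    cost does not worsen. A full GI system would use a population and
    crossover; one mutant per generation keeps the sketch small."""
    rng = random.Random(seed)
    best, best_cost = circuit, cost(circuit)
    for _ in range(generations):
        candidate = mutate(best, rng)
        c = cost(candidate)
        if c <= best_cost:
            best, best_cost = candidate, c
    return best

# Toy setup: a "circuit" is a list of gate names, and cost is just length.
def toy_cost(circuit):
    return len(circuit)

def toy_mutate(circuit, rng):
    """Only deletes an adjacent X,X pair (X·X = identity), so every
    mutant is semantics-preserving by construction."""
    if len(circuit) > 1 and rng.random() < 0.5:
        i = rng.randrange(len(circuit) - 1)
        if circuit[i] == circuit[i + 1] == "X":
            return circuit[:i] + circuit[i + 2:]
    return circuit
```

Restricting mutation to provably equivalent rewrites sidesteps the hardest part of real GI on quantum circuits, checking that a mutant still computes the same unitary, which the full approach must handle.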
Virtualisation for Specialisation and Extension of Execution Environments
An application runtime is the set of software elements that represent an application during its execution. Application runtimes should be adaptable to different contexts; advances in computing technology, both in hardware and software, indeed demand it. For example, on one hand we can think about extending a programming language to enhance the developers' productivity. On the other hand, we can also think about transparently reducing the memory footprint of applications to make them fit in constrained resource scenarios, e.g., slow networks or limited memory availability. We propose Espell, a virtualization infrastructure for object-oriented high-level language runtimes. Espell provides a general-purpose infrastructure to control and manipulate object-oriented runtimes in different situations. A first-class representation of an object-oriented runtime provides a high-level API for the manipulation of such a runtime. A hypervisor uses this first-class object and manipulates it either directly or by executing arbitrary expressions in it. We show with our prototype that this infrastructure supports language bootstrapping and application runtime tailoring. Using bootstrapping we describe an object-oriented high-level language initialization in terms of itself. A bootstrapped language takes advantage of its own abstractions and is easier to extend. With application runtime tailoring we generate specialized applications by extracting the elements of a program that are used during execution. A tailored application encompasses only the classes and methods it needs and avoids the code bloat that arises from the use of third-party libraries and frameworks.
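Application runtime tailoring, keeping only the elements of a program that are actually used during execution, can be approximated in Python by tracing which functions run under a representative workload. This is a rough sketch of the idea only, using `sys.settrace`, and is unrelated to Espell's actual implementation.

```python
import sys

def used_functions(entry_point, *args):
    """Run entry_point under a trace and return the names of the
    functions that executed. A tailoring step would keep only these
    and drop the rest of the code base."""
    seen = set()

    def tracer(frame, event, arg):
        if event == "call":
            seen.add(frame.f_code.co_name)
        return tracer

    sys.settrace(tracer)
    try:
        entry_point(*args)
    finally:
        sys.settrace(None)  # always restore, even if the workload raises
    return seen
```

The caveat that applies equally to the real technique: the extracted set is only as complete as the workload that was traced, so code paths not exercised during tailoring will be missing from the specialized application.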