The Use of Controlled Vocabularies and Structured Expressions in the Assurance of CPS
To date, work on the development of assurance cases has largely been concerned with the broad structure and content of arguments that contextualise the data. At a more detailed level, however, the use of natural language in an argument can lead to conflicting terminology, to difficulties in understanding the nature of the claims being made, or to logical inferences that are obscure to readers of the argument. The problem has become increasingly complex as more and more suppliers are involved in the development chain, making it more difficult to evaluate the strengths and weaknesses of assurance data or to re-use it. This paper explores the development of a controlled vocabulary and structured expressions for CPS in the automotive domain, using the Semantics of Business Vocabulary and Business Rules (SBVR) to improve communication and to provide some formal consistency checking of content. We highlight the challenges this work has exposed.
Keywords: safety, assurance, controlled language, SBVR, automotive
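The kind of consistency checking the paper describes can be illustrated with a small sketch. Everything below (the vocabulary, the synonym table, the claim terms) is a hypothetical example, not content from the paper; it only shows the flavour of validating argument terminology against a controlled vocabulary:

```python
# Hypothetical controlled vocabulary for an automotive assurance argument.
CONTROLLED_VOCAB = {"hazard", "brake_controller", "failure_rate"}

# Known synonyms that should be rewritten to the controlled term.
SYNONYMS = {"danger": "hazard", "braking_unit": "brake_controller"}

def check_claim(claim_terms):
    """Return (term, suggested_replacement) pairs for non-approved terms;
    the suggestion is None when no controlled equivalent is known."""
    issues = []
    for term in claim_terms:
        if term in CONTROLLED_VOCAB:
            continue
        issues.append((term, SYNONYMS.get(term)))
    return issues

issues = check_claim(["danger", "failure_rate", "latency"])
```

Here `issues` flags "danger" (with "hazard" as the controlled replacement) and "latency" (unknown term, no suggestion), while "failure_rate" passes unchanged.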
A methodological approach on the creation of trustful test suites for grammar error detection
Machine translation research has been expanding over time, and so has the need to
automatically detect and correct errors in texts. Unbabel combines machine translation
with human editors in post-edition to provide high-quality translations. To assist
post-editors in this task, Unbabel developed a proprietary error detection tool,
Smartcheck, to identify errors and suggest corrections.
The state-of-the-art method of identifying translation errors depends on curated annotated
texts (associated with error-type categories), which are fed to machine translation systems as
their evaluation standard, i.e. the test suites used to evaluate a system's error detection
accuracy. It is commonly assumed that evaluation sets are reliable and representative of the
content the systems translate, which leads to the assumption that the root problem usually lies
in the grammar-checking rules. However, the issue may instead lie in the quality of the
evaluation set itself; if so, decisions made on the basis of that evaluation may even have the
opposite effect to the one intended. It is therefore of utmost importance to have suitable
datasets with data representative of the structures each system needs, and Smartcheck is no
exception.
With this in mind, this dissertation developed and implemented a new methodology for
creating reliable, revised test suites to be applied in the evaluation of MT systems
and error detection tools. Using the resulting curated test suites to evaluate systems
and tools proprietary to Unbabel, it became possible to trust the conclusions and
decisions drawn from those evaluations. The methodology achieved robust identification
of problematic error types, grammar-checking rules, and language- and/or register-specific
issues, allowing production measures to be adopted. With Smartcheck's (now reliable and
accurate) correction suggestions and the resulting improvement in post-edition revision,
the work presented hereafter led to an improvement in the translation quality provided to
customers.

This work focused on evaluating the performance of Smartcheck, a proprietary Unbabel
tool for automatic error detection based on segments previously annotated by the
annotator community. A methodology was proposed for creating a test suite based on
reference (gold) data containing the relevant structures. This made it possible to
improve the quality of Smartcheck's error-correction suggestions and, consequently, of
the translations provided. Beyond the initial objective, the new methodology also
enabled a rigorous, appropriate, and well-founded evaluation of the rules Smartcheck
uses to identify possible translation errors, as well as of Unbabel's other tools and
machine translation systems. Lingo24 recently merged with Unbabel, so the data in the
corpus include content translated by both; the work presented here therefore also
contributed to the recent integration of Lingo24.
Section 2 presents Unbabel, describing the quality-control processes used to ensure the
required quality levels and giving a detailed description of the tool in focus,
Smartcheck. Section 3 covers the state of the art in Machine Translation and in
quality-control processes, paying particular attention to test suites and their
influence; it also describes the development of automatic error detection and
correction tools created to improve the output of machine translation.

The methodology, described in Section 4, was divided into three main parts: a pilot
evaluation of Smartcheck's pre-existing rules; a root-cause analysis of the errors;
and, finally, the construction of a new test suite with more recent, corrected data.
The first step of the methodology was to evaluate the performance of the tool under
study. A pilot analysis was carried out in which each rule used by Smartcheck was
assessed with the metrics commonly applied to error detection systems: the number of
true positives (cases in which the system correctly identified an error), false
negatives (cases in which an error existed but the system did not identify it), and
false positives (cases in which the system incorrectly flagged an error). Precision,
Recall, and F1-score were then computed from these counts. The pilot evaluation showed
that not all rules could be evaluated (making it impossible to assess each rule's
individual performance), and the results for the rules that were evaluated were
unsatisfactory: they missed errors present in the translations and flagged countless
grammatically correct segments as problematic.
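The per-rule metrics follow the standard definitions; a minimal sketch, with hypothetical counts for one rule (the dissertation does not report per-rule numbers here):

```python
def prf1(tp, fp, fn):
    """Precision, Recall and F1-score from raw error-detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical counts: the rule fired 20 times, was right 8 times,
# and missed 24 real errors.
p, r, f1 = prf1(tp=8, fp=12, fn=24)
```

The guards against zero denominators matter in practice: a rule that never fires would otherwise divide by zero.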
The second stage of the methodology set out to identify possible reasons for the poor
performance of Smartcheck and its rules. We hypothesised that the rules had been
evaluated against an unsuitable, outdated test suite, which would explain the very low
metrics of the pilot evaluation. The hypothesis arose both because the corpus data
might not be representative of the translations currently produced and because the
structures that are problematic for translation systems change constantly. To test it,
the corpus was analysed against several criteria: the type of translation in the data
(whether or not the segments had been revised by post-editors before submission); the
presence of duplicated segments or segments whose source text might contain errors
(i.e. noisy data); and a review of the annotations and severities assigned to each
error according to Unbabel's typologies and guidelines (counting correctly assigned,
incorrectly assigned, and missing annotations and severities). The analysis showed
that about 20% of the data were duplicates (in both the formal and the informal
register), that 15-25% of the annotations were incorrect, and that only half of the
severities had been correctly assigned. We therefore judged it more advantageous to
build a new, representative, refined corpus than to correct every incorrect annotation
in the corpus previously used.
The third and final step of the methodology was the construction of a new test suite of
27,500 previously annotated machine translation examples. Creating it involved:
filtering a set of machine translations so that the data were representative of every
language supported by Unbabel; distinguishing context-dependent from context-independent
segments (a limitation of the previous corpus); excluding duplicated examples and cases
with problematic source texts; and, finally, a review by linguists and translators of
the assigned annotations, following proprietary typologies. This last procedure was
subdivided into: a general evaluation, to guarantee that the translations conveyed the
message of the source text coherently, fluently, and appropriately, and followed
language-specific rules; an evaluation focused on client-specific requirements, to
ensure existing guidelines were met; and a review of the severities associated with
each annotation.
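The filtering steps above can be sketched as follows; the segment fields (`source`, `target`, `noisy_source`) are assumed names for illustration, not Unbabel's actual schema:

```python
def build_test_suite(segments):
    """Keep one copy of each (source, target) pair and drop segments
    whose source text is flagged as noisy."""
    seen, suite = set(), []
    for seg in segments:
        key = (seg["source"], seg["target"])
        if key in seen:              # exclude duplicated examples
            continue
        if seg.get("noisy_source"):  # exclude problematic source texts
            continue
        seen.add(key)
        suite.append(seg)
    return suite

raw = [
    {"source": "Obrigado", "target": "Thank you", "noisy_source": False},
    {"source": "Obrigado", "target": "Thank you", "noisy_source": False},  # duplicate
    {"source": "a#@!b", "target": "a#@!b", "noisy_source": True},          # noisy source
]
suite = build_test_suite(raw)
```

Only the first segment survives; the duplicate and the noisy-source case are discarded, mirroring the corpus-construction criteria described above.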
With the methodology complete, the test suite now constituted a trustworthy dataset,
capable of evaluating machine translation systems and tools such as Smartcheck in an
objective, well-founded way. The evaluations described in Section 5 therefore used the
corpus as the standard of comparison. The first evaluation compared the results of the
pilot analysis of Smartcheck's rules with the results of re-evaluating those rules on
the new test suite, in order to reach more reliable and credible conclusions. It showed
that, contrary to the earlier findings, every rule could now be evaluated, and that the
number of cases in which Smartcheck incorrectly flagged segments as problematic had
been reduced. The next evaluation compared annotations by means of a confusion matrix
between the predictions of Smartcheck and those of the test suite, identifying which
error types were most frequent and which were most (and least) difficult for the system
to identify. Taking the test suite as the gold standard, a global evaluation of
Smartcheck counted roughly 45% false positives, 35% false negatives, and approximately
20% true positives. The true positives were split into two types: segments correctly
identified by Smartcheck as errors but incorrectly classified (about 11%), and errors
whose span and classification were both assigned correctly (around 8% of the total
number of annotations). The third and final analysis used the totals from the previous
evaluation to compute Precision, Recall, and F1-score for each supported language and
register. Precision was fairly balanced across registers on average, but Recall and
F1-score were not, with the formal register reaching higher values. We also used the
corpus to evaluate the spell checkers used by Unbabel: the spell checker then in use
obtained the lowest score, so it was replaced by the best-scoring one in order to
reduce the number of errors in the translations and thus improve their quality.
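Using the reported shares of annotations (roughly 45% false positives, 35% false negatives, 20% true positives), the global metrics can be recomputed directly:

```python
# Shares of total annotations reported in the global evaluation.
fp, fn, tp = 0.45, 0.35, 0.20

precision = tp / (tp + fp)                           # ~0.31
recall = tp / (tp + fn)                              # ~0.36
f1 = 2 * precision * recall / (precision + recall)   # ~0.33
```

Working from shares rather than raw counts is equivalent here, since the common total cancels out of every ratio.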
All of this work could be applied in several fields beyond the one initially
established, i.e. beyond the systematic evaluation of Smartcheck, demonstrating the
impact that a well-founded analysis can have on decision-making. Without a
representative, structured test suite, the evaluations performed would not be valid,
and the results obtained could easily lead to conclusions that are inappropriate, or
even harmful, to the development of the systems and tools in question.
Generative AI in the Construction Industry: A State-of-the-art Analysis
The construction industry is a vital sector of the global economy, but it
faces many productivity challenges in various processes, such as design,
planning, procurement, inspection, and maintenance. Generative artificial
intelligence (AI), which can create novel and realistic data or content, such
as text, image, video, or code, based on some input or prior knowledge, offers
innovative and disruptive solutions to address these challenges. However, there
is a gap in the literature on the current state, opportunities, and challenges
of generative AI in the construction industry. This study aims to fill this gap
by providing a state-of-the-art analysis of generative AI in construction, with
three objectives: (1) to review and categorize the existing and emerging
generative AI opportunities and challenges in the construction industry; (2) to
propose a framework for construction firms to build customized generative AI
solutions using their own data, comprising steps such as data collection,
dataset curation, training a custom large language model (LLM), model evaluation,
and deployment; and (3) to demonstrate the framework via a case study of
developing a generative model for querying contract documents. The results show
that retrieval-augmented generation (RAG) improves the baseline LLM by 5.2%,
9.4%, and 4.8% in terms of quality, relevance, and reproducibility,
respectively. This study provides academics and construction professionals with
a comprehensive analysis and practical framework to guide the adoption of
generative AI techniques to enhance productivity, quality, safety, and
sustainability across the construction industry.
Comment: 74 pages, 11 figures, 20 tables
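The RAG step of such a framework can be sketched as follows. The word-overlap retrieval and the prompt format are simplifying assumptions standing in for embedding-based retrieval and a real LLM call, not the paper's implementation:

```python
def words(s):
    """Crude tokenizer: lowercase and strip basic punctuation."""
    return set(s.lower().replace("?", " ").replace(".", " ").split())

def retrieve(query, clauses, k=2):
    """Rank contract clauses by word overlap with the query
    (a stand-in for embedding-based similarity search)."""
    return sorted(clauses,
                  key=lambda c: len(words(query) & words(c)),
                  reverse=True)[:k]

def build_prompt(query, clauses):
    """Prepend the retrieved clauses to the question, RAG-style;
    the resulting prompt would then be sent to the LLM."""
    context = "\n".join(retrieve(query, clauses))
    return f"Context:\n{context}\n\nQuestion: {query}"

clauses = [
    "Payment is due within 30 days of invoice.",
    "The contractor shall maintain liability insurance.",
    "Retention of 5% applies until practical completion.",
]
prompt = build_prompt("When is payment due?", clauses)
```

Grounding the prompt in retrieved clauses is what lets the model answer from the contract rather than from its training data, which is the mechanism behind the quality and relevance gains the study reports.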
Latent Print Examination and Human Factors: Improving the Practice Through a Systems Approach: The Report of the Expert Working Group on Human Factors in Latent Print Analysis
Fingerprints have provided a valuable method of personal identification in forensic science and criminal investigations for more than 100 years. Fingerprints left at crime scenes generally are latent prints: unintentional reproductions of the arrangement of ridges on the skin made by the transfer of materials (such as amino acids, proteins, polypeptides, and salts) to a surface. Palms and the soles of feet also have friction ridge skin that can leave latent prints. The examination of a latent print consists of a series of steps involving a comparison of the latent print to a known (or exemplar) print. Courts have accepted latent print evidence for the past century. However, several high-profile cases in the United States and abroad have highlighted the fact that human errors can occur, and litigation and expressions of concern over the evidentiary reliability of latent print examinations and other forensic identification procedures have increased in the last decade.
"Human factors" issues can arise in any experience- and judgment-based analytical process such as latent print examination. Inadequate training, extraneous knowledge about the suspects in the case or other matters, poor judgment, health problems, limitations of vision, complex technology, and stress are but a few factors that can contribute to errors. A lack of standards or quality control, poor management, insufficient resources, and substandard working conditions constitute other potentially contributing factors.
Roberto Gerhard's Sound Compositions: A Historical-Philological Perspective. Archive, Process, Intent and Re-enactment
This research advances the current state of knowledge in the field of early tape music both empirically and methodologically. The purpose of this study is to evaluate the impact that the electronic medium exerted on the musical thinking of Roberto Gerhard, one of the most outspoken, prolific and influential composers in the Spanish diaspora, whose musical legacy, for the most part unknown, is a major landmark in the early history of electroacoustic music. Gerhard's personal tape collection, one of the largest historical archives of its kind reported in the literature, is exceptional for both its antiquity (50+-year-old tapes) and its abundance of production materials. Through the digitisation and analysis of the composer's tape collection, this research argues that the empirical study of audio documents sets out a basis for a broader understanding of textual processes. More specifically, the research demonstrates that the reconstruction of works based on magnetic tape sketches is a powerful method to advance the understanding of early tape music. This research also examines Gerhard's sound compositions in relation to the post-war context in which they were composed. Finally, this research presents performance documentation that proposes an approach to the electroacoustic music repertoire in which creativity is not at odds with rigor and critical discernment, demonstrating that archival study can be closely aligned with the concept of re-enactment.
Smart Tech is all Around us: Bridging Employee Vulnerability with Organizational Active Trust-Building
Public and academic opinion remains divided regarding the benefits and pitfalls of datafication technology in organizations, particularly regarding their impact on employees. Taking a dual-process perspective on trust, we propose that datafication technology can create small, erratic surprises in the workplace that highlight employee vulnerability and increase employees' reliance on the systematic processing of trust. We argue that these surprises precipitate a phase in the employment relationship in which employees more actively weigh trust-related cues, and the employer should therefore engage in active trust management to protect and strengthen the relationship. Our paper develops a framework of symbolic and substantive strategies to guide organizations' active trust management efforts to (re-)create situational normality, root goodwill intentions, and enable a more balanced interdependence between the organization and its employees. We discuss the implications of our paper for reconciling competing narratives about the future of work and for developing an understanding of trust processes.
Principles of Security and Trust
This open access book constitutes the proceedings of the 8th International Conference on Principles of Security and Trust, POST 2019, which took place in Prague, Czech Republic, in April 2019, held as part of the European Joint Conference on Theory and Practice of Software, ETAPS 2019. The 10 papers presented in this volume were carefully reviewed and selected from 27 submissions. They deal with theoretical and foundational aspects of security and trust, including new theoretical results, practical applications of existing foundational ideas, and innovative approaches stimulated by pressing practical problems.
Proceedings of the International Workshop on EuroPLOT Persuasive Technology for Learning, Education and Teaching (IWEPLET 2013)
This book contains the proceedings of the International Workshop on EuroPLOT Persuasive Technology for Learning, Education and Teaching (IWEPLET 2013), which was held on 16-17 September 2013 in Paphos (Cyprus) in conjunction with the EC-TEL conference. The workshop, and hence the proceedings, are divided in two parts: on Day 1 the EuroPLOT project and its results are introduced, with papers about the specific case studies and their evaluation. On Day 2, peer-reviewed papers are presented which address specific topics and issues going beyond the EuroPLOT scope. This workshop is one of the deliverables (D 2.6) of the EuroPLOT project, which was funded from November 2010 to October 2013 by the Education, Audiovisual and Culture Executive Agency (EACEA) of the European Commission through the Lifelong Learning Programme (LLP) by grant #511633. The purpose of this project was to develop and evaluate Persuasive Learning Objects and Technologies (PLOTS), based on ideas of BJ Fogg. The purpose of this workshop is to summarize the findings obtained during this project and disseminate them to an interested audience. Furthermore, it shall foster discussions about the future of persuasive technology and design in the context of learning, education and teaching. The international community working in this area of research is relatively small. Nevertheless, we have received a number of high-quality submissions which went through a peer-review process before being selected for presentation and publication. We hope that the information found in this book is useful to the reader and that more interest in this novel approach of persuasive design for teaching/education/learning is stimulated. We are very grateful to the organisers of EC-TEL 2013 for allowing us to host IWEPLET 2013 within their organisational facilities, which helped us a lot in preparing this event.
I am also very grateful to everyone in the EuroPLOT team for collaborating so effectively in these three years towards creating excellent outputs, and for being such a nice group with a very positive spirit also beyond work. And finally I would like to thank the EACEA for providing the financial resources for the EuroPLOT project and for being very helpful when needed. This funding made it possible to organise the IWEPLET workshop without charging a fee from the participants.
Music Composed For Calm And Catharsis Using A Compositional Toolkit For Emotional Evocation - Inspired By And Directed Towards Healthcare Contexts And Self-Managed Wellness
Emotional experience through music listening is universal. In the age of COVID-19 and an ever more mentally burdened population, music that encourages calm and/or catharsis is more relevant than ever (Gallagher et al., 2020).
As composers, can we form a framework for creating music that pointedly evokes an intentional emotion?
This dissertation seeks to build on the solid foundation of past theories from music and emotion researchers, and to demonstrate how to further utilise the power that music has both in our everyday lives and in healthcare settings, providing a large suite of music for use for calm and catharsis, and a Compositional Toolbox for Emotional Evocation that composers might use to effect positive emotional change.
In two pilot studies, one with children and one with adults, this dissertation tests music written using the Toolbox to observe its effect on arousal and pleasure.
The studies also utilise visuals as a secondary means of sensory control, and investigate whether the multisensory application of music and visuals enhances emotional evocation relative to either in isolation.
Participants rated, on a Likert-type scale, how they thought each sample would make someone feel, or how it made them feel. An analysis of pieces from these studies is included in this dissertation. Mixed-method, deductive, and thematic analysis was applied to the data, which were collected via surveys and interviews.
It was found that music using the Toolbox was more emotionally evocative, more calming, and happier overall than music written without it. Most of the pieces achieved their emotional aims, and positive correlations emerged between the combined use of music and visuals. In one of the studies, music without visuals appeared calmer than music with visuals. This dissertation thus begins to promote the Compositional Toolbox for Emotional Evocation as a framework for emotional composition.
- âŠ