Basque-to-Spanish and Spanish-to-Basque machine translation for the health domain
This Master's thesis presents the initial steps toward developing a machine translation system for the health domain between Basque and Spanish. In the absence of a sufficiently large bilingual corpus, several experiments were carried out to test different Neural Machine Translation parameters on an out-of-domain corpus, while performance on the health domain was evaluated with a manually translated corpus across systems trained with an increasing presence of health-domain corpora. The results obtained represent a first step toward the stated objective.
A method for estimation of elasticities in metabolic networks using steady state and dynamic metabolomics data and linlog kinetics
BACKGROUND: Dynamic modeling of metabolic reaction networks under in vivo conditions is a crucial step toward a better understanding of the (dis)functioning of living cells. So far, dynamic metabolic models have generally been based on mechanistic rate equations, which often contain so many parameters that their identifiability from experimental data is a serious problem. Recently, approximative rate equations based on the linear logarithmic (linlog) format have been proposed as a suitable alternative with fewer parameters.
RESULTS: In this paper we present a method for estimating the kinetic model parameters, which are equal to the elasticities defined in Metabolic Control Analysis, from metabolite data obtained from dynamic as well as steady-state perturbations, using the linlog kinetic format. Additionally, we address the question of parameter identifiability from dynamic perturbation data in the presence of noise. The method is illustrated using metabolite data generated with a dynamic model of the glycolytic pathway of Saccharomyces cerevisiae based on mechanistic rate equations. Elasticities are estimated from the generated data, which define the complete linlog kinetic model of the glycolysis. The effect of data noise on the accuracy of the estimated elasticities is presented. Finally, an identifiable subset of parameters is determined using information on the standard deviations of the estimated elasticities through Monte Carlo (MC) simulations.
CONCLUSION: The parameter estimation within the linlog kinetic framework as presented here allows the determination of the elasticities directly from experimental data from typical dynamic and/or steady-state experiments. These elasticities allow the reconstruction of the full kinetic model of Saccharomyces cerevisiae and the determination of the control coefficients. MC simulations revealed that certain elasticities are potentially unidentifiable from dynamic data only. The addition of steady-state perturbations of enzyme activities solved this problem.
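The core of the linlog format is that relative rates depend linearly on the logarithms of relative metabolite levels, so elasticities become ordinary regression coefficients. The sketch below illustrates this for a single reaction with one metabolite; the reference state, elasticity value, and noise level are illustrative assumptions, not data from the paper.

```python
import math
import random

# Linlog approximation for one reaction and one metabolite:
#   v / v0 = 1 + eps * ln(x / x0)
# where eps is the elasticity at the reference state (v0, x0).

def linlog_rate(x, v0, x0, eps):
    """Reaction rate under the linlog kinetic format."""
    return v0 * (1.0 + eps * math.log(x / x0))

def estimate_elasticity(samples, v0, x0):
    """Least-squares estimate of eps from (metabolite, rate) pairs.
    Model: v/v0 - 1 = eps * ln(x/x0), hence eps = sum(z*y)/sum(z*z)."""
    num = den = 0.0
    for x, v in samples:
        z = math.log(x / x0)
        y = v / v0 - 1.0
        num += z * y
        den += z * z
    return num / den

# Synthetic steady-state perturbation data with a known elasticity.
random.seed(0)
true_eps, v0, x0 = 0.8, 1.0, 1.0
data = []
for _ in range(200):
    x = random.uniform(0.5, 2.0)
    v = linlog_rate(x, v0, x0, true_eps) + random.gauss(0.0, 0.01)
    data.append((x, v))

eps_hat = estimate_elasticity(data, v0, x0)
```

With multiple metabolites the same idea becomes a multivariate linear regression, one coefficient per elasticity.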
When and Why Did Human Brains Decrease in Size? A New Change-Point Analysis and Insights From Brain Evolution in Ants
Human brain size nearly quadrupled in the six million years since Homo last shared a common ancestor with chimpanzees, but human brains are thought to have decreased in volume since the end of the last Ice Age. The timing and reason for this decrease are enigmatic. Here we use change-point analysis to estimate the timing of changes in the rate of hominin brain evolution. We find that hominin brains experienced positive rate changes at 2.1 and 1.5 million years ago, coincident with the early evolution of Homo and technological innovations evident in the archeological record. But we also find that human brain size reduction was surprisingly recent, occurring in the last 3,000 years. Our dating does not support hypotheses concerning brain size reduction as a by-product of body size reduction, a result of a shift to an agricultural diet, or a consequence of self-domestication. We suggest our analysis supports the hypothesis that the recent decrease in brain size may instead result from the externalization of knowledge and advantages of group-level decision-making due in part to the advent of social systems of distributed cognition and the storage and sharing of information. Humans live in social groups in which multiple brains contribute to the emergence of collective intelligence. Although difficult to study in the deep history of Homo, the impacts of group size, social organization, collective intelligence and other potential selective forces on brain evolution can be elucidated using ants as models. The remarkable ecological diversity of ants and their species richness encompasses forms convergent in aspects of human sociality, including large group size, agrarian life histories, division of labor, and collective cognition. Ants provide a wide range of social systems to generate and test hypotheses concerning brain size enlargement or reduction and aid in interpreting patterns of brain evolution identified in humans. Although humans and ants represent very different routes in social and cognitive evolution, the insights ants offer can broadly inform us of the selective forces that influence brain size.
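The change-point idea the abstract relies on can be reduced to a minimal form: find the index where splitting a series into two constant-mean segments best explains the data. The sketch below uses a synthetic series; real analyses of brain volume use time-calibrated data and richer rate models.

```python
# Minimal single change-point detection: pick the split index where the
# total within-segment squared error of two constant means is smallest.

def sse(seg):
    """Sum of squared deviations from the segment mean."""
    m = sum(seg) / len(seg)
    return sum((v - m) ** 2 for v in seg)

def change_point(series):
    """Return the split index k (1 <= k < len) minimizing total SSE."""
    best_k, best_cost = 1, float("inf")
    for k in range(1, len(series)):
        cost = sse(series[:k]) + sse(series[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k

# Synthetic series whose mean shifts from ~10 to ~13 at index 6.
data = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 13.1, 12.9, 13.0, 13.2]
k = change_point(data)
```

Detecting several change points, as in the hominin analysis, repeats this search recursively or fits all splits jointly with a penalty on their number.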
Online Learning in Neural Machine Translation
High-quality translations are in high demand these days. Although machine translation offers acceptable performance, it is not sufficient in some cases, and human supervision is required. To ease the human's translation task, machine translation systems take part in this process. When a sentence in the source language needs to be translated, it is fed to the system, which outputs a hypothesis translation. The human then corrects this hypothesis (a step also known as post-editing) in order to obtain a high-quality translation. Being able to transfer the knowledge that a human translator exhibits when post-editing a translation to the machine translation system is a desirable feature, as it has been proven that a more accurate machine translation system helps to increase the efficiency of the post-editing process.
Because the post-editing scenario requires an already trained system, online learning techniques are suited for this task. In this work, three online learning algorithms have been proposed and applied to a neural machine translation system in a post-editing scenario. They rely on the Passive-Aggressive online learning approach, in which the model is updated after every sample in order to fulfil a correctness criterion while remembering previously learned information. The goal is to adapt and refine an already trained system with new samples on the fly as the post-editing process takes place (hence, the update time must be kept under control).
Moreover, these new algorithms are compared with well-established online learning variants of the stochastic gradient descent algorithm. Results show improvements in the translation quality of the system after applying these algorithms, reducing human effort in the post-editing process.
Cebrián Chuliá, L. (2017). Aprendizaje en línea en traducción automática basada en redes neuronales. http://hdl.handle.net/10251/86299TFG
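The Passive-Aggressive scheme the abstract builds on is easiest to see in its classic linear-classifier form: after each sample the model stays put if a margin criterion is already met, and otherwise moves just enough to satisfy it. This sketch shows that update rule on a toy stream; it is the textbook PA algorithm, not the thesis's neural MT variant.

```python
# One Passive-Aggressive step for a linear classifier, labels y in {-1, +1}.
# "Passive": if the hinge loss is zero, leave the weights unchanged.
# "Aggressive": otherwise apply the smallest correction that zeroes the loss.

def pa_update(w, x, y):
    score = sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - y * score)           # hinge loss
    if loss == 0.0:
        return w                               # passive step
    tau = loss / sum(xi * xi for xi in x)      # minimal-correction step size
    return [wi + tau * y * xi for wi, xi in zip(w, x)]

# Online pass over a toy stream: positives have x[0] > 0.
stream = [([1.0, 0.2], 1), ([-1.0, 0.1], -1),
          ([0.8, -0.3], 1), ([-0.9, -0.2], -1)]
w = [0.0, 0.0]
for x, y in stream:
    w = pa_update(w, x, y)
```

In the post-editing setting the "sample" is the human-corrected translation, and the same one-sample-at-a-time discipline is what keeps the update time under control.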
Snowmass Theory Frontier Report
This report summarizes the recent progress and promising future directions in
theoretical high-energy physics (HEP) identified within the Theory Frontier of
the 2021 Snowmass Process.
Comment: Contribution to the US Community Study on the Future of Particle Physics (Snowmass 2021); v2: fixed typo.
Failure-awareness and dynamic adaptation in data scheduling
Over the years, scientific applications have become more complex and more data-intensive. In particular, large-scale simulations and scientific experiments in areas such as physics, biology, astronomy, and earth sciences demand highly distributed resources to satisfy excessive computational requirements. Increasing data requirements and the distributed nature of the resources have made I/O the major bottleneck for end-to-end application performance. Existing systems fail to address issues such as reliability, scalability, and efficiency in dealing with wide-area data access, retrieval, and processing. In this study, we explore data-intensive distributed computing and study the challenges of data placement in distributed environments. After analyzing different application scenarios, we develop new data scheduling methodologies and identify the key attributes for reliability, adaptability, and performance optimization of distributed data placement tasks. Inspired by techniques used in microprocessor and operating system architectures, we extend and adapt some of the known low-level data handling and optimization techniques to distributed computing. Two major contributions of this work are (i) a failure-aware data placement paradigm for increased fault tolerance, and (ii) adaptive scheduling of data placement tasks for improved end-to-end performance. The failure-aware data placement includes early error detection, error classification, and the use of this information in scheduling decisions for the prevention of and recovery from possible future errors. The adaptive scheduling approach includes dynamically tuning data transfer parameters over wide-area networks for efficient utilization of available network capacity and optimized end-to-end data transfer performance.
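"Dynamically tuning data transfer parameters" can be sketched as a hill-climbing loop over one such parameter, the number of parallel streams: keep moving in the direction that improves measured throughput, reverse otherwise. The throughput curve below is synthetic; the abstract's actual tuning strategy is not specified here, so this is only an illustration of the idea.

```python
# Adapt the parallel-stream count of a transfer toward higher measured
# throughput. `measure` stands in for probing the wide-area network.

def tune_streams(measure, start=1, max_streams=32, rounds=20):
    """Hill-climb the stream count; reverse direction when a step hurts."""
    streams, step = start, 1
    best = measure(streams)
    for _ in range(rounds):
        candidate = min(max(streams + step, 1), max_streams)
        rate = measure(candidate)
        if rate > best:
            streams, best = candidate, rate   # keep moving this direction
        else:
            step = -step                      # reverse direction
    return streams

# Synthetic concave throughput curve peaking at 8 streams: too few streams
# underuse the pipe, too many add contention.
def fake_throughput(n):
    return -(n - 8) ** 2

best_n = tune_streams(fake_throughput)
```

A real scheduler would re-run this probe as conditions change, since the optimum shifts with competing wide-area traffic.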
gbeta - a Language with Virtual Attributes, Block Structure, and Propagating, Dynamic Inheritance
A language design development process is presented which leads to a language, gbeta, with a tight integration of virtual classes, general block structure, and a multiple inheritance mechanism based on coarse-grained structural type equivalence. From this emerges the concept of propagating specialization. The power lies in the fact that a simple expression can have far-reaching but well-organized consequences, e.g., in one step causing the combination of families of classes, then by propagation the members of those families, and finally by propagation the methods of the members. Moreover, classes are first-class values which can be constructed at run-time, and it is possible to inherit from classes whether or not they are compile-time constants and whether or not they were created dynamically. It is also possible to change the class and structure of an existing object at run-time, preserving object identity. Even though such dynamism is normally not seen in statically type-checked languages, these constructs have been integrated without compromising the static type safety of the language.
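Changing an object's class at run-time while preserving its identity is unusual enough that a concrete analogue helps. Python allows a rough (dynamically typed) version by reassigning `__class__`; this sketch only illustrates the idea, without gbeta's static type safety or propagating specialization.

```python
# Rough Python analogue of gbeta's run-time class change: the object is
# specialized in place, so every existing reference sees the new behavior.

class Document:
    def describe(self):
        return "plain document"

class SignedDocument(Document):
    def describe(self):
        return "signed document"

doc = Document()
original_id = id(doc)

doc.__class__ = SignedDocument   # change the object's class in place
```

The object's identity is unchanged while its behavior is now that of the more specialized class, which is the effect the abstract describes; gbeta achieves this under static type checking, which Python does not attempt.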