Personalizacija procesa elektronskog učenja primenom sistema za generisanje preporuka zasnovanog na tehnikama kolaborativnog tagovanja (Personalization of the e-learning process using a recommender system based on collaborative tagging techniques)
The research topic is the personalization of e-learning systems based on collaborative tagging techniques integrated into a recommender system. Collaborative tagging systems allow users to upload resources and to label them with arbitrary keywords or phrases, so-called tags; such systems can be distinguished by the kinds of resources they support. Besides helping users organize their personal collections, a tag can also be regarded as an expression of a user's personal opinion. The growing number of users providing information about themselves through social tagging has led to tag-based profiling approaches, which assume that users expose their preferences for certain content through their tag assignments; tagging information can therefore be used to generate recommendations. The dissertation aims to analyze and define an enhanced model for selecting the tags that reveal the preferences and characteristics of users needed to generate personalized recommendations. The use of the resulting models for personalizing tutoring systems was also considered: a personalized tutoring system offers the learner optimal navigation paths and adequate learning activities based on the learner's characteristics, learning style, and domain knowledge, as well as on the previous experience of users with similar characteristics. The models defined in the dissertation were evaluated in practice on a tutoring system for teaching the Java programming language.
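The tag-based profiling idea described in the abstract can be illustrated with a minimal sketch (the data, function names, and similarity measure below are hypothetical illustrations, not the dissertation's actual model): a user profile is built from tag frequencies, and resources whose tag sets are most similar to the profile are recommended.

```python
from collections import Counter
from math import sqrt

def tag_profile(assignments):
    """Build a user profile as tag frequencies from (user, resource, tag) triples."""
    return Counter(tag for _, _, tag in assignments)

def cosine(p, q):
    """Cosine similarity between two sparse tag-frequency vectors."""
    dot = sum(p[t] * q[t] for t in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def recommend(profile, resources, k=2):
    """Rank resources by similarity of their tag sets to the user's tag profile."""
    scored = [(cosine(profile, Counter(tags)), name) for name, tags in resources.items()]
    return [name for score, name in sorted(scored, reverse=True)[:k] if score > 0]

# Hypothetical tagging data for a Java e-learning scenario
assignments = [("u1", "r1", "java"), ("u1", "r2", "loops"), ("u1", "r3", "java")]
resources = {"intro-java": ["java", "basics"], "python-oop": ["python", "classes"],
             "java-loops": ["java", "loops"]}
profile = tag_profile(assignments)
print(recommend(profile, resources))  # resources sharing the user's tags rank first
```

The dissertation's contribution lies in *which* tags enter the profile; this sketch only shows the downstream recommendation step once a profile exists.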
Italiano controlado para tradução automática: italiano-português (Controlled Italian for machine translation: Italian-Portuguese)
The importance of Machine Translation (MT) lies in the need for speed in communication, which is in constant evolution. In recent decades, machine translation has become an indispensable tool for an ever-growing number of people, and research in this field aims at creating high-performance MT systems, in terms of both efficiency and quality. Nevertheless, since a machine cannot fully replace the human translator, translation output may be inadequate or ungrammatical. However, through the use of Controlled Languages (CL), that is, the application of a set of linguistic restrictions to the input texts, better-quality results can be obtained in the output text.
The present work aims to establish a set of well-defined syntactic restrictions, presented in the form of declarative rules, for the Italian-Portuguese language pair, with Italian as the input language and Portuguese as the output language. When applied to Italian, these restrictions yield better translation results in Portuguese. The linguistic topics under analysis are mood, modality, and aspect. The reference machine translation system is SYSTRANet, freely available online. The controlled language developed in this work is a Machine-Oriented Controlled Language (MOCL), since the modification of linguistic elements depends exclusively on the acceptability of the translation result.
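Declarative controlled-language restrictions are typically implemented as pre-editing checks applied to the source text before it reaches the MT engine. The sketch below is purely illustrative (the rules, patterns, and thresholds are hypothetical, not the thesis's actual Italian restrictions): each rule is a named pattern, and a sentence is accepted only if no rule fires.

```python
import re

# Hypothetical controlled-language rules for Italian input (illustrative only):
# each rule is (name, compiled pattern); a sentence passes if no pattern matches.
RULES = [
    ("no_synthetic_future", re.compile(r"\b\w+r[òà]\b")),   # avoid -rò/-rà future forms
    ("max_20_words", re.compile(r"^(?:\S+\s+){20,}")),      # keep input sentences short
    ("no_gerund", re.compile(r"\b\w+[ae]ndo\b")),           # avoid gerund (aspect ambiguity)
]

def check(sentence):
    """Return the names of the controlled-language rules the sentence violates."""
    return [name for name, pattern in RULES if pattern.search(sentence)]

print(check("Il ragazzo sta mangiando una mela."))  # ['no_gerund']
print(check("Il ragazzo mangia una mela."))         # []
```

A real MOCL workflow would pair each restriction with a suggested rewrite and validate the rule set empirically against the MT system's output quality.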
Learning regulatory compliance data for data governance in financial services industry by machine learning models
While regulatory compliance data has been governed in the financial services industry for a long time to identify, assess, remediate and prevent risks, improving data governance (“DG”) has emerged as a new paradigm that uses machine learning models to enhance the level of data management.
In the literature, there is a research gap. Machine learning models have not been extensively applied to DG processes by a) predicting data quality (“DQ”) in supervised learning and taking temporal sequences and correlations of data noise into account in DQ prediction; b) predicting DQ in unsupervised learning and learning the importance of data noise jointly with temporal sequences and correlations of data noise in DQ prediction; c) analyzing DQ prediction at a granular level; d) measuring network run-time saving in DQ prediction; and e) predicting information security compliance levels.
Our main research focus is whether our ML models accurately predict DQ and information security compliance levels during DG processes of financial institutions by learning regulatory compliance data from both theoretical and experimental perspectives.
We propose five machine learning models including a) a DQ prediction sequential learning model in supervised learning; b) a DQ prediction sequential learning model with an attention mechanism in unsupervised learning; c) a DQ prediction analytical model; d) a DQ prediction network efficiency improvement model; and e) an information security compliance prediction model.
Experimental results demonstrate the effectiveness of these models by accurately predicting DQ in supervised learning, precisely predicting DQ in unsupervised learning, analyzing DQ prediction by divergent dimensions such as risk types and business segments, saving significant network run-time in DQ prediction for improving the network efficiency, and accurately predicting information security compliance levels.
Our models strengthen the DG capabilities of financial institutions by improving DQ, data risk management, bank-wide risk management, and information security based on regulatory requirements in the financial services industry, including the Basel Committee on Banking Supervision Standard Number 239, the Australian Prudential Regulation Authority ("APRA") Prudential Practice Guide CPG 235, and APRA Prudential Practice Guide CPG 234. These models form part of DG programs under the DG framework of financial institutions.
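Although the abstract does not give model internals, the core idea of sequence-aware data-quality (DQ) prediction can be sketched with a toy example (the features, weighting scheme, and threshold are hypothetical illustrations, not the thesis's models): data-noise observations from successive periods are combined with exponentially decaying weights, so temporally recent noise counts more when predicting a DQ breach.

```python
# Toy sketch of sequence-aware DQ prediction (illustrative only).
# Each element of the history is the data-noise rate observed in one period;
# we predict whether next-period DQ falls below an acceptable level using an
# exponentially weighted recent window, so recent noise dominates the score.

def predict_dq_breach(noise_rates, window=3, decay=0.5, threshold=0.1):
    """Return True if the weighted recent noise suggests a DQ breach next period."""
    recent = noise_rates[-window:]
    # Older observations in the window get geometrically smaller weights.
    weights = [decay ** (len(recent) - 1 - i) for i in range(len(recent))]
    score = sum(w * r for w, r in zip(weights, recent)) / sum(weights)
    return score > threshold

history = [0.02, 0.03, 0.05, 0.12, 0.18]    # noise rising in recent periods
print(predict_dq_breach(history))            # True: the recent spike dominates
print(predict_dq_breach([0.02, 0.02, 0.03]))
```

The thesis's actual models (sequential learners, attention over data noise) generalize this idea: instead of fixed decay weights, the weighting is learned from the temporal correlations in the compliance data.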
Portabilidad de aplicaciones en Astrofísica a la infraestructura de computación Grid (Porting astrophysics applications to the Grid computing infrastructure)
Doctoral thesis, University of Granada. Official Postgraduate Programme in Computer and Network Engineering (P38.56.1).
The importance of computing in problem solving is well known, whether in simulations (of physical models, environments, statistics, etc.), image processing, data compression, storage, or data analysis. This is even more evident in science.
The need for high-performance computing and data storage in the scientific community has driven the search for new computational solutions. For example, biology [CAR14] and high-energy physics [DAG12] are two scientific disciplines where high-performance computing is increasingly necessary.
This work falls within the field of astrophysics, where the exponential growth of observational data makes distributed computing practically a requirement for the optimal interpretation of those data within reasonable times. Specifically, this thesis covers several fields of astronomy, from stellar [GAR13] and planetary physics [DAB14] to galaxy analysis [PER13].
This thesis focuses on the use of distributed infrastructures to coordinate resources that cannot be managed under centralized control [FOS02]. In particular, we work with the Grid infrastructure, which provides not only greater computational and storage resources, but also availability and reliability for critical files.
Objectives
The main objective of this doctoral thesis is the development of a methodology that enables the efficient use of distributed platforms in various astrophysical disciplines. This efficiency is sought by optimizing the three steps that constitute the porting of an application to a distributed computing infrastructure:
- Analysis of the suitability of the distributed computing infrastructure.
- Parallelization of the problem.
- Management of the execution on a distributed computing platform.
A further objective is the development of a robust, versatile, and modular tool for managing computation on distributed platforms, guaranteeing not only the portability of applications (in this case astrophysical ones) but also their use by non-expert users and their integration into scientific software packages.
Finally, the thesis analyses the performance of the methodology as applied to specific scientific cases.
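The second and third steps above (parallelizing the problem and managing distributed execution) can be illustrated, far below Grid scale, with a local task-farm sketch: independent parameter sets are distributed across workers and the results are gathered. The function names and workload are hypothetical; GSG itself drives real Grid middleware, which this sketch does not touch.

```python
from concurrent.futures import ThreadPoolExecutor

def simulate(params):
    """Stand-in for one independent astrophysics job (hypothetical workload)."""
    star_id, steps = params
    return star_id, sum(i * i for i in range(steps))  # placeholder computation

def run_task_farm(param_grid, max_workers=4):
    """Distribute independent jobs across workers and gather the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(simulate, param_grid))

# Hypothetical parameter grid: one entry per stellar model to compute
grid = [("star-a", 1000), ("star-b", 2000), ("star-c", 500)]
results = run_task_farm(grid)
print(sorted(results))  # ['star-a', 'star-b', 'star-c']
```

The task-farm pattern works precisely because the jobs share no state, which is the same property that makes an application a good candidate for Grid porting in the first place.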
Conclusions
1. We have developed a methodology based on optimizing the elements involved in porting tools to distributed computing environments, including computation management. The search for an existing methodology providing the three optimizations set out as the objective of this thesis was unsuccessful, so it was necessary to create a novel, coherent, self-consistent methodology focused on scientific applications.
2. We have developed a toolkit for using the Grid, called GSG, which improves on the standard Grid middleware in usability, monitoring, status information, security, task-distribution optimization, and job submission.
3. Thanks to GSG, we have minimized management time in distributed environments, for both machine and user. We developed GSG with a modular structure ready for integration into applications. As a result, scientific users have benefited from the increased usability of the Grid environment, and scientific productivity has grown.
4. We have applied the methodology to a sample of astrophysical applications for which the use of distributed platforms is essential. These applications cover a very diverse and complementary range of computational and storage requirements, which has allowed us to better evaluate the efficiency of the method.
5. For each of the applications adapted to the Grid, we have produced a specific solution that significantly reduced computation time and increased storage capacity in every case studied, in some cases multiplying the available resources. None of the scientific studies described in this thesis could have been carried out without this methodology, given their high computational requirements.
6. Several comparative analyses have been carried out between a dedicated non-distributed server and a distributed Grid structure during the execution of applications with different computational and storage needs. Applying a coherent and self-consistent methodology maximizes the use of distributed environments, approaching in every case analysed the upper bound on improvement imposed by the infrastructure used.
7. We have developed a server for asteroseismological applications, named ATILA, for the processing and execution of this type of application. This server incorporates features such as data-flow execution models, the generation of physical profiles, the generation of asteroseismological models, data classification, and the use of multiple distributed infrastructures. In addition, we have integrated the GSG toolkit into the ATILA application server, making the GSG functionality automatically available within the ATILA environment.
8. As a consequence of this development, integrating GSG has improved both the usability and the execution time of the ATILA application server. This conclusion is supported by comparative experiments between ATILA with GSG integrated and the independent use of ATILA and GSG.
Bibliography
[CAR14] Carapito, C. et al. "MSDA, a proteomics software suite for in-depth Mass Spectrometry Data Analysis using grid computing". Proteomics, 14(9):1014-1019. doi: 10.1002/pmic.201300415. 2014.
[DAB14] Dabrowska, D.D., Rodón, J.R. et al. "Scattering matrices of Martian dust analogs at 488 nm and 647 nm". A&A, 2014.
[DAG12] Adamová, D., Saiz, P. "Grid Computing in High Energy Physics Experiments". INTECH Open Access Publisher, 2012.
[FOS02] Foster, I. "The Grid: A new infrastructure for 21st century science". Physics Today, Vol. 55, Issue 2, pp. 42-47. 2002.
[GAR13] García Hernández, A., Rodón, J.R. et al. "An in-depth study of HD 174966 with CoRoT photometry and HARPS spectroscopy. Large separation as a new observable for δ Scuti stars". Astronomy & Astrophysics, Vol. 559. 2013.
[PER13] Pérez, E., Rodón, J.R. et al. "The Evolution of Galaxies Resolved in Space and Time: A View of Inside-out Growth from the CALIFA Survey". Astrophysical Journal Letters, Vol. 764, Issue 1, Article L1. 2013.
On the integration of linguistic features into statistical and neural machine translation
Recent years have seen an increased interest in machine translation technologies and applications due to a growing need to overcome language barriers in many sectors. New machine translation technologies are emerging rapidly and, with them, bold claims of achieving human parity, such as: (i) the results produced approach the "accuracy achieved by average bilingual human translators [on some test sets]" (Wu et al., 2017b) or (ii) the "translation quality is at human parity when compared to professional human translators" (Hassan et al., 2018), have seen the light of day (Läubli et al., 2018). Aside from the fact that many of these papers craft their own definition of human parity, these sensational claims are often not supported by a complete analysis of all the aspects involved in translation.
Establishing the discrepancies between the strengths of statistical approaches to machine translation and the way humans translate has been the starting point of our research. By looking at machine translation output and linguistic theory, we were able to identify some remaining issues. The problems range from simple number and gender agreement errors to more complex phenomena such as the correct translation of aspectual values and tenses. Our experiments confirm, along with other studies (Bentivogli et al., 2016), that neural machine translation has surpassed statistical machine translation in many aspects. However, some problems remain and others have emerged. We cover a series of problems related to the integration of specific linguistic features into statistical and neural machine translation, aiming to analyse and provide a solution to some of them.
Our work addresses three main research questions that revolve around the complex relationship between linguistics and machine translation in general. Taking linguistic theory as a starting point, we examine to what extent that theory is reflected in current systems. We identify linguistic information that automatic translation systems lack in order to produce more accurate translations, and we integrate additional features into the existing pipelines. Finally, we identify overgeneralization, or 'algorithmic bias', as a potential drawback of neural machine translation and link it to many of the remaining linguistic issues.
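One common way to integrate linguistic features into MT pipelines, shown here purely as an illustration of the general idea rather than as this thesis's specific method, is factored input: each source token is concatenated with feature "factors" such as part-of-speech or morphological tags before training. The data and separator below are hypothetical.

```python
def attach_factors(tokens, features, sep="|"):
    """Combine each token with its linguistic feature tags into a factored token.

    tokens   -- list of source words
    features -- list of per-token feature lists (e.g. POS, gender), same length
    """
    if len(tokens) != len(features):
        raise ValueError("one feature list per token is required")
    return [sep.join([tok, *feats]) for tok, feats in zip(tokens, features)]

# Hypothetical annotated French source sentence (POS and gender factors)
tokens = ["la", "maison", "verte"]
features = [["DET", "fem"], ["NOUN", "fem"], ["ADJ", "fem"]]
print(attach_factors(tokens, features))
# ['la|DET|fem', 'maison|NOUN|fem', 'verte|ADJ|fem']
```

Making agreement features explicit in the input gives the model direct evidence for phenomena such as gender and number agreement, precisely the error classes identified above.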