Multi-word unit processing in machine translation. Developing and using language resources for multi-word unit processing in machine translation
2011 - 2012XI n.s
Proceedings of the 17th Annual Conference of the European Association for Machine Translation
An empirical examination of interdisciplinary collaboration within the practice of localisation and development of international software
Acceptance on international markets is an important selling proposition for software products and a key to new markets. The adaptation of software products for specific markets is called software localisation. Practitioner reports and research suggest that the activities of developers and translators do not mesh seamlessly, leading to problems such as disproportionate cost, lack of quality, and delayed product release. Yet there is little research on localisation as a comprehensive activity and on its human factors.
This thesis examines how software localisation is handled in practice, how the localisation process is integrated into development, and how software developers and localisers work individually and collaboratively on international software. The research aims to understand what causes the localisation problems of cost, quality and time mentioned above. Qualitative and quantitative data were gathered through semi-structured interviews and an online survey. The interviews focused on the individual experiences of localisation and development professionals in a range of relevant roles. The online survey measured cultural competence, attitude towards and self-efficacy in localisation, and properties of localisation projects. Interviews were conducted and analysed following Straussian Grounded Theory. The survey was statistically analysed to test a number of hypotheses regarding differences between localisers and developers, as well as relationships between project properties and software quality.
Results suggest gaps in knowledge, procedure and motivation between developers and translators, as well as a lack of cross-disciplinary knowledge and coordination. Further, a grounded theory of interdisciplinary collaboration in software localisation explains how collaboration strategies and conflicts reciprocally affect each other and are affected by external influences. A number of statistically significant differences between developers and localisers were confirmed, as was the relevance of certain project properties to localisation. The findings give new insights into interdisciplinary issues in the development of international software and suggest new ways to handle interdisciplinary collaboration in general.
Tune your Brown clustering, please
Brown clustering, an unsupervised hierarchical clustering technique based on n-gram mutual information, has proven useful in many NLP applications. However, most uses of Brown clustering employ the same default configuration; the appropriateness of this configuration has gone predominantly unexplored. Accordingly, we present information for practitioners on the behaviour of Brown clustering in order to assist hyperparameter tuning, in the form of a theoretical model of Brown clustering utility. This model is then evaluated empirically in two sequence labelling tasks over two text types. We explore the dynamic between the input corpus size, the chosen number of classes, and the quality of the resulting clusters, which affects any approach using Brown clustering. In every scenario that we examine, our results reveal that the values most commonly used for the clustering are sub-optimal.
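As a rough illustration of the mutual-information quantity mentioned in the abstract above, the sketch below computes the average mutual information between adjacent classes for a fixed word-to-class assignment. It assumes the common class-bigram formulation, which the abstract does not spell out; the function name and the toy inputs are hypothetical, and this is not the authors' implementation or a full clustering algorithm.

```python
import math
from collections import Counter


def average_mutual_information(tokens, word_to_class):
    """Average mutual information between adjacent classes (in bits).

    Assumption: class-bigram formulation, i.e. the sum over class pairs of
    p(c1, c2) * log2(p(c1, c2) / (p_left(c1) * p_right(c2))).
    """
    classes = [word_to_class[w] for w in tokens]
    bigrams = Counter(zip(classes, classes[1:]))
    total = sum(bigrams.values())
    if total == 0:
        return 0.0

    # Marginal counts of a class appearing as the left / right bigram element.
    left, right = Counter(), Counter()
    for (c1, c2), n in bigrams.items():
        left[c1] += n
        right[c2] += n

    ami = 0.0
    for (c1, c2), n in bigrams.items():
        p_joint = n / total
        p_indep = (left[c1] / total) * (right[c2] / total)
        ami += p_joint * math.log2(p_joint / p_indep)
    return ami


# Hypothetical toy corpus: two alternating words.
tokens = "a b a b a b".split()
one_class = average_mutual_information(tokens, {"a": 0, "b": 0})
two_classes = average_mutual_information(tokens, {"a": 0, "b": 1})
```

With a single class the quantity is zero, because the class sequence carries no information; splitting the two alternating words into separate classes raises it. The trade-off between the number of classes and the quality of the resulting clusters is what the abstract argues should be tuned rather than left at a default.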
Statistical machine translation system and computational domain adaptation
Phrase-based statistical machine translation is one of the possible automatic machine translation
approaches. This work proposes methods for increasing the quality of machine translation
by adapting certain parameters in the statistical machine translation model. The idea was to
build phrase-based statistical machine translation systems for Croatian and English.
The systems were trained for two directions, on two domains, on parallel corpora of
different sizes and characteristics for the Croatian-English and English-Croatian language pairs,
after which the tuning procedure was conducted. Afterwards, hybrid systems which combine
features of both domains were investigated. The direct impact of domain adaptation
on the quality of automatic machine translation of Croatian was thereby explored, and the
new findings can be used for building new systems. Automatic and human evaluation of the
machine translations was carried out, and the results were compared with those
obtained from existing statistical machine translation web services.