4,679 research outputs found

    Dimensionality reduction methods for machine translation quality estimation

    The final publication is available at Springer via http://dx.doi.org/10.1007/s10590-013-9139-3
    [EN] Quality estimation (QE) for machine translation is usually addressed as a regression problem in which a learning model predicts a quality score from a (usually highly redundant) set of features that represent the translation. This redundancy hinders model learning and thus penalizes the performance of quality estimation systems. We propose different dimensionality reduction methods based on partial least squares regression to overcome this problem, and compare them against several reduction methods previously used in the QE literature. Moreover, we study how the use of such methods influences the performance of different learning models. Experiments carried out on the English-Spanish WMT12 QE task showed that it is possible to improve prediction accuracy while significantly reducing the size of the feature sets.
    This work was supported by the European Union Seventh Framework Program (FP7/2007-2013) under the CasMaCat project (grant agreement no. 287576), by Spanish MICINN under the TIASA (TIN2009-14205-C04-02) project, and by the Generalitat Valenciana under grant ALMPR (Prometeo/2009/014).
    González Rubio, J.; Navarro Cerdán, JR.; Casacuberta Nolla, F. (2013). Dimensionality reduction methods for machine translation quality estimation. Machine Translation. 27(3-4):281-301. https://doi.org/10.1007/s10590-013-9139-3
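
As an illustration of the general idea in the abstract above, reducing a redundant feature set with partial least squares (PLS) might look like the following minimal sketch. It is not the authors' implementation: it assumes a single quality score per sentence (PLS1), uses a plain NIPALS-style loop in NumPy, and the function name `pls1_scores` and its arguments are hypothetical.

```python
import numpy as np

def pls1_scores(X, y, n_components):
    """Project features X onto PLS1 latent directions chosen using the target y.

    Returns (T, R): T holds the latent scores (one column per component) and
    R is a rotation matrix such that centered_X @ R reproduces T.
    Hypothetical helper for illustration only.
    """
    Xc = X - X.mean(axis=0)          # center the features
    yc = y - y.mean()                # center the target
    Xk = Xc.copy()                   # deflated copy of the features
    weights, loadings, scores = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yc                # direction of maximal covariance with y
        norm = np.linalg.norm(w)
        if norm == 0.0:              # nothing left that covaries with y
            break
        w = w / norm
        t = Xk @ w                   # latent score for this component
        p = Xk.T @ t / (t @ t)       # loading of X on the score
        Xk = Xk - np.outer(t, p)     # remove the explained part of X
        weights.append(w)
        loadings.append(p)
        scores.append(t)
    W = np.column_stack(weights)
    P = np.column_stack(loadings)
    R = W @ np.linalg.inv(P.T @ W)   # maps centered X directly to the scores
    return np.column_stack(scores), R
```

The reduced matrix `T` can then be passed to any regression model in place of the original features; new data would be centered with the training means and multiplied by `R`.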

    CRM Strategies for A Small-Sized Online Shopping Mall Based on Association Rules and Sequential Patterns

    Data mining contributes greatly to the extraction of knowledge and information hidden in large volumes of data. This study proposes customer relationship management (CRM) strategies for a small-sized online shopping mall based on association rules and sequential patterns obtained by analyzing the shop's transaction data. We first defined the VIP customer in terms of recency, frequency and monetary value. Then, we developed a model that classifies customers as VIP or non-VIP, using techniques such as decision trees, artificial neural networks, and bagging with each of these as a base classifier. Last, we identified association rules and sequential patterns from the transactions of VIPs, and used these rules and patterns to propose CRM strategies for the online shopping mall.
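
A recency, frequency and monetary (RFM) summary of the kind mentioned in this abstract could be sketched as follows. The abstract does not give the study's actual VIP criteria, so the thresholds, the function names `rfm_table` and `is_vip`, and the transaction layout are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

def rfm_table(transactions, today):
    """Summarize (customer_id, purchase_date, amount) tuples per customer.

    Returns {customer_id: (recency_days, frequency, monetary)}.
    """
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, day, amount in transactions:
        # Keep the most recent purchase date seen for this customer.
        last_purchase[customer_id] = max(last_purchase.get(customer_id, day), day)
        frequency[customer_id] += 1
        monetary[customer_id] += amount
    return {
        cid: ((today - last_purchase[cid]).days, frequency[cid], monetary[cid])
        for cid in last_purchase
    }

def is_vip(rfm, max_recency_days=30, min_frequency=3, min_monetary=200.0):
    """Label a customer as VIP using hypothetical thresholds (not from the study)."""
    recency, freq, money = rfm
    return recency <= max_recency_days and freq >= min_frequency and money >= min_monetary
```

Labels produced this way could serve as the target for a VIP versus non-VIP classifier such as the decision tree or neural network models the abstract mentions.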

    Impact of Big Data Analytics on Banking: A Case Study

    Purpose – The paper aims to help enterprises gain valuable knowledge about big data implementation in practice and improve their information management ability, as they accumulate experience, to reuse or adapt the proposed method to achieve a sustainable competitive advantage. Design/methodology/approach – Guided by the theory of technological frames of reference (TFR) and transaction cost theory (TCT), this paper describes a real-world case study in the banking industry to explain how enterprises can leverage big data analytics for change. Through close integration with the bank's daily operations and strategic planning, the case study shows how the analytics team framed the challenge and analyzed the data with two analytic models – customer segmentation (unsupervised) and product affinity prediction (supervised) – to initiate the adoption of big data analytics in precision marketing. Findings – The study reported relevant findings from a longitudinal data analysis and identified some key success factors. First, non-technical factors, for example intuitive analytics results, an appropriate evaluation baseline, multiple-wave implementation and the selection of marketing channels, critically influence the progress of big data implementation in organizations. Second, a successful campaign also relies on technical factors. For example, the clustering analytics could raise customers' response rates, and the product affinity prediction model could make transactions more efficient and lower time costs. Originality/value – As a theoretical contribution, this paper verified that the outstanding characteristics of online mutual fund platforms brought up by Nagle, Seamans and Tadelis (2010) could not guarantee organizations' competitive advantages from the perspective of TCT.

    Comparative proteomic profiling of the serum differentiates pancreatic cancer from chronic pancreatitis

    Finland ranks sixth among the countries with the highest incidence rates of pancreatic cancer, with mortality roughly equaling incidence. The average age at diagnosis of pancreatic cancer is 69 years in Nordic males, whereas the average age at diagnosis of chronic pancreatitis is 40-50 years; however, many cases overlap in age. By radiology, the evaluation of a pancreatic mass, that is, the differential diagnosis between chronic pancreatitis and pancreatic cancer, is often difficult. Preoperative needle biopsies are difficult to obtain and demanding to interpret. New blood-based biomarkers are needed. The accuracy of the only established biomarker for pancreatic cancer, CA 19-9, is rather poor in differentiating between benign and malignant masses of the pancreas. In this study, we performed mass spectrometry analysis (High Definition MSE) of serum samples from patients with chronic pancreatitis (13) and pancreatic cancer (22). We quantified 291 proteins and performed detailed statistical analyses such as principal component analysis, orthogonal partial least squares discriminant analysis and receiver operating characteristic curve analysis. The proteomic signature of chronic pancreatitis versus pancreatic cancer samples was able to separate the two groups by multiple statistical techniques. Some of the enriched pathways in the proteomic dataset were LXR/RXR activation, complement and coagulation systems, and inflammatory response. We propose multiple high-confidence biomarker candidates from our pilot study, including Inter-alpha-trypsin inhibitor heavy chain H2 (area under the curve, AUC: 0.947), protein AMBP (AUC: 0.951) and prothrombin (AUC: 0.917), which should be further evaluated in larger patient series as potential new biomarkers for differential diagnosis.
    Peer reviewed

    Radiation-driven winds of hot luminous stars. XVI. Expanding atmospheres of massive and very massive stars and the evolution of dense stellar clusters

    Context: Starbursts, and particularly their high-mass stars, play an essential role in the evolution of galaxies. The winds of massive stars not only significantly influence their surroundings, but the mass loss also profoundly affects the evolution of the stars themselves. In addition to the evolution of each star, the evolution of the dense cores of massive starburst clusters is affected by N-body interactions, and the formation of very massive stars via mergers may be decisive for the evolution of the cluster. Aims: To introduce an advanced diagnostic method of O-type stellar atmospheres with winds, including an assessment of the accuracy of the determinations of abundances, stellar and wind parameters. Methods: We combine consistent models of expanding atmospheres with detailed stellar evolutionary calculations of massive and very massive single stars with regard to the evolution of dense stellar clusters. Accurate predictions of the mass loss rates of very massive stars require a highly consistent treatment of the statistical equilibrium and the hydrodynamic and radiative processes in the expanding atmospheres. Results: We present computed mass loss rates, terminal wind velocities, and spectral energy distributions of massive and very massive stars of different metallicities, calculated from atmospheric models with an improved level of consistency. Conclusions: Stellar evolutionary calculations using our computed mass loss rates show that low-metallicity very massive stars lose only a very small amount of their mass, making it unlikely that very massive population III stars cause a significant helium enrichment of the interstellar medium. Solar-metallicity stars have higher mass-loss rates, but these are not so high as to exclude very massive stars formed by mergers in dense clusters from ending their life massive enough to form intermediate-mass black holes.
    Comment: Accepted by A&

    Work Organisation and Innovation - Case Study: LHT, Germany

    [Excerpt] Lufthansa Technik AG (LHT) provides aircraft-related technical services to a worldwide customer base comprising airlines, aircraft leasing companies, maintenance organisations, and operators of business and VIP aircraft. Besides the maintenance, repair, and overhaul (MRO) services that form the organisation’s core business, activities also include development and production, as well as logistics.

    GSU View, 2012-10

    Newsletter published by Governors State University 2007-current