108 research outputs found

    Are some brain injury patients improving more than others?

    Predicting the evolution of individuals is a relatively new data mining task with applications in medicine. Medical researchers are interested in the progress of a disease and in the evolution of individuals subjected to treatment. We investigate the evolution of patients on the basis of medical tests before and during treatment after brain trauma: we want to understand how similar patients can become to healthy participants. We face two challenges. First, we have less information on healthy participants than on the patients. Second, the values of the medical tests for patients, even after treatment has started, remain well separated from those of healthy people; this is typical for neurodegenerative diseases, but also for other brain impairments. Our approach encompasses methods for modelling patient evolution and for predicting the health improvement of different patient subpopulations, dealing with the above challenges. We test our approach on a cohort of patients treated after brain trauma and a corresponding cohort of controls.

    Cross-Language Plagiarism Detection

    Cross-language plagiarism detection deals with the automatic identification and extraction of plagiarism in a multilingual setting. In this setting, a suspicious document is given, and the task is to retrieve all sections from the document that originate from a large, multilingual document collection. Our contributions in this field are as follows: (1) a comprehensive retrieval process for cross-language plagiarism detection is introduced, highlighting the differences from monolingual plagiarism detection, (2) state-of-the-art solutions for two important subtasks are reviewed, (3) retrieval models for the assessment of cross-language similarity are surveyed, and (4) the three models CL-CNG, CL-ESA and CL-ASA are compared. Our evaluation is of realistic scale: it relies on 120,000 test documents which are selected from the corpora JRC-Acquis and Wikipedia, so that for each test document highly similar documents are available in all of the six languages English, German, Spanish, French, Dutch, and Polish. The models are employed in a series of ranking tasks, and more than 100 million similarities are computed with each model. The results of our evaluation indicate that CL-CNG, despite its simple approach, is the best choice to rank and compare texts across languages if they are syntactically related. CL-ESA almost matches the performance of CL-CNG, but on arbitrary pairs of languages. CL-ASA works best on "exact" translations but does not generalize well. This work was partially supported by the TEXT-ENTERPRISE 2.0 TIN2009-13391-C04-03 project and the CONACyT-Mexico 192021 grant. Potthast, M.; Barrón Cedeño, LA.; Stein, B.; Rosso, P. (2011). Cross-Language Plagiarism Detection. Language Resources and Evaluation. 45(1):45-62. https://doi.org/10.1007/s10579-009-9114-z

    The Size, Shape, Albedo, Density, and Atmospheric Limit of Transneptunian Object (50000) Quaoar from Multi-chord Stellar Occultations

    We present results derived from the first multi-chord stellar occultations by the transneptunian object (50000) Quaoar, observed on 2011 May 4 and 2012 February 17, and from a single-chord occultation observed on 2012 October 15. If the timing of the five chords obtained in 2011 were correct, then Quaoar would possess topographic features (crater or mountain) that would be too large for a body of this mass. An alternative model consists in applying time shifts to some chords to account for possible timing errors. Satisfactory elliptical fits to the chords are then possible, yielding an equivalent radius $R_{\mathrm{equiv}} = 555 \pm 2.5$ km and geometric visual albedo $p_V = 0.109 \pm 0.007$. Assuming that Quaoar is a Maclaurin spheroid with an indeterminate polar aspect angle, we derive a true oblateness of $\epsilon = 0.087^{+0.0268}_{-0.0175}$, an equatorial radius of $569^{+24}_{-17}$ km, and a density of $1.99 \pm 0.46$ g cm$^{-3}$. The orientation of our preferred solution in the plane of the sky implies that Quaoar's satellite Weywot cannot have an equatorial orbit. Finally, we detect no global atmosphere around Quaoar, considering a pressure upper limit of about 20 nbar for a pure methane atmosphere.

    A highly resolved food web for insect seed predators in a species-rich tropical forest

    The top-down and indirect effects of insects on plant communities depend on patterns of host use, which are often poorly documented, particularly in species-rich tropical forests. At Barro Colorado Island, Panama, we compiled the first food web quantifying trophic interactions between the majority of co-occurring woody plant species and their internally feeding insect seed predators. Our study is based on more than 200 000 fruits representing 478 plant species, associated with 369 insect species. Insect host-specificity was remarkably high: only 20% of seed predator species were associated with more than one plant species, while each tree species experienced seed predation from a median of two insect species. Phylogeny, but not plant traits, explained patterns of seed predator attack. These data suggest that seed predators are unlikely to mediate indirect interactions such as apparent competition between plant species, but are consistent with their proposed contribution to maintaining plant diversity via the Janzen-Connell mechanism.

    Sequencing three crocodilian genomes to illuminate the evolution of archosaurs and amniotes

    The International Crocodilian Genomes Working Group (ICGWG) will sequence and assemble the American alligator (Alligator mississippiensis), saltwater crocodile (Crocodylus porosus) and Indian gharial (Gavialis gangeticus) genomes. The status of these projects and our planned analyses are described.

    Yeasts associated with the production of distilled alcoholic beverages

    Distilled alcoholic beverages are produced by first fermenting sugars derived from cereal starches (in the case of whiskies), sucrose-rich plants (in the case of rums), fructooligosaccharide-rich plants (in the case of tequila) or fruits (in the case of brandies). Traditionally, such fermentations were conducted in a spontaneous fashion, relying on indigenous microbiota, including wild yeasts. In modern practices, selected strains of Saccharomyces cerevisiae are employed to produce high levels of ethanol together with numerous secondary metabolites (e.g., higher alcohols, esters, and carbonyls) which greatly influence the final flavour and aroma characteristics of spirits following distillation of the fermented wash. Therefore, distillers, like winemakers, must carefully choose their yeast strain, which is very important in determining the alcohol content and the sensory profiles of spirit beverages. This chapter discusses yeast and fermentation aspects associated with the production of selected distilled spirits and highlights similarities and differences with the production of wine.

    4to. Congreso Internacional de Ciencia, Tecnología e Innovación para la Sociedad. Memoria académica

    This volume contains the academic proceedings of the fourth edition of the Congreso Internacional de Ciencia, Tecnología e Innovación para la Sociedad, CITIS 2017, held from 29 November to 1 December 2017 and organized by the Universidad Politécnica Salesiana (UPS) at its Guayaquil campus. The congress offered a space for presenting, disseminating, and exchanging important national and international research before the university community that gathered at the event. The use of technological tools for managing the research papers, such as the Open Conference Systems platform and the congress website http://citis.blog.ups.edu.ec/, made CITIS 2017 a true benchmark among the congresses held in the country. Our University's concern to provide spaces that help generate new and better changes in the human and social dimension of our environment means that each edition of the event seeks papers of increasing quality in terms of their scientific output. Those of us who led the organization have recorded in these academic proceedings the intense and prolific work of the days of the Congreso Internacional de Ciencia, Tecnología e Innovación para la Sociedad, making it available to everyone.