
    Information Access in a Multilingual World: Transitioning from Research to Real-World Applications

    Multilingual Information Access (MLIA) is at a turning point: substantial real-world applications are being introduced after fifteen years of research into cross-language information retrieval, question answering, statistical machine translation and named entity recognition. Previous workshops on this topic have focused on research and small-scale applications. This workshop focused instead on technology transfer from research to applications, and on the future research needed to facilitate MLIA in an increasingly connected multilingual world.

    FFAS server: novel features and applications.

    The Fold and Function Assignment System (FFAS) server [Jaroszewski et al. (2005) FFAS03: a server for profile-profile sequence alignments. Nucleic Acids Research, 33, W284-W288] implements the algorithm for protein profile-profile alignment introduced originally in [Rychlewski et al. (2000) Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Science, 9, 232-241]. Here, we present updates, changes and novel functionality added to the server since 2005 and discuss its new applications. The sequence database used to calculate sequence profiles has been enriched with sets of publicly available metagenomic sequences. The profile of a user's protein can now be compared with ∼20 additional profile databases, including several complete proteomes, human proteins involved in genetic diseases and a database of microbial virulence factors. A newly developed interface uses a system of tabs that lets the user navigate multiple results pages, and it also includes novel functionality such as a dot-plot graph viewer, modeling tools, an improved 3D alignment viewer and links to a database of structural similarities. The FFAS server was also optimized for speed: running times were reduced by an order of magnitude. The FFAS server, http://ffas.godziklab.org, has no log-in requirement, although there is an option to register and store results in individual, password-protected directories. Source code and Linux executables for the FFAS program are available for download from the FFAS server.
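
    The profile-profile alignment at the heart of FFAS pairs columns of two sequence profiles via dynamic programming. As a minimal illustration of the general technique (not the FFAS scoring function itself, which is more elaborate), the following sketch scores profile columns with a plain dot product inside a Needleman-Wunsch global alignment:

```python
import numpy as np

def profile_profile_align(P, Q, gap=-1.0):
    """Global (Needleman-Wunsch) alignment score of two profiles.

    P, Q: arrays of shape (length, 20); each row holds the amino-acid
    frequencies of one profile column. Column similarity is a simple
    dot product here, a deliberate simplification of FFAS scoring.
    """
    n, m = len(P), len(Q)
    S = np.zeros((n + 1, m + 1))
    S[:, 0] = gap * np.arange(n + 1)   # leading gaps in Q
    S[0, :] = gap * np.arange(m + 1)   # leading gaps in P
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = S[i - 1, j - 1] + float(P[i - 1] @ Q[j - 1])
            S[i, j] = max(match, S[i - 1, j] + gap, S[i, j - 1] + gap)
    return S[n, m]
```

    Aligning a short one-hot profile against itself accumulates one unit of score per column, which makes the recursion easy to sanity-check.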

    Topic Map Generation Using Text Mining

    Starting from text corpus analysis with linguistic and statistical algorithms, an infrastructure for text mining is described that uses collocation analysis as a central tool. This text mining method can be applied to different domains as well as different languages. Examples taken from large reference databases motivate its applicability to knowledge management using declarative standards for structuring and describing information. The ISO/IEC Topic Map standard is introduced as a candidate for rich metadata description of information resources, and it is shown how text mining can be used for automatic topic map generation.
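
    Collocation analysis, the central tool named above, can be approximated in a few lines by ranking adjacent word pairs by pointwise mutual information (PMI). This is a generic sketch of the statistic, not the paper's actual infrastructure:

```python
import math
from collections import Counter

def collocations(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    PMI compares the observed bigram probability with the product of
    the unigram probabilities; high values flag word pairs that
    co-occur far more often than chance would predict.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), c in bigrams.items():
        if c < min_count:
            continue                      # drop unreliable rare pairs
        p_pair = c / (n - 1)              # observed bigram probability
        p_indep = (unigrams[w1] / n) * (unigrams[w2] / n)
        scores[(w1, w2)] = math.log2(p_pair / p_indep)
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

    On a toy token stream where "topic map" recurs, that pair surfaces as the top-ranked collocation.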

    Updates in metabolomics tools and resources: 2014-2015

    Data processing and interpretation represent the most challenging and time-consuming steps in high-throughput metabolomic experiments, regardless of the analytical platform (MS- or NMR-spectroscopy-based) used for data acquisition. Improved machinery in metabolomics generates increasingly complex datasets, creating the need for more and better processing and analysis software and in silico approaches to understand the resulting data. However, a comprehensive source of information describing the utility of the most recently developed and released metabolomics resources (in the form of tools, software, and databases) is currently lacking. Thus, here we provide an overview of freely available, open-source tools, algorithms, and frameworks to make both upcoming and established metabolomics researchers aware of recent developments, in an attempt to advance and facilitate data-processing workflows in their metabolomics research. The major topics include tools and resources for data processing, data annotation, and data visualization in MS- and NMR-based metabolomics. Most of the tools described in this review are dedicated to untargeted metabolomics workflows; however, some more specialist tools are described as well. All tools and resources described, including their analytical and computational platform dependencies, are summarized in an overview table.

    Semi-automated curation of protein subcellular localization: a text mining-based approach to Gene Ontology (GO) Cellular Component curation

    Background: Manual curation of experimental data from the biomedical literature is an expensive and time-consuming endeavor. Nevertheless, most biological knowledge bases still rely heavily on manual curation for data extraction and entry. Text mining software that can semi- or fully automate information retrieval from the literature would thus provide a significant boost to manual curation efforts. Results: We employed the Textpresso category-based information retrieval and extraction system (http://www.textpresso.org), developed by WormBase, to explore how Textpresso might improve the efficiency with which we manually curate C. elegans proteins to the Gene Ontology's Cellular Component Ontology. Using a training set of sentences that describe results of localization experiments in the published literature, we generated three new curation task-specific categories (Cellular Components, Assay Terms, and Verbs) containing words and phrases associated with reports of experimentally determined subcellular localization. We compared the results of manual curation with those of Textpresso queries that searched the full text of articles for sentences containing terms from each of the three new categories plus the name of a previously uncurated C. elegans protein, and found that Textpresso searches identified curatable papers with recall and precision rates of 79.1% and 61.8%, respectively (F-score of 69.5%), when compared to manual curation. Within those documents, Textpresso identified relevant sentences with recall and precision rates of 30.3% and 80.1% (F-score of 44.0%). From the returned sentences, curators were able to make 66.2% of all possible experimentally supported GO Cellular Component annotations with 97.3% precision (F-score of 78.8%). Measuring the relative efficiencies of Textpresso-based versus manual curation, we find that Textpresso has the potential to increase curation efficiency by at least 8-fold, and perhaps as much as 15-fold, given differences in individual curatorial speed. Conclusion: Textpresso is an effective tool for improving the efficiency of manual, experimentally based curation. Incorporating a Textpresso-based Cellular Component curation pipeline at WormBase has allowed us to transition from strictly manual curation of this data type to a more efficient pipeline of computer-assisted validation. Continued development of curation task-specific Textpresso categories will provide an invaluable resource for genomics databases that rely heavily on manual curation.
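
    The F-scores quoted in this abstract are the balanced harmonic mean of precision and recall, which can be checked directly (the tiny discrepancy at document level, 69.4 vs. the reported 69.5, comes from rounding of the published precision and recall figures):

```python
def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Figures quoted in the abstract, in percent:
doc_level = f_score(61.8, 79.1)         # ~69.4 (reported as 69.5)
sentence_level = f_score(80.1, 30.3)    # ~44.0
annotation_level = f_score(97.3, 66.2)  # ~78.8
```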

    On the Topic of Jets: Disentangling Quarks and Gluons at Colliders

    We introduce jet topics: a framework to identify underlying classes of jets from collider data. Because of a close mathematical relationship between distributions of observables in jets and emergent themes in sets of documents, we can apply recent techniques in "topic modeling" to extract jet topics from data with minimal or no input from simulation or theory. As a proof of concept with parton shower samples, we apply jet topics to determine separate quark and gluon jet distributions for constituent multiplicity. We also determine separate quark and gluon rapidity spectra from a mixed Z-plus-jet sample. While jet topics are defined directly from hadron-level multi-differential cross sections, one can also predict jet topics from first-principles theoretical calculations, with potential implications for how to define quark and gluon jets beyond leading-logarithmic accuracy. These investigations suggest that jet topics will be useful for extracting underlying jet distributions and fractions in a wide range of contexts at the Large Hadron Collider. Comment: 8 pages, 4 figures, 1 table. v2: Improved discussion to match PRL version.
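
    The demixing step behind jet topics can be sketched numerically: given two normalized histograms of the same observable built from differently mixed samples, each topic is obtained by subtracting as much of the other mixture as possible. The toy sketch below assumes mutual irreducibility of the base distributions and takes the reducibility factors as minimum bin-wise ratios; real analyses must also handle empty bins and statistical uncertainties:

```python
import numpy as np

def jet_topics(m1, m2):
    """Extract two 'topics' from two mixed, normalized histograms.

    kappa(i|j) = min over bins of m_i/m_j is the reducibility factor;
    topic_i = (m_i - kappa * m_j) / (1 - kappa).
    """
    m1, m2 = np.asarray(m1, float), np.asarray(m2, float)
    k12, k21 = np.min(m1 / m2), np.min(m2 / m1)
    t1 = (m1 - k12 * m2) / (1 - k12)
    t2 = (m2 - k21 * m1) / (1 - k21)
    return t1, t2

# Toy check: mix two mutually irreducible base distributions,
# then recover them from the mixtures alone. Numbers are invented.
q = np.array([0.6, 0.4, 0.0])  # "quark-like" histogram
g = np.array([0.0, 0.3, 0.7])  # "gluon-like" histogram
t1, t2 = jet_topics(0.8 * q + 0.2 * g, 0.3 * q + 0.7 * g)
```

    In this toy setup the recovered topics t1 and t2 coincide with the base histograms q and g, even though only the mixtures were given to the algorithm.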

    Topic Maps as a Virtual Observatory tool

    One major component of the VO will be catalogs measuring gigabytes and terabytes, if not more. Some mechanism like XML will be used for structuring the information. However, such mechanisms are not good for information retrieval on their own; for retrieval we use queries. Topic Maps, which have recently become popular, are excellent for organizing the information that results from a query. A Topic Map is a structured network of hyperlinks above an information pool. Different Topic Maps can form different layers above the same information pool and provide different views of it. This makes it possible to ask precise questions, helping us find the gold needles in the proverbial haystack. Here we discuss what Topic Maps are and how they can be implemented within the VO framework. URL: http://www.astro.caltech.edu/~aam/science/topicmaps/ Comment: 11 pages, 5 eps figures, to appear in SPIE Annual Meeting 2001 proceedings (Astronomical Data Analysis), uses spie.st
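
    The layered structure described above can be caricatured in a few lines: topics and their occurrences form a navigable index over an unchanged information pool, and a query against the topic layer yields one view of the underlying records. All names and record IDs below are invented for illustration:

```python
# Information pool: catalog records, untouched by the map layer.
pool = {
    "rec1": "Spectroscopic survey of M31 globular clusters",
    "rec2": "X-ray point sources in M31",
}

# Topic layer: typed topics, plus occurrences anchoring them to records.
topics = {"M31": "galaxy", "globular-cluster": "object-class"}
occurrences = [
    ("M31", "rec1"),
    ("M31", "rec2"),
    ("globular-cluster", "rec1"),
]

def resources_for(topic):
    """One 'view' of the pool: every record a given topic points at."""
    return sorted(rec for t, rec in occurrences if t == topic)
```

    A second topic map over the same pool would simply be another `topics`/`occurrences` pair, giving a different view without duplicating the records themselves.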

    Terminology server for improved resource discovery: analysis of model and functions

    This paper considers the potential to improve distributed information retrieval via a terminologies server. The restriction on effective resource discovery caused by the use of disparate terminologies across services and collections is outlined, before a DDC-spine-based approach involving inter-scheme mapping is considered as a possible solution. The developing HILT model is discussed alongside other existing models and alternative approaches to solving the terminologies problem. Results from the current HILT pilot are presented to illustrate functionality, and suggestions are made for further research and development.
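
    The spine-based mapping idea can be sketched concretely: terms from two local vocabularies are each mapped to a shared DDC notation, which then acts as a switching language between schemes. The mappings below are illustrative stand-ins, not actual HILT data:

```python
# Local scheme -> DDC notation (illustrative entries only).
lcsh_to_ddc = {"Ornithology": "598", "Astrophysics": "523.01"}
unesco_to_ddc = {"Birds": "598"}

def crosswalk(term, source_map, target_map):
    """Translate a term between schemes via the shared DDC spine."""
    notation = source_map.get(term)
    if notation is None:
        return []   # term unknown to the source scheme
    return sorted(t for t, n in target_map.items() if n == notation)
```

    For example, `crosswalk("Ornithology", lcsh_to_ddc, unesco_to_ddc)` routes through notation "598" to the other scheme's term; a term whose notation has no counterpart comes back empty, which is exactly the coverage gap an inter-scheme mapping service has to manage.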

    Marketing and Advertising Translation: Humans vs Machines in the field of cosmetics

    This undergraduate thesis focuses on a very specific field of specialized translation: advertising and marketing translation. Indeed, the high degree of specialization involved in this activity provides a testing ground for reconsidering the importance of the human translator and reformulating their role. The constant development of new technologies creates ever more sophisticated translation programs, which in turn revives the long-standing machine vs. human translation debate. The aim of this project is to conduct a practical exercise targeted at verifying whether specialization in translation always requires the supervision of humans equipped with the relevant linguistic knowledge and technical background, or whether, on the contrary, machine translation can at present provide valid enough results and a sufficient level of reliability. (Grado en Estudios Ingleses)