179 research outputs found

    A Large-Scale Community Questions Classification Accounting for Category Similarity: An Exploratory?

    The paper reports on a large-scale topical categorization of questions from the Russian community question answering (CQA) service Otvety@Mail.Ru. We used a data set containing all the questions (more than 11 million) asked by Otvety@Mail.Ru users in 2012. This is the first study on question categorization dealing with non-English data of this size. The study focuses on adjusting the category structure in order to get more robust classification results. We investigate several approaches to measuring similarity between categories: the share of identical questions, language models, and user activity. The results show that the proposed approach is promising. Funding: Russian Foundation for Basic Research (RFBR), 14-07-00589.
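
    One of the similarity measures named in this abstract, the share of identical questions, can be illustrated with a small sketch. The abstract does not define the measure, so the Jaccard ratio used here, the whitespace/case normalization, and the toy questions and category names are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict


def normalize(question: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return " ".join(question.lower().split())


def category_question_sets(labeled_questions):
    """Group normalized question texts by category label."""
    sets = defaultdict(set)
    for text, category in labeled_questions:
        sets[category].add(normalize(text))
    return sets


def shared_question_similarity(sets, cat_a, cat_b):
    """Jaccard ratio of identical questions (an assumed definition)."""
    a, b = sets[cat_a], sets[cat_b]
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy data: (question text, category). Purely hypothetical.
sample = [
    ("How do I reset my router?", "Computers"),
    ("how do I reset  my router?", "Internet"),
    ("How to speed up Wi-Fi?", "Internet"),
    ("Best pasta recipe?", "Cooking"),
]
sets = category_question_sets(sample)
sim_close = shared_question_similarity(sets, "Computers", "Internet")
sim_far = shared_question_similarity(sets, "Computers", "Cooking")
print(sim_close, sim_far)
```

    Category pairs with a high score are candidates for merging when the category structure is adjusted. The threshold for doing so would have to be chosen empirically.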

    Learning to predict closed questions on stack overflow

    The paper deals with the problem of predicting whether a user's question will be closed by a moderator on Stack Overflow, a popular question answering service devoted to software programming. The task, along with the data and evaluation metrics, was offered as an open machine learning competition on the Kaggle platform. To solve this problem, we employed a wide range of classification features related to users, their interactions, and post content. Classification was carried out using several machine learning methods. According to the results of the experiment, the most important features are characteristics of the user and topical features of the question. The best results were obtained using Vowpal Wabbit, an implementation of online learning based on stochastic gradient descent. Our results are among the best in the overall ranking, although they were obtained after the official competition was over.
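
    The idea of online learning with stochastic gradient descent mentioned in this abstract can be sketched in plain Python. This is not Vowpal Wabbit's API or the authors' configuration. It is a minimal logistic-regression learner with hashed features, and the feature names, learning rate, table size, and toy examples are hypothetical.

```python
import math
import zlib

NUM_BITS = 18            # size of the hashed weight table (assumed value)
DIM = 1 << NUM_BITS


def hashed_features(tokens):
    """Map string features to weight indices with a deterministic hash."""
    return [zlib.crc32(t.encode("utf-8")) % DIM for t in tokens]


def predict(weights, idxs):
    """Probability that the question gets closed (logistic model)."""
    z = sum(weights[i] for i in idxs)
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def update(weights, idxs, label, lr=0.5):
    """One SGD step on the logistic loss for a single example."""
    grad = predict(weights, idxs) - label
    for i in idxs:
        weights[i] -= lr * grad


# Toy stream of (features, label); label 1 means "closed". Hypothetical data.
closed_example = ["user_rep:low", "tag:homework", "len:short"]
open_example = ["user_rep:high", "tag:python", "len:long"]
stream = [(closed_example, 1), (open_example, 0)] * 20

weights = [0.0] * DIM
for tokens, label in stream:      # examples are seen one at a time
    update(weights, hashed_features(tokens), label)

p_closed = predict(weights, hashed_features(closed_example))
p_open = predict(weights, hashed_features(open_example))
print(round(p_closed, 3), round(p_open, 3))
```

    Because each example updates the weights immediately and is then discarded, memory use stays constant regardless of how many questions are processed, which is the property that makes this style of learner practical for large competition data sets.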

    What Users Ask a Search Engine: Analyzing One Billion Russian Question Queries

    We analyze the question queries submitted to a large commercial web search engine to get insights about what people ask, and to better tailor the search results to the users’ needs. Based on a dataset of about one billion question queries submitted during the year 2012, we investigate askers’ querying behavior with the support of automatic query categorization. While the importance of question queries is likely to increase, at present they only make up 3–4% of the total search traffic. Since questions are such a small part of the query stream and are more likely to be unique than shorter queries, clickthrough information is typically rather sparse. Thus, query categorization methods based on the categories of clicked web documents do not work well for questions. As an alternative, we propose a robust question query classification method that uses the labeled questions from a large community question answering (CQA) platform as a training set. The resulting classifier is then transferred to the web search questions. Even though questions on CQA platforms tend to differ from web search questions, our categorization method proves competitive with strong baselines with respect to classification accuracy. To show the scalability of our proposed method, we apply the classifiers to about one billion question queries and discuss the trade-offs between performance and accuracy that different classification models offer. Our findings reveal what people ask a search engine and how this contrasts with behavior on a CQA platform.
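
    The transfer setup described in this abstract, training on labeled CQA questions and then classifying search queries, can be illustrated with a toy classifier. The abstract does not say which models were used, so the multinomial Naive Bayes with add-one smoothing below is an assumed stand-in, and the training questions, categories, and queries are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


def train_naive_bayes(labeled_questions):
    """Count documents per category and words per category."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, category in labeled_questions:
        class_counts[category] += 1
        for tok in tokenize(text):
            word_counts[category][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def classify(model, query):
    """Return the category with the highest log-posterior for the query."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best, best_score = None, -math.inf
    for category, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[category].values())
        for tok in tokenize(query):
            if tok not in vocab:
                continue  # ignore words never seen in the CQA training data
            count = word_counts[category][tok] + 1  # add-one smoothing
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best, best_score = category, score
    return best


# Hypothetical labeled CQA questions used as the training set.
cqa_training = [
    ("how to cook rice without a rice cooker", "Cooking"),
    ("what is the best way to bake bread", "Cooking"),
    ("how to install python on windows", "Computers"),
    ("why does my laptop overheat", "Computers"),
]
model = train_naive_bayes(cqa_training)

# Hypothetical search-engine question queries, shorter than CQA questions.
label_1 = classify(model, "how to bake rice bread")
label_2 = classify(model, "why python laptop")
print(label_1, label_2)
```

    The training labels come for free from the CQA platform's own category assignments, so no click data is needed for the search queries themselves.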

    Reproductive Structures and Early Life History of the Gulf Toadfish, Opsanus beta, in the Tecolutla Estuary, Veracruz, Mexico

    Although the Gulf toadfish, Opsanus beta, is an abundant member of the nearshore Gulf of Mexico ichthyofaunal assemblage, little information exists regarding the ecology of the species, especially for southern Gulf of Mexico populations. We added to the existing knowledge of this species by describing its reproductive structures and examining its early life history in the Tecolutla estuary, Mexico. Macro- and microscopic examination of 7 males showed spermatogenesis to be similar to that of other teleost species except for the occurrence of biflagellate spermatozoa. Histological examination of the male accessory gland showed 3 tissue layers, but their functions are still undetermined. We found asynchronous development of oocytes in the ovaries of 16 females, which may indicate multiple spawning over the long spawning season noted in this study. Batch fecundity estimates among females ranged from 79 to 518 mature ova with a mean ovum diameter of 3.5 mm. The above-mentioned factors, along with large size at hatching, attached larval forms, and paternal care, may account, in part, for the abundance of this species in highly dynamic systems.

    Por uma abordagem interdisciplinar e resolutiva dos conflitos no espaço escolar (Toward an interdisciplinary, resolution-oriented approach to conflicts in the school setting)

    This text is a literature review on a topic that has worried and discouraged public school teachers: school conflicts, their causes, and the possibilities for resolving them from an interdisciplinary perspective. The work offers a brief historical reflection in order to understand the changes that have taken place in schools in recent decades, since the clientele we receive is no longer the same as before. Given these changes, the methodologies, planning, and interpersonal relationships used previously no longer match current needs. We therefore ask what changes have occurred in public schools, what the school's role is in this new reality, and what changes are required of the educator in this unusual context. These questions are raised in order to reflect on and reduce the many complaints and laments constantly heard in the staff room, to point out possibilities for carrying out successful everyday work, and to see conflicts as enriching moments of learning.

    LEARNING TO PREDICT CLOSED QUESTIONS ON STACK OVERFLOW // Ученые записки КФУ. Physical and Mathematical Sciences, 2013, vol. 155, no. 4

    The article considers the problem of predicting the probability that a question on Stack Overflow, a popular question answering resource devoted to software development, will be closed by a moderator. The task, the data, and the evaluation metric were offered as part of an open machine learning competition on Kaggle. To solve the task, we used a wide set of classification features, including features describing the personal characteristics of the user, the interactions of users with one another, and the content of the questions, including their topics. Several machine learning algorithms were tested for classification. The experiment identified the most important features: the personal characteristics of the user and the topical features of the question. The best results were obtained with the algorithm implemented in the Vowpal Wabbit library, online learning based on stochastic gradient descent. Our best score falls within the top 5 results in the final leaderboard, but it was obtained after the competition's closing date.