4 research outputs found

    RUSSE'2018 : a shared task on word sense induction for the Russian language

    The paper describes the results of the first shared task on word sense induction (WSI) for the Russian language. While similar shared tasks were conducted in the past for some Romance and Germanic languages, we explore the performance of sense induction and disambiguation methods for a Slavic language that shares many features with other Slavic languages, such as rich morphology and free word order. The participants were asked to group contexts of a given word according to its senses, which were not provided beforehand. For instance, given the word “bank” and a set of contexts containing it, e.g. “bank is a financial institution that accepts deposits” and “river bank is a slope beside a body of water”, a participant was asked to cluster these contexts into a number of clusters not known in advance, corresponding in this case to the “company” and the “area” senses of the word “bank”. For the purpose of this evaluation campaign, we developed three new evaluation datasets based on sense inventories of different sense granularity. The contexts in these datasets were sampled from Wikipedia, the academic corpus of Russian, and an explanatory dictionary of Russian. Overall, 18 teams participated in the competition, submitting 383 models. Multiple teams managed to substantially outperform competitive state-of-the-art baselines from previous years based on sense embeddings.
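
The task setup described in the abstract above (grouping contexts of one word into a number of clusters that is not fixed beforehand) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the bag-of-words representation, the greedy threshold clustering, the stopword list, the `threshold` value, and the third and fourth example contexts are hypothetical and are not taken from the shared task or from any participant's system.

```python
import math
from collections import Counter

# Hypothetical stopword list, chosen only for this toy example.
STOPWORDS = {"is", "a", "that", "of", "the", "beside", "and", "was"}


def bag_of_words(context, target):
    """Lowercase, strip punctuation, drop stopwords and the target word."""
    tokens = [t.strip(".,").lower() for t in context.split()]
    return Counter(t for t in tokens if t and t != target and t not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def induce_senses(contexts, target, threshold=0.2):
    """Greedy one-pass clustering; the number of clusters is not given in advance.

    A context joins the most similar existing cluster if the similarity
    reaches `threshold`; otherwise it opens a new cluster (a new sense).
    """
    centroids = []  # one summed bag-of-words Counter per induced sense
    labels = []
    for context in contexts:
        vec = bag_of_words(context, target)
        best_id, best_sim = None, threshold
        for cid, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best_id, best_sim = cid, sim
        if best_id is None:
            centroids.append(Counter(vec))
            best_id = len(centroids) - 1
        else:
            centroids[best_id].update(vec)
        labels.append(best_id)
    return labels


contexts = [
    "bank is a financial institution that accepts deposits",  # from the abstract
    "river bank is a slope beside a body of water",           # from the abstract
    "the bank accepts deposits and issues loans",             # hypothetical
    "the river bank was a muddy slope",                       # hypothetical
]
print(induce_senses(contexts, "bank"))
```

On this toy input the first and third contexts share one label and the second and fourth share another. Word-overlap features are a deliberately simple stand-in; the abstract itself mentions sense embeddings as the basis of the baselines.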

    Intelligent support for designing educational programs based on neural network language models, taking labor market requirements into account

    The active development of the digital economy places high demands on the adaptability, practical orientation, and quality of educational programs. Existing approaches to intelligent decision support for designing educational programs, based on ontological models, expert systems, and heuristic algorithms, cannot effectively account for and track changes in both the labor market and the open educational content available online, such as massive open online courses (MOOCs). Instead, the paper proposes semantic text analysis based on the well-known neural network language model word2vec, which is trained without supervision on large text corpora. The difficulty of comparative semantic analysis lies in moving from measuring semantic similarity between short descriptions of individual entities (course topics, learning outcomes, job requirements, etc.) to matching large structured documents, such as a professional standard or an educational program. To account for the relationships between entities, a graph model is introduced to represent the educational and professional domain. The paper proposes an intelligent method, comprising four stages of analysis, for producing recommendations on updating the learning outcomes and content of educational programs. In the first stage, current labor market requirements are determined by semantically matching job requirements against the content of professional standards. The second stage semantically matches the content of academic disciplines against the labor market requirements. In the third stage, a semantic search for relevant educational content is carried out among the course programs of leading universities and MOOCs. In the fourth stage, the final recommendations on updating the educational program are formed. The experiment demonstrated that the method can be applied to match learning outcomes and discipline content against the requirements of professional standards, with a quality evaluation, using a bachelor's degree program in computer science and engineering as an example.
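
The semantic matching of short texts that the abstract above relies on can be sketched as follows. This is only an illustration under stated assumptions: the three-dimensional `TOY_VECTORS` are invented stand-ins for real word2vec embeddings, averaging word vectors is one common way to embed a short text and is not confirmed by the abstract as the authors' method, and the function names and sample texts are hypothetical.

```python
import math

# Hypothetical toy embeddings; a real system would load trained word2vec vectors.
TOY_VECTORS = {
    "python": (0.9, 0.1, 0.0),
    "programming": (0.8, 0.2, 0.0),
    "databases": (0.1, 0.9, 0.0),
    "sql": (0.0, 1.0, 0.1),
    "teamwork": (0.0, 0.1, 0.9),
    "communication": (0.1, 0.0, 0.9),
}


def embed(text):
    """Average the vectors of known words; return None if no word is known."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return None
    return tuple(sum(component) / len(vectors) for component in zip(*vectors))


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def best_match(query, candidates):
    """Return (similarity, candidate) for the closest candidate, or None."""
    query_vec = embed(query)
    if query_vec is None:
        return None
    scored = []
    for candidate in candidates:
        candidate_vec = embed(candidate)
        if candidate_vec is not None:
            scored.append((cosine(query_vec, candidate_vec), candidate))
    return max(scored) if scored else None


# Hypothetical job requirement matched against hypothetical course topics.
course_topics = ["python programming", "databases", "teamwork communication"]
print(best_match("knowledge of sql", course_topics))
```

Here the job requirement is matched to the "databases" topic. Matching whole documents, as the abstract notes, is harder than matching short fragments like these, which is why the paper introduces a graph model on top of the pairwise similarities.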
