    Graphs in clusters: a hybrid approach to unsupervised extractive long document summarization using language models

    Effective summarization of long documents is a challenging task. Graph-based and cluster-based methods stand out as effective unsupervised solutions to it. Graph-based unsupervised methods are widely used for summarization because they succeed in identifying relationships within documents, while cluster-based methods excel at minimizing redundancy by grouping similar content together before generating a concise summary. This paper therefore merges cluster-based and graph-based methods, applying language models to unsupervised extractive summarization of long documents. The approach extracts key information while minimizing redundancy. First, we use BERT-based sentence embeddings to create sentence clusters with k-means clustering, selecting the optimum number of clusters with the elbow method, so that sentences are grouped by semantic similarity. Then, the TextRank algorithm is applied within each cluster to rank sentences by importance and representativeness. Finally, the total similarity score of the graph is used to rank the clusters and eliminate less important sentence groups. Our method achieves comparable or better summary quality and reduced redundancy compared to both individual cluster-based and graph-based methods, as well as other supervised and unsupervised baseline models, across diverse datasets.
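
The three stages named in this abstract (cluster, rank within each cluster, rank the clusters) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation, and several things are assumptions made only for illustration: the toy 2-D vectors stand in for BERT sentence embeddings, `k` is fixed by hand instead of being chosen with the elbow method, k-means uses a deterministic evenly spaced initialization, and "total similarity score of the graph" is read as the sum of pairwise cosine similarities inside each cluster. All function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity, clamped at 0 so graph weights stay non-negative."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return max(0.0, dot / norm) if norm else 0.0

def kmeans(vectors, k, iters=20):
    """Plain k-means with evenly spaced initial centroids (an assumption)."""
    centroids = [list(vectors[i * len(vectors) // k]) for i in range(k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign every vector to its nearest centroid (squared Euclidean).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Move each centroid to the mean of its current members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def textrank(sim, damping=0.85, iters=50):
    """PageRank-style power iteration over a weighted similarity matrix."""
    n = len(sim)
    out = [sum(row) for row in sim]  # total edge weight leaving each node
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(sim[j][i] / out[j] * scores[j] for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    return scores

def summarize(embeddings, k, keep_clusters, per_cluster=1):
    """Return indices of the selected sentences, in document order."""
    labels = kmeans(embeddings, k)
    ranked = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if not idx:
            continue
        # Similarity graph restricted to this cluster (no self-loops).
        sim = [[cosine(embeddings[a], embeddings[b]) if a != b else 0.0 for b in idx]
               for a in idx]
        total = sum(map(sum, sim))  # assumed cluster-level score
        best = sorted(zip(textrank(sim), idx), reverse=True)[:per_cluster]
        ranked.append((total, [i for _, i in best]))
    ranked.sort(reverse=True)  # highest total similarity first
    return sorted(i for _, picks in ranked[:keep_clusters] for i in picks)

# Toy stand-ins for sentence embeddings: two loose groups of three sentences.
toy_embeddings = [
    [1.0, 0.0], [0.9, 0.1], [0.8, 0.2],
    [0.0, 1.0], [0.1, 0.9], [0.3, 0.7],
]
summary_indices = summarize(toy_embeddings, k=2, keep_clusters=2)
```

In practice the embeddings would come from a sentence encoder and `k` from an elbow analysis; the sketch only shows how the clustering and graph-ranking steps fit together.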

    Machine Learning Methods for Finding Textual Features of Depression from Publications

    Depression is a common but serious mood disorder. In 2015, the WHO reported that about 322 million people were living with some form of depression, which is the leading cause of ill health and disability worldwide. In the USA, approximately 14.8 million American adults (about 6.7% of the US population) are affected by major depressive disorder. Most individuals with depression do not receive adequate care because the symptoms are easily neglected and most people are not even aware of their mental health problems. A depression prescreening system would therefore greatly help people understand their current mental health status at an early stage. Diagnosing depression, however, is extremely challenging because its symptoms are complicated, numerous and varied. Fortunately, publications contain rich information about depression symptoms, and text mining methods can discover these symptoms from the literature. To extract depression symptoms from publications, machine learning approaches are proposed to overcome four main obstacles: (1) representing publications in a mathematical form; (2) getting abstracts from publications; (3) removing noisy publications to improve data quality; (4) extracting the textual symptoms from publications. For the first obstacle, we integrate Word2Vec with LDA, either by representing publications with document-topic distance distributions or by augmenting the word-to-topic and word-to-word vectors. For the second obstacle, we calculate a document vector and its paragraph vectors by aggregating word vectors from Word2Vec. Feature vectors are calculated by clustering word vectors. Paragraphs are selected by comparing their distances to the feature vectors with the distances from the document vector to the feature vectors. For the third obstacle, a one-class SVM model is trained on the vectorized publications, and outlier publications are excluded by distance measurements. For the fourth obstacle, we evaluate the likelihood that a word is a symptom according to its frequency across all publications and its local relationship with the surrounding words in a publication.
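
The paragraph-selection step for the second obstacle can be sketched as follows. This is only one possible reading of the abstract, not the authors' code. The assumptions are: word vectors are averaged to form paragraph and document vectors, distance is Euclidean, the feature vectors are supplied directly (the abstract obtains them by clustering word vectors), and a paragraph is preferred when its distances to the feature vectors most resemble the document's. The toy word vectors and all names are hypothetical stand-ins for a trained Word2Vec model.

```python
import math

def mean_vector(words, word_vecs):
    """Average the vectors of the known words; unknown words are skipped."""
    known = [word_vecs[w] for w in words if w in word_vecs]
    if not known:
        raise ValueError("no known words to aggregate")
    return [sum(col) / len(known) for col in zip(*known)]

def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def profile(vec, feature_vecs):
    """Distances from one vector to every feature vector."""
    return [distance(vec, f) for f in feature_vecs]

def select_paragraphs(paragraphs, word_vecs, feature_vecs, top_n=1):
    """Pick the paragraphs whose distance profile is closest to the document's."""
    all_words = [w for p in paragraphs for w in p]
    doc_profile = profile(mean_vector(all_words, word_vecs), feature_vecs)
    gaps = []
    for i, words in enumerate(paragraphs):
        para_profile = profile(mean_vector(words, word_vecs), feature_vecs)
        gaps.append((distance(para_profile, doc_profile), i))
    # Smallest gap first; return the chosen indices in document order.
    return sorted(i for _, i in sorted(gaps)[:top_n])

# Hypothetical 2-D word vectors and feature vectors, for illustration only.
toy_word_vecs = {
    "sad": [1.0, 0.0], "tired": [0.8, 0.2],
    "method": [0.0, 1.0], "data": [0.2, 0.8],
}
toy_paragraphs = [["sad", "tired"], ["method", "data"], ["sad", "data"]]
toy_features = [[1.0, 0.0], [0.0, 1.0]]
selected = select_paragraphs(toy_paragraphs, toy_word_vecs, toy_features)
```

With these toy inputs the third paragraph mixes both themes, so its profile sits closest to that of the whole document.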

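    Estratégias pedagógicas em ambientes virtuais de aprendizagem: um foco nas interações sociais de idosos (Pedagogical strategies in virtual learning environments: a focus on the social interactions of the elderly)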

    This dissertation aimed to build pedagogical strategies to foster the social interactions of the elderly in a Virtual Learning Environment. Each year, a growing number of elderly people interested in continuing education seek on-site, hybrid and virtual courses that use a Virtual Learning Environment (AVA, from the Portuguese Ambiente Virtual de Aprendizagem) as support. It is therefore important to develop a set of pedagogical strategies that support teachers' mediation in their practice with the elderly, especially with regard to social interactions, which is the purpose of this research. The methodology was qualitative and quantitative, descriptive, and of the multiple case study type, carried out in seven stages. First, a systematic review of the concepts involved was conducted to form the theoretical framework. Next, a preliminary Matrix was developed to identify and analyze pedagogical actions in light of social interaction in the virtual environment, and it was applied in the Viva@EaD course (case 1). Preliminary Pedagogical Strategies for fostering the social interaction of the elderly in virtual environments were then built: the prototype. A Digital Educational Material (MED), called EPi-EaD, was also built to support classes for an extension course. Based on these processes, an extension course was developed for professionals interested in or working with the elderly in Virtual Learning Environments (AVA), and the preliminary EPavisi were evaluated. Integrating these stages led to a revision of the prototype of actions related to the teacher's practice, with a view to fostering social interactions in online communication spaces. The process of construction, application and evaluation resulted in the design of a socio-educational proposal that can support the social exchanges of the elderly in AVA.