4 research outputs found

    Unsupervised Text Topic-Related Gene Extraction for Large Unbalanced Datasets

    Traditional unsupervised feature extraction algorithms are commonly assumed to rely on a balanced distribution of clusters in the dataset. However, when topic keywords are extracted from large, unlabelled, unbalanced text datasets, feature selection is guided by similarities among features. As a result, the selected features cannot truly reflect the information in the original dataset, which degrades the performance of downstream classifiers. To solve this problem, this paper proposes a new method for extracting unsupervised text topic-related genes. First, sample clusters are obtained by factor analysis and a density peak algorithm, and the dataset is labelled accordingly. Then, to account for the influence of the unbalanced cluster distribution on feature selection, a CHI statistical matrix feature selection method that combines average local density with information entropy is used to strengthen the features of low-density, small-sample clusters. Finally, a related-gene extraction method based on exploring high-order relevance in multidimensional statistical data is described; it uses independent component analysis to improve the generalisability of the selected features. In this way, unsupervised text topic-related genes can be extracted from large unbalanced datasets. Experimental results suggest that the proposed method outperforms existing methods at extracting text subject terms from low-density, small-sample clusters, and has higher prematurity and feature dimension-reduction ability.
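    The CHI step mentioned in this abstract can be illustrated with the standard chi-square statistic between a term and a cluster label. The sketch below is only that baseline statistic: the weighting by average local density and information entropy described in the abstract is not reproduced, and the toy documents, the labels and the helper names `chi_square` and `rank_terms` are assumptions made for illustration.

```python
def chi_square(docs, labels, term, cluster):
    """Chi-square statistic between a term and a cluster, from a 2x2 table.

    a: documents in the cluster that contain the term
    b: documents outside the cluster that contain the term
    c: documents in the cluster that lack the term
    d: documents outside the cluster that lack the term
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_cluster = label == cluster
        if has_term and in_cluster:
            a += 1
        elif has_term:
            b += 1
        elif in_cluster:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:  # term or cluster covers all or none of the documents
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def rank_terms(docs, labels, cluster):
    """Return (term, score) pairs for one cluster, highest score first."""
    vocabulary = sorted(set().union(*docs))
    scored = [(t, chi_square(docs, labels, t, cluster)) for t in vocabulary]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical toy corpus: a large cluster (0) and a small cluster (1).
docs = [
    {"market", "stock"},
    {"market", "bank"},
    {"stock", "bank"},
    {"market", "stock", "bank"},
    {"gene", "protein"},
    {"gene", "market"},
]
labels = [0, 0, 0, 0, 1, 1]

for term, score in rank_terms(docs, labels, cluster=1):
    print(f"{term}: {score:.3f}")
```

    Note that the plain statistic also scores terms that are strongly absent from a cluster, so a practical pipeline would usually check the direction of the association as well.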

    Ecological risk assessment based on land cover change: A case of Zanzibar-Tanzania, 2003-2027

    Dissertation submitted in partial fulfilment of the requirements for the degree of Master of Science in Geospatial Technologies. Land use under improper land management is a major challenge in sub-Saharan Africa and has drastically affected ecological security. Addressing the environmental impacts of this challenge requires faster and more efficient planning strategies based on measured information about land-use patterns. This study assesses the ecological risk index (ERI) of Zanzibar using land cover change. We first used a Random Forest classifier to classify three Landsat images of Zanzibar for the years 2003, 2009 and 2018. The Land Change Modeler was then used to simulate the land cover of Zanzibar City up to 2027 from the land-use maps of 2009 and 2018 under a business-as-usual scenario and two alternative scenarios (conservation and extreme). Next, the ecological risk index for each land cover class was assessed based on the theories of landscape ecology and an ecological risk model. The results show that the built-up areas and farmland of Zanzibar island have increased steadily, while natural grassland and forest cover have shrunk. Forest, agricultural land and grassland have become highly fragmented into small patches as their patch areas decreased. Meanwhile, the ecological risk index of Zanzibar island has increased at a constant rate, and if the current trend continues it will increase by up to 8.9% by 2027. Comparing the three future scenarios, the ERI increases by only 4.6% under the conservation scenario, 1.6 percentage points less than the 6.2% under business as usual, while the extreme scenario yields the largest increase, up to 8.9%.
This study will help authorities understand the ecological processes and land-use dynamics of the various land cover classes, and help prevent unmanaged growth and the haphazard development of informal housing and infrastructure.
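    The scenario comparison in this abstract is simple arithmetic on the three projected ERI increases. The sketch below reproduces it using only the figures quoted in the abstract; the dictionary keys and the helper name `gap_to_baseline` are assumptions made for illustration, and the ERI model itself is not reproduced.

```python
# Projected ERI increase by 2027, in percent, as quoted in the abstract.
eri_increase = {
    "conservation": 4.6,
    "business_as_usual": 6.2,
    "extreme": 8.9,
}


def gap_to_baseline(increases, baseline="business_as_usual"):
    """Difference in percentage points between each scenario and the baseline."""
    base = increases[baseline]
    # Round to one decimal to hide floating-point noise in the subtraction.
    return {name: round(value - base, 1) for name, value in increases.items()}


gaps = gap_to_baseline(eri_increase)
for name, gap in gaps.items():
    print(f"{name}: {gap:+.1f} percentage points vs business as usual")
```

    Under these figures the conservation scenario sits 1.6 points below business as usual and the extreme scenario 2.7 points above it.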

    Modélisation des informations et extraction des connaissances pour la gestion des crises (Information modelling and knowledge extraction for crisis management)

    The rise of emerging data-collection technologies offers new opportunities for many scientific disciplines. Computer science is called upon to play its part by developing intelligent data-analysis techniques that shed light on complex problems. This doctoral thesis falls within the general problem of extracting knowledge from data using computational techniques. It first addresses the modelling of information for managing crises that require medical care, through the collaboration of telemedicine software applications. We propose a three-step methodology for managing a crisis remotely. It centres on the collaboration of telemedicine acts (teleconsultation, tele-expertise, telemonitoring, teleassistance and medical regulation), from the transport of victims through to medical treatment within and/or between health facilities. This methodology not only gives crisis managers a computerised decision-support system, but also minimises financial costs and reduces the response time of emergency services through organised crisis management. Second, we studied in detail knowledge extraction with data mining techniques applied to satellite images in order to discover areas at risk of epidemics, with a case study on the cholera epidemic in the Mopti region of Mali. A six-phase methodology was presented that relates data collected in the field to satellite data in order to prevent and monitor epidemic crises more effectively.
The results indicate that the contamination rate is 66% attributable to the Niger River, in addition to certain societal factors such as the dumping of refuse during the rainy season (période hivernale). We were thus able to establish the link between the epidemic and the environment in which it develops, which will allow decision-makers to better manage a possible epidemic crisis. Finally, for the situation of an ongoing epidemic crisis, we focused on medical analysis, more precisely on the use of portable microscopes to confirm or rule out the presence of pathogens in samples taken from suspected cases. To this end, we presented a six-phase methodology based on deep learning techniques, in particular transfer learning with convolutional neural networks, which take advantage of complex systems with invariants that allow large quantities of data to be modelled and analysed efficiently. The principle is to train convolutional neural networks to classify images of pathogens automatically. In our case study, for example, this approach was used to distinguish a microscopic image containing Vibrio cholerae, the agent of cholera, from a microscopic image containing Plasmodium, the agent of malaria. This yielded a classification success rate of 99%. The next step is to deploy this pathogen image-recognition solution in smart portable microscopes for routine analyses and medical diagnostic applications in crisis management. This will compensate for the shortage of specialists in microscope handling and save considerable time in sample analysis, with precise measurements that allow the work to be carried out under better conditions.
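    The core idea of transfer learning named in this abstract is to reuse a trained feature extractor unchanged and fit only a new classification head. The sketch below shows that mechanism in plain Python under strong simplifying assumptions: `FROZEN_BASE` is a hypothetical two-filter stand-in for pretrained convolutional layers, the four-pixel "images" are synthetic, and the head is a logistic regression. It is not the network or the data used in the thesis.

```python
import math

# Hypothetical stand-in for pretrained convolutional layers. These weights are
# fixed constants and are never updated; a real model would load them from a
# network trained on another dataset.
FROZEN_BASE = ((1.0, 1.0, -1.0, -1.0), (-1.0, -1.0, 1.0, 1.0))


def extract_features(image):
    """Frozen feature extractor: two linear filters followed by ReLU."""
    return [max(0.0, sum(w * px for w, px in zip(row, image))) for row in FROZEN_BASE]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(images, labels, epochs=200, lr=0.5):
    """Fit only the new head (logistic regression) by stochastic gradient descent."""
    feats = [extract_features(img) for img in images]  # the base is not trained
    weights, bias = [0.0] * len(FROZEN_BASE), 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            error = sigmoid(sum(w * v for w, v in zip(weights, f)) + bias) - y
            weights = [w - lr * error * v for w, v in zip(weights, f)]
            bias -= lr * error
    return weights, bias


def predict(image, weights, bias):
    """Return the predicted class (0 or 1) for one image."""
    f = extract_features(image)
    return int(sigmoid(sum(w * v for w, v in zip(weights, f)) + bias) >= 0.5)


# Synthetic toy data, not real microscopy: class 0 is bright on the left,
# class 1 is bright on the right.
train_images = [
    [0.9, 0.8, 0.1, 0.2],
    [0.7, 0.9, 0.2, 0.1],
    [0.1, 0.2, 0.9, 0.8],
    [0.2, 0.1, 0.8, 0.7],
]
train_labels = [0, 0, 1, 1]

weights, bias = train_head(train_images, train_labels)
print([predict(img, weights, bias) for img in train_images])
```

    In a real setting the frozen part would be a convolutional network from a deep learning library, and accuracy would be measured on held-out images.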