Unsupervised Text Topic-Related Gene Extraction for Large Unbalanced Datasets
Traditional unsupervised feature extraction algorithms commonly assume that the clusters in a dataset are balanced. However, when topic keywords are extracted from large, unlabeled, unbalanced text datasets, feature selection is guided by similarities computed among features. As a result, the selected features do not truly reflect the information in the original dataset, which degrades the performance of downstream classifiers. To solve this problem, this paper proposes a new method for extracting unsupervised text topic-related genes. First, sample clusters are obtained by factor analysis and a density peak algorithm, and the dataset is labeled on that basis. Then, to account for the influence of the unbalanced distribution of sample clusters on feature selection, a CHI statistical matrix feature selection method that combines average local density with information entropy is used to strengthen the features of low-density, small-sample clusters. Finally, a related-gene extraction method based on exploring high-order relevance in multidimensional statistical data is described; it uses independent component analysis to improve the generalisability of the selected features. In this way, unsupervised text topic-related genes can be extracted from large unbalanced datasets. Experimental results suggest that the proposed method outperforms existing methods at extracting text subject terms from low-density, small-sample clusters, and has higher prematurity and feature dimension-reduction ability.
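For readers unfamiliar with CHI-based feature selection, the sketch below computes the standard chi-square statistic between each term and each cluster label. It is illustrative only: it is not the authors' method and omits the average-local-density and information-entropy weighting the abstract describes. The function names `chi_square` and `chi_matrix` and the toy documents are assumptions made for this example.

```python
def chi_square(a, b, c, d):
    """Standard chi-square score for one term against one cluster.

    a: documents in the cluster that contain the term
    b: documents outside the cluster that contain the term
    c: documents in the cluster that lack the term
    d: documents outside the cluster that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # term or cluster is degenerate; no evidence either way
    return n * (a * d - c * b) ** 2 / denom


def chi_matrix(docs, labels):
    """Return {term: {cluster: score}} for tokenized docs and their cluster labels."""
    clusters = sorted(set(labels))
    doc_sets = [set(doc) for doc in docs]
    vocab = sorted(set().union(*doc_sets))
    matrix = {}
    for term in vocab:
        row = {}
        for k in clusters:
            a = sum(1 for s, y in zip(doc_sets, labels) if y == k and term in s)
            b = sum(1 for s, y in zip(doc_sets, labels) if y != k and term in s)
            c = sum(1 for s, y in zip(doc_sets, labels) if y == k and term not in s)
            d = len(docs) - a - b - c
            row[k] = chi_square(a, b, c, d)
        matrix[term] = row
    return matrix


# Toy data (hypothetical): a small cluster 0 and a larger cluster 1.
docs = [
    ["gene", "cell"],
    ["gene", "protein"],
    ["market", "stock"],
    ["market", "trade"],
    ["stock", "cell"],
]
labels = [0, 0, 1, 1, 1]
scores = chi_matrix(docs, labels)
```

Here `scores["gene"][0]` is much larger than `scores["cell"][0]`, because "gene" occurs only in cluster 0 while "cell" occurs in both clusters. A weighting scheme such as the one the abstract describes would then rescale these scores so that small, low-density clusters are not drowned out.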
Ecological risk assessment based on land cover change: A case of Zanzibar-Tanzania, 2003-2027
Dissertation submitted in partial fulfilment of the requirements for the degree of Master of Science in Geospatial Technologies. Land use under improper land management is a major challenge in sub-Saharan Africa and has drastically affected ecological security. Addressing the related environmental impacts requires faster and more efficient planning strategies based on measured information about land-use patterns. This study assesses the ecological risk index (ERI) of Zanzibar using land cover change. We first used a Random Forest classifier to classify three Landsat images of Zanzibar for the years 2003, 2009 and 2018. The Land Change Modeler was then used to simulate the land cover of Zanzibar City up to 2027 from the land-use maps of 2009 and 2018 under a business-as-usual scenario and two alternative scenarios (conservation and extreme). Next, the ecological risk index of Zanzibar for each land cover was assessed based on the theories of landscape ecology and an ecological risk model. The results show that the built-up areas and farmland of Zanzibar island have increased constantly, while natural grassland and forest cover have shrunk. Forest, agricultural land and grassland have become highly fragmented into many small patches as their patch areas decreased. The ecological risk index of Zanzibar island has increased at a constant rate, and if the current trend continues this index will increase by up to 8.9% in 2027. Comparing the three future scenarios, the ERI increases by only 4.6% under the conservation scenario, which is 1.6 percentage points lower than the 6.2% of the business-as-usual scenario, while the extreme scenario produces the largest increase in ERI, up to 8.9%.
This study will help authorities understand the ecological processes and land-use dynamics of the various land cover classes, and help prevent unmanaged growth and haphazard development of informal housing and infrastructure.
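The scenario comparison above is simple arithmetic on the reported ERI increases, sketched below. The three percentages are taken from the abstract; the dictionary keys and variable names are assumptions made for this example.

```python
# Projected ERI increase by 2027 (percent), as reported in the abstract.
eri_increase = {
    "conservation": 4.6,
    "business_as_usual": 6.2,
    "extreme": 8.9,
}

# Gap of each scenario relative to business as usual, in percentage points.
baseline = eri_increase["business_as_usual"]
gaps = {name: round(value - baseline, 1) for name, value in eri_increase.items()}
```

A negative gap means the scenario is projected to raise the ERI less than business as usual; the gaps are differences in percentage points, not relative changes.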
Information modeling and knowledge extraction for crisis management
The rise of emerging data-collection technologies offers new opportunities for a range of scientific disciplines. Computer science has a part to play by developing intelligent data-analysis techniques that shed light on complex problems. This doctoral thesis falls within the general problem of extracting knowledge from data using computational techniques. The thesis first addresses information modeling for the management of crises that require medical care, through the collaboration of telemedicine software applications. We proposed a three-step methodology for managing a crisis remotely. It centers on the collaboration of telemedicine acts (teleconsultation, tele-expertise, telemonitoring, tele-assistance, and medical regulation), from the transport of victims through to medical treatment within and/or between health facilities. This methodology not only gives crisis managers a computerized decision-support system, but also minimizes financial costs and shortens emergency response time through organized management of the crisis. Second, we studied in detail the extraction of knowledge from satellite images using data mining techniques in order to discover areas at risk of epidemics; the case study concerned the cholera epidemic in the Mopti region of Mali. A six-phase methodology was presented that relates data collected in the field to satellite data in order to prevent and monitor epidemic crises more effectively.
The results indicate that 66% of the contamination rate is linked to the Niger River, in addition to certain societal factors such as the dumping of refuse during the winter period. We were therefore able to establish the link between the epidemic and the environment in which it develops, which will help decision-makers better manage a possible epidemic crisis. Finally, for an ongoing epidemic crisis, we focused on medical analysis, specifically the use of portable microscopes to confirm or rule out the presence of pathogens in samples from suspected cases. To this end, we presented a six-phase methodology based on deep learning techniques, in particular transfer learning, one of the techniques used with convolutional neural networks, which take advantage of complex systems with invariants that allow large quantities of data to be modeled and analyzed efficiently. The principle is to train convolutional neural networks to classify images of pathogens automatically. In our case study, for example, this approach was used to distinguish a microscopic image containing the cholera pathogen, Vibrio cholerae, from a microscopic image containing the malaria pathogen, Plasmodium. This yielded a classification success rate of 99%. The next step is to deploy this pathogen image-recognition solution in smart portable microscopes for routine analyses and medical diagnostic applications in the management of crisis situations. This will help compensate for the shortage of specialists in microscope handling and will save considerable time in the analysis of samples, with precise measurements that allow the work to be carried out under better conditions.