    A Pointillism Approach for Natural Language Processing of Social Media

    Natural language processing tasks typically start with words as the basic unit; from words and their meanings, a larger picture is constructed of what documents or other larger constructs mean in terms of the topics they discuss. Social media is especially difficult for natural language processing because it challenges the very notion of a word. Social media users regularly use words that are not found in even the most comprehensive lexicons. These new words can be unknown named entities that have suddenly risen in prominence because of a current event, or they might be neologisms newly created to emphasize meaning or evade keyword filtering. Chinese social media is particularly challenging. Even in formal usage, Chinese poses problems for word-based natural language processing, and social media only makes Chinese word segmentation more difficult. Thus, even determining where the word boundaries lie in a social media corpus is difficult. For these reasons, in this document I propose the Pointillism approach to natural language processing. In the pointillism approach, language is viewed as a time series, or sequence of points that represent the grams' usage over time. Time is an important aspect of the Pointillism approach. Detailed timing information, such as the timestamps of posts, contains correlations that reflect human activity patterns and current events. This timing information provides the necessary context to build words and phrases out of trigrams and then group those words and phrases into topical clusters. Rather than words that have individual meanings, the basic unit of the pointillism approach is the character trigram. These grams take on meaning in aggregate when they appear together in a way that is correlated over time.
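As a rough illustration of the representation described above, the sketch below splits each post into overlapping character trigrams, bins trigram counts by hour, and proposes a longer unit (abc + bcd → abcd) when two overlapping trigrams have strongly correlated hourly counts. The overlap rule, the Pearson correlation test, the 0.9 threshold, and all function names are hypothetical choices for illustration, not the dissertation's actual criteria.

```python
from collections import defaultdict
from datetime import datetime
from math import sqrt


def char_trigrams(text):
    """Return every overlapping 3-character window of a post."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


def hourly_series(posts, n_hours, start):
    """Map each trigram to a list of per-hour counts.

    posts: iterable of (datetime, text) pairs; start: the datetime of hour 0.
    Posts falling outside [start, start + n_hours) are ignored.
    """
    series = defaultdict(lambda: [0] * n_hours)
    for ts, text in posts:
        hour = int((ts - start).total_seconds() // 3600)
        if 0 <= hour < n_hours:
            for gram in char_trigrams(text):
                series[gram][hour] += 1
    return dict(series)


def pearson(x, y):
    """Pearson correlation of two equal-length count series (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def merge_candidates(series, threshold=0.9):
    """Join overlapping trigrams (abc + bcd -> abcd) whose hourly counts correlate.

    HYPOTHETICAL rule for illustration: the overlap test and threshold are assumptions.
    """
    grams = list(series)
    merged = []
    for a in grams:
        for b in grams:
            if a != b and a[1:] == b[:2] and pearson(series[a], series[b]) >= threshold:
                merged.append(a + b[2])
    return sorted(merged)


# Usage: one idiom bursting in hours 0 and 2, plus unrelated chatter in hour 1.
start = datetime(2012, 1, 1, 0)
posts = [
    (datetime(2012, 1, 1, 0, 5), "一石二鸟"),
    (datetime(2012, 1, 1, 0, 30), "一石二鸟"),
    (datetime(2012, 1, 1, 1, 0), "你好吗"),
    (datetime(2012, 1, 1, 2, 10), "一石二鸟"),
]
series = hourly_series(posts, 3, start)
candidates = merge_candidates(series)  # ['一石二鸟']
```

The pairwise loop is quadratic in the number of distinct trigrams and is meant only to show the idea. On real data, a minimum-frequency cutoff would likely be needed, since sparse hourly series can correlate by chance.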
I anticipate that the pointillism approach can perform well in a variety of natural language processing tasks for many different languages, but in this document my focus is on trend analysis for Chinese microblogging. Microblog posts carry timestamps that are accurate to the minute or second (though in this dissertation I bin posts by the hour). To show that trigrams supplemented with frequency information do assemble scattered information into meaningful units, I first use the pointillism approach to extract phrases. I conducted experiments on 4-character idioms, on a set of 500 phrases longer than 3 characters taken from the Chinese-language Wiktionary, and on Weibo's hot keywords. My results show that when words and topics do have a meme-like trend, they can be reconstructed from trigrams alone. For example, for 4-character idioms that appear at least 99 times in one day in my data, the unconstrained precision (that is, precision that allows for deviation from a lexicon when the result is just as correct as the lexicon version of the word or phrase) is 0.93. For longer words and phrases collected from Wiktionary, including neologisms, the unconstrained precision is 0.87. I consider these results very promising, because they suggest that it is feasible for a machine to reconstruct complex idioms, phrases, and neologisms with good precision without any notion of words. Next, I examine the potential of the pointillism approach for extracting topical trends from microblog posts related to environmental issues. Independent Component Analysis (ICA) is used to find the trigrams that share the same independent signal source, i.e., the same topic. By contrast, probabilistic topic models leverage co-occurrence to classify documents into the topics they have already learned, which makes it hard for them to extract topics in real time.
However, the pointillism approach can extract trends in real time, whether or not those trends have been discussed before. This is more challenging than phrase extraction, because in phrase extraction order information is used to narrow down the candidates, whereas for trend extraction only the frequency of the trigrams is considered. The proposed approach is compared against a state-of-the-art topic extraction technique, Latent Dirichlet Allocation (LDA), on 9,147 labelled posts with timestamps. The experimental results show that the highest F1 score of the pointillism approach with ICA is 4% better than that of LDA. Thus, using the pointillism approach, the colorful and baroque uses of language that typify social media in challenging languages such as Chinese may in fact be accessible to machines. The thesis that my dissertation tests is this: for topic extraction in scenarios where no adequate lexicon is available, such as social media, the Pointillism approach uses timing information to outperform traditional techniques that are based on co-occurrence.
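The ICA step can be sketched as follows, assuming that each row of the input matrix holds one trigram's hourly counts, that each independent component corresponds to a topic signal over time, and that each trigram is assigned to the component with the largest absolute weight in the estimated mixing matrix. The abstract does not specify the ICA variant or the assignment rule, so these choices, and the names `fast_ica` and `cluster_trigrams`, are hypothetical. The sketch implements symmetric FastICA with a tanh nonlinearity in NumPy.

```python
import numpy as np


def fast_ica(X, n_components, max_iter=500, tol=1e-6, seed=0):
    """Symmetric FastICA with a tanh (log-cosh) contrast.

    X: array of shape (n_trigrams, n_hours), one row per trigram's hourly counts.
    Returns (S, A): S (n_components, n_hours) holds the estimated source signals,
    A (n_trigrams, n_components) is the estimated mixing matrix, X_centered ~ A @ S.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=1, keepdims=True)

    # Whiten: project onto the top principal directions and rescale to unit variance.
    cov = Xc @ Xc.T / Xc.shape[1]
    d, E = np.linalg.eigh(cov)
    top = np.argsort(d)[::-1][:n_components]
    d, E = d[top], E[:, top]
    K = (E / np.sqrt(d)).T          # whitening matrix, (n_components, n_trigrams)
    Z = K @ Xc                      # whitened data with identity covariance

    # Random orthogonal starting point.
    U, _, Vt = np.linalg.svd(rng.standard_normal((n_components, n_components)))
    W = U @ Vt
    for _ in range(max_iter):
        G = np.tanh(W @ Z)
        G_prime = 1.0 - G ** 2
        # Fixed-point update E[z g(w^T z)] - E[g'(w^T z)] w, for all rows at once.
        W_new = G @ Z.T / Z.shape[1] - G_prime.mean(axis=1)[:, None] * W
        # Symmetric decorrelation (W W^T)^(-1/2) W, computed via the SVD.
        U, _, Vt = np.linalg.svd(W_new)
        W_new = U @ Vt
        done = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
        W = W_new
        if done:
            break

    S = W @ Z
    A = np.linalg.pinv(W @ K)
    return S, A


def cluster_trigrams(X, n_topics):
    """HYPOTHETICAL assignment rule: each trigram joins the component
    with the largest absolute weight in the estimated mixing matrix."""
    _, A = fast_ica(X, n_topics)
    return np.argmax(np.abs(A), axis=1)
```

For example, given a matrix whose first two rows are driven mostly by one hourly pattern and whose last two rows are driven mostly by another, `cluster_trigrams(X, 2)` should put rows 0 and 1 in one group and rows 2 and 3 in the other. Component order and sign are arbitrary in ICA, so the group labels themselves carry no meaning. In practice, scikit-learn's `FastICA` (whose fitted `mixing_` attribute plays the role of `A` here) would be a sturdier choice than this hand-rolled version.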

    China-Malawi Cooperation: Hope for Malawi's Climate-Resilient Infrastructure

    Infrastructure development has been a crucial component of development in the global South. Any disruption to infrastructure can negatively affect other aspects of society, including environmental, social, and economic development. Unfortunately, Malawi has suffered a series of climate change-related natural disasters, such as floods and tropical cyclones, for several consecutive years. These disasters have not only killed many people but also damaged infrastructure such as roads, buildings, railways and bridges. Repairing such damage is costly and creates unexpected pressure on the national economy, eventually disturbing macroeconomic parameters and worsening people's welfare. For these reasons, South-South Cooperation, and particularly China-Malawi Cooperation, treats climate-resilient infrastructure as one of its important areas of cooperation. However, there is no empirical evidence tracking the progress made so far. Against this background, we conducted a review study based on a systematic literature search to assess whether the cooperation offers any hope for climate-resilient infrastructure in Malawi. The study recommends that the cooperation provide sufficient discussions and agreements on which types of infrastructure need priority, on quality standards for climate-resistant infrastructure, and on investment in alternative energy sources besides hydropower. This is crucial if Malawi is to have resilient infrastructure that can withstand the impacts of climate change.

    Assessment of Bioflocculant Production by Bacillus sp. Gilbert, a Marine Bacterium Isolated from the Bottom Sediment of Algoa Bay

    The bioflocculant-producing potential of a marine bacterium isolated from the bottom sediment of Algoa Bay was investigated using standard methods. 16S rDNA sequence analysis revealed 98% similarity to Bacillus sp. HXG-C1, and the nucleotide sequence was deposited in GenBank as Bacillus sp. Gilbert with accession number HQ537128. The bioflocculant was optimally produced when sucrose (72% flocculating activity) and ammonium chloride (91% flocculating activity) were used as the sole sources of carbon and nitrogen, respectively, with an initial production-medium pH of 6.2 and Mg²⁺ as the cation. Chemical analysis of the purified bioflocculant revealed the compound to be a polysaccharide.
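The abstract reports flocculating activity as a percentage without defining it. Assuming the kaolin clay assay that is widely used in bioflocculant studies (the abstract says only "standard methods"), the activity compares the turbidity of a treated suspension with that of an untreated control:

```latex
\text{Flocculating activity}\ (\%) = \frac{B - A}{B} \times 100
```

Here A is the optical density (typically read at 550 nm) of the clarified sample and B that of the control without bioflocculant. For example, illustrative readings of B = 1.00 and A = 0.28 give (1.00 − 0.28)/1.00 × 100 = 72%, the activity reported above for sucrose.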

    Studies on Bioflocculant Production by Arthrobacter sp. Raats, a Freshwater Bacteria Isolated from Tyume River, South Africa

    A bioflocculant-producing bacterium was isolated from the Tyume River in the Eastern Cape Province, South Africa, and identified by 16S rRNA gene nucleotide sequencing as having 91% similarity to Arthrobacter sp. 5J12A; the nucleotide sequence was deposited in GenBank as Arthrobacter sp. Raats (accession number HQ875723). The bacterium produced an extracellular bioflocculant when grown aerobically in a production medium with glucose as the sole carbon source and an initial pH of 7.0. The effects of carbon sources, nitrogen sources, metal ions and initial pH on flocculating activity were investigated. The bacterium optimally produced the bioflocculant when lactose and urea were used as the sole sources of carbon and nitrogen, respectively, with flocculating activities of 75.4% and 83.4%, respectively. It also produced the bioflocculant optimally when the initial pH of the medium was 7.0 (flocculating activity 84%) and when Mg²⁺ was used as the cation (flocculating activity 77%). Composition analyses indicated that the bioflocculant is principally a glycoprotein made up of about 56% protein and 25% total carbohydrate.

    The Impact of Beach Development with Local Wisdom on Increasing the Income of the Surrounding Community

    Tourism is seen as an important sector in the development of the world economy. Village tourism is one way of implementing community-based, sustainable tourism development as a generator of regional economic growth and as a poverty-alleviation tool. This research aims to analyze the impact of beach development based on local wisdom on the income of the surrounding community (a case study of Sebalang Beach, Kalianda, South Lampung). The study is qualitative; data were collected through observation, interviews, questionnaires, and documentation and then analyzed. The results show that in the development of Sebalang Beach, local wisdom values are the main capital for building human creativity that has economic value and can increase people's income without damaging the social order or the surrounding natural environment.

    Liquefaction of sunflower husks for biochar production

    MSc (Engineering Sciences in Chemical Engineering), North-West University, Potchefstroom Campus, 2014
    Biochar, a carbon-rich potential solid biofuel, is produced during the liquefaction of biomass. Biochar can be combusted for heat and power, gasified, activated for adsorption applications, or applied to soils as a soil amendment and carbon sequestration agent. It is important to produce biochar under controlled conditions so that most of the carbon is converted. The main objective of the study was to investigate the effect of solvent, reaction temperature and reaction atmosphere on biochar production during the liquefaction of sunflower husks. The liquefaction of sunflower husks was first investigated in different solvents (water, methanol, ethanol, iso-propanol and n-butanol) to study the effect of solvent on biochar yield. The experiments were carried out in an SS316 stainless steel high-pressure autoclave at 280 °C, with 30 wt.% biomass loading in the solvent and a starting pressure of 10 bar. Secondly, sunflower husks were liquefied at various temperatures (240–320 °C) to assess the influence of reaction temperature on biochar yield. Experiments were carried out under either a carbon dioxide or a nitrogen atmosphere with a residence time of 30 minutes. Biochar samples obtained from sunflower husk liquefaction were structurally characterised by scanning electron microscopy (SEM) and Brunauer-Emmett-Teller (BET) analysis to compare surface morphology and pore structure at different reaction temperatures. Compositional analysis of the sunflower husk biochar samples was done by proximate analysis, Fourier-transform infrared (FT-IR) spectroscopy, X-ray diffraction (XRD) and elemental analysis. The results showed that biochar yield from the liquefaction of sunflower husks was significantly affected by the type of solvent used. The highest biochar yield was obtained with ethanol (57.35 wt.%) and the lowest with n-butanol (41.5 wt.%). A temperature of 240 °C produced the highest biochar yield (64 wt.%); yields decreased with increasing liquefaction temperature, reaching a low of 41 wt.% at 320 °C. Temperature had the most significant influence on biochar yield in an N₂ atmosphere, while solvent choice had the most significant influence in a CO₂ atmosphere. Temperature also affected the structure of the biomass: SEM analysis shows that the biochar became more porous with increasing temperature. In general, the CO₂ adsorption results suggested that a CO₂ atmosphere develops microporosity to a greater extent than an N₂ atmosphere. Compositional analysis shows that sunflower husks have a high lignin content (34.17 wt.%), which is associated with a high heating value and a high solid product yield. As a waste product, sunflower husks can be converted into useful products such as biochar through liquefaction, and the biochar can be used to generate heat and as a soil amendment because of its high heating value and high porosity. While these preliminary studies appear promising for the conversion of sunflower husks to biochar, further studies are needed.
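Yields in the abstract are given in wt.% without stating the basis. Assuming the usual gravimetric definition, the mass of recovered biochar relative to the mass of husks loaded into the autoclave (whether on a dry or as-received basis is not stated), the yield is:

```latex
Y_{\text{biochar}}\ (\text{wt.\%}) = \frac{m_{\text{biochar}}}{m_{\text{husks}}} \times 100
```

For example, illustrative masses of 6.4 g of biochar from 10.0 g of husks correspond to 64 wt.%, the yield reported at 240 °C.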

    Rediscovery of lost early Royal Society papers on the Alkahest
