27 research outputs found
Enhancing Data Integration and Automation in Datalakes through Large Language Model(LLM)-Driven ETL Pipelines
Data pipelines are essential for ingesting raw data into an organization's infrastructure, and the Extract, Transform, Load (ETL) process plays a pivotal role in building them. However, ETL data pipelines require manual maintenance and an understanding of the underlying schema structure and complex metadata, which costs time and resources. In this work, we present a novel approach that addresses these issues while also enhancing data standardization across diverse data sources. The approach is inspired by the rapid progress of Large Language Models (LLMs) in understanding underlying structural patterns in textual data. We experimented in an optimized environment, fine-tuning four language models (Llama3 8B, Qwen2 7B, Mistral 7B, and Llama2 7B) on two enterprise datasets, AdventureWorks and NorthWind. For each dataset, we ran three scenarios with different hyperparameters and window sizes to observe the effect of these parameters on the behavior of the fine-tuned models. We achieved 96% accuracy in predicting the schema with Llama3 8B and Mistral 7B when the data pattern is recognizable, and 63% accuracy with Qwen2 7B when patterns are unrecognizable. Weighing accuracy against performance, we found Mistral 7B to be the best candidate. Overall, this work shows promising results for replacing manual work and for adopting LLMs to automate and maintain pipelines that must adapt to new data conditions in industrial use. Keywords: ETL pipeline, fine-tuning LLMs, optimization, data standardization, structural data adaptation
Study on epidemiology, risk factors and clinical characteristics of triple negative breast cancer in Bangladesh
Breast cancer is the second most common type of cancer in Bangladesh. Although significant improvement has been made in breast cancer treatment and management, Triple Negative Breast Cancer (TNBC) is still the least known breast cancer subtype in this country. TNBC is well known for its aggressive nature and limited therapeutic options compared with other breast cancer subtypes. Several population-based studies have indicated a high prevalence of TNBC in African women, and a few recent studies indicate a growing number of TNBC patients among Asian populations. However, there is a lack of evidence on TNBC patients in Bangladesh due to limited knowledge and awareness. In this paper we review the epidemiology, general risk factors and clinical characteristics of TNBC to find out the correlation between TNBC and other conventional breast cancer subtypes in Bangladesh. Some diagnostic and therapeutic approaches, as well as future novel solutions for TNBC, are also discussed to understand the pathologic process and treatment strategies of TNBC. The literature review reveals that there is a lack of TNBC studies in Bangladesh. Therefore, more investigations should be carried out to address the degree of vulnerability to TNBC among breast cancer patients in Bangladesh.
Evaluation of serum lipoprotein(a) level in type 2 diabetic patients and non-diabetic people
Background: Type 2 diabetes mellitus (T2DM) has high morbidity and results in an increased risk of mortality, mainly due to cardiovascular diseases. Different factors have been found to be responsible for the increased prevalence of coronary artery disease in T2DM. One of these factors is a raised serum level of lipoprotein(a) (Lp(a)). The purpose of the present study is to assess the serum level of Lp(a) in type 2 diabetes mellitus patients and non-diabetic people and to find the difference in serum Lp(a) between good and poor glycaemic control. Methods: This cross-sectional study was carried out in the department of Biochemistry and Endocrinology, Chittagong Medical College Hospital, Chattogram from July 2017 to June 2018. We assessed a total of 100 type 2 diabetic patients and a group of 50 non-diabetic people in the age range 31-60 years. Blood samples were collected in the fasting state and analysed for FPG, HbA1c%, serum lipid profile (TC, TG, LDL and HDL) and Lp(a). Data were analysed by t-test and chi-square test. Results: The serum Lp(a) levels were significantly elevated in type 2 diabetic patients compared to non-diabetic people (44.32±2.6 vs. 13.02±0.81). There was also a significant difference in serum Lp(a) between good and poor glycaemic control. Conclusions: Lp(a) is an independent risk factor for atherosclerosis and is elevated in diabetic patients. Selective screening and lowering of its concentration would therefore help prevent coronary artery disease, a known cause of death in diabetic patients.
Social Safety Net Programme as a Mean to Alleviate Poverty in Bangladesh
The Social Safety Net Programmes (SSNPs) play a key role in Bangladesh in protecting poor households from poverty and vulnerability. Either income poverty or human poverty is responsible for the prevalence of poverty in Bangladesh, and the causes of being poor differ across individuals. Taking all these factors into account, the government of Bangladesh has been trying to reduce poverty by implementing various types of social safety net programmes since the country's emergence as a new nation. The government allocates a significant amount of money in the budget to implement these measures with the aim of reducing the degree of poverty. The major social safety net programmes in Bangladesh can be divided into two broad categories: (i) social protection measures; (ii) social empowerment measures. All these measures are intended to facilitate education, health, vulnerability reduction, employment creation, risk reduction, etc. To attain the goal of poverty reduction for the overall welfare of society, better targeting of beneficiaries and better monitoring and supervision must be ensured. There is a need for a comprehensive macroeconomic policy response and strong programme management to make the SSNPs work efficiently. Keywords: Poverty, Social safety net, Bangladesh
Nutritionally important starch fractions and estimated glycemic index of selected South Indian rice varieties
The nutritionally important starch fractions and in vitro starch digestibility index (SDI) were studied in three commercially available rice varieties and a millet, which were subjected to four different cooking methods to validate the claim of low glycemic index. The hydrolysis index was analyzed to compute the Estimated Glycemic Index (EGI) and correlated with SDI. In addition, carbohydrate profile, amylose content, degree of gelatinization and ultra-structural analysis were also carried out. The starch fractions differed according to the cooking method. Samples with high Rapidly Available Glucose (RAG) showed higher SDI. The SDI ranged from 17 to 46; samples cooked by pressure cooking and steaming had higher SDI. The degree of gelatinization (DG) correlated with total starch (TS) content. The EGI ranged from 53 to 65, categorizing the samples as medium GI foods. The nutritional properties of rice starch fractions are of immense interest due to their digestion characteristics (slowly digested and absorbed); therefore, the identification of foods with low glycemic index and low RDS and SDI values could be useful for the target population.
Tea and Tea Product Diversification: A Review
Tea is the most consumed drink after water and one of the most prevalent and cheapest beverages consumed globally. Tea is considered a healthy beverage due to the presence of several antioxidants and minerals such as potassium, magnesium, calcium and manganese. Different kinds of tea are manufactured in different countries based on the taste, habits and culture of the people. Based on processing, tea is normally categorized into three groups: green tea (unfermented), oolong tea (partially fermented) and black tea (fully fermented). Tea is a rich source of polyphenols, and interest in the possible health benefits of polyphenols, particularly flavonoids, has increased owing to their antioxidant and free-radical scavenging abilities. The rising demand for tea is considered one of the significant drivers of growth in the worldwide beverage market. The tea industry makes a vital contribution to the economies of tea-producing countries such as China, Japan, India, Sri Lanka, Bangladesh and Kenya. With the rising demand for tea, alternative means of increasing profits from tea cultivation need to be explored. The market price of tea is low in countries such as Sri Lanka, India, Bangladesh and Kenya, while the cost of production is high. For this reason, there is no alternative to product diversification of tea through value addition, which can be an important approach to mitigating the impacts of low market prices and high production costs. This review broadly focuses on the issues leading to the development of a wide range of tea and tea product diversification.
This paper also describes the health benefits associated with different types of tea, the nutraceutical beverages, confectionery items, toiletries and cosmeceuticals that are being commercialized in different parts of the world and gaining consumer acceptance, and the global marketing challenges faced by the tea industry.
A practical guide to cross-cultural and multi-sited data collection in the biological and behavioural sciences.
Researchers in the biological and behavioural sciences are increasingly conducting collaborative, multi-sited projects to address how phenomena vary across ecologies. These types of projects, however, pose additional workflow challenges beyond those typically encountered in single-sited projects. Through specific attention to cross-cultural research projects, we highlight four key aspects of multi-sited projects that must be considered during the design phase to ensure success: (1) project and team management; (2) protocol and instrument development; (3) data management and documentation; and (4) equitable and collaborative practices. Our recommendations are supported by examples from our experiences collaborating on the Evolutionary Demography of Religion project, a mixed-methods project collecting data across five countries in collaboration with research partners in each host country. To existing discourse, we contribute new recommendations around team and project management, introduce practical recommendations for exploring the validity of instruments through qualitative techniques during piloting, highlight the importance of good documentation at all steps of the project, and demonstrate how data management workflows can be strengthened through open science practices. While this project was rooted in cross-cultural human behavioural ecology and evolutionary anthropology, lessons learned from this project are applicable to multi-sited research across the biological and behavioural sciences
Reproductive inequality in humans and other mammals
To address claims of human exceptionalism, we determine where humans fit within the greater mammalian distribution of reproductive inequality. We show that humans exhibit lower reproductive skew (i.e., inequality in the number of surviving offspring) among males and smaller sex differences in reproductive skew than most other mammals, while nevertheless falling within the mammalian range. Additionally, female reproductive skew is higher in polygynous human populations than in polygynous nonhuman mammals on average. This patterning of skew can be attributed in part to the prevalence of monogamy in humans compared to the predominance of polygyny in nonhuman mammals, to the limited degree of polygyny in the human societies that practice it, and to the importance of unequally held rival resources to women’s fitness. The muted reproductive inequality observed in humans appears to be linked to several unusual characteristics of our species, including high levels of cooperation among males, high dependence on unequally held rival resources, complementarities between maternal and paternal investment, as well as social and legal institutions that enforce monogamous norms.