14,856 research outputs found

    A Novel Approach for Clustering of Heterogeneous Xml and HTML Data Using K-means

    Get PDF
    Data mining is the process of extracting useful knowledge from large data sets. Today, data is often not structured; it is stored in many different formats, both online and offline, so two further categories are distinguished alongside structured data: semi-structured and unstructured. Semi-structured data includes formats such as XML, while unstructured data includes HTML, email, audio, video, and web pages. This paper concerns mining of heterogeneous XML and HTML data. The implementation is based on extracting data from text files and web pages using popular data mining techniques; the final result follows sentiment analysis of text, of semi-structured documents (XML files), and of unstructured web pages with HTML code, from which the structure/semantics of the code alone, as well as both structure and content, are extracted. The implementation uses R, a programming language run in the RStudio environment and commonly used in statistical computing, data analytics, and scientific research. R is one of the most popular languages used by statisticians, data analysts, researchers, and marketers to retrieve, clean, analyze, visualize, and present data.
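    To make the pipeline this abstract sketches more concrete, here is a minimal, self-contained illustration in Python (the paper itself uses R; the tag-stripping tokenizer, the toy documents, and the bare-bones k-means below are all invented for illustration, not taken from the paper):

```python
import math
import random
import re
from collections import Counter

def tokenize(text):
    # Strip XML/HTML tags, then keep lowercase word content only.
    return re.findall(r"[a-z]+", re.sub(r"<[^>]+>", " ", text.lower()))

def tf_vector(tokens, vocab):
    # Term-frequency vector over a fixed vocabulary.
    counts = Counter(tokens)
    return [counts[w] / max(len(tokens), 1) for w in vocab]

def kmeans(vectors, k, iters=20, seed=0):
    # Plain Lloyd's algorithm with random initial centroids.
    random.seed(seed)
    centroids = [list(v) for v in random.sample(vectors, k)]

    def nearest(v):
        return min(range(k),
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))

    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(v)].append(v)
        for i, cl in enumerate(clusters):
            if cl:  # recompute centroid as the mean of its members
                centroids[i] = [sum(col) / len(cl) for col in zip(*cl)]
    return [nearest(v) for v in vectors]

# Toy heterogeneous documents: two XML fragments, two HTML fragments.
docs = [
    "<note><body>order shipped today</body></note>",
    "<html><p>order arrived late</p></html>",
    "<review><text>great movie soundtrack</text></review>",
    "<html><p>movie score was great</p></html>",
]
token_lists = [tokenize(d) for d in docs]
vocab = sorted(set(w for t in token_lists for w in t))
vecs = [tf_vector(t, vocab) for t in token_lists]
labels = kmeans(vecs, k=2)
```

    With real data one would add TF-IDF weighting and a principled choice of k, but the flow (strip markup, vectorize, cluster) is the same.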

    Scientometric Analysis of Technology & Innovation Management Literature

    Get PDF
    The management of technology and innovation has become an attractive and promising field within the management discipline. Much insight can therefore be gained by reviewing the Technology & Innovation Management (TIM) research in leading TIM journals to identify and classify the key TIM issues by meta-categories and to identify current trends. Based on a comprehensive scientometric analysis of 5,591 articles in 10 leading TIM specialty journals from 2005 to 2014, this research revealed several enlightening findings. First, the United States is the major producer of TIM research literature, and the greatest number of papers was published in Research Policy. Among the researchers in the field, M. Song is the most prolific author. Second, the TIM field often plays a bridging role in which the integration of ideas can be grouped into 10 clusters: innovation and firms, new product development (NPD) and marketing strategy, project management, patenting and industry, emerging technologies, science policy, social networks, system modeling and development, business strategy, and knowledge transfer. Third, the connectivity among these terms is highly clustered, and a network-based perspective revealed that six new topic clusters are emerging: NPD, technology marketing, patents and intellectual property rights, university-industry cooperation, technology forecasting and roadmapping, and green innovation. Finally, chronological trend analysis of key terms indicates a change in emphasis in TIM research from information systems/technologies to the energy sector and green innovation. The results of the study improve our understanding of the structure of TIM as a field of practice and an academic discipline. This insight provides direction regarding future TIM research opportunities.

    New Approach for Market Intelligence Using Artificial and Computational Intelligence

    Get PDF
    Small and medium-sized retailers are central to the private sector and vital contributors to economic growth, but they often face enormous challenges in unleashing their full potential. Financial pitfalls, lack of adequate access to markets, and difficulties in exploiting technology have prevented them from achieving optimal productivity. Market Intelligence (MI) is the knowledge extracted from numerous internal and external data sources, aimed at providing a holistic view of the state of the market and influencing marketing-related decision-making processes in real time. A related, burgeoning phenomenon and crucial topic in the field of marketing is Artificial Intelligence (AI), which entails fundamental changes to the skill sets marketers require. A vast amount of knowledge is stored in retailers' point-of-sale databases, but the format of this data often makes that knowledge hard to access and identify. As a powerful AI technique, Association Rule Mining helps to identify frequently associated patterns stored in large databases in order to predict customers' shopping journeys. Consequently, the method has emerged as a key driver of cross-selling and upselling in the retail industry. At the core of this approach is Market Basket Analysis, which captures knowledge from heterogeneous customer shopping patterns and examines the effects of marketing initiatives. Apriori, which enumerates the frequent itemsets purchased together (as market baskets), is the central algorithm in the analysis process. Problems occur because Apriori lacks computational speed and has weaknesses in providing intelligent decision support: as the number of simultaneous database scans grows, the computational cost increases and performance drops dramatically. Moreover, decision support falls short, especially in methods for finding rarely occurring events and for identifying a brand's trending popularity before it peaks.
As the objective of this research is to find intelligent ways to help small and medium-sized retailers grow with an MI strategy, we demonstrate the effects of AI with algorithms for data preprocessing, market segmentation, and finding market trends. Using the sales database of a small, local retailer, we show how our Åbo algorithm increases mining performance and intelligence, and how it helps to extract valuable marketing insights for assessing demand dynamics and product popularity trends. We also show how this results in commercial advantage and tangible return on investment. Additionally, an enhanced normal distribution method assists data pre-processing and helps to explore different types of potential anomalies.
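The level-wise Apriori procedure described above (enumerate candidate itemsets of growing size, count their support, and prune those below a minimum support threshold) can be sketched as follows. This is a generic textbook version in Python, not the authors' Åbo algorithm, and the toy baskets are invented:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return every itemset (as a frozenset) whose support meets min_support."""
    n = len(transactions)
    frequent = {}
    # Level 1: all single items are candidates.
    candidates = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    k = 1
    while candidates:
        # Count support of each candidate with one pass over the transactions.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-candidates,
        # pruning any candidate that has an infrequent k-subset.
        keys = list(level)
        candidates = {
            a | b
            for a, b in combinations(keys, 2)
            if len(a | b) == k + 1
            and all(frozenset(s) in level for s in combinations(a | b, k))
        }
        k += 1
    return frequent

baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]
freq = apriori(baskets, min_support=0.6)
```

The repeated full scans of `transactions` at every level are exactly the computational cost the abstract criticizes: the pass count grows with the size of the largest frequent itemset.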

    Application of a text analytics model for the automatic discovery and classification of research trends in e-learning (the UOC case)

    Get PDF
    The review of scientific literature is a common task among researchers who wish to detect the trends influencing the evolution of a discipline or a specific field of research. Obtaining such knowledge requires heavy, repetitive tasks of extraction, analysis, synthesis, and classification of hundreds, and in some cases thousands, of bibliographic references. These first phases demand a great effort from the team of analysts, who must collect and order the results of the scientific evidence on which to base their conclusions, a dedication that grows with the scope of each search. In its capacity as an observer of the "world map" of knowledge, detecting and reflecting on the trends and events that are transforming online higher education worldwide, the eLC integrates such tasks of documentary review and foresight into its activity. To ease the path towards the discovery and analysis of the relevant information targeted by each study, the centre has piloted a project that automates part of this review process using analytics and text mining technologies.

    Global Research Trends and Hot Topics on Library and Information Science: A Bibliometric Analysis

    Get PDF
    Abstract Background and objective: One of the approaches to representing scientific publications in a field of science is to determine research trends and hot topics. Therefore, this study aimed to determine the research trends in Library and Information Science (LIS) in the Scopus database during 2011-2020 and to specify the hot topics in this field from July 2020 to July 2021. Materials and Methods: This study used scientometric techniques. The research population consisted of all papers in the field of LIS from July 2011 to July 2021. The data were collected from the Scopus database. The results were limited to 2011-2020 for determining the research trends in the field of LIS and to July 2020-July 2021 for specifying the hot topics in this field. Data were analyzed using word co-occurrence and social network analysis techniques, and UCINet, NetDraw, and VOSviewer software were used to draw scientific maps and identify core topics and individuals. Results: The keywords Systematic Review (frequency=531) and Bibliometrics (frequency=51) had the highest and lowest frequencies, respectively. Libraries and information technology (n=151), research methods (n=70), and databases (n=23) were the three important topic clusters in the study area, in which the United States, China, and the United Kingdom were the three most active countries, respectively. The Department of Library and Information Science, University of London, with 71 documents, and the Department of Information Management, University of Punjab, with 55 documents, made the most significant contributions to article publication among the influential institutions. Moreover, Zhang, Yut, and Wang, Liying, each with 27 documents, and Li, Xiano, with 24 documents, were three active and influential authors in this field. In addition, systematic review, diffusion pattern, and bibliometrics were three hot topics.
    Conclusion: This study revealed that the orientation of LIS research is moving from traditional topics toward novel and emerging technologies. The results of this study can provide valuable information to researchers in LIS at the domestic and international levels.
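    The word co-occurrence step underlying such keyword maps can be illustrated with a small sketch (in Python rather than the UCINet/NetDraw/VOSviewer tools the study used; the records below are invented): each pair of keywords appearing in the same record adds one to the weight of the edge between them, and the weighted pairs form the network that the mapping software lays out.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(keyword_lists):
    # Count how often each keyword pair appears in the same record.
    # Pairs are stored in sorted order so (a, b) and (b, a) coincide.
    pairs = Counter()
    for kws in keyword_lists:
        for a, b in combinations(sorted(set(kws)), 2):
            pairs[(a, b)] += 1
    return pairs

# Invented records: each is the keyword list of one indexed paper.
records = [
    ["systematic review", "bibliometrics", "libraries"],
    ["systematic review", "libraries"],
    ["bibliometrics", "research methods"],
]
edges = cooccurrence(records)
```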

    Text mining and its development stages

    Get PDF
    This paper describes the process of designing text mining and illustrates how social mining is expanding in the ICT sector.

    Web Data Extraction, Applications and Techniques: A Survey

    Full text link
    Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool for performing data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques allow the gathering of large amounts of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, which offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential for cross-fertilization, i.e., the possibility of reusing Web Data Extraction techniques originally designed to work in a given domain in other domains.
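    As a minimal illustration of the kind of wrapper-style extraction the survey covers (this sketch uses Python's standard-library HTMLParser and an invented HTML snippet, not any specific system from the survey), the parser below turns link markup into structured (URL, anchor text) records:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect (href, anchor text) pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

html = '<ul><li><a href="/a">First</a></li><li><a href="/b">Second</a></li></ul>'
p = LinkExtractor()
p.feed(html)
# p.links now holds [("/a", "First"), ("/b", "Second")]
```

    Real Web Data Extraction systems generalize this idea with learned or induced wrappers rather than hand-written callbacks, but the output is the same: structured records recovered from markup.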