16,069 research outputs found

    Web Data Extraction, Applications and Techniques: A Survey

    Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims to provide a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool for data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather the large amount of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, which offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential for cross-fertilization, i.e., the possibility of reusing Web Data Extraction techniques originally designed for a given domain in other domains.
    Comment: Knowledge-based System

    Information technology and distance learning aspects of materials databases

    Distance learning is a flourishing area, with the number of programs provided via remote delivery increasing daily. At the same time, however, progress in accessibility and support for learners who want to pursue education and training in materials science and engineering has not always kept pace. An individual interested in taking a materials course, or in simply finding out what is available, may be forced to locate and then plough through many unwieldy online course listings. This may discourage many learners from pursuing the distance learning option. In recent years, more and more distance learning databases have been developed and made available on the World Wide Web. These databases aim to offer a pool of information on the many courses and programs that are available online and to cater for users' specific needs in locating information. This need applies equally to the area of materials science and engineering. The purpose of this research has been to explore the information technology aspects of materials databases and to study their distance learning aspects closely. [Continues.]

    A Shared Ontology Approach to Semantic Representation of BIM Data

    Architecture, engineering, construction and facility management (AEC-FM) projects involve a large number of participants who must exchange information and combine their knowledge to complete a project successfully. Currently, most AEC-FM domains store their project information in text documents or use XML, relational, or object-oriented formats that make information integration difficult. The AEC-FM industry is not taking advantage of the full potential of the Semantic Web for streamlining the sharing, connecting, and combining of information from different domains. The Semantic Web is designed to solve the information integration problem by creating a web of structured and connected data that can be processed by machines. It allows information to be combined from different sources with different underlying schemas distributed over the Internet. In the Semantic Web, all data instances and data schemas are stored in a graph data store, which makes it easy to merge data from different sources. This paper presents a shared ontology approach to the semantic representation of building information. The semantic representation of building information facilitates finding and integrating building information distributed across several knowledge bases. A case study demonstrates the development of a semantic-based building design knowledge base.
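
    The claim that graph-structured data is easy to merge can be illustrated with a minimal sketch. It assumes each knowledge base is a plain set of (subject, predicate, object) triples; the identifiers (`bldg:Wall_1`, `fm:lastInspected`, and so on) and the helper names `merge` and `describe` are hypothetical and are not taken from the paper, which may use a real triple store and ontology language instead.

```python
# Minimal sketch: merging two knowledge bases stored as triples.
# All identifiers below are hypothetical, chosen only for illustration.

design_kb = {
    ("bldg:Wall_1", "rdf:type", "bldg:Wall"),
    ("bldg:Wall_1", "bldg:partOf", "bldg:Storey_1"),
}

facility_kb = {
    ("bldg:Wall_1", "fm:lastInspected", "2012-03-01"),
    ("bldg:Storey_1", "rdf:type", "bldg:Storey"),
}


def merge(*graphs):
    """Merge graphs by set union; shared identifiers link the data."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Return the sorted (predicate, object) pairs known about a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


merged_kb = merge(design_kb, facility_kb)
wall_facts = describe(merged_kb, "bldg:Wall_1")
```

    Because both sources use the same identifier for the wall, the union alone is enough to bring the design facts and the facility-management fact together under one subject; no schema alignment step is shown here.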

    Historical Places In Malacca (Enhancement Of Maps Manipulation Capability Through The Website) Using MySQL

    Interest in the field of Geographic Information Systems (GIS) has grown rapidly in the past few years. Much research has been done, with highly beneficial results for the field. However, most GIS applications have been developed using the vendor's own proprietary database, which can cause many problems. Geographic Information Systems, also known as GIS, are all about gathering data, building layer upon layer of this data, and then displaying the layers on a computer screen. The aim of the study presented in this paper is to use MySQL to develop a GIS application, thereby showing MySQL's ability to support GIS-based data, in other words, spatial data. While the main objective of the study and of the system developed is to use MySQL to manage spatial data, the other integral objectives of the project are to provide users with better, higher-quality spatial data features and to enhance the capability of manipulating the maps provided through the system. The project follows the RAD methodology, which involves stages such as Requirement Planning, User Design, Construction, and Implementation. These stages are discussed further in Chapter 3, Methodology and Project Work. The conclusion drawn from the project and the research is that MySQL is able to support the development of any GIS-based application through the new release of its database, which includes spatial data management.

    Migrating existing multimedia courseware to Moodle

    Open source course management systems offer increased flexibility for instructors and instructional designers. Communities can influence the development of these systems, and individuals have the option of modifying the system software. Migrating existing courseware to these systems can therefore be beneficial, and sometimes even required. We report here on our experience in migrating an existing courseware system, consisting of multimedia content and interactive, integrated infrastructure functionality, to an open source course management system called Moodle. We assess the difficulties that we encountered during this process, discuss the importance of standards in this context, and aim to provide other instructors and instructional designers with guidelines and assessment support for other migration projects.

    Towards semantic web mining

    Semantic Web Mining aims at combining the two fast-developing research areas of the Semantic Web and Web Mining. The idea is, on the one hand, to improve the results of Web Mining by exploiting the new semantic structures in the Web and, on the other hand, to make use of Web Mining for building up the Semantic Web. This paper gives an overview of where the two areas meet today and sketches ways in which closer integration could be profitable.

    Developing an online database of experts for the Worcester Regional Chamber of Commerce

    The Worcester Regional Chamber of Commerce, as part of its mission to attract business to the Worcester area, wants to create an online searchable database of industry experts made up of faculty members of the colleges and universities in the Worcester area. This online database will be placed on the Worcester Regional Chamber of Commerce Higher Education – Business Partnership page on its website. The limitation placed on this request is that the Regional Chamber currently has no monetary or information technology resources to devote to it. The proliferation of "as a Service" information technology offerings provides a number of options for satisfying the request for an online searchable database of individuals, and some services are geared more specifically to this type of need and are intended for the nonprofit sector as well. The recommendation of this report is for the Worcester Regional Chamber of Commerce to consider these options, even if doing so requires a small investment of funds on its part.

    Building a scalable index and a web search engine for music on the Internet using Open Source software

    The Internet has made it possible to access thousands of freely available music tracks under Creative Commons or Public Domain licenses, and this number keeps growing every year. In practical terms, it is very difficult to browse this music collection, because it is large and dispersed across hundreds of websites. To address the music recommendation issue, a case study of existing systems was carried out to put the problem in context and to identify the necessary building blocks. This thesis focuses mainly on the problem of indexing this large collection of music. The reason for this focus is that there is no database or index holding information about this music material, which makes research on the subject extremely difficult. To determine what software could help solve this problem, the state of the art in "Open Source tools for web crawling and indexing" was assessed. Based on the conclusions of that assessment, a prototype was developed and implemented using the most appropriate software framework. The resulting solution proved capable of crawling web pages while parsing and indexing MP3 files. The index produced is available through a web search engine interface, which also returns results in XML format. The results obtained lead to the conclusion that it is feasible to build a scalable index and web search engine for music on the Internet using Open Source software. This is supported by the proof of concept achieved with the working prototype.
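
    The indexing and XML search output described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the track records, the example.org URLs, and the functions `build_index`, `search`, and `to_xml` are hypothetical, crawling and MP3 tag parsing are replaced by a hard-coded list, and the thesis prototype was built on an existing Open Source framework rather than on code like this.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical metadata, standing in for tags parsed from crawled MP3 files.
TRACKS = [
    {"url": "http://example.org/a.mp3", "title": "Blue Morning", "artist": "Ana Silva"},
    {"url": "http://example.org/b.mp3", "title": "Morning Rain", "artist": "The Commons"},
]


def build_index(tracks):
    """Build an inverted index: lowercase token -> positions of matching tracks."""
    index = defaultdict(set)
    for pos, track in enumerate(tracks):
        for field in ("title", "artist"):
            for token in track[field].lower().split():
                index[token].add(pos)
    return index


def search(index, tracks, query):
    """Return the tracks that contain every token of the query."""
    tokens = query.lower().split()
    if not tokens:
        return []
    hits = set.intersection(*(index.get(token, set()) for token in tokens))
    return [tracks[pos] for pos in sorted(hits)]


def to_xml(results):
    """Serialize search results as a small XML document."""
    root = ET.Element("results")
    for track in results:
        item = ET.SubElement(root, "track", url=track["url"])
        ET.SubElement(item, "title").text = track["title"]
        ET.SubElement(item, "artist").text = track["artist"]
    return ET.tostring(root, encoding="unicode")


index = build_index(TRACKS)
xml_output = to_xml(search(index, TRACKS, "morning rain"))
```

    An inverted index keeps lookup cost tied to the number of query tokens rather than to the size of the collection, which is the usual reason to prefer it when the collection is large.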

    A Strategy for Improving Performance On a Sharepoint Social Computing Portal

    An important usability rule for any web site is speed. Failing to deliver pages and data promptly will result in a negative view of the site and ultimately a lack of usership. In spite of this, many organizations implement web sites without a clear strategy regarding performance. This project explores three database strategies to consider when deploying a Microsoft SharePoint website with a social computing usage style. Although not all of the strategies provide significant performance gains, the study illuminates several important factors that will increase performance in sites with other usage styles. To explore each database strategy properly, specially designed tests were executed against a medium-size SharePoint server farm. The website performance statistics were recorded and compared to measure the effect of the different configurations. The statistics showed a performance increase when the number of site collections per database is limited to a specific number. It was also discovered that large SharePoint content databases do not directly affect performance, assuming three specific conditions are met. The third strategy studied indicated that implementing external BLOB storage will increase performance, assuming the average file size in the database is fairly large.