Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of
different scientific tools and in a broad range of applications. Many
approaches to extracting data from the Web have been designed to solve specific
problems and operate in ad-hoc domains. Other approaches, instead, heavily
reuse techniques and algorithms developed in the field of Information
Extraction.
This survey aims at providing a structured and comprehensive overview of the
literature in the field of Web Data Extraction. We provide a simple
classification framework in which existing Web Data Extraction applications are
grouped into two main classes, namely applications at the Enterprise level and
at the Social Web level. At the Enterprise level, Web Data Extraction
techniques emerge as a key tool to perform data analysis in Business and
Competitive Intelligence systems as well as for business process
re-engineering. At the Social Web level, Web Data Extraction techniques make it
possible to gather the large amount of structured data continuously generated and
disseminated by Web 2.0, Social Media and Online Social Network users, which
offers unprecedented opportunities to analyze human behavior at a very large
scale. We also discuss the potential of cross-fertilization, i.e., the
possibility of re-using Web Data Extraction techniques originally designed to
work in a given domain, in other domains.
Comment: Knowledge-based System
Information technology and distance learning aspects of materials databases
Distance learning is a flourishing area, with the number of programs, provided via
remote delivery, increasing daily. At the same time, however, progress in the field of
accessibility and support for learners, who want to pursue education and training in the
area of materials science and engineering, has not always kept pace. An individual
interested in taking a materials course, or in simply finding out what is available, may
find himself/herself forced to locate and then plough through many unwieldy online
course listings. This may discourage many learners from pursuing the distance learning
option.
In recent years, more and more distance learning databases have been developed and
made available on the World Wide Web. These distance-learning databases aim to
offer an information pool on many courses and programs that are available online and to
cater for users' specific needs of locating information. This need is equally applicable to
the area of materials science and engineering.
The purpose of this research has been to explore the information technology
aspects of materials databases and to study their distance learning aspects closely. [Continues.]
A Shared Ontology Approach to Semantic Representation of BIM Data
Architecture, engineering, construction and facility management (AEC-FM) projects involve a large number of participants that must exchange information and combine their knowledge for successful completion of a project. Currently, most of the AEC-FM domains store their information about a project in text documents or use XML, relational, or object-oriented formats that make information integration difficult. The AEC-FM industry is not taking advantage of the full potential of the Semantic Web for streamlining sharing, connecting, and combining information from different domains. The Semantic Web is designed to solve the information integration problem by creating a web of structured and connected data that can be processed by machines. It allows combining information from different sources with different underlying schemas distributed over the Internet. In the Semantic Web, all data instances and data schemas are stored in a graph data store, which makes it easy to merge data from different sources. This paper presents a shared ontology approach to semantic representation of building information. The semantic representation of building information facilitates finding and integrating building information distributed in several knowledge bases. A case study demonstrates the development of a semantic-based building design knowledge base.
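The claim that graph-structured data is easy to merge can be illustrated with a minimal sketch. It assumes each source is simply a set of (subject, predicate, object) triples; the `ex:` identifiers, the two source names, and the helper functions are hypothetical and are not taken from the paper, which does not describe its implementation here.

```python
# Hypothetical triples (subject, predicate, object); all identifiers are illustrative.
architectural = {
    ("ex:Wall1", "rdf:type", "ex:Wall"),
    ("ex:Wall1", "ex:partOf", "ex:Building1"),
}
structural = {
    ("ex:Wall1", "ex:loadBearing", "true"),
    ("ex:Wall1", "ex:partOf", "ex:Building1"),  # same fact, stated by both sources
}


def merge_graphs(*graphs):
    """Merge triple sets; identical triples collapse because a graph is a set."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Return every (predicate, object) pair known about a subject, sorted."""
    return sorted((p, o) for s, p, o in graph if s == subject)


merged = merge_graphs(architectural, structural)
```

Because both sources use the same identifier for the wall, `describe(merged, "ex:Wall1")` returns facts contributed by either source without any schema alignment step. Agreeing on those shared identifiers is the role a shared ontology would play.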
Historical Places In Malacca (Enhancement Of Maps Manipulation Capability Through The Website) Using MySQL
The motivation to be involved in the field of GIS has been growing
at a fast pace in the past few years. Much research has been done, giving
substantial and beneficial results for this field. But most GIS applications
were developed using the vendors' own proprietary databases, which can cause
many problems. Geographic Information Systems, also known as GIS, are all about
gathering data, building layers upon layers of this data, and then displaying them
on a computer screen. The aim of the study presented in this paper
is to use MySQL for developing a GIS application, thus showing MySQL's
ability to support GIS-based data, in other words, spatial data. While the
main objective in doing the study and developing the system is to use
MySQL to manage the spatial data, the other integral objectives that come along
with this project are providing better features and better-quality spatial data from the
system for the users, and enhancing the capability of manipulating the maps that
are provided through the system. The methodology used in developing this project
follows the RAD methodology, which involves stages such as Requirement
Planning, User Design, Construction, and Implementation. These stages are further
discussed in Chapter 3, Methodology and Project Work. As for the
conclusion that can be derived from the entire project and the research
done, it can be seen that MySQL is able to support the development of any
GIS-based application through the new release of its database, which also includes
spatial data management ability.
Migrating existing multimedia courseware to Moodle
Open source course management systems offer increased flexibility for instructors and instructional designers. Communities can influence the development of these systems and, on an individual basis, the possibility to modify the system software exists. Migrating existing courseware to these systems can therefore be beneficial, sometimes even required. We report here on our experience in migrating an existing courseware system, consisting of multimedia content and interactive, integrated infrastructure functionality, to an open source course management system called Moodle. We assess the difficulties that we encountered during this process and discuss the importance of standards in this context, and we aim to provide other instructors or instructional designers with guidelines and assessment support for other migration projects.
Towards semantic web mining
Semantic Web Mining aims at combining the two fast-developing research areas Semantic Web and Web Mining. The idea is to improve, on the one hand, the results of Web Mining by exploiting the new semantic structures in the Web; and to make use of Web Mining, on the other hand, for building up the Semantic Web. This paper gives an overview of where the two areas meet today, and sketches ways of how a closer integration could be profitable
Developing an online database of experts for the Worcester Regional Chamber of Commerce
The Worcester Regional Chamber of Commerce, as part of its mission to attract business to the Worcester area, wants to create an online searchable database of industry experts made up of faculty members of the colleges and universities in the Worcester area. This online database will be placed on the Worcester Regional Chamber of Commerce Higher Education – Business Partnership page on its website. The limitation placed on this request is that the Regional Chamber currently has no monetary or information technology resources to provide for the realization of this request.
The proliferation of "as a Service" information technology offerings provides a number of options for satisfying the request for an online searchable database of individuals, and some services are geared more specifically to this type of need and are intended for the nonprofit sector as well. The recommendation of this report is for the Worcester Regional Chamber of Commerce to consider these options even if it requires a small investment of funds on its part.
Building a scalable index and a web search engine for music on the Internet using Open Source software
The Internet has made it possible to access thousands of freely available music tracks
with Creative Commons or Public Domain licenses. In fact, this number keeps growing
every year.
In practical terms, it is very difficult to browse this music collection, because it is large
and dispersed across hundreds of websites.
To address the music recommendation issue, a case study of existing systems was
carried out to put the problem in context and identify the necessary building blocks.
This thesis is mainly focused on the problem of indexing this large collection of
music. The reason to focus on this problem is that there is no database or index holding
information about this music material, which makes research on the subject extremely
difficult.
In order to figure out what software could help solve this problem, the state of the art
in “Open Source tools for web crawling and indexing” was assessed.
Based on the conclusions from the state of the art, a prototype was developed and
implemented using the most appropriate software framework. The resulting solution proved
capable of crawling web pages while parsing and indexing MP3 files. The produced
index is available through a web search engine interface, which also produces results in XML
format.
The results obtained lead to the conclusion that it is feasible to build a scalable index
and web search engine for music on the Internet using Open Source software. This is
supported by the proof of concept achieved with the working prototype.
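The crawl-then-index idea described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the thesis prototype: it uses Python's standard library rather than whichever Open Source framework the author chose, it indexes file names instead of parsing MP3 contents, it does no network fetching, and the names `Mp3LinkParser`, `extract_mp3_links` and `build_index` are hypothetical.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import unquote, urljoin, urlparse


class Mp3LinkParser(HTMLParser):
    """Collect absolute URLs of <a href> links whose path ends in .mp3."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and urlparse(href).path.lower().endswith(".mp3"):
            self.links.append(urljoin(self.base_url, href))


def extract_mp3_links(html, base_url):
    """Return the MP3 links found in one already-downloaded page."""
    parser = Mp3LinkParser(base_url)
    parser.feed(html)
    return parser.links


def build_index(urls):
    """Build an inverted index: file-name token -> set of MP3 URLs."""
    index = defaultdict(set)
    for url in urls:
        file_name = unquote(urlparse(url).path.rsplit("/", 1)[-1])
        stem = file_name[:-4]  # drop the ".mp3" extension
        for token in re.findall(r"[a-z0-9]+", stem.lower()):
            index[token].add(url)
    return index
```

A search front end would then look up query tokens in the index and return the matching URLs. A real crawler would additionally need page fetching, politeness rules, and metadata extraction from the audio files themselves.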
A Strategy for Improving Performance On a Sharepoint Social Computing Portal
An important usability rule for any web site is the concept of speed. Failing to provide prompt pages and data will result in a negative view of the site and ultimately a lack of usership. In spite of this, many organizations implement web sites without a clear strategy regarding performance. This project explores three database strategies to consider when deploying a Microsoft SharePoint website with a social computing usage style. Although not all of the strategies provide significant performance gains, the study illuminates several important factors that will increase performance in sites that use other usage styles. To properly explore each database strategy, specially designed tests were executed against a medium-size SharePoint server farm. The website performance statistics were recorded and compared to measure the effect of different configurations. The performance statistics showed a performance increase when the number of site collections per database is limited to a specific amount. It was also discovered that large SharePoint content databases do not directly affect performance, assuming three specific conditions are met. The third strategy studied indicated that implementing external BLOB storage will increase performance, assuming the average file size in the database is fairly large.