915 research outputs found
Katsir: A Framework for Harvesting Digital Libraries on the Web
The information era has brought with it the wellknown problem of \u27Information Explosion\u27. There are many and varied search engines on the Internet but it is still hard to locate and concentrate only on materials relevant to a specific task. Digital libraries, on the other hand, provide better services for focused discovery of relevant Web resources. However, digital libraries have been much less researched and implemented than search engines. The \u27Katsir/Harvest\u27 project laid the ground for our understanding that a new paradigm should to be developed - the Harvested Digital Library (HDL). The contribution of this article is in presenting a new framework and harvesting model for constructing HDLs. The open harvesting architecture proposed here uses advanced information retrieval tools and provides a set of integrated DL services to its users. This model and architecture are discussed throughout the article, including description of the implemented Katsir system and discussion of future research directions. The future DLs will be knowledge rich in the sense that each DL contains relevant meta-information on its domain and employs advanced knowledge management techniques
Distant Supervised Construction and Evaluation of a Novel Dataset of Emotion-Tagged Social Media Comments in Spanish
Tagged language resources are an essential requirement for developing machine-learning text-based classifiers. However, manual tagging is extremely time consuming and the resulting datasets are rather small, containing only a few thousand samples. Basic emotion datasets are particularly difficult to classify manually because categorization is prone to subjectivity, and thus, redundant classification is required to validate the assigned tag. Even though, in recent years, the amount of emotion-tagged text datasets in Spanish has been growing, it cannot be compared with the number, size, and quality of the datasets in English. Quality is a particularly concerning issue, as not many datasets in Spanish included a validation step in the construction process. In this article, a dataset of social media comments in Spanish is compiled, selected, filtered, and presented. A sample of the dataset is reclassified by a group of psychologists and validated using the Fleiss Kappa interrater agreement measure. Error analysis is performed by using the Sentic Computing tool BabelSenticNet. Results indicate that the agreement between the human raters and the automatically acquired tag is moderate, similar to other manually tagged datasets, with the advantages that the presented dataset contains several hundreds of thousands of tagged comments and it does not require extensive manual tagging. The agreement measured between human raters is very similar to the one between human raters and the original tag. Every measure presented is in the moderate agreement zone and, as such, suitable for training classification algorithms in sentiment analysis field
Distant Supervised Construction and Evaluation of a Novel Dataset of Emotion-Tagged Social Media Comments in Spanish
Tagged language resources are an essential requirement for developing machine-learning text-based classifiers. However, manual tagging is extremely time consuming and the resulting datasets are rather small, containing only a few thousand samples. Basic emotion datasets are particularly difficult to classify manually because categorization is prone to subjectivity, and thus, redundant classification is required to validate the assigned tag. Even though, in recent years, the amount of emotion-tagged text datasets in Spanish has been growing, it cannot be compared with the number, size, and quality of the datasets in English. Quality is a particularly concerning issue, as not many datasets in Spanish included a validation step in the construction process. In this article, a dataset of social media comments in Spanish is compiled, selected, filtered, and presented. A sample of the dataset is reclassified by a group of psychologists and validated using the Fleiss Kappa interrater agreement measure. Error analysis is performed by using the Sentic Computing tool BabelSenticNet. Results indicate that the agreement between the human raters and the automatically acquired tag is moderate, similar to other manually tagged datasets, with the advantages that the presented dataset contains several hundreds of thousands of tagged comments and it does not require extensive manual tagging. The agreement measured between human raters is very similar to the one between human raters and the original tag. Every measure presented is in the moderate agreement zone and, as such, suitable for training classification algorithms in sentiment analysis field.Fil: Tessore, Juan Pablo. Universidad Nacional del Noroeste de la Pcia.de Bs.as.. Escuela de Tecnologia. Instituto de Investigacion y Transferencia En Tecnologia. - Comision de Investigaciones Cientificas de la Provincia de Buenos Aires. Instituto de Investigacion y Transferencia En Tecnologia.; ArgentinaFil: Esnaola, Leonardo Martín. Universidad Nacional del Noroeste de la Pcia.de Bs.as.. Escuela de Tecnologia. Instituto de Investigacion y Transferencia En Tecnologia. - Comision de Investigaciones Cientificas de la Provincia de Buenos Aires. Instituto de Investigacion y Transferencia En Tecnologia.; ArgentinaFil: Lanzarini, Laura Cristina. Universidad Nacional de La Plata. Facultad de Informática. Instituto de Investigación en Informática Lidi; ArgentinaFil: Baldassarri, Sandra Silvia. Universidad de Zaragoza; Españ
Islamic Economy Through Online Community (IEOC) Issues on Information Gathering & Storing
Knowledge Communities are communities of interest that come together to share
knowledge that affects performance. Knowledge Management envisions getting
the right information within the right context to the right person at the right time
for the right business purpose. Communities are more aware and concern of
sharing and transfer the knowledge. The rapid development of web technology
had made the World Wide Web an important and popular application platform for
disseminating and searching for information as well as conducting business. As a
huge source, World Wide Web has allowed unprecedented sharing of ideas and
information on a scale never seen before. The use of Web and its exponential
growth are now well known, and they are causing a revolution in the way people
use computers and perform daily tasks. Therefore Islamic Economy thru Online
Community [IEOC] intention was to proposed for an avenue of knowledge
sharing and experience for the community. Issue on Information Gathering and
Storing is discussed in this project paper where it concentrates on how data are
being managed and used. The target users of this website are among consumers
and business personnel. In developing the project, the methodology comprises of
four ( 4) phase: System Planning and Strategy , System Analysis and Design ,
System Implementation and System Testing. The tools used comprises of
Macromedia Dreamweaver MX 2004, Joomla Open Source, Apache Web Server
and PHP scripting language. In the end of this paper, conclusion and
recommendation part will discuss for future enhancement.
ii
Hybrid ant colony system algorithm for static and dynamic job scheduling in grid computing
Grid computing is a distributed system with heterogeneous infrastructures. Resource
management system (RMS) is one of the most important components which has great influence on the grid computing performance. The main part of RMS is the scheduler algorithm which has the responsibility to map submitted tasks to available resources. The complexity of scheduling problem is considered as a nondeterministic polynomial complete (NP-complete) problem and therefore, an intelligent algorithm is required to achieve better scheduling solution. One of the prominent intelligent algorithms is ant colony system (ACS) which is implemented widely to solve various types of scheduling problems. However, ACS suffers from stagnation problem in medium and large size grid computing system. ACS is based on exploitation and exploration
mechanisms where the exploitation is sufficient but the exploration has a deficiency. The exploration in ACS is based on a random approach without any strategy. This study proposed four hybrid algorithms between ACS, Genetic Algorithm (GA), and Tabu Search (TS) algorithms to enhance the ACS performance. The algorithms are ACS(GA), ACS+GA, ACS(TS), and ACS+TS. These proposed hybrid algorithms
will enhance ACS in terms of exploration mechanism and solution refinement by
implementing low and high levels hybridization of ACS, GA, and TS algorithms. The proposed algorithms were evaluated against twelve metaheuristic algorithms in static (expected time to compute model) and dynamic (distribution pattern) grid computing
environments. A simulator called ExSim was developed to mimic the static and dynamic nature of the grid computing. Experimental results show that the proposed algorithms outperform ACS in terms of best makespan values. Performance of ACS(GA), ACS+GA, ACS(TS), and ACS+TS are better than ACS by 0.35%, 2.03%, 4.65% and 6.99% respectively for static environment. For dynamic environment,
performance of ACS(GA), ACS+GA, ACS+TS, and ACS(TS) are better than ACS by 0.01%, 0.56%, 1.16%, and 1.26% respectively. The proposed algorithms can be used to schedule tasks in grid computing with better performance in terms of makespan
Simple identification tools in FishBase
Simple identification tools for fish species were included in the FishBase information system from its inception. Early tools made use of the relational model and characters like fin ray meristics. Soon pictures and drawings were added as a further help, similar to a field guide. Later came the computerization of existing dichotomous keys, again in combination with pictures and other information, and the ability to restrict possible species by country, area, or taxonomic group. Today, www.FishBase.org offers four different ways to identify species. This paper describes these tools with their advantages and disadvantages, and suggests various options for further
development. It explores the possibility of a holistic and integrated computeraided strategy
- …