    Katsir: A Framework for Harvesting Digital Libraries on the Web

    The information era has brought with it the well-known problem of 'information explosion'. There are many and varied search engines on the Internet, but it is still hard to locate and concentrate only on materials relevant to a specific task. Digital libraries, on the other hand, provide better services for focused discovery of relevant Web resources. However, digital libraries have been much less researched and implemented than search engines. The 'Katsir/Harvest' project laid the ground for our understanding that a new paradigm should be developed: the Harvested Digital Library (HDL). The contribution of this article is in presenting a new framework and harvesting model for constructing HDLs. The open harvesting architecture proposed here uses advanced information retrieval tools and provides a set of integrated DL services to its users. This model and architecture are discussed throughout the article, including a description of the implemented Katsir system and a discussion of future research directions. Future DLs will be knowledge rich in the sense that each DL contains relevant meta-information on its domain and employs advanced knowledge management techniques.
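
    As a rough illustration of the harvest-then-index idea described above, the following Python sketch builds a toy harvested library with a naive term index. The Record fields, the fetch_records stub, and the indexing scheme are assumptions made for illustration only and do not reflect the actual Katsir architecture.

```python
# Toy sketch of a harvested digital library (HDL): harvest records from
# seed sources, then index them for focused search. Illustrative only.
from dataclasses import dataclass, field


@dataclass
class Record:
    url: str
    title: str
    text: str
    metadata: dict = field(default_factory=dict)


def fetch_records(seed_url: str) -> list[Record]:
    """Placeholder for a domain-focused harvester.

    A real system would crawl the seed site, extract documents, and
    normalize their metadata (author, date, subject terms). Returning an
    empty list keeps this sketch runnable.
    """
    return []


class HarvestedDigitalLibrary:
    def __init__(self) -> None:
        self.index: dict[str, set[str]] = {}      # term -> record URLs
        self.records: dict[str, Record] = {}

    def harvest(self, seed_urls: list[str]) -> None:
        for seed in seed_urls:
            for rec in fetch_records(seed):
                self.records[rec.url] = rec
                for term in rec.text.lower().split():
                    self.index.setdefault(term, set()).add(rec.url)

    def search(self, query: str) -> list[Record]:
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self.index.get(t, set()) for t in terms))
        return [self.records[u] for u in hits]


hdl = HarvestedDigitalLibrary()
hdl.harvest(["https://example.org/seed"])   # hypothetical seed URL
print(hdl.search("digital libraries"))
```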

    Distant Supervised Construction and Evaluation of a Novel Dataset of Emotion-Tagged Social Media Comments in Spanish

    Tagged language resources are an essential requirement for developing machine-learning text classifiers. However, manual tagging is extremely time consuming, and the resulting datasets are rather small, containing only a few thousand samples. Basic emotion datasets are particularly difficult to classify manually because categorization is prone to subjectivity, and thus redundant classification is required to validate the assigned tag. Even though the number of emotion-tagged text datasets in Spanish has been growing in recent years, it cannot be compared with the number, size, and quality of the datasets in English. Quality is a particularly concerning issue, as few datasets in Spanish include a validation step in their construction process. In this article, a dataset of social media comments in Spanish is compiled, selected, filtered, and presented. A sample of the dataset is reclassified by a group of psychologists and validated using the Fleiss' kappa inter-rater agreement measure. Error analysis is performed using the sentic computing tool BabelSenticNet. Results indicate that the agreement between the human raters and the automatically acquired tags is moderate, similar to other manually tagged datasets, with the advantages that the presented dataset contains several hundred thousand tagged comments and does not require extensive manual tagging. The agreement measured between human raters is very similar to that between human raters and the original tag. Every measure presented is in the moderate-agreement zone and is therefore suitable for training classification algorithms in the sentiment analysis field.
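
    For readers unfamiliar with the validation measure mentioned above, the Python sketch below computes Fleiss' kappa from a rating matrix. The toy matrix of four comments rated by three raters across three emotion categories is invented for illustration and is not data from the article.

```python
# Minimal sketch of Fleiss' kappa for inter-rater agreement.
import numpy as np


def fleiss_kappa(ratings: np.ndarray) -> float:
    """ratings[i, j] = number of raters assigning item i to category j."""
    n_items, _ = ratings.shape
    n_raters = ratings[0].sum()                          # raters per item (constant)
    p_j = ratings.sum(axis=0) / (n_items * n_raters)     # category proportions
    P_i = ((ratings ** 2).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    P_bar, P_e = P_i.mean(), (p_j ** 2).sum()            # observed vs. chance agreement
    return (P_bar - P_e) / (1 - P_e)


# Toy example: 4 comments, 3 raters, 3 emotion categories (joy, anger, sadness).
toy = np.array([[3, 0, 0],
                [2, 1, 0],
                [0, 3, 0],
                [1, 1, 1]])
print(round(fleiss_kappa(toy), 3))   # ~0.268 on this invented example
```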

    Islamic Economy Through Online Community (IEOC): Issues on Information Gathering & Storing

    Knowledge communities are communities of interest that come together to share knowledge that affects performance. Knowledge management envisions getting the right information, within the right context, to the right person at the right time for the right business purpose. Communities are increasingly aware of, and concerned with, sharing and transferring knowledge. The rapid development of web technology has made the World Wide Web an important and popular application platform for disseminating and searching for information as well as conducting business. As a huge resource, the World Wide Web has allowed unprecedented sharing of ideas and information on a scale never seen before. The use of the Web and its exponential growth are now well known, and they are causing a revolution in the way people use computers and perform daily tasks. Islamic Economy Through Online Community (IEOC) was therefore proposed as an avenue for knowledge and experience sharing within the community. This project paper discusses the issue of information gathering and storing, concentrating on how data are managed and used. The target users of the website are consumers and business personnel. The development methodology comprises four phases: System Planning and Strategy, System Analysis and Design, System Implementation, and System Testing. The tools used comprise Macromedia Dreamweaver MX 2004, the Joomla open-source CMS, the Apache web server, and the PHP scripting language. The paper closes with conclusions and recommendations for future enhancement.

    Hybrid ant colony system algorithm for static and dynamic job scheduling in grid computing

    Grid computing is a distributed system with heterogeneous infrastructures. The resource management system (RMS) is one of the most important components and has a great influence on grid computing performance. The main part of the RMS is the scheduler algorithm, which is responsible for mapping submitted tasks to available resources. The scheduling problem is nondeterministic polynomial-time complete (NP-complete), and therefore an intelligent algorithm is required to achieve better scheduling solutions. One of the prominent intelligent algorithms is the ant colony system (ACS), which is widely implemented to solve various types of scheduling problems. However, ACS suffers from a stagnation problem in medium- and large-size grid computing systems. ACS is based on exploitation and exploration mechanisms; its exploitation is sufficient, but its exploration is deficient because it follows a random approach without any strategy. This study proposes four hybrid algorithms that combine ACS with the genetic algorithm (GA) and tabu search (TS): ACS(GA), ACS+GA, ACS(TS), and ACS+TS. These hybrid algorithms enhance ACS in terms of its exploration mechanism and solution refinement by implementing low- and high-level hybridization of the ACS, GA, and TS algorithms. The proposed algorithms were evaluated against twelve metaheuristic algorithms in static (expected-time-to-compute model) and dynamic (distribution pattern) grid computing environments. A simulator called ExSim was developed to mimic the static and dynamic nature of grid computing. Experimental results show that the proposed algorithms outperform ACS in terms of best makespan values. In the static environment, ACS(GA), ACS+GA, ACS(TS), and ACS+TS perform better than ACS by 0.35%, 2.03%, 4.65%, and 6.99%, respectively. In the dynamic environment, ACS(GA), ACS+GA, ACS+TS, and ACS(TS) perform better than ACS by 0.01%, 0.56%, 1.16%, and 1.26%, respectively. The proposed algorithms can be used to schedule tasks in grid computing with better performance in terms of makespan.
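
    To make the underlying ACS construction step concrete, the sketch below schedules tasks under a toy expected-time-to-compute (ETC) matrix. The parameter values, matrix size, and update rules follow a generic ACS formulation and are assumptions for illustration; they are not the specific hybrid ACS(GA)/ACS+GA/ACS(TS)/ACS+TS designs or the ExSim settings evaluated in the study.

```python
# Generic ant colony system (ACS) task-to-resource scheduling sketch
# under the ETC model, minimizing makespan. Illustrative only.
import random


def makespan(schedule, etc):
    """schedule[t] = resource chosen for task t; etc[t][r] = expected time."""
    load = [0.0] * len(etc[0])
    for t, r in enumerate(schedule):
        load[r] += etc[t][r]
    return max(load)


def acs_schedule(etc, n_ants=10, n_iters=50, alpha=1.0, beta=2.0,
                 rho=0.1, q0=0.9, seed=0):
    rng = random.Random(seed)
    n_tasks, n_res = len(etc), len(etc[0])
    tau = [[1.0] * n_res for _ in range(n_tasks)]          # pheromone trails
    best, best_ms = None, float("inf")
    for _ in range(n_iters):
        for _ in range(n_ants):
            sched = []
            for t in range(n_tasks):
                scores = [(tau[t][r] ** alpha) * ((1.0 / etc[t][r]) ** beta)
                          for r in range(n_res)]
                if rng.random() < q0:                       # exploitation
                    r = max(range(n_res), key=lambda i: scores[i])
                else:                                       # biased random exploration
                    pick, acc, r = rng.random() * sum(scores), 0.0, 0
                    for i, s in enumerate(scores):
                        acc += s
                        if acc >= pick:
                            r = i
                            break
                sched.append(r)
                tau[t][r] = (1 - rho) * tau[t][r] + rho * 1.0   # local update
            ms = makespan(sched, etc)
            if ms < best_ms:
                best, best_ms = sched, ms
        for t, r in enumerate(best):                        # global update on best tour
            tau[t][r] = (1 - rho) * tau[t][r] + rho * (1.0 / best_ms)
    return best, best_ms


# Toy ETC matrix: 5 tasks x 3 resources with random expected times.
etc = [[random.uniform(5, 20) for _ in range(3)] for _ in range(5)]
print(acs_schedule(etc))
```

    The hybrid variants in the study would then refine or diversify the solutions this construction step produces (e.g., via GA crossover/mutation or TS neighborhood moves), which is not shown here.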

    Publications and Presentations 2002 by Members of the Faculty of Informatics (Fakultät für Informatik)

    Simple identification tools in FishBase

    Simple identification tools for fish species were included in the FishBase information system from its inception. Early tools made use of the relational model and characters such as fin-ray meristics. Soon pictures and drawings were added as further help, similar to a field guide. Later came the computerization of existing dichotomous keys, again in combination with pictures and other information, and the ability to restrict possible species by country, area, or taxonomic group. Today, www.FishBase.org offers four different ways to identify species. This paper describes these tools with their advantages and disadvantages, and suggests various options for further development. It explores the possibility of a holistic and integrated computer-aided strategy.
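
    As a minimal illustration of the character-based filtering that such identification tools rely on, the Python sketch below narrows candidate species by country and a meristic character. The species records, character names, and value ranges are invented for illustration and are not taken from FishBase.

```python
# Toy character-based species filter: restrict candidates by locality
# and an observed meristic count, in the spirit of a simple key.
species = [
    {"name": "Species A", "country": "Philippines",
     "dorsal_spines": (9, 11), "family": "Lutjanidae"},
    {"name": "Species B", "country": "Philippines",
     "dorsal_spines": (12, 14), "family": "Serranidae"},
    {"name": "Species C", "country": "Brazil",
     "dorsal_spines": (10, 12), "family": "Lutjanidae"},
]


def identify(country=None, family=None, dorsal_spine_count=None):
    """Return candidate species matching the observed characters."""
    hits = []
    for sp in species:
        if country and sp["country"] != country:
            continue
        if family and sp["family"] != family:
            continue
        if dorsal_spine_count is not None:
            lo, hi = sp["dorsal_spines"]
            if not (lo <= dorsal_spine_count <= hi):
                continue
        hits.append(sp["name"])
    return hits


print(identify(country="Philippines", dorsal_spine_count=10))  # ['Species A']
```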