    Toward autonomic distributed data mining using intelligent web services.

    This study defines a new approach to building a Web Services based infrastructure for distributed data mining applications. The proposed architecture provides a roadmap for autonomic operation of the infrastructure, hiding implementation details and giving the user a new level of usability in the data mining process. The Web Services based infrastructure delivers all required data mining activities in a utility-like fashion, enabling heterogeneous components to be incorporated in a unified manner. Moreover, this structure allows data mining algorithms to process data from more than one source in a distributed manner. The purpose of this study is to present a simple but efficient methodology for determining when data distributed at several sites can be centralized and analyzed as data from the same theoretical distribution. This analysis also answers when and how the semantics of the sites are influenced by the distribution of the data. The hierarchical framework of advanced and core Web Services significantly improves current data mining capability in terms of performance, scalability, efficiency, transparency of resources, and incremental extensibility.
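    The abstract does not name the statistical test it uses, so the following is only a minimal sketch of the centralize-or-not decision, assuming pairwise two-sample Kolmogorov-Smirnov tests (scipy.stats.ks_2samp); the can_centralize helper is an illustrative assumption, not the paper's method.

    # Decide whether data held at several sites can be pooled and analyzed
    # as samples from the same theoretical distribution. The test choice
    # (two-sample Kolmogorov-Smirnov) is an assumption for illustration.
    from itertools import combinations

    import numpy as np
    from scipy.stats import ks_2samp

    def can_centralize(site_samples, alpha=0.05):
        """True if no pair of sites differs significantly (alpha per test)."""
        for a, b in combinations(site_samples, 2):
            _, p_value = ks_2samp(a, b)
            if p_value < alpha:      # distributions differ: keep data distributed
                return False
        return True                  # consistent with one distribution: safe to pool

    rng = np.random.default_rng(0)
    sites = [rng.normal(0.0, 1.0, 500) for _ in range(3)]
    print(can_centralize(sites))     # expected True: all sites draw from N(0, 1)
    sites.append(rng.normal(2.0, 1.0, 500))
    print(can_centralize(sites))     # False: the new site is shifted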

    M-Grid: Similarity Searching in Grids

    The problem of similarity searching is attracting a lot of attention, because emerging applications process complex data for which traditional exact-match searching is not sufficient. Efficient solutions exist, but they are tailored to the needs of specific data domains. General solutions based on the metric space abstraction are extensible, but they are designed to operate on a single computer only, so their scalability is limited and they cannot adapt to different performance requirements. In this paper, we propose a fully dynamic distributed access structure that exploits a Grid infrastructure. We study the properties of this structure in numerous experiments. In addition, we analyze performance tuning with respect to user-specific requirements, including the maximum response time and the number of queries executed concurrently.
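    M-Grid's internal structure is not reproduced here; the sketch below only illustrates the metric space abstraction it builds on: a range query that discards candidates via the triangle inequality over precomputed pivot distances. The PivotIndex class and its layout are hypothetical, not M-Grid's API.

    # Range query in a metric space using one pivot: precomputed distances
    # d(o, p) let the triangle inequality |d(q, p) - d(o, p)| <= d(q, o)
    # discard objects without computing d(q, o). Illustrative code only.
    import math

    class PivotIndex:
        def __init__(self, objects, pivot, dist=math.dist):
            self.dist, self.pivot = dist, pivot
            # One distance computation per object at build time.
            self.entries = [(obj, dist(obj, pivot)) for obj in objects]

        def range_query(self, query, radius):
            d_qp = self.dist(query, self.pivot)
            results = []
            for obj, d_op in self.entries:
                if abs(d_qp - d_op) > radius:   # pruned by triangle inequality
                    continue
                if self.dist(query, obj) <= radius:
                    results.append(obj)
            return results

    index = PivotIndex([(0, 0), (1, 1), (5, 5)], pivot=(0, 0))
    print(index.range_query((0.5, 0.5), radius=1.0))   # [(0, 0), (1, 1)]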

    Knowledge extraction from raw data in water networks: application to the Barcelona supramunicipal water transport network

    Critical Infrastructure Systems (CIS), such as potable water transport networks, are complex, geographically distributed, large-scale systems with a decentralized, hierarchical structure. They require highly sophisticated supervisory and real-time control (RTC) schemes to achieve and maintain high performance under unfavorable conditions caused by, e.g., sensor malfunctions (drifts, offsets, battery problems, communication problems). Once the data are reliable, a process that transforms these validated data into useful information and knowledge is key for the real-time operating plan (RTC). Moreover, and no less important, it allows extracting useful knowledge about the assets and instrumentation of the network (pipe sectors and reservoirs, flowmeters, level sensors, etc.) for short-, medium-, and long-term management plans. This work describes an overall analysis of the results of applying a methodology for sensor data validation/reconstruction to the ATLL water network in the city of Barcelona and the surrounding metropolitan area from 2008 to 2013. This methodology is very important for assessing the economic and hydraulic efficiency of the network.
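    The abstract does not detail the validation rules themselves. The fragment below is a minimal sketch of the kind of processing it describes, assuming simple physical-bound and rate-limit checks followed by interpolation; the actual ATLL methodology may differ.

    # Flag raw readings that violate physical bounds or jump implausibly
    # between samples, then reconstruct the flagged values by linear
    # interpolation. Thresholds and method are illustrative assumptions.
    import numpy as np
    import pandas as pd

    def validate_and_reconstruct(series, low, high, max_step):
        validated = series.copy()
        # Rule 1: physical bounds (a flowmeter cannot read negative).
        validated[(validated < low) | (validated > high)] = np.nan
        # Rule 2: rate limit between consecutive samples (spikes, offsets).
        validated[validated.diff().abs() > max_step] = np.nan
        # Reconstruction: fill invalidated samples from their neighbors.
        return validated.interpolate(limit_direction="both")

    raw = pd.Series([10.1, 10.3, -5.0, 10.4, 42.0, 10.6])  # two faulty readings
    print(validate_and_reconstruct(raw, low=0.0, high=20.0, max_step=2.0))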

    Ontology-Based Queries over Cancer Data

    The ever-increasing amount of data in biomedical research, and in cancer research in particular, needs to be managed to support efficient data access, exchange, and integration. Existing software infrastructures, such as caGrid, support access to distributed information annotated with a domain ontology. However, caGrid's current querying functionality depends on the structure of the individual data resources and does not exploit the semantic annotations. In this paper, we present the design and development of an ontology-based querying functionality that consists of two parts: the generation of OWL 2 ontologies from the metadata of the underlying data resources, and a reasoning-based query rewriting and translation process that converts a query at the domain ontology level into queries at the software infrastructure level. We present a detailed analysis of our approach as well as an extensive performance evaluation. While the implementation and evaluation were performed for the caGrid infrastructure, the approach is applicable to other model- and metadata-driven environments for data sharing.
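    The paper's rewriting reasons over OWL 2 ontologies generated from caGrid metadata; the sketch below shows only the shape of such a rewrite, expanding a domain-ontology class over its subclass tree and mapping each class to a resource-level query. The tiny ontology and the mappings are invented for illustration.

    # A query at the domain-ontology level ("find Neoplasm records") is
    # expanded over the class's subclass tree and translated, class by
    # class, into the queries of the underlying resources. Ontology and
    # mappings are invented, not taken from caGrid.

    SUBCLASSES = {                    # hypothetical domain-ontology fragment
        "Neoplasm": ["Carcinoma", "Sarcoma"],
        "Carcinoma": ["Adenocarcinoma"],
    }

    RESOURCE_QUERY = {                # hypothetical class-to-resource mapping
        "Carcinoma": "SELECT * FROM diagnoses WHERE code = 'C-CA'",
        "Adenocarcinoma": "SELECT * FROM diagnoses WHERE code = 'C-AD'",
        "Sarcoma": "SELECT * FROM tumor_registry WHERE type = 'SARC'",
    }

    def expand(cls):
        """All classes subsumed by cls (reflexive transitive closure)."""
        closure, stack = set(), [cls]
        while stack:
            c = stack.pop()
            if c not in closure:
                closure.add(c)
                stack.extend(SUBCLASSES.get(c, []))
        return closure

    def rewrite(domain_class):
        """Rewrite one domain-level query into resource-level queries."""
        return [RESOURCE_QUERY[c] for c in sorted(expand(domain_class))
                if c in RESOURCE_QUERY]

    for query in rewrite("Neoplasm"):
        print(query)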

    Efficient Search in Unbalanced, Randomized Peer-To-Peer Search Trees

    Scalable mechanisms to support efficient key-based search in distributed systems are an important part of the infrastructure of peer-to-peer systems and global information systems, and they have received substantial attention in both information systems and communication systems research. A particularly important class of approaches is based on a principle of scalable distribution of binary search trees introduced by Plaxton [PLAXTON]. When the shape of such a tree search structure is adapted to the data distribution in order to obtain load balancing, the search trees may become highly unbalanced. We show that for P-Grid, a Plaxton-like distributed search structure that we first introduced in [PGRID1], the expected communication cost of a search is strictly limited by log(n), where n is the number of peers. This result is completely independent of the shape of the underlying tree. The approach exploits the randomization principle of the P-Grid structure, by virtue of its decentralized and randomized construction process.
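    A minimal simulation of the Plaxton-style prefix routing that P-Grid builds on: each peer is responsible for a binary key prefix and keeps, per prefix level, a randomly chosen reference to a peer covering the opposite bit, so each hop extends the matched prefix by at least one bit. This is an illustrative sketch, not the P-Grid protocol itself.

    # Each peer holds a binary prefix (its tree path) and, for every level
    # of that path, a randomly chosen reference to a peer on the opposite
    # side of the split. Routing resolves at least one further bit of the
    # key per hop, so hops never exceed the path length; the randomized
    # construction makes log(n) the expected cost. Illustrative only.
    import random

    class Peer:
        def __init__(self, path):
            self.path = path      # binary prefix this peer is responsible for
            self.routing = {}     # level -> a peer with the opposite bit there

    def build_network(paths, rng):
        peers = [Peer(p) for p in paths]
        for peer in peers:
            for level in range(len(peer.path)):
                flipped = "1" if peer.path[level] == "0" else "0"
                opposite = peer.path[:level] + flipped
                peer.routing[level] = rng.choice(
                    [q for q in peers if q.path.startswith(opposite)])
        return peers

    def search(start, key):
        """Forward a query until the peer responsible for key is reached."""
        peer, hops = start, 0
        while not key.startswith(peer.path):
            # First bit where this peer's path disagrees with the key.
            level = next(i for i, b in enumerate(peer.path) if b != key[i])
            peer, hops = peer.routing[level], hops + 1
        return hops

    rng = random.Random(1)
    # An unbalanced tree: paths of different lengths cover the key space.
    peers = build_network(["0", "10", "110", "111"], rng)
    print(search(peers[0], "1101"))   # reaches the responsible peer in <= 3 hops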

    IntegraTUM : information services and university IT governance

    Universities of the 21st century depend heavily on an efficient IT infrastructure for teaching, research, and administration. E-learning environments, blended learning, and all sorts of multimedia and cooperative environments are important requirements for university teaching and for further education. Many organizational structures, such as continuous examinations, interdisciplinary studies, the ECTS system, and many more, require efficient examination administration systems as well as room and personnel management. Research relies on Internet inquiries, eScience, eLibrary, and other IT-supported media; research results must be documented and archived digitally and must be distributed and marketed through the Internet. The efficient administration of all kinds of university resources must be planned using management support systems, and the decisions of university leadership must be prepared from well-documented statistics and analysis software. In the past, many of the applications named above for teaching, research, and administration were performed by separate software applications run in the distributed environments of universities. Powerful server structures and networking features, as well as new software technologies such as service-oriented architectures, make it necessary to recentralize the IT services of the university after a long period of decentralization. Based on metadirectories and unified access procedures, all of the software components must be integrated into a seamless IT infrastructure. To guarantee consistency, data must not be stored redundantly. Project IntegraTUM of Technische Universität München, started in 2003, is an umbrella project to define such a seamless IT infrastructure for a university with 22,000 students and approximately 10,000 staff. The talk describes the project, which, besides the definition of new technology, is based on a fundamental process analysis of the university and many changes in the organizational structure.
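    A minimal sketch of the metadirectory principle the project rests on, assuming a push-style provisioning flow: identity data is written once to an authoritative store and propagated to dependent services rather than maintained redundantly in each. Service names and the record layout are invented; they do not describe IntegraTUM's actual systems.

    # Identity data is written once to an authoritative store and pushed
    # to dependent services, instead of being maintained redundantly in
    # each of them. Names and record layout are invented for illustration.

    AUTHORITATIVE = {}                 # single place where identities live

    class Service:
        def __init__(self, name):
            self.name, self.accounts = name, {}

        def sync(self, person_id, record):
            # Each service keeps only a derived copy, never a second master.
            self.accounts[person_id] = {"mail": record["mail"]}

    def provision(person_id, record, services):
        AUTHORITATIVE[person_id] = record
        for service in services:
            service.sync(person_id, record)   # pushed, never re-entered by hand

    mail, elearning = Service("mail"), Service("elearning")
    provision("ab12cde", {"name": "Example Student", "mail": "a@tum.example"},
              [mail, elearning])
    print(mail.accounts["ab12cde"])    # {'mail': 'a@tum.example'}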

    Migration from client/server architecture to internet computing architecture

    The Internet Computing Architecture provides an object-based infrastructure that application developers can use to design, develop, and deploy n-tiered enterprise applications and services. Over years of distributed application development, the Internet Computing Architecture has provided techniques and infrastructure software for the successful deployment of systems and has established a foundation for the promotion of reuse and component-oriented development. The architecture begins with object-oriented analysis, which is carried through to the deployment and management of finished systems. It is multi-platform, multi-lingual, standards-based, and open, offering unparalleled integration capability, and its reusable infrastructure components have allowed mission-critical systems to be developed in record time. This paper provides a detailed overview of the Internet Computing Architecture and the way it is applied to designing systems ranging from simple two-tier applications to n-tier Web/Object enterprise systems. Even for the best software developers and managers, it is hard to sort through alternative solutions to today's business application development challenges, now that the web has provided the medium for large-scale distributed computing. Implementing an infrastructure that supports an application architecture and fosters component-oriented development and reuse is an extraordinary challenge, and the advances in multi-tiered middleware needed to scale to large enterprises and the Web/Internet have made the development of object-oriented systems more difficult. The Internet Computing Architecture defines a scalable architecture that provides the necessary software components, forming a solid middleware foundation that can address different application types. For the software development process to be component-oriented, the design and development methodologies are interwoven. The biggest advantage of the Internet Computing Architecture is that developers can design object application servers that simultaneously support two- and three-tier Client/Server and Object/Web applications. This flexibility allows business objects to be reused by a large number of applications; it supports a wide range of application architectures and offers a flexible infrastructure for the integration of data sources. Server-based business objects are managed by runtime services, with full support for partitioning applications in a transactionally secure distributed environment, which offers a highly scalable solution for environments with high transaction volumes and large numbers of users. The Internet Computing Architecture is the integration of distributed object technology with the protocols of the World Wide Web. Web protocols such as the Hypertext Transfer Protocol and the Internet Inter-ORB Protocol (IIOP) provide alternate means of communication between a browser on a client machine and server machines, while protocols like TCP/IP provide the addressing and packet-oriented transport for Internet and intranet communications. Recent advances in networking and World Wide Web technology have promoted a new network-centric computing structure.
    The World Wide Web is evolving into the infrastructure of the global economy on both the public and corporate Internets. Competition is growing among technologies, emerging from academia, standards activities, and individual vendors, to provide the infrastructure for distributed large-scale applications. The Internet Computing Architecture is a comprehensive, open, network-based architecture that provides extensibility for the design of distributed environments, and it offers a clear way to integrate client/server computing with distributed object architectures and the Internet. This technology also creates the opportunity for a new, extremely powerful class of operational, collaboration, decision support, and e-commerce solutions, which will catalyze the growth of a new networked economy based on intra-business, business-to-business (B2B), and business-to-consumer (B2C) electronic transactions. These network solutions can incorporate legacy mainframe systems and emerging applications as well as existing client/server environments, where most of the world's mission-critical applications still run. The Internet Computing Architecture is the industry's only cross-platform infrastructure for developing and deploying network-centric, object-based, end-to-end applications across the network. Open and de facto standards are at its core: the Hypertext Transfer Protocol (HTTP), the Hypertext Markup Language (HTML), the Extensible Markup Language (XML), and the Common Object Request Broker Architecture (CORBA). CORBA is recognized as the industry's most advanced and practical technology for implementing a distributed object environment, including the Interface Definition Language (IDL) for language-neutral interfaces and the Internet Inter-ORB Protocol (IIOP) for object interoperability. Programming languages such as Java provide programmable, extensible, and portable solutions throughout the Internet Computing Architecture, which not only supports but also enhances ActiveX/Component Object Model (COM) clients through open COM/CORBA interoperability specifications. Java has also emerged as the de facto standard for distributed object programming in the Internet/intranet arena, making it ideally suited to the distributed object nature of the Internet Computing Architecture; the portability it offers across tiers and platforms supports open standards and makes it an excellent choice for cartridge development across all tiers.
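    The architecture's central reuse claim, one server-side business object serving both a two-tier client and an Object/Web front end, can be sketched without the CORBA machinery. The classes below are plain-Python stand-ins for what would be CORBA objects reached over IIOP; all names are illustrative.

    # One business object, written once, serves both a two-tier desktop
    # client calling it directly and a web gateway translating HTTP-style
    # requests onto it. Plain-Python stand-ins, not a CORBA/IIOP stack.

    class OrderBusinessObject:
        """Server-side business logic, implemented once."""
        def __init__(self):
            self._orders = {"42": {"status": "shipped"}}

        def order_status(self, order_id):
            return self._orders.get(order_id, {"status": "unknown"})

    class TwoTierClient:
        """Classic client/server tier calling the object directly."""
        def __init__(self, bo):
            self.bo = bo

        def show_status(self, order_id):
            return f"Order {order_id}: {self.bo.order_status(order_id)['status']}"

    class WebGateway:
        """Object/Web tier mapping request paths onto the same object."""
        def __init__(self, bo):
            self.bo = bo

        def get(self, path):                   # e.g. "/orders/42"
            order_id = path.rstrip("/").split("/")[-1]
            return self.bo.order_status(order_id)

    shared = OrderBusinessObject()             # one object, two architectures
    print(TwoTierClient(shared).show_status("42"))
    print(WebGateway(shared).get("/orders/42"))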

    Application Development using Compositional Performance Analysis

    A parallel programming archetype [Cha94, CMMM95] is an abstraction that captures the common features of a class of problems with a similar computational structure and combines them with a parallelization strategy to produce a pattern of dataflow and communication. Such abstractions are useful in application development, both as a conceptual framework and as a basis for tools and techniques. The efficiency of a parallel program can depend a great deal on how its data and tasks are decomposed and distributed. This thesis describes a simple performance evaluation methodology that includes an analytic model for predicting the performance of parallel and distributed computations developed for multicomputer machines and networked personal computers. The analytic model can be supplemented by a simulation infrastructure for application writers to use when developing parallel programs with archetypes. These performance evaluation tools were developed with a deliberately restricted goal in mind: we require accuracy of the analytic model and simulation infrastructure only to the extent that they suggest directions for the programmer to make the appropriate optimizations. This restricted goal sacrifices some accuracy but makes the tools simpler and easier to use. A programmer can use these tools to design programs whose decomposition and distribution are specialized to a given machine configuration. By instantiating a few architecture-based parameters, the model can be employed in the performance analysis of data-parallel applications, guiding process generation, communication, and mapping decisions. The model is language-independent and machine-independent; it can be applied to help programmers make decisions about performance-affecting parameters as programs are ported across architectures and languages. Furthermore, the model incorporates both platform-specific and application-specific aspects, and it allows programmers to experiment with tradeoffs better than either strictly simulation-based or purely theoretical models. In addition, the model was designed to be simple. In summary, this thesis outlines a simple method for benchmarking a parallel communication library and for using the results to model the performance of applications developed with that communication library. We use compositional performance analysis, decomposing a parallel program into its modular parts and analyzing their respective performances, to gain perspective on the performance of the whole program. This model is useful for predicting parallel program execution times for different types of program archetypes (e.g., mesh and mesh-spectral), using communication libraries built with different message-passing schemes (e.g., Fortran M and Fortran with MPI), running on different architectures (e.g., an IBM SP2 and a network of Pentium personal computers).
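    The thesis's benchmark parameters are not reproduced here; the sketch below only composes per-module estimates in the spirit described above, using the common latency/bandwidth form T_msg = alpha + beta * size for the communication library and invented machine constants.

    # Benchmark-derived machine parameters (numbers invented here) feed
    # per-module estimates; module times compose into a whole-program
    # prediction. T_msg = alpha + beta * size is the usual form for a
    # message-passing library.

    ALPHA = 50e-6        # per-message latency, seconds (assumed benchmark)
    BETA = 1e-8          # seconds per byte, i.e. 1 / bandwidth (assumed)
    FLOP_TIME = 2e-9     # seconds per floating-point operation (assumed)

    def comm_time(messages, bytes_each):
        return messages * (ALPHA + BETA * bytes_each)

    def compute_time(flops):
        return flops * FLOP_TIME

    def mesh_iteration(n, p):
        """One iteration of an n x n mesh archetype on p processes."""
        local_flops = 5 * n * n / p     # stencil update on the local block
        halo_bytes = 8 * n              # one row of doubles per neighbor
        return compute_time(local_flops) + comm_time(2, halo_bytes)

    # Compose the modular estimates into a whole-program prediction.
    print(f"predicted time: {1000 * mesh_iteration(n=1024, p=16):.3f} s")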