
605 research outputs found

Automatically Discovering the Number of Clusters in Web Page Datasets

Author: Yao Zhongmei
Publication venue: eCommons
Publication date: 01/06/2005

Clustering is well suited to Web mining because it automatically organizes Web pages into categories, each containing pages with similar content. However, clustering lacks a general method for automatically determining the number of categories, or clusters, and no existing method is suitable for Web page clustering in particular. To address this problem, we identify a constant factor that characterizes the Web domain and, based on it, propose a new method for automatically determining the number of clusters in Web page data sets. In all our experiments, the average inter-cluster similarity reached a constant of 1.7 when clustering produced the best results. We therefore use this constant as the stopping factor in our clustering process: individual Web pages are arranged into clusters, the clusters are arranged into larger clusters, and so on until the average inter-cluster similarity approaches the constant. Combining this method with our Bidirectional Hierarchical Clustering algorithm, reported elsewhere, we have developed a clustering system suitable for mining the Web.
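The stopping rule in this abstract can be illustrated with a minimal sketch. The abstract does not define the similarity measure behind the constant 1.7, and it does not describe the Bidirectional Hierarchical Clustering algorithm, so everything here is an assumption: plain bottom-up merging, cosine similarity between cluster centroids, and a hypothetical `stop_threshold` parameter standing in for the constant.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def avg_inter_cluster_similarity(clusters):
    """Mean centroid-to-centroid similarity over all cluster pairs (assumed measure)."""
    pairs = list(combinations(clusters, 2))
    if not pairs:
        return 0.0
    return sum(cosine(centroid(a), centroid(b)) for a, b in pairs) / len(pairs)


def cluster_until(vectors, stop_threshold):
    """Merge the two most similar clusters until the average
    inter-cluster similarity is no longer above stop_threshold."""
    clusters = [[v] for v in vectors]  # start with one page per cluster
    while (len(clusters) > 1
           and avg_inter_cluster_similarity(clusters) > stop_threshold):
        # Pick the pair of clusters whose centroids are most similar.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: cosine(centroid(clusters[p[0]]),
                                 centroid(clusters[p[1]])),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

With this rule the number of clusters is not supplied in advance; it is whatever remains when the loop stops. Whether the measure in the paper rises or falls toward its constant is not stated in the abstract, so the direction of the comparison is also an assumption.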

University of Dayton

Eventual Consistency: Origin and Support

Author: Bernabéu-Aubán José M.
García-Escrivá José-Ramón
González de Mendívil José Ramón
Muñoz-Escoí Francesc D.
Sendra-Roig Juan Salvador
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 21/11/2018

Eventual consistency is demanded nowadays in geo-replicated services that need to be highly scalable and available. According to the CAP constraints, when network partitions may arise, a distributed service must choose between being strongly consistent and being highly available. Since scalable services should be available, a relaxed consistency (while the network is partitioned) is the preferred choice. Eventual consistency is not a common data-centric consistency model, but only a state convergence condition to be added to a relaxed consistency model. Several aspects of eventual consistency have not been analysed in depth in previous works: (1) which are the oldest replication proposals providing eventual consistency, (2) which replica consistency models provide the best basis for building eventually consistent services, (3) which mechanisms should be considered for implementing an eventually consistent service, and (4) which are the best combinations of those mechanisms for achieving different concrete goals. This paper provides some notes on these important topics.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Corporate influence and the academic computer science discipline.

Author: Camille Akmut
Publication venue: 'Modern Language Association'
Publication date: 01/01/2022

Prosopography of a major academic center for computer science

Humanities Commons

Research issues in real-time database systems

Author: Ulusoy O.
Publication venue: 'Elsevier BV'
Publication date: 01/11/1995

Today's real-time systems are characterized by managing large volumes of data. Efficient database management algorithms for accessing and manipulating data are required to satisfy timing constraints of supported applications. Real-time database systems involve a new research area investigating possible ways of applying database systems technology to real-time systems. Management of real-time information through a database system requires the integration of concepts from both real-time systems and database systems. Some new criteria need to be developed to involve timing constraints of real-time applications in many database systems design issues, such as transaction/query processing, data buffering, CPU, and IO scheduling. In this paper, a basic understanding of the issues in real-time database systems is provided and the research efforts in this area are introduced. Different approaches to various problems of real-time database systems are briefly described, and possible future research directions are discussed.

Bilkent University Institutional Repository

Materialized views and data warehouses

Author: Adiba M. E.
Amir A.
Astrahan M. M.
Baekgraard Lars
Blakeley J. A.
Blakeley Jose A.
Buchheit M.
Chen C. M.
Chen Chungmin Melvin
Delis A.
Faloutsos C.
Finkelstein S.
Gray J.
Gupta Ashish
Gupta H.
Hanson E.
Hanson Eric N.
Hellerstein J. M.
Jensen C. S.
Jhingran A.
Larson A.
Mumick In Inderpal
Nick Roussopoulos
Papakonstantinou Y.
Roussopoulos N.
Roussopoulos Nick
Sellis T.
Stamenas Antonios G.
Stonebraker M.
Valduriez Patrick
Zhuge Yue
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Efficient Sampling Algorithms for Approximate Motif Counting in Temporal Graph Streams

Author: Jiang Wenjun
Li Yuchen
Tan Kian-Lee
Wang Jingjing
Wang Yanhao
Publication date: 22/11/2022

A great variety of complex systems, from user interactions in communication networks to transactions in financial markets, can be modeled as temporal graphs consisting of a set of vertices and a series of timestamped and directed edges. Temporal motifs are generalized from subgraph patterns in static graphs which consider edge orderings and durations in addition to topologies. Counting the number of occurrences of temporal motifs is a fundamental problem for temporal network analysis. However, existing methods either cannot support temporal motifs or suffer from performance issues. Moreover, they cannot work in the streaming model where edges are observed incrementally over time. In this paper, we focus on approximate temporal motif counting via random sampling. We first propose two sampling algorithms for temporal motif counting in the offline setting. The first is an edge sampling (ES) algorithm for estimating the number of instances of any temporal motif. The second is an improved edge-wedge sampling (EWS) algorithm that hybridizes edge sampling with wedge sampling for counting temporal motifs with 3 vertices and 3 edges. Furthermore, we propose two algorithms to count temporal motifs incrementally in temporal graph streams by extending the ES and EWS algorithms referred to as SES and SEWS. We provide comprehensive analyses of the theoretical bounds and complexities of our proposed algorithms. Finally, we perform extensive experimental evaluations of our proposed algorithms on several real-world temporal graphs. The results show that ES and EWS have higher efficiency, better accuracy, and greater scalability than state-of-the-art sampling methods for temporal motif counting in the offline setting. Moreover, SES and SEWS achieve up to three orders of magnitude speedups over ES and EWS while having comparable estimation errors for temporal motif counting in the streaming setting. Comment: 27 pages, 11 figures; overlapped with arXiv:2007.1402
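The general idea of estimating a motif count by edge sampling can be sketched as follows. This is not the ES algorithm from the paper, whose details the abstract does not give. It is a simplified illustration under stated assumptions: the motif is a two-edge temporal path (u to v at time t, then v to w at a later time no more than `delta` after t), each edge is sampled independently with probability `p`, and all function names are hypothetical.

```python
import random
from collections import defaultdict


def build_out_edges(edges):
    """Index edges by source vertex: source -> list of (target, timestamp)."""
    out = defaultdict(list)
    for u, v, t in edges:
        out[u].append((v, t))
    return out


def count_from_first_edge(edge, out_edges, delta):
    """Number of motif instances whose first edge is `edge`:
    edges leaving its target at a time in (t, t + delta]."""
    _, v, t = edge
    return sum(1 for _, t2 in out_edges[v] if t < t2 <= t + delta)


def exact_count(edges, delta):
    """Exact number of two-edge temporal paths, for comparison."""
    out = build_out_edges(edges)
    return sum(count_from_first_edge(e, out, delta) for e in edges)


def estimate_count(edges, delta, p, rng):
    """Sample each edge with probability p, count the instances that
    start at each sampled edge, and scale the total by 1 / p."""
    out = build_out_edges(edges)
    total = 0
    for e in edges:
        if rng.random() < p:
            total += count_from_first_edge(e, out, delta)
    return total / p
```

Because every instance is attributed to exactly one edge (its first), and each edge is kept with probability `p`, dividing the sampled total by `p` gives an unbiased estimate of the exact count. A call such as `estimate_count(edges, delta=5, p=0.1, rng=random.Random(0))` trades accuracy for fewer per-edge counts as `p` shrinks.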

arXiv.org e-Print Archive

Scalable Storage for Digital Libraries

Author: Mather Paul
Publication date: 01/10/2002

I propose a storage system optimised for digital libraries. Its key features are its heterogeneous scalability; its integration and exploitation of rich semantic metadata associated with digital objects; its use of a name space; and its aggressive performance optimisation in the digital library domain.

Computer Science Technical Reports @Virginia Tech