1,803 research outputs found

    Pelabelan Klaster Artikel Ilmiah Menggunakan Topic Rank dan Maximum Common Subgraph

    Clustering methods make it easier to group scientific articles. Cluster labeling is needed to identify the key phrases that represent the topics discussed by a group of scientific articles. Some article clusters need to be merged because they still share similar topics, so that better cluster labels can be produced. Topic similarity can be represented by the similarity of word relations modeled as a graph. This research proposes a method for labeling clusters of scientific articles with a cluster-merging step based on the structural similarity of the cluster representation graphs. The proposed method consists of: (1) grouping scientific articles with K-Means++ clustering; (2) extracting candidate phrases with Frequent Phrase Mining (FPM); (3) constructing a graph whose vertices are the words forming the phrases and whose edges are word relations derived from Word2Vec; (4) merging clusters by measuring cluster similarity based on the Maximum Common Subgraph (MCS) structure; and (5) labeling the merged clusters with the TopicRank method. The proposed method is evaluated on two scientific-article datasets with varying degrees of cluster separation and cohesion. Topic coherence is used as the evaluation measure to quantify how closely the label topics within a cluster are related. The experiments show that the dataset with high cluster separation and cohesion (homogeneous) yields higher topic coherence for the merged cluster labels. Using co-occurrence word relations to build the cluster representation graph produces better topic coherence than Word2Vec relations, because co-occurrence relations are frequency based and therefore represent the majority topic of the cluster.
    ==========================================================================================================
    Unstructured collections of scientific articles benefit from clustering methods that group articles by topic similarity. Cluster labeling of the resulting clusters is required to discover the key phrases that best represent the topics covered. Some clusters still need to be merged because they cover similar topics, which gives better cluster labels. In addition to word occurrences, topic similarity can also be represented by semantic word relations modeled as a graph. This research proposes a cluster labeling method for scientific articles that includes a cluster-merging step, as its research contribution, to provide labels that better represent cluster topics. A graph model is chosen because it maps the relationships between words and thus captures semantic information in the text. The proposed method has several stages. First, K-Means++ clustering is applied to a collection of scientific articles. Second, for each cluster, phrases are extracted with Frequent Phrase Mining to obtain word tokens capable of forming representative phrases for the cluster topics. The acquired word tokens are then used to construct a graph representation of each cluster. Next, clusters are merged based on cluster graph similarity measured with the Maximum Common Subgraph (MCS) method. Finally, cluster labeling is performed on the merged clusters using the TopicRank method.
    The proposed method is evaluated on two datasets using the topic coherence score of the merged cluster labels, with both a Word2Vec-based graph model and a co-occurrence-based graph model. The results show that the homogeneous dataset 1 yields better results than the heterogeneous dataset 2. In addition, the co-occurrence-based graph produces preferable results in the cluster merging process.
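    As an illustration of step (4), the sketch below shows one way the MCS-based cluster similarity could be computed; it assumes word-labeled vertices, so the maximum common subgraph can be approximated by the edges shared between two cluster graphs. The helper names, the example word pairs, and the 0.5 merge threshold are illustrative assumptions, not taken from the paper.

        # Minimal sketch of MCS-based cluster merging (illustrative only).
        # Each cluster is represented as a word graph: vertices are words,
        # edges are word relations (Word2Vec neighbours or co-occurrences).
        import networkx as nx

        def cluster_graph(word_pairs):
            """Build a word graph from (word, word) relation pairs."""
            g = nx.Graph()
            g.add_edges_from(word_pairs)
            return g

        def mcs_similarity(g1, g2):
            """Because vertices carry word labels, the maximum common subgraph
            is approximated here by the edge set present in both graphs."""
            shared = set(map(frozenset, g1.edges())) & set(map(frozenset, g2.edges()))
            return len(shared) / (max(g1.number_of_edges(), g2.number_of_edges()) or 1)

        g_a = cluster_graph([("topic", "model"), ("topic", "cluster"), ("cluster", "label")])
        g_b = cluster_graph([("topic", "model"), ("cluster", "label"), ("label", "phrase")])
        if mcs_similarity(g_a, g_b) >= 0.5:     # hypothetical merge threshold
            merged = nx.compose(g_a, g_b)       # union of the two word graphs

    In the proposed pipeline the merged graph would then be handed to TopicRank for labeling; the sketch stops at the merge decision.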

    Embracing Visual Experience and Data Knowledge: Efficient Embedded Memory Design for Big Videos and Deep Learning

    Energy efficient memory designs are becoming increasingly important, especially for applications related to mobile video technology and machine learning. The growing popularity of smart phones, tablets and other mobile devices has created an exponential demand for video applications in today's society. When mobile devices display video, the embedded video memory within the device consumes a large amount of the total system power. This issue has created the need to introduce power-quality tradeoff techniques for enabling good quality video output, while simultaneously enabling power consumption reduction. Similarly, power efficiency issues have arisen within the area of machine learning, especially with applications requiring large and fast computation, such as neural networks. Using the accumulated data knowledge from various machine learning applications, there is now the potential to create more intelligent memory with the capability for optimized trade-off between energy efficiency, area overhead, and classification accuracy on the learning systems. In this dissertation, a review of recently completed works involving video and machine learning memories will be covered. Based on the collected results from a variety of different methods, including subjective trials, discovered data-mining patterns, software simulations, and hardware power and performance tests, the presented memories provide novel ways to significantly enhance power efficiency for future memory devices. An overview of related works, especially the relevant state-of-the-art research, will be referenced for comparison in order to produce memory design methodologies that exhibit optimal quality, low implementation overhead, and maximum power efficiency. National Science Foundation; ND EPSCoR; Center for Computationally Assisted Science and Technology (CCAST).

    Resiliency in numerical algorithm design for extreme scale simulations

    This work is based on the seminar titled ‘Resiliency in Numerical Algorithm Design for Extreme Scale Simulations’ held March 1–6, 2020, at Schloss Dagstuhl, that was attended by all the authors. Advanced supercomputing is characterized by very high computation speeds at the cost of involving an enormous amount of resources and costs. A typical large-scale computation running for 48 h on a system consuming 20 MW, as predicted for exascale systems, would consume a million kWh, corresponding to about 100k Euro in energy cost for executing 10^23 floating-point operations. It is clearly unacceptable to lose the whole computation if any of the several million parallel processes fails during the execution. Moreover, if a single operation suffers from a bit-flip error, should the whole computation be declared invalid? What about the notion of reproducibility itself: should this core paradigm of science be revised and refined for results that are obtained by large-scale simulation? Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of Petabytes, synchronously writing checkpoint data all the way to background storage at frequent intervals will create intolerable overheads in runtime and energy consumption. Forecasts show that the mean time between failures could be lower than the time to recover from such a checkpoint, so that large calculations at scale might not make any progress if robust alternatives are not investigated. More advanced resilience techniques must be devised. The key may lie in exploiting both advanced system features as well as specific application knowledge. Research will face two essential questions: (1) what are the reliability requirements for a particular computation and (2) how do we best design the algorithms and software to meet these requirements? While the analysis of use cases can help understand the particular reliability requirements, the construction of remedies is currently wide open. One avenue would be to refine and improve on system- or application-level checkpointing and rollback strategies in the case an error is detected. Developers might use fault notification interfaces and flexible runtime systems to respond to node failures in an application-dependent fashion. Novel numerical algorithms or more stochastic computational approaches may be required to meet accuracy requirements in the face of undetectable soft errors. These ideas constituted an essential topic of the seminar. The goal of this Dagstuhl Seminar was to bring together a diverse group of scientists with expertise in exascale computing to discuss novel ways to make applications resilient against detected and undetected faults. In particular, participants explored the role that algorithms and applications play in the holistic approach needed to tackle this challenge. This article gathers a broad range of perspectives on the role of algorithms, applications and systems in achieving resilience for extreme scale simulations. The ultimate goal is to spark novel ideas and encourage the development of concrete solutions for achieving such resilience holistically. Peer reviewed. Article signed by 36 authors: Emmanuel Agullo, Mirco Altenbernd, Hartwig Anzt, Leonardo Bautista-Gomez, Tommaso Benacchio, Luca Bonaventura, Hans-Joachim Bungartz, Sanjay Chatterjee, Florina M. Ciorba, Nathan DeBardeleben, Daniel Drzisga, Sebastian Eibl, Christian Engelmann, Wilfried N. Gansterer, Luc Giraud, Dominik Göddeke, Marco Heisig, Fabienne Jézéquel, Nils Kohl, Xiaoye Sherry Li, Romain Lion, Miriam Mehl, Paul Mycek, Michael Obersteiner, Enrique S. Quintana-Ortí, Francesco Rizzi, Ulrich Rüde, Martin Schulz, Fred Fung, Robert Speck, Linda Stals, Keita Teranishi, Samuel Thibault, Dominik Thönnes, Andreas Wagner and Barbara Wohlmuth. Postprint (author's final draft).
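    The quoted energy figures can be checked with a quick back-of-the-envelope calculation; in the sketch below the 0.10 Euro/kWh electricity price and the roughly 1 exaflop/s sustained rate are assumptions used only to reproduce the orders of magnitude mentioned in the abstract.

        # Back-of-the-envelope check of the exascale cost figures (illustrative).
        power_mw   = 20                             # system power draw
        runtime_h  = 48                             # length of the computation
        energy_kwh = power_mw * 1_000 * runtime_h   # 960,000 kWh, roughly a million kWh
        cost_eur   = energy_kwh * 0.10              # ~96,000 Euro, roughly 100k Euro
        flops      = 1e18 * runtime_h * 3600        # ~1.7e23 floating-point operations
        print(f"{energy_kwh:,.0f} kWh, ~{cost_eur:,.0f} EUR, ~{flops:.1e} flop")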

    Massively Parallel and Asynchronous Tsetlin Machine Architecture Supporting Almost Constant-Time Scaling

    Using logical clauses to represent patterns, Tsetlin Machines (TMs) have recently obtained competitive performance in terms of accuracy, memory footprint, energy, and learning speed on several benchmarks. Each TM clause votes for or against a particular class, with classification resolved using a majority vote. While the evaluation of clauses is fast, being based on binary operators, the voting makes it necessary to synchronize the clause evaluation, impeding parallelization. In this paper, we propose a novel scheme for desynchronizing the evaluation of clauses, eliminating the voting bottleneck. In brief, every clause runs in its own thread for massive native parallelism. For each training example, we keep track of the class votes obtained from the clauses in local voting tallies. The local voting tallies allow us to detach the processing of each clause from the rest of the clauses, supporting decentralized learning. This means that the TM most of the time will operate on outdated voting tallies. We evaluated the proposed parallelization across diverse learning tasks and it turns out that our decentralized TM learning algorithm copes well with working on outdated data, resulting in no significant loss in learning accuracy. Furthermore, we show that the proposed approach provides up to 50 times faster learning. Finally, learning time is almost constant for reasonable clause amounts (employing from 20 to 7,000 clauses on a Tesla V100 GPU). For sufficiently large clause numbers, computation time increases approximately proportionally. Our parallel and asynchronous architecture thus allows processing of massive datasets and operating with more clauses for higher accuracy. Comment: Accepted to ICML 202
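    A toy sketch of the local-voting-tally idea follows; the clauses here are hypothetical hand-written conjunctions, and the snippet only shows how per-clause tallies detach clause evaluation from the global class vote, not the authors' GPU implementation or the Tsetlin Machine learning rule.

        # Each clause is evaluated in its own worker and writes to its own local
        # voting tally; the class vote is later formed by summing the tallies,
        # so clause evaluation needs no synchronization (illustrative sketch).
        from concurrent.futures import ThreadPoolExecutor

        def clause_votes(clause, examples):
            """Evaluate one clause on every example; returns its local tally."""
            return {ex_id: clause(x) for ex_id, x in examples}

        # Hypothetical clauses: conjunctions over binary features, voting +1 or -1.
        clauses = [
            lambda x: +1 if x[0] and x[1] else 0,
            lambda x: -1 if x[2] and not x[0] else 0,
        ]
        examples = [(0, (1, 1, 0)), (1, (0, 0, 1))]

        with ThreadPoolExecutor() as pool:        # mirrors "one thread per clause"
            tallies = list(pool.map(clause_votes, clauses, [examples] * len(clauses)))

        # Aggregate the (possibly stale, in the real architecture) local tallies.
        class_vote = {ex_id: sum(t.get(ex_id, 0) for t in tallies) for ex_id, _ in examples}

    In the actual architecture the sum is read asynchronously while clauses keep updating their tallies, so it may be outdated; the abstract reports that learning tolerates this staleness with no significant loss in accuracy.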

    Recent Developments in Smart Healthcare

    Medicine is undergoing a sector-wide transformation thanks to advances in computing and networking technologies. Healthcare is changing from reactive and hospital-centered to preventive and personalized, from disease-focused to well-being-centered. In essence, healthcare systems, as well as fundamental medicine research, are becoming smarter. We anticipate significant improvements in areas ranging from molecular genomics and proteomics to decision support for healthcare professionals through big data analytics, to support for behavior change through technology-enabled self-management and social and motivational support. Furthermore, with smart technologies, healthcare delivery could also be made more efficient, higher quality, and lower cost. In this special issue, we received a total of 45 submissions and accepted 19 outstanding papers that span several interesting topics in smart healthcare, including public health, health information technology (Health IT), and smart medicine.