1,108 research outputs found

    Investigating Role of Data Mining in Software Engineering

    Companies that focus on software development produce vast volumes of data. Every stage of software development, from requirements gathering to ongoing maintenance, generates its own set of data. To improve the software, this data is collected and stored in software repositories. Data mining techniques are applied to the massive amounts of data in these repositories to extract previously unseen patterns and insights. Researchers in Software Engineering and Data Mining have lately made this area of study a top priority. This research examines the uses of data mining in software engineering, the types of software engineering data that can be mined, and the data mining techniques that researchers have used to solve the problems in question. The next step is to use this classification to determine which subfield of software engineering attracts the most scholarly interest.

    Flexpop: A popularity-based caching strategy for multimedia applications in information-centric networking

    Information-Centric Networking (ICN) is the dominant architecture for the future Internet. In ICN, content items are stored temporarily in network nodes such as routers. When a router's memory becomes full and there is no room for newly arriving content, stored contents are evicted to cope with the limited cache size. It is therefore crucial to develop an effective caching strategy that keeps popular contents for a longer period of time. This study proposes a new caching strategy, named Flexible Popularity-based Caching (FlexPop), for storing popular contents. FlexPop comprises two mechanisms: the Content Placement Mechanism (CPM), which is responsible for content caching, and the Content Eviction Mechanism (CEM), which handles content eviction when the router cache is full and there is no space for new incoming content. Both mechanisms are validated using Fuzzy Set Theory, following the Design Research Methodology (DRM) to demonstrate that the research is rigorous and repeatable under comparable conditions. The performance of FlexPop is evaluated through simulations, and the results are compared with those of the Leave Copy Everywhere (LCE), ProbCache, and Most Popular Content (MPC) strategies. The results show that FlexPop outperforms LCE, ProbCache, and MPC with respect to cache hit rate, redundancy, content retrieval delay, memory utilization, and stretch ratio, which various studies regard as extremely important metrics for evaluating ICN caching. These outcomes help make FlexPop acceptable to users, who can verify the performance of ICN before selecting a caching strategy. Thus FlexPop has potential for use in ICN for the future Internet, such as in the deployment of IoT technology.
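
The split between a placement rule and an eviction rule can be illustrated with a minimal Python sketch. This is not the FlexPop algorithm: the abstract does not specify how CPM and CEM decide, so the sketch swaps in plain request counting. The class name `PopularityCache`, the `admit_threshold` parameter, and the least-requested eviction rule are all assumptions made for illustration.

```python
from collections import Counter


class PopularityCache:
    """Toy router cache (illustrative only, not FlexPop itself).

    Placement: admit content once it has been requested often enough.
    Eviction: when full, drop the cached item with the fewest requests.
    """

    def __init__(self, capacity, admit_threshold=2):
        self.capacity = capacity
        self.admit_threshold = admit_threshold  # hypothetical parameter
        self.requests = Counter()  # request count per content name
        self.store = set()         # names of currently cached items
        self.hits = 0
        self.misses = 0

    def request(self, name):
        """Serve one request; return True on a cache hit."""
        self.requests[name] += 1
        if name in self.store:
            self.hits += 1
            return True
        self.misses += 1
        self._maybe_place(name)
        return False

    def _maybe_place(self, name):
        # Placement rule: only cache content that has proven popular.
        if self.capacity <= 0 or self.requests[name] < self.admit_threshold:
            return
        if len(self.store) >= self.capacity:
            # Eviction rule: find the least-requested cached item.
            victim = min(self.store, key=lambda n: self.requests[n])
            if self.requests[victim] >= self.requests[name]:
                return  # the newcomer is not more popular; keep the cache as is
            self.store.remove(victim)
        self.store.add(name)

    def hit_rate(self):
        """Fraction of requests served from the cache so far."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = PopularityCache(capacity=2, admit_threshold=2)
for name in ["a", "a", "a", "b", "b", "c", "a"]:
    cache.request(name)
print(sorted(cache.store), cache.hit_rate())
```

Cache hit rate, one of the metrics named in the abstract, falls out of the hit and miss counters. The other metrics (redundancy, retrieval delay, stretch ratio) need a network topology and are outside this single-router sketch.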

    Improved Bounds on Information Dissemination by Manhattan Random Waypoint Model

    With the popularity of portable wireless devices, it is important to model and predict how information or contagions spread by natural human mobility, both for understanding the spreading of deadly infectious diseases and for improving delay tolerant communication schemes. Formally, we model this problem by considering $M$ moving agents, where each agent initially carries a distinct bit of information. When two agents are at the same location or in close proximity to one another, they share all their information with each other. We would like to know the time it takes until all bits of information reach all agents, called the flood time, and how it depends on the way agents move, the size and shape of the network, and the number of agents moving in the network. We provide rigorous analysis for the Manhattan Random Waypoint (MRWP) model (which takes paths with minimum number of turns), a convenient model used previously to analyze mobile agents, and find that with high probability the flood time is bounded by $O\big(N \log M \lceil (N/M) \log(NM) \rceil\big)$, where $M$ agents move on an $N \times N$ grid. In addition to extensive simulations, we use a data set of taxi trajectories to show that our method can successfully predict flood times in both experimental settings and the real world. Comment: 10 pages, ACM SIGSPATIAL 2018, Seattle, U
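
The flood-time definition above can be made concrete with a small Python simulation. It is a sketch under stated assumptions rather than the paper's setup: agents move one cell per time step, exchange information only when they occupy the same cell, and choose at random between the horizontal-first and vertical-first path to each new destination. The function names `step_toward` and `flood_time` and the `max_steps` cutoff are hypothetical.

```python
import random


def step_toward(pos, dest, horizontal_first):
    """Move one grid cell along a Manhattan path with at most one turn."""
    (x, y), (dx, dy) = pos, dest
    # Move along x if it is the first leg, or if the y leg is already done.
    if x != dx and (horizontal_first or y == dy):
        return (x + (1 if dx > x else -1), y)
    if y != dy:
        return (x, y + (1 if dy > y else -1))
    return pos  # already at the destination


def flood_time(n, m, seed=None, max_steps=1_000_000):
    """Steps until all m agents on an n x n grid know all m bits.

    Returns None if flooding has not completed within max_steps.
    """
    rng = random.Random(seed)

    def rand_cell():
        return (rng.randrange(n), rng.randrange(n))

    pos = [rand_cell() for _ in range(m)]
    dest = [rand_cell() for _ in range(m)]
    h_first = [rng.random() < 0.5 for _ in range(m)]
    known = [{i} for i in range(m)]  # agent i starts with bit i only

    for t in range(max_steps + 1):
        # Exchange: agents sharing a cell pool everything they know.
        by_cell = {}
        for i, p in enumerate(pos):
            by_cell.setdefault(p, []).append(i)
        for group in by_cell.values():
            if len(group) > 1:
                merged = set().union(*(known[i] for i in group))
                for i in group:
                    known[i] = merged
        if all(len(k) == m for k in known):
            return t
        # Move: on arrival, draw a fresh destination and path orientation.
        for i in range(m):
            if pos[i] == dest[i]:
                dest[i] = rand_cell()
                h_first[i] = rng.random() < 0.5
            pos[i] = step_toward(pos[i], dest[i], h_first[i])
    return None


print(flood_time(n=10, m=20, seed=0))
```

Averaging `flood_time` over many seeds for several values of `n` and `m` gives empirical curves that could be set against the stated bound. The constant hidden in the $O(\cdot)$ is not given in the abstract, so only the growth trend is comparable.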

    A Literature Survey on Resource Management Techniques, Issues and Challenges in Cloud Computing

    Cloud computing is a form of large-scale distributed computing that provides on-demand services for clients. Cloud clients use web browsers, mobile apps, thin clients, or terminal emulators to request and control their cloud resources at any time and from anywhere through the network. As many companies shift their data to the cloud and more people become aware of the advantages of storing data there, the number of cloud computing infrastructures and the amount of data are growing, which makes management more complex for cloud providers. We survey the state-of-the-art resource management techniques for IaaS (infrastructure as a service) in cloud computing. We then put forward the major issues in deploying cloud infrastructure that must be addressed to avoid poor service delivery in cloud computing.

    How can SMEs benefit from big data? Challenges and a path forward

    Big data is big news, and large companies in all sectors are making significant advances in their customer relations, product selection and development and consequent profitability through using this valuable commodity. Small and medium enterprises (SMEs) have proved themselves to be slow adopters of the new technology of big data analytics and are in danger of being left behind. In Europe, SMEs are a vital part of the economy, and the challenges they encounter need to be addressed as a matter of urgency. This paper identifies barriers to SME uptake of big data analytics and recognises their complex challenge to all stakeholders, including national and international policy makers, IT, business management and data science communities. The paper proposes a big data maturity model for SMEs as a first step towards an SME roadmap to data analytics. It considers the ‘state-of-the-art’ of IT with respect to usability and usefulness for SMEs and discusses how SMEs can overcome the barriers preventing them from adopting existing solutions. The paper then considers management perspectives and the role of maturity models in enhancing and structuring the adoption of data analytics in an organisation. The history of total quality management is reviewed to inform the core aspects of implanting a new paradigm. The paper concludes with recommendations to help SMEs develop their big data capability and enable them to continue as the engines of European industrial and business success. Copyright © 2016 John Wiley & Sons, Ltd. Peer Reviewed. Postprint (author's final draft)

    Project Final Report: HPC-Colony II

    This report recounts the HPC Colony II Project, a computer science effort funded by DOE's Advanced Scientific Computing Research office. The project included researchers from ORNL, IBM, and the University of Illinois at Urbana-Champaign. The topic of the effort was adaptive system software for extreme scale parallel machines. A description of findings is included.

    Comprehensive characterization of an open source document search engine

    This work performs a thorough characterization and analysis of the open source Lucene search library. The article describes in detail the architecture, functionality, and micro-architectural behavior of the search engine, and investigates prominent online document search research issues. In particular, we study how intra-server index partitioning affects the response time and throughput, explore the potential use of low power servers for document search, and examine the sources of performance degradation and the causes of tail latencies. Some of our main conclusions are the following: (a) intra-server index partitioning can reduce tail latencies but with diminishing benefits as incoming query traffic increases, (b) low power servers given enough partitioning can provide the same average and tail response times as conventional high performance servers, (c) index search is a CPU-intensive cache-friendly application, and (d) C-states are the main culprits for performance degradation in document search. Web of Science162art. no. 1