17 research outputs found

    Zipf’s Law in Passwords

    Despite more than thirty years of research effort, textual passwords are still enveloped in mystery. In this work, we take a substantial step forward in understanding the distribution of passwords and measuring the strength of password datasets using a statistical approach. We first show that Zipf's law holds almost perfectly in real-life passwords by conducting linear regressions on a corpus of 56 million passwords. As one application of this observation, we propose the number of unique passwords used in the regression together with the slope of the regression line as a metric for assessing the strength of password datasets, and prove its validity in a mathematically rigorous manner. Furthermore, extensive experiments (including optimal attacks, simulated optimal attacks, and state-of-the-art cracking sessions) demonstrate the practical effectiveness of our metric. To the best of our knowledge, our metric is the first that is both easy to approximate and accurate enough to facilitate comparisons, providing a useful tool for system administrators to gain a precise grasp of the strength of their password datasets and to adjust password policies accordingly.
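The regression described above can be sketched roughly as follows: sort password frequencies by rank and fit a least-squares line on the log-log rank-frequency plot. This is a minimal illustration, not the paper's code, and the tiny corpus below is hypothetical.

```python
# Minimal sketch of a Zipf-law fit on a password corpus: regress
# log(frequency) on log(rank). Under Zipf's law the points fall on a line.
from collections import Counter
import math

def zipf_fit(passwords):
    """Return (slope, intercept) of the least-squares line through
    (log rank, log frequency) for the given password list."""
    freqs = sorted(Counter(passwords).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

# Toy corpus (invented for illustration).
corpus = (["123456"] * 9 + ["password"] * 5 + ["qwerty"] * 3
          + ["letmein"] * 2 + ["s3cret!"])
slope, intercept = zipf_fit(corpus)
# A steeper (more negative) slope means the frequency distribution decays
# faster; together with the number of unique passwords it characterizes
# how concentrated the dataset is on a few popular passwords.
```

The slope and the number of distinct passwords in the regression are exactly the two quantities the abstract proposes as a strength metric.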

    Exploring the Law of Numbers: Evidence from China's Real Estate

    The renowned proverb "numbers do not lie" underscores the reliability and insight that lie beneath numbers, a concept of undisputed importance, especially in economics and finance. Despite the prominence of Benford's law in first-digit analysis, its scope is not comprehensive enough to decipher the laws of numbers in full. This paper delves into number laws by taking the financial statements of China's real estate sector as a representative case, quantitatively studying not only the first digit but also two further dimensions of numbers: frequency and length. The research outcomes go beyond mere reservations about data manipulation and open the door to discussions of number diversity and the insights to be drawn from number usage. This study carries both economic significance and the capacity to foster a deeper comprehension of numerical phenomena.
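The first-digit analysis mentioned above compares observed leading-digit frequencies against Benford's expected distribution P(d) = log10(1 + 1/d). A minimal sketch, with made-up sample values rather than the paper's data:

```python
# Illustrative first-digit analysis in the spirit of Benford's law.
import math
from collections import Counter

def leading_digit(x):
    """First significant digit of a nonzero number."""
    x = abs(x)
    while x >= 10:
        x //= 10
    while 0 < x < 1:
        x *= 10
    return int(x)

def first_digit_freqs(values):
    """Observed relative frequency of leading digits 1-9."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}

def benford_expected():
    """Benford's law: P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Invented sample figures; a real study would use financial-statement values.
obs = first_digit_freqs([123, 19, 2, 345.6, 0.052])
exp = benford_expected()
```

Comparing `obs` against `exp` (e.g. with a chi-squared test) is the standard way to flag possible data manipulation; the paper extends this by also examining number frequency and length.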

    Going to great lengths in the pursuit of luxury: how longer brand names can enhance the luxury perception of a brand

    Brand names are a crucial part of the brand equity and marketing strategy of any company. Research suggests that companies spend considerable time and money to create suitable names for their brands and products. This paper uses Zipf's law (or the Principle of Least Effort) to analyze the perceived luxuriousness of brand names. One of the most robust laws in linguistics, Zipf's law describes the inverse relationship between a word's length and its frequency, i.e., the more frequently a word is used in language, the shorter it tends to be. Zipf's law has been applied to many fields of science, and in this paper we provide evidence for the idea that, because polysyllabic words (and brand names) are rare in everyday conversation, they are considered more complex, distant, and abstract, and that the use of longer brand names can enhance the perception of how luxurious a brand is (compared with shorter brand names, which consumers perceive as close, frequent, and concrete). Our results suggest that shorter (monosyllabic) names are better suited to basic brands, whereas longer names (trisyllabic or more) are more appropriate for luxury brands.

    Paths to more effective personal information management

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011, by Max Goodwin Van Kleek. Cataloged from the PDF version of the thesis; includes bibliographical references (p. 256-272). With the widespread availability of digital tools for storing, accessing, and sharing information, why is so much information still lost, forgotten, or kept on paper? The work in this thesis finds that such disorganization results from problems in the designs of the personal information management (PIM) tools in common use today. These problems impede information capture, force many forms of information to be left out, and cause information to be forgotten. How can they be mitigated? Our Information Scraps study identifies the need to support more diverse kinds of information while conserving time, attention, and memory for retained information items. Our first approach to achieving these goals is to eliminate the artificial separation and homogeneity that structured PIM tools impose, so that arbitrary information can be captured in any way desired. A two-year study of List-it, our short-note-taking tool, finds that people keep notes serving five primary roles: reminders, reference items, progress trackers, places to think, and archives of personal value. Our second approach reintroduces structured data to support more effective use and management of information collections. Jourknow addresses the manageability of large note collections through lightweight-structured note contents and contextual retrieval, that is, access to notes via the contexts and activities in effect at the time of their creation. Poyozo reinforces recollection of previously seen information by providing visualizations of all of a person's past information activities. Finally, Atomate addresses the challenge of managing the ever-increasing deluge of new information by letting people delegate actions to software behaviors that are taken automatically when new information arrives. These studies identify critical needs of PIM tools and offer viable solutions.

    Predicting Employee Performance Using Text Data from Resumes

    Text analytics using term frequency was proposed as an extension of biodata for predicting job performance and for addressing criticisms of biodata and predictor methods, namely that they do not identify the constructs they are measuring or their predictive elements. Linguistic Inquiry and Word Count (LIWC) software was used to analyze and sort text into validated categories. Prolific Academic was used to recruit full-time workers who provided a copy of their resume and were assessed on impression management (IM), cognitive ability, and job performance. Predictive analyses used resumes with 100+ words (n = 667), whereas correlational analyses used the full sample (N = 809). Third-person plural pronouns, impersonal pronouns, sadness words, certainty words, non-fluencies, and colons emerged as significant predictors of job performance (χ2 = 26.01 (10), p = .006). As hypothesized, impersonal pronouns were positively correlated with self-oriented IM (r = .07, p < .05), and first-person singular pronouns were positively correlated with other-oriented IM (r = .07, p < .05); however, first-person plural pronouns were negatively correlated (r = -.07, p < .05). Pronouns and verbs were not predictive of job performance. Positive and negative emotion words did not show the hypothesized relationships to OCBs, CWBs, or job performance. Finally, differentiation words (r = .09, p < .01), conjunctions (r = .28, p < .01), words longer than six characters (r = .29, p < .01), prepositions (r = .20, p < .01), cognitive process words (r = .19, p < .01), causal words (r = .20, p < .01), and insight words (r = .06, p < .05) correlated with cognitive ability but did not predict job performance.
An exploratory regression analysis showed that cognitive ability as measured by the Spot-The-Word Test (β = .10, p < .05) and a composite of cognitive ability created from text analytics (β = .15, p < .05) both uniquely and significantly predicted job performance (F(1,805) = 18.79, p < .001), demonstrating that word categories can serve as a proxy for cognitive ability. Overall, the method of text analytics sidesteps some of the limitations of biodata predictor methods, while demonstrating the potential to automate resume reviews and mitigate the unconscious bias inherent in human judgment.
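The predictor construction described above can be sketched roughly as follows: count, per resume, the fraction of words falling into each category, then use those rates as regression predictors. The categories and word lists below are invented for illustration; the study used LIWC's validated dictionaries, not these.

```python
# Hypothetical sketch of LIWC-style category counting on resume text.
# CATEGORIES is a made-up stand-in for LIWC's validated dictionaries.
CATEGORIES = {
    "impersonal_pronouns": {"it", "that", "those", "anything"},
    "certainty": {"always", "never", "definitely"},
}

def category_rates(text):
    """Fraction of words in the text that fall into each category."""
    words = text.lower().split()
    return {cat: sum(w in vocab for w in words) / len(words)
            for cat, vocab in CATEGORIES.items()}

rates = category_rates("It never fails completely")
# Each resume becomes a vector of per-category rates, which then serve
# as predictors of job performance in the regression analyses.
```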

    Mission-Critical Communications from LMR to 5G: a Technology Assessment approach for Smart City scenarios

    Radiocommunication networks are one of the main support tools of the agencies that carry out Public Protection & Disaster Relief (PPDR) actions, and these communication technologies need to be upgraded from narrowband to broadband and integrated with information technologies if agencies are to act effectively on behalf of society. Understanding that this problem encompasses, besides technical aspects, issues related to the social context in which these systems are embedded, this study aims to construct scenarios, using several sources of information, that help managers of PPDR agencies in the technological decision-making process of the digital transformation of mission-critical communication in Smart City settings, guided by the methods and approaches of Technology Assessment (TA).

    Denial of Service in Web-Domains: Building Defenses Against Next-Generation Attack Behavior

    The existing state of the art in application-layer Distributed Denial of Service (DDoS) protection is generally designed, and thus effective, only for static web domains. To the best of our knowledge, our work is the first to study the problem of application-layer DDoS defense in web domains of dynamic content and organization, and for next-generation bot behaviour. In the first part of this thesis, we focus on the following research tasks: 1) identifying the main weaknesses of existing application-layer anti-DDoS solutions proposed in the research literature and in industry, 2) obtaining a comprehensive picture of current-day as well as next-generation application-layer attack behaviour, and 3) proposing novel techniques, based on a multidisciplinary approach that combines offline machine learning algorithms and statistical analysis, for the detection of suspicious web visitors in static web domains. In the second part of the thesis, we propose and evaluate a novel anti-DDoS system that detects a broad range of application-layer DDoS attacks, in both static and dynamic web domains, through advanced data mining techniques. The key advantage of our system over systems that resort to challenge-response tests (such as CAPTCHAs) in combating malicious bots is that ours minimizes the number of such tests presented to valid human visitors while still preventing most malicious attackers from accessing the web site. The results of the experimental evaluation demonstrate effective detection of current and future variants of application-layer DDoS attacks.

    Information-seeking behaviour at Kuwait University

    Information technology is constantly changing, and if academic users are to make the best use of these resources, they must sustain efficient information-seeking behaviour. This study explores the information-seeking behaviour of graduate students at Kuwait University and investigates the factors influencing that behaviour. The population also includes faculty members engaged in teaching and supervising graduate students, as well as academic librarians. Adopting Wilson's (1999) information-seeking model as the theoretical framework, the study identifies factors influencing graduate students' information behaviour and formulates hypotheses that illustrate the relationships between the different variables. The use of this model provides useful insights into the determinants of the information-seeking behaviour patterns of students in a multidisciplinary graduate context. The research uses a mixed-methods approach comprising a questionnaire survey, focus groups, and semi-structured interviews. Application of the Critical Incident Technique provided in-depth data about the patterns of information-seeking behaviour of both graduate students and faculty members. Logistic regression revealed that significant factors related to library awareness, information literacy, organisational and environmental issues, source characteristics, and demographics act as determinants of the patterns of students' information-seeking behaviour. Uneasiness on the part of graduate students towards using the library and consulting its personnel reflects a broader negative perception of the role of the library in shaping students' information-searching patterns. The clearest finding to emerge from the analysis of the students' information literacy dimension was that the majority of graduate students still face difficulties in finding appropriate information resources, particularly when using resources that require advanced search strategies.
Both quantitative and qualitative analyses revealed a heavy reliance on the information resources that require the least effort (search engines, Internet websites, and personal contacts). Further, the results revealed that graduate students are overwhelmed by information overload, which makes them anxious about finding appropriate information resources. Based on the results of the research, recommendations are made to further explore the information-seeking behaviour patterns of graduate students in order to enhance their information literacy skills. Improving information-seeking behaviour and enhancing the information literacy of students require interventions on various fronts: faculty members, academic librarians, the university administration, and graduate students themselves. EThOS - Electronic Theses Online Service, United Kingdom.

    The Contemporary Face of Transnational Criminal Organizations and the Threat they Pose to U.S. National Interest: A Global Perspective

    Traditional organized crime groups have consistently posed challenges for law enforcement; however, contemporary transnational criminal organizations (TCOs) present an even greater security risk and threat. TCOs thrive in countries with a weak rule of law and pose a great threat to regional security in many parts of the world. The bribery and corruption employed by these groups further serve to destabilize already weak governments. TCOs also present a major threat to U.S. and world financial systems by exploiting legitimate commerce and, in some cases, creating parallel markets (“Transnational Organized,” 2011). Finally, one of the most significant threats posed by contemporary TCOs is their alliances and willingness to work with terrorist and extremist organizations. This paper focuses on contemporary TCOs by giving a brief overview of the most common criminal enterprises associated with these groups, the nexus between various TCOs, the nexus between TCOs and terrorist and extremist groups, case studies highlighting that nexus, and the threats these groups pose to U.S. national interests.

    User modeling servers - requirements, design, and evaluation

    Software systems that adapt their services to the characteristics of individual users have already proven to be more effective and/or more user-friendly than static systems in several application domains. To provide such adaptation, user-adaptive systems draw on models of user characteristics, which are constructed and managed by dedicated user-modeling components. An important branch of user-modeling research is concerned with the development of so-called "user modeling shells", i.e., generic user modeling systems that facilitate the development of application-specific user-modeling components. The scope of these generic user modeling systems and of their services and functionalities has so far mostly been determined intuitively and/or derived from descriptions of a few user-adaptive systems in the literature. More recently, the trend towards personalization on the World Wide Web has led to the development of several commercial user modeling servers. The properties regarded as important for these systems stand in stark contrast to those that were central to the development of the user modeling shells, and vice versa. Against this background, the aims of this dissertation are (i) to analyze the requirements for user modeling servers from a multi-disciplinary scientific and a deployment-oriented (commercial) perspective, (ii) to design and implement a server that meets these requirements, and (iii) to verify the performance and scalability of this server against these requirements under the workloads of small and medium-sized deployment environments. To reach this goal, we follow a requirements-centered approach that builds on experience from various research fields.
We develop a generic architecture for a user modeling server consisting of a server core for data management and modularly addable user-modeling components, each of which implements an important user modeling technique. We show that by integrating these components into one server we can achieve synergies between the learning techniques employed and compensate for known deficits of individual methods, for instance with respect to performance, scalability, integration of domain knowledge, data sparsity, and cold starts. Finally, we present the most important results of the experiments we conducted to verify empirically that the user modeling server we developed meets central performance and scalability criteria. We show that our server fully meets these criteria in deployment environments with small and medium workloads. A test in a deployment environment with several million user profiles and a workload that can be regarded as representative of larger web sites confirmed that the performance of our server's user modeling does not impose a significant additional burden on a personalized web site, while the hardware requirements remain moderate.