4 research outputs found

    The hiring problem and its algorithmic applications

    The hiring problem is a simple model for on-line decision-making under uncertainty, recently introduced in the literature. Although some related work dates back to 2000, the name and the first extensive studies appeared in 2007 and 2008. The problem was first introduced explicitly by Broder et al. in 2008 as a natural extension of the well-known secretary problem. Soon afterwards, in 2009, Archibald and Martínez introduced a discrete (combinatorial) model of the hiring problem, in which the candidates seen so far can be ranked from best to worst without knowing their absolute quality scores. This thesis presents an extensive study of the hiring problem under the formulation given by Archibald and Martínez, explores the connections with other on-line selection processes in the literature, and develops one interesting application of our results to the field of data streaming algorithms. In the hiring problem we are interested in the design and analysis of hiring strategies. We study in detail two hiring strategies, namely hiring above the median and hiring above the m-th best. Hiring above the median hires the first interviewed candidate; thereafter, an incoming candidate is hired if and only if his relative rank is better than the median rank of the already hired staff, and is discarded otherwise. Hiring above the m-th best hires the first m candidates in the sequence; thereafter, an incoming candidate is hired if and only if his relative rank is larger than that of the m-th best among all hired candidates, and is discarded otherwise. For both strategies, we were able to obtain exact and asymptotic distributional results for various quantities of interest (which we call hiring parameters).
Our fundamental parameter is the number of hired candidates; together with other parameters such as the waiting time, the index of the last hired candidate, and the distance between the last two hirings, it gives us a clear picture of the hiring rate or the dynamics of the hiring process for the particular strategy under study. Another group of parameters, such as the score of the last hired candidate, the score of the best discarded candidate, and the number of replacements, gives us an indicator of the quality of the hired staff. For the strategy hiring above the median, we study further quantities, such as the number of hired candidates conditioned on the first one and the probability that the candidate with score q is hired. We study the 1/2-percentile selection rule introduced by Krieger et al. in 2007, and the seating plan (1/2,1) of the Chinese restaurant process (CRP) introduced by Pitman, both of which are very similar to hiring above the median. The connections between hiring above the m-th best and the notion of m-records, and also the seating plan (0,m) of the CRP, are investigated here. We report preliminary results for the number of hired candidates for a generalization of hiring above the median, called hiring above the alpha-quantile (of the hired staff). The explicit results for the number of hired candidates enable us to design an estimator, called RECORDINALITY, for the number of distinct elements in a large sequence of data which may contain repetitions; this problem is known in the literature as the cardinality estimation problem. We show that another hiring parameter, the score of the best discarded candidate, can also be used to design a new cardinality estimator, which we call DISCARDINALITY. Most of the results presented here have been published or submitted for publication. The thesis leaves some open questions, as well as many promising ideas for future work.
One interesting question is how to compare two different strategies; this requires a suitable definition of the notion of optimality, which is still missing in the context of the hiring problem. We are also interested in investigating other variants of the problem, such as probabilistic hiring strategies, that is, strategies whose hiring criterion is not deterministic, unlike all the strategies studied here.
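
The two strategies described in this abstract can be illustrated with a short simulation. The following is only a minimal sketch and not the thesis's own code: the function names are hypothetical, candidates are represented by numeric scores where a higher score is better, and, since the abstract does not say which median is used for an even number of hired candidates, the sketch assumes the lower median.

```python
import bisect
import random


def hire_above_mth_best(scores, m):
    """Indices of candidates hired by 'hiring above the m-th best'."""
    hired = []      # scores of the hired staff, kept in ascending order
    hired_idx = []
    for i, s in enumerate(scores):
        # The first m candidates are always hired; afterwards the
        # threshold is the m-th best score among the hired staff.
        if len(hired) < m or s > hired[-m]:
            bisect.insort(hired, s)
            hired_idx.append(i)
    return hired_idx


def hire_above_median(scores):
    """Indices of candidates hired by 'hiring above the median'.

    Assumption: for an even number of hired candidates the lower of
    the two middle scores is taken as the median.
    """
    hired = []
    hired_idx = []
    for i, s in enumerate(scores):
        if not hired or s > hired[(len(hired) - 1) // 2]:
            bisect.insort(hired, s)
            hired_idx.append(i)
    return hired_idx


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so runs are repeatable
    n, m, trials = 1000, 3, 200
    total_mth = total_med = 0
    for _ in range(trials):
        perm = list(range(1, n + 1))  # random permutation model
        rng.shuffle(perm)
        total_mth += len(hire_above_mth_best(perm, m))
        total_med += len(hire_above_median(perm))
    # In a random permutation the i-th candidate (i > m) is among the
    # m best seen so far with probability m/i, so the expected number
    # hired above the m-th best is m + sum_{i=m+1}^{n} m/i.
    expected_mth = m + sum(m / i for i in range(m + 1, n + 1))
    print("m-th best, simulated mean:", total_mth / trials)
    print("m-th best, expected value:", expected_mth)
    print("median, simulated mean:   ", total_med / trials)
```

With m = 1, `hire_above_mth_best` hires exactly the records (left-to-right maxima) of the sequence, which is the link to records that the abstract mentions.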

    Hiring above the m-th best candidate: a generalization of records in permutations

    The hiring problem is a simple model of on-line decision-making under uncertainty. As in many other such models, the input is a sequence of instances and a decision must be taken for each instance depending on the subsequence examined so far, while nothing is known about the future. One famous example of on-line decision-making is the secretary problem, formally introduced in the early sixties. Broder et al. (2008) introduced the hiring problem as an extension of the secretary problem. Instead of selecting only one candidate, we aim to select (hire) many candidates in order to grow a small company. In this context, a hiring strategy should meet two demands: to hire candidates at some reasonable rate and to improve the average quality of the hired staff. Soon afterwards, Archibald and Martínez (2009) introduced a discrete model of the hiring problem where candidates seen so far could be ranked from best to worst without the need to know their absolute quality scores. Hence the sequence of candidates could be modeled as a random permutation. Two general families of hiring strategies were introduced: hiring above the m-th best candidate and hiring in the top P% quantile (for instance, P = 50 is hiring above the median). In this paper we consider only hiring above the m-th best candidate. We introduce new hiring parameters that describe the dynamics of the hiring process, like the distance between the last two hirings, and the quality of the hired staff, like the score of the best discarded candidate. While Archibald and Martínez made systematic use of analytic combinatorics techniques (Flajolet, Sedgewick, 2008) in their analysis, we use here a different approach to study the various hiring parameters associated with the hiring process. We are able to obtain explicit formulas for the probability distribution or the probability generating function of the random variables of interest in a rather direct way.
The explicit nature of our results also allows a very detailed study of their asymptotic behaviour. Adding our new results to those of Archibald and Martínez leads to a very precise quantitative characterization of the hiring above the m-th best candidate strategy. This might prove very useful in applications of the hiring process, e.g., in data stream algorithms. Postprint (published version)
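
As an illustration of such a data-stream application, here is a hedged sketch of a cardinality estimator in the spirit of RECORDINALITY, named in the thesis abstract listed above. It counts how many distinct hashed elements would be hired by hiring above the k-th best, where k plays the role of m. The estimator formula k(1 + 1/k)^(R - k + 1) - 1, with R the number of hired elements, is not stated in this listing; it is used here as an assumption, justified only by the observation that under the random permutation model the i-th distinct element (i > k) is hired with probability k/i, which makes the formula unbiased. The function names and the choice of hash function are hypothetical.

```python
import hashlib
import heapq


def _hash64(item):
    """Deterministic 64-bit hash of an item (illustrative choice)."""
    digest = hashlib.blake2b(str(item).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def recordinality_sketch(stream, k):
    """Estimate the number of distinct items in `stream`.

    Counts the elements whose hash value enters the set of the k
    largest distinct hash values seen so far, i.e. the elements
    'hired above the k-th best', and turns that count into an estimate.
    """
    top = []          # min-heap holding the k largest hash values
    members = set()   # the same values, for O(1) duplicate checks
    hired = 0
    for item in stream:
        h = _hash64(item)
        if h in members:
            continue                      # repetition of a kept element
        if len(top) < k:
            heapq.heappush(top, h)
            members.add(h)
            hired += 1
        elif h > top[0]:
            evicted = heapq.heapreplace(top, h)
            members.discard(evicted)
            members.add(h)
            hired += 1
        # Otherwise h is below the threshold. The threshold never
        # decreases, so repeated or evicted items are never recounted.
    if hired < k:
        return float(hired)               # fewer than k distinct items
    return k * (1 + 1 / k) ** (hired - k + 1) - 1


if __name__ == "__main__":
    data = [f"item{i % 5000}" for i in range(20000)]  # 5000 distinct items
    print(recordinality_sketch(data, k=64))
```

A single run can be far from the true count, since the estimate depends on only one random quantity; the sketch is meant to show the mechanism, not to suggest a tuned implementation.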

    Affirmative sampling: theory and applications

    Affirmative Sampling is a practical, efficient, and novel algorithm to obtain random samples of distinct elements from a data stream. Its most salient feature is that the size S of the sample will, in expectation, grow with the (unknown) number n of distinct elements in the data stream. As every distinct element has the same probability of being sampled, and the sample size is greater when the “diversity” (the number of distinct elements) is greater, the samples that Affirmative Sampling delivers are more representative than those produced by any scheme where the sample size is fixed a priori, hence its name. Our algorithm is straightforward to implement, and several implementations already exist. This work has been supported by funds from the MOTION Project (Project PID2020-112581GB-C21) of the Spanish Ministry of Science & Innovation MCIN/AEI/10.13039/501100011033, and by Princeton University and its Department of Computer Science. Peer Reviewed. Postprint (published version)