Although Classical Test Theory has been used by the measurement community for almost a century, Item Response Theory (IRT) has become commonplace in recent decades for developing, evaluating, and refining educational assessments. IRT has substantial potential for improving test items and for eliminating ambiguous or misleading ones. However, estimating its parameters reliably requires a large sample of examinees, which has limited its use to large-scale testing programs. Nevertheless, the accuracy of parameter estimates matters less when the goal is only to detect items whose parameters exceed a threshold value. With this in mind, the present study uses a series of simulations to investigate the application of IRT-based assessment evaluation to small sample sizes. Additionally, it introduces a set of quality indices that report the success rate of identifying potentially flawed items in a form that test developers without a significant statistical background can easily comprehend and use.