Item-level information, such as difficulty and discrimination, is invaluable for test assembly, equating, and scoring. Estimating these parameters within the context of large-scale performance assessments is often hindered by the use of unbalanced designs for assigning examinees to tasks and raters, because such designs result in very sparse data matrices. This article addresses some of these issues using a multistage confirmatory factor analytic approach. The approach is illustrated using data from a performance test in medicine in which examinees encounter multiple patients with medical problems (tasks), with each problem portrayed by a different trained patient (rater). A series of models was fit to the rating data (1) to obtain alternative task difficulty and discrimination parameters and (2) to evaluate the improvement in model fit obtained by accounting for rater and test site effects. The results suggest that the availability of alternative task parameter estimates can be useful in practice for making decisions related to task banking, rater training, and test assembly.