There are literally hundreds (if not thousands) of assessments available to organizations. With so many options, it is easy to choose an assessment that is not only poor but may actually cause harm in the workplace. To help you choose a good assessment, there are really only three things you need to consider: reliability, validity, and appropriateness.

#1 Any assessment, be it a measure of employee engagement, job performance, or personality type, must be reliable. Briefly, reliability is the consistency of measurement. If you step on a scale and it says you weigh 135, and then you step on it three minutes later and it says you weigh 170, it's probably not a very good scale. The same logic applies to assessments. If you take an assessment on Tuesday and it says you are an introvert, and then you take that same assessment the following Friday and it says you are an extrovert, it probably isn't a good assessment. Although there are several types of reliability, the two most common are test-retest and internal consistency. Test-retest is pretty straightforward. As in the example above, it is the degree of consistency you could expect if you gave an assessment on, say, a Tuesday, and then gave the same test to the same people the following Friday. The more consistent people's scores are across the two administrations, the higher the test-retest reliability.

The second type of reliability is internal consistency, the most common estimate of which is Cronbach's alpha (also called coefficient alpha, or just simply alpha). As you can imagine from the example above, it is a bit of a hassle to get employees to take the same test twice. Internal consistency gets around this: it is a formula that estimates, from a single administration, what the reliability would be if the test were given twice. The logic is to have employees take the test once, divide the items in half, and calculate the consistency between the two halves. Alpha is essentially the average of the reliabilities from all possible split halves.
Although both estimates of reliability are useful, be aware that, all things being equal, test-retest reliability tends to run a bit lower than internal consistency estimates. As a rule of thumb, a good level of reliability is at least .70; anything lower and the assessment may not be reliable enough to be useful.
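A minimal sketch of computing alpha from a single administration, using the standard item-variance formula and hypothetical 1-to-5 item ratings:

```python
# Sketch: Cronbach's alpha from one administration.
# Standard formula: alpha = k/(k-1) * (1 - sum(item variances) / variance of totals)
# Rows are respondents, columns are items; all ratings below are hypothetical.
from statistics import pvariance

responses = [
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 4, 5],
    [2, 3, 3, 2],
    [4, 4, 5, 4],
    [3, 2, 3, 3],
]

k = len(responses[0])                          # number of items
items = list(zip(*responses))                  # column-wise item scores
sum_item_vars = sum(pvariance(col) for col in items)
total_var = pvariance([sum(row) for row in responses])

alpha = (k / (k - 1)) * (1 - sum_item_vars / total_var)
print(f"Cronbach's alpha: {alpha:.2f}")
```

When the items hang together (people who rate one item high tend to rate the others high), the variance of the total scores dwarfs the summed item variances and alpha climbs toward 1.0; for these made-up ratings it lands comfortably above the .70 rule of thumb.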

#2 Another thing to consider when assessing an assessment is its validity. Put simply, validity is how well the assessment measures what it claims to measure. For instance, if an assessment claims to measure job satisfaction, how do you know that it really does? Things like job satisfaction, employee engagement, personality, and many other job-related concepts are abstract, meaning they are not tangible: you cannot touch them, feel them, or see them. To demonstrate validity, scale developers typically provide "evidence of validity" by showing that their measure is related to other measures of the same concept, and by showing that it is related to other concepts that should be related to it. For job satisfaction, that might be lower absenteeism, lower turnover, and higher job performance.

#3 Finally, the third thing you should look for when assessing assessments is the purpose, or the appropriateness, of the assessment for its intended use. Setting aside for a moment whether it is reliable or valid (there are serious questions about both), the Myers-Briggs Type Indicator (MBTI) is a personality assessment that is often used to construct teams and/or to determine whether certain employees should work together. Yet there is no evidence that the instrument was developed for those purposes, nor is there any evidence that employees with similar profiles work better together than those with different ones. In fact, one could argue that a team composed of diverse profiles may actually be more creative and productive than a team of matching profiles. The bottom line in all of this: even if the assessment you are using is reliable and valid, make sure that its business application is appropriate for your situation.

When determining the viability of an assessment, you should be able to find information on all three things (reliability, validity, and appropriateness) in a Test Manual. A Test Manual includes the background and development of an assessment as well as evidence of its reliability and validity. It may also describe the intended use (or uses) of the assessment, but if it does not, then you must decide whether it is appropriate for your intended use. To help you make that decision, pay particular attention to the background and development of the assessment. If the assessment was developed to assess mental instability, and you want to use it for creating work teams, it is probably not appropriate. And if the assessment does not have a Test Manual readily available for your review, then I would strongly caution you not to use it.