
You Can’t Judge a Measure by Its Label: Teaching the Process of Instrumentation


Continued article from The Behavioral Measurement Letter, Vol. 9, No. 1: Spring 2006


Jennifer Howard Brockway

United States Air Force Academy


Fred Bryant

Loyola University Chicago


In this article, we describe an exercise in instrument selection applicable to both undergraduate and graduate courses. The exercise consists of 5 progressive steps that involve choosing and defining a theoretical construct (Steps 1 and 2), using computer-based technology to obtain 2 distinct instruments that measure this construct (Steps 3 and 4), and comparing and contrasting these 2 instruments along multiple dimensions (Step 5). This activity generates awareness of the issues involved in measuring latent constructs and teaches at least 3 lessons: (a) The 1st step to accurate instrumentation is precisely conceptualizing the construct of interest; (b) there is more than one way to measure latent constructs, and these multiple approaches should be compared; and (c) it is crucial to measure a construct in a way that best matches its underlying definition.

A difficult concept for many students to grasp is the notion that there is no one universally appropriate measure for any given psychological construct. Typically, several good (reliable and valid) instruments already exist for most constructs with which psychology students are familiar. Thus, an important skill to teach students is the ability to locate these measures and to choose the most appropriate one (Brockway & Bryant, 1997). According to Brewer et al. (1993), it is better to teach such abstract methodological issues through hands-on exercises than through traditional lectures. Although instructors have suggested several excellent classroom exercises for creating new psychological measures (e.g., Benjamin, 1983; Davidson, 1987), they have not offered exercises that teach students how to locate, evaluate, compare, and select the most appropriate measure from a database of existing measures. The following exercise familiarizes students with the variety of measures that exist for any one construct and enables them to compare these different instruments to make better informed choices. The exercise includes five progressive steps: Steps 1 and 2 are part of a single assignment outside class, Steps 3 and 4 take place together during an out-of-class library session, and Step 5 takes the form of an in-class discussion or presentation. This exercise is appropriate for use in a variety of undergraduate psychology classes, including research methods, tests and measurements, and various psychology laboratories in which students collect and analyze data. Graduate students may also benefit from this exercise through a more rigorous analysis, critique, and comparison of existing measures. At Loyola University Chicago we have incorporated this activity into an undergraduate social psychology laboratory course and a graduate research methods course. The following example comes from the undergraduate course.


Process of Instrumentation Procedure

This activity is best preceded by a review of psychological constructs and the inherent difficulty of measuring phenomena that are intangible, dynamic, and subject to multiple definitions. For this exercise to be effective, students must understand the difference between conceptual and operational definitions.


Step 1: Choose a Psychological Construct

After discussing the difficulties surrounding psychological measurement, students choose a psychological construct of interest. (Multiple students can select the same construct.) A list of constructs previously covered in the class may help to get students thinking along the right lines. Ideally, students should select a construct for which a number of available measures exist. Possibilities include altruism, aggression, perceived control, anxiety, and compliance. We use the construct guilt as an example for purposes of presentation.


Step 2: Create a Working Definition of the Construct

Next, the instructor asks students to conceptually define their construct thoroughly and carefully. Students generate their own conceptual definition of the construct, based on personal experience and intuition. Thus, one may define guilt as “the negative emotion experienced after knowingly misbehaving” or “feelings of distress when one’s social position is better than that of others.” The purpose of this step is to emphasize the development of a clear, precise conceptual definition before selecting an operational definition (i.e., instrument). Thus, if guilt is one’s construct, it is important to distinguish it from shame, embarrassment, and regret, for example.


Step 3: Use an Instrument File to Generate a List of Existing Measures

After ensuring that students have precisely defined their construct, instructors ask students to use their library’s measurement database to generate a list of existing measures of their construct. A powerful new measurement database called the Health and Psychosocial Instruments (HaPI) File (1995) is particularly useful in this regard. HaPI provides information about thousands of behavioral and social measures through abstracted descriptions summarizing instrument characteristics (e.g., intended audience, validity and reliability information, means of obtaining copies). HaPI is available in hundreds of college libraries both in the United States and internationally.

The HaPI File has distinct advantages over traditional catalogs of instruments. First, HaPI catalogs more measures (more than 40,000) than other sources do. HaPI also provides a more efficient means of managing the volume of measurement information that exists. We believe the HaPI File is a more thorough and cost-effective measurement tool than other measurement volumes.

To use HaPI, students simply type in the name of their construct and generate a list of corresponding references. For instance, for the guilt construct, HaPI generated a list of 73 instruments that the originators of these measures described as assessing some form of guilt.


Step 4: Choose Two Measures With Distinct Conceptual Definitions

From the list of measures, students choose two references for distinct instruments and obtain these articles from their library. One article should define the construct in a way that resembles the students’ conceptual definition. The other article should define the construct in a way that is different from the students’ definition.

From the list of guilt measures, for example, we chose two articles with different theoretical orientations (Kugler & Jones, 1992; Montada & Schneider, 1989). Whereas the first article approached guilt from a macrolevel (sociological, cultural) perspective, the second article assessed guilt from a more microlevel (psychological) perspective. We chose these measures not because they are the best instruments, but because of the contrast between their conceptual and operational definitions. The greater the distinction between the two chosen measures, the easier it will be to complete the final step of the exercise. Although our students have experienced little difficulty in selecting two different instruments to compare and contrast, instructors may want to be available if students need assistance with this critical step of the exercise.


Step 5: Compare and Contrast Alternative Instruments

After ensuring that students have found distinct measures, ask students to make an in-class presentation comparing and contrasting the measures on various dimensions generated either by the instructor or by the students and the instructor. Besides each measure’s overall strengths and weaknesses, other dimensions could include:

  1. Theoretical orientation (e.g., social justice vs. psychological conceptualizations of guilt);
  2. Duration and frequency of construct manifestation (e.g., state vs. trait guilt);
  3. General format of the instrument (e.g., reactive vs. unobtrusive measures of guilt, vignettes vs. self-report questions, closed- vs. open-ended items);
  4. Intended audience (e.g., children, English-speaking adults);
  5. Number of items and scaling issues (e.g., single item vs. composite index, Likert vs. semantic differential response format).

For example, Montada and Schneider (1989) defined existential guilt as a prosocial emotion felt when one perceives oneself as better off than others suffering hardships. To measure guilt, Montada and Schneider embedded three guilt items within a larger questionnaire designed to tap other “prosocial emotions” such as sympathy and moral outrage. The general form of the measure is a written scenario describing the misery of a group of disadvantaged people (e.g., the unemployed). Respondents use a 6-point scale to rate the degree to which three statements reflecting guilt express their thoughts and feelings. The instrument’s intended audience appears to be young adults and older, because it assumes respondents possess some moral awareness of the status of disadvantaged populations. Although the instrument appears in English, Montada and Schneider developed it with a German sample, so it is unclear whether it applies to other populations.

Taking a very different approach, Kugler and Jones’s (1992) measure is a guilt inventory. Ninety-eight items, presented on 5-point scales, tap three specific content domains (trait guilt, state guilt, and moral standards). Both college students and adult nonstudents have completed this inventory.


Evidence of Pedagogical Effectiveness

To assess the effectiveness of this exercise, 10 undergraduates enrolled in a social psychology laboratory course answered open-ended questions addressing several goals of the exercise both before and after completing it. Before the exercise, only 1 student (10%) knew that the first step in measuring a construct is to carefully create a conceptual definition, whereas all 10 students (100%) gave this correct answer after the exercise (Fisher’s exact p = .00006). Likewise, before the exercise only 1 student (10%) stated that, when confronted with three equally valid and reliable measures, one should choose the measure that most closely matches the conceptual definition, compared with 7 of 10 students (70%) at posttest (Fisher’s exact p = .0099).

As a control condition, a comparable group of 6 laboratory students completed pretest and posttest measures but did not participate in the measurement exercise. Results revealed no significant changes in knowledge from pretest to posttest regarding the crucial first step in selecting measures (Fisher’s exact p = .50) or how to choose from among three psychometrically equivalent measures (Fisher’s exact p = .23). Although the size and representativeness of these samples are far from ideal, these data nevertheless support the effectiveness of the exercise.
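The reported one-tailed Fisher’s exact probabilities can be reproduced directly from the cell counts given above. The following sketch computes them from the hypergeometric distribution using only the Python standard library (the function name is ours, not from the article):

```python
from math import comb

def fisher_exact_one_tailed(a, b, c, d):
    """One-tailed Fisher's exact p for the 2x2 table [[a, b], [c, d]]:
    the probability, with row and column totals fixed, of observing a
    count in cell (1,1) at least as large as a."""
    row1, row2 = a + b, c + d   # row totals
    col1 = a + c                # first column total
    n = row1 + row2             # grand total
    # Sum hypergeometric probabilities for tables as or more extreme.
    return sum(
        comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)
        for k in range(a, min(row1, col1) + 1)
    )

# "Conceptual definition first": 1 of 10 correct before, 10 of 10 after.
p1 = fisher_exact_one_tailed(10, 0, 1, 9)   # ≈ .00006
# "Match the conceptual definition": 1 of 10 before, 7 of 10 after.
p2 = fisher_exact_one_tailed(7, 3, 1, 9)    # ≈ .0099
```

Both values agree with the probabilities reported in the text, which confirms that the reported tests were one-tailed.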


Additional Suggestions

Instructors can simplify or expand this exercise to fit a particular time slot, lesson plan, or student population. For example, instructors can eliminate the library portion of the exercise and simply supply students with the results of the HaPI database search for existing measures for a particular construct and with a copy of two preselected articles and instruments. Instructors may then ask students to compare and contrast the two measures with respect to the dimensions (or a subset of those dimensions) discussed previously.

Conversely, instructors can expand this activity to include issues more appropriate for advanced psychology students. For example, students can compare and contrast instruments with respect to the various validities (i.e., face, construct, criterion, content) and reliabilities (i.e., test-retest, parallel forms, interrater, internal consistency). Also, advanced students can locate instruments tapping similar (yet conceptually separate) constructs and highlight the subtle distinctions between the constructs (e.g., guilt vs. shame vs. embarrassment). This approach offers instructors a concrete means of teaching students about the multitrait-multimethod matrix (Campbell & Fiske, 1959) and how to implement it. Finally, and perhaps the most challenging task of all, instructors can ask students to uncover the “missing measure” after reviewing all existing instruments. By highlighting both the gaps and overlaps in current measurement options, future researchers can begin to distinguish between areas that need new instruments and areas in which new measures are unnecessary.



This measurement exercise teaches students several important lessons. First, it stresses the importance of having a clear conceptual definition before choosing an instrument to measure a psychological construct. Second, it teaches students that there almost always is more than one way to measure a construct and that they should compare these multiple approaches. Third, it teaches students the importance of measuring a construct in a way that best matches one’s underlying definition. Finally, it introduces students to new information technologies that they can use to locate, evaluate, compare, and select the best measures. Indeed, maximizing the match between conceptual and operational definitions is the essence of construct validity (Cook & Campbell, 1979). Thus, it is important to teach students that you can’t judge a book (i.e., a measure) by its cover (i.e., its label).



References

Benjamin, L. T., Jr. (1983). A class exercise in personality and psychological assessment. Teaching of Psychology, 10, 94-95.

Brewer, C. L., Hopkins, J. R., Kimble, G. A., Matlin, M. W., McCann, L. I., McNeil, O. V., Nodine, B. F., Quinn, V. N., & Saundra. (1993). Curriculum. In T. V. McGovern (Ed.), Handbook for enhancing undergraduate education in psychology (pp. 161-182). Washington, DC: American Psychological Association.

Brockway, J. H., & Bryant, F. B. (1997). Spotlighting the second fiddle: The importance of teaching measurement. Manuscript submitted for publication.

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.

Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design & analysis issues for field settings. Chicago: Rand McNally.

Davidson, W. B. (1987). Undergraduate lab project in personality assessment: Measurement of anal character. Teaching of Psychology, 14, 101-103.

Health and Psychosocial Instruments (HaPI) File (Version 4.1) [CD-ROM]. (1995). Pittsburgh, PA: Behavioral Measurement Database Services [Producer and Distributor].

Kugler, K., & Jones, W. H. (1992). On conceptualizing and assessing guilt. Journal of Personality and Social Psychology, 62, 318-327.

Montada, L., & Schneider, A. (1989). Justice and emotional reactions to the disadvantaged. Social Justice Research, 3, 313-344.



  1. An earlier version of this article was presented at the American Psychological Society Institute on the Teaching of Psychology, San Francisco, June 1996.
  2. Correspondence concerning this article should be sent to Jennifer Howard Brockway, HQ USAFA DFBL, 2354 Fairchild Drive, Suite 6L47, USAFA, CO 80840-6228; e-mail: dibl@usafa.af.mil.

