Fred B. Bryant, Ph.D., Loyola University Chicago
Measurement involves the use of predetermined rules to assign numbers to categorize or quantify objects, events, or characteristics of people. In introductory statistics courses, students often learn that there are four basic types or levels of measurement in the health and psychosocial sciences. However, rarely do students understand why this knowledge matters—in other words, what difference does the type of measurement scale make, and why is it important to know the type of measurement scale you’re working with?
Below I describe the most commonly used framework for distinguishing among measurement scales, and I explain that the type of measurement scale matters because it determines the specific descriptive statistics and inferential statistical tests that are appropriate to use. I provide concrete examples of different types of measurement scales, describe the most commonly used descriptive statistics, and summarize guidelines for selecting the appropriate descriptive statistics to use with each type of scale.
In a forthcoming blog post, I will summarize guidelines for selecting the appropriate inferential statistics to use with each type of scale.
Four Main Types of Measurement Scales. Although researchers have developed a variety of conceptual frameworks over the years to categorize different types of measurement scales, arguably the most influential and well-known typology is that of Stanley S. Stevens, who published his ground-breaking classification scheme in a 1946 Science magazine article entitled, “On the Theory of Scales of Measurement.” In this article, Stevens argued that all scientific measurements reflect one of four different types of scales that he termed nominal, ordinal, interval, and ratio.
According to Stevens (1946), these four types of scales are distinguishable in terms of whether or not they possess each of three different measurement properties: (a) magnitude (that is, whether or not the numbers assigned to observations reflect varying amounts of the underlying variable being measured); (b) equal intervals (that is, whether or not the numerical differences between any two consecutive numbers on the measurement scale reflect equal differences in the amounts of the underlying variable being measured); and (c) an absolute-zero point (that is, whether or not there is an actual value on the measurement scale that truly reflects the complete absence of the underlying variable being measured).
1. Nominal (also known as “categorical”) measurement scales reflect the crudest form of qualitative measurement and involve using numbers simply to label, classify, or categorize observations of differing types. (The word “nominal” comes from the Latin noun nomen, which means “name.”) With a nominal scale, you cannot interpret the numbers assigned to observations as anything more than the names or labels for the things you are categorizing. Thus, nominal measurement scales have none of the three properties: they lack magnitude, equal intervals, and an absolute-zero point.
As an illustration of a nominal measure of musical preference, for example, we might ask people to indicate their favorite style of music, and then keep track of their responses by coding them as follows: 1 = pop, 2 = rock, 3 = hip-hop, 4 = jazz, 5 = classical, 6 = country, 7 = heavy metal, 8 = gospel, 9 = R & B, 10 = punk, or 11 = other. Notice here that the numbers assigned to the different types of music lack magnitude—that is, they do not represent higher or lower amounts of an underlying characteristic, but simply reflect differences in type (or quality) rather than differences in degree (or quantity). Also, the numbers do not have equal intervals—for example, the difference between “rock” (2) and “pop” (1) [or, 2 – 1 = 1] is not in any meaningful way equivalent to the difference between, say, “country” (6) and “classical” (5) [or, 6 – 5 = 1]. Nor does this way of measuring musical preference provide an observable value that reflects the complete absence of a preference for any style of music.
2. Ordinal (also known as “ordered categorical”) measurement scales also use numbers to classify observations into categories, but the order of these numbers is meaningful in the sense that we can use them to rank observations in terms of the amount of the underlying characteristic. However, even though the numbers used in ordinal scales reflect relatively more or less of the underlying variable, they do not indicate how much more or how much less. Nor do ordinal scales include a value on the scale that reflects the total absence of the variable being measured. Thus, ordinal measurement scales possess magnitude, but they lack equal intervals and an absolute zero-point.
As an illustration of an ordinal measure of musical preference, for example, we might ask people to rank order a list of 10 different styles of music (pop, rock, hip-hop, jazz, classical, country, heavy metal, gospel, R & B, and punk), from their most favorite (1) to least favorite (10) musical style. Notice here that the numbers assigned to the different types of music reflect strength or magnitude of preference for each musical style—that is, lower numbers represent greater preference. However, the numbers do not have equal intervals—for example, the difference between a person’s preferences for their #1 and #2 ranked styles of music may well be much greater than the difference between their preference for their #9 and #10 ranked styles of music. Nor does this ordinal way of measuring musical preference provide an observable ranked value that reflects the complete absence of a preference for any style of music whatsoever.
3. Interval measurement scales use numbers that reflect the magnitude of the underlying variable the scale assesses. In addition, the numerical differences between any two consecutive numbers on the measurement scale reflect equal differences in the amounts of the underlying variable. However, interval scale measures do not provide an observable value that reflects the complete absence of the measured characteristic.
As an illustration of an interval measure of musical preference, for example, we might ask people to rate each of the 10 different styles of music, using a 7-point response scale where 1 is labeled “strongly prefer” and 7 is labeled “strongly dislike.” As with an ordinal scale, the numbers assigned to the different types of music reflect relative strength or magnitude of preference for each musical style. However, unlike an ordinal scale, the numbers on an interval scale imply equal differences in strength of preference between consecutive numbers across the full range of the 7-point interval. In other words, the numbers are presumed to have equal intervals—for example, the difference in amount of preference between ratings of 1 versus 2 is considered equivalent to the difference in amount of preference between ratings of 2 versus 3. However, this interval-scale way of measuring musical preference does not provide a rating that reflects the total absence of a preference for any musical style.
4. Ratio scales have all the measurement properties of an interval scale, as well as a specific value that indicates the complete absence of the characteristic that the scale assesses. Because it provides an absolute-zero point, a ratio scale produces numbers that you can meaningfully compare in terms of the absolute amount of the thing you are measuring. For instance, with a ratio scale, a number that is mathematically twice as large as another number reflects twice as much of whatever the scale measures. Thus, a ratio scale allows you to determine how many times greater one score is than another, whereas an interval scale only allows you to determine how far apart two scores are from each other.
As an illustration of a ratio measure of musical preference, for example, we might ask people to estimate the actual number of times they have listened to each of the 10 different styles of music during the past month. Notice that this way of measuring musical preference provides numbers that have not only magnitude and equal intervals, but also an absolute-zero point.
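The contrast between "how far apart" and "how many times greater" can be sketched with temperature. This is a minimal illustration, not part of the original framework: the Celsius values are hypothetical, and the Kelvin scale (a ratio scale with a true zero, obtained by adding 273.15 to a Celsius reading) is brought in here only for comparison.

```python
# Hypothetical temperatures on the Celsius scale (interval: 0 is arbitrary).
c_low, c_high = 10.0, 20.0


def celsius_to_kelvin(celsius):
    """Convert Celsius to Kelvin, a ratio scale whose zero means no thermal energy."""
    return celsius + 273.15


k_low, k_high = celsius_to_kelvin(c_low), celsius_to_kelvin(c_high)

# Differences are meaningful on both scales and agree with each other.
diff_c = c_high - c_low
diff_k = k_high - k_low

# Ratios are only meaningful on the ratio scale: 20 C is not "twice as hot" as 10 C.
ratio_c = c_high / c_low   # numerically 2.0, but not interpretable
ratio_k = k_high / k_low   # about 1.035, the physically meaningful ratio

print(f"difference: {diff_c:.2f} C vs {diff_k:.2f} K")
print(f"ratio: {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```

Shifting the zero point leaves differences untouched but changes every ratio, which is why ratio comparisons require an absolute-zero point.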
Table 1 summarizes the four types of measurement scales and the specific properties they possess.
Table 1. Four Types of Measurement Scales
Measurement Properties | Nominal | Ordinal | Interval | Ratio |
---|---|---|---|---|
Magnitude | NO | YES | YES | YES |
Equal Intervals | NO | NO | YES | YES |
Absolute-zero point | NO | NO | NO | YES |
Examples | a. Gender; b. Ethnicity; c. Country in which you were born; d. Religious preference; e. Political affiliation; f. Eye color; g. Social security number; h. Zip code; i. Numbers on football players’ jerseys; j. List of responses to the question, “What is your favorite food?”; k. List of college majors; l. Type of car one owns | a. Class rank in high school; b. Military rank; c. Ranking of world’s top ten tennis players; d. Measure of how often you get the hiccups using the following response scale: 1 = rarely, 2 = seldom, 3 = occasionally, 4 = sometimes, 5 = often; e. Measure of annual income using the following scale: 1 = less than $25,000, 2 = $25,000 - $100,000, 3 = more than $100,000 | a. Time measured by an analog or digital clock; b. Page numbers in a book; c. Shoe size; d. Temperature measured using the Celsius or Fahrenheit scale; e. How often you get the hiccups, using a 1-to-7 scale (where 1 is labeled “rarely,” 7 is labeled “often,” and numbers 2-6 are not labeled); f. Floor numbers in an office building (but for high-rise buildings that skip the 13th floor, this scale would be ordinal); g. Score on the SAT or ACT | a. Elapsed time measured by a stopwatch; b. A weighing scale; c. Height measured using a yardstick; d. Amount of liquid in a drinking glass measured in ounces; e. The balance in your checking account; f. Number of hats that you own; g. Resting pulse-rate in beats per minute; h. Number of points a basketball team scored in their last game; i. Number of pets you have owned in your life |
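The property grid in Table 1 can also be written as a small lookup structure. The following is a minimal sketch; the names `ScaleProperties`, `SCALE_PROPERTIES`, and `classify_scale` are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class ScaleProperties(NamedTuple):
    """The three measurement properties that distinguish the four scale types."""
    magnitude: bool
    equal_intervals: bool
    absolute_zero: bool


# Each entry mirrors one column of Table 1.
SCALE_PROPERTIES = {
    "nominal": ScaleProperties(False, False, False),
    "ordinal": ScaleProperties(True, False, False),
    "interval": ScaleProperties(True, True, False),
    "ratio": ScaleProperties(True, True, True),
}


def classify_scale(magnitude, equal_intervals, absolute_zero):
    """Return the scale type whose properties match the given combination."""
    wanted = ScaleProperties(magnitude, equal_intervals, absolute_zero)
    for name, properties in SCALE_PROPERTIES.items():
        if properties == wanted:
            return name
    # Combinations such as "equal intervals without magnitude" do not
    # correspond to any of the four scale types.
    raise ValueError(f"No scale type has the properties {wanted}")


print(classify_scale(magnitude=True, equal_intervals=False, absolute_zero=False))
```

The properties are cumulative: each scale type adds one property to the type before it, so only four of the eight possible combinations are valid.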
Two Commonly Used Types of Descriptive Statistics. Having used some type of measurement scale to collect data, researchers can use statistics to describe and summarize a variety of different characteristics of their data. Perhaps the most frequently reported characteristics of data in the health and psychosocial sciences are central tendency and variability.
Central Tendency refers to the most common or typical score that lies in the middle of a distribution of scores. The three most often used measures of central tendency are the mean, median, and mode.
1. The mean (aka the “arithmetic average”), as many people know, is computed by adding together all of the observed scores in a distribution and then dividing this sum by the total number of scores in the distribution. However, this computational formula is not the conceptual definition of the mean. The definitional formula for the mean is that the mean is the one value for which Σ(x – mean) = 0. That is, if you subtract the mean from each score in the distribution and then add together all of these deviation-scores, the sum will be zero. In other words, the mean of a distribution is the one value for which (a) the sum of its distances from each of the scores above it, minus (b) the sum of its distances from each of the scores below it, exactly equals zero. Thus, the mean is a “balancing point” for the scores in a distribution, such that it precisely balances the sum of its distances from each of the scores above it and the sum of its distances from each of the scores below it.
Given this conceptual definition, notice that the measurement scale used to assess an underlying variable must have both magnitude (so that scores represent higher or lower amounts of the variable) and equal intervals (so that numerical differences between any two consecutive numbers on the scale reflect equal differences in amount) in order to use the mean. Otherwise, the mean of a set of scores will not be meaningfully interpretable as a “balancing point.” For this reason, it is appropriate to use the mean to describe central tendency only for interval and ratio measurement scales (which have magnitude and equal intervals), but it is not appropriate to use the mean for nominal scales (which lack both properties) or ordinal scales (which have magnitude but lack equal intervals). [Notice that the mean does not require scales to have an absolute-zero point.]
2. The median is defined as the one value that splits a distribution of scores exactly in half, such that 50% of the cases lie above it and 50% of cases lie below it, when all of the scores are first arranged in order of magnitude (i.e., from lowest to highest, or from highest to lowest). Given this conceptual definition, notice that the measurement scale used to assess an underlying variable must have magnitude (so that scores represent higher or lower amounts of the variable) in order to use the median, but that the median does not require the measurement scale to have equal intervals. Whereas the mean balances (a) the sum of how far it is from the scores above it and (b) the sum of how far it is from the scores below it, the median balances (a) the total number of scores that lie above it and (b) the total number of scores that lie below it. For this reason, it is appropriate to use the median for all measurement scales that possess magnitude—that is, for ordinal, interval, and ratio scales—but it is not appropriate to use the median for nominal scales (which lack magnitude).
3. The mode is defined as the most frequently occurring score in a distribution (although it is possible for more than one score to be most frequent, in which case the distribution of scores is known as “multimodal”). Given this conceptual definition of the mode, notice that the measurement scale used to assess an underlying variable is not required to have either magnitude or equal intervals in order to use the mode. Thus, it is appropriate to use the mode for all four types of measurement scales: nominal, ordinal, interval, and ratio.
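The three definitions above can be checked on a small set of scores. This is a minimal sketch using Python's standard `statistics` module; the scores and music labels are hypothetical data made up for illustration.

```python
import statistics

# Hypothetical interval-scale ratings (illustrative data only).
scores = [2, 3, 3, 5, 7]

mean = statistics.mean(scores)
deviations = [x - mean for x in scores]

# Balancing-point property: the distances above and below the mean cancel out.
distance_above = sum(d for d in deviations if d > 0)
distance_below = -sum(d for d in deviations if d < 0)

median = statistics.median(scores)  # middle score once the scores are sorted
mode = statistics.mode(scores)      # most frequently occurring score

# The mode only counts occurrences, so it also works for nominal labels.
favorite_styles = ["rock", "pop", "rock", "jazz"]
nominal_mode = statistics.mode(favorite_styles)

# A multimodal distribution has more than one most-frequent score.
modes = statistics.multimode([1, 1, 2, 2, 3])

print("mean:", mean, "sum of deviations:", sum(deviations))
print("distance above:", distance_above, "distance below:", distance_below)
print("median:", median, "mode:", mode)
print("nominal mode:", nominal_mode, "multiple modes:", modes)
```

Here the mean is 4, and the summed distances above it (1 + 3) equal the summed distances below it (2 + 1 + 1), whereas the median of 3 simply has two scores on either side of it.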
Variability refers to the degree to which the scores in a distribution deviate from one another, or how widely or narrowly spread out (or dispersed) scores are in value. The three most frequently used measures of variability are the range, variance, and standard deviation.
1. The range is the difference between the highest and lowest score in a distribution. Given this conceptual definition, notice that the measurement scale used to assess an underlying variable must have magnitude (so that scores represent higher or lower amounts of the variable) in order to use the range, but that the range does not require the measurement scale to have equal intervals. For this reason, it is appropriate to use the range for all measurement scales that possess magnitude—that is, for ordinal, interval, and ratio scales—but it is not appropriate to use the range for nominal scales (which lack magnitude).
2. The variance is an estimate of the typical distance that scores in a distribution are from their mean, when expressing these differences in squared units of measurement. Given that the mean is an essential ingredient in computing the variance, the measurement scale used to assess an underlying variable must have both magnitude (so that scores represent higher or lower amounts of the variable) and equal intervals (so that numerical differences between any two consecutive numbers on the scale reflect equal differences in amount) in order to use the variance. Otherwise, the variance of a set of scores will not be meaningfully interpretable as the size of their “typical squared deviation from the mean.” For this reason, it is appropriate to use the variance to describe variability only for interval and ratio measurement scales (which have magnitude and equal intervals), but it is not appropriate to use the variance for nominal scales (which lack both properties) or ordinal scales (which have magnitude but lack equal intervals). [Notice that the variance does not require scales to have an absolute-zero point.]
3. The standard deviation is simply the square root of the variance (and conversely, the variance is simply the standard deviation squared). In other words, the standard deviation is an estimate of the typical distance that scores in a distribution are from their mean, when expressing these differences in the original units of measurement. As with its squared counterpart (i.e., the variance), the standard deviation is based on the value of the mean; thus, it is appropriate to use the standard deviation to describe variability only for interval and ratio measurement scales (which have magnitude and equal intervals), but it is not appropriate to use the standard deviation for nominal scales (which lack both properties) or ordinal scales (which have magnitude but lack equal intervals).
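The three measures of variability can be computed on a small set of scores as follows. This is a minimal sketch with hypothetical data. It assumes the sample variance (dividing by n - 1), which is what Python's `statistics.variance` computes; the text above does not specify which divisor is intended.

```python
import math
import statistics

# Hypothetical interval-scale ratings (illustrative data only).
scores = [2, 3, 3, 5, 7]

# Range: difference between the highest and lowest score.
score_range = max(scores) - min(scores)

# Variance: typical squared deviation from the mean.
# Assumption: sample variance, i.e. the sum of squared deviations over n - 1.
mean = statistics.mean(scores)
squared_deviations = [(x - mean) ** 2 for x in scores]
variance_by_hand = sum(squared_deviations) / (len(scores) - 1)
variance = statistics.variance(scores)

# Standard deviation: square root of the variance, in the original units.
standard_deviation = statistics.stdev(scores)

print("range:", score_range)
print("variance:", variance, "(by hand:", variance_by_hand, ")")
print("standard deviation:", standard_deviation)
```

For these scores the squared deviations sum to 16, so the sample variance is 16 / 4 = 4 and the standard deviation is 2.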
The bottom line here is that different types of measurement scales require different descriptive statistics.
When analyzing interval or ratio scales, it is appropriate to use all three measures of central tendency (mean, median, and mode), as well as all three measures of variability (range, variance, and standard deviation).
When analyzing ordinal scales, it is appropriate to use the median and the mode as measures of central tendency (but not the mean), and the range as a measure of variability (but not the variance or the standard deviation).
When analyzing nominal scales, it is appropriate to use the mode as a measure of central tendency (but not the mean or the median), and it is inappropriate to use any of the three measures of variability (the range, variance, and standard deviation).
An appropriate measure of variability for use with nominal scales is the total number of nominal categories for which there is at least one observed response. For example, imagine we assess people’s favorite style of music using a nominal measure (where 1 = pop, 2 = rock, 3 = hip-hop, 4 = jazz, 5 = classical, 6 = country, 7 = heavy metal, 8 = gospel, 9 = R & B, 10 = punk) in two different samples of respondents—a group of 100 young adults (age 18-30), and a group of 100 senior citizens (age 70-90). We might hypothesize that the younger sample would show a greater variability in musical preference, compared to the older sample. Computing the total number of categories for which there is at least one observed response in each group, we find all 10 types of music received at least one endorsement in the younger sample, whereas only 4 types of music received at least one endorsement in the older sample (i.e., jazz, classical, country, and gospel).
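Counting the categories that received at least one response takes only a few lines. The sketch below uses the music codes from the example above; the two response lists are small hypothetical samples invented for illustration (not the 100-person groups described), and the helper name `categories_endorsed` is likewise hypothetical.

```python
# Nominal codes for favorite style of music, as listed in the example.
MUSIC_CODES = {
    1: "pop", 2: "rock", 3: "hip-hop", 4: "jazz", 5: "classical",
    6: "country", 7: "heavy metal", 8: "gospel", 9: "R & B", 10: "punk",
}


def categories_endorsed(responses):
    """Return the number of distinct categories with at least one response."""
    return len(set(responses))


# Hypothetical responses, constructed to mirror the pattern in the example.
younger_sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 3, 10]
older_sample = [4, 5, 6, 8, 4, 5, 5, 8, 6, 4]

print("younger:", categories_endorsed(younger_sample))
print("older:", categories_endorsed(older_sample))
print("styles endorsed by older sample:",
      sorted(MUSIC_CODES[code] for code in set(older_sample)))
```

Because the codes are only labels, the count ignores their numerical values entirely; relabeling the categories would leave the result unchanged.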
Table 2 provides examples of the four types of measurement scales and summarizes the specific measures of central tendency and variability that are appropriate to use with each type of scale.
Table 2. Examples of Measurement Scales and the Proper Descriptive Statistics to Use with Them
Conceptual Variable | Example of Measurement Scale | Type of Scale | Appropriate Measure(s) of Central Tendency | Appropriate Measure(s) of Variability |
---|---|---|---|---|
Political Preference | Which of the five leading presidential candidates would you vote for in the next presidential election? | Nominal | Mode | Total number of categories with at least one response |
Political Preference | Rank order the five leading presidential candidates in terms of your likelihood of voting for these candidates in the next presidential election. | Ordinal | Median, mode | Range |
Political Preference | Rate how much you like each of the five leading presidential candidates, using a scale from 1 (dislike) to 10 (like). | Interval | Mean, median, mode | Range, variance, standard deviation |
Political Preference | Rate each of the five leading presidential candidates in terms of the chance (from 0 – 100%) that you will vote for them in the next presidential election. | Ratio | Mean, median, mode | Range, variance, standard deviation |
Frequency of temper tantrums in young children | How does your child typically react to not getting his or her way? (1 = accepts it; 2 = keeps asking me; 3 = has a temper tantrum; 4 = tries to bargain with me; 5 = other). | Nominal | Mode | Total number of categories with at least one response |
Frequency of temper tantrums in young children | Rank order the frequency with which your child uses each of the following responses when the child does not get his or her way: accepts it; keeps asking me; has a temper tantrum; tries to bargain with me; other. | Ordinal | Median, mode | Range |
Frequency of temper tantrums in young children | Rate how often your child has a temper tantrum when the child does not get his or her way, using a scale from 1 (rarely) to 7 (often). | Interval | Mean, median, mode | Range, variance, standard deviation |
Frequency of temper tantrums in young children | How many temper tantrums has your child had in the past week? | Ratio | Mean, median, mode | Range, variance, standard deviation |
Content of one’s earliest memory | Describe the content of your earliest childhood memory. (The researcher then develops a qualitative coding scheme for use in categorizing the general themes that respondents mention as their earliest memory.) | Nominal | Mode | Total number of categories with at least one response |
Content of one’s earliest memory | Rank order a provided list of various general themes, in terms of how early each theme exists in your memory. | Ordinal | Median, mode | Range |
Content of one’s earliest memory | For each theme provided on a list, rate how early in your life you have a memory that involves the particular theme, using a scale from 1 (very early) to 7 (not very early). | Interval | Mean, median, mode | Range, variance, standard deviation |
Content of one’s earliest memory | For each theme provided on a list, indicate the earliest age (in years) at which you have a memory that involves the particular theme. | Ratio | Mean, median, mode | Range, variance, standard deviation |
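The guidelines summarized in Table 2 can be sketched as a single helper that reports only the statistics appropriate for a given type of scale. This is an illustrative sketch, not a definitive implementation: the function name `describe` and the example data are hypothetical, the variance is assumed to be the sample variance (n - 1 divisor), and using `statistics.median_low` for ordinal data is a design choice made here so that the median is always an observed rank rather than an average of two ranks.

```python
import statistics

SCALE_TYPES = ("nominal", "ordinal", "interval", "ratio")


def describe(values, scale):
    """Return only the descriptive statistics appropriate for `scale`."""
    if scale not in SCALE_TYPES:
        raise ValueError(f"Unknown scale type: {scale!r}")

    # The mode is appropriate for all four types of scale.
    summary = {"mode": statistics.multimode(values)}

    if scale == "nominal":
        # Variability: number of categories with at least one response.
        summary["categories_observed"] = len(set(values))
        return summary

    # Ordinal, interval, and ratio scales have magnitude.
    if scale == "ordinal":
        # Assumption: avoid averaging two middle ranks on an ordinal scale.
        summary["median"] = statistics.median_low(values)
    else:
        summary["median"] = statistics.median(values)
    summary["range"] = max(values) - min(values)

    if scale in ("interval", "ratio"):
        # These statistics rely on the mean, so they need equal intervals.
        summary["mean"] = statistics.mean(values)
        summary["variance"] = statistics.variance(values)
        summary["standard_deviation"] = statistics.stdev(values)
    return summary


# Hypothetical data for each kind of scale.
print(describe(["rock", "pop", "rock"], "nominal"))
print(describe([1, 2, 2, 3, 5], "ordinal"))
print(describe([2, 3, 3, 5, 7], "ratio"))
```

Passing the same numbers with a different `scale` argument changes which statistics are returned, which makes the point that the appropriate summary depends on how the numbers were measured rather than on the numbers themselves.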