
Measurement and Methodological Issues in Minority Aging Research

Continued article from The Behavioral Measurement Letter, Vol. 7, No. 1, Winter 2002


Mildred Ramirez, Marvella Ford, and Anita L. Stewart


The racial, ethnic, and age composition of the U.S. population is changing rapidly. By the year 2050, nearly half will be nonwhite and 20 percent will consist of people aged 65 and older. Despite these demographic trends, ethnic minorities, older adults, women, and those with lower socioeconomic status (SES) have been underrepresented in epidemiologic and clinical studies (Ferketich, Phillips, & Verran, 1993; Forsythe & Gage, 1994; Furnham & Malik, 1994; Anderson, Aaronson, & Wilkin, 1993; Iwata & Salto, 1993). Accordingly, the National Institutes of Health now mandates the inclusion of women and minorities in research. This mandate is significant because without adequate representation of all population groups, valid generalization of results to members of these populations is not possible.

Each racial and ethnic group has unique cultural characteristics, including its own values, norms, and attitudes (Marin et al., 1995; Marin & Perez-Stable, 1995; Ferketich et al., 1993; Devore & Schlesinger, 1987), and thus measures developed with nonminority populations may not be valid for them. Hence, it is imperative to consider, for each of these groups, whether existing measures are relevant, appropriate, reliable, and valid. Still, although the importance of cultural validity in measurement has been recognized by many researchers (Ferketich et al., 1993; Forsythe & Gage, 1994; Furnham & Malik, 1994; Anderson et al., 1993; Iwata & Salto, 1993), it remains common practice to apply standard measures to racial and ethnic minority groups and to lower SES groups without prior investigation of their psychometric properties for these populations.

This article describes briefly some of the issues regarding measurement in diverse groups. A more in-depth treatment of these issues as they relate to assessment of health can be found in Stewart and Napoles-Springer (in press); such measurement issues in general will be discussed at length in a forthcoming issue of the Journal of Mental Health and Aging (in press) edited by Skinner, Teresi, Holmes, Stahl, and Stewart that will focus entirely on assessment in minority populations.


Resource Centers for Minority Aging Research (RCMAR)

The National Institute on Aging, the National Institute of Nursing Research, and the Office of Research in Minority Health jointly created a program (a) to decrease disparities in geriatric and gerontological research among racial, ethnic, and SES groups, and (b) to increase the number of minority researchers working in these areas. Six Resource Centers for Minority Aging Research (RCMAR) and a Coordinating Center were subsequently funded to address these overarching goals (see Stahl, in press). These centers are: (a) the Center for the Active Life of Minority Elders (CALME) at Columbia Presbyterian Medical Center, New York City; (b) the Center for Aging in Diverse Communities at The University of California-San Francisco; (c) the Center on Minority Aging at the University of North Carolina Chapel Hill; (d) the Michigan Center for Urban African American Aging Research (MCUAAR) operated jointly by Wayne State University and the University of Michigan; (e) the Native Elder Research Center at the University of Colorado Health Sciences Campus; (f) the Resource Center for African American Aging Research operated by the Henry Ford Health System in Detroit; and (g) the RCMAR Coordinating Center at the Medical University of South Carolina. While each of the Centers has its own mission and special goals, all Centers share a common goal, “…to decrease the minority/nonminority differential in health and its social sequelae for older people by focusing research upon health promotion, disease prevention, and disability prevention” (Resource Centers for Minority Aging Research, 1997).

To address issues specifically pertinent to measurement with older minority populations, each RCMAR has a Measurement and Methods Core, with specific goals and areas of interest. Some of the Cores focus on both quantitative and qualitative research, while others concentrate primarily on quantitative research. Topic areas vary across sites and range from mental health to physical health, focusing, for example, on constructs such as depression, anxiety, cognition, and religiosity, and on diseases such as cancer, heart disease, and diabetes. (More specific information on the RCMARs may be found at http://rcmar.musc.edu.)


Sources of Cultural Bias in Measurement

Although substantial differences along various dimensions have been observed among racial, ethnic, and SES groups (Berkanovic & Telesky, 1985; Angel & Thoits, 1987; Raczynski et al., 1994; Johnson et al., 1996; Osmond, Vranizan, Schillinger, Stewart, & Bindman, 1996; Shetterley, Baxter, Mason, & Hamman, 1996), it is uncertain whether these observed differences are true differences or a result of cultural bias in the measures or methods used (Fullerton, Wallace, & Concha-Garcia, 1993). With the increasingly widespread recognition of racial, ethnic, and class bias and insensitivity in measurement instruments and methods, there is growing demand for the validation of existing measures using samples of minority group members, and for establishing the cross-ethnic equivalence of assessment tools (Anderson et al., 1993; Chwalow, 1995; Knight, Virdin, Ocampo, & Roosa, 1994; Bullinger, Anderson, Cella, & Aaronson, 1993; Sullivan et al., 1995). The Advisory Panel on Alzheimer’s Disease, for example, specifically calls for the development and validation of screening instruments and methods that will be effective in identifying Alzheimer’s across various ethno-cultural groups (Advisory Panel on Alzheimer’s Disease, 1993).

Any discussion of sources of measurement bias should include item structure, the criteria used in developing a measure, and error introduced by the interviewer and/or the respondent (Teresi & Holmes, 1997). With regard to item structure, specific definitions and operationalizations of constructs, and the wording of particular items, may have different cultural valences across racial, ethnic, or SES groups; that is, they may not hold the same meaning for instrument designers, raters/interviewers, and respondents from various ethnic/racial/SES backgrounds (Rogler, 1989). For example, many African Americans refer to diabetes as “sugar” and to hypertension as “high blood.” Stevens, Kumanyika, and Keil (1994) found that African American women, in response to the question of whether they were overweight, were less likely than Caucasian women to perceive themselves as being overweight, despite the fact that the prevalence of obesity is twice as high among African American women as it is among Caucasian women. Cultural differences in the meaning of the term “overweight” and in attitudes about the acceptability of being overweight, therefore, may well account for systematic response differences between African Americans and Caucasians to items concerning weight and body image.

Measurement bias may also be introduced in instrument administration. For example, raters/interviewers who come from racial/ethnic/SES backgrounds different from those of the individuals being rated/interviewed may respond to cues incorrectly or in ways other than intended, or they may simply misinterpret information, leading to spurious results. (It should be noted here, too, that in addition to measurement bias, nonrepresentativeness of research samples is often cited as a factor that perpetuates racial/ethnic/class bias in research; see Abebimpe, 1994; Dohrenwend, 1975.)

Regardless of source, measurement error, including that due to bias, has implications not only for research findings in general, where it might lead to erroneous results, and for epidemiological research in particular, where it may produce biased estimates of the prevalence and magnitude of risk factors, but also for the development of public policies and for service delivery. Failure to account for inter- and/or intra-group differences in designing, administering, or delivering social services leads to ineffectiveness. The presumption of social or cultural homogeneity perpetuates inaccurate cultural stereotypes and thereby hinders the delivery of quality social and healthcare services to racial/ethnic minorities. A lack of fit between public policy and real-world conditions, and between client needs and services delivered, is the inevitable end result where there is cultural bias in instrument design or administration, and where instruments or methods used in data gathering are insensitive to racial/ethnic differences.
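The effect of measurement error on prevalence estimates can be made concrete with a small calculation. The sketch below is illustrative only: it assumes a screening instrument whose sensitivity and specificity are known for each group, and the function names and all numeric values are hypothetical rather than drawn from any study cited here. It shows that two groups with the same true prevalence can yield different apparent prevalences when the instrument's specificity differs between them.

```python
def apparent_prevalence(true_prev, sensitivity, specificity):
    """Expected share of a group that screens positive on an imperfect instrument.

    True cases are detected with probability `sensitivity`; non-cases are
    falsely flagged with probability (1 - specificity).
    """
    return sensitivity * true_prev + (1.0 - specificity) * (1.0 - true_prev)


def corrected_prevalence(apparent, sensitivity, specificity):
    """Back out the true prevalence from the apparent one.

    This inverts the formula above; it only makes sense when the instrument
    performs better than chance (sensitivity + specificity > 1).
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (apparent + specificity - 1.0) / denom
    return min(1.0, max(0.0, estimate))  # keep the estimate inside [0, 1]


# Hypothetical scenario: both groups have the same true prevalence (10%),
# but the screen is less specific in group B than in group A.
TRUE_PREV = 0.10
group_a = apparent_prevalence(TRUE_PREV, sensitivity=0.90, specificity=0.95)
group_b = apparent_prevalence(TRUE_PREV, sensitivity=0.90, specificity=0.80)

print(f"Group A apparent prevalence: {group_a:.3f}")
print(f"Group B apparent prevalence: {group_b:.3f}")

# Applying each group's own sensitivity and specificity recovers the shared value.
recovered_a = corrected_prevalence(group_a, 0.90, 0.95)
recovered_b = corrected_prevalence(group_b, 0.90, 0.80)
```

In this hypothetical case the screen would suggest that group B has twice the prevalence of group A, although the difference arises entirely from how the instrument performs in each group.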


Measurement Issues in Research on Aging and Minorities

Within the research community, racial/ethnic bias has been identified as a methodological issue requiring careful examination. (See Teresi & Holmes, 1994, for an overview of methodological issues related to comparison of measures across subgroups.) For example, Gibson (1991), using latent variable confirmatory factor analysis, examined racial differences in the structure of and measurements made with six self-reports of health widely used in studies involving older adults. The three elements of self-reported health Gibson examined in Americans Changing Lives (House, 1986) were disease, disability, and subjective interpretation of health status. Findings showed that the form of the model had an overall acceptable fit for both the African American and Caucasian samples, indicating that, in this instance, disease, disability, and subjective interpretations of health status derive from a single latent construct, internal health state, in both groups. However, racial differences were seen in parts of the three-element model, suggesting that there are cultural and racial differences in self-reports of disease, disability, and health status (Gibson, 1991). For example, subjective interpretation of health was found to be a better measure of health status for Caucasians than for African Americans, while number of chronic conditions, as an indicator of disease, was found to be a better measure of health status for African Americans than for Caucasians.

Furthermore, application of a single model to all minority populations essentially ignores not only intergroup differences, but potential intra-group variations as well (e.g., between Mexican-Americans and Puerto Ricans within the larger Latino population, among various tribal groupings within the Native American population). Such intragroup variations have received even less attention than intergroup differences in the context of measuring health and psychosocial variables.

Some of the methodological issues related to social research in general are also applicable to minority aging research. General concerns are: errors related to items and criteria (e.g., lack of standard items with explicit coding instructions, lack of algorithms to account for missing data) (Teresi, Lawton, Ory, & Holmes, 1994); errors due to occasion (e.g., the health or psychological status of respondents or external environmental conditions); errors due to raters (e.g., reporting bias, lack of established interrater reliability, lack of adequate training for assessors) (Teresi & Holmes, 1997); and errors related or due to respondents themselves, including low level of arousal, impaired communication ability, inability to provide informed consent, decrements in vision, hearing, or motor abilities, presence of conditions such as depression and fatigue, and demographic characteristics (e.g., age, gender, racial/ethnic/cultural group membership, level of education). (See Teresi & Holmes for a detailed discussion of these latter issues.)
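Interrater reliability, one of the rater-related concerns listed above, is commonly summarized with a chance-corrected agreement statistic. The following is a minimal sketch of Cohen's kappa for two raters classifying the same cases; it is offered as an illustration rather than as the procedure used in any study cited here, and the ratings shown are invented.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who each classified the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both raters must rate the same non-empty set of cases.")
    n = len(ratings_a)

    # Observed agreement: share of cases on which the raters gave the same rating.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: for each category, the product of the two raters'
    # marginal proportions, summed over all categories either rater used.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)

    if expected == 1.0:
        raise ValueError("Kappa is undefined when chance agreement equals 1.")
    return (observed - expected) / (1.0 - expected)


# Invented example: 1 = "impaired", 0 = "not impaired" for ten respondents.
rater_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
rater_2 = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]

kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

Computing such a statistic separately for rater and respondent pairs from the same and from different racial/ethnic backgrounds is one way to check whether agreement holds up across groups.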


Methods for Identifying Bias

There are three broad approaches for assessing the magnitude and nature of bias in measures across groups: qualitative studies, classic psychometric studies, and studies using modern psychometric methods.

Qualitative Studies. Qualitative studies can be used to assess the conceptual equivalence of existing measures (e.g., to explore how individuals from diverse backgrounds conceptualize a domain), and to determine whether any constructs are missing from a measurement model (Johnson et al., 1991). Qualitative approaches also can facilitate understanding of how people construct their answers, that is, the cognitive processes of reporting (Sudman, Bradburn, & Schwarz, 1996). Cognitive testing, focus groups, and expert panels are commonly used in qualitative measurement studies.

Classic Psychometric Studies. Traditional psychometric approaches have been applied to examine measures across demographic subgroups, for example by comparing measurement properties across subgroups simultaneously. Classic psychometric studies include examination of methods to deal with missing data, inter- vs. intragroup variability, response bias, reliability, factor structure, content validity (a form of conceptual equivalence), and construct validity.
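As one illustration of a classic psychometric comparison, internal-consistency reliability can be computed separately for each subgroup and the results set side by side. The sketch below computes Cronbach's alpha with Python's standard library; the function name and the two small data sets are hypothetical and serve only to show the mechanics.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a scale.

    `rows` is a list of respondents, each a list of scores on the same k items.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("Need at least two items and two respondents.")
    if any(len(r) != k for r in rows):
        raise ValueError("Every respondent must answer the same number of items.")

    # Sample variance of each item across respondents.
    item_variances = [float(variance([r[i] for r in rows])) for i in range(k)]
    # Sample variance of the total (summed) scale score.
    total_variance = float(variance([sum(r) for r in rows]))
    if total_variance == 0:
        raise ValueError("Total scores do not vary; alpha is undefined.")

    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical three-item scale administered to two subgroups.
group_a = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]             # items move together
group_b = [[1, 2, 1], [2, 1, 3], [3, 3, 2], [4, 2, 4]]  # items agree less

alpha_a = cronbach_alpha(group_a)
alpha_b = cronbach_alpha(group_b)
print(f"alpha, group A: {alpha_a:.2f}")
print(f"alpha, group B: {alpha_b:.2f}")
```

A markedly lower coefficient in one subgroup would suggest that the items do not hang together as a scale for that group in the way they do for the other, which is a prompt for further examination rather than a conclusion in itself.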

Modern Psychometric Methods. Because parameters and summary statistics determined by classical test theory normally vary across demographic subgroups in prevalence or distribution of items (Teresi & Holmes, 1997), more advanced psychometric methods, such as item response analysis and various types of factor analysis, are also being used to examine measurement bias. Item response theory recently has been employed to detect differential item functioning and item bias in epidemiological screening measures (Teresi, Cross, & Golden, 1989; Teresi & Golden, 1994; Teresi, Kleinman, & Welikson, in press), and to develop more accurate estimates of prevalence (Teresi, Albert, Holmes, & Mayeux, in press). For example, recent studies of cognitive assessment screens suggest that cognitive screens and items perform differently across groups that differ in terms of education, ethnicity, and race (Albert & Teresi, 1999; Teresi, 1995). In view of this, it is not surprising that research findings reflect racial, ethnic, and education subgroup differences in classification rates developed using common cognitive screening measures when such rates are compared to those provided by clinical diagnosis (Anthony, Niaz, Le Resche, Von Korff, & Folstein, 1982; Fillenbaum, Heyman, Williams, Prosnitz, & Burchett, 1990; Escobar et al., 1986; Gurland, Wilder, Cross, Teresi, & Barnett, 1992; Valle et al., 1991).
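The idea behind differential item functioning can be illustrated without fitting a full item response model. The sketch below does not implement item response theory; it uses a simpler contingency-table screen, the Mantel-Haenszel common odds ratio, in which respondents are first matched on a total score and the odds of endorsing a given item are then compared between a reference and a focal group within each matched stratum. The function names, group labels, and counts are all hypothetical.

```python
from collections import defaultdict


def expand(group, score, n_endorsed, n_not_endorsed):
    """Helper: turn stratum counts into individual (group, score, response) records."""
    return [(group, score, 1)] * n_endorsed + [(group, score, 0)] * n_not_endorsed


def mantel_haenszel_odds_ratio(records):
    """Common odds ratio for one item across strata of a matching score.

    `records` holds (group, matching_score, response) tuples, where group is
    "reference" or "focal" and response is 1 (endorsed) or 0 (not endorsed).
    A value near 1 suggests no differential functioning for the item.
    """
    # Per stratum: [ref endorsed, ref not, focal endorsed, focal not]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, score, response in records:
        cell = tables[score]
        if group == "reference":
            cell[0 if response else 1] += 1
        elif group == "focal":
            cell[2 if response else 3] += 1
        else:
            raise ValueError(f"Unknown group label: {group!r}")

    numerator = denominator = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("Odds ratio is undefined for these data.")
    return numerator / denominator


# Hypothetical item: within each matched score level, the reference group
# endorses the item more often than the focal group.
biased_item = (
    expand("reference", 1, 8, 2) + expand("focal", 1, 4, 6)
    + expand("reference", 2, 9, 1) + expand("focal", 2, 6, 4)
)
# Hypothetical item with identical endorsement rates in both groups.
fair_item = expand("reference", 1, 8, 2) + expand("focal", 1, 8, 2)

or_biased = mantel_haenszel_odds_ratio(biased_item)
or_fair = mantel_haenszel_odds_ratio(fair_item)
print(f"biased item odds ratio: {or_biased:.1f}")
print(f"fair item odds ratio:   {or_fair:.1f}")
```

Because the comparison is made among respondents matched on overall score, an odds ratio far from 1 points to the item behaving differently across groups rather than to a difference in the underlying trait.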



As research increasingly takes into account, or even focuses on differences across diverse groups, issues of measurement comparability among groups are paramount. To the extent that investigators become more acquainted with these issues, there should be greater examination of measurement instruments and techniques used to improve understanding of the adequacy of existing measures and to determine the need for additional studies of measurement bias across and within cultural, racial/ethnic, and other demographically distinguishable groups.



Abebimpe, V.R. (1994). Race, racism, and epidemiological surveys. Hospital and Community Psychiatry, 45(1), 27-31.

Advisory Panel on Alzheimer’s Disease. (1993). Fourth Report of the Advisory Panel on Alzheimer’s Disease, 1992. NIH Pub. No. 93-3520. Washington, DC: U.S. Government Printing Office.

Albert, S., & Teresi, J. (1999). Reading ability, education and cognitive status assessment among older adults in Harlem, New York City. American Journal of Public Health, 89, 95-97.

Anderson, R.T., Aaronson, N.K., & Wilkin, D. (1993). Critical review of the international assessments of health-related quality of life. Quality of Life Research, 2, 365-395.

Angel, R., & Thoits, P. (1987). The impact of culture on the cognitive structure of illness. Culture, Medicine, and Psychiatry, 11, 465.

Anthony, J.C., Niaz, U., LeResche, L.A., VonKorff, M.R., & Folstein, M.F. (1982). Limits of the Mini-Mental State as a screening test for dementia and delirium among hospital patients. Psychological Medicine, 12, 397-408.

Arean, P.A., & Gallagher-Thompson, D. (1996). Issues and recommendations for the recruitment and retention of older ethnic minority adults into clinical research. Journal of Consulting and Clinical Psychology, 64, 878.

Berkanovic, E., & Telesky, C. (1985). Mexican-American, Black-American and White-American differences in reporting illnesses, disability, and physician visits for illnesses. Social Science & Medicine, 20, 567.

Bullinger, M., Anderson, R., Cella, D., & Aaronson, N. (1993). Developing and evaluating cross-cultural instruments from minimum requirements to optimal models. Quality of Life Research, 2, 451-459.

Chwalow, A.J. (1995). Cross-cultural validation of existing quality of life scales. Patient Education & Counseling, 26(1-3), 313-318.

Devore, W., & Schlesinger, E.G. (1987). Ethnic-sensitive social work practice (2nd ed.). Columbus, OH: Merrill Publishing.

Dohrenwend, B.P. (1975). Sociocultural and social-psychological factors in the genesis of mental disorders. Journal of Health and Social Behavior, 16, 365-392.

Escobar, J., Burnam, M., Karno, M., Forsythe, A., Landsverk, J., & Golding, J.M. (1986). Use of the Mini-Mental State Examination (MMSE) in a community population of mixed ethnicity. Journal of Nervous and Mental Disease, 174, 607-614.

Ferketich, S., Phillips, L., & Verran, J. (1993). Focus on psychometrics: Development and administration of a survey instrument for cross-cultural research. Research in Nursing & Health, 16, 227-230.

Fillenbaum, G.G., Heyman, A., Williams, K., Prosnitz, B., & Burchett, B. (1990). Sensitivity and specificity of standardized screens of cognitive impairment and dementia among elderly Black and White community residents. Journal of Clinical Epidemiology, 43, 650-660.

Forsythe, H.E., & Gage, B. (1994). Use of a multicultural food-frequency questionnaire with pregnant and lactating women. American Journal of Clinical Nutrition, 59, 203S-206S.

Fullerton, J.T., Wallace, H.M., & Concha-Garcia, S. (1993). Development and translation of an English-Spanish dual-language instrument addressing access to prenatal care for the border-dwelling Hispanic women of San Diego County. Journal of Nurse-Midwifery, 38, 45-49.

Furnham, A., & Malik, R. (1994). Cross-cultural beliefs about depression. International Journal of Social Psychiatry, 40, 106-123.

Gurland, B.J., Wilder, D.E., Cross, P.E., Teresi, J.A., & Barnett, V.W. (1992). Screening scales for dementia: Toward reconciliation of conflicting cross-cultural findings. International Journal of Geriatric Psychiatry, 7, 105-113.

Gibson, R.C. (1991). Race and the self-reported health of elderly persons. Journal of Gerontology: Social Sciences, 46, S235-S242.

House, J.S. (1986, Wave 1). Americans’ changing lives. [Computer file]. Ann Arbor, MI: Inter-University Consortium for Political and Social Research.

Iwata, N., & Salto, K. (1993). The factor structure of the 28-item General Health Questionnaire when used in Japanese early adolescents and adult employees: Age- and cross-cultural comparisons. European Archives of Psychiatry and Clinical Neuroscience, 242, 172-178.

Johnson, T.P., O’Rourke, D., Chavez, N., et al. (1996). Cultural variations in the interpretation of health survey questions. In R. Warnecke (Ed.). Health survey research methods. Hyattsville, MD: National Center for Health Statistics.

Kaye, J.M., Lawton, P., & Kaye, D. (1990). Attitudes of elderly people about clinical research on aging. Gerontologist, 30, 100.

Knight, G.P., Virdin, L.M., Ocampo, K.A., & Roosa, M. (1994). An examination of the cross-ethnic equivalence of measures of negative life events and mental health among Hispanics and Anglo-American children. American Journal of Community Psychology, 22(5), 767-783.

Marin, G., Burhansstipanov, L., Connell, C.M., Gielen, A.C., Helitzer-Allen, D., Lorig, K., Morisky, D.E., Tenney, M., & Thomas, S. (1995). A research agenda for health education among underserved populations. Health Education Quarterly, 22, 346-363.

Marin, G., & Perez-Stable, E.J. (1995). Effectiveness of disseminating culturally appropriate smoking-cessation information: Programa Latino Para Dejar de Fumar. Journal of the National Cancer Institute Monographs, 18, 155-163.

McCarthy, C.R. (1994). Historical background of clinical trials involving women and minorities. Academic Medicine, 69, 695.

Osmond, D.H., Vranizan, K., Schillinger, D., Stewart, A.L., & Bindman, A.B. (1996). Measuring the need for medical care in an ethnically diverse populat