Evaluation of Mantel-Haenszel Statistic for Detecting Differential Item Functioning

Nabeel Abedalaziz


ABSTRACT: Educators have been redefining the goals of instruction and learning to include increased attention to high-level thinking skills. Mantel-Haenszel methods comprise a highly flexible methodology for assessing the degree of association between two categorical variables, whether nominal or ordinal, while controlling for other variables. The versatility of Mantel-Haenszel analytical approaches has made them very popular in the assessment of DIF (Differential Item Functioning) for both dichotomous and polytomous items. The Mantel-Haenszel (M-H) procedure was originally used to match subjects retrospectively on cancer risk factors in order to study current cancer rates (Mantel & Haenszel, 1959). The main objective of the study was to determine how the number of score groups, and the inclusion or exclusion of the studied item in forming those score groups, affect the estimation of α, the Mantel-Haenszel common odds ratio. Results indicated that: (1) four or more score groups yield stable α estimates with the Mantel-Haenszel approach; and (2) including the studied item when forming score groups tends to result in fewer items with significant chi-square values than excluding it. These findings appear to be consistent with previous research.
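
To make the quantities named in the abstract concrete, the following is a minimal sketch of how α and the Mantel-Haenszel chi-square are commonly computed from one 2×2 table per score group. It follows the standard textbook formulation (with a 0.5 continuity correction), not necessarily the exact computation used in this study. The function name `mantel_haenszel` and the tuple layout `(A, B, C, D)` are assumptions introduced for illustration only.

```python
from typing import Iterable, Tuple

# One 2x2 table per score group (hypothetical layout, for illustration):
#   A = reference group, item correct    B = reference group, item incorrect
#   C = focal group, item correct        D = focal group, item incorrect
Table = Tuple[int, int, int, int]


def mantel_haenszel(tables: Iterable[Table]) -> Tuple[float, float]:
    """Return (alpha, chi_square) pooled over all score groups."""
    num = den = 0.0                    # pieces of the common odds ratio
    sum_a = sum_e = sum_var = 0.0      # pieces of the chi-square statistic
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue                   # a score group this small carries no information
        n_ref, n_foc = a + b, c + d    # group sizes within the score group
        m_right, m_wrong = a + c, b + d
        num += a * d / n
        den += b * c / n
        sum_a += a
        sum_e += n_ref * m_right / n   # expected A if there is no DIF
        sum_var += n_ref * n_foc * m_right * m_wrong / (n * n * (n - 1))
    if den == 0 or sum_var == 0:
        raise ValueError("tables are too sparse to estimate alpha or chi-square")
    alpha = num / den
    # Chi-square with 1 degree of freedom, using a 0.5 continuity correction.
    chi_square = (abs(sum_a - sum_e) - 0.5) ** 2 / sum_var
    return alpha, chi_square
```

Each tuple corresponds to one score group, so changing the number of score groups amounts to changing how finely examinees are binned by total score before the tables are built. An α near 1 suggests the item behaves similarly for both groups at matched score levels.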

KEY WORDS: Differential Item Functioning, Mantel-Haenszel method, bias, estimation, and inclusion or exclusion of the studied item.

About the Author: Dr. Nabeel Abedalaziz is a Lecturer at the Department of Educational Psychology and Counseling, Faculty of Education, UM (University of Malaya), 50603 Kuala Lumpur, Malaysia. He can be reached at: nabeelabdelazeez@yahoo.com and nabilaziz@um.edu.my

How to cite this article? Abedalaziz, Nabeel. (2011). “Evaluation of Mantel-Haenszel Statistic for Detecting Differential Item Functioning” in EDUCARE: International Journal for Educational Studies, Vol.3(2) February, pp.177-186. Bandung, Indonesia: Minda Masagi Press owned by ASPENSI in Bandung, West Java; and FKIP UMP in Purwokerto, Central Java, ISSN 1979-7877.

Chronicle of the article: Accepted (December 22, 2010); Revised (January 25, 2011); and Published (February 17, 2011).

References:

Camilli, G. & L. Shepard. (1994). Methods for Identifying Biased Test Items. California: Sage Publications.

Dorans, N.J. & P.W. Holland. (1993). “DIF Detection and Description: Mantel-Haenszel and Standardization” in P.W. Holland & H. Wainer [eds]. Differential Item Functioning. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc., pp.35-66.

Hambleton, R.K. & H.J. Rogers. (1989). “Detecting Potentially Biased Test Items: Comparison of IRT Area and Mantel-Haenszel Methods” in Applied Measurement in Education, 2, pp.313-334.

Holland, P.W. & D.T. Thayer. (1986). “Differential Item Performance and the Mantel-Haenszel Procedure”. Paper presented at the meeting of the American Educational Research Association.

Holland, P.W. & D.T. Thayer. (1988). “Differential Item Performance and the Mantel-Haenszel Procedure” in H. Wainer & H.I. Braun [eds]. Test Validity. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc., pp.129-145.

Mantel, N. & W. Haenszel. (1959). “Statistical Aspects of the Analysis of Data from Retrospective Studies of Disease” in Journal of the National Cancer Institute, 22, pp.719-748.

NRC [National Research Council]. (1989). Everybody Counts. Washington, DC: National Academy of Science.

Raju, N.S., R.K. Bode & V.S. Larsen. (1989). “An Empirical Assessment of the Mantel-Haenszel Statistic for Studying Differential Item Performance” in Applied Measurement in Education, 2(1), pp.1-13.

Scheuneman, J.D. (1979). “A Method for Assessing Bias in Test Items” in Journal of Educational Measurement, 16, pp.143-152.

Schumacher, R. (2005). “Test Bias and Differential Item Functioning” in http://www.appliedmeasurementassociates.com.pdf [Accessed at Kuala Lumpur, Malaysia: 18 November 2010].

Wang, N. & S. Lane. (1996). “Detection of Gender-Related Differential Item Functioning in a Mathematical Performance Assessment” in Applied Measurement, 12(2).

Wright, D.J. (1986). “An Empirical Comparison of the Mantel-Haenszel and Standardization Methods of Detecting Differential Item Performance” in Statistical Report, No.SR-86-99.

Zumbo, B.D. (1999). A Handbook on the Theory and Methods of Differential Item Functioning (DIF): Logistic Regression Modeling as a Unitary Framework for Binary and Likert-Type (Ordinal) Item Scores. Ottawa, Canada: Directorate of Human Resources Research and Evaluation.

EDUCARE: International Journal for Educational Studies. This work is distributed under a Creative Commons Attribution-ShareAlike 4.0 International License.