An Alternative to CHAID Segmentation Algorithm Based on Entropy
- Galindo Villardón, María Purificación 1
- Vicente Villardón, José Luis 1
- Dorado Díaz, Ana 1
- Vicente Galindo, María Purificación 1
- Patino Alonso, María Carmen 1
- 1 Universidad de Salamanca, Department of Statistics
ISSN: 2215-3373
Year of publication: 2010
Volume: 17
Issue: 2
Pages: 179-197
Type: Article
Journal: Revista de Matemática: Teoría y Aplicaciones
Abstract
The CHAID (Chi-Squared Automatic Interaction Detection) tree-based segmentation technique has been found to be an effective approach for obtaining meaningful segments that are predictive of a K-category (nominal or ordinal) criterion variable. CHAID was designed to detect automatically the interaction between several categorical or ordinal predictors in explaining a categorical response, but it may fail to do so when Simpson's paradox is present. This is because CHAID is a forward selection algorithm based on the marginal counts. In this paper we propose a backward elimination algorithm that starts with the full set of predictors (or full tree) and eliminates predictors progressively. The elimination procedure is based on conditional independence tests built on the concept of entropy. The proposed procedure is compared to CHAID.
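The contrast drawn in the abstract, between a forward search on marginal counts and a backward search on conditional independence tests, can be illustrated with a small sketch. It is not the authors' published procedure; it assumes that the entropy-based test is the likelihood-ratio statistic G² = 2N·I(X;Y|Z), where I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z) is the empirical conditional mutual information in nats, that degrees of freedom are counted from observed levels, and that elimination drops the predictor with the largest p-value while that p-value exceeds alpha. The function names (`n_entropy`, `chi2_sf`, `ci_test`, `backward_eliminate`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def n_entropy(rows, cols):
    """N times the empirical joint entropy (nats) of the columns in `cols`."""
    counts = Counter(tuple(r[c] for c in cols) for r in rows)
    n = len(rows)
    return n * math.log(n) - sum(k * math.log(k) for k in counts.values())


def chi2_sf(x, df):
    """Chi-square upper-tail probability P(X^2 > x) for integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2
    # Even df: start from an empty sum; odd df: start from erfc(sqrt(h)).
    # Then add h**a * exp(-h) / Gamma(a + 1) for a = start, start + 1, ... < df / 2.
    q, a = (0.0, 0.0) if df % 2 == 0 else (math.erfc(math.sqrt(h)), 0.5)
    while a < df / 2:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1))
        a += 1
    return min(q, 1.0)


def ci_test(rows, y, x, given):
    """Likelihood-ratio test of H0: y independent of x given the columns in `given`.

    G^2 = 2N * I(X;Y|Z), with I(X;Y|Z) = H(XZ) + H(YZ) - H(XYZ) - H(Z).
    """
    z = list(given)
    g2 = 2 * (n_entropy(rows, [x] + z) + n_entropy(rows, [y] + z)
              - n_entropy(rows, [x, y] + z) - n_entropy(rows, z))
    g2 = max(g2, 0.0)  # clamp tiny negative values caused by rounding

    def levels(col):
        return len({r[col] for r in rows})

    strata = len({tuple(r[c] for c in z) for r in rows})  # observed Z configurations
    df = (levels(x) - 1) * (levels(y) - 1) * strata
    return g2, df, chi2_sf(g2, df)


def backward_eliminate(rows, y, predictors, alpha=0.05):
    """Start from all predictors; repeatedly drop the one least dependent on y
    given the remaining ones, until every remaining test rejects at `alpha`."""
    kept = list(predictors)
    while kept:
        pvals = {x: ci_test(rows, y, x, [v for v in kept if v != x])[2] for x in kept}
        worst = max(kept, key=pvals.get)
        if pvals[worst] <= alpha:
            break
        kept.remove(worst)
    return kept


# Hypothetical toy data keyed by (x, y, z): X and Y are positively associated
# when Z = 0 and negatively when Z = 1, so the two associations cancel in the
# margin. W alternates within every cell, making it pure noise.
cells = {(0, 0, 0): 40, (0, 1, 0): 10, (1, 0, 0): 10, (1, 1, 0): 40,
         (0, 0, 1): 10, (0, 1, 1): 40, (1, 0, 1): 40, (1, 1, 1): 10}
rows = [{"X": x, "Y": y, "Z": z, "W": i % 2}
        for (x, y, z), c in cells.items() for i in range(c)]

if __name__ == "__main__":
    for v in ("X", "Z", "W"):
        print(v, "marginal p =", round(ci_test(rows, "Y", v, [])[2], 3))
    print("kept:", backward_eliminate(rows, "Y", ["X", "Z", "W"]))
```

On this table every marginal test returns p = 1, so a step driven by marginal counts finds nothing to split on, while the backward loop removes only W and keeps X and Z. With many predictors the conditioning strata become sparse, which a real implementation would have to handle.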
References
- Ávila, C.A. (1996) Una Alternativa al Análisis de Segmentación Basada en el Análisis de Hipótesis de Independencia Condicionada. Doctoral thesis, Universidad de Salamanca.
- Baron, S.; Phillips, D. (1994) “Attitude survey data reduction using CHAID: an example in shopping centre market research”, Journal of Marketing Management 10: 75–88.
- Christensen, R. (1990) Log-Linear Models. Springer-Verlag, New York.
- Clark, W.A.V.; Deurloo, M.C.; Dieleman, F.M. (1991) “Modeling categorical data with chi square automatic interaction detection and correspondence analysis”, Geographical Analysis 23: 332–345.
- Dorado, A. (1998) Métodos de Búsqueda de Variables Relevantes en Análisis de Segmentación: Aportaciones desde una Perspectiva Multivariante. Doctoral thesis, Universidad de Salamanca.
- Dorado, A.; Galindo, P.; Vicente, J.L.; Vicente-Tavera, S. (2002) “El CHAID como herramienta de marketing político”, ESIC Market 111: 129–140.
- Galindo, M.P.; Vicente-Galindo, P.; Patino-Alonso, C.; Vicente-Villardón, J.L. (2007) “Caracterización multivariante de los perfiles de las mujeres en situación laboral irregular: el caso de Salamanca”, Pecunia 4: 49–79.
- Kass, G.V. (1980) “An exploratory technique for investigating large quantities of categorical data”, Applied Statistics 29: 119–127.
- Malchow, H. (1997) “The targeting revolution in political direct contact”, Campaigns & Elections 18: 51–66.
- Magidson, J. (1990) “CHAID, LOGIT and Log-linear Modelling”, Marketing Information Systems. Datapro Report IM11-130: 101–115.
- Marques, P.; Tippetts, A.; Voas, R.; Beirness, D. (2001) “Predicting repeat DUI offenses with the alcohol interlock recorder”, Accident Analysis and Prevention 33(5): 609–619.
- Mckenney, C. (2000) Women Chief Academic Officers of Public Community Colleges: Career Paths and Mobility Factors. Ed. Texas Tech University.
- Shannon, C.E.; Weaver, W. (1949/1963) The Mathematical Theory of Communication. University of Illinois Press, Urbana and Chicago.
- Simpson, E.H. (1951) “The interpretation of interaction in contingency tables”, Journal of the Royal Statistical Society, Series B 13: 238–241.
- Van Diepen, M.; Franses, P.H. (2006) “Evaluating chi-squared automatic interaction detection”, Information Systems 31(8): 814–831.