help for ldecomp
-------------------------------------------------------------------------------

Title

ldecomp -- Decomposes total effects in logistic regression into direct and indirect effects.

Syntax

ldecomp depvar [control_var1 [...]] [if] [in] [weight], direct(varname) indirect(varlist) [at(control_var1 # [; control_var2 #] [...]) obspr predpr predodds or rindirect normal range(# #) nip(#) interactions nolegend nodecomp nobootstrap bootstrap_options]

fweights, pweights, and iweights are allowed when the nobootstrap option is specified.

Description

ldecomp decomposes the total effect of a categorical variable in logistic regression into direct and indirect effects using a method by Erikson et al. (2005) and a generalization of this method by Buis (2008). Say our dependent variable is whether or not someone attends college, and we are interested in decomposing the total effect of class background (high or low). We suspect that part of the total effect can be explained by differences between the classes in performance during high school: higher class children do better at high school, and children that do better at high school are more likely to attend college. This is the indirect effect. The direct effect of class is the effect while controlling for performance: higher class children are more likely to attend college even if they have the same performance during high school.

There are two ways in which one can see the impact of differences in the distribution of performance across classes, and thus get the indirect effect. One can fix the logistic regression coefficients to be equal to the coefficients of the lower class and compare the proportion attending college in the group with a distribution of performance equal to that of the lower class with the proportion attending college in the group with a distribution of performance equal to that of the higher class. This way the only difference between the two groups is the distribution of performance. Call this method 1. Alternatively, one can make the same comparison but fix the logistic regression coefficients to be equal to the coefficients of the higher class. Call this method 2.

Similarly, one can control for performance, and thus get the direct effect, by fixing the distribution of performance to be equal to the distribution of performance of the higher class (method 1) or the lower class (method 2). Once these distributions are fixed one can compare the proportion attending college of the group with the logistic regression coefficients of the lower class with the proportion attending college of the group with the logistic regression coefficients of the higher class.

If these direct and indirect effects are represented as odds ratios, then the total effect is the product of the direct and indirect effects, as can be seen in equations 1 (method 1) and 2 (method 2). O represents the odds of attending college, the first subscript the distribution of performance, and the second subscript the logistic regression coefficients.

$$\frac{O_{hl}}{O_{ll}} \times \frac{O_{hh}}{O_{hl}} = \frac{O_{hh}}{O_{ll}} \qquad (1)$$

$$\frac{O_{hh}}{O_{lh}} \times \frac{O_{lh}}{O_{ll}} = \frac{O_{hh}}{O_{ll}} \qquad (2)$$

If these direct and indirect effects are represented as log odds ratios, then the total effect is the sum of the direct and indirect effects, as can be seen in equations 1' (method 1) and 2' (method 2).

$$\ln\left(\frac{O_{hl}}{O_{ll}}\right) + \ln\left(\frac{O_{hh}}{O_{hl}}\right) = \ln\left(\frac{O_{hh}}{O_{ll}}\right) \qquad (1')$$

$$\ln\left(\frac{O_{hh}}{O_{lh}}\right) + \ln\left(\frac{O_{lh}}{O_{ll}}\right) = \ln\left(\frac{O_{hh}}{O_{ll}}\right) \qquad (2')$$
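As a quick numerical check of equations (1), (2), (1') and (2'), the following Python sketch plugs in four made-up proportions and confirms that both methods reproduce the same total effect. The proportions are assumed values, not results from any data set, and the names `odds`, `p` and `O` are illustrative only; they are not part of ldecomp.

```python
import math

def odds(p):
    """Convert a proportion into odds."""
    return p / (1.0 - p)

# Hypothetical proportions attending college (assumed values, not real data).
# First letter: distribution of performance; second letter: coefficients.
p = {"ll": 0.20, "hl": 0.30, "lh": 0.35, "hh": 0.50}
O = {group: odds(prop) for group, prop in p.items()}

total = O["hh"] / O["ll"]

# Method 1: the indirect effect uses the lower-class coefficients,
# the direct effect uses the higher-class distribution.
indirect1 = O["hl"] / O["ll"]
direct1 = O["hh"] / O["hl"]

# Method 2: the indirect effect uses the higher-class coefficients,
# the direct effect uses the lower-class distribution.
indirect2 = O["hh"] / O["lh"]
direct2 = O["lh"] / O["ll"]

# Odds ratios multiply (equations 1 and 2) ...
assert math.isclose(indirect1 * direct1, total)
assert math.isclose(indirect2 * direct2, total)
# ... and log odds ratios add (equations 1' and 2').
assert math.isclose(math.log(indirect1) + math.log(direct1), math.log(total))
assert math.isclose(math.log(indirect2) + math.log(direct2), math.log(total))
```

The two methods agree on the total effect by construction, but they generally split it differently between the direct and the indirect part.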

By default ldecomp shows the decomposition in terms of log odds ratios for both method 1 and 2, and computes standard errors using bootstrap. By specifying the rindirect option, ldecomp will also show estimates of the size of the indirect effect relative to the total effect, using method 1, method 2, and the average of the two, together with their standard errors, also computed using the bootstrap.

In order for this decomposition to work one needs a whole set of odds of attending college, both for groups that actually exist in the data, like the group with the distribution of performance and the logistic regression coefficients of the lower class, and for groups that don't exist in the data, like the group with the distribution of performance of the lower class and the logistic regression coefficients of the higher class. All these odds are computed by transforming the average predicted probability of these actual and counterfactual groups. Erikson et al. (2005) and Buis (2008) differ in the way these average probabilities are computed: Erikson et al. assume that performance is normally distributed and numerically integrate over this distribution, while Buis makes no assumption about this distribution and simply computes the predicted probabilities and then takes their mean, effectively integrating over the empirical distribution of performance instead of over a normal distribution.
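The averaging over the empirical distribution described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not ldecomp's actual code: the performance scores and the logit coefficients are invented, the coefficients are taken as given rather than estimated, and the helper names `invlogit`, `average_probability` and `odds` are hypothetical.

```python
import math

def invlogit(xb):
    """Logistic function: turns a linear predictor into a probability."""
    return 1.0 / (1.0 + math.exp(-xb))

def average_probability(performance, const, slope):
    """Mean predicted probability over an observed set of performance values."""
    probs = [invlogit(const + slope * x) for x in performance]
    return sum(probs) / len(probs)

def odds(p):
    """Convert a proportion into odds."""
    return p / (1.0 - p)

# Hypothetical performance scores per class (assumed, not real data).
performance = {"l": [-1.0, -0.5, 0.0, 0.5], "h": [0.0, 0.5, 1.0, 1.5]}
# Hypothetical logit coefficients per class: (constant, slope on performance).
coefs = {"l": (-1.0, 1.0), "h": (-0.2, 1.0)}

# O[d + c]: odds for the group with the performance distribution of class d
# and the coefficients of class c; "lh" and "hl" are the counterfactual groups.
O = {}
for d in "lh":
    for c in "lh":
        const, slope = coefs[c]
        O[d + c] = odds(average_probability(performance[d], const, slope))

indirect1, direct1 = O["hl"] / O["ll"], O["hh"] / O["hl"]  # method 1
indirect2, direct2 = O["hh"] / O["lh"], O["lh"] / O["ll"]  # method 2
```

Because each average runs over the observed scores themselves, no distributional assumption about performance is needed, and more than one mediating variable could be handled by adding terms to the linear predictor.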

Unlike Erikson et al. (2005), ldecomp also allows one to add control variables. While computing this decomposition, these are by default fixed at their mean value, unless a different value is specified in the at() option.

Options

direct(varname) specifies the variable whose total effect we want to decompose into a direct and an indirect effect. This has to be a categorical variable; each value of varname is assumed to represent a group.

indirect(varlist) specifies the variable(s) through which the indirect effect occurs. By default multiple variables are allowed and these can be from any distribution. If the normal option is specified only one variable can be entered, and this variable is assumed to be normally distributed.

at(control_var1 # [; control_var2 #] [...]) specifies the values at which the control variables are to be fixed. The default is to fix each control variable at its mean value.
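The default behaviour of at() can be pictured with a small Python sketch: every control variable is held at its mean unless a value is supplied for it. The function `fix_controls` and the variable names `age` and `female` are hypothetical and serve only to illustrate the rule; they are not taken from ldecomp or from any data set.

```python
from statistics import mean

def fix_controls(controls, at=None):
    """Return the value at which each control variable is held constant.

    controls: dict mapping a control variable name to its observed values.
    at: optional dict of user-chosen values, mirroring the at() option.
    """
    at = at or {}
    unknown = set(at) - set(controls)
    if unknown:
        raise ValueError(f"not a control variable: {sorted(unknown)}")
    # Use the supplied value when there is one, otherwise the mean.
    return {name: at.get(name, mean(values)) for name, values in controls.items()}

# Hypothetical control variables (assumed values, not real data).
controls = {"age": [20, 30, 40], "female": [0, 1, 1]}
fixed = fix_controls(controls, at={"female": 1})
```

Here `age` falls back to its mean while `female` is held at the requested value.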

obspr specifies that a table of the observed proportions is to be displayed.

predpr specifies that a table of predicted and counterfactual proportions is to be displayed. If the normal option is not specified, the diagonal elements of this table will be exactly the same as the observed proportions.

predodds specifies that a table of predicted and counterfactual odds is to be displayed.

or specifies that the decomposition is displayed in terms of odds ratios instead of log odds ratios.

rindirect specifies that the relative contributions of the indirect effects to the total effect (in terms of log odds ratios) are to be displayed.
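A minimal Python sketch of what a relative indirect effect looks like, assuming it is the log odds ratio of the indirect effect divided by the log odds ratio of the total effect. The four odds are invented for illustration, and the variable names are hypothetical.

```python
import math

# Hypothetical odds for the actual and counterfactual groups (assumed values).
# First letter: distribution of performance; second letter: coefficients.
O = {"ll": 0.25, "hl": 0.40, "lh": 0.50, "hh": 1.00}

log_total = math.log(O["hh"] / O["ll"])

# Share of the total effect (in log odds ratios) that runs through performance.
rel1 = math.log(O["hl"] / O["ll"]) / log_total  # method 1
rel2 = math.log(O["hh"] / O["lh"]) / log_total  # method 2
rel_avg = (rel1 + rel2) / 2.0                   # average of the two methods
```

The two methods need not give the same share, which is why their average is reported as well.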

normal specifies that the predicted and counterfactual proportions are to be computed according to the method specified by Erikson et al. (2005). This means that the variable specified in indirect() is assumed to be normally distributed. This option was primarily added for compatibility with Erikson et al. (2005). By default the method by Buis (2008) is used, which allows multiple variables to be specified in indirect() and makes no assumptions about the distribution of these variables.

range(# #) specifies the range over which the numerical integration of varlist is to be performed. The default is the minimum of the variable in varlist minus 10% of the range of varlist and the maximum of varlist plus 10% of the range of varlist. This option can only be specified with the normal option because in the default method there is no need for numerical integration.

nip(#) specifies the number of integration points used in the numerical integration of the variable in varlist. The default is 1000. This option can only be specified with the normal option because in the default method there is no need for numerical integration.
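To make the roles of range() and nip() concrete, here is a Python sketch of integrating a predicted probability over a normal density on a grid. It is only a sketch under assumptions: the trapezoid rule, the renormalisation for the truncated tails, and every function name are choices made for this illustration and are not documented behaviour of ldecomp; the scores and coefficients are invented.

```python
import math
from statistics import mean, stdev

def invlogit(xb):
    """Logistic function: turns a linear predictor into a probability."""
    return 1.0 / (1.0 + math.exp(-xb))

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and sd sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def default_range(values):
    """Minimum minus 10% of the range, maximum plus 10% of the range."""
    span = max(values) - min(values)
    return min(values) - 0.1 * span, max(values) + 0.1 * span

def trapezoid(ys, step):
    """Trapezoid rule on an equally spaced grid."""
    return step * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

def average_probability_normal(values, const, slope, nip=1000, limits=None):
    """Integrate the predicted probability over a fitted normal density."""
    lo, hi = limits if limits is not None else default_range(values)
    mu, sigma = mean(values), stdev(values)
    step = (hi - lo) / (nip - 1)
    grid = [lo + i * step for i in range(nip)]
    dens = [normal_pdf(x, mu, sigma) for x in grid]
    weighted = [invlogit(const + slope * x) * d for x, d in zip(grid, dens)]
    # Dividing by the integrated density corrects for the truncated tails
    # (an assumption of this sketch).
    return trapezoid(weighted, step) / trapezoid(dens, step)

scores = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical performance scores
p_normal = average_probability_normal(scores, const=-0.5, slope=1.0)
```

A wider range or more integration points changes the result only slightly once the grid covers most of the density, which is the usual reason to leave both at their defaults.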

interactions specifies that interactions between the categories of the variable specified in direct() and the variable(s) specified in indirect() are to be included. In other words, the effects of the variables specified in indirect() on the dependent variable are allowed to differ for each category of the variable specified in direct(). This option was primarily added for compatibility with Erikson et al. (2005).

nolegend suppresses a legend that is by default displayed at the bottom of the main table.

nodecomp prevents ldecomp from displaying the table of decompositions, which can be useful in combination with the obspr, predpr, and/or predodds options.

nobootstrap prevents ldecomp from using bootstrap to calculate standard errors.

bootstrap_options The following options of bootstrap are allowed: reps(#), strata(varlist), size(#), cluster(varlist), idcluster(newvar), saving(filename[, suboptions]), bca, mse, level(#), nodots, seed(#), and jackknifeopts(jkopts).

Example

. use wisconsin.dta, clear
. ldecomp college, direct(ocf57) indirect(hsrankq)

Author

Maarten L. Buis
Universitaet Tuebingen
Institut fuer Soziologie
maarten.buis@uni-tuebingen.de

References

Buis, M.L. (2009). Direct and indirect effects in a logit model. http://www.maartenbuis.nl/wp/ldecomp.html

Erikson, R., J.H. Goldthorpe, M. Jackson, M. Yaish, D.R. Cox (2005). On class differentials in educational attainment. Proceedings of the National Academy of Sciences, 102(27): 9730-9732.

Jackson, M., R. Erikson, J. Goldthorpe, M. Yaish (2007). Primary and secondary effects in class differentials in educational attainment: The transition to A-level courses in England and Wales. Acta Sociologica, 50(3): 211-229.

Also see

Online: logit, bootstrap, bootstrap_postestimation, jackknife

If installed: fairlie, gdecomp