Fitting a Dirichlet distribution by maximum likelihood
dirifit depvarlist [weight] [if exp] [in range] [, { alphavar(varlist_a) alpha1|2|3|...|k(varlist_a_j) } | { muvar(varlist_m) phivar(varlist_p) mu1|2|3|...|k(varlist_m_j) baseoutcome(var) alternative } robust cluster(clustervar) level(#) maximize_options ]
by ... : may be used with dirifit; see help by.
fweights and aweights are allowed; see help weights.
Description
dirifit fits by maximum likelihood a Dirichlet distribution to a set of variables depvarlist. Each variable in depvarlist ranges between 0 and 1 and all variables in depvarlist must, for each observation, add up to 1: for example, they may be proportions.
Note that an observation is ignored if one or more of the dependent variables has a value less than or equal to zero or greater than or equal to one, or if the dependent variables do not add up to one.
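Before fitting, it can help to see how many observations would be affected by these rules. The following is a minimal sketch on toy data: the variable names y1, y2, y3, outside, total, and badsum are hypothetical, and the tolerance of 1e-6 is an assumption, since the tolerance dirifit itself applies is not stated here.

```stata
* Toy data; the variable names y1-y3 are hypothetical
clear
input y1 y2 y3
0.2 0.3 0.5
0.0 0.4 0.6
0.5 0.3 0.3
0.1 0.1 0.8
end

* Flag rows with a share outside the open interval (0,1) or a missing value
generate byte outside = min(y1, y2, y3) <= 0 | max(y1, y2, y3) >= 1 | missing(y1, y2, y3)

* Flag rows whose shares do not add up to 1 (the tolerance is an assumption)
egen double total = rowtotal(y1 y2 y3)
generate byte badsum = abs(total - 1) > 1e-6

* Number of rows that violate at least one condition
count if outside | badsum
```

A tolerance is used rather than an exact comparison because proportions stored as float rarely sum to exactly 1.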
dirifit uses one of two parameterizations:
A conventional parameterization with shape parameters alpha_j > 0 (one for each variable in depvarlist) (e.g. Evans et al. 2000 or Kotz et al. 2000) will be used if only depvarlist is specified or if one or more of alphavar() and alpha1|2|3|...|k() is specified. alpha_j is reported on the logarithmic scale to ensure that it remains positive. The conventional parameterization is especially useful when no covariates are present.
An alternative parameterization with location parameters mu_j (one for each variable in depvarlist except the baseoutcome) and scale parameter phi will be used if one or more of muvar(), mu1|2|3|...|k(), baseoutcome(), and phivar() is specified or if the alternative option is specified. The alternative parameterization is especially useful when covariates are present. mu_j are reported on the multinomial logit scale so that they stay between 0 and 1, and add up to one. In order to help interpretation, various types of marginal effects can be calculated with ddirifit. phi is reported on the logarithmic scale to ensure that it remains positive. This parameterization is analogous to the parameterization proposed by Paolino (2001), Ferrari and Cribari-Neto (2004), and Smithson and Verkuilen (2006) for the beta distribution.
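For reference, the two parameterizations can be written out as follows. This is a sketch based on the standard Dirichlet density. The mapping between the parameterizations is assumed by analogy with the beta-regression parameterization cited above, and the symbols x, z, beta_j, gamma, and b (the baseoutcome) are notation introduced here, not taken from dirifit itself.

```latex
\begin{align*}
% Conventional parameterization: density and log link
f(y_1,\dots,y_k \mid \alpha_1,\dots,\alpha_k)
  &= \frac{\Gamma\!\left(\sum_{j=1}^{k}\alpha_j\right)}{\prod_{j=1}^{k}\Gamma(\alpha_j)}
     \prod_{j=1}^{k} y_j^{\alpha_j-1},
  & \ln\alpha_j &= x\beta_j \\
% Alternative parameterization (assumed mapping)
\alpha_j &= \mu_j\,\phi,
  & \phi &= \sum_{j=1}^{k}\alpha_j \\
% Multinomial logit link for mu_j, log link for phi
\mu_j &= \frac{\exp(x\beta_j)}{\sum_{m=1}^{k}\exp(x\beta_m)}, \quad \beta_b = 0,
  & \ln\phi &= z\gamma
\end{align*}
```

Under this mapping the mu_j lie between 0 and 1 and add up to one by construction, while phi governs how tightly the proportions concentrate around the mu_j.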
Options
alphavar() and alpha1|2|3|...|k() allow the user to specify each parameter in the conventional parameterization as a function of the covariates specified in the variable list. The covariates in alphavar() are common to all parameters, while alpha1|2|3|...|k() allow the user to specify (additional) covariates for the first, second, third, ..., kth parameter. The order of the parameters is determined by the order of depvarlist. A constant term is always included in each equation.
muvar(), mu1|2|3|...|k(), and phivar() allow the user to specify each parameter in the alternative parameterization as a function of the covariates specified in the respective variable list. The covariates in muvar() are common to all mu parameters, while mu1|2|3|...|k() allow the user to specify (additional) covariates for the first, second, third, ..., kth mu parameter. The order of the parameters is determined by the order of depvarlist. A constant term is always included in each equation.
As implied above, just one parameterization should be chosen.
alternative ensures that the alternative parameterization is used instead of the conventional parameterization if only depvarlist is specified. This option cannot be used with alphavar() or alpha1|2|3|...|k().
baseoutcome(var) specifies the variable in depvarlist that will be the baseoutcome. The default is the first variable of depvarlist. This option cannot be used with alphavar() or alpha1|2|3|...|k().
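The options above can be combined as in the following sketch, which reuses the dataset and variable names from the Examples section. The particular choice of covariates and of safety as the baseoutcome is arbitrary and only for illustration. The commands have not been run here and assume that dirifit is installed.

```stata
use http://fmwww.bc.edu/repec/bocode/c/citybudget.dta, clear

* Conventional parameterization, no covariates
dirifit governing safety education recreation social urbanplanning

* Same model, but forcing the alternative parameterization
dirifit governing safety education recreation social urbanplanning, alternative

* Alternative parameterization: covariates common to all mu equations,
* with safety instead of the first variable as the baseoutcome
dirifit governing safety education recreation social urbanplanning, ///
    muvar(houseval popdens) baseoutcome(safety)

* Conventional parameterization: common covariates in every equation,
* plus one additional covariate in the first equation only
dirifit governing safety education recreation social urbanplanning, ///
    alphavar(houseval popdens) alpha1(noleft)
```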
robust specifies that the Huber/White/sandwich estimator of variance is to be used in place of the traditional calculation; see [U] 20.14 Obtaining robust variance estimates ([U] 23.14 in version 8). robust combined with cluster() allows for observations that are not independent within clusters (although they must be independent between clusters).
cluster(clustervar) specifies that the observations are independent across groups (clusters) but not necessarily within groups. clustervar specifies to which group each observation belongs; e.g., cluster(personid) in data with repeated observations on individuals. See [U] 20.14 Obtaining robust variance estimates ([U] 23.14 in version 8). Specifying cluster() implies robust.
level(#) specifies the confidence level, in percent, for the confidence intervals of the coefficients; see help level.
nolog suppresses the iteration log.
maximize_options control the maximization process; see help maximize. If you are seeing many "(not concave)" messages in the log, using the difficult option may help convergence.
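As a sketch of how the variance, reporting, and maximization options fit together, the following combines cluster(), level(), difficult, and nolog in one call. The grouping variable grp is artificial: it is created here only so that the sketch runs and has no substantive meaning in the citybudget data. The commands have not been run here.

```stata
use http://fmwww.bc.edu/repec/bocode/c/citybudget.dta, clear

* Artificial grouping variable (an assumption, for illustration only)
generate int grp = ceil(_n/10)

* Cluster-robust standard errors, 90% confidence intervals,
* the difficult maximization option, and no iteration log
dirifit governing safety education recreation social urbanplanning, ///
    muvar(houseval popdens) cluster(grp) level(90) difficult nolog
```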
Saved results
In addition to the usual results saved after ml, dirifit also saves the following, as appropriate:
e(b_alpha1) to e(b_alphak) (where k is the number of variables in depvarlist) are row vectors containing the parameter estimates from each equation in the conventional parameterization.
e(b_phi) and e(b_mu1) to e(b_muk) (where k is the number of variables in depvarlist) are row vectors containing the parameter estimates from each equation in the alternative parameterization. No vector is saved for the baseoutcome.
e(length_b_alpha1) to e(length_b_alphak) or e(length_b_mu1) to e(length_b_muk) and e(length_b_phi) contain the lengths of these vectors. If no covariates are specified in an equation, the corresponding vector has length equal to 1 (the constant term); otherwise, the length is one plus the number of covariates.
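The saved results can be used, for example, to recover a shape parameter on its natural scale. The sketch below assumes that the saved vectors hold the coefficients on the logarithmic scale on which they are reported, so that exponentiating the constant of a covariate-free equation gives alpha_1. The matrix name b1 is hypothetical, and the commands have not been run here.

```stata
use http://fmwww.bc.edu/repec/bocode/c/citybudget.dta, clear

* Conventional parameterization without covariates
dirifit governing safety education recreation social urbanplanning

* With no covariates the vector holds only the constant term
display e(length_b_alpha1)

* Copy the first equation's estimates and back-transform the constant
matrix b1 = e(b_alpha1)
display "alpha_1 = " exp(b1[1,1])
```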
Examples
use http://fmwww.bc.edu/repec/bocode/c/citybudget.dta, clear
dirifit governing safety education recreation social urbanplanning, ///
    mu(minorityleft noleft houseval popdens)
ddirifit, at(minorityleft 0 noleft 0 )
Authors
Maarten L. Buis, Universitaet Tuebingen maarten.buis@uni-tuebingen.de
Nicholas J. Cox, Durham University n.j.cox@durham.ac.uk
Stephen P. Jenkins, University of Essex stephenj@essex.ac.uk
Acknowledgement
Philipp Rehm provided a bug report.
References
Evans, M., Hastings, N. and Peacock, B. 2000. Statistical distributions. New York: John Wiley.
Ferrari, S.L.P. and Cribari-Neto, F. 2004. Beta regression for modelling rates and proportions. Journal of Applied Statistics 31(7): 799-815.
Kotz, S., Balakrishnan, N., Johnson, N.L. 2000. Continuous multivariate distributions: Volume 1. New York: John Wiley.
MacKay, D.J.C. 2003. Information theory, inference, and learning algorithms. Cambridge: Cambridge University Press (see pp.316-318). http://www.inference.phy.cam.ac.uk/itprnn/book.pdf
Paolino, P. 2001. Maximum likelihood estimation of models with beta-distributed dependent variables. Political Analysis 9(4): 325-346. http://polmeth.wustl.edu/polanalysis/vol/9/WV008-Paolino.pdf
Smithson, M. and Verkuilen, J. 2006. A better lemon squeezer? Maximum likelihood regression with beta-distributed dependent variables. Psychological Methods 11(1): 54-71.
Also see
Online: help for dirifit_postestimation, betafit, fmlogit (if installed)