-------------------------------------------------------------------------------
help for betafit
-------------------------------------------------------------------------------

Fitting a two-parameter beta distribution by maximum likelihood

betafit depvar [weight] [if] [in] [, {alphavar(varlist_a) betavar(varlist_b)} | {muvar(varlist_m[, noconstant]) phivar(varlist_p[, noconstant eform]) alternative} rpr robust cluster(clustervar) level(#) maximize_options]

by ...: may be used with betafit; see help by.

fweights and aweights are allowed; see help weights.

When using Stata version 11 or higher, alphavar(), betavar(), muvar(), and phivar() may contain factor variables; see fvvarlist.

Description

betafit fits a two-parameter beta distribution by maximum likelihood to the distribution of a variable depvar. depvar must lie strictly between 0 and 1; for example, it may be a proportion. Observations with a value of depvar less than or equal to 0, or greater than or equal to 1, are ignored.

betafit can still be used for a variable whose range extends beyond (0, 1), after that variable has been rescaled into the interval. Smithson and Verkuilen (2006) show several examples.
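A minimal sketch of such a rescaling, assuming a hypothetical variable score with known bounds 0 and 100 (simulated here). The first step maps score onto [0, 1]. Because values of exactly 0 or 1 would then be ignored by betafit, the second step applies the compression (y*(N - 1) + 0.5)/N proposed by Smithson and Verkuilen (2006), where N is the sample size.

```stata
* Sketch: rescale a hypothetical bounded variable into (0, 1)
clear
set obs 500
set seed 2006
* hypothetical score with assumed bounds 0 and 100 (simulated)
generate score = round(100 * rbeta(2, 3))

* step 1: map the known bounds [0, 100] onto [0, 1]
generate double score01 = (score - 0) / (100 - 0)

* step 2: pull values away from the endpoints, (y*(N - 1) + 0.5)/N
quietly count if !missing(score01)
generate double score_c = (score01 * (r(N) - 1) + 0.5) / r(N)

* fit the beta distribution to the compressed variable
betafit score_c
```

The bounds must be known (or chosen) in advance; the compression only matters when the rescaled variable actually reaches 0 or 1.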

betafit uses one of two parameterizations:

A conventional parameterization with shape parameters alpha > 0 and beta > 0 (e.g. Forbes et al. 2011 or Johnson et al. 1995) is used if only depvar is specified or if one or both of alphavar() and betavar() are specified. The conventional parameterization is especially useful when no covariates are present.

An alternative parameterization with location parameter mu and scale parameter phi (e.g. Ferrari and Cribari-Neto 2004, Paolino 2001, or Smithson and Verkuilen 2006) is used if one or both of muvar() and phivar() are specified or if the alternative option is specified. The alternative parameterization is especially useful when covariates are present. mu is reported on the logit scale so that it stays between 0 and 1, i.e. logit(mu) = muvar*e(b_mu), where muvar stands for the covariates in muvar() together with the constant. phi is reported on the logarithmic scale to ensure that it remains positive, i.e. ln(phi) = phivar*e(b_phi). To help interpretation, various types of marginal effects can be calculated with dbetafit.
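Without covariates, the two parameterizations describe the same distribution. The sketch below assumes the usual relations of Ferrari and Cribari-Neto (2004), mu = alpha/(alpha + beta) and phi = alpha + beta (so alpha = mu*phi and beta = (1 - mu)*phi). It simulates data with alpha = 2 and beta = 5, fits both parameterizations, and compares the saved results e(alpha), e(beta), e(mu) and e(phi).

```stata
* Sketch: compare the two parameterizations on simulated data
clear
set obs 2000
set seed 2004
* simulated beta variable with alpha = 2 and beta = 5
generate double y = rbeta(2, 5)

* conventional parameterization: saves e(alpha) and e(beta)
betafit y
scalar a_hat = e(alpha)
scalar b_hat = e(beta)

* alternative parameterization: saves e(mu) and e(phi)
betafit y, alternative
scalar mu_hat  = e(mu)
scalar phi_hat = e(phi)

* under the assumed relations these pairs should agree closely
display "mu:  " mu_hat  "  vs alpha/(alpha + beta) = " a_hat/(a_hat + b_hat)
display "phi: " phi_hat "  vs alpha + beta         = " a_hat + b_hat
```

Small discrepancies can remain because the two models are maximized separately.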

Options

alphavar() and betavar() allow the user to specify each parameter in the conventional parameterization as a function of the covariates specified in the respective variable list. A constant term is always included in each equation.

muvar() and phivar() allow the user to specify each parameter in the alternative parameterization as a function of the covariates specified in the respective variable list. The constant term in either equation can be suppressed by specifying the noconstant suboption. To display exponentiated coefficients for the phi equation, specify the eform suboption. As implied above, only one parameterization can be chosen: alphavar() and betavar() cannot be combined with muvar() and phivar().
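A sketch of the alternative parameterization with a covariate in each equation, on simulated data. The variables x and z and the true coefficients are hypothetical: the data are generated so that logit(mu) = -1 + .5*x and ln(phi) = 2 + .3*z, matching the links described above, and the shape parameters are assumed to be mu*phi and (1 - mu)*phi. The eform suboption changes only how the phi equation is displayed.

```stata
* Sketch: covariates in both equations of the alternative parameterization
clear
set obs 3000
set seed 1234
generate double x = rnormal()
generate double z = rnormal()
* hypothetical true model: logit(mu) = -1 + .5*x, ln(phi) = 2 + .3*z
generate double mu  = invlogit(-1 + .5*x)
generate double phi = exp(2 + .3*z)
* beta shape parameters implied by mu and phi (assumed relations)
generate double y = rbeta(mu*phi, (1 - mu)*phi)

* x enters the mu equation, z the phi equation; eform shows exp(b)
* for the phi equation only
betafit y, muvar(x) phivar(z, eform)

* each equation has one covariate plus a constant
display "length of e(b_mu):  " e(length_b_mu)
display "length of e(b_phi): " e(length_b_phi)
```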

alternative ensures that the alternative parameterization is used instead of the conventional parameterization when only depvar is specified. This option cannot be combined with alphavar() or betavar().

rpr reports the estimated coefficients transformed to relative proportion ratios, i.e. exp(b) rather than b. Standard errors and confidence intervals are similarly transformed. This option affects how results are displayed, not how they are estimated. The interpretation of these relative proportion ratios is discussed in detail in the examples below.

Relative proportion ratios can be useful when the model contains interaction terms, because in that case the marginal effects computed by dbetafit are no longer appropriate. The relative proportion ratio of an interaction term can still be interpreted, as the factor by which the relative proportion ratio of one variable in the interaction changes when the other variable increases by one unit; see Buis (2010).
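A sketch of that interpretation on simulated data, with hypothetical binary variables x and z and their product xz. Under the logit link the relative proportion mu/(1 - mu) equals exp(xb), so the relative proportion ratio of x is exp(b_x) when z = 0 and exp(b_x + b_xz) when z = 1; their ratio is exp(b_xz), the relative proportion ratio reported for the interaction term. The sketch assumes that e(b_mu) stores the untransformed coefficients in the order x, z, xz, _cons.

```stata
* Sketch: relative proportion ratio of an interaction term
clear
set obs 4000
set seed 87
generate byte x  = runiform() < .5
generate byte z  = runiform() < .5
generate byte xz = x*z
* hypothetical true model on the logit scale, with phi fixed at 20
generate double mu = invlogit(-2 + .4*x + .3*z - .2*xz)
generate double y  = rbeta(mu*20, (1 - mu)*20)

betafit y, muvar(x z xz) rpr

matrix b = e(b_mu)
* relative proportion ratio of x at each value of z
scalar rpr_x_z0 = exp(b[1,1])
scalar rpr_x_z1 = exp(b[1,1] + b[1,3])
display "rpr of x when z = 0:     " rpr_x_z0
display "rpr of x when z = 1:     " rpr_x_z1
display "ratio of the two:        " rpr_x_z1/rpr_x_z0
display "rpr of interaction term: " exp(b[1,3])
```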

robust specifies that the Huber/White/sandwich estimator of variance is to be used in place of the traditional calculation; see [U] 23.14 Obtaining robust variance estimates. robust combined with cluster() allows observations that are not independent within clusters (although they must be independent between clusters).

cluster(clustervar) specifies that the observations are independent across groups (clusters) but not necessarily within groups. clustervar specifies to which group each observation belongs, for example, cluster(personid) in data with repeated observations on individuals. See [U] 23.14 Obtaining robust variance estimates. Specifying cluster() implies robust.

level(#)specifies the confidence level, in percent, for the confidence intervals of the coefficients; see help level.

nologsuppresses the iteration log.

maximize_options control the maximization process; see help maximize. If you see many "(not concave)" messages in the log, the difficult option may help convergence.

Saved results

In addition to the usual results saved after ml, betafit also saves the following, as appropriate, if no covariates have been specified:

e(alpha) and e(beta) are the estimated parameters in the conventional parameterization.

e(mu) and e(phi) are the estimated parameters in the alternative parameterization.

The following results are saved whether or not covariates have been specified, as appropriate:

e(b_alpha) and e(b_beta) are row vectors containing the parameter estimates from each equation in the conventional parameterization.

e(b_mu) and e(b_phi) are row vectors containing the parameter estimates from each equation in the alternative parameterization.

e(length_b_alpha) and e(length_b_beta), or e(length_b_mu) and e(length_b_phi), contain the lengths of these vectors. If no covariates are specified in an equation, the corresponding vector has length 1 (the constant term); otherwise, its length is one plus the number of covariates (or just the number of covariates if noconstant is specified).
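A sketch of using these saved vectors to compute a fitted mean proportion by hand, with a hypothetical covariate x on simulated data. It assumes that the constant is stored last in e(b_mu), the usual Stata ordering; with one covariate, e(length_b_mu) is 2 and the fitted mu at x = 1 is invlogit(b_x*1 + b_cons).

```stata
* Sketch: a fitted mean computed from the saved vectors
clear
set obs 1000
set seed 59
* hypothetical covariate x; true logit(mu) = .5*x and phi = 10
generate double x = rnormal()
generate double m = invlogit(.5*x)
generate double y = rbeta(m*10, (1 - m)*10)

betafit y, muvar(x)

* e(b_mu) holds b_x and then the constant (assumed ordering)
matrix b = e(b_mu)
display "length of e(b_mu): " e(length_b_mu)

* fitted mean proportion at x = 1: logit(mu) = b_x*1 + b_cons
scalar mu_at_1 = invlogit(b[1,1]*1 + el(b, 1, colsof(b)))
display "fitted mu at x = 1: " mu_at_1
```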

Examples and interpretation of results

Marginal effects

To help with the interpretation of the results, use dbetafit to compute a set of marginal effects. Alternatively, mfx or, in Stata 11 and higher, margins can be used.

These marginal effects depend on the values of the explanatory (independent, x) variables, so each observation has its own marginal effects. Those displayed by dbetafit are for a (fictional) observation whose explanatory variables are fixed at their means or at the values specified in the at() option. So in the example below, the marginal effects refer to a city governed by a left-wing government (the left is neither a minority nor absent from the city government, so it must be the majority), with average house value and population density.

For this fictional city, the proportion spent on governing is 9.5% [E(governing|x)]. If that city were governed by a minority left government, this proportion would decrease by 0.8 percentage points; if it were governed only by parties on the right of the political spectrum, the proportion would increase by 0.9 percentage points (first table, column Min --> Max).

A 100,000 euro increase in average house value leads to a 2.5 percentage point increase in the proportion, and an extra 1000 persons per square kilometre leads to a 1.1 percentage point decrease in the proportion (second table, column MFX at x).

use http://fmwww.bc.edu/repec/bocode/c/citybudget.dta, clear

betafit governing, mu(minorityleft noleft houseval popdens)

dbetafit, at(minorityleft 0 noleft 0)
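The calculation behind a marginal effect at fixed x values can be sketched by hand. Under the logit link, E(y|x) = mu = invlogit(xb), so for a continuous covariate with coefficient b the marginal effect is b*mu*(1 - mu). The sketch evaluates this for houseval with minorityleft = noleft = 0 and the continuous covariates at their estimation-sample means. It assumes that e(b_mu) holds the coefficients in the order minorityleft, noleft, houseval, popdens, _cons; whether the result matches the MFX at x column exactly depends on dbetafit evaluating the same analytic derivative at the same point, which is an assumption here.

```stata
* Sketch: marginal effect of houseval at fixed covariate values, by hand
use http://fmwww.bc.edu/repec/bocode/c/citybudget.dta, clear
quietly betafit governing, muvar(minorityleft noleft houseval popdens)

* assumed order of e(b_mu): minorityleft noleft houseval popdens _cons
matrix b = e(b_mu)

* minorityleft = noleft = 0; houseval and popdens at their means
* in the estimation sample
summarize houseval if e(sample), meanonly
scalar m_hv = r(mean)
summarize popdens if e(sample), meanonly
scalar m_pd = r(mean)

* linear predictor and fitted mean, E(governing|x) = invlogit(xb)
scalar xb_at = b[1,3]*m_hv + b[1,4]*m_pd + b[1,5]
scalar mu_at = invlogit(xb_at)

* derivative of invlogit(xb) with respect to houseval
scalar me_hv = b[1,3]*mu_at*(1 - mu_at)
display "E(governing|x):              " mu_at
display "marginal effect of houseval: " me_hv
```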

Relative proportion ratios

Alternatively, betafit can display relative proportion ratios. This can be useful when the dependent variable is a proportion. Consider the example below, which models the proportion of a city budget spent on the city's own organization (governing). Here the relative proportion is the proportion spent on governing divided by one minus that proportion, that is, the proportion spent on governing divided by the proportion spent on useful stuff. As the total budget size drops out of this ratio, we can also say that this is the number of euros spent on governing per euro spent on productive stuff.

It is useful to see the baseline relative proportion, that is, the relative proportion when all covariates are equal to zero; this is the exponentiated constant. Since Stata by default suppresses the display of the exponentiated constant, we need to use a trick. We first create a variable baseline that contains all 1s, add it to the list of variables in the muvar() option, and at the same time add the noconstant suboption. The coefficient of baseline is now the baseline relative proportion. Because house value and population density are centered at their means before estimation, zero on these covariates corresponds to an average city.

In the example below, a city whose government has a left-leaning majority, with average population density and house value, can expect to spend 10 cents on governing per euro spent on productive stuff. This ratio decreases by 10% (i.e. (.90 - 1)*100% = -10%) if the city is governed by a minority left government, and it increases by 11% when no left parties are represented in the city government. A 100,000 euro increase in average house value leads to a 35% increase in the relative proportion, and an extra 1000 persons per square kilometre leads to an 11% decrease in the relative proportion.

use http://fmwww.bc.edu/repec/bocode/c/citybudget.dta, clear
gen byte baseline = 1
sum popdens if !missing(minorityleft, noleft, houseval, popdens), meanonly
gen cpopdens = popdens - r(mean)
sum houseval if !missing(minorityleft, noleft, houseval, popdens), meanonly
gen chouseval = houseval - r(mean)
betafit governing, ///
    mu(minorityleft noleft chouseval cpopdens baseline, nocons) rpr
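The baseline trick can be checked by refitting the same model with an ordinary constant: a constant and a variable equal to 1 enter the linear predictor identically, so the exponentiated constant should match the coefficient of baseline. The sketch also turns each coefficient into the percentage change used in the text, (exp(b) - 1)*100. It assumes that e(b_mu) stores untransformed coefficients in the order of the muvar() list, with the constant last when it is included.

```stata
* Sketch: check the -baseline- trick against an ordinary constant
use http://fmwww.bc.edu/repec/bocode/c/citybudget.dta, clear
generate byte baseline = 1
* center popdens and houseval as in the example above
foreach v in popdens houseval {
    summarize `v' if !missing(minorityleft, noleft, houseval, popdens), meanonly
    generate c`v' = `v' - r(mean)
}

* model with the trick: baseline replaces the constant
quietly betafit governing, muvar(minorityleft noleft chouseval cpopdens baseline, noconstant) rpr
matrix b1 = e(b_mu)
scalar base_trick = exp(b1[1,5])

* same model with an ordinary constant, stored last in e(b_mu) (assumed)
quietly betafit governing, muvar(minorityleft noleft chouseval cpopdens)
matrix b2 = e(b_mu)
scalar base_cons = exp(b2[1,5])
display "baseline via trick: " base_trick
display "exp(_cons):         " base_cons

* percentage change in the relative proportion for each covariate
forvalues j = 1/4 {
    display "covariate `j': " %6.1f (exp(b1[1,`j']) - 1)*100 "%"
}
```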

Note

Notice the difference between percentage point changes (in the section on marginal effects) and percentage changes (in the section on relative proportion ratios). If we start with a baseline value of 1% and change it by 1 percentage point, the result is 1 + 1 = 2%. If we change the baseline value by 1%, the result is 1 * 1.01 = 1.01%.
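A short worked example connects the two kinds of change, using the rounded figures quoted in the relative proportion ratio example (a baseline of .10 euros on governing per euro of productive spending, and a ratio of .90 for a minority left government). A relative proportion r is converted back to a proportion with p = r/(1 + r), and the resulting change is then expressed both in percentage points and in percent.

```stata
* Sketch: percent change in the relative proportion vs
* percentage point change in the proportion
* rounded figures from the relative proportion example
scalar r0 = .10
* after a 10% decrease in the relative proportion
scalar r1 = r0 * .90
* convert relative proportions back to proportions, p = r/(1 + r)
scalar p0 = r0/(1 + r0)
scalar p1 = r1/(1 + r1)
display "proportion before: " %5.2f 100*p0 "%"
display "proportion after:  " %5.2f 100*p1 "%"
display "change in percentage points: " %5.2f 100*(p1 - p0)
display "change in percent:           " %5.1f 100*(p1/p0 - 1) "%"
```

Here a 10% decrease in the relative proportion corresponds to a change of roughly -0.83 percentage points in the proportion, or roughly -9.2% of the baseline proportion.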

Authors

Maarten L. Buis, Universitaet Tuebingen, maarten.buis@uni-tuebingen.de

Nicholas J. Cox, Durham University, n.j.cox@durham.ac.uk

Stephen P. Jenkins, The London School of Economics and Political Science, S.Jenkins@lse.ac.uk

References

Buis, M.L. 2010. Stata tip 87: Interpretation of interactions in non-linear models. The Stata Journal 10(2): 305-308.

Forbes, C., Evans, M., Hastings, N. and Peacock, B. 2011. Statistical distributions. Hoboken, NJ: John Wiley.

Ferrari, S.L.P. and Cribari-Neto, F. 2004. Beta regression for modelling rates and proportions. Journal of Applied Statistics 31(7): 799-815.

Johnson, N.L., Kotz, S. and Balakrishnan, N. 1995. Continuous univariate distributions: Volume 2. New York: John Wiley.

MacKay, D.J.C. 2003. Information theory, inference, and learning algorithms. Cambridge: Cambridge University Press (see p.316). http://www.inference.phy.cam.ac.uk/itprnn/book.pdf

Paolino, P. 2001. Maximum likelihood estimation of models with beta-distributed dependent variables. Political Analysis 9(4): 325-346. http://polmeth.wustl.edu/polanalysis/vol/9/WV008-Paolino.pdf

Smithson, M. and Verkuilen, J. 2006. A better lemon squeezer? Maximum likelihood regression with beta-distributed dependent variables. Psychological Methods 11(1): 54-71.

Also see

Online: help for betafit postestimation,