How to Estimate a Regression Model Subject to Parameter Constraints
In multiple linear regression, a statistical model is fitted to explain the variability in the dependent variable, Y, as a function of one or more independent variables, X. The model parameters are chosen to minimize the sum of squared differences between the observed Y values and the predicted ones. In a typical regression analysis, the parameters are subject to no other constraints.
What if you need to fit a regression model while ensuring that one or more parameters satisfy a set of constraints? This article explores one way to constrain model parameters during estimation: a custom loss function.
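The least squares criterion can be written compactly. The notation here is an assumption made for illustration and is not STATISTICA's own: n cases, p predictors x_ij, an intercept b_0, and coefficients b_j.

```latex
\min_{b_0, b_1, \dots, b_p} \; \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2,
\qquad
\hat{y}_i = b_0 + \sum_{j=1}^{p} b_j \, x_{ij}
```

For a model with no intercept, such as the one fitted later in this article, the b_0 term is dropped.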
For this example, we’ll use the Baseball.sta example spreadsheet in STATISTICA. Select the Home tab and, in the File group, click the Open arrow. Select Open Examples. In the Open a STATISTICA Data File dialog box, double-click on the Datasets folder, then browse to and open Baseball.sta.
In this example, the goal is to model the variable WIN as a function of RUNS, BA, and DP, with no intercept. Using General Linear Models, the regression parameter estimates for the sigma-restricted model are as follows.
These parameter estimates are subject only to the least squares criterion: they minimize the sum of squared errors for the chosen model.
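For readers who want to see an unconstrained, no-intercept fit outside STATISTICA, here is a minimal Python sketch. It is not the General Linear Models procedure. The data are made up for illustration and are not from Baseball.sta, and the helper names (solve_linear_system, fit_no_intercept) are hypothetical. The sketch solves the normal equations directly, which is adequate for a small, well-conditioned example.

```python
# Made-up data for illustration only; these numbers are NOT from Baseball.sta.
# Columns stand in for three predictors (hypothetical x1, x2, x3).
X = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
y = [1.5, 2.5, -3.5, -0.5]  # built as 2*x1 - 1*x2 + 0.5*x3, with no noise


def solve_linear_system(a, rhs):
    """Solve a @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_no_intercept(X, y):
    """Least squares coefficients for y = b1*x1 + ... + bp*xp (no intercept)."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


coefficients = fit_no_intercept(X, y)
print(coefficients)  # approximately [2.0, -1.0, 0.5]
```

Nothing in the least squares criterion prevents a coefficient from being negative, as the second coefficient in this toy data shows.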
Now, suppose it is necessary to compute the best regression function for the same variables, no intercept, while constraining all parameters in the model to be positive. How is this accomplished in STATISTICA? By adding penalties to the loss function for parameters outside the desired range.
Select the Statistics tab. In the Advanced/Multivariate group, click Advanced Models and select Nonlinear Estimation. In the Nonlinear Estimation Startup Panel, select User-specified regression, custom loss function.
Click the OK button.
In the User-Specified Regression, Custom Loss dialog box, click the Function to be estimated & loss function button to display the Estimated function and loss function dialog box. Enter the model to estimate. In this case, the model is:
'WIN' = B1 * 'RUNS' + B2 * 'BA' + B3 * 'DP'
The custom loss function is where the constraints are expressed. The default loss function is the squared difference between the observed and predicted values of the dependent variable (WIN in this case). Adding a penalty term to the loss function whenever a parameter falls outside the desired range effectively keeps that parameter within the desired limits. Here, a penalty should be added if any of the parameters B1, B2, or B3 is negative. This is achieved with the following custom loss function:
(OBS-PRED)**2 + (B1<0) * 1000 + (B2<0) * 1000 + (B3<0) * 1000
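Each logical expression, such as (B1<0), evaluates to 1 when it is true and 0 when it is false, so every negative parameter adds 1000 to the loss. Any solution with a negative parameter is therefore very heavily penalized. A parameter that is exactly zero is not penalized.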
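The structure of this loss function can be mirrored in a few lines of Python. This is a sketch only. The function names (case_loss, total_loss) are hypothetical, and it assumes the loss expression is evaluated once per case and summed over all cases.

```python
def case_loss(obs, pred, b, penalty=1000.0):
    """Loss for one case: squared error plus a fixed penalty per negative parameter."""
    # In Python, (bj < 0) is a bool that counts as 1 when true and 0 when false,
    # which mirrors terms such as (B1<0) * 1000 in the custom loss function.
    return (obs - pred) ** 2 + sum((bj < 0) * penalty for bj in b)


def total_loss(b, X, y, penalty=1000.0):
    """Sum the per-case loss over all cases (assumed aggregation)."""
    total = 0.0
    for row, obs in zip(X, y):
        pred = sum(bj * xj for bj, xj in zip(b, row))  # no-intercept model
        total += case_loss(obs, pred, b, penalty)
    return total


# One case with predictors (1, 1, 1) and an observed value of 3.
print(total_loss([1.0, 1.0, 1.0], [[1.0, 1.0, 1.0]], [3.0]))   # 0.0
print(total_loss([-1.0, 1.0, 1.0], [[1.0, 1.0, 1.0]], [3.0]))  # 1004.0
```

In the second call the prediction is 1, so the squared error is 4, and the single negative parameter adds 1000.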
Click OK in the Estimated function and loss function dialog box. Accept all other default settings, and click OK in the User-Specified Regression, Custom Loss Startup Panel, and then click OK in the Model Estimation dialog box to advance to the Results dialog box. Output the Summary spreadsheet to view the new parameter estimates in this constrained regression. As expected, all parameters are positive.
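To illustrate the effect of the penalty outside STATISTICA, the sketch below minimizes a penalized loss on made-up data (not Baseball.sta) with a simple compass search. The search routine is a stand-in chosen because it needs no derivatives and the penalty makes the loss discontinuous at zero. It is not the estimation method STATISTICA uses. All names are hypothetical, and the loss is assumed to be summed over cases.

```python
# Made-up data for illustration only; NOT from Baseball.sta.
X = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
y = [1.5, 2.5, -3.5, -0.5]  # built as 2*x1 - 1*x2 + 0.5*x3, with no noise


def total_loss(b, X, y, penalty=1000.0):
    """Squared error plus a penalty per negative parameter, summed over cases."""
    total = 0.0
    for row, obs in zip(X, y):
        pred = sum(bj * xj for bj, xj in zip(b, row))
        total += (obs - pred) ** 2 + sum((bj < 0) * penalty for bj in b)
    return total


def compass_search(loss, start, step=1.0, tol=1e-8):
    """Derivative-free minimizer: try +/- step on each coordinate, halve on failure."""
    b = list(start)
    best = loss(b)
    while step > tol:
        improved = False
        for j in range(len(b)):
            for delta in (step, -step):
                trial = b[:]
                trial[j] += delta
                value = loss(trial)
                if value < best:  # accept only strict improvements
                    b, best, improved = trial, value, True
        if not improved:
            step /= 2.0
    return b, best


# Start from a point that already satisfies the constraints.
constrained_b, constrained_loss = compass_search(
    lambda b: total_loss(b, X, y), start=[1.0, 1.0, 1.0]
)
print(constrained_b, constrained_loss)
```

In this toy data the unconstrained least squares estimate of the second coefficient is negative. Under the penalty, the search settles at zero for that coefficient, which is allowed because the condition (B<0) is false at exactly zero.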
Using this same strategy, you can estimate a regression equation subject to whatever parameter constraints you need.