# Root Cause Analysis

The term root cause analysis is commonly used in manufacturing to summarize the activities involved in determining the variables or factors that impact the final quality or yield of the respective processes. For example, if a particular pattern of defects emerges in the manufacture of silicon chips, engineers will pursue various methods and strategies for root cause analysis to determine the ultimate causes of those patterns of quality problems.

One method of root cause analysis is variable screening (also called feature selection), where analytic tools are used to find the variables most strongly associated with the quality issue. Interactions between these variables can also be part of the root cause. The analysis yields a list of the variables (and interaction terms) that best predict the quality issue. These variables can then be explored further to gain additional insight.

## Design Terms

The columns of the design matrix (design terms) for interaction effects are created as follows:

Continuous-by-continuous predictor interactions – A single column is created in the design matrix for each product of the continuous predictor columns.

Continuous-by-categorical predictor interactions – First, the number of unique values (classes), k, in the categorical predictor is determined, and one column is generated for each unique value. In column j of the k columns, a 1 is entered if the respective observation belongs to class j, and a 0 otherwise. Each column of 0/1 indicator codes is then multiplied by the continuous predictor variable. Hence, for continuous-by-categorical predictor interactions, the program generates as many columns in the design matrix as there are unique values in the categorical predictor.
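The indicator-times-continuous coding described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `continuous_by_categorical` and the example variables `temperature` and `machine` are hypothetical, and the classes are ordered alphabetically purely for reproducibility.

```python
def continuous_by_categorical(x, g):
    """Return (classes, columns) with one interaction column per class of g.

    x: continuous predictor values; g: categorical predictor values.
    Hypothetical helper for illustration only.
    """
    classes = sorted(set(g))  # the k unique values (sorted for a stable order)
    columns = []
    for cls in classes:
        # 0/1 indicator for membership in this class, times the continuous value
        columns.append([xi * (1 if gi == cls else 0) for xi, gi in zip(x, g)])
    return classes, columns


# Made-up example data: four observations, three machine classes
temperature = [2.0, 3.5, 1.0, 4.0]
machine = ["A", "B", "A", "C"]

classes, columns = continuous_by_categorical(temperature, machine)
for cls, col in zip(classes, columns):
    print(cls, col)
```

With three classes in `machine`, the sketch produces three design columns, each holding the continuous value for observations in that class and zero elsewhere.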

Categorical-by-categorical predictor interactions – The unique combinations of groups or classes are enumerated into a single column in the design matrix. For example, the interaction between two categorical predictors with two unique values (classes) each results in a single column with 2 × 2 = 4 distinct values. Note that these coded columns in the design matrix are technically “confounded” with the main effects. In other words, if one of the categorical predictors is strongly related to the dependent variable in the analysis, some of its interactions with other categorical predictors are likely to show strong relationships with the dependent variable as well.
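As a rough sketch of this enumeration, the snippet below maps each combination of two categorical predictors to a single integer code. The helper `categorical_by_categorical` and the variables `shift` and `line` are hypothetical; the sketch also assumes that only combinations actually observed in the data are enumerated, which the description above does not specify.

```python
def categorical_by_categorical(a, b):
    """Code each observed (a, b) combination as one integer label.

    Hypothetical helper; enumerates only combinations present in the data.
    """
    combos = sorted(set(zip(a, b)))  # unique class combinations
    code = {combo: i for i, combo in enumerate(combos)}
    return combos, [code[pair] for pair in zip(a, b)]


# Made-up example: two predictors with two classes each
shift = ["day", "night", "day", "night", "day"]
line = ["L1", "L1", "L2", "L2", "L1"]

combos, column = categorical_by_categorical(shift, line)
print(combos)
print(column)
```

Because the single coded column carries all of the information in both predictors, it also carries their main effects, which is the confounding noted above.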

Higher-order interactions (e.g., three-way interactions) are created accordingly, i.e., they are generated as the products of continuous and categorical predictors following the rules outlined above. For example, a three-way interaction column would be generated by multiplying a two-way interaction with another effect.
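For continuous predictors, building a three-way term from a two-way term is just one more elementwise product. The sketch below illustrates this with a hypothetical helper `product_column` and made-up predictor values; it is not the program's actual implementation.

```python
def product_column(*columns):
    """Elementwise product of any number of equal-length columns."""
    result = [1.0] * len(columns[0])
    for col in columns:
        result = [r * c for r, c in zip(result, col)]
    return result


# Made-up continuous predictors (three observations)
pressure = [1.0, 2.0, 3.0]
temperature = [2.0, 0.5, 1.0]
humidity = [4.0, 4.0, 2.0]

two_way = product_column(pressure, temperature)   # pressure x temperature
three_way = product_column(two_way, humidity)     # two-way term x humidity
print(two_way)
print(three_way)
```

Multiplying the two-way column by the third predictor gives the same result as multiplying all three predictors at once, so the order in which the term is built up does not matter.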

## Variable Screening

Variable screening options let you screen predictor variables for regression and classification problems and choose the method used to identify the important predictors. In general, each method computes its own predictor statistics, and the predictors are then ranked by that method-specific measure of importance. The following methods may be appropriate:

Linear model. A linear fit model using stepwise selection of predictors is a simple approach to the regression problem. Predictor importance is computed by ranking the p-values for each predictor effect. For tied p-values, the rankings are based on the ranking of the F-values. For classification tasks, a stepwise linear discriminant function analysis can be used. Predictor importance is computed by ranking the values of the Wilks’ lambda statistics for each predictor.
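The p-value ranking with F-value tie-breaking can be sketched as a two-key sort. The statistics below are made-up illustrative numbers, `rank_predictors` is a hypothetical helper, and the sketch assumes that among tied p-values the larger F-value ranks as more important.

```python
def rank_predictors(stats):
    """Order predictor names from most to least important.

    stats maps name -> (p_value, f_value). Smaller p-values rank first;
    ties are broken by larger F-value (an assumption of this sketch).
    """
    return sorted(stats, key=lambda name: (stats[name][0], -stats[name][1]))


# Made-up (p-value, F-value) pairs for illustration only
stats = {
    "temperature": (0.001, 15.2),
    "pressure": (0.001, 22.7),
    "humidity": (0.20, 1.7),
    "operator": (0.03, 4.9),
}

ranking = rank_predictors(stats)
print(ranking)
```

Here `pressure` and `temperature` share the same p-value, so the F-value decides their relative order.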

Classification and regression trees. For classification and regression trees, the standard rankings for predictor importance are used.

Boosted trees. For boosted trees models (stochastic gradient boosting), the standard rankings for predictor importance are used.

MARSplines. For multivariate adaptive regression splines (MARSplines), rankings are computed based on the number of times that each predictor was used (referenced) in a basis function. The more frequently a predictor was referenced by a basis function, the greater its importance.
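A minimal sketch of this counting rule follows. It assumes a fitted model has already been reduced to a list of basis functions, each represented simply by the names of the predictors it references; that representation, the helper `mars_importance`, and the example data are all hypothetical. The sketch counts a predictor once per basis function, which is one possible reading of "referenced".

```python
from collections import Counter


def mars_importance(basis_functions):
    """Rank predictors by the number of basis functions that reference them."""
    counts = Counter()
    for predictors in basis_functions:
        counts.update(set(predictors))  # count each predictor once per basis function
    # Most-referenced first; ties broken alphabetically for a stable order
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up basis functions, each listed by the predictors it uses
basis_functions = [
    ("temperature",),
    ("temperature", "pressure"),
    ("pressure",),
    ("temperature", "humidity"),
]

importance = mars_importance(basis_functions)
print(importance)
```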

Neural networks. For neural networks, the final importance rankings for the predictors are computed by averaging the importance rankings for each predictor over a set of networks.
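Averaging rankings over several networks might look like the following sketch. It assumes each network has already produced a rank (1 = most important) for every predictor; how those per-network ranks are obtained is outside the sketch, and the helper `average_rankings` and the example ranks are hypothetical.

```python
def average_rankings(rankings_per_network):
    """Average each predictor's rank across networks; lowest mean rank first."""
    names = rankings_per_network[0].keys()
    n = len(rankings_per_network)
    mean_rank = {
        name: sum(r[name] for r in rankings_per_network) / n for name in names
    }
    # Sort by mean rank, then by name so ties come out in a stable order
    return sorted(mean_rank.items(), key=lambda kv: (kv[1], kv[0]))


# Made-up ranks from three networks (1 = most important)
rankings = [
    {"temperature": 1, "pressure": 2, "humidity": 3},
    {"temperature": 2, "pressure": 1, "humidity": 3},
    {"temperature": 1, "pressure": 3, "humidity": 2},
]

final = average_rankings(rankings)
for name, rank in final:
    print(name, round(rank, 2))
```

Averaging over a set of networks dampens the effect of any single network whose training happened to favor an unusual predictor ordering.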