How to Customize Boundaries in a Histogram
One challenge facing data analysts is creating data visualizations that convey the most information in the simplest manner possible. Histograms are used to visualize frequencies within continuous data to show how the data is distributed. By default, the graph groups the data into equally sized bins for the plot.
In STATISTICA, a histogram by default creates its bins using the integer method, which automatically determines a neat scheme for grouping the data. A normal distribution (or other selected distribution) is then overlaid for comparison. At times, the data can be better represented in an alternative way. In this article, we will look at how to customize histogram bins using the boundary method, which is the most flexible option for creating bins for a histogram. With this method, there is no need for equally spaced groupings, giving you more control over the display.
STATISTICA is designed so that the most common options for creating a 2D histogram are available on the Quick tab.
As an example, use the data set included in STATISTICA titled creditscoring.sta (which contains fictitious data about credit applicants), and create a histogram for the variable Amount of Credit:
Select the Graphs tab and in the Common group, click Histogram. The 2D Histogram Startup Panel is displayed, and the Quick tab is shown by default.
Click the Variables button, and in the variable selection dialog box, select Amount of Credit and then click OK.
In the Startup Panel on the Quick tab, leave the default setting of the Integer mode option button selected with the Auto check box also selected. We are not interested in checking the normality of the data plotted, so clear the Fit type: Normal check box. Click OK, and the graph is generated.
Most of the cases are in the range of $0 to $5,000 for Amount of Credit.
Now, suppose we want to highlight what is occurring in the category $0 to $5,000. To further segment this group and better understand the distribution in this range, one option is to customize the boundaries of the histogram to settings you have determined to be most informative to stakeholders. The option for custom boundaries for histogram binning is located on the Advanced tab of the 2D Histogram Startup Panel.
The Advanced tab contains all the options included on the Quick tab, and also includes options that enable you to further customize your histogram. For this example, let’s customize the boundaries so that the bins end at $1,000, $2,000, $3,000, $4,000, $5,000, and $10,000, with a final bin for amounts above $10,000.
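To see which bins these boundaries define before entering them in STATISTICA, the grouping can be sketched in a few lines of Python. This is an illustration only, not STATISTICA's implementation: the sample amounts are made up, the function name count_by_upper_boundaries is hypothetical, and the sketch assumes that each boundary is the inclusive upper limit of its bin (a value of exactly $1,000 lands in the first bin).

```python
from bisect import bisect_left

# Upper boundaries from the example; values above the last one
# fall into a final open-ended bin.
UPPER_BOUNDARIES = [1000, 2000, 3000, 4000, 5000, 10000]


def count_by_upper_boundaries(values, boundaries):
    """Count values per bin, treating each boundary as an inclusive upper limit.

    Returns len(boundaries) + 1 counts; the last count covers values above
    the largest boundary. The boundaries must be sorted in ascending order.
    """
    counts = [0] * (len(boundaries) + 1)
    for value in values:
        # bisect_left gives the index of the first boundary >= value,
        # or len(boundaries) when the value exceeds every boundary.
        counts[bisect_left(boundaries, value)] += 1
    return counts


# Made-up credit amounts, for illustration only (not taken from creditscoring.sta).
sample_amounts = [250, 1000, 1500, 2400, 2600, 3900, 4800, 7500, 12000]
counts = count_by_upper_boundaries(sample_amounts, UPPER_BOUNDARIES)

labels = [f"<= {b}" for b in UPPER_BOUNDARIES] + [f"> {UPPER_BOUNDARIES[-1]}"]
for label, count in zip(labels, counts):
    print(f"{label:>9}: {count}")
```

Six boundaries produce seven bins: five that are $1,000 wide, one that spans $5,000 to $10,000, and one open-ended bin for everything larger.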
To get to the Advanced tab, you will need to first resume the analysis by either pressing CTRL+R on your keyboard or clicking the 2D Histogram analysis button on the analysis bar located at the lower-left side of your screen. Click on the Advanced tab, and select the Boundaries option button. The button that previously said Change Variable changes to Specify Boundaries. Click that button to display the Specify Boundaries dialog box, and enter the boundaries we have identified as shown below.
The default option for specifying boundaries is Enter Upper Boundaries, and these boundaries are upper limits for each bin. Click OK in the Specify Boundaries dialog box.
With the boundary option selected, distribution fitting is not appropriate, so ensure that the Fit type is set to Off. Click OK in the 2D Histogram Startup Panel. The histogram with custom boundaries is shown below.
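From this histogram, you can clearly see how customers are distributed, especially in the area of interest where Amount of Credit is below $5,000. Note that the x-axis is no longer on a continuous scale: the bins below $5,000 each cover $1,000, the next bin covers $5,000 to $10,000, and the last bin is open-ended. Take caution in reading and explaining this graph, because the bars do not represent equal ranges.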
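One way to account for the unequal bin widths when explaining such a graph is to divide each count by the width of its bin. The Python sketch below does this per $1,000 of credit amount. It is not a STATISTICA feature: the counts are hypothetical, the helper name counts_per_1000 is made up for illustration, and a lower edge of $0 is assumed for the first bin.

```python
# Finite bins as (lower, upper) pairs implied by the example boundaries.
# The lower edge of 0 for the first bin is an assumption; the open-ended
# "> 10000" bin is left out because it has no finite width.
BINS = [(0, 1000), (1000, 2000), (2000, 3000),
        (3000, 4000), (4000, 5000), (5000, 10000)]

# Hypothetical counts per bin, for illustration only.
hypothetical_counts = [40, 120, 150, 90, 60, 100]


def counts_per_1000(bins, counts):
    """Return each bin's count divided by its width in thousands of dollars."""
    return [
        count / ((upper - lower) / 1000)
        for (lower, upper), count in zip(bins, counts)
    ]


densities = counts_per_1000(BINS, hypothetical_counts)
for (lower, upper), count, density in zip(BINS, hypothetical_counts, densities):
    print(f"{lower:>5} to {upper:>5}: count={count:>3}, per $1,000={density:.1f}")
```

With these made-up numbers, the $5,000 to $10,000 bin holds more cases (100) than the $4,000 to $5,000 bin (60), yet it is far less dense (20 versus 60 cases per $1,000), which is the kind of distinction a reader can miss when all bars are drawn alike.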
This is one of the many ways that a graph can be modified in STATISTICA to meet individual needs on a case-by-case basis. The ways to further modify these graphs are limited only by your own imagination. Take some time to investigate and explore the many ways you can customize graphs to give better visual meaning to your data.