How To Create a Random Subset of Your Data
A customer recently asked how to create a random subset, and I thought this would be a good topic for a blog post.
Let us pretend…
We want to develop a credit scoring model that can be used to determine whether a new applicant is a good or a bad credit risk. But we want to build it on a random subset of the data.
Start by opening STATISTICA’s example dataset, CreditScore.sta. It has 1000 rows of data.
You don’t know where the example datasets are located? Select the Open Example menu under the File menu (or Home tab / Open). See the Datasets folder? Select it and browse for CreditScore.sta.
Select the Data menu or Data tab. If you are using the classic menus, then look for the Random Sampling menu. If you are using the Ribbonbar, then look for Sampling on the far right.
On the Simple Sampling tab, select the Exact checkbox. Type 25 into the Approximate % field. Click OK.
You now have a random subset with 250 rows of data.
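If you ever need the same idea outside STATISTICA, here is a minimal Python sketch of exact percentage sampling. It is only an illustration, not how STATISTICA does it internally: it does not read CreditScore.sta, the 1000 row numbers are stand-ins for the dataset's rows, and the function name exact_random_subset and the seed value are hypothetical choices made for this example.

```python
import random


def exact_random_subset(rows, percent, seed=None):
    """Return a random subset holding exactly percent% of rows (rounded).

    Hypothetical helper for illustration; sampling is without replacement,
    so no row can appear twice in the subset.
    """
    rng = random.Random(seed)              # a seed makes the draw repeatable
    k = round(len(rows) * percent / 100)   # 25% of 1000 rows -> 250
    return rng.sample(rows, k)


# Stand-in for the 1000 rows of the example dataset: row numbers 1..1000.
rows = list(range(1, 1001))
subset = exact_random_subset(rows, 25, seed=42)
print(len(subset))  # 250
```

Because the subset size is computed up front and the rows are drawn without replacement, the result always has exactly 250 distinct rows, which mirrors what the Exact checkbox gives you above.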