How To Create a Random Subset of Your Data

A customer recently asked how to create a random subset, and I thought it would make a good topic for a blog post.

Let us pretend…

We want to develop a credit scoring model that determines whether a new applicant is a good or a bad credit risk, but we want to build it on a random subset of the data.

Start by opening STATISTICA’s example dataset, CreditScore.sta. It has 1000 rows of data.

You don’t know where the example datasets are located? Select the Open Example menu under the File menu (or Home tab / Open). See the Datasets folder? Select it and browse for CreditScore.sta.

Select the Data menu or Data tab. If you are using the classic menus, look for the Random Sampling menu. If you are using the ribbon bar, look for Sampling on the far right.

On the Simple Sampling tab, select the Exact checkbox. Type 25 into the Approximate % field. Click OK.

You now have a random subset with 250 rows of data, which is 25% of the original 1000 rows.
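If you ever need to reproduce this kind of sampling outside STATISTICA, the idea can be sketched in a few lines of Python. This is only an illustration, not what STATISTICA does internally: the function names exact_sample and approximate_sample are made up for this example, the rows are placeholder numbers standing in for the 1000 rows of CreditScore.sta, and the split into "exact" and "approximate" sampling reflects my assumption about what those two words usually mean.

```python
import random


def exact_sample(rows, percent, seed=None):
    """Pick exactly round(len(rows) * percent / 100) rows, without replacement."""
    rng = random.Random(seed)
    k = round(len(rows) * percent / 100)
    return rng.sample(rows, k)


def approximate_sample(rows, percent, seed=None):
    """Keep each row independently with probability percent / 100.

    The size of the result varies from run to run around the target.
    """
    rng = random.Random(seed)
    p = percent / 100
    return [row for row in rows if rng.random() < p]


# Placeholder data: 1000 row numbers standing in for the rows of the dataset.
rows = list(range(1000))

subset = exact_sample(rows, 25, seed=42)
print(len(subset))  # 250

approx = approximate_sample(rows, 25, seed=42)
print(len(approx))  # near 250, but not guaranteed to equal it
```

Fixing the seed makes the subset reproducible, which is handy if you want to rebuild the same training sample later.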

About statsoftsa

StatSoft, Inc. was founded in 1984 and is now one of the largest providers of analytic software worldwide. StatSoft is also the largest manufacturer of enterprise-wide quality control and improvement software systems in the world and the only company capable of supporting its QC products worldwide, with wholly owned subsidiaries in all major markets (23 full-service offices, on all continents). Its software is available in more than 10 languages.

Posted on January 30, 2013, in Uncategorized.