Predictive Analytics, Pasta and Earthquakes
My darling daughter is doing an experiment. Her class is creating two-story buildings that must contain multiple rooms. Their building materials are pasta and glue. The children choose which types of pasta they want to use. The glue is flour and water.
After the buildings are finished, the class is going to simulate an earthquake with a shaking table. Their objective…see what happens.
This morning we went to buy pasta. My daughter and I discussed the strength of different pastas. We talked about what pasta would make better walls. We settled on lasagna, spaghetti, manicotti and some egg noodles.
It sounds like a fun project, but I don’t know what my daughter is actually learning. Earthquakes make buildings fall down?
As I drove to work this morning, a thought went through my head. Her activity is similar to a few predictive analytics projects that I have observed in my career. The dataset is gathered (build a house of pasta). Sometimes this involves adding simulated data. Then the data is analyzed to see what happens. These projects don't typically end in a happy place unless their goal is "data understanding" (also known as exploratory data analysis). That is because the first step, and an important one, was skipped.
What is the business's vision? What are the project's objectives? How will we measure success: deployment of a solution, or a report?
As a project manager working with the CRISP-DM methodology (Cross-Industry Standard Process for Data Mining), I consider it critical to get answers to these questions. I need the story (vision) to guide the project. To manage the project, I need the objectives and expected outcomes so that I can measure success.
A couple of years ago, I attended a university networking event on predictive analytics. (I partly attended because of the "free lunch.") One of the event's goals was to allow business people and students to mingle and learn from each other. Each student had completed a predictive analytics project. There were about 125 people at the event.
I would look at the reports from the students' predictive analytics projects and ask some questions. The students would get to practice their presentation skills by answering them.
I stopped at the first project and started talking with the student. He had used U.S. Department of Transportation data. He presented his results. Vehicle accidents most commonly occur:
- in wet conditions
- in poor visibility (at night)
- on curvy roads
The student showed me his pretty, pretty graph. And then he stopped talking.
He had discovered facts that any experienced driver understands. I was slightly stunned that he made no attempt to frame or discuss his discovery. He leaned too hard on data visualization to tell his story, and he did not acknowledge just how obvious his discovery was. I decided to ask a CRISP question.
I asked, “Why did you analyze this data? What were your objectives for the analysis?” (The first step of CRISP is Business Understanding.)
The student replied, "I was assigned the data."
OK. I should have stopped asking questions at that point and grabbed lunch. Instead, I tried a couple more. I asked, "What is the actionable information? How could it be used or deployed?" (Deployment is the last step of CRISP.)
It was clear that this question puzzled the student, who just wanted the grade.
I moved on, talked with other students, and looked at other projects. Although they had also been assigned data, they either made up a backstory (a vision) or did some research and figured out a common use case. For example, one team had telecommunications data and focused on discovering upsell opportunities.
Understanding why you are analyzing the data, and being able to communicate it, is essential.