The “Problem” With Data Science (Projects)

Michael Hartnett
Feb 17, 2021

No, this tongue-in-cheek blog title isn’t me complaining about data science. It’s more a reflection on my experience so far and the insights I have gained through my data science journey.

So far we have been assigned four data science projects, with a capstone project also in the works. The projects have been: SAT/ACT scores, predicting housing prices (Ames data), NLP and sentiment analysis involving Reddit subreddits, and an open-ended group project. I certainly wouldn’t consider myself a master of the process by any means, but over time I have begun to notice trends and constants in what helps with designing a proper data science project. Like anything else in life, the biggest factor I have noticed across this range of projects involves step one: the problem statement. One of the posts that most helped me reshape my thinking, and that I highly urge any aspiring data scientist to read, can be found here:

As Vinita aptly points out in the first paragraph, “The problem statement stage is the first and most important step of solving an analytics problem. It can make or break the entire project.” These words have resonated deeply with me as I progressed from project to project, always trying to keep the problem statement in mind as I designed my notebook and presentation. Having learned from the error of my ways in project 1, I wanted to approach my housing price predictor project with as much data science gusto as possible.

To start, I wanted to establish a somewhat realistic business situation that might occur in the real world. Luckily, I come from a real estate background and already had some ideas in mind, so I began the design by selecting a business in need of my services: my former employer, Rise Realty. The next step was formulating why Rise would need my help. I already had the basic assignment of determining the unknown prices of houses, so how could I frame that in a way that would generate value for Rise? One question we are always reminded of in our cohort during the problem statement stage is: “who cares?” This always helps me create a meaningful project. With this in mind, my problem statement began to materialize as I asked myself a series of questions.

Who Cares? Rise Realty cares.

Why Have They Contacted Me? They are looking for help determining the most profitable neighborhoods.

Why Do They Care? They care because my project will help maximize profit.

What Value Will I Provide? I will create a housing price predictor model that will highlight the most profitable neighborhoods based on sales price.

Using my answers to these main questions, I was able to sew them together and paraphrase them into a much clearer, more succinct problem statement: “Rise Realty has approached me wondering, given their situation (number of agents), is there a correlation between neighborhoods and the potential earning opportunities for Rise and their agents? Is there a neighborhood that best provides potential sales opportunities?” With a well-defined problem statement, designing my project became far more streamlined. I was able to build around knowing “okay, my goal is to first build a model to predict prices and then use that model to find the most profitable neighborhoods.” The rest of the project became just filling in the details of my methods for getting from A to B, as opposed to project 1, where I more or less just meandered through some ideas in a Jupyter notebook.
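For readers who like to see the shape of that two-step goal in code, here is a minimal sketch. It is not the notebook from the project: the listings, neighborhood names, and function names are all made up for illustration, the model is a deliberately simple one-feature least-squares line (price against living area), and "most profitable" is assumed to mean "highest average predicted sale price."

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data, invented for illustration (not the Ames data):
# (neighborhood, living area in sq ft, sale price)
SOLD = [
    ("Northside", 2000, 300_000),
    ("Northside", 2400, 360_000),
    ("Riverside", 1200, 180_000),
    ("Riverside", 1600, 240_000),
]

# Listings whose price is unknown: (neighborhood, living area in sq ft)
UNSOLD = [
    ("Northside", 2200),
    ("Riverside", 1400),
    ("Riverside", 1000),
]


def fit_line(xs, ys):
    """Step 1: ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def rank_neighborhoods(sold, unsold):
    """Step 2: predict unknown prices, then rank neighborhoods by
    average predicted price (an assumed stand-in for profitability)."""
    areas = [area for _, area, _ in sold]
    prices = [price for _, _, price in sold]
    slope, intercept = fit_line(areas, prices)

    predicted = defaultdict(list)
    for hood, area in unsold:
        predicted[hood].append(slope * area + intercept)

    averages = {hood: mean(vals) for hood, vals in predicted.items()}
    # Highest average predicted price first.
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


ranking = rank_neighborhoods(SOLD, UNSOLD)
for hood, avg_price in ranking:
    print(f"{hood}: {avg_price:,.0f}")
```

The point is less the model than the structure: every function exists only because it moves the project from A (unknown prices) to B (an answer for Rise). A real version would use many more features and a proper validation step.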

The other key benefit of creating a clear problem statement is that it allows you to easily provide a succinct conclusion. Once I ran my models and drew correlations, I was able to answer Rise definitively with “based on my findings I recommend these neighborhoods to maximize your profits…”

So for any incoming data scientists (I still include myself in that group): whenever you’re faced with a project, I urge you not to skip square one, designing a problem statement. Do not assume the statement will make itself evident as you go along. Instead, save yourself the trouble and ask yourself the same questions I did: Who Cares? What Do They Need Help With? Why Do They Care? And most importantly, What Value Can I Provide? You will surely find that your project comes together and the structure begins to build around your problem statement (as it should). Then, at the end of the road, make sure you have arrived at some sort of conclusion. Throughout, keep asking yourself: “Is this helping me answer my problem statement?”
