Monday, September 28, 2020

How to Analyze and Present a Complex Dataset – in 30 Minutes

For New Graduates/Analysts

Often, with minimal time on hand – say 30 minutes – to summarize and present a relatively large and complex home sales dataset, comprising 18 months of data, with 30K rows, and ten variables, here is one approach worth considering:

1. Given the limited time, instead of trying to crunch the data in a spreadsheet, it's better to one's your favorite statistical software like SAS, SPSS, etc. What SAS will do in four short statements (Proc Means, Var, Class, and Output), and a matter of minutes, will need much longer to accomplish the same in spreadsheets. When one is starting out, it's good to take full advantage of these types of highly visible, often gratifying challenges to narrow the potential competition down.

2. It's good to have a realistic game plan. Instead of shooting for an array of parameters, it's better to start with the most significant one, i.e., Monthly Median Sale Price (and the normalized Sale Price per SF). Since the median is not prone to outliers, the dataset doesn't have to be edited for outliers, saving a significant amount of time.  

3. Now that the monthly median prices are there, one should be ready to create graphs for the presentation. While one graph depicting both prices (Y1 and Y2) against months (X-axis) may be created, it's prudent to keep them separated for ease of presentation. 

4. Since basic graphing is more straightforward in Excel (in fairness to the remaining time), it's better to transfer the output from SAS to Excel, ensuring that the graphs are adequately annotated and dressed up with the axis titles, legends, gridlines, etc. One must also remember that just doing things the right is not good enough, one must learn to present things elegantly as well. 

5. Since so much of the data have been summarized and rolled up behind one or two graphs, one must make sure they not only tell the overall story but also convey enough business intelligence to make the presentation look like a well-thought-out business solution in front of the attending EVP, SVP, etc. In the presence of clients, it enhances the bosses' image as well. So, it's smart to add trendlines alongside the data trend, selecting the primary trendline by eyeballing the data trend (linear, logarithmic, polynomial, etc.). Adding a 2 to 3-month moving average (depending on the time series) trendline to iron out any monthly aberrations could enhance the presentation.

6. It's also smart to keep the reporting verbiage clear and concise, explaining the makeup of the dataset, methodology including monthly medians, and how the normalized prices add value and help validate the primary. It's also important to explain the use of the trendline and its statistical significance and the other statistical measures like r-squared, slopes, etc. one might display on the graphs (avoiding the printing of equations on the graphs). 

7. It's good to add some business intelligence to the talking points, sticking to the market being presented but proving the depth of knowledge of that market by highlighting possible headwinds and tailwinds and how they would react to an inverted yield curve. Also, one should address other issues: If there is an on-going structural shift in demand for homes (are more millennial showing interest in that market); what the NAR's prediction of the summer inventory there is; if the inventory of affordable homes on the rise there; and how any expected change to the FHA rules would help first-time homebuyers in general, etc. 

8. One must try to control the conversation by sticking to what one is presenting, rather than what one does not have. For example, out of the ten variables, if only three are used (e.g., Sale Price, Sale Date, and Bldg SF), one should not start a conversation about the other important variables – Lot size, Age, Bldg Characteristics, and Location – that had to be left out ('If I had 30 more minutes' would be unnecessary). If that question comes up, one must answer it intelligently and truthfully, emphasizing, of course, the added utility of the three variables being used.

9. Now, let's assume that one has managed to complete the first cycle (as indicated above) in 20 minutes. In that case, one must go back to SAS and crunch the sales analysis by the sub-markets (Remember: Location! Location! Location!). In other words, one must understand how to walk down on the analysis curve. 

Of course, it's good to have these printouts handy. Just remember, one complete solution is always better than the more aspiring one but 95% complete.

-Sid Som, MBA, MIM
homequant@gmail.com

No comments:

Post a Comment