Maven Marketing Challenge

This report was my submission for the Maven Marketing Challenge. I tried to focus on improving my data cleaning. I found a few interesting things in the data that caused me to weed out a few customers:

  • 3 customers had birth years of 1893, 1899, and 1900, making each of them well over 100 years old. I assumed they were no longer with us and removed them from the dataset (ID #s 11004, 1150, 7829).
  • 4 customers had 0 purchases in any of the web, catalog, or store categories but had a dollar amount listed in some of the purchase amount categories. They got booted too (ID #s 1110, 3955, 5555, 11181).
  • 2 customers had “absurd” listed as their marital status, 3 had “alone”, and 2 had “YOLO”. I kept them in the dataset but changed their status to “Single”.
  • 485 customers had “PhD” listed as their education level. I changed this to the more encompassing term “Doctorate”.
  • 201 customers had “2n Cycle” as their education level. I did a brief Google search and it seems this is equivalent to a Master’s degree, so I moved them into the Master’s group to consolidate the two.
  • ~24 customers had no income level listed, but I left them in since they made up only about 1% of the population.
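
For anyone who wants to reproduce these steps, here is a minimal pandas sketch of the rules above. It is not the exact process I used for the report. The column names (Year_Birth, Marital_Status, Education, Income, the Num*Purchases counts, and the Mnt* amounts), the "Master" target label, and the helper clean_customers are all assumptions for illustration, and the toy rows are made up rather than taken from the real data.

```python
import pandas as pd

# Assumed column names; adjust to match the actual file.
PURCHASE_COUNT_COLS = ["NumWebPurchases", "NumCatalogPurchases", "NumStorePurchases"]
AMOUNT_COLS = ["MntWines", "MntMeatProducts"]  # hypothetical subset of the amount columns


def clean_customers(df: pd.DataFrame, max_bad_birth_year: int = 1900) -> pd.DataFrame:
    """Apply the cleaning rules from the list above and return a new frame."""
    # Drop customers whose birth year is implausibly early (1900 or before).
    out = df[df["Year_Birth"] > max_bad_birth_year].copy()

    # Drop customers with zero purchases in every channel but money spent.
    no_purchases = out[PURCHASE_COUNT_COLS].sum(axis=1) == 0
    has_spend = out[AMOUNT_COLS].sum(axis=1) > 0
    out = out[~(no_purchases & has_spend)].copy()

    # Relabel joke marital statuses as "Single" (case-insensitive match).
    joke = out["Marital_Status"].str.lower().isin(["absurd", "alone", "yolo"])
    out.loc[joke, "Marital_Status"] = "Single"

    # Consolidate education labels; "Master" as the target label is an assumption.
    out["Education"] = out["Education"].replace(
        {"PhD": "Doctorate", "2n Cycle": "Master"}
    )

    # Rows with a missing Income are deliberately kept.
    return out.reset_index(drop=True)


# Made-up toy rows, only to exercise each rule once.
toy = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Year_Birth": [1893, 1975, 1980, 1990],
    "Education": ["PhD", "2n Cycle", "PhD", "Graduation"],
    "Marital_Status": ["Married", "YOLO", "Alone", "Married"],
    "Income": [50000.0, None, 42000.0, 61000.0],
    "NumWebPurchases": [1, 2, 0, 3],
    "NumCatalogPurchases": [0, 1, 0, 0],
    "NumStorePurchases": [2, 0, 0, 1],
    "MntWines": [10, 20, 30, 40],
    "MntMeatProducts": [5, 0, 15, 0],
})

cleaned = clean_customers(toy)
print(cleaned["ID"].tolist())  # [2, 4]
```

In the toy data, ID 1 is dropped for its birth year and ID 3 for having spend but no purchases, while ID 2 stays in despite the missing income.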