# Identify patterns and relationships within data and use those findings to address business needs

## Identify patterns and relationships within data and use those findings to address business needs

The purpose of this assignment is to filter and sort data, identify patterns and relationships within data, and use those findings to address business needs.

For this assignment, you will use the “Adult Incomes” data set from the Topic Materials. Most data categories are self-explanatory. Clarifying notes are as follows.

Sex: Gender

Relationship: Relationship status

Education: Highest level of education completed

Hours _Per _Week: Number of weekly hours that the individual works

Capital _Gain: Amount of capital gains from tax records

Income: Income of the individual – either “>50K” or “<=50K”

The marketing department in your organization believes that customers with more education have higher incomes than those with less education, and the department director wants to formulate an ad campaign targeting customers with a specific income level. While the assumption stands to reason, having data to support this hypothesis is necessary in order for the department to justify the new advertising campaign strategy. The department director has asked you to determine several specific numbers and percentages along with calculating to what degree educational level is correlated with income. Based upon your finding, the marketing department will determine whether or not its idea for marketing specifically to those customers is likely to create a positive return on investment.

Question 1: Use Excel and various pivot tables and charts to calculate the following information. When summarizing categorical data, sometimes it is easier to create a new indicator (0/1) variable to obtain a percentage (the average of an indicator is a percent). Create a new indicator variable “Income_GT50” where the variable is equal to 1 when the income is greater than \$50,000; otherwise use 0. Using this indicator variable, calculate the overall percent of individuals who earn more than \$50,000 annually.

Question 2: You are interested in understanding the demographics of a particular cohort in your data set: those who have at least a “Bachelors,” “Masters,” or “Doctorate” degree whose occupation is also “Exec-managerial.” Filter the data to include only these individuals, and report the count of individuals who meet the criteria along with their average age and the percentage who earn more than \$50,000 annually.

Question 3: Many times age does not have a purely linear relationship with the dependent variable, so it is separated into age brackets. Create a new variable categorizing age into brackets as follows. Include the pivot table output when submitting the answer.

“<25”: If age less than 25

“25-34”: If age between 25 and 34 inclusive

“35-44”: If age between 35 and 44 inclusive

“45-54”: If age between 45 and 54 inclusive

“55-64”: If age between 55 and 64 inclusive

“65+”: If age is 65 or more

Question 4: Calculate the percentage of customers who earn more than \$50,000 annually for each age bracket. Create a pivot chart showing the results and include the chart when submitting the answer. Is the effect of age on income a linear or quadratic relationship?

Question 5: Using a pivot table, calculate the percent of customers earning more than \$50,000 annually by education level? Include the pivot table and pivot chart outputs and refer to them as you address whether or not the marketing department’s belief that customers with more education have higher incomes than those with less education. Present your findings in the form of a 250-word executive summary that includes relevant data, charts, and tables.