Problem Description:
This Stata assignment investigates the relationship between blood cholesterol levels and dietary choices, focusing on carbohydrate and fat intake. We test whether these variables are linearly related, with the aim of informing preventive measures against heart disease.
Solution:
Method
Part I: Data Overview
The dataset contains 15 variables. We create a new variable, "Cholesterol level" (chol in the code), which groups dr2tchol into low, medium, and high categories, and a "Race" variable recoded from ridreth1. We then examine the distribution of the remaining variables. Some are approximately normally distributed, while others, such as "Annual Household Income" (indhhin2), are not.
Part II: Hypothesis Testing
We test whether cholesterol levels are linearly related to dietary carbohydrate intake, and then repeat the analysis for protein and total fat intake. In each case the null hypothesis is that no linear relationship exists, and we use the test results to decide whether to reject it.
Part III: Regression Analysis
We fit linear regression models of total blood cholesterol on carbohydrate intake, first alone and then together with total fat intake. The R-squared value measures the share of the variation in cholesterol that each model explains, and the p-values indicate whether each coefficient differs significantly from zero.
Results
Part I: Data Insights
We summarize the dataset with the mean and variance for approximately normally distributed variables and the median and interquartile range for the others. These summaries describe the distribution of the key factors: gender, education level, cholesterol level, race, and household income.
Part II: Hypothesis Testing Outcomes
We present the hypothesis testing results, which indicate, for each dietary intake variable (carbohydrates, protein, and total fat), whether a linear relationship with cholesterol levels is present or absent.
Part III: Regression Analysis Findings
We report the regression results for total serum cholesterol on carbohydrate and total fat intake. These results show how much of the variation in serum cholesterol these dietary factors explain.
Discussion
We consider adding further variables in a multivariate model to better explain the relationship between the dependent and independent variables. We emphasize the significance of education level and identify race and household income as other variables that could be incorporated to give a more comprehensive picture of the factors affecting blood cholesterol levels.
Stata Code
use C:\Users\User\OneDrive\Desktop\Stata\cholesterol_Fall2022.dta
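** Cholesterol level: 0 = low (0 to 200), 2 = high (260 and above), 1 = medium (all other nonmissing values)
recode dr2tchol (0/200 = 0 "Low") (260/max = 2 "High") (nonmissing = 1 "Medium"), generate(chol)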
** Race, assuming ridreth1 follows the standard NHANES coding (1 to 5); missing values stay missing
recode ridreth1 (1 = 1 "Mexican American") (2 = 2 "Other Hispanic") (3 = 3 "Non-Hispanic White") (4 = 4 "Non-Hispanic Black") (5 = 5 "Other"), generate(Race)
** Histogram
histogram riagendr, width(1)
histogram Race, discrete
histogram dmdeduc2, width(1)
histogram indhhin2, width(10)
histogram chol, width(1)
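** Box plots of cholesterol by group (seqn is only the respondent ID, so dr2tchol is plotted instead)
graph box dr2tchol, over(riagendr)
graph box dr2tchol, over(Race)
graph box dr2tchol, over(dmdeduc2)
graph box dr2tchol, over(indhhin2)
graph box dr2tchol, over(chol)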
** Mean and standard deviation for the normally distributed variables
tabstat riagendr ridreth1 dmdeduc2 chol indhhin2 Race, statistics(mean sd) columns(variables)
** Non-normally distributed variables (detail reports the median and quartiles)
sum indhhin2, detail
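** Sketch only (an assumption, not part of the original analysis): the split into
** normal and non-normal variables above could be checked formally with Stata's
** skewness and kurtosis test; a small p-value argues against normality
sktest dr2tchol indhhin2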
** Frequencies
tab dmdeduc2
tab indhhin2
tab Race
tab chol
tab riagendr
** Chi-square test
tabulate riagendr chol, chi2
tabulate dmdeduc2 chol, chi2
tabulate indhhin2 chol, chi2
tabulate Race chol, chi2
**** Examine the distribution
histogram dr2tchol, width(70)
** Mean intake within each cholesterol category (chol is the variable generated above)
tabulate chol, summarize(dr2tcarb)
tabulate chol, summarize(dr2tprot)
tabulate chol, summarize(dr2ttfat)
** Two sample mean comparison
ttest dr2tchol == dr2tcarb, unpaired
ttest dr2tchol == dr2tprot, unpaired
ttest dr2tchol == dr2ttfat , unpaired
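** Sketch only (an assumption, not part of the original analysis): the t-tests above
** compare means, so a direct test of linear association could be added with
** pairwise Pearson correlations; sig prints each p-value and obs each sample size
pwcorr dr2tchol dr2tcarb dr2tprot dr2ttfat, sig obs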
** Linear model
regress dr2tchol dr2tcarb
regress dr2tchol dr2ttfat dr2tcarb
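** Sketch only (an assumption, not part of the original analysis): the multivariate
** model raised in the Discussion, with education and race entered as categorical
** factors through the i. prefix; income (indhhin2) could be added the same way
regress dr2tchol dr2tcarb dr2ttfat i.dmdeduc2 i.Race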