To test whether car accidents are distributed across the days of the week in the same way in every borough, we perform Pearson's chi-squared test of homogeneity.
H0: The proportions of car accidents falling on each day of the week are the same in every borough.
H1: The proportions differ for at least one borough.
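week_accidents =
  accidents1 %>%
  dplyr::select(crash_date, borough) %>%
  # drop crashes with no recorded borough
  filter(!is.na(borough)) %>%
  # abbreviated day names ("Mon", "Tue", ...); weekdays() follows the session locale
  mutate(weekdays = weekdays(crash_date, abbreviate = TRUE)) %>%
  # order the levels Monday through Sunday instead of alphabetically
  mutate(weekdays = as.factor(weekdays),
         weekdays = fct_relevel(weekdays, "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))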
table(week_accidents$borough, week_accidents$weekdays)
##
## Mon Tue Wed Thu Fri Sat Sun
## Bronx 1347 1339 1296 1448 1511 1378 1098
## Brooklyn 2337 2450 2391 2557 2758 2363 2051
## Manhattan 993 1103 1084 1184 1286 934 769
## Queens 1995 1939 2014 2011 2222 2046 1790
## Staten Island 178 221 198 209 250 205 185
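`weekdays()` returns day names in the current locale, so `abbreviate = TRUE` yields "Mon", "Tue", ... only under an English locale; elsewhere the `fct_relevel()` call above would not find those levels. The sketch below is a locale-independent alternative built only on base R; `weekday_label()` is a hypothetical helper, not part of the original analysis.

```r
# Hypothetical helper: locale-independent weekday labels for Date input.
# as.POSIXlt()$wday is 0 for Sunday through 6 for Saturday on every platform,
# so it is shifted to 1 = Monday ... 7 = Sunday before labelling.
weekday_label = function(dates) {
  wday = as.POSIXlt(as.Date(dates))$wday
  iso_day = (wday + 6) %% 7 + 1
  factor(iso_day, levels = 1:7,
         labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))
}

# 2020-01-06 was a Monday and 2020-01-05 a Sunday
weekday_label(c("2020-01-06", "2020-01-05"))
```

Inside the pipeline this would read `mutate(weekdays = weekday_label(crash_date))`; because the levels are already ordered Monday to Sunday, the `fct_relevel()` step is no longer needed.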
chisq.test(table(week_accidents$borough, week_accidents$weekdays))
##
## Pearson's Chi-squared test
##
## data: table(week_accidents$borough, week_accidents$weekdays)
## X-squared = 73.531, df = 24, p-value = 6.303e-07
x_crit = qchisq(0.95, 24)
x_crit
## [1] 36.41503
Interpretation: At significance level \(\alpha\) = 0.05, the statistic \(X^2\) = 73.531 exceeds the critical value \(\chi^2_{0.95,\,24}\) = 36.415 and the \(p-value\) = 6.303e-07 < 0.05, so we reject the null hypothesis and conclude that the day-of-week distribution of car accidents differs for at least one borough.
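`chisq.test()` compares each observed count with the count expected if every borough shared the same day-of-week distribution. As a sketch, the same statistic can be reproduced by hand from the counts printed in the borough-by-weekday table, which also makes the comparison with the critical value explicit:

```r
# Observed counts copied from the borough-by-weekday table printed above
observed = matrix(
  c(1347, 1339, 1296, 1448, 1511, 1378, 1098,
    2337, 2450, 2391, 2557, 2758, 2363, 2051,
     993, 1103, 1084, 1184, 1286,  934,  769,
    1995, 1939, 2014, 2011, 2222, 2046, 1790,
     178,  221,  198,  209,  250,  205,  185),
  nrow = 5, byrow = TRUE,
  dimnames = list(c("Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"),
                  c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))
)

# Expected counts under H0: E_ij = (row total i) * (column total j) / N
expected = outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic and degrees of freedom (r - 1)(c - 1)
x_sq = sum((observed - expected)^2 / expected)
dof = (nrow(observed) - 1) * (ncol(observed) - 1)

# 5% critical value and upper-tail p-value
x_crit = qchisq(0.95, dof)
p_value = pchisq(x_sq, dof, lower.tail = FALSE)
c(statistic = x_sq, df = dof, critical = x_crit, p = p_value)
```

For tables larger than 2 x 2, `chisq.test()` applies no continuity correction, so the hand-computed `x_sq` should agree with the reported X-squared.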
To test whether the proportions of car accidents involving each of five common car types are the same across boroughs, we perform a chi-squared test.
H0: The proportions of accidents involving each of the five car types are the same in every borough.
H1: The proportions differ for at least one borough.
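five_common_cartype =
  accidents1 %>%
  dplyr::select(borough, vehicle_type_code_1) %>%
  filter(vehicle_type_code_1 %in%
           c("Sedan",
             "Station Wagon/Sport Utility Vehicle",
             "Taxi",
             "Pick-up Truck",
             "Box Truck")) %>%
  # one row per (car type, borough) pair with its accident count
  count(vehicle_type_code_1, borough) %>%
  # boroughs as rows, car types as columns; missing combinations become 0
  pivot_wider(
    names_from = "vehicle_type_code_1",
    values_from = "n",
    values_fill = 0
  ) %>%
  # drop the label column to get a numeric contingency table
  dplyr::select(-borough) %>%
  as.matrix()
# rows follow count()'s sort order: the five boroughs alphabetically, then
# crashes with no recorded borough (NA), labelled "Others"
rownames(five_common_cartype) <- c("Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island", "Others")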
five_common_cartype %>%
knitr::kable(caption = "Table of Top Five Car Type", caption.pos = "top")
|  | Box Truck | Pick-up Truck | Sedan | Station Wagon/Sport Utility Vehicle | Taxi |
|---|---|---|---|---|---|
| Bronx | 158 | 187 | 4451 | 3288 | 411 |
| Brooklyn | 309 | 417 | 7865 | 6329 | 425 |
| Manhattan | 275 | 213 | 2828 | 2279 | 749 |
| Queens | 188 | 359 | 6460 | 5721 | 250 |
| Staten Island | 7 | 49 | 847 | 450 | 4 |
| Others | 480 | 657 | 11898 | 9474 | 929 |
chisq.test(five_common_cartype)
##
## Pearson's Chi-squared test
##
## data: five_common_cartype
## X-squared = 1614.5, df = 20, p-value < 2.2e-16
Interpretation: At significance level \(\alpha\) = 0.05, the statistic \(X^2\) = 1614.5 on 20 degrees of freedom far exceeds the critical value \(\chi^2_{0.95,\,20}\) ≈ 31.41, and the \(p-value\) is below 2.2e-16, so we reject the null hypothesis and conclude that the mix of the five car types involved in accidents differs for at least one borough.
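Some cells are small (Staten Island has only 7 box-truck and 4 taxi accidents), so it is worth confirming the large-sample condition behind the chi-squared approximation; a common rule of thumb asks for every expected count to be at least 5. The sketch below rebuilds the matrix from the printed table and also uses the standardized residuals that `chisq.test()` returns to show which cells drive the rejection.

```r
# Counts copied from the "Table of Top Five Car Type" above
car_types = matrix(
  c(158, 187,  4451, 3288, 411,
    309, 417,  7865, 6329, 425,
    275, 213,  2828, 2279, 749,
    188, 359,  6460, 5721, 250,
      7,  49,   847,  450,   4,
    480, 657, 11898, 9474, 929),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island", "Others"),
    c("Box Truck", "Pick-up Truck", "Sedan",
      "Station Wagon/Sport Utility Vehicle", "Taxi")
  )
)

res = chisq.test(car_types)

# Rule-of-thumb check: every expected count should be at least 5
all(res$expected >= 5)

# Standardized residuals: cells with large absolute values (say beyond +/-2)
# depart most from what H0 predicts
round(res$stdres, 1)
```

Positive residuals mark car types that are over-represented in a borough relative to H0; for example, taxis account for a much larger share of Manhattan's accidents than the pooled proportions would predict.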
We want to see whether the car accident rate, i.e. the number of accidents relative to population, is the same across boroughs, so we conduct a test for equality of proportions. We obtained each borough's population from the most recent (2020) census, as listed on citypopulation.de.
H0: The proportion of car accidents relative to population is the same in every borough.
H1: The proportion differs for at least one borough.
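# scrape borough populations with rvest; the CSS selectors below depend on
# the page's layout at the time of writing
url = "https://www.citypopulation.de/en/usa/newyorkcity/"
nyc_population_html = read_html(url)
population = nyc_population_html %>%
  html_elements(".rname .prio2") %>%
  html_text()
boro = nyc_population_html %>%
  html_elements(".rname a span") %>%
  html_text()
# strip thousands separators so the populations parse as numbers
nyc_population = tibble(
  borough = boro,
  population = population %>% str_remove_all(",") %>% as.numeric()
)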
car_accident = accidents1 %>%
  filter(!is.na(borough)) %>%
  count(borough) %>%
  # convert to title case to match the scraped borough names
  mutate(borough = str_to_title(borough))
acci_popu_boro = left_join(car_accident, nyc_population, by = "borough")
acci_popu_boro %>%
knitr::kable(caption = "Results Table", caption.pos = "top")
| borough | n | population |
|---|---|---|
| Bronx | 9417 | 1472654 |
| Brooklyn | 16907 | 2736074 |
| Manhattan | 7353 | 1694251 |
| Queens | 14017 | 2405464 |
| Staten Island | 1446 | 495747 |
prop.test(acci_popu_boro$n, acci_popu_boro$population)
##
## 5-sample test for equality of proportions without continuity correction
##
## data: acci_popu_boro$n out of acci_popu_boro$population
## X-squared = 1482.5, df = 4, p-value < 2.2e-16
## alternative hypothesis: two.sided
## sample estimates:
## prop 1 prop 2 prop 3 prop 4 prop 5
## 0.006394577 0.006179292 0.004339971 0.005827150 0.002916810
Interpretation: The \(p-value\) is below 2.2e-16, far smaller than 0.01, so we reject the null hypothesis and conclude that the proportion of car accidents relative to population differs across boroughs, ranging from about 0.0064 in the Bronx to about 0.0029 in Staten Island.
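With more than two groups, `prop.test()` computes Pearson's chi-squared statistic on the k x 2 table of accidents and (population minus accidents), without continuity correction. The sketch below, using the counts and populations from the results table, expresses the estimated proportions as accidents per 1,000 residents and checks that equivalence against `chisq.test()`.

```r
# Accident counts and populations copied from the results table above
acci = c(Bronx = 9417, Brooklyn = 16907, Manhattan = 7353,
         Queens = 14017, `Staten Island` = 1446)
popu = c(1472654, 2736074, 1694251, 2405464, 495747)

# Accidents per 1,000 residents (the sample estimates scaled by 1,000)
round(1000 * acci / popu, 2)

# prop.test() for k > 2 groups equals Pearson's test on the k x 2 table
pt = prop.test(acci, popu)
ct = chisq.test(cbind(acci, popu - acci))
c(prop_test = unname(pt$statistic), chisq_test = unname(ct$statistic))
```

On this scale the Bronx has about 6.39 accidents per 1,000 residents against 2.92 in Staten Island.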
To study how month is associated with the number of car accidents, we fit a one-way ANOVA of the daily number of accidents on month.
H0: The mean daily numbers of accidents are equal across months.
H1: The mean daily number of accidents differs for at least one month.
fit_accidents =
  accidents1 %>%
  mutate(month = as.factor(month)) %>%
  # one row per date (month, weekday, day) with that date's accident count
  group_by(month, weekday, day) %>%
  dplyr::summarize(num_accidents = n(), .groups = "drop")
fit_accidents_month = lm(num_accidents ~ month, data = fit_accidents)
anova(fit_accidents_month) %>%
knitr::kable(caption = "One way anova of number of accidents and month", caption.pos = "top")
|  | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| month | 7 | 2916412 | 416630.245 | 75.89771 | 0 |
| Residuals | 234 | 1284511 | 5489.365 | NA | NA |
Interpretation: The F statistic is 75.90 on 7 and 234 degrees of freedom, and the \(p-value\) is so small that the table rounds it to 0. Therefore, we reject the null hypothesis and conclude that the mean daily number of accidents differs across months in New York City in 2020.
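The ANOVA table contains everything needed to recover the test: each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the month mean square to the residual mean square. The degrees of freedom also imply 7 + 234 + 1 = 242 daily counts spread over 8 months. A sketch recomputing F and its p-value from the printed values:

```r
# Sums of squares and degrees of freedom copied from the ANOVA table above
ss_month = 2916412; df_month = 7
ss_resid = 1284511; df_resid = 234

# Mean squares are sums of squares divided by their degrees of freedom
ms_month = ss_month / df_month
ms_resid = ss_resid / df_resid

# F statistic and its upper-tail p-value
f_stat = ms_month / ms_resid
p_value = pf(f_stat, df_month, df_resid, lower.tail = FALSE)

c(F = f_stat, p = p_value)
```

Because the printed sums of squares are rounded, the recomputed mean squares and F differ from the table only in the last digits, while the p-value stays far below any conventional significance level.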