Chi-Squared test - Car accidents on weekdays among boroughs

To test whether the distribution of car accidents across the days of the week is the same in every borough, we perform a Chi-Squared test.

H0: The proportions of car accidents on weekdays among boroughs are equal.

H1: Not all proportions of car accidents on weekdays among boroughs are equal.

week_accidents = 
  accidents1 %>%
  dplyr::select(crash_date, borough) %>%
  mutate(weekdays = weekdays(crash_date, abbreviate = TRUE)) %>% 
  filter(!is.na(borough)) %>%
  mutate(weekdays = as.factor(weekdays),
         weekdays = fct_relevel(weekdays, "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))

table(week_accidents$borough, week_accidents$weekdays)
##                
##                  Mon  Tue  Wed  Thu  Fri  Sat  Sun
##   Bronx         1347 1339 1296 1448 1511 1378 1098
##   Brooklyn      2337 2450 2391 2557 2758 2363 2051
##   Manhattan      993 1103 1084 1184 1286  934  769
##   Queens        1995 1939 2014 2011 2222 2046 1790
##   Staten Island  178  221  198  209  250  205  185
chisq.test(table(week_accidents$borough, week_accidents$weekdays))
## 
##  Pearson's Chi-squared test
## 
## data:  table(week_accidents$borough, week_accidents$weekdays)
## X-squared = 73.531, df = 24, p-value = 6.303e-07
x_crit = qchisq(0.95, 24)
x_crit
## [1] 36.41503

Interpretation: At significance level \(\alpha\) = 0.05, the p-value is 6.303e-07 < 0.05 (equivalently, \(\chi^2\) = 73.531 exceeds the critical value 36.415), so we reject the null hypothesis and conclude that the distribution of car accidents across weekdays differs for at least one borough.
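As a check on how the statistic is formed, the sketch below rebuilds the contingency table from the counts printed above and computes the Pearson statistic by hand. The object names (`observed`, `expected`, `chi_sq`) are introduced here for illustration only and are not part of the original analysis; the result should agree with the `chisq.test` output reported above.

```r
# Counts copied from the contingency table printed above
# (rows: boroughs, columns: weekdays).
observed <- matrix(
  c(1347, 1339, 1296, 1448, 1511, 1378, 1098,
    2337, 2450, 2391, 2557, 2758, 2363, 2051,
     993, 1103, 1084, 1184, 1286,  934,  769,
    1995, 1939, 2014, 2011, 2222, 2046, 1790,
     178,  221,  198,  209,  250,  205,  185),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"),
    c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
  )
)

# Expected counts under H0: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic, degrees of freedom, critical value and p-value
chi_sq  <- sum((observed - expected)^2 / expected)
df      <- (nrow(observed) - 1) * (ncol(observed) - 1)
x_crit  <- qchisq(0.95, df)
p_value <- pchisq(chi_sq, df, lower.tail = FALSE)

# TRUE means H0 is rejected at the 5% level
chi_sq > x_crit
```

Comparing `observed` with `expected` cell by cell also shows which borough and weekday combinations contribute most to the statistic.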

Chi-Squared test - Proportions of accidents by car type among boroughs

To test whether the proportions of car accidents involving each of the five most common car types are the same in every borough, we perform a Chi-Squared test.

H0: The proportions of accidents for the five car types are equal among boroughs.

H1: Not all proportions of accidents for the five car types are equal among boroughs.

five_common_cartype = 
  accidents1 %>%
  select(borough, vehicle_type_code_1) %>% 
  filter(vehicle_type_code_1 %in%
           c("Sedan",
             "Station Wagon/Sport Utility Vehicle",
             "Taxi",
             "Pick-up Truck",
             "Box Truck")) %>%
  count(vehicle_type_code_1, borough) %>% 
  pivot_wider(
    names_from = "vehicle_type_code_1",
    values_from = "n"
  )  %>% 
  data.matrix() %>% 
  subset(select = -c(borough))

# The sixth row holds accidents with a missing borough (NA), labelled "Others"
rownames(five_common_cartype) <- c("Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island", "Others")

five_common_cartype %>% 
  knitr::kable(caption = "Table of Top Five Car Type", caption.pos = "top")
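Table of Top Five Car Type

|               | Box Truck | Pick-up Truck | Sedan | Station Wagon/Sport Utility Vehicle | Taxi |
|:--------------|----------:|--------------:|------:|------------------------------------:|-----:|
| Bronx         |       158 |           187 |  4451 |                                3288 |  411 |
| Brooklyn      |       309 |           417 |  7865 |                                6329 |  425 |
| Manhattan     |       275 |           213 |  2828 |                                2279 |  749 |
| Queens        |       188 |           359 |  6460 |                                5721 |  250 |
| Staten Island |         7 |            49 |   847 |                                 450 |    4 |
| Others        |       480 |           657 | 11898 |                                9474 |  929 |
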
chisq.test(five_common_cartype)
## 
##  Pearson's Chi-squared test
## 
## data:  five_common_cartype
## X-squared = 1614.5, df = 20, p-value < 2.2e-16

Interpretation: At significance level \(\alpha\) = 0.05, the test gives \(\chi^2\) = 1614.5 on 20 degrees of freedom, which far exceeds the critical value \(\chi^2_{crit}\) = 31.41 (p-value < 2.2e-16), so we reject the null hypothesis and conclude that the proportions of accidents across the five car types are not the same in every borough.
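The comparison against the critical value can be sketched as follows. The matrix is retyped from the table above, and the names `cartype_counts`, `res` and `x_crit_cartype` are illustrative assumptions rather than objects from the original analysis.

```r
# Counts copied from the "Table of Top Five Car Type" above
# (rows: boroughs plus "Others", columns: car types).
cartype_counts <- matrix(
  c(158, 187,  4451, 3288, 411,
    309, 417,  7865, 6329, 425,
    275, 213,  2828, 2279, 749,
    188, 359,  6460, 5721, 250,
      7,  49,   847,  450,   4,
    480, 657, 11898, 9474, 929),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island", "Others"),
    c("Box Truck", "Pick-up Truck", "Sedan",
      "Station Wagon/Sport Utility Vehicle", "Taxi")
  )
)

res <- chisq.test(cartype_counts)

# Critical value at alpha = 0.05 with (6 - 1) * (5 - 1) = 20 degrees of freedom
x_crit_cartype <- qchisq(0.95, df = res$parameter)

# TRUE means H0 is rejected at the 5% level
unname(res$statistic) > x_crit_cartype

# Standardized residuals indicate which cells deviate most from H0
round(res$stdres, 1)
```

Cells with large absolute standardized residuals are the borough and car-type combinations that depart most from what equal proportions would predict.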

Proportion test - The proportions of car accidents among boroughs

We want to see whether the car accident rate (accidents relative to population) is the same in every borough, so we conduct a proportion test. The population of each borough is taken from the most recent census.

H0: The proportions of the car accidents are equal among boroughs.

H1: The proportions of the car accidents are not equal among boroughs.

url = "https://www.citypopulation.de/en/usa/newyorkcity/"
nyc_population_html = read_html(url)

population = nyc_population_html %>% 
  html_elements(".rname .prio2") %>% 
  html_text()

boro = nyc_population_html %>% 
  html_elements(".rname a span") %>% 
  html_text()

nyc_population = tibble(
  borough = boro,
  population = population %>% str_remove_all(",") %>% as.numeric()
) 
  
car_accident = accidents1 %>%
  filter(!is.na(borough)) %>%
  count(borough) %>% 
  mutate(borough = str_to_title(borough))

acci_popu_boro = left_join(car_accident, nyc_population, by = "borough")

acci_popu_boro %>% 
  knitr::kable(caption = "Results Table", caption.pos = "top")
Results Table

| borough       |     n | population |
|:--------------|------:|-----------:|
| Bronx         |  9417 |    1472654 |
| Brooklyn      | 16907 |    2736074 |
| Manhattan     |  7353 |    1694251 |
| Queens        | 14017 |    2405464 |
| Staten Island |  1446 |     495747 |

prop.test(acci_popu_boro$n, acci_popu_boro$population)
## 
##  5-sample test for equality of proportions without continuity correction
## 
## data:  acci_popu_boro$n out of acci_popu_boro$population
## X-squared = 1482.5, df = 4, p-value < 2.2e-16
## alternative hypothesis: two.sided
## sample estimates:
##      prop 1      prop 2      prop 3      prop 4      prop 5 
## 0.006394577 0.006179292 0.004339971 0.005827150 0.002916810

Interpretation: The p-value (< 2.2e-16) is far smaller than 0.01, so we reject the null hypothesis and conclude that the proportions of car accidents relative to population are not equal across boroughs. The sample estimates range from about 0.0029 (Staten Island) to about 0.0064 (Bronx).
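To make the link to the earlier Chi-Squared tests explicit, the sketch below recomputes the per-borough rates and shows that, for more than two groups, `prop.test` gives the same statistic as a Pearson Chi-Squared test on the two-column table of accidents versus non-accidents. The vectors are retyped from the results table above; the names `accident_n`, `pop_n` and `two_way` are illustrative assumptions.

```r
# Accident counts and populations copied from the results table above
boroughs   <- c("Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island")
accident_n <- c(9417, 16907, 7353, 14017, 1446)
pop_n      <- c(1472654, 2736074, 1694251, 2405464, 495747)

# Sample proportions: accidents divided by population
rates <- setNames(accident_n / pop_n, boroughs)
round(rates, 6)

# k-sample test for equality of proportions
res_prop <- prop.test(accident_n, pop_n)

# Equivalent Pearson test on the 5 x 2 table (accident vs. no accident)
two_way <- cbind(accident = accident_n, no_accident = pop_n - accident_n)
rownames(two_way) <- boroughs
res_chi <- chisq.test(two_way)

# Both approaches yield the same X-squared on 4 degrees of freedom
all.equal(unname(res_prop$statistic), unname(res_chi$statistic))
```

Because the populations are very large, even small differences in rates produce an extremely small p-value, so the size of the estimated rates is worth reading alongside the test result.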

ANOVA Test - Month and accidents

To study whether the month is associated with the daily number of car accidents, we fit a one-way ANOVA with month as the factor.

H0: The average numbers of accidents are equal across months.

H1: The average numbers of accidents are not equal across months.

fit_accidents = 
  accidents1 %>% 
  mutate(month = as.factor(month)) %>% 
  group_by(month, weekday, day) %>% 
  dplyr::summarize(num_accidents = n()) 
fit_accidents_month = lm(num_accidents ~ month, data = fit_accidents)  
anova(fit_accidents_month) %>% 
  knitr::kable(caption = "One way anova of number of accidents and month", caption.pos = "top")
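One way anova of number of accidents and month

|           |  Df |  Sum Sq |    Mean Sq |  F value | Pr(>F) |
|:----------|----:|--------:|-----------:|---------:|-------:|
| month     |   7 | 2916412 | 416630.245 | 75.89771 |      0 |
| Residuals | 234 | 1284511 |   5489.365 |       NA |     NA |
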

Interpretation: The ANOVA gives F = 75.90 on 7 and 234 degrees of freedom, with a p-value so small that it is displayed as 0. We therefore reject the null hypothesis and conclude that the average numbers of accidents are not all equal across months in New York City in 2020.
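The F value in the table is the ratio of the two mean squares, and the p-value shown as 0 is a rounded display of a very small tail probability. A minimal sketch using only the numbers printed in the ANOVA table (the variable names are illustrative, not from the original analysis):

```r
# Values copied from the ANOVA table above
df_month      <- 7
df_resid      <- 234
mean_sq_month <- 416630.245
mean_sq_resid <- 5489.365

# F statistic: between-month mean square over residual mean square
f_value <- mean_sq_month / mean_sq_resid

# Critical value at alpha = 0.05 and the upper-tail p-value
f_crit  <- qf(0.95, df_month, df_resid)
p_value <- pf(f_value, df_month, df_resid, lower.tail = FALSE)

# TRUE means H0 is rejected at the 5% level
f_value > f_crit
```

Printing `p_value` shows the actual tail probability instead of the rounded 0 in the table.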