Statistical Tests

Chi-Squared test - Car accidents on weekdays among boroughs

To test whether the distribution of car accidents across the days of the week is the same in every borough, we perform a Chi-Squared test.

H0: The proportions of car accidents falling on each day of the week are equal across boroughs.

H1: The proportions of car accidents falling on each day of the week are not all equal across boroughs.

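week_accidents = 
  accidents1 %>%
  dplyr::select(crash_date, borough) %>%
  # Use the piped column (not accidents1$crash_date) so this step stays valid if rows are filtered earlier
  mutate(weekdays = weekdays(crash_date, abbreviate = TRUE)) %>% 
  filter(!is.na(borough)) %>%
  # Order the factor Monday through Sunday so the table columns read chronologically
  mutate(weekdays = as.factor(weekdays),
         weekdays = fct_relevel(weekdays, "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))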

table(week_accidents$borough, week_accidents$weekdays)

##                
##                  Mon  Tue  Wed  Thu  Fri  Sat  Sun
##   Bronx         1347 1339 1296 1448 1511 1378 1098
##   Brooklyn      2337 2450 2391 2557 2758 2363 2051
##   Manhattan      993 1103 1084 1184 1286  934  769
##   Queens        1995 1939 2014 2011 2222 2046 1790
##   Staten Island  178  221  198  209  250  205  185

chisq.test(table(week_accidents$borough, week_accidents$weekdays))

## 
##  Pearson's Chi-squared test
## 
## data:  table(week_accidents$borough, week_accidents$weekdays)
## X-squared = 73.531, df = 24, p-value = 6.303e-07

x_crit = qchisq(0.95, 24)
x_crit

## [1] 36.41503

Interpretation: At significance level \(\alpha\) = 0.05, the test statistic 73.531 exceeds the critical value 36.415 (df = 24), and equivalently the p-value 6.303e-07 is below 0.05. We therefore reject the null hypothesis and conclude that in at least one borough the distribution of car accidents across the days of the week differs from the others.
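To make the comparison with the critical value explicit, the following sketch rebuilds the contingency table from the counts printed above (rather than from `accidents1`) and recomputes the Pearson statistic by hand from the expected counts. The object names `weekday_counts`, `expected`, `x_stat`, `dof`, `p_value`, and `reject_h0` are illustrative and are not part of the original analysis.

```r
# Counts copied from the borough-by-weekday table printed above.
weekday_counts <- matrix(
  c(1347, 1339, 1296, 1448, 1511, 1378, 1098,
    2337, 2450, 2391, 2557, 2758, 2363, 2051,
     993, 1103, 1084, 1184, 1286,  934,  769,
    1995, 1939, 2014, 2011, 2222, 2046, 1790,
     178,  221,  198,  209,  250,  205,  185),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    borough = c("Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"),
    weekday = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
  )
)

# Expected counts under H0: (row total * column total) / grand total.
expected <- outer(rowSums(weekday_counts), colSums(weekday_counts)) /
  sum(weekday_counts)

# Pearson statistic and its degrees of freedom, (rows - 1) * (columns - 1).
x_stat <- sum((weekday_counts - expected)^2 / expected)
dof <- (nrow(weekday_counts) - 1) * (ncol(weekday_counts) - 1)

# Decision by critical value and by p-value; the two rules always agree.
x_crit <- qchisq(0.95, dof)
p_value <- pchisq(x_stat, dof, lower.tail = FALSE)
reject_h0 <- x_stat > x_crit

# Row proportions show how each borough's accidents split across the week.
round(prop.table(weekday_counts, margin = 1), 3)
```

The row proportions are a convenient way to see where the boroughs diverge, since the test itself only says that at least one row differs.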

Chi-Squared test - Car types' proportions of accidents among boroughs

To test whether the proportions of car accidents involving each of the five most common car types are equal among boroughs, we perform a Chi-Squared test.

H0: The proportions of accidents for the five car types are equal among boroughs.
H1: The proportions of accidents for the five car types are not all equal among boroughs.

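five_common_cartype = 
  accidents1 %>%
  dplyr::select(borough, vehicle_type_code_1) %>% 
  # Keep only the five most common vehicle types
  filter(vehicle_type_code_1 %in%
           c("Sedan",
             "Station Wagon/Sport Utility Vehicle",
             "Taxi",
             "Pick-up Truck",
             "Box Truck")) %>%
  # No NA filter on borough here, so count() produces a sixth row
  # (presumably the records with a missing borough)
  count(vehicle_type_code_1, borough) %>% 
  pivot_wider(
    names_from = "vehicle_type_code_1",
    values_from = "n"
  )  %>% 
  data.matrix() %>% 
  subset(select = -c(borough))

# The sixth row is labelled "Others" and is included in the test below
rownames(five_common_cartype) <- c("Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island", "Others")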

five_common_cartype %>% 
  knitr::kable(caption = "Table of Top Five Car Types", caption.pos = "top")

Table of Top Five Car Types
	Box Truck	Pick-up Truck	Sedan	Station Wagon/Sport Utility Vehicle	Taxi
Bronx	158	187	4451	3288	411
Brooklyn	309	417	7865	6329	425
Manhattan	275	213	2828	2279	749
Queens	188	359	6460	5721	250
Staten Island	7	49	847	450	4
Others	480	657	11898	9474	929

chisq.test(five_common_cartype)

## 
##  Pearson's Chi-squared test
## 
## data:  five_common_cartype
## X-squared = 1614.5, df = 20, p-value < 2.2e-16

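Interpretation: At significance level \(\alpha\) = 0.05, the test statistic \(\chi^2\) = 1614.5 far exceeds the critical value \(\chi^2_{crit}\) = 31.41 (df = 20), and the p-value is below 2.2e-16. We therefore reject the null hypothesis and conclude that the mix of the five car types involved in accidents differs for at least one group. Note that the table includes the "Others" row, so the test compares six groups rather than five boroughs.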
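The critical value for this test is not computed in the code above, so the sketch below derives it for df = 20 and also inspects the standardized residuals to see which cells depart most from the expected counts. The counts are copied from the printed table, and the object names `cartype_counts`, `cartype_test`, and `cartype_crit` are illustrative assumptions, not part of the original analysis.

```r
# Counts copied from the "Top Five Car Type" table printed above,
# including the sixth "Others" row that enters the original test.
cartype_counts <- matrix(
  c(158, 187,  4451, 3288, 411,
    309, 417,  7865, 6329, 425,
    275, 213,  2828, 2279, 749,
    188, 359,  6460, 5721, 250,
      7,  49,   847,  450,   4,
    480, 657, 11898, 9474, 929),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    borough = c("Bronx", "Brooklyn", "Manhattan", "Queens",
                "Staten Island", "Others"),
    car_type = c("Box Truck", "Pick-up Truck", "Sedan",
                 "Station Wagon/Sport Utility Vehicle", "Taxi")
  )
)

cartype_test <- chisq.test(cartype_counts)

# Critical value at alpha = 0.05 for (6 - 1) * (5 - 1) = 20 degrees of freedom.
cartype_crit <- qchisq(0.95, df = cartype_test$parameter)
cartype_test$statistic > cartype_crit

# Standardized residuals: large absolute values mark the borough and
# car-type combinations that contribute most to the statistic.
round(cartype_test$stdres, 1)
```

Cells with large positive residuals are over-represented relative to H0, which gives a more specific reading than the overall rejection alone.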

Proportion test - The proportions of car accidents among boroughs

We want to test whether car accident rates relative to population are the same among boroughs, so we conduct a proportion test. The population of each borough comes from the most recent census, as listed on citypopulation.de.

H0: The proportions of car accidents relative to population are equal among boroughs.

H1: The proportions of car accidents relative to population are not all equal among boroughs.

url = "https://www.citypopulation.de/en/usa/newyorkcity/"
nyc_population_html = read_html(url)

population = nyc_population_html %>% 
  html_elements(".rname .prio2") %>% 
  html_text()

boro = nyc_population_html %>% 
  html_elements(".rname a span") %>% 
  html_text()

nyc_population = tibble(
  borough = boro,
  population = population %>% str_remove_all(",") %>% as.numeric()
) 
  
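# Accident counts per borough, with names in title case to match the scraped table
car_accident = accidents1 %>%
  filter(!is.na(borough)) %>%
  count(borough) %>% 
  mutate(borough = str_to_title(borough))

# State the join key explicitly instead of relying on the natural-join default
acci_popu_boro = left_join(car_accident, nyc_population, by = "borough")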

acci_popu_boro %>% 
  knitr::kable(caption = "Results Table", caption.pos = "top")

Results Table
borough	n	population
Bronx	9417	1472654
Brooklyn	16907	2736074
Manhattan	7353	1694251
Queens	14017	2405464
Staten Island	1446	495747

prop.test(acci_popu_boro$n, acci_popu_boro$population)

## 
##  5-sample test for equality of proportions without continuity correction
## 
## data:  acci_popu_boro$n out of acci_popu_boro$population
## X-squared = 1482.5, df = 4, p-value < 2.2e-16
## alternative hypothesis: two.sided
## sample estimates:
##      prop 1      prop 2      prop 3      prop 4      prop 5 
## 0.006394577 0.006179292 0.004339971 0.005827150 0.002916810

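Interpretation: The p-value is below 2.2e-16, far smaller than 0.01, so we reject the null hypothesis and conclude that the proportion of car accidents relative to population differs across boroughs. The sample estimates follow the row order of the results table: the Bronx has the highest proportion (0.0064) and Staten Island the lowest (0.0029).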
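The overall test only says that at least one borough differs. As one possible follow-up, the sketch below expresses the proportions as accidents per 1,000 residents and runs pairwise comparisons with a Holm adjustment using `pairwise.prop.test` from base R. The counts are copied from the results table; the object names `acci_popu`, `accidents_named`, `rate_per_1000`, and `pairwise_result` are illustrative, and the choice of the Holm adjustment is an assumption rather than something the original analysis specifies.

```r
# Counts and populations copied from the results table above.
acci_popu <- data.frame(
  borough = c("Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"),
  n = c(9417, 16907, 7353, 14017, 1446),
  population = c(1472654, 2736074, 1694251, 2405464, 495747)
)

# Name the counts so that the output is labelled by borough.
accidents_named <- setNames(acci_popu$n, acci_popu$borough)

# Accidents per 1,000 residents, sorted from highest to lowest.
rate_per_1000 <- 1000 * accidents_named / acci_popu$population
round(sort(rate_per_1000, decreasing = TRUE), 2)

# Pairwise two-sample proportion tests with Holm-adjusted p-values.
pairwise_result <- pairwise.prop.test(
  accidents_named,
  acci_popu$population,
  p.adjust.method = "holm"
)
pairwise_result$p.value
```

The resulting matrix holds one adjusted p-value per pair of boroughs, so it shows which specific pairs drive the rejection.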

ANOVA Test - Month and accidents

To study how the month is associated with the number of car accidents, we run a one-way ANOVA on the daily accident counts, grouped by month.

H0: The mean daily number of accidents is equal across months.

H1: The mean daily numbers of accidents are not all equal across months.

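# Count accidents per calendar day; each (month, weekday, day) combination is one day
fit_accidents = 
  accidents1 %>% 
  mutate(month = as.factor(month)) %>% 
  group_by(month, weekday, day) %>% 
  dplyr::summarize(num_accidents = n(), .groups = "drop") 
# One-way ANOVA: regress the daily counts on month and test the month effect
fit_accidents_month = lm(num_accidents ~ month, data = fit_accidents)  
anova(fit_accidents_month) %>% 
  knitr::kable(caption = "One way anova of number of accidents and month", caption.pos = "top")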

One way anova of number of accidents and month
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
month	7	2916412	416630.245	75.89771	0
Residuals	234	1284511	5489.365	NA	NA

Interpretation: The ANOVA gives F = 75.90 on 7 and 234 degrees of freedom, and the p-value is so small that the table rounds it to 0. We therefore reject the null hypothesis and conclude that the mean daily number of accidents differs across months in New York City in 2020.
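Because the table rounds the p-value to 0, it can help to recover it from the printed sums of squares. The sketch below recomputes the mean squares, the F statistic, and the upper-tail p-value from the ANOVA table alone; it does not refit the model. The variable names are illustrative, and the share of variance explained (`eta_sq`) is an extra summary that the original analysis does not report.

```r
# Values copied from the ANOVA table above.
ss_month <- 2916412
df_month <- 7
ss_resid <- 1284511
df_resid <- 234

# Mean squares are sums of squares divided by their degrees of freedom.
ms_month <- ss_month / df_month
ms_resid <- ss_resid / df_resid

# F statistic: between-month variation relative to within-month variation.
f_stat <- ms_month / ms_resid

# Upper-tail probability of the F distribution gives the p-value.
p_value <- pf(f_stat, df_month, df_resid, lower.tail = FALSE)
format(p_value, scientific = TRUE)

# Share of the total variation in daily counts accounted for by month.
eta_sq <- ss_month / (ss_month + ss_resid)
round(eta_sq, 3)
```

Small differences from the printed F value can arise because the sums of squares in the table are rounded.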