Chapter 4 Missing values
4.1 Human Trafficking
The number of missing values in each column is shown below:
## DATA_YEAR ORI PUB_AGENCY_NAME
## 0 0 0
## PUB_AGENCY_UNIT AGENCY_TYPE_NAME STATE_ABBR
## 3191 0 0
## STATE_NAME DIVISION_NAME COUNTY_NAME
## 0 0 81
## REGION_NAME POPULATION_GROUP_CODE POPULATION_GROUP_DESC
## 0 0 0
## OFFENSE_SUBCAT_ID OFFENSE_NAME OFFENSE_SUBCAT_NAME
## 0 0 0
## ACTUAL_COUNT UNFOUNDED_COUNT CLEARED_COUNT
## 0 0 0
## JUVENILE_CLEARED_COUNT
## 0
PUB_AGENCY_UNIT (3191 missing) and COUNTY_NAME (81 missing) are the only columns with missing values. Since there is no reliable way to impute this information, we ignore these two columns in our analysis.
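A minimal sketch of how per-column counts like the ones above can be produced and then acted on. It assumes that "ignoring" a column means leaving it out of the analysis data frame. The small data frame is a hypothetical stand-in with made-up values that reuses a few of the column names above, not the FBI data itself.

```r
# Hypothetical stand-in for df_fbi_ht (made-up values, real column names)
df_toy <- data.frame(
  STATE_NAME      = c("Ohio", "Texas", "Utah"),
  PUB_AGENCY_UNIT = c(NA, "Unit A", NA),
  COUNTY_NAME     = c("Franklin", NA, "Salt Lake"),
  ACTUAL_COUNT    = c(3L, 5L, 1L),
  stringsAsFactors = FALSE
)

# is.na() returns a logical matrix; colSums() counts the TRUE cells per column
na_counts <- colSums(is.na(df_toy))
print(na_counts)

# Keep only the columns that have no missing values
complete_cols <- names(na_counts)[na_counts == 0]
df_analysis <- df_toy[, complete_cols, drop = FALSE]
```

Selecting columns by their count, rather than naming them by hand, keeps the step correct if a later data refresh changes which columns have gaps.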
It is also useful to check whether this dataset contains information about all fifty states:
length(unique(df_fbi_ht$STATE_NAME))
## [1] 45
As seen above, STATE_NAME has only 45 distinct values, so this dataset does not contain information about all fifty states. We handle this problem when cleaning the dataset.
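Because length(unique(...)) only reports how many distinct names are present, it can help to list which states are absent. A minimal sketch, assuming STATE_NAME holds full state names spelled the same way as R's built-in state.name vector; the helper names find_missing_states and find_non_states are hypothetical. The second helper covers a subtle case: if STATE_NAME also contains non-state entries such as the District of Columbia, the 45 distinct values would correspond to fewer than 45 states.

```r
# Return the U.S. state names that do not appear in a column of state names.
# datasets::state.name is R's built-in vector of the 50 state names.
find_missing_states <- function(state_col) {
  present <- unique(state_col[!is.na(state_col)])
  setdiff(datasets::state.name, present)
}

# Return values in the column that are not one of the 50 states
# (for example "District of Columbia", if the data includes it)
find_non_states <- function(state_col) {
  present <- unique(state_col[!is.na(state_col)])
  setdiff(present, datasets::state.name)
}

# Usage with the real data (not run here):
# find_missing_states(df_fbi_ht$STATE_NAME)
# find_non_states(df_fbi_ht$STATE_NAME)
```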
4.2 NYPD Arrests Data
The main feature of concern here is LAW_CAT_CD, which records the level of offense (felony, misdemeanor, or violation). Since there is no reliable way to impute this categorical variable, we ignore this feature in our analysis.
The OFNS_DESC feature also has some missing values. Because this feature is important for our arrests analysis and only a very small percentage of its values are missing, we filter out the rows where it is missing.
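A minimal sketch of the OFNS_DESC filter on a small hypothetical data frame with made-up values. It assumes missing descriptions may appear either as NA or as empty strings; if the data uses only NA, the second condition is simply never true. Computing the percentage before dropping rows makes the "very low" claim easy to check.

```r
# Hypothetical stand-in for the NYPD arrests data (made-up values)
arrests <- data.frame(
  OFNS_DESC  = c("ROBBERY", NA, "FELONY ASSAULT", "", "DANGEROUS DRUGS"),
  LAW_CAT_CD = c("F", "F", NA, "M", "F"),
  stringsAsFactors = FALSE
)

# Treat NA and empty or whitespace-only strings as missing (an assumption)
ofns_missing <- is.na(arrests$OFNS_DESC) | trimws(arrests$OFNS_DESC) == ""

# Percentage of rows with a missing offense description
pct_missing <- 100 * mean(ofns_missing)
cat(sprintf("OFNS_DESC missing: %.1f%%\n", pct_missing))

# Keep only the rows that have an offense description
arrests_clean <- arrests[!ofns_missing, ]
```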
4.3 Shootings in NYC
We drop the perpetrator columns, since their missing values cannot be imputed in any way. This analysis focuses on trends in victims, so leaving these columns out does not affect it.
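A minimal sketch of dropping the perpetrator columns. It assumes they share a "PERP_" name prefix (as in a column named PERP_AGE_GROUP); the column names and values below are hypothetical and should be checked against the actual dataset before relying on the prefix match.

```r
# Hypothetical stand-in for the NYC shootings data (made-up values)
shootings <- data.frame(
  INCIDENT_KEY   = 1:3,
  PERP_AGE_GROUP = c("18-24", NA, NA),
  PERP_SEX       = c("M", NA, NA),
  PERP_RACE      = c("BLACK", NA, NA),
  VIC_AGE_GROUP  = c("25-44", "18-24", "<18"),
  VIC_SEX        = c("M", "M", "F"),
  stringsAsFactors = FALSE
)

# Identify perpetrator columns by their assumed name prefix and drop them
perp_cols <- startsWith(names(shootings), "PERP_")
shootings_victims <- shootings[, !perp_cols, drop = FALSE]
```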
4.4 FBI Drug Arrests Data
For the FBI Drug Arrests data, the number of missing values in each column is shown below:
## id year state_abbr
## 0 0 23
## agencies population total_arrests
## 0 0 0
## total_manufacture opioid_manufacture marijuana_manufacture
## 0 0 0
## synthetic_manufacture other_manufacture total_possess
## 0 0 0
## opioid_possess marijuana_possess synthetic_possess
## 0 0 0
## other_possess
## 0
state_abbr is the only variable with missing values (23 entries), and it has been found to be completely missing.
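A count of missing values does not by itself say whether a column is completely empty; that depends on the number of rows. A minimal sketch that checks this directly with all(is.na(...)), using a hypothetical data frame with made-up values in place of the drug arrests data:

```r
# Hypothetical stand-in for the FBI drug arrests data (made-up values)
drug_arrests <- data.frame(
  year          = 2014:2016,
  state_abbr    = NA_character_,
  total_arrests = c(100L, 120L, 90L)
)

# A column is completely missing when every one of its values is NA
all_na <- vapply(drug_arrests, function(col) all(is.na(col)), logical(1))
print(names(drug_arrests)[all_na])

# Drop completely missing columns, since they carry no information
drug_arrests_clean <- drug_arrests[, !all_na, drop = FALSE]
```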
4.5 Hate Crimes
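For the Hate Crimes data, we analyze missing values as follows. First, we convert the complaint ID from an integer to a character string so that it is treated as an identifier rather than as a number. A summary of the resulting dataset is shown below: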
## Full.Complaint.ID Complaint.Year.Number Month.Number Record.Create.Date
## Length:728 Min. :2019 Min. : 1.000 Length:728
## Class :character 1st Qu.:2019 1st Qu.: 3.000 Class :character
## Mode :character Median :2019 Median : 7.000 Mode :character
## Mean :2019 Mean : 6.434
## 3rd Qu.:2020 3rd Qu.:10.000
## Max. :2020 Max. :12.000
## Complaint.Precinct.Code Patrol.Borough.Name County
## Min. : 1.00 Length:728 Length:728
## 1st Qu.: 25.00 Class :character Class :character
## Median : 69.00 Mode :character Mode :character
## Mean : 62.45
## 3rd Qu.: 90.00
## Max. :123.00
## Law.Code.Category.Description Offense.Description PD.Code.Description
## Length:728 Length:728 Length:728
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
## Bias.Motive.Description Offense.Category Other.Motive.Description
## Length:728 Length:728 Length:728
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
## Arrest.Date Arrest.Id
## Length:728 Length:728
## Class :character Class :character
## Mode :character Mode :character
##
##
##
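A minimal sketch of the complaint ID conversion, assuming the IDs are long whole numbers that R has read in as numeric. The ID values below are hypothetical. Plain as.character() can switch to scientific notation for large round numbers, so sprintf("%.0f", ...) is used to keep every digit; alternatively, the column can be read as character from the start by passing a named colClasses argument to read.csv().

```r
# Hypothetical complaint IDs read in as numbers (made-up values)
hate_crimes <- data.frame(
  Full.Complaint.ID     = c(201904512345100, 202012398765400),
  Complaint.Year.Number = c(2019L, 2020L)
)

# sprintf("%.0f", ...) writes every digit of a whole number without
# scientific notation, which as.character() can produce for large
# values (e.g. as.character(1e15) gives "1e+15")
hate_crimes$Full.Complaint.ID <- sprintf("%.0f", hate_crimes$Full.Complaint.ID)

str(hate_crimes)
```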
We now check whether there are missing values in this dataset:
## Full.Complaint.ID Complaint.Year.Number
## 0 0
## Month.Number Record.Create.Date
## 0 0
## Complaint.Precinct.Code Patrol.Borough.Name
## 0 0
## County Law.Code.Category.Description
## 0 0
## Offense.Description PD.Code.Description
## 0 0
## Bias.Motive.Description Offense.Category
## 0 0
## Other.Motive.Description Arrest.Date
## 0 0
## Arrest.Id
## 0
Alternatively, a second method gives the same counts and also reports variables that share a missingness pattern:
## Full.Complaint.ID Complaint.Year.Number
## 0 0
## Month.Number Record.Create.Date
## 0 0
## Complaint.Precinct.Code Patrol.Borough.Name
## 0 0
## County Law.Code.Category.Description
## 0 0
## Offense.Description PD.Code.Description
## 0 0
## Bias.Motive.Description Offense.Category
## 0 0
## Other.Motive.Description Arrest.Date
## 0 0
## Arrest.Id
## 0
## NOTE: The following pairs of variables appear to have the same missingness pattern.
## Please verify whether they are in fact logically distinct variables.
## [,1] [,2]
## [1,] "Arrest.Date" "Arrest.Id"
Arrest.Id and Arrest.Date have the same missingness pattern, as the note above indicates. Other.Motive.Description has been found to have more than 700 missing values out of 728 rows.
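The NA counts above are all zero, yet Other.Motive.Description is reported to have more than 700 missing values. One plausible explanation, which we have not verified against the raw file, is that missing entries are stored as empty strings, which is.na() does not flag. A minimal sketch of counting such blanks as missing and comparing two columns' missingness patterns, using hypothetical toy data; the helper is_blank is also hypothetical.

```r
# Hypothetical stand-in for the hate crimes data (made-up values)
hc <- data.frame(
  Other.Motive.Description = c("", "", "OTHER MOTIVE", ""),
  Arrest.Date = c("01/05/2019", "", "", "03/11/2020"),
  Arrest.Id   = c("K123", "", "", "B456"),
  stringsAsFactors = FALSE
)

# Treat NA and empty or whitespace-only strings as missing (an assumption)
is_blank <- function(x) is.na(x) | trimws(as.character(x)) == ""

# is.na() alone reports no missing values in this toy data
na_counts <- colSums(is.na(hc))

# Counting blank strings as missing reveals the gaps
blank_counts <- vapply(hc, function(col) sum(is_blank(col)), integer(1))
print(blank_counts)

# TRUE when the two columns are missing in exactly the same rows
same_pattern <- identical(is_blank(hc$Arrest.Date), is_blank(hc$Arrest.Id))
```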
4.6 Park Crime
For the Park Crime data, the number of missing values in each column is shown below:
## PARK BOROUGH
## 0 0
## SIZE..ACRES. CATEGORY
## 0 0
## MURDER RAPE
## 0 0
## ROBBERY FELONY.ASSAULT
## 0 0
## BURGLARY GRAND.LARCENY
## 0 0
## GRAND.LARCENY.OF.MOTOR.VEHICLE TOTAL
## 0 0
## YEAR QUARTER
## 0 0
None of the variables in this dataset have missing values.
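Since this dataset has no missing values, one option is to keep a guard in the cleaning script so that a future version of the data that introduces NA values fails loudly instead of passing silently. A minimal sketch on a hypothetical data frame with made-up values:

```r
# Hypothetical stand-in for the Park Crime data (made-up values)
parks <- data.frame(
  PARK   = c("Park A", "Park B"),
  MURDER = c(0L, 1L),
  TOTAL  = c(2L, 5L),
  stringsAsFactors = FALSE
)

# any(is.na(...)) is TRUE if any cell is NA; complete.cases() flags NA-free rows
has_missing <- any(is.na(parks))
n_complete  <- sum(complete.cases(parks))

# Stop the script if the data ever contains NA values
stopifnot(!has_missing)
```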