Chapter 4 Missing values

4.1 Human Trafficking

The number of missing values in each column is shown below:

##              DATA_YEAR                    ORI        PUB_AGENCY_NAME 
##                      0                      0                      0 
##        PUB_AGENCY_UNIT       AGENCY_TYPE_NAME             STATE_ABBR 
##                   3191                      0                      0 
##             STATE_NAME          DIVISION_NAME            COUNTY_NAME 
##                      0                      0                     81 
##            REGION_NAME  POPULATION_GROUP_CODE  POPULATION_GROUP_DESC 
##                      0                      0                      0 
##      OFFENSE_SUBCAT_ID           OFFENSE_NAME    OFFENSE_SUBCAT_NAME 
##                      0                      0                      0 
##           ACTUAL_COUNT        UNFOUNDED_COUNT          CLEARED_COUNT 
##                      0                      0                      0 
## JUVENILE_CLEARED_COUNT 
##                      0

PUB_AGENCY_UNIT (3,191 missing values) and COUNTY_NAME (81 missing values) are the only columns with missing values. Since there is no way to impute this information, we ignore these two columns in our analysis.
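Per-column counts like the ones above can be produced with base R. The following is a minimal sketch, not necessarily the code used for this report: `toy_ht` is a hypothetical stand-in for the real data frame and its values are invented for illustration.

```r
# Hypothetical toy data; column names mirror the report, values are invented.
toy_ht <- data.frame(
  STATE_NAME      = c("Texas", "Ohio", "Utah", "Iowa"),
  PUB_AGENCY_UNIT = c(NA, NA, "Unit A", NA),
  COUNTY_NAME     = c("Travis", NA, "Salt Lake", "Polk"),
  ACTUAL_COUNT    = c(3L, 0L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Number of NA entries in each column.
na_counts <- colSums(is.na(toy_ht))
print(na_counts)

# Share of rows that are missing, per column, as a percentage.
na_pct <- 100 * colMeans(is.na(toy_ht))
print(round(na_pct, 1))
```

Looking at the percentage alongside the raw count helps decide whether a column is worth keeping.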

It is also useful to check whether this dataset contains information about all fifty states:

length(unique(df_fbi_ht$STATE_NAME))
## [1] 45

As seen above, STATE_NAME takes only 45 distinct values, so the dataset does not cover all fifty states. We handle this problem when cleaning the dataset.
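To see which states are absent, the observed names can be compared against R's built-in `state.name` vector of the fifty state names. This is a sketch under assumptions: `observed_states` is a hypothetical stand-in for `df_fbi_ht$STATE_NAME`, and we assume the names are spelled as in `state.name`.

```r
# Hypothetical stand-in for df_fbi_ht$STATE_NAME (values invented).
observed_states <- c("Texas", "Ohio", "Utah", "Texas", "District of Columbia")

# States in the built-in list that never appear in the data.
missing_states <- setdiff(state.name, unique(observed_states))

# Values in the data that are not one of the fifty states
# (for example a federal district or a territory).
extra_values <- setdiff(unique(observed_states), state.name)

print(length(missing_states))
print(extra_values)
```

Checking `extra_values` matters because a count of distinct names can include entries that are not states, which would make the number of covered states smaller than the count suggests.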

4.2 NYPD Arrests Data

The main feature of concern here is LAW_CAT_CD, which gives the level of offense (felony, misdemeanor, violation). Since there is no way to impute this categorical variable, we ignore this feature in our analysis.

The feature OFNS_DESC also has some missing values. Since this feature is important for our arrests analysis and the percentage of missing values is very low, we filter out the rows where it is missing.
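The filtering step can be sketched as follows. The data frame `toy_arrests` and its `ARREST_KEY` column are hypothetical and the values are invented; only the column name OFNS_DESC comes from the text above.

```r
# Hypothetical toy data standing in for the NYPD arrests data.
toy_arrests <- data.frame(
  ARREST_KEY = 1:5,
  OFNS_DESC  = c("ASSAULT 3", NA, "PETIT LARCENY", "ROBBERY", "BURGLARY"),
  stringsAsFactors = FALSE
)

# Percentage of rows where OFNS_DESC is missing.
pct_missing <- 100 * mean(is.na(toy_arrests$OFNS_DESC))
print(pct_missing)

# Keep only the rows where OFNS_DESC is present.
arrests_clean <- toy_arrests[!is.na(toy_arrests$OFNS_DESC), ]
print(nrow(arrests_clean))
```

Computing the percentage first makes the "very low" judgment explicit before any rows are discarded.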

4.3 Shootings in NYC

We drop the perpetrator columns that contain missing values, as they cannot be imputed in any way. Since we intend to focus on victim trends in this analysis, it is acceptable to leave these columns out.
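Dropping a group of columns by name can be sketched as below. We assume here that the perpetrator columns share a `PERP_` prefix; that prefix, the data frame `toy_shootings`, and all values are assumptions made for illustration, not taken from the report.

```r
# Hypothetical toy data; column names and values are assumptions.
toy_shootings <- data.frame(
  INCIDENT_KEY   = 1:3,
  VIC_SEX        = c("M", "F", "M"),
  PERP_SEX       = c(NA, "M", NA),
  PERP_AGE_GROUP = c(NA, "25-44", NA),
  stringsAsFactors = FALSE
)

# Flag the columns whose names start with the assumed prefix.
perp_cols <- grepl("^PERP_", names(toy_shootings))

# Keep everything else; drop = FALSE keeps the result a data frame.
shootings_victims <- toy_shootings[, !perp_cols, drop = FALSE]
print(names(shootings_victims))
```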

4.4 FBI Drug Arrests Data

For the FBI Drug Arrests data, the number of missing values in each column is shown below:

##                    id                  year            state_abbr 
##                     0                     0                    23 
##              agencies            population         total_arrests 
##                     0                     0                     0 
##     total_manufacture    opioid_manufacture marijuana_manufacture 
##                     0                     0                     0 
## synthetic_manufacture     other_manufacture         total_possess 
##                     0                     0                     0 
##        opioid_possess     marijuana_possess     synthetic_possess 
##                     0                     0                     0 
##         other_possess 
##                     0

state_abbr is the only column with missing values (23), and it has been found to be completely missing.
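Whether a column is completely missing can be checked by comparing its NA count with the number of rows. The sketch below uses a hypothetical `toy_drug` data frame with invented values.

```r
# Hypothetical toy data; state_abbr is NA in every row.
toy_drug <- data.frame(
  year          = 2015:2017,
  state_abbr    = NA_character_,
  total_arrests = c(10L, 12L, 9L),
  stringsAsFactors = FALSE
)

# A column is completely missing when its NA count equals the row count.
fully_missing <- names(toy_drug)[colSums(is.na(toy_drug)) == nrow(toy_drug)]
print(fully_missing)

# Remove such columns, since they carry no information.
toy_drug_clean <- toy_drug[, !(names(toy_drug) %in% fully_missing), drop = FALSE]
print(names(toy_drug_clean))
```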

4.5 Hate Crimes

For the Hate Crimes data, we first convert the complaint ID from integer to character string. A summary of the resulting columns is shown below:

##  Full.Complaint.ID  Complaint.Year.Number  Month.Number    Record.Create.Date
##  Length:728         Min.   :2019          Min.   : 1.000   Length:728        
##  Class :character   1st Qu.:2019          1st Qu.: 3.000   Class :character  
##  Mode  :character   Median :2019          Median : 7.000   Mode  :character  
##                     Mean   :2019          Mean   : 6.434                     
##                     3rd Qu.:2020          3rd Qu.:10.000                     
##                     Max.   :2020          Max.   :12.000                     
##  Complaint.Precinct.Code Patrol.Borough.Name    County         
##  Min.   :  1.00          Length:728          Length:728        
##  1st Qu.: 25.00          Class :character    Class :character  
##  Median : 69.00          Mode  :character    Mode  :character  
##  Mean   : 62.45                                                
##  3rd Qu.: 90.00                                                
##  Max.   :123.00                                                
##  Law.Code.Category.Description Offense.Description PD.Code.Description
##  Length:728                    Length:728          Length:728         
##  Class :character              Class :character    Class :character   
##  Mode  :character              Mode  :character    Mode  :character   
##                                                                       
##                                                                       
##                                                                       
##  Bias.Motive.Description Offense.Category   Other.Motive.Description
##  Length:728              Length:728         Length:728              
##  Class :character        Class :character   Class :character        
##  Mode  :character        Mode  :character   Mode  :character        
##                                                                     
##                                                                     
##                                                                     
##  Arrest.Date         Arrest.Id        
##  Length:728         Length:728        
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
## 

We now check whether there are missing values in this dataset.

##             Full.Complaint.ID         Complaint.Year.Number 
##                             0                             0 
##                  Month.Number            Record.Create.Date 
##                             0                             0 
##       Complaint.Precinct.Code           Patrol.Borough.Name 
##                             0                             0 
##                        County Law.Code.Category.Description 
##                             0                             0 
##           Offense.Description           PD.Code.Description 
##                             0                             0 
##       Bias.Motive.Description              Offense.Category 
##                             0                             0 
##      Other.Motive.Description                   Arrest.Date 
##                             0                             0 
##                     Arrest.Id 
##                             0

Alternate Method:

##             Full.Complaint.ID         Complaint.Year.Number 
##                             0                             0 
##                  Month.Number            Record.Create.Date 
##                             0                             0 
##       Complaint.Precinct.Code           Patrol.Borough.Name 
##                             0                             0 
##                        County Law.Code.Category.Description 
##                             0                             0 
##           Offense.Description           PD.Code.Description 
##                             0                             0 
##       Bias.Motive.Description              Offense.Category 
##                             0                             0 
##      Other.Motive.Description                   Arrest.Date 
##                             0                             0 
##                     Arrest.Id 
##                             0
## NOTE: The following pairs of variables appear to have the same missingness pattern.
##  Please verify whether they are in fact logically distinct variables.
##      [,1]          [,2]       
## [1,] "Arrest.Date" "Arrest.Id"

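Arrest.Id and Arrest.Date have the same missingness pattern. Other.Motive.Description has been found to have more than 700 missing values. The NA counts above are all zero, so these missing entries are presumably stored as empty strings rather than as NA.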
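If missing entries in character columns are stored as empty strings (an assumption about this dataset, not something the output above confirms), `is.na()` will not count them. A minimal sketch of recoding them to NA before counting, using a hypothetical `toy_hate` data frame with invented values:

```r
# Hypothetical toy data; empty strings stand in for missing entries.
toy_hate <- data.frame(
  Bias.Motive.Description  = c("ANTI-JEWISH", "ANTI-ASIAN", "ANTI-BLACK"),
  Other.Motive.Description = c("", "", "Unclassified"),
  Arrest.Id                = c("", "B123", ""),
  stringsAsFactors = FALSE
)

# Before recoding: empty strings are not NA, so every count is zero.
na_before <- colSums(is.na(toy_hate))
print(na_before)

# Recode empty or whitespace-only strings in character columns to NA.
toy_hate[] <- lapply(toy_hate, function(col) {
  if (is.character(col)) {
    col[!is.na(col) & trimws(col) == ""] <- NA
  }
  col
})

# After recoding: the counts reflect the blank entries.
na_after <- colSums(is.na(toy_hate))
print(na_after)
```

An alternative with the same effect is to pass `na.strings = ""` to `read.csv()` when loading the file.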

4.6 Park Crime

For the Park Crime data, we analyze missing values as follows:

##                           PARK                        BOROUGH 
##                              0                              0 
##                   SIZE..ACRES.                       CATEGORY 
##                              0                              0 
##                         MURDER                           RAPE 
##                              0                              0 
##                        ROBBERY                 FELONY.ASSAULT 
##                              0                              0 
##                       BURGLARY                  GRAND.LARCENY 
##                              0                              0 
## GRAND.LARCENY.OF.MOTOR.VEHICLE                          TOTAL 
##                              0                              0 
##                           YEAR                        QUARTER 
##                              0                              0

None of the variables in this dataset have missing values.