Introduction to Geographical Time Series Prediction with Crime Data in R, SQL, and Tableau


By Jason Wittenauer, Lead Data Scientist at Huron Consulting Group


In this tutorial you will learn how to prepare geographical data for time series predictions.


When reviewing geographical data, it can be difficult to prepare the data for analysis. There are specialized spatial algorithms that can be used, but they are limited in what they do. If we spend some time splitting the data into grid squares and then adding a time point for each square, it opens up more possibilities for modeling algorithms and for additional features. However, it does create an issue with the size of the data set.

A five-year crime data set can easily consist of 250,000 records. Once that is extrapolated into a time series grid of an entire city, it can easily hit 75 million data points (a 200 by 200 grid is 40,000 squares, and 40,000 squares multiplied by the roughly 1,800 days in five years is over 70 million rows). When dealing with data of this size, it is helpful to use a database to cleanse the data before sending it to a modeling script. The steps we will follow are listed below:

  • Importing data into a SQL Server database.
  • Cleansing and grouping data into a map grid.
  • Adding time data points to the set of grid data and filling in the gaps where no crimes occurred.
  • Importing the data into R.
  • Running an XGBoost model to determine where crimes will occur on a specific day.

At the end, we will discuss the next steps for making the predictions more usable to end users in BI tools like Tableau.

Prerequisites


Before beginning this tutorial, you will need:

  • SQL Server Express installed
  • SQL Management Studio or similar IDE to interface with SQL Server
  • R installed
  • RStudio, Jupyter Notebook, or another IDE to interface with R
  • A general working knowledge of SQL and R

Data Flow Overview


We will be keeping the database and data flow in a simple structure for now. The database itself will be very flat, and the final data handoff into the Tableau dashboard will be executed via a text file. In a more production-ready version, we would save the predictions back into the database and pull from that database into the reporting dashboard. For this example, we are not trying to architect everything perfectly, just to understand the basics of doing geographical time series predictions.
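For reference, the handoff table in that production-ready version could be as simple as one row per grid square per predicted day. The table below is only a hypothetical sketch (in this tutorial we write the predictions to a CSV file instead):

CREATE TABLE CrimePrediction
(
    CrimePredictionId INT IDENTITY(1, 1) PRIMARY KEY,
    GridSmallId       INT           NOT NULL, -- map square the prediction applies to
    PredictionDate    DATE          NOT NULL, -- day being predicted
    PredictedScore    DECIMAL(9, 8) NOT NULL, -- model probability between 0 and 1
    LoadDateTime      DATETIME      NOT NULL DEFAULT (GETDATE())
);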

Our data flow is below:

[Figure: Data flow diagram]

Set Up the Database


Our prediction model will use crime data for the Baltimore area from 2012 to 2017. The data is located in the "Data" folder within this repo, in a file named "Baltimore Incident Data.zip". Before importing this data, you will need to follow one of the options below to set up the database:

Option 1

  • Restore a SQL database using the backup file located in the "Database Objects\Clean Backup\" folder.

Option 2

  • Use the scripts located in the "Database Objects" folder under "Tables" and "Procedures" to manually create all of the objects.

Import the Data


Once the database has been successfully created, you can now import the data. This will require you to do the following:

  • Unzip the "Baltimore Incident Data.zip" file found in the "Data" folder.
  • Run the "Insert_StagingCrime" procedure, making sure it points to the correct import file and to the correct format file (found in the "Data" folder and named "FormatFile.fmt").
EXEC Insert_StagingCrime

This procedure truncates the Staging_Crime table and inserts the data from the file directly into it using a BULK INSERT. The staging table's columns are all VARCHAR(MAX), which we will convert to proper data types in the next phase of the import process. A code snippet of the procedure is below.

BULK INSERT Staging_Crime
FROM 'C:\Projects\Crime Prediction\Data\Baltimore Incident Data.csv'
WITH (FIRSTROW = 2, FORMATFILE = 'C:\Projects\Crime Prediction\Data\FormatFile.fmt')

Review the Data


Now that the data has been imported into the staging table, you can view it by running the below code in SQL Management Studio:

SELECT TOP 10 *
FROM [dbo].[Staging_Crime]

This gives you the following results:

CrimeDate | CrimeTime | CrimeCode | Address | Description | InsideOutside | Weapon | Post | District | Neighborhood | Location | Premise | TotalIncidents
02/28/2017 | 23:50:00 | 6D | 2400 KEYWORTH AVE | LARCENY FROM AUTO | O | NULL | 533 | NORTHERN | Greenspring | (39.3348700000, -76.6590200000) | STREET | 1
02/28/2017 | 23:36:00 | 4D | 200 DIENER PL | AGG. ASSAULT | I | HANDS | 843 | SOUTHWESTERN | Irvington | (39.2830300000, -76.6878200000) | APT/CONDO | 1
02/28/2017 | 23:02:00 | 4E | 1800 N MOUNT ST | COMMON ASSAULT | I | HANDS | 742 | WESTERN | Sandtown-Winchester | (39.3092400000, -76.6449800000) | ROW/TOWNHO | 1
02/28/2017 | 23:00:00 | 6D | 200 S CLINTON ST | LARCENY FROM AUTO | O | NULL | 231 | SOUTHEASTERN | Highlandtown | (39.2894600000, -76.5701900000) | STREET | 1
02/28/2017 | 22:00:00 | 6E | 1300 TOWSON ST | LARCENY | O | NULL | 943 | SOUTHERN | Locust Point | (39.2707100000, -76.5911800000) | STREET | 1
02/28/2017 | 21:40:00 | 6J | 1000 WILMOT CT | LARCENY | O | NULL | 312 | EASTERN | Oldtown | (39.2993600000, -76.6034100000) | STREET | 1
02/28/2017 | 21:40:00 | 6J | 2400 PENNSYLVANIA AVE | LARCENY | O | NULL | 733 | WESTERN | Penn North | (39.3094200000, -76.6417700000) | STREET | 1
02/28/2017 | 21:30:00 | 5D | 1500 STACK ST | BURGLARY | I | NULL | 943 | SOUTHERN | Riverside | (39.2721500000, -76.6033600000) | OTHER/RESI | 1
02/28/2017 | 21:30:00 | 6D | 2100 KOKO LN | LARCENY FROM AUTO | O | NULL | 731 | WESTERN | Panway/Braddish Avenue | (39.3117800000, -76.6633200000) | STREET | 1
02/28/2017 | 21:10:00 | 3CF | 800 W LEXINGTON ST | ROBBERY - COMMERCIAL | O | FIREARM | 712 | WESTERN | Poppleton | (39.2910500000, -76.6310600000) | STREET | 1

Notice how this data has the point in time the crime occurred, the latitude/longitude, and even crime categories. Those categories can be very useful for more advanced modeling techniques and analytics.

Cleanse the Data and Create Grids of the Map


Next, we can move our data into a "Crime" table that has the correct data types. During this step, we will also split the location field into separate longitude and latitude fields. All of the logic to complete these steps runs with the below execution statement:

EXEC Insert_Crime
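The interesting part of "Insert_Crime" is turning the single "Location" text field, such as "(39.3348700000, -76.6590200000)", into separate latitude and longitude columns. The actual procedure ships with the repo; a minimal sketch of that parsing logic, assuming the staging columns shown above, might look like this:

SELECT
    CAST(CrimeDate AS DATE) AS CrimeDate,
    CAST(CrimeTime AS TIME) AS CrimeTime,
    -- Strip the opening parenthesis and split on the comma to get each coordinate
    CAST(SUBSTRING(Location, 2, CHARINDEX(',', Location) - 2) AS DECIMAL(14, 10)) AS Latitude,
    CAST(LTRIM(SUBSTRING(Location, CHARINDEX(',', Location) + 1,
                         CHARINDEX(')', Location) - CHARINDEX(',', Location) - 1)) AS DECIMAL(14, 10)) AS Longitude
FROM Staging_Crime
WHERE Location IS NOT NULL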

Now we have data populating the "Crime" table, which allows us to complete the next step: creating a grid over the city and assigning each crime to one of the grid squares. You will notice that we create two grids (small and large). This allows us to create features both at the crime location and a little further away from it, essentially giving us hotspots of crime throughout the city.

[Figure: Small and large grid squares overlaid on the city map]

Run the below code to create the grids and assign a GridSmallId and GridLargeId to each record in the "Crime" table:

EXEC Update_CrimeCoordinates

This procedure is doing three separate tasks:

  1. Creating a small grid of squares on the map within the GridSmall table (Procedure: Insert_GridSmall).
  2. Creating a large grid of squares on the map within the GridLarge table (Procedure: Insert_GridLarge).
  3. Assigning all the crime records to a small and large square on the map.

The two procedures that create the grid squares take in parameters that determine the corners of the map and how many squares we want on it. The defaults are a 200 by 200 small grid and a 100 by 100 large grid, whose squares are twice as big.
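To make the mechanics concrete, here is a rough sketch of how a crime's coordinates can be mapped to a row and column on one of those grids. The corner coordinates below are hypothetical placeholders; the real procedures receive the map corners and grid size as parameters:

DECLARE @BotLeftLat     DECIMAL(14, 10) = 39.19,  -- hypothetical map corners for Baltimore
        @TopRightLat    DECIMAL(14, 10) = 39.38,
        @BotLeftLong    DECIMAL(14, 10) = -76.72,
        @TopRightLong   DECIMAL(14, 10) = -76.52,
        @SquaresPerSide INT = 200;                -- 200 by 200 small grid

SELECT c.CrimeId,
       -- Zero-based row and column of the square that contains this crime
       FLOOR((c.Latitude  - @BotLeftLat)  / (@TopRightLat  - @BotLeftLat)  * @SquaresPerSide) AS GridRow,
       FLOOR((c.Longitude - @BotLeftLong) / (@TopRightLong - @BotLeftLong) * @SquaresPerSide) AS GridColumn
FROM Crime c;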

You can view some of the grid data by running the following:

SELECT TOP 10 c.CrimeId, gs.*
FROM [dbo].[Crime] c
JOIN GridSmall gs ON gs.GridSmallId = c.GridSmallId

CrimeId | GridSmallId | BotLeftLatitude | TopRightLatitude | BotLeftLongitude | TopRightLongitude
1 | 31012 | 39.3341282000001 | 39.3349965000001 | -76.6594308000001 | -76.6585152000001
2 | 19121 | 39.2828985 | 39.2837668 | -76.68873 | -76.6878144
3 | 25198 | 39.3089475000001 | 39.3098158000001 | -76.6456968000001 | -76.6447812000001
4 | 20657 | 39.2889766 | 39.2898449 | -76.5706176000001 | -76.5697020000001
5 | 16212 | 39.269874 | 39.2707423 | -76.5916764000001 | -76.5907608000001
6 | 22832 | 39.2985279000001 | 39.2993962000001 | -76.6035792000001 | -76.6026636000001
7 | 25202 | 39.3089475000001 | 39.3098158000001 | -76.6420344000001 | -76.6411188000001
8 | 16601 | 39.2716106 | 39.2724789 | -76.6035792000001 | -76.6026636000001
9 | 25781 | 39.3115524000001 | 39.3124207000001 | -76.6640088 | -76.6630932
10 | 20992 | 39.2907132 | 39.2915815 | -76.6319628000001 | -76.6310472000001

Notice how only two points are calculated for each square: the bottom-left and top-right corners. We don't have to calculate all four corners, even though longitude and latitude lines technically curve on a real map. At a small enough scale, we can treat the sides of the squares as straight lines.

Create Crime Grid and Lag Features


The last step we need to complete is creating the entire grid of the map for all time periods we want to evaluate. In our case, that is one grid point per day, used to determine whether a crime is going to occur. To complete this step, the below procedure needs to be executed:

EXEC Insert_CrimeGrid

During this step, each crime will be grouped onto its map square so we can determine when and how many crimes occurred on each date in our data set. We also need to generate filler dates for each square on which no crime occurred, to make a complete data set.

The procedure takes quite a while to run. On my laptop, it ran for about 1 hour, and the resulting table ("CrimeGrid") contains about 75 million records. The nice part is that the output is now saved into a table, so we will not have to rebuild it in our R script, where large data operations might not run as efficiently as inside a database.

Also during this step we will be creating "lag features". These are columns that tell us how many times within the last day, two days, week, month, etc. a crime occurred on the grid square. They essentially give us a "hotspot" analysis of the data, which can be used to look at nearby grid squares to see whether the crime is localized to our single square or clustered across all nearby squares, similar to aftershocks from an earthquake. These features may or may not be necessary depending on the type of modeling you do. A simplified sketch of the idea is shown below.
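Conceptually, the procedure cross joins every grid square with every date in the range, flags whether at least one crime occurred, and then looks back over prior rows for the lag features. The sketch below assumes a helper Calendar table of dates, which the real procedure may build differently:

-- One row per small grid square per day, with a 0/1 incident flag
SELECT gs.GridSmallId,
       cal.CalendarDate,
       CASE WHEN COUNT(c.CrimeId) > 0 THEN 1 ELSE 0 END AS IncidentOccurred
INTO #CrimeGridSketch
FROM GridSmall gs
CROSS JOIN Calendar cal
LEFT JOIN Crime c ON c.GridSmallId = gs.GridSmallId
                 AND c.CrimeDate = cal.CalendarDate
GROUP BY gs.GridSmallId, cal.CalendarDate;

-- Example lag feature: did an incident occur in this square on the prior day?
SELECT GridSmallId,
       CalendarDate,
       IncidentOccurred,
       LAG(IncidentOccurred, 1, 0) OVER (PARTITION BY GridSmallId
                                         ORDER BY CalendarDate) AS PriorIncident1Day
FROM #CrimeGridSketch;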

[Figure: Crime hotspot illustration]

Prediction Setup


With all of the data cleansing and feature engineering done on the database side, the code for making predictions is quite simple. In our example, we will use XGBoost in R to analyze five years of training data and predict future crimes.

To start, we load our libraries.

# Load required libraries
library(RODBC)
library(xgboost)
library(ROCR)
library(caret)

Loading required package: gplots

Attaching package: 'gplots'

The following object is masked from 'package:stats':

    lowess

Loading required package: lattice
Loading required package: ggplot2
Registered S3 methods overwritten by 'ggplot2':
  method         from 
  [.quosures     rlang
  c.quosures     rlang
  print.quosures rlang

Then import the data into R directly from the database. You could also export the data to text files and read in CSV data if that is your preferred method. The queries to pull the data are just written as SQL statements with some date limiters. These could be re-written to pull directly from reporting stored procedures with date parameters for a more productionized version of the code.

# Set seed
set.seed(1001)

# Read in data
dbhandle <- odbcDriverConnect('driver={SQL Server};server=DESKTOP-VLN71V7\\SQLEXPRESS;database=crime;trusted_connection=true')

train <- sqlQuery(dbhandle, 'select IncidentOccurred as target, GridSmallId, GridLargeId, DayOfWeek, MonthOfYear, DayOfYear, Year, PriorIncident1Day, PriorIncident2Days, PriorIncident3Days, PriorIncident7Days, PriorIncident14Days, PriorIncident30Days, PriorIncident1Day_Large, PriorIncident2Days_Large, PriorIncident3Days_Large, PriorIncident7Days_Large, PriorIncident14Days_Large, PriorIncident30Days_Large from crimegrid where crimedate <= \'2/20/2017\' and crimedate >= \'6/1/2012\'')

test <- sqlQuery(dbhandle, 'select IncidentOccurred as target, GridSmallId, GridLargeId, DayOfWeek, MonthOfYear, DayOfYear, Year, PriorIncident1Day, PriorIncident2Days, PriorIncident3Days, PriorIncident7Days, PriorIncident14Days, PriorIncident30Days, PriorIncident1Day_Large, PriorIncident2Days_Large, PriorIncident3Days_Large, PriorIncident7Days_Large, PriorIncident14Days_Large, PriorIncident30Days_Large from crimegrid where crimedate >= \'2/21/2017\' and crimedate <= \'2/27/2017\'')

# Convert integers to numeric for DMatrix
train[] <- lapply(train, as.numeric)
test[] <- lapply(test, as.numeric)

head(train)
target | GridSmallId | GridLargeId | DayOfWeek | MonthOfYear | DayOfYear | Year | PriorIncident1Day | PriorIncident2Days | PriorIncident3Days | PriorIncident7Days | PriorIncident14Days | PriorIncident30Days | PriorIncident1Day_Large | PriorIncident2Days_Large | PriorIncident3Days_Large | PriorIncident7Days_Large | PriorIncident14Days_Large | PriorIncident30Days_Large
0 | 3780 | 990 | 5 | 9 | 262 | 2013 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
0 | 3781 | 991 | 5 | 9 | 262 | 2013 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
0 | 3782 | 991 | 5 | 9 | 262 | 2013 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
0 | 3783 | 992 | 5 | 9 | 262 | 2013 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
0 | 3784 | 992 | 5 | 9 | 262 | 2013 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
0 | 3785 | 993 | 5 | 9 | 262 | 2013 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0

Notice that in the test data set we are pulling 7 days' worth of data to test at once, while still keeping the features coming in as if we knew what happened the prior day. This is just so we can test 7 days at once and see how multiple days' worth of predictions look in our reporting at the very end.

Another item to note is that the features labeled "_Large" are for the larger grid vs. the smaller grid.

Create Feature Set and Model Parameters


We define the features as all of the columns that come after the first "target" column in the SQL query. They are passed in when building the train and test matrices, with the target column supplied as the label.

# Get feature names (all but first column, which is the target)
feature.names <- names(train)[2:ncol(train)]
print(feature.names)

# Make train and test matrices
dtrain <- xgb.DMatrix(data.matrix(train[,feature.names]), label=train$target)
dtest <- xgb.DMatrix(data.matrix(test[,feature.names]), label=test$target)

 [1] "GridSmallId"               "GridLargeId"              
 [3] "DayOfWeek"                 "MonthOfYear"              
 [5] "DayOfYear"                 "Year"                     
 [7] "PriorIncident1Day"         "PriorIncident2Days"       
 [9] "PriorIncident3Days"        "PriorIncident7Days"       
[11] "PriorIncident14Days"       "PriorIncident30Days"      
[13] "PriorIncident1Day_Large"   "PriorIncident2Days_Large" 
[15] "PriorIncident3Days_Large"  "PriorIncident7Days_Large" 
[17] "PriorIncident14Days_Large" "PriorIncident30Days_Large"

Next, set up the parameters for model training. We are keeping them pretty basic for now. Make sure the evaluation metric is AUC so we balance the true positive rate against the false positive rate; the goal is to catch as many incidents as possible without sending police officers to patrol areas unnecessarily. There could be additional work to cover all predicted areas with clever patrolling, but we are not going to get into that during this tutorial.

# Training parameters
watchlist <- list(eval = dtest, train = dtrain)

param <- list(
  objective = "binary:logistic",
  booster = "gbtree",
  eta = 0.01,
  max_depth = 10,
  eval_metric = "auc"
)

Run the Model


Now that everything is set up, we can run the model and see how well it predicts. Keep in mind that this uses a pretty basic set of features just to demonstrate one way to deal with geographic time series data. While the data sets can get quite large, they are easy to understand and can run fairly quickly when using cloud-based services like AWS.

This particular training data set had 70 million rows and took about 30 minutes to complete 67 rounds of evaluation on my laptop.

# Run model
clf <- xgb.train(
  params = param,
  data = dtrain,
  nrounds = 100,
  verbose = 2,
  early_stopping_rounds = 10,
  watchlist = watchlist,
  maximize = TRUE
)
[14:30:32] WARNING: amalgamation/../src/learner.cc:686: Tree method is automatically selected to be 'approx' for faster speed. To use old behavior (exact greedy algorithm on single machine), set tree_method to 'exact'.
[1]   eval-auc:0.858208   train-auc:0.858555
Multiple eval metrics are present. Will use train_auc for early stopping.
Will train until train_auc hasn't improved in 10 rounds.
[2]   eval-auc:0.858208   train-auc:0.858555
[3]   eval-auc:0.858208   train-auc:0.858555
[4]   eval-auc:0.858208   train-auc:0.858556
[5]   eval-auc:0.858208   train-auc:0.858556
[6]   eval-auc:0.858208   train-auc:0.858556
[7]   eval-auc:0.858311   train-auc:0.858997
[8]   eval-auc:0.858311   train-auc:0.858997
[9]   eval-auc:0.858311   train-auc:0.858997
[10]  eval-auc:0.858315   train-auc:0.859000
[11]  eval-auc:0.858436   train-auc:0.859110
[12]  eval-auc:0.858436   train-auc:0.859110
[13]  eval-auc:0.858512   train-auc:0.859157
[14]  eval-auc:0.858493   train-auc:0.859157
[15]  eval-auc:0.858496   train-auc:0.859160
[16]  eval-auc:0.858498   train-auc:0.859160
[17]  eval-auc:0.858498   train-auc:0.859160
[18]  eval-auc:0.858342   train-auc:0.859851
[19]  eval-auc:0.858177   train-auc:0.859907
[20]  eval-auc:0.858228   train-auc:0.859971
[21]  eval-auc:0.858231   train-auc:0.859971
[22]  eval-auc:0.858206   train-auc:0.860695
[23]  eval-auc:0.858207   train-auc:0.860695
[24]  eval-auc:0.858731   train-auc:0.860894
[25]  eval-auc:0.858702   train-auc:0.860844
[26]  eval-auc:0.858607   train-auc:0.860844
[27]  eval-auc:0.858574   train-auc:0.860842
[28]  eval-auc:0.858602   train-auc:0.860892
[29]  eval-auc:0.858576   train-auc:0.860843
[30]  eval-auc:0.858574   train-auc:0.860841
[31]  eval-auc:0.858607   train-auc:0.860893
[32]  eval-auc:0.858578   train-auc:0.860843
[33]  eval-auc:0.858611   train-auc:0.860894
[34]  eval-auc:0.858612   train-auc:0.860895
[35]  eval-auc:0.858614   train-auc:0.860898
[36]  eval-auc:0.858615   train-auc:0.860899
[37]  eval-auc:0.858616   train-auc:0.860897
[38]  eval-auc:0.858573   train-auc:0.860870
[39]  eval-auc:0.858546   train-auc:0.860822
[40]  eval-auc:0.858575   train-auc:0.860872
[41]  eval-auc:0.858622   train-auc:0.860898
[42]  eval-auc:0.858578   train-auc:0.860875
[43]  eval-auc:0.858583   train-auc:0.860870
[44]  eval-auc:0.859223   train-auc:0.861768
[45]  eval-auc:0.859220   train-auc:0.861760
[46]  eval-auc:0.859221   train-auc:0.861760
[47]  eval-auc:0.859099   train-auc:0.861719
[48]  eval-auc:0.859112   train-auc:0.861735
[49]  eval-auc:0.859112   train-auc:0.861735
[50]  eval-auc:0.859094   train-auc:0.861734
[51]  eval-auc:0.859125   train-auc:0.861785
[52]  eval-auc:0.859021   train-auc:0.861771
[53]  eval-auc:0.859028   train-auc:0.861784
[54]  eval-auc:0.859029   train-auc:0.861781
[55]  eval-auc:0.859028   train-auc:0.861784
[56]  eval-auc:0.859035   train-auc:0.861788
[57]  eval-auc:0.859037   train-auc:0.861789
[58]  eval-auc:0.859035   train-auc:0.861775
[59]  eval-auc:0.859035   train-auc:0.861774
[60]  eval-auc:0.859010   train-auc:0.861738
[61]  eval-auc:0.859011   train-auc:0.861739
[62]  eval-auc:0.859039   train-auc:0.861778
[63]  eval-auc:0.859016   train-auc:0.861739
[64]  eval-auc:0.859017   train-auc:0.861741
[65]  eval-auc:0.859018   train-auc:0.861746
[66]  eval-auc:0.859019   train-auc:0.861747
[67]  eval-auc:0.859024   train-auc:0.861755
Stopping. Best iteration:
[57]  eval-auc:0.859037   train-auc:0.861789

The model found the patterns quickly and didn't improve much from round to round. This could be enhanced with better model parameters and more features.

Review Importance Matrix


Time to analyze our features and see which ones rank highly in the model by looking at the importance matrix. This can help us determine whether the new features we add are really worth it on a large data set (a larger data set means extra processing time per feature added).

# Compute feature importance matrix
importance_matrix <- xgb.importance(feature.names, model = clf)

# Graph important features
xgb.plot.importance(importance_matrix[1:10,])

[Figure: Feature importance plot for the top 10 features]

It looks like the model is using a combination of small and large grid features to determine whether an incident will occur on the current day. It is interesting that the longer-term features seem to be more important, indicating that a history of criminal activity in the area and/or surrounding areas carries the most signal.

Predict on Test and Check ROC Curve


The last step is to check our predictions against the test data set and see how well they did. We always hope that the ROC curve will spike up high and fast, but that is not always the case. In our example, we have a decent score with only basic features included in the model. This shows that the predictions are workable and that more time should be invested to make them better.

# Predict on test data
preds <- predict(clf, dtest)

# Graph AUC curve
xgb.pred <- prediction(preds, test$target)
xgb.perf <- performance(xgb.pred, "tpr", "fpr")

plot(xgb.perf,
     avg="threshold",
     colorize=TRUE,
     lwd=1,
     main="ROC Curve w/ Thresholds",
     print.cutoffs.at=seq(0, 1, by=0.05),
     text.adj=c(-0.5, 0.5),
     text.cex=0.5)
grid(col="lightgray")
axis(1, at=seq(0, 1, by=0.1))
axis(2, at=seq(0, 1, by=0.1))
abline(v=c(0.1, 0.3, 0.5, 0.7, 0.9), col="lightgray", lty="dotted")
abline(h=c(0.1, 0.3, 0.5, 0.7, 0.9), col="lightgray", lty="dotted")
lines(x=c(0, 1), y=c(0, 1), col="black", lty="dotted")

[Figure: ROC curve with thresholds]

Review the Confusion Matrix


We know there is a decent AUC score, but let's look at the actual output of what we predicted vs. what actually happened. The easiest way to do this is to review the confusion matrix. We want the top-left and bottom-right boxes (good predictions) to be big and the other two (bad predictions) to be small.

# Set our cutoff threshold
preds.resp <- ifelse(preds >= 0.5, 1, 0)

# Create the confusion matrix
confusionMatrix(as.factor(preds.resp), as.factor(test$target), positive = "1")
Confusion Matrix and Statistics

          Reference
Prediction      0      1
         0 280581    454
         1     62    367

               Accuracy : 0.9982
                 95% CI : (0.998, 0.9983)
    No Information Rate : 0.9971
    P-Value [Acc > NIR] : < 2.2e-16

                  Kappa : 0.5864

 Mcnemar's Test P-Value : < 2.2e-16

            Sensitivity : 0.447016
            Specificity : 0.999779
         Pos Pred Value : 0.855478
         Neg Pred Value : 0.998385
             Prevalence : 0.002917
         Detection Rate : 0.001304
   Detection Prevalence : 0.001524
      Balanced Accuracy : 0.723397

       'Positive' Class : 1

Over the course of 7 days, there were 821 incidents (367 caught plus 454 missed), and our model correctly predicted 367 of them, a sensitivity of roughly 45%. The other key point is that we predicted 62 incidents that never happened, essentially sending police officers to areas where we thought a crime would occur but none did. At face value this doesn't seem too bad, but we would need to see how it affects response times to criminal activity compared with current response times. The idea is that police officers end up close enough to a crime that they can either prevent it with their presence or respond quickly enough to limit the harm.

Prepare Data for Reporting


Next we can add in the longitude and latitude coordinates for the test data set and export it to a CSV file. This will allow us to take a look at the actual predictions on a map in a BI tool like Tableau.

# Read in the grid coordinates
gridsmall <- sqlQuery(dbhandle, 'select * from gridsmall')

# Merge the predictions with the test data
results <- cbind(test, preds)

# Merge the grid coordinates with the test data
results <- merge(results, gridsmall, by="GridSmallId")
head(results)

# Save to file
write.csv(results, "Data\\CrimePredictions.csv", row.names = TRUE)
GridSmallId | target | GridLargeId.x | DayOfWeek | MonthOfYear | DayOfYear | Year | PriorIncident1Day | PriorIncident2Days | PriorIncident3Days | ... | PriorIncident3Days_Large | PriorIncident7Days_Large | PriorIncident14Days_Large | PriorIncident30Days_Large | preds | BotLeftLatitude | TopRightLatitude | BotLeftLongitude | TopRightLongitude | GridLargeId.y
1 | 0 | 1 | 3 | 2 | 52 | 2017 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0.2822689 | 39.20041 | 39.20128 | -76.71162 | -76.71071 |
1 | 0 | 1 | 5 | 2 | 54 | 2017 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0.2822689 | 39.20041 | 39.20128 | -76.71162 | -76.71071 |
1 | 0 | 1 | 7 | 2 | 56 | 2017 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0.2822689 | 39.20041 | 39.20128 | -76.71162 | -76.71071 |
1 | 0 | 1 | 4 | 2 | 53 | 2017 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0.2822689 | 39.20041 | 39.20128 | -76.71162 | -76.71071 |
1 | 0 | 1 | 2 | 2 | 58 | 2017 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0.2822689 | 39.20041 | 39.20128 | -76.71162 | -76.71071 |
1 | 0 | 1 | 1 | 2 | 57 | 2017 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0.2822689 | 39.20041 | 39.20128 | -76.71162 | -76.71071 |

Visualizing the Predictions


It is time to see what these predictions actually look like over the course of 7 days. The Tableau dashboard itself can be found here:

Looking at the predictions that were incorrect (red dots), they appear to be pretty close to the predictions that were correct. This would still put police officers in the general vicinity of a crime occurring, so the incorrect predictions may not hurt the overall results very much at all. Example screenshots are shown below with the correct predictions in blue and incorrect predictions in red.

Incorrect Predictions

[Figure: Map of incorrect predictions]

Incorrect and Correct Predictions

[Figure: Map of incorrect and correct predictions]

Overall it does not seem too bad, but we will need more features and/or more data to capture the predictions we are missing. There is also probably a lot more we can do to focus on the specific types of crimes occurring and build targeted prediction models for each type.

What Next?


This tutorial got us started with doing geographical time series predictions using crime data. We can see that the predictions are definitely working, but there is more work to be done on creating features. We might want to add other features that check a larger area for prior crime occurrences. Another useful step would be to change our predictions to run hourly and to map squad car patrol routes by time of day. Even if the predictions are not perfect, as long as we put a police officer in the general vicinity of a crime and do better than current patrol methodologies, officers can respond much faster or even prevent the crime from occurring in the first place with just their presence.

Other ideas to think about:

  • Remove crime data in the vicinity of police stations, fire stations, hospitals, etc., since those locations could be biased by where people submit reports.
  • Add in demographic features related to census information.
  • Map crimes to neighborhoods instead of a square grid to predict against.

Hope you enjoyed the tutorial!


Bio: Jason Wittenauer is a data scientist focused on improving hospital earnings with a background in R, Python, Microsoft SQL Server, Tableau, TIBCO Spotfire, and .NET web programming. His areas of focus in healthcare are business operations, revenue improvement, and expense reduction.

Original. Reposted with permission.

