Geospatial queries and indices in MongoDB

Daan Schutte
Picnic Engineering

--

In order to add some exciting new features, at Picnic, we recently became interested in which customers are added to and removed from a particular delivery area as soon as the delivery area is modified. A delivery area is a region on the ground that we have determined to be the optimal area within which to deliver groceries in a certain time slot, to ensure the best possible customer experience.

From time to time, we update the bounds of our delivery areas in response to optimizations according to our distribution algorithms. For example, with new data we may find it is more efficient to send two delivery vehicles to a wider region instead of just one, splitting up a larger delivery area or vice versa. Or we may add more batteries to our vehicles extending their range, which means a certain delivery area may grow bigger, and so on.

Because we want to affect immediate changes when changes occur, we decided to use an event-driven approach in this context. When a delivery area is modified, an event is produced over RabbitMQ which contains both the previous and the updated Polygons for a certain delivery area. This event is consumed by a different service in our backend that computes the delta in customer membership to the modified delivery area.

We decided that the most efficient location to determine which customers belong to a given region would be in the database, rather than in Java land, as our previous implementation would have required. In the previous implementation, geolocation information was simply stored in our MongoDB database on the Customer Document as two separate entries, one field for latitude, and one for longitude.

MongoDB provides geospatial query functionality and, in this context, is useful to us is the $geoWithin operator. This operator selects documents with geospatial data that exist entirely within a specified shape. What this means is that we can pass a geometry of type Polygon or MultiPolygon to a $geoWithin query and all the documents with a geolocation in the specified geometry are returned. A Polygon is simply an array of coordinates describing an enclosed area on a plane (in our case a map). A Polygon may or may not have “holes”, which are areas that should be excluded from the search. A MultiPolygon, as the name suggests, is just an array of Polygons.

An example of a Polygon, here in the center of Amsterdam

To facilitate this query, MongoDB requires that the data be stored in GeoJSON format (or legacy coordinate pairs on a plane). In our case, since the address is represented by a specific point on the ground, a GeoJSON Point was the obvious choice.

"address":{ 
"geolocation":{
"point":{
"type":"Point",
"coordinates":[4.865037,52.3568582]
}
}
}

After migrating our data to the required format (note that the order of the coordinates must be [longitude, latitude]), we created a 2dsphere index on the geolocation data. A 2dsphere index supports queries that calculate geometries on an earth-like sphere and supports all MongoDB geospatial queries. The index can be created on both GeoJSON objects and legacy coordinate pairs, the latter of which simply gets converted into a GeoJSON object by the index.

db.customers.createIndex({ "address.geolocation.point": "2dsphere" }, { background: true, name: "address_geolocation" })

With our data prepared and our index created, we can perform a lookup for all customers that are contained within a given Polygon or MultiPolygon. For example, if we want to find all customers located in the image below (the park being a hole, which is excluded from the lookup), we can simply run the following query:

db.customers.find({ 
'address.geolocation.point.coordinates':{
$geoWithin:{
$geometry:{
type:'Polygon',
coordinates:[
[
[4.904574751853943,52.37165931612484], // A
[4.905551075935364,52.37210146261991], // B
[4.904177784919739,52.37284818887275], // C
[4.903421401977539,52.37229797075253], // D
[4.904574751853943,52.37165931612484] // A
],
[
[4.904354810714721,52.37206216088845], // 1
[4.90465521812439,52.372258669195950], // 2
[4.904150962829589,52.37254360468889], // 3
[4.903829097747803,52.37232744689703], // 4
[4.904354810714721,52.37206216088845] // 1
]
]
}
}
}
})
The same Polygon, with coordinates.

Since we can now efficiently locate all of our customers located in a specified area, we can easily determine the delta in customer delivery area membership when a delivery area changes. We can simply compute the difference in membership of the two areas and thus we know which customers were added and which were removed.

--

--