Entity Framework Spatial Part 1: Loading data
SQL Server has had spatial data types since 2008. Spatial data types are great if you need to manage data that involves geographic points, distances, areas and other stuff that might go on a map. With the recent release of Entity Framework 5, .NET developers now also have a convinient way to manipulate spatial data in SQL Server without writing raw SQL.
We're writing a few blog posts to show how to take advantage of this new feature and to demonstrate some of the use cases spatial support in Entity Framework 5 opens up. This is the first one, and in this post, we're covering how to load data from shape files.
Shape files are an ancient format for exchanging geographic data. It's still much used though, and if you can read Shape files, a lot geographic data is yours for the taking.
Note that Scott Hanselman already has a great blog post on how to get started with storing points with ASP.NET MVC and Entity Framework – check it out if you want a quick primer.
Initial setup
This sample is based on the Entity Framework Code First automatic migration setup that we wrote about some months ago. We'll be using Guid
's for id's, which requires makes the abstract
base entity class look like so:
public abstract class Entity
{
[Key, DatabaseGenerated(DatabaseGeneratedOption.Identity)]
public Guid Id { get; set; }
}
The concrete entity class looks like this:
public class Feature : Entity
{
public string Name { get; set; }
public DbGeography Geography { get; set; }
}
Entity Framework takes care of the mapping the DbGeography
attribute to the correct SQL Server column type.
There's no spatial support in SQL Server Compact Edition, so you can't use the embedded approach for local development mentioned in the automatic migrations post. Instead, you'll have to connect to a local SQL Server Express instance with spatial support enabled when you're debugging.
Example Data
In this example, we'll be using data from the Danish mapping authority. They publish a great deal of data, including boundaries of the various administrative regions in Denmark. In particular, we'll load the shapes of the 98 Danish municipalities (and 1 special administrative region). This will let us figure out what municipal jurisdiction any location is under and display the boundary of that municipality on a map.
Another part of the Danish government already runs such a service (and the code is even open source), which makes this a somewhat academic, but still instructive, exercise.
Reading Shape Files
With a suitable data set and the data model in place, we can start importing data. We'll be using NetTopologySuite to parse the actual shape files. That gives us an IDataReader
interface to read the shapes found in a file. In this dataset, each municipality may be represented by multiple records. In particular, a municipality made up of multiple polygons will have a record for each polygon. For that reason, we need to be able to combine multiple polygons representing one municipality into one multipolygon:
public class ShapeFileHelper
{
public IEnumerable<KeyValuePair<string, IGeometry>> Read(string filePath,
string nameColumnIdentifier)
{
return from x in ReadShapes(filePath, nameColumnIdentifier)
group x by x.Key into g
select new KeyValuePair<string, IGeometry>(g.Key,
(IGeometry)GeometryFactory.Default.CreateMultiPolygon(g.Select(y =>
(IPolygon)y.Value).ToArray()));
}
private IEnumerable<KeyValuePair<string, IGeometry>> ReadShapes(string filePath,
string nameColumnIdentifier)
{
using (var reader = new ShapefileDataReader(filePath, GeometryFactory.Default))
{
while (reader.Read())
{
yield return new KeyValuePair<string, IGeometry>(
(string)reader[nameColumnIdentifier], reader.Geometry);
}
}
}
}
The data we're using for this sample has some other quirks that must be addressed before it can be passed to Entity Framework and SQL Server. One is the reference point ("datum") and coordinate system used to encode coordinates. A good format to store geographic data in, is WSG 84, which is what GPS devices use. Data from Danish authorities comes in a coordinate system called ETRS89 and uses the UTM 32 datum however. To transform the coordinates to WSG 84, we'll use DotSpatial which comes with NetTopologySuite. Setting up the transform looks like this:
private const string _wgs84wkt = "GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",0.01745329251994328,AUTHORITY[\"EPSG\",\"9122\"]],AUTHORITY[\"EPSG\",\"4326\"]]";
private const string _etrs89utm32wkt = "PROJCS[\"ETRS89 / UTM zone 32N\",GEOGCS[\"ETRS89\",DATUM[\"D_ETRS_1989\",SPHEROID[\"GRS_1980\",6378137,298.257222101]],PRIMEM[\"Greenwich\",0],UNIT[\"Degree\",0.017453292519943295]],PROJECTION[\"Transverse_Mercator\"],PARAMETER[\"latitude_of_origin\",0],PARAMETER[\"central_meridian\",9],PARAMETER[\"scale_factor\",0.9996],PARAMETER[\"false_easting\",500000],PARAMETER[\"false_northing\",0],UNIT[\"Meter\",1]]";
private readonly DotSpatialMathTransform _transform;
public DagiShapeFileReader()
{
var etrs89utmprojection = ProjectionInfo.FromEsriString(_etrs89utm32wkt);
var wgs84projection = ProjectionInfo.FromEsriString(_wgs84wkt);
_transform = new DotSpatialMathTransform(etrs89utmprojection, wgs84projection);
}
Another technicality is how SQL Server treats polygons. Imagine a polygon demarcating an area on a map. Is that supposed to mean that the area is what's inside the polygon or that it's the rest of the world except what's inside the polygon? We'll use that the heuristic that the outcome of these two interpretations with the smallest out area is likely to be correct. So for all shapes read, we invert the shape, check to see if the inverted area is smaller than the original and if so, use the inverted shape. With the transform, that comes to this:
public IEnumerable<KeyValuePair<string, DbGeography>> Read(string filePath,
string nameColumnIdentifier)
{
var shapes = new ShapeFileHelper().Read(filePath, nameColumnIdentifier);
return shapes.AsParallel().Select(x =>
{
var transformedGeography = GeometryTransform.TransformGeometry(
GeometryFactory.Default, x.Value, _transform);
var sqlGeography =
SqlGeography.STGeomFromText(new SqlChars(transformedGeography.AsText()), 4326)
.MakeValid();
var invertedSqlGeography = sqlGeography.ReorientObject();
if (sqlGeography.STArea() > invertedSqlGeography.STArea())
{
sqlGeography = invertedSqlGeography;
}
var dbGeography = DbSpatialServices.Default.GeographyFromProviderValue(sqlGeography);
return new KeyValuePair<string, DbGeography>(x.Key, dbGeography);
});
}
Now we can finally read the shapefile shapes into SQL Server using Entity Framework (KOMNAVN
is the column that holds the name of the municipality):
using (var context = new Context())
{
foreach (var shape in new DagiShapeFileReader().Read(pathToShapeFile, "KOMNAVN"))
{
context.Features.Add(new Feature { Name = shape.Key, Geography = shape.Value });
}
context.SaveChanges();
}
The next installment will cover how to query and display the data we've read. We'll also release the full solution. In the meantime you can try out the end result here. Click anywhere on the Danish land-mass to see the boundary of the municipality you clicked on.