Here I’ve made a collection of common problems I’ve found with data sets, projections and coordinate systems on my spatial data journey. I suggest some ways to solve them.
You are given a data set with coordinate pairs (X and Y, or long/lat) but you don't know which coordinate system they are in. However, when importing the data set into a GIS application or into R, you always need to specify the coordinate system (especially if you want to use those coordinate points for calculations afterwards, for example, to extract values of environmental variables at the given locations).
Let's have a look at this issue with an example.
Your colleagues are studying the fish populations of two species (Fish1 and Fish2) in some dryland rivers of central Australia. During the field season they captured individuals of these two species in different waterholes across different rivers and extracted blood samples when possible. They gave you their data and asked you to create a map showing the sampled sites.
Here is a subset of the data set:
The first issue you observe is that there are missing coordinates (yellow cells). You need to contact the data collectors or other experts with knowledge of the study area and ask for help with the missing coordinates. In this case you have the name of the river and the waterhole to help in your search (you could use auxiliary spatial data, like a layer of waterbodies and rivers, to help you work out where these sites are located). In the worst-case scenario you might need to exclude these records from your map.
Your colleagues didn't give you any information about the coordinate system in which the data were collected. The first two digits in the Easting/Northing column and the letter J are a hint that the coordinates are not geographic (lat/long) but projected (53 is the UTM zone). Also, you know that latitude ranges from -90° to 90° and longitude from -180° to 180°, so the coordinates 671824 6940022 don't fall in those ranges.
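These two checks can be sketched in a few lines of Python. This is only a minimal illustration: the function names `parse_utm` and `could_be_geographic` are made up for this example, and the pattern assumes the cell text looks like the values in this spreadsheet (zone, band letter, easting, northing, separated by spaces).

```python
import re

# Assumed cell layout: "53 J 671824 6940022" (zone, band letter, easting, northing).
# UTM band letters run from C to X, skipping I and O.
UTM_PATTERN = re.compile(
    r"^\s*(\d{1,2})\s*([C-HJ-NP-X])\s+(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s*$",
    re.IGNORECASE,
)

def parse_utm(text):
    """Split a 'zone band easting northing' string; return None if it doesn't match."""
    match = UTM_PATTERN.match(text)
    if match is None:
        return None
    zone, band, easting, northing = match.groups()
    return {
        "zone": int(zone),
        "band": band.upper(),
        "easting": float(easting),
        "northing": float(northing),
    }

def could_be_geographic(x, y):
    """True if (x, y) fits inside the valid longitude/latitude ranges."""
    return -180 <= x <= 180 and -90 <= y <= 90

record = parse_utm("53 J 671824 6940022")
print(record)
print(could_be_geographic(record["easting"], record["northing"]))  # False: projected
```

A pair of values that passes `could_be_geographic` is not guaranteed to be lat/long, but a pair that fails it certainly is not.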
Since the data were collected recently (2012), you can initially assume that they were recorded in the currently accepted grid system in Australia, GDA_1994_MGA_Zone53 or Zone54. If you are working anywhere else in the world, you have to find out which official coordinate systems are most commonly used in your study area.
So, your best guess is that the coordinate system is GDA_1994_MGA_Zone53 or Zone54. The next step is to check whether you are right or wrong. Plot the data assuming this coordinate system and overlay the resulting layer on a layer of streams and/or an aerial image to see if the locations make sense. If they don't, try alternative coordinate systems (e.g. AGD1984_AMG Zone 53).
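Before plotting anything, a quick numeric sanity check can already tell you whether a guess is plausible. The sketch below converts a grid coordinate back to longitude/latitude using only the Python standard library, assuming the data are GDA94/MGA (GRS80 ellipsoid, standard UTM parameters) and using a truncated Krüger series, which is accurate enough for a plausibility check but is not a replacement for your GIS or a projection library such as pyproj. The function names and the bounding box for the study area are rough, made-up values for illustration.

```python
import math

# Assumption: GDA94/MGA, i.e. the GRS80 ellipsoid with standard UTM parameters.
SEMI_MAJOR_AXIS = 6378137.0
INVERSE_FLATTENING = 298.257222101
SCALE_FACTOR = 0.9996
FALSE_EASTING = 500000.0
FALSE_NORTHING_SOUTH = 10000000.0

def utm_to_lonlat(zone, easting, northing, southern=True):
    """Approximate inverse UTM (Krueger series up to n^3); returns (lon, lat) in degrees."""
    f = 1.0 / INVERSE_FLATTENING
    n = f / (2.0 - f)
    radius = SEMI_MAJOR_AXIS / (1.0 + n) * (1.0 + n**2 / 4.0 + n**4 / 64.0)
    beta = (n / 2 - 2 * n**2 / 3 + 37 * n**3 / 96, n**2 / 48 + n**3 / 15, 17 * n**3 / 480)
    delta = (2 * n - 2 * n**2 / 3 - 2 * n**3, 7 * n**2 / 3 - 8 * n**3 / 5, 56 * n**3 / 15)

    false_northing = FALSE_NORTHING_SOUTH if southern else 0.0
    xi = (northing - false_northing) / (SCALE_FACTOR * radius)
    eta = (easting - FALSE_EASTING) / (SCALE_FACTOR * radius)
    xi_p = xi - sum(b * math.sin(2 * j * xi) * math.cosh(2 * j * eta)
                    for j, b in enumerate(beta, 1))
    eta_p = eta - sum(b * math.cos(2 * j * xi) * math.sinh(2 * j * eta)
                      for j, b in enumerate(beta, 1))

    chi = math.asin(math.sin(xi_p) / math.cosh(eta_p))
    lat = chi + sum(d * math.sin(2 * j * chi) for j, d in enumerate(delta, 1))
    central_meridian = math.radians(zone * 6 - 183)  # zone 53 -> 135 degrees E
    lon = central_meridian + math.atan2(math.sinh(eta_p), math.cos(xi_p))
    return math.degrees(lon), math.degrees(lat)

def inside_box(lon, lat, box):
    """box = (min_lon, min_lat, max_lon, max_lat)."""
    return box[0] <= lon <= box[2] and box[1] <= lat <= box[3]

# Hypothetical, deliberately generous box around the study area.
STUDY_AREA = (132.0, -32.0, 142.0, -24.0)

for zone in (53, 54, 60):
    lon, lat = utm_to_lonlat(zone, 580170, 6913592)
    print(zone, round(lon, 3), round(lat, 3), inside_box(lon, lat, STUDY_AREA))
```

A point that lands inside the box is only a candidate. You still need to overlay it on the rivers layer or an aerial image, because a wrong datum typically shifts points by a couple of hundred metres at most, which a coarse box cannot detect.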
But hang on… the coordinates for Algebuckina Waterhole are provided in 13 different UTM zones (53-65)!! This is an extreme example (UTM zones only run from 1 to 60, so zones 61-65 don't even exist), and it was clearly created by dragging the 53 J 580170 6913592 cell down the spreadsheet when typing in the Easting/Northing values for this waterhole. Excel assumed it was a numeric series and filled the empty cells with consecutive increasing values starting from 53.
If any of your records seem to have a weird UTM zone, check the spreadsheet for other records from the same site or a nearby location and copy the UTM zone from those records. Plot them, overlay the resulting point layer on a layer of rivers (in this case), and check that each point falls on the river it should correspond to.
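This kind of drag-fill error can also be screened for automatically before you plot anything. The following is a minimal sketch, assuming the records have already been reduced to (site name, UTM zone) pairs in spreadsheet order; the function names and the choice of 53 and 54 as the expected zones are assumptions based on this example.

```python
def flag_suspicious_zones(records, expected_zones):
    """Return (row index, site, zone, reason) for zones that look wrong."""
    flagged = []
    for index, (site, zone) in enumerate(records):
        if not 1 <= zone <= 60:
            flagged.append((index, site, zone, "not a valid UTM zone (1-60)"))
        elif zone not in expected_zones:
            flagged.append((index, site, zone, "outside the zones expected for the study area"))
    return flagged

def find_fill_series(records, min_length=3):
    """Find runs of consecutive rows for one site whose zone increases by 1 each row.

    Returns (first row index, last row index) pairs, the typical footprint of a
    spreadsheet cell dragged down as a numeric series.
    """
    runs = []
    start = 0
    for i in range(1, len(records) + 1):
        continues = (
            i < len(records)
            and records[i][0] == records[i - 1][0]
            and records[i][1] == records[i - 1][1] + 1
        )
        if not continues:
            if i - start >= min_length:
                runs.append((start, i - 1))
            start = i
    return runs

# Example rows: the dragged Algebuckina cell plus two made-up, well-behaved rows.
rows = [("Algebuckina", zone) for zone in range(53, 66)] + [("Other site", 53), ("Other site", 53)]

print(find_fill_series(rows))
for problem in flag_suspicious_zones(rows, expected_zones={53, 54}):
    print(problem)
```

Flagged rows still need a human decision: the script only tells you where to look, and the fix is the one described above (copy the zone from a trusted record for the same site, then check the point against the rivers layer).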
You've been sampling a plant species across Victoria and NSW in Australia (presence-absence and abundances). You want to explore whether the observed changes in abundance of the species between the sampled sites relate to the variation in rainfall among locations. You downloaded a map of average rainfall values across Australia from the Australian Bureau of Meteorology. Your idea is to extract the value of average rainfall at each sample location and then build a model relating species abundance to rainfall. Any spatial query involving two or more layers (in this case, extracting values from a raster at a set of points) requires all layers to be in the same coordinate system.
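To see why the coordinate systems have to match, here is a toy sketch of what extracting raster values at points boils down to. It is not how a GIS or a raster library stores data: the `ToyRaster` class, the CRS labels and the rainfall numbers are all invented for illustration. The point is that the lookup is plain arithmetic on the coordinates, so it only makes sense if points and raster use the same system.

```python
from dataclasses import dataclass

@dataclass
class ToyRaster:
    crs: str          # just a label here, e.g. "lonlat" (hypothetical)
    x_min: float      # x coordinate of the left edge
    y_max: float      # y coordinate of the top edge
    cell_size: float  # cell width and height, in the units of the CRS
    values: list      # rows of cell values, top row first

    def sample(self, x, y, point_crs):
        """Return the value of the cell containing (x, y), or None if outside."""
        if point_crs != self.crs:
            raise ValueError(
                f"points are in {point_crs!r} but the raster is in {self.crs!r}; "
                "reproject one of them first"
            )
        col = int((x - self.x_min) // self.cell_size)
        row = int((self.y_max - y) // self.cell_size)
        if not (0 <= row < len(self.values) and 0 <= col < len(self.values[0])):
            return None
        return self.values[row][col]

# Made-up 2 x 2 rainfall grid with 1-degree cells.
rain = ToyRaster(crs="lonlat", x_min=140.0, y_max=-30.0, cell_size=1.0,
                 values=[[300, 400], [500, 600]])

print(rain.sample(141.5, -31.2, point_crs="lonlat"))  # value of the bottom-right cell

try:
    rain.sample(671824, 6940022, point_crs="mga53")
except ValueError as error:
    print(error)
```

Without the CRS check, the second call would silently return `None` (or, with unlucky numbers, a value from the wrong cell), which is exactly the kind of error that is hard to spot later in a model.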
When you import the rainfall data into a GIS program you find that the information about the coordinate system is missing. Check whether the metadata for the layer (the rainfall raster) is available on the source website. If the metadata doesn't specify the coordinate system, you should be able to find that information somewhere else on the website. All (well… most) official and reliable sources of spatial data provide information about the coordinate system of their data (although in most cases you have to dig deep into menus and tabs to find it!). If you can't find it, contact the data curators. My experience is that they are keen to help.