In this blog post you will learn how to extract stats from spatial data to respond to questions like: how much did it rain in a given catchment last year? what is the dominant land use/forest type in a given area (e.g. region)? what is the total population of a given area?
Why is this type of data/information important in Ecology? Most of the time, a phenomenon that is observed at a given spot (e.g. % of organic matter in a given river segment/strech) depends not just on what´s happening in that particular place (e.g. is there livestock using that stretch of the river for drinking? or is there any kind of punctual discharge of organic matter at that place?) but also on what´s happening in its surroundings or other connected parts of the landscape (e.g. how much did it rain in the upper parts of the catchment last week? after a heavy rain lots of nutrients can be washed out from the soils to streams increasing the loads of organic matter).
In this example I´ll calculate the total annual rainfall fallen in the biggest river basin in Australia (the Murray Darling basin) during the last 30 years.
What type of spatial data do I need for this calculation?
- A vector layer delimiting the boundaries of my study area (the Murray Darling basin).
- A raster layer of total rainfall with the same extent or larger than my study area.
In my case, I obtained the layer delimiting the boundaries of the Murray Darling basin from the Australian Hydrological Geospatial Fabric, Bureau of Meteorology, Australian Government, Australia. URL: http://www.bom.gov.au/water/geofabric/ (the shapefile is called NCBLevel1DrainageDivision.shp and you´ll find it inside the geotabase SH_Catchments.gdb).
The data about rainfall totals is also available through the Australian Bureau of Meteorology: http://www.bom.gov.au > climate & past weather> maps-average conditions> decadal and multi‐decadal conditions. In particular I downloaded ‘multi-decadal rainfall totals’ for the period 1976-2005 in grid format. The format of the downloaded grid file is .txt (equivalent to .ascii) which is the best interchangeable raster data format since it is recognized by all GIS software (also R!). To visualize this file in a map, you should CONVERT it into .GRID . TIF or .IMG formats.
To do this in ARCGIS you´ll choose an appropriate tool from the set of CONVERSION tools in the ArcToolBox. You can open the ArcToolBox by clicking on its icon (a red box on top of an executable window). In this toolbox you will find, execute and manage geoprocessing tools. For this particular exercise, go to Conversion tools > to Raster > ascii to Raster.
Indicate the format of the output file (e.g. ‘.tif’). Default in ARCGIS is GRID but it creates 6 files in your working directory. ‘.img’ or ´.tif´ formats create 2 files only.
The output data type can be FLOAT (with decimals) or INTEGER (decimals not allowed). The INTEGER type is used frequently for categorical data (e.g. a raster with values 1 – Forest, 2 – Non forest) and the FLOAT raster type for continuous data (e.g. a scale of temperatures). But think twice about the nature of your data before importing the raster (e.g. population totals is an example of continuous data that would be imported into INTEGER format).
Ok! Now you can visualize your data (both the layer of drainage basins and the raster of rainfall totals). In a previous post I explained that to perform calculations in ARCGIS (as in any other geographic information system, including R) all layers involved in the analysis have to specify the same coordinate system and if they don´t, we should do something about it (reproject).
In this case I´ve looked into the source of the spatial data (the Australian Bureau of Meteorology) to find out that both layers are in the same coordinate system (GCS_GDA_1994). Here the info: http://www.bom.gov.au/climate/averages/climatology/gridded-data-info/gridded-climate-data.shtml
However, when I explore the properties of the raster layer (right click on the layer´s name > properties> Source), it indicates that the spatial reference is unknown.
So, the first thing to do it is to define the coordinate system of the raster layer. ArcToolBox > Data Management Tools > Projections and Transformations > Define projection.
If you chose the option select, you will find the coordinate system GCS_GDA_1994 under Geographic Coordinate Systems > Australia and New Zealand > Geocentric Datum of Australia 1994.prj. Alternatively, you can import the coordinate system from the layer ‘NCBLevel1DrainageDivision.shp’.
Once both layers are in the same coordinate system, we will extract the rainfall stats for the stydy area. Within the ArcToolBox go to Spatial Analyst Tools > Zonal > Zonal Statistics as Table.
In this case we will ask for all possible statistics. If your input value raster is a categorical raster (or INTEGER e.g. land cover types), then you can also get zonal calculations for majority, median, minority and variety. Be aware that many of these stats might not make any ecological sense (e.g. a mean value of 1.5 for two land cover categories – 1. forest and 2. grasslands – doesn´t mean a pixel is partly occupied by forest and parly occupied by grasslands).
If you have a look at results you will see that the tool has extracted stats for all major basins in Australia. We can filter this table and retain the values for the Murray-Darling basin only. Because we are only interested in one basin, we can also first select the basin of interest (or region) and then, run the Zonal Statistics as Table: the program will return the stats for the selected basin only. You will find the ‘Select’ tool in the ‘Tools’ toolbar (it is highlighted in the image below with a red rectangle)