Exploring .bag
Bathymetry Data Files
An exploration of data and metadata in Bathymetric Attributed Grid (BAG) files.
References:
BAG website: https://marinemetadata.org/references/bag
Format Specification Document: http://www.opennavsurf.org/papers/ons_fsd.pdf
A slightly dated, Python 2 based video lesson on accessing BAG files: https://www.youtube.com/watch?v=dEtC6bRcjvc
Working environment for this notebook:
Python 3
conda
packages:h5py
- Python interface to HDF5 format used by BAGlxml
- XML parser and manipulation library to access BAG metadatanumpy
- for n-dimensional arraysmatplotlib
- for plottingnotebook
- Jupyter notebook
“Keep Calm and Conda Install”
If you are looking at this in the Salish Sea Tools docs at http://salishsea-meopar-tools.readthedocs.io/en/latest/bathymetry/ExploringBagFiles.html, you can find the source notebook that generated the page in the Salish Sea project tools repo at tools/bathymetry/ExploringBagFiles.ipynb
or download the notebook by itself (instead of cloning the tools repo to get it) from
http://nbviewer.jupyter.org/github/SalishSeaCast/tools/blob/master/bathymetry/ExploringBagFiles.ipynb.
[1]:
from io import BytesIO
import h5py
from lxml import etree
import matplotlib.pyplot as plt
import numpy as np
[2]:
%matplotlib inline
BAG Dataset
Load the BAG dataset and explore some of its basic attributes:
[3]:
bag = h5py.File('/ocean/sallen/allen/research/MEOPAR/chs_bathy/092B.bag')
[4]:
print(type(bag))
print(bag.name)
print(bag.filename)
<class 'h5py._hl.files.File'>
/
/ocean/sallen/allen/research/MEOPAR/chs_bathy/092B.bag
[5]:
for item in bag.items():
print(item)
for value in bag.values():
print(value)
('BAG_root', <HDF5 group "/BAG_root" (4 members)>)
<HDF5 group "/BAG_root" (4 members)>
[6]:
list(bag['BAG_root'].items())
[6]:
[('elevation', <HDF5 dataset "elevation": shape (337, 448), type "<f4">),
('metadata', <HDF5 dataset "metadata": shape (9730,), type "|S1">),
('tracking_list', <HDF5 dataset "tracking_list": shape (0,), type "|V20">),
('uncertainty', <HDF5 dataset "uncertainty": shape (337, 448), type "<f4">)]
The list above contains the 4 elements that the BAG specification tells us should be in the file:
elevation
is the depths as negative 32-bit floats, with1.0e6
as the “no data” value (land, typically)metadata
is the BAG metadata, a blob of XMLtracking_list
is adjustments to theelevation
data values made by a hydrographeruncertainty
is the vertical uncertainty in theelevation
data values
Note that under Python 3 the h5py
library maked heavy use of memoryview
objects which are iterators. The transformation to a list
object above, or the use of a for
loop above that collects the items from the memoryview
.
One odd thing to note is that the metadata is stored as a collection of 1-character strings which turn out to be single bytes in Python 3. We’re going to have to do something about that…
Peeling away the HDF5 group layer:
[7]:
root = bag['BAG_root']
print(root.name)
print(root.parent)
list(root.items())
/BAG_root
<HDF5 group "/" (1 members)>
[7]:
[('elevation', <HDF5 dataset "elevation": shape (337, 448), type "<f4">),
('metadata', <HDF5 dataset "metadata": shape (9730,), type "|S1">),
('tracking_list', <HDF5 dataset "tracking_list": shape (0,), type "|V20">),
('uncertainty', <HDF5 dataset "uncertainty": shape (337, 448), type "<f4">)]
The elevation
Element
Pulling the elevation
dataset out of the BAG, and the depths data out of the dataset:
[8]:
elev_node = root['elevation']
print(type(elev_node))
<class 'h5py._hl.dataset.Dataset'>
[9]:
elev = elev_node.value
print(type(elev))
<class 'numpy.ndarray'>
[10]:
print(elev.min(), elev.max())
-341.917 1e+06
As noted above 1e+06
indicates no data at a point, typically meaning land. Let’s replace those with NumPy NaN
s so that we can work with the data more easily:
[11]:
elev[elev > 9e5] = np.NAN
print(np.nanmin(elev), np.nanmax(elev))
-341.917 4.2
[12]:
fig, ax = plt.subplots(1, 1)
ax.imshow(elev)
ax.invert_yaxis()
The metadata
Element
Pulling the metadata
element out of the BAG, and getting it into a form that we can work with:
[13]:
metadata_node = root['metadata']
print(type(metadata_node))
print(metadata_node)
<class 'h5py._hl.dataset.Dataset'>
<HDF5 dataset "metadata": shape (9730,), type "|S1">
As noted above, the metadata is a collection of single characters in the form of bytes. We need to collect those bytes into a buffer and parse them to get an XML tree object that we can work with in code:
[14]:
buffer = BytesIO(metadata_node.value)
tree = etree.parse(buffer)
root = tree.getroot()
Now we can get a somewhat readable rendering of the metadata in all its verbose XML glory:
[15]:
print(etree.tostring(root, pretty_print=True).decode('ascii'))
<gmi:MI_Metadata xmlns:gmi="http://www.isotc211.org/2005/gmi" xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:gco="http://www.isotc211.org/2005/gco" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:bag="http://www.opennavsurf.org/schema/bag">
<gmd:fileIdentifier>
<gco:CharacterString>2db1df98-90f2-4e20-a91e-6089111e2f5d</gco:CharacterString>
</gmd:fileIdentifier>
<gmd:language>
<gmd:LanguageCode codeList="http://www.loc.gov/standards/iso639-2/" codeListValue="eng">eng</gmd:LanguageCode>
</gmd:language>
<gmd:characterSet>
<gmd:MD_CharacterSetCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_CharacterSetCode" codeListValue="utf8">utf8</gmd:MD_CharacterSetCode>
</gmd:characterSet>
<gmd:hierarchyLevel>
<gmd:MD_ScopeCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_ScopeCode" codeListValue="dataset">dataset</gmd:MD_ScopeCode>
</gmd:hierarchyLevel>
<gmd:contact>
<gmd:CI_ResponsibleParty>
<gmd:individualName>
<gco:CharacterString>dillt</gco:CharacterString>
</gmd:individualName>
<gmd:organisationName>
<gco:CharacterString>CHS</gco:CharacterString>
</gmd:organisationName>
<gmd:positionName>
<gco:CharacterString> MDH</gco:CharacterString>
</gmd:positionName>
<gmd:role>
<gmd:CI_RoleCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#CI_RoleCode" codeListValue="pointOfContact">pointOfContact</gmd:CI_RoleCode>
</gmd:role>
</gmd:CI_ResponsibleParty>
</gmd:contact>
<gmd:dateStamp>
<gco:Date>2014-02-07</gco:Date>
</gmd:dateStamp>
<gmd:metadataStandardName>
<gco:CharacterString>ISO 19115</gco:CharacterString>
</gmd:metadataStandardName>
<gmd:metadataStandardVersion>
<gco:CharacterString>2003/Cor.1:2006</gco:CharacterString>
</gmd:metadataStandardVersion>
<gmd:spatialRepresentationInfo>
<gmd:MD_Georectified>
<gmd:numberOfDimensions>
<gco:Integer>2</gco:Integer>
</gmd:numberOfDimensions>
<gmd:axisDimensionProperties>
<gmd:MD_Dimension>
<gmd:dimensionName>
<gmd:MD_DimensionNameTypeCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_DimensionNameTypeCode" codeListValue="row">row</gmd:MD_DimensionNameTypeCode>
</gmd:dimensionName>
<gmd:dimensionSize>
<gco:Integer>337</gco:Integer>
</gmd:dimensionSize>
<gmd:resolution>
<gco:Measure uom="Metres">500</gco:Measure>
</gmd:resolution>
</gmd:MD_Dimension>
</gmd:axisDimensionProperties>
<gmd:axisDimensionProperties>
<gmd:MD_Dimension>
<gmd:dimensionName>
<gmd:MD_DimensionNameTypeCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_DimensionNameTypeCode" codeListValue="column">column</gmd:MD_DimensionNameTypeCode>
</gmd:dimensionName>
<gmd:dimensionSize>
<gco:Integer>448</gco:Integer>
</gmd:dimensionSize>
<gmd:resolution>
<gco:Measure uom="Metres">500</gco:Measure>
</gmd:resolution>
</gmd:MD_Dimension>
</gmd:axisDimensionProperties>
<gmd:cellGeometry>
<gmd:MD_CellGeometryCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_CellGeometryCode" codeListValue="point">point</gmd:MD_CellGeometryCode>
</gmd:cellGeometry>
<gmd:transformationParameterAvailability>
<gco:Boolean>1</gco:Boolean>
</gmd:transformationParameterAvailability>
<gmd:checkPointAvailability>
<gco:Boolean>0</gco:Boolean>
</gmd:checkPointAvailability>
<gmd:cornerPoints>
<gml:Point gml:id="id1">
<gml:coordinates decimal="." cs="," ts=" ">-13804000.000000000000,6075000.000000000000 -13580500.000000000000,6243000.000000000000</gml:coordinates>
</gml:Point>
</gmd:cornerPoints>
<gmd:pointInPixel>
<gmd:MD_PixelOrientationCode>center</gmd:MD_PixelOrientationCode>
</gmd:pointInPixel>
</gmd:MD_Georectified>
</gmd:spatialRepresentationInfo>
<gmd:referenceSystemInfo>
<gmd:MD_ReferenceSystem>
<gmd:referenceSystemIdentifier>
<gmd:RS_Identifier>
<gmd:code>
<gco:CharacterString>PROJCS["WRLDMERC",
GEOGCS["unnamed",
DATUM["WGS_1984",
SPHEROID["WGS_1984",6378137,298.2572201434276],
TOWGS84[0,0,0,0,0,0,0]],
PRIMEM["Greenwich",0],
UNIT["degree",0.0174532925199433],
EXTENSION["Scaler","0,0,0,0.01,0.01,0.0001"],
EXTENSION["Source","CARIS"]],
PROJECTION["Mercator_1SP"],
PARAMETER["central_meridian",0],
PARAMETER["scale_factor",1],
PARAMETER["false_easting",0],
PARAMETER["false_northing",0],
UNIT["Meter",1]]</gco:CharacterString>
</gmd:code>
<gmd:codeSpace>
<gco:CharacterString>WKT</gco:CharacterString>
</gmd:codeSpace>
</gmd:RS_Identifier>
</gmd:referenceSystemIdentifier>
</gmd:MD_ReferenceSystem>
</gmd:referenceSystemInfo>
<gmd:referenceSystemInfo>
<gmd:MD_ReferenceSystem>
<gmd:referenceSystemIdentifier>
<gmd:RS_Identifier>
<gmd:code>
<gco:CharacterString>VERT_CS["Unknown", VERT_DATUM[Unknown, 2000]]</gco:CharacterString>
</gmd:code>
<gmd:codeSpace>
<gco:CharacterString>WKT</gco:CharacterString>
</gmd:codeSpace>
</gmd:RS_Identifier>
</gmd:referenceSystemIdentifier>
</gmd:MD_ReferenceSystem>
</gmd:referenceSystemInfo>
<gmd:identificationInfo>
<bag:BAG_DataIdentification>
<gmd:citation>
<gmd:CI_Citation>
<gmd:title>
<gco:CharacterString>BDB_92B_500m_WorldMerc_2014-02-05_extract_final.csar</gco:CharacterString>
</gmd:title>
<gmd:date>
<gmd:CI_Date>
<gmd:date>
<gco:Date>2014-02-07</gco:Date>
</gmd:date>
<gmd:dateType>
<gmd:CI_DateTypeCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#CI_DateTypeCode" codeListValue="creation">creation</gmd:CI_DateTypeCode>
</gmd:dateType>
</gmd:CI_Date>
</gmd:date>
<gmd:citedResponsibleParty>
<gmd:CI_ResponsibleParty>
<gmd:individualName>
<gco:CharacterString>dillt</gco:CharacterString>
</gmd:individualName>
<gmd:organisationName>
<gco:CharacterString>CHS</gco:CharacterString>
</gmd:organisationName>
<gmd:positionName>
<gco:CharacterString> MDH</gco:CharacterString>
</gmd:positionName>
<gmd:role>
<gmd:CI_RoleCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#CI_RoleCode" codeListValue="originator">originator</gmd:CI_RoleCode>
</gmd:role>
</gmd:CI_ResponsibleParty>
</gmd:citedResponsibleParty>
</gmd:CI_Citation>
</gmd:citation>
<gmd:abstract>
<gco:CharacterString>unknown</gco:CharacterString>
</gmd:abstract>
<gmd:status>
<gmd:MD_ProgressCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_ProgressCode" codeListValue="onGoing">onGoing</gmd:MD_ProgressCode>
</gmd:status>
<gmd:spatialRepresentationType>
<gmd:MD_SpatialRepresentationTypeCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_SpatialRepresentationTypeCode" codeListValue="grid">grid</gmd:MD_SpatialRepresentationTypeCode>
</gmd:spatialRepresentationType>
<gmd:language>
<gmd:LanguageCode codeList="http://www.loc.gov/standards/iso639-2/" codeListValue="eng">eng</gmd:LanguageCode>
</gmd:language>
<gmd:characterSet>
<gmd:MD_CharacterSetCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_CharacterSetCode" codeListValue="utf8">utf8</gmd:MD_CharacterSetCode>
</gmd:characterSet>
<gmd:topicCategory>
<gmd:MD_TopicCategoryCode>elevation</gmd:MD_TopicCategoryCode>
</gmd:topicCategory>
<gmd:extent>
<gmd:EX_Extent>
<gmd:geographicElement>
<gmd:EX_GeographicBoundingBox>
<gmd:westBoundLongitude>
<gco:Decimal>-124.003</gco:Decimal>
</gmd:westBoundLongitude>
<gmd:eastBoundLongitude>
<gco:Decimal>-121.996</gco:Decimal>
</gmd:eastBoundLongitude>
<gmd:southBoundLatitude>
<gco:Decimal>47.9995</gco:Decimal>
</gmd:southBoundLatitude>
<gmd:northBoundLatitude>
<gco:Decimal>49.0024</gco:Decimal>
</gmd:northBoundLatitude>
</gmd:EX_GeographicBoundingBox>
</gmd:geographicElement>
</gmd:EX_Extent>
</gmd:extent>
<bag:verticalUncertaintyType>
<bag:BAG_VertUncertCode codeList="http://www.opennavsurf.org/schema/bag/bagCodelists.xml#BAG_VertUncertCode" codeListValue="unknown">unknown</bag:BAG_VertUncertCode>
</bag:verticalUncertaintyType>
</bag:BAG_DataIdentification>
</gmd:identificationInfo>
<gmd:dataQualityInfo>
<gmd:DQ_DataQuality>
<gmd:scope>
<gmd:DQ_Scope>
<gmd:level>
<gmd:MD_ScopeCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_ScopeCode" codeListValue="dataset">dataset</gmd:MD_ScopeCode>
</gmd:level>
</gmd:DQ_Scope>
</gmd:scope>
<gmd:lineage>
<gmd:LI_Lineage>
<gmd:processStep>
<bag:BAG_ProcessStep>
<gmd:description>
<gco:CharacterString/>
</gmd:description>
<gmd:dateTime>
<gco:DateTime/>
</gmd:dateTime>
<bag:trackingId>
<gco:CharacterString/>
</bag:trackingId>
</bag:BAG_ProcessStep>
</gmd:processStep>
<gmd:processStep>
<bag:BAG_ProcessStep>
<gmd:description>
<gco:CharacterString>Designated soundings applied by automated procedure.</gco:CharacterString>
</gmd:description>
<gmd:dateTime>
<gco:DateTime/>
</gmd:dateTime>
<bag:trackingId>
<gco:CharacterString>0</gco:CharacterString>
</bag:trackingId>
</bag:BAG_ProcessStep>
</gmd:processStep>
</gmd:LI_Lineage>
</gmd:lineage>
</gmd:DQ_DataQuality>
</gmd:dataQualityInfo>
<gmd:metadataConstraints>
<gmd:MD_LegalConstraints>
<gmd:useConstraints>
<gmd:MD_RestrictionCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_RestrictionCode" codeListValue="otherRestrictions">otherRestrictions</gmd:MD_RestrictionCode>
</gmd:useConstraints>
<gmd:otherConstraints>
<gco:CharacterString>Not for navigation, not to be redistributed</gco:CharacterString>
</gmd:otherConstraints>
</gmd:MD_LegalConstraints>
</gmd:metadataConstraints>
<gmd:metadataConstraints>
<gmd:MD_SecurityConstraints>
<gmd:classification>
<gmd:MD_ClassificationCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_ClassificationCode" codeListValue="unclassified">unclassified</gmd:MD_ClassificationCode>
</gmd:classification>
<gmd:userNote>
<gco:CharacterString>Contact Pete Wills for inquiries: Peter.Wills@dfo-mpo.gc.ca, 250-363-6384</gco:CharacterString>
</gmd:userNote>
</gmd:MD_SecurityConstraints>
</gmd:metadataConstraints>
</gmi:MI_Metadata>
To get information out of the tree we need to deal with the namespaces that are used for the various tags:
[16]:
root.nsmap
[16]:
{'bag': 'http://www.opennavsurf.org/schema/bag',
'gco': 'http://www.isotc211.org/2005/gco',
'gmd': 'http://www.isotc211.org/2005/gmd',
'gmi': 'http://www.isotc211.org/2005/gmi',
'gml': 'http://www.opengis.net/gml/3.2',
'xlink': 'http://www.w3.org/1999/xlink',
'xsi': 'http://www.w3.org/2001/XMLSchema-instance'}
Building the tags that we need to get to the resolution, and then walking the tree to get the resolution and its units:
[17]:
sri = etree.QName(root.nsmap['gmd'], 'spatialRepresentationInfo').text
adp = etree.QName(root.nsmap['gmd'], 'axisDimensionProperties').text
dim = etree.QName(root.nsmap['gmd'], 'MD_Dimension').text
res = etree.QName(root.nsmap['gmd'], 'resolution').text
res_meas = etree.QName(root.nsmap['gco'], 'Measure').text
[18]:
resolution = (
root
.find('.//{}'.format(sri))
.find('.//{}'.format(adp))
.find('.//{}'.format(dim))
.find('.//{}'.format(res))
.find('.//{}'.format(res_meas))
)
print(resolution.text, resolution.get('uom'))
500 Metres
There might be a more elegant way of doing the sequence of find
s above if one were to dig more deeply into XPATH syntax.
Similarily for the data region boundaries:
[19]:
id_info = etree.QName(root.nsmap['gmd'], 'identificationInfo').text
bag_data_id = etree.QName(root.nsmap['bag'], 'BAG_DataIdentification').text
extent = etree.QName(root.nsmap['gmd'], 'extent').text
ex_extent = etree.QName(root.nsmap['gmd'], 'EX_Extent').text
geo_el = etree.QName(root.nsmap['gmd'], 'geographicElement').text
geo_bb = etree.QName(root.nsmap['gmd'], 'EX_GeographicBoundingBox').text
west_bound_lon = etree.QName(root.nsmap['gmd'], 'westBoundLongitude').text
east_bound_lon = etree.QName(root.nsmap['gmd'], 'eastBoundLongitude').text
north_bound_lat = etree.QName(root.nsmap['gmd'], 'northBoundLatitude').text
south_bound_lat = etree.QName(root.nsmap['gmd'], 'southBoundLatitude').text
decimal = etree.QName(root.nsmap['gco'], 'Decimal').text
[20]:
bbox = (
root
.find('.//{}'.format(id_info))
.find('.//{}'.format(bag_data_id))
.find('.//{}'.format(extent))
.find('.//{}'.format(ex_extent))
.find('.//{}'.format(geo_el))
.find('.//{}'.format(geo_bb))
)
west_lon = (
bbox
.find('.//{}'.format(west_bound_lon))
.find('.//{}'.format(decimal))
)
print('west:', west_lon.text)
east_lon = (
bbox
.find('.//{}'.format(east_bound_lon))
.find('.//{}'.format(decimal))
)
print('east:', east_lon.text)
north_lat = (
bbox
.find('.//{}'.format(north_bound_lat))
.find('.//{}'.format(decimal))
)
print('north:', north_lat.text)
south_lat = (
bbox
.find('.//{}'.format(south_bound_lat))
.find('.//{}'.format(decimal))
)
print('south:', south_lat.text)
west: -124.003
east: -121.996
north: 49.0024
south: 47.9995