Exploring .bag Bathymetry Data Files

An exploration of data and metadata in Bathymetric Attributed Grid (BAG) files.

References:

Working environment for this notebook:

  • Python 3

  • conda packages:

    • h5py - Python interface to HDF5 format used by BAG

    • lxml - XML parser and manipulation library to access BAG metadata

    • numpy - for n-dimensional arrays

    • matplotlib - for plotting

    • notebook - Jupyter notebook

“Keep Calm and Conda Install”

If you are looking at this in the Salish Sea Tools docs at http://salishsea-meopar-tools.readthedocs.io/en/latest/bathymetry/ExploringBagFiles.html, you can find the source notebook that generated the page in the Salish Sea project tools repo at tools/bathymetry/ExploringBagFiles.ipynb or download the notebook by itself (instead of cloning the tools repo to get it) from http://nbviewer.jupyter.org/urls/bitbucket.org/salishsea/tools/raw/tip/bathymetry/ExploringBagFiles.ipynb.

[1]:
from io import BytesIO

import h5py
from lxml import etree
import matplotlib.pyplot as plt
import numpy as np
[2]:
%matplotlib inline

BAG Dataset

Load the BAG dataset and explore some of its basic attributes:

[3]:
bag = h5py.File('/ocean/sallen/allen/research/MEOPAR/chs_bathy/092B.bag')
[4]:
print(type(bag))
print(bag.name)
print(bag.filename)
<class 'h5py._hl.files.File'>
/
/ocean/sallen/allen/research/MEOPAR/chs_bathy/092B.bag
[5]:
for item in bag.items():
    print(item)

for value in bag.values():
    print(value)
('BAG_root', <HDF5 group "/BAG_root" (4 members)>)
<HDF5 group "/BAG_root" (4 members)>
[6]:
list(bag['BAG_root'].items())
[6]:
[('elevation', <HDF5 dataset "elevation": shape (337, 448), type "<f4">),
 ('metadata', <HDF5 dataset "metadata": shape (9730,), type "|S1">),
 ('tracking_list', <HDF5 dataset "tracking_list": shape (0,), type "|V20">),
 ('uncertainty', <HDF5 dataset "uncertainty": shape (337, 448), type "<f4">)]

The list above contains the 4 elements that the BAG specification tells us should be in the file:

  • elevation is the depths as negative 32-bit floats, with 1.0e6 as the “no data” value (land, typically)

  • metadata is the BAG metadata, a blob of XML

  • tracking_list is adjustments to the elevation data values made by a hydrographer

  • uncertainty is the vertical uncertainty in the elevation data values

Note that under Python 3 the h5py library maked heavy use of memoryview objects which are iterators. The transformation to a list object above, or the use of a for loop above that collects the items from the memoryview.

One odd thing to note is that the metadata is stored as a collection of 1-character strings which turn out to be single bytes in Python 3. We’re going to have to do something about that…

Peeling away the HDF5 group layer:

[7]:
root = bag['BAG_root']
print(root.name)
print(root.parent)
list(root.items())
/BAG_root
<HDF5 group "/" (1 members)>
[7]:
[('elevation', <HDF5 dataset "elevation": shape (337, 448), type "<f4">),
 ('metadata', <HDF5 dataset "metadata": shape (9730,), type "|S1">),
 ('tracking_list', <HDF5 dataset "tracking_list": shape (0,), type "|V20">),
 ('uncertainty', <HDF5 dataset "uncertainty": shape (337, 448), type "<f4">)]

The elevation Element

Pulling the elevation dataset out of the BAG, and the depths data out of the dataset:

[8]:
elev_node = root['elevation']
print(type(elev_node))
<class 'h5py._hl.dataset.Dataset'>
[9]:
elev = elev_node.value
print(type(elev))
<class 'numpy.ndarray'>
[10]:
print(elev.min(), elev.max())
-341.917 1e+06

As noted above 1e+06 indicates no data at a point, typically meaning land. Let’s replace those with NumPy NaNs so that we can work with the data more easily:

[11]:
elev[elev > 9e5] = np.NAN
print(np.nanmin(elev), np.nanmax(elev))
-341.917 4.2
[12]:
fig, ax = plt.subplots(1, 1)
ax.imshow(elev)
ax.invert_yaxis()
../_images/bathymetry_ExploringBagFiles_19_0.png

The metadata Element

Pulling the metadata element out of the BAG, and getting it into a form that we can work with:

[13]:
metadata_node = root['metadata']
print(type(metadata_node))
print(metadata_node)
<class 'h5py._hl.dataset.Dataset'>
<HDF5 dataset "metadata": shape (9730,), type "|S1">

As noted above, the metadata is a collection of single characters in the form of bytes. We need to collect those bytes into a buffer and parse them to get an XML tree object that we can work with in code:

[14]:
buffer = BytesIO(metadata_node.value)
tree = etree.parse(buffer)
root = tree.getroot()

Now we can get a somewhat readable rendering of the metadata in all its verbose XML glory:

[15]:
print(etree.tostring(root, pretty_print=True).decode('ascii'))
<gmi:MI_Metadata xmlns:gmi="http://www.isotc211.org/2005/gmi" xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:gco="http://www.isotc211.org/2005/gco" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:bag="http://www.opennavsurf.org/schema/bag">
  <gmd:fileIdentifier>
    <gco:CharacterString>2db1df98-90f2-4e20-a91e-6089111e2f5d</gco:CharacterString>
  </gmd:fileIdentifier>
  <gmd:language>
    <gmd:LanguageCode codeList="http://www.loc.gov/standards/iso639-2/" codeListValue="eng">eng</gmd:LanguageCode>
  </gmd:language>
  <gmd:characterSet>
    <gmd:MD_CharacterSetCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_CharacterSetCode" codeListValue="utf8">utf8</gmd:MD_CharacterSetCode>
  </gmd:characterSet>
  <gmd:hierarchyLevel>
    <gmd:MD_ScopeCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_ScopeCode" codeListValue="dataset">dataset</gmd:MD_ScopeCode>
  </gmd:hierarchyLevel>
  <gmd:contact>
    <gmd:CI_ResponsibleParty>
      <gmd:individualName>
        <gco:CharacterString>dillt</gco:CharacterString>
      </gmd:individualName>
      <gmd:organisationName>
        <gco:CharacterString>CHS</gco:CharacterString>
      </gmd:organisationName>
      <gmd:positionName>
        <gco:CharacterString> MDH</gco:CharacterString>
      </gmd:positionName>
      <gmd:role>
        <gmd:CI_RoleCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#CI_RoleCode" codeListValue="pointOfContact">pointOfContact</gmd:CI_RoleCode>
      </gmd:role>
    </gmd:CI_ResponsibleParty>
  </gmd:contact>
  <gmd:dateStamp>
    <gco:Date>2014-02-07</gco:Date>
  </gmd:dateStamp>
  <gmd:metadataStandardName>
    <gco:CharacterString>ISO 19115</gco:CharacterString>
  </gmd:metadataStandardName>
  <gmd:metadataStandardVersion>
    <gco:CharacterString>2003/Cor.1:2006</gco:CharacterString>
  </gmd:metadataStandardVersion>
  <gmd:spatialRepresentationInfo>
    <gmd:MD_Georectified>
      <gmd:numberOfDimensions>
        <gco:Integer>2</gco:Integer>
      </gmd:numberOfDimensions>
      <gmd:axisDimensionProperties>
        <gmd:MD_Dimension>
          <gmd:dimensionName>
            <gmd:MD_DimensionNameTypeCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_DimensionNameTypeCode" codeListValue="row">row</gmd:MD_DimensionNameTypeCode>
          </gmd:dimensionName>
          <gmd:dimensionSize>
            <gco:Integer>337</gco:Integer>
          </gmd:dimensionSize>
          <gmd:resolution>
            <gco:Measure uom="Metres">500</gco:Measure>
          </gmd:resolution>
        </gmd:MD_Dimension>
      </gmd:axisDimensionProperties>
      <gmd:axisDimensionProperties>
        <gmd:MD_Dimension>
          <gmd:dimensionName>
            <gmd:MD_DimensionNameTypeCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_DimensionNameTypeCode" codeListValue="column">column</gmd:MD_DimensionNameTypeCode>
          </gmd:dimensionName>
          <gmd:dimensionSize>
            <gco:Integer>448</gco:Integer>
          </gmd:dimensionSize>
          <gmd:resolution>
            <gco:Measure uom="Metres">500</gco:Measure>
          </gmd:resolution>
        </gmd:MD_Dimension>
      </gmd:axisDimensionProperties>
      <gmd:cellGeometry>
        <gmd:MD_CellGeometryCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_CellGeometryCode" codeListValue="point">point</gmd:MD_CellGeometryCode>
      </gmd:cellGeometry>
      <gmd:transformationParameterAvailability>
        <gco:Boolean>1</gco:Boolean>
      </gmd:transformationParameterAvailability>
      <gmd:checkPointAvailability>
        <gco:Boolean>0</gco:Boolean>
      </gmd:checkPointAvailability>
      <gmd:cornerPoints>
        <gml:Point gml:id="id1">
          <gml:coordinates decimal="." cs="," ts=" ">-13804000.000000000000,6075000.000000000000 -13580500.000000000000,6243000.000000000000</gml:coordinates>
        </gml:Point>
      </gmd:cornerPoints>
      <gmd:pointInPixel>
        <gmd:MD_PixelOrientationCode>center</gmd:MD_PixelOrientationCode>
      </gmd:pointInPixel>
    </gmd:MD_Georectified>
  </gmd:spatialRepresentationInfo>
  <gmd:referenceSystemInfo>
    <gmd:MD_ReferenceSystem>
      <gmd:referenceSystemIdentifier>
        <gmd:RS_Identifier>
          <gmd:code>
            <gco:CharacterString>PROJCS["WRLDMERC",
    GEOGCS["unnamed",
        DATUM["WGS_1984",
            SPHEROID["WGS_1984",6378137,298.2572201434276],
            TOWGS84[0,0,0,0,0,0,0]],
        PRIMEM["Greenwich",0],
        UNIT["degree",0.0174532925199433],
        EXTENSION["Scaler","0,0,0,0.01,0.01,0.0001"],
        EXTENSION["Source","CARIS"]],
    PROJECTION["Mercator_1SP"],
    PARAMETER["central_meridian",0],
    PARAMETER["scale_factor",1],
    PARAMETER["false_easting",0],
    PARAMETER["false_northing",0],
    UNIT["Meter",1]]</gco:CharacterString>
          </gmd:code>
          <gmd:codeSpace>
            <gco:CharacterString>WKT</gco:CharacterString>
          </gmd:codeSpace>
        </gmd:RS_Identifier>
      </gmd:referenceSystemIdentifier>
    </gmd:MD_ReferenceSystem>
  </gmd:referenceSystemInfo>
  <gmd:referenceSystemInfo>
    <gmd:MD_ReferenceSystem>
      <gmd:referenceSystemIdentifier>
        <gmd:RS_Identifier>
          <gmd:code>
            <gco:CharacterString>VERT_CS["Unknown", VERT_DATUM[Unknown, 2000]]</gco:CharacterString>
          </gmd:code>
          <gmd:codeSpace>
            <gco:CharacterString>WKT</gco:CharacterString>
          </gmd:codeSpace>
        </gmd:RS_Identifier>
      </gmd:referenceSystemIdentifier>
    </gmd:MD_ReferenceSystem>
  </gmd:referenceSystemInfo>
  <gmd:identificationInfo>
    <bag:BAG_DataIdentification>
      <gmd:citation>
        <gmd:CI_Citation>
          <gmd:title>
            <gco:CharacterString>BDB_92B_500m_WorldMerc_2014-02-05_extract_final.csar</gco:CharacterString>
          </gmd:title>
          <gmd:date>
            <gmd:CI_Date>
              <gmd:date>
                <gco:Date>2014-02-07</gco:Date>
              </gmd:date>
              <gmd:dateType>
                <gmd:CI_DateTypeCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#CI_DateTypeCode" codeListValue="creation">creation</gmd:CI_DateTypeCode>
              </gmd:dateType>
            </gmd:CI_Date>
          </gmd:date>
          <gmd:citedResponsibleParty>
            <gmd:CI_ResponsibleParty>
              <gmd:individualName>
                <gco:CharacterString>dillt</gco:CharacterString>
              </gmd:individualName>
              <gmd:organisationName>
                <gco:CharacterString>CHS</gco:CharacterString>
              </gmd:organisationName>
              <gmd:positionName>
                <gco:CharacterString> MDH</gco:CharacterString>
              </gmd:positionName>
              <gmd:role>
                <gmd:CI_RoleCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#CI_RoleCode" codeListValue="originator">originator</gmd:CI_RoleCode>
              </gmd:role>
            </gmd:CI_ResponsibleParty>
          </gmd:citedResponsibleParty>
        </gmd:CI_Citation>
      </gmd:citation>
      <gmd:abstract>
        <gco:CharacterString>unknown</gco:CharacterString>
      </gmd:abstract>
      <gmd:status>
        <gmd:MD_ProgressCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_ProgressCode" codeListValue="onGoing">onGoing</gmd:MD_ProgressCode>
      </gmd:status>
      <gmd:spatialRepresentationType>
        <gmd:MD_SpatialRepresentationTypeCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_SpatialRepresentationTypeCode" codeListValue="grid">grid</gmd:MD_SpatialRepresentationTypeCode>
      </gmd:spatialRepresentationType>
      <gmd:language>
        <gmd:LanguageCode codeList="http://www.loc.gov/standards/iso639-2/" codeListValue="eng">eng</gmd:LanguageCode>
      </gmd:language>
      <gmd:characterSet>
        <gmd:MD_CharacterSetCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_CharacterSetCode" codeListValue="utf8">utf8</gmd:MD_CharacterSetCode>
      </gmd:characterSet>
      <gmd:topicCategory>
        <gmd:MD_TopicCategoryCode>elevation</gmd:MD_TopicCategoryCode>
      </gmd:topicCategory>
      <gmd:extent>
        <gmd:EX_Extent>
          <gmd:geographicElement>
            <gmd:EX_GeographicBoundingBox>
              <gmd:westBoundLongitude>
                <gco:Decimal>-124.003</gco:Decimal>
              </gmd:westBoundLongitude>
              <gmd:eastBoundLongitude>
                <gco:Decimal>-121.996</gco:Decimal>
              </gmd:eastBoundLongitude>
              <gmd:southBoundLatitude>
                <gco:Decimal>47.9995</gco:Decimal>
              </gmd:southBoundLatitude>
              <gmd:northBoundLatitude>
                <gco:Decimal>49.0024</gco:Decimal>
              </gmd:northBoundLatitude>
            </gmd:EX_GeographicBoundingBox>
          </gmd:geographicElement>
        </gmd:EX_Extent>
      </gmd:extent>
      <bag:verticalUncertaintyType>
        <bag:BAG_VertUncertCode codeList="http://www.opennavsurf.org/schema/bag/bagCodelists.xml#BAG_VertUncertCode" codeListValue="unknown">unknown</bag:BAG_VertUncertCode>
      </bag:verticalUncertaintyType>
    </bag:BAG_DataIdentification>
  </gmd:identificationInfo>
  <gmd:dataQualityInfo>
    <gmd:DQ_DataQuality>
      <gmd:scope>
        <gmd:DQ_Scope>
          <gmd:level>
            <gmd:MD_ScopeCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_ScopeCode" codeListValue="dataset">dataset</gmd:MD_ScopeCode>
          </gmd:level>
        </gmd:DQ_Scope>
      </gmd:scope>
      <gmd:lineage>
        <gmd:LI_Lineage>
          <gmd:processStep>
            <bag:BAG_ProcessStep>
              <gmd:description>
                <gco:CharacterString/>
              </gmd:description>
              <gmd:dateTime>
                <gco:DateTime/>
              </gmd:dateTime>
              <bag:trackingId>
                <gco:CharacterString/>
              </bag:trackingId>
            </bag:BAG_ProcessStep>
          </gmd:processStep>
          <gmd:processStep>
            <bag:BAG_ProcessStep>
              <gmd:description>
                <gco:CharacterString>Designated soundings applied by automated procedure.</gco:CharacterString>
              </gmd:description>
              <gmd:dateTime>
                <gco:DateTime/>
              </gmd:dateTime>
              <bag:trackingId>
                <gco:CharacterString>0</gco:CharacterString>
              </bag:trackingId>
            </bag:BAG_ProcessStep>
          </gmd:processStep>
        </gmd:LI_Lineage>
      </gmd:lineage>
    </gmd:DQ_DataQuality>
  </gmd:dataQualityInfo>
  <gmd:metadataConstraints>
    <gmd:MD_LegalConstraints>
      <gmd:useConstraints>
        <gmd:MD_RestrictionCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_RestrictionCode" codeListValue="otherRestrictions">otherRestrictions</gmd:MD_RestrictionCode>
      </gmd:useConstraints>
      <gmd:otherConstraints>
        <gco:CharacterString>Not for navigation, not to be redistributed</gco:CharacterString>
      </gmd:otherConstraints>
    </gmd:MD_LegalConstraints>
  </gmd:metadataConstraints>
  <gmd:metadataConstraints>
    <gmd:MD_SecurityConstraints>
      <gmd:classification>
        <gmd:MD_ClassificationCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_ClassificationCode" codeListValue="unclassified">unclassified</gmd:MD_ClassificationCode>
      </gmd:classification>
      <gmd:userNote>
        <gco:CharacterString>Contact Pete Wills for inquiries: Peter.Wills@dfo-mpo.gc.ca, 250-363-6384</gco:CharacterString>
      </gmd:userNote>
    </gmd:MD_SecurityConstraints>
  </gmd:metadataConstraints>
</gmi:MI_Metadata>

To get information out of the tree we need to deal with the namespaces that are used for the various tags:

[16]:
root.nsmap
[16]:
{'bag': 'http://www.opennavsurf.org/schema/bag',
 'gco': 'http://www.isotc211.org/2005/gco',
 'gmd': 'http://www.isotc211.org/2005/gmd',
 'gmi': 'http://www.isotc211.org/2005/gmi',
 'gml': 'http://www.opengis.net/gml/3.2',
 'xlink': 'http://www.w3.org/1999/xlink',
 'xsi': 'http://www.w3.org/2001/XMLSchema-instance'}

Building the tags that we need to get to the resolution, and then walking the tree to get the resolution and its units:

[17]:
sri = etree.QName(root.nsmap['gmd'], 'spatialRepresentationInfo').text
adp = etree.QName(root.nsmap['gmd'], 'axisDimensionProperties').text
dim = etree.QName(root.nsmap['gmd'], 'MD_Dimension').text
res = etree.QName(root.nsmap['gmd'], 'resolution').text
res_meas = etree.QName(root.nsmap['gco'], 'Measure').text
[18]:
resolution = (
    root
    .find('.//{}'.format(sri))
    .find('.//{}'.format(adp))
    .find('.//{}'.format(dim))
    .find('.//{}'.format(res))
    .find('.//{}'.format(res_meas))
)
print(resolution.text, resolution.get('uom'))
500 Metres

There might be a more elegant way of doing the sequence of finds above if one were to dig more deeply into XPATH syntax.

Similarily for the data region boundaries:

[19]:
id_info = etree.QName(root.nsmap['gmd'], 'identificationInfo').text
bag_data_id = etree.QName(root.nsmap['bag'], 'BAG_DataIdentification').text
extent = etree.QName(root.nsmap['gmd'], 'extent').text
ex_extent = etree.QName(root.nsmap['gmd'], 'EX_Extent').text
geo_el = etree.QName(root.nsmap['gmd'], 'geographicElement').text
geo_bb = etree.QName(root.nsmap['gmd'], 'EX_GeographicBoundingBox').text

west_bound_lon = etree.QName(root.nsmap['gmd'], 'westBoundLongitude').text
east_bound_lon = etree.QName(root.nsmap['gmd'], 'eastBoundLongitude').text
north_bound_lat = etree.QName(root.nsmap['gmd'], 'northBoundLatitude').text
south_bound_lat = etree.QName(root.nsmap['gmd'], 'southBoundLatitude').text

decimal = etree.QName(root.nsmap['gco'], 'Decimal').text
[20]:
bbox = (
    root
    .find('.//{}'.format(id_info))
    .find('.//{}'.format(bag_data_id))
    .find('.//{}'.format(extent))
    .find('.//{}'.format(ex_extent))
    .find('.//{}'.format(geo_el))
    .find('.//{}'.format(geo_bb))
)
west_lon = (
    bbox
    .find('.//{}'.format(west_bound_lon))
    .find('.//{}'.format(decimal))
)
print('west:', west_lon.text)

east_lon = (
    bbox
    .find('.//{}'.format(east_bound_lon))
    .find('.//{}'.format(decimal))
)
print('east:', east_lon.text)

north_lat = (
    bbox
    .find('.//{}'.format(north_bound_lat))
    .find('.//{}'.format(decimal))
)
print('north:', north_lat.text)

south_lat = (
    bbox
    .find('.//{}'.format(south_bound_lat))
    .find('.//{}'.format(decimal))
)
print('south:', south_lat.text)
west: -124.003
east: -121.996
north: 49.0024
south: 47.9995