6. Reading and Writing Data

Larch has several built-in functions for reading and writing scientific data. The intention is that the types and varieties of supported files will increase. In addition, because standard Python modules can be used from Larch, many types of standard data and image types can be used by importing the appropriate Python module. This chapter describes the Larch functions for data handling.

6.1. Simple ASCII Column Files

A simple way to store small amounts of numerical data, and one widely used in the XAFS community, is a plaintext (ASCII-encoded) file with whitespace-delimited numbers laid out as a table: a fixed number of columns, with rows separated by newlines. Typically a comment character such as “#” signifies header information. For instance:

# room temperature FeO.
# data from 20-BM, 2001, as part of NXS school
#------------------------
#   energy     xmu       i0
  6911.7671  -0.35992590E-01  280101.00
  6916.8730  -0.39081634E-01  278863.00
  6921.7030  -0.42193483E-01  278149.00
  6926.8344  -0.45165576E-01  277292.00
  6931.7399  -0.47365589E-01  265707.00
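Larch's read_ascii() handles this parsing for you. Purely to illustrate the file format, here is a minimal pure-Python sketch of splitting such a file into named columns; parse_columns() is a hypothetical helper, not Larch's actual implementation:

```python
# Minimal sketch of parsing a whitespace-delimited ASCII column file.
# Illustrative only -- Larch's read_ascii() is far more capable.

def parse_columns(text, commentchars='#;*%'):
    """Return (labels, columns): column labels guessed from the last
    header line, and one list of floats per column."""
    header, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line[0] in commentchars:
            header.append(line.lstrip(commentchars).strip())
        else:
            rows.append([float(word) for word in line.split()])
    # guess labels from the last header line, e.g. "energy  xmu  i0"
    labels = header[-1].split() if header else []
    columns = [list(col) for col in zip(*rows)]
    return labels, columns

example = """# room temperature FeO.
#   energy     xmu       i0
  6911.7671  -0.35992590E-01  280101.00
  6916.8730  -0.39081634E-01  278863.00
"""
labels, cols = parse_columns(example)
print(labels)    # ['energy', 'xmu', 'i0']
print(cols[0])   # [6911.7671, 6916.873]
```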

This file and others like it can be read with the built-in read_ascii() function.

_io.read_ascii(filename, commentchar='#;*%', labels=None)

opens and reads a plaintext data file, returning a new group containing the data.

Parameters:
  • filename (string) – name of file to read.
  • commentchar (string) – string of valid comment characters
  • labels (string, None, or False) – string to split for column labels

Some examples of read_ascii():

larch> g = read_ascii('mydata.dat')
larch> show(g)
== Group ascii_file mydata.dat: 6 symbols ==
  attributes: <Group header attributes from mydata.dat>
  column_labels: ['energy', 'xmu', 'i0']
  energy: array<shape=(412,), type=dtype('float64')>
  filename: 'mydata.dat'
  i0: array<shape=(412,), type=dtype('float64')>
  xmu: array<shape=(412,), type=dtype('float64')>
larch>

This reads the data file and names the arrays according to the column labels found in the file. You can also supply the labels explicitly:

larch> g = read_ascii('mydata.dat', labels='e mutrans monitor')
larch> show(g)
== Group ascii_file mydata.dat: 6 symbols ==
  attributes: <Group header attributes from mydata.dat>
  column_labels: ['e', 'mutrans', 'monitor']
  e: array<shape=(412,), type=dtype('float64')>
  filename: 'mydata.dat'
  monitor: array<shape=(412,), type=dtype('float64')>
  mutrans: array<shape=(412,), type=dtype('float64')>
larch>

and, to get the data as a single 2-D array, set labels=False:

larch> g = read_ascii('mydata.dat', labels=False)
larch> show(g)
== Group ascii_file mydata.dat: 4 symbols ==
  attributes: <Group header attributes from mydata.dat>
  column_labels: []
  data: array<shape=(3, 412), type=dtype('float64')>
  filename: 'mydata.dat'
larch>

_io.write_ascii(filename, *args, commentchar='#', label=None, header=None)

opens and writes arrays, scalars, and text to an ASCII file.

Parameters:
  • commentchar – character for comment (‘#’)
  • label – array label line (autogenerated)
  • header – array of strings for header
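The writing side amounts to formatting arrays as labeled columns. A pure-Python sketch in the same spirit (write_columns() is a hypothetical helper, not Larch's write_ascii() implementation):

```python
# Sketch of writing arrays as an ASCII column file, in the spirit of
# write_ascii(); write_columns() is a hypothetical helper, not part of Larch.

def write_columns(filename, labels, columns, header=(), commentchar='#'):
    """Write header lines, a label line, then whitespace-delimited rows."""
    with open(filename, 'w') as fh:
        for line in header:
            fh.write('%s %s\n' % (commentchar, line))
        fh.write('%s %s\n' % (commentchar, '  '.join(labels)))
        for row in zip(*columns):
            fh.write('  '.join('%14.7f' % v for v in row) + '\n')

# produces 'out.dat' with one header line, a label line, and two data rows
write_columns('out.dat', ['energy', 'xmu'],
              [[6911.7671, 6916.8730], [-0.0359926, -0.0390816]],
              header=['room temperature FeO'])
```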

_io.write_group(filename, group, scalars=None, arrays=None, arrays_like=None, commentchar='#')

write data from a specified group to an ASCII data file. This is pretty minimal and may work poorly for large groups of complex data.

6.2. Using HDF5 Files

HDF5 is an increasingly popular format for scientific data, as it can efficiently hold very large arrays in a hierarchical format that carries “metadata” about the data, and can be explored with a variety of tools. The interface used in Larch is based on h5py, which should be consulted for further documentation.

_io.h5group(filename)

opens an HDF5 file and maps it to a Larch Group, with HDF5 groups mapped to Larch Groups. Note that the full set of data is not read and copied; instead, the HDF5 file is kept open and data is accessed from the file as needed.

An example using h5group() shows that one can browse through the data hierarchy of the HDF5 file and pick out the needed data:

larch> g = h5group('test.h5')
larch> show(g)
== Group test.h5: 3 symbols ==
  attrs: {u'Collection Time': ': Sat Feb 4 13:29:00 2012', u'Version': '1.0.0',
          u'Beamline': 'GSECARS, 13-IDC / APS', u'Title': 'Epics Scan Data'}
  data: <Group test.h5/data>
  h5_file: <HDF5 file "test.h5" (mode r)>
larch> show(g.data)
== Group test.h5/data: 5 symbols ==
  attrs: {u'scan_prefix': '13IDC:', u'start_time': ': Sat Feb 4 13:29:00 2012',
        u'correct_deadtime': 'True', u'dimension': 2,
        u'stop_time': ': Sat Feb 4 13:44:52 2009'}
  environ: <Group test.h5/data/environ>
  full_xrf: <Group test.h5/data/full_xrf>
  merged_xrf: <Group test.h5/data/merged_xrf>
  scan: <Group test.h5/data/scan>


larch> g.data.scan.sums
<HDF5 dataset "det": shape (15, 26, 26), type "<f8">

larch> imshow(g.data.scan.sums[8,:,:])

This interface is general-purpose but somewhat low-level. As HDF5 formats and schemas become standardized, better interfaces can easily be made on top of this approach.

6.3. Reading NetCDF Files

NetCDF4 is an older and less flexible file format than HDF5, but is efficient for storing array data and still in wide use.

_io.netcdf_group(filename)

returns a group with data from a NetCDF4 file.

_io.netcdf_file(filename, mode='r')

opens and returns a netcdf file.

6.4. Reading TIFF Images

TIFF is a popular image format used by many cameras and detectors. The interface used in Larch is based on code from Christoph Gohlke.

_io.read_tiff(fname)

reads a TIFF image from a TIFF file. This returns just the image data as an array, and does not return any metadata.

_io.tiff_object(fname)

opens and returns a TIFF file. This is useful for extracting metadata and multiple series.
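A sketch of typical usage with read_tiff(), where image.tif stands in for a hypothetical filename:

larch> img = read_tiff('image.tif')
larch> imshow(img)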

6.5. Working with Epics Channel Access

Many synchrotron facilities use the Epics control system. If the Epics Channel Access layer (which requires network access and configuration discussed elsewhere) is set up correctly, then Larch can read and write data from Epics Process Variables (PVs). The interface used in Larch is based on pyepics, which should be consulted for further documentation. The access is encapsulated into three functions:

_io.caget(PV_name, as_string=False)

get the value of the Process Variable. The optional as_string argument ensures the returned value is the string representation for the variable.

_io.caput(PV_name, value, wait=False)

set the value of the Process Variable. If the optional wait is True, the function will not return until the put “completes”. For some types of data, this may wait for some process (moving a motor, triggering a detector) to finish before returning.

_io.PV(PV_name)

create and return an Epics PV object for a Process Variable. This will have get() and put() methods, and allows you to add callback functions that will be run with new values every time the PV value changes.
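A sketch of typical usage, with 'XXX:m1.VAL' standing in for a hypothetical motor PV (real PV names depend entirely on your facility):

larch> pos = caget('XXX:m1.VAL')
larch> caput('XXX:m1.VAL', 0.25, wait=True)
larch> p = PV('XXX:m1.VAL')
larch> p.get()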

6.6. Reading Scan Data from APS Beamlines

This list is minimal, but can be expanded easily to accommodate more facilities and beamlines.

_io.read_mda(filename, maxdim=4)

read a binary MDA (Multi-dimensional Array) file written by the Epics sscan record, and return a group based on the scans it contains. This is not very well tested – use with caution!

_io.read_gsescan(filename)

read an old-style GSECARS Escan data file into a group.

_io.read_stepscan(filename)

read a GSECARS StepScan data file into a group.

6.7. Reading XAFS Data Interchange (XDI) Files

The X-ray Data Interchange Format has been developed as part of an effort to standardize the format of XAFS data files (see xdi).

_io.read_xdi(filename)

read an XDI data file into a Larch group.

6.8. Saving and Restoring Larch Groups

It is often useful to save groups of data and be able to open them again later. The save() / restore() mechanism here allows you to save the state of a number of Larch groups and use them in another session.

Some precautions should be kept in mind, as not all Larch data is easily transferable. Most importantly, Python functions cannot be saved in any form that can be meaningfully recovered. This is less of a problem than you might expect: you want to save data, and the functions will be present in the later session. All the built-in Larch groups and data structures can be saved and restored.

_io.save(filename, list_of_groups)

save a set of Larch groups and data into an HDF5 file.

_io.restore(filename, group=None)

recover groups from a Larch ‘save’ file. If group is None, the groups in the save file will be returned (in the order in which they were saved). If group is an existing Larch group, the groups in the save file will be put inside that group, and will not be returned.
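A sketch of a typical round trip, using the signatures above ('session.sav' is a hypothetical filename and g a previously created group):

larch> save('session.sav', g)
larch> # ... later, in a new session:
larch> g = restore('session.sav')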