.. _tutor-io_sec:

=======================================================
Tutorial: Reading and Writing Data
=======================================================

Larch has several built-in functions for reading scientific data. The
intention is that the range of supported file types will grow. In addition,
many Python modules for reading standard types of image data can be used.

.. module:: _io
   :synopsis: Basic Input/Output Functions

Simple ASCII Column Files
============================

A simple way to store small amounts of numerical data, and one that is
widely used in the XAFS community, is to store data in plaintext (ASCII
encoded) data files, with whitespace-delimited numbers laid out as a table,
with a fixed number of columns and with rows indicated by newlines.
Typically a comment character such as "#" is used to signify header
information. For instance::

    # room temperature FeO.
    # data from 20-BM, 2001, as part of NXS school
    #------------------------
    # energy xmu i0
    6911.7671  -0.35992590E-01  280101.00
    6916.8730  -0.39081634E-01  278863.00
    6921.7030  -0.42193483E-01  278149.00
    6926.8344  -0.45165576E-01  277292.00
    6931.7399  -0.47365589E-01  265707.00

This file and others like it can be read with the built-in
:func:`read_ascii` function.

.. function:: read_ascii(filename, commentchar='#;*%', labels=None)

   opens and reads a plaintext data file, returning a new group containing
   the data.

   :param filename: name of file to read.
   :type filename: string
   :param commentchar: string of valid comment characters
   :type commentchar: string
   :param labels: string to split for column labels
   :type labels: string, ``None``, or ``False``

   The commentchar argument (#;% by default) sets the valid comment
   characters: if the first character in a line matches one of these, the
   line is marked as a header line.
   Header lines continue until a line with '#----' (that is, any commentchar
   followed by 4 '-'). The line immediately following that is read as column
   labels (space delimited).

   If the header is of the form::

       # KEY : VAL   (ie commentchar key ':' value)

   these key-value pairs (all as strings) will be parsed into an
   'attributes' sub-group.

   If labels has the default value ``None``, column labels from the line
   following the line of '#----' (if available) will be used. If labels is
   ``False``, the group will have a *data* variable containing the
   2-dimensional data.

Some examples of :func:`read_ascii`::

    larch> g = read_ascii('mydata.dat')
    larch> show(g)
    == Group ascii_file mydata.dat: 6 symbols ==
      attributes:
      column_labels: ['energy', 'xmu', 'i0']
      energy: array
      filename: 'mydata.dat'
      i0: array
      xmu: array
    larch>

which reads the data file and sets array names according to the column
labels in the file. You can be explicit::

    larch> g = read_ascii('mydata.dat', labels='e mutrans monitor')
    larch> show(g)
    == Group ascii_file mydata.dat: 6 symbols ==
      attributes:
      column_labels: ['e', 'mutrans', 'monitor']
      e: array
      filename: 'mydata.dat'
      monitor: array
      mutrans: array
    larch>

and to get the data as a 2-D array::

    larch> g = read_ascii('mydata.dat', labels=False)
    larch> show(g)
    == Group ascii_file mydata.dat: 4 symbols ==
      attributes:
      column_labels: []
      data: array
      filename: 'mydata.dat'
    larch>

.. function:: write_ascii(filename, *args, commentchar='#', label=None, header=None)

   opens and writes arrays, scalars, and text to an ASCII file.

   :param commentchar: character for comment ('#')
   :param label: array label line (autogenerated)
   :param header: array of strings for header

.. function:: write_group(filename, group, scalars=None, arrays=None, arrays_like=None, commentchar='#')

   writes data from a specified group to an ASCII data file.

Using HDF5 Files
========================

HDF5 is an increasingly popular format for scientific data, as it can
efficiently hold very large arrays in a hierarchical layout that also holds
"metadata" about the data, and it can be explored with a variety of tools.

.. function:: h5_group(filename)

   opens an HDF5 file and maps it to a Larch Group, with HDF5 Groups mapped
   to Larch Groups. Note that the full set of data is not read and copied.
   Instead, the HDF5 file is kept open and data is accessed from the file as
   needed.

An example using :func:`h5_group` shows that one can browse through the
data hierarchy of the HDF5 file and pick out the needed data::

    larch> g = h5group('test.h5')
    larch> show(g)
    == Group test.h5: 3 symbols ==
      attrs: {u'Collection Time': ': Sat Feb 4 13:29:00 2012', u'Version': '1.0.0', u'Beamline': 'GSECARS, 13-IDC / APS', u'Title': 'Epics Scan Data'}
      data:
      h5_file:
    larch> show(g.data)
    == Group test.h5/data: 5 symbols ==
      attrs: {u'scan_prefix': '13IDC:', u'start_time': ': Sat Feb 4 13:29:00 2012', u'correct_deadtime': 'True', u'dimension': 2, u'stop_time': ': Sat Feb 4 13:44:52 2009'}
      environ:
      full_xrf:
      merged_xrf:
      scan:
    larch> g.data.scan.sums
    larch> imshow(g.data.scan.sums[8:,:,:])

This interface is general-purpose but somewhat low-level. As HDF5 formats
and schemas become standardized, better interfaces can easily be made on
top of this approach.
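
Returning briefly to the plain-text column format from the first section, the
header conventions described for :func:`read_ascii` can be illustrated in
plain Python. The sketch below is not Larch's implementation. The function
name ``parse_column_text`` and the ``beamline`` header line in the sample are
hypothetical, invented only for this illustration, and the parsing rules are a
simplified reading of the description above: comment lines form the header, a
comment character followed by four dashes ends it, the next comment line gives
the column labels, and ``KEY : VAL`` header lines become string attributes.

```python
def parse_column_text(text, commentchars="#;%"):
    """Hypothetical helper: split column-file text into header info and rows."""
    attributes = {}      # 'KEY : VAL' header pairs, stored as strings
    labels = []          # column labels from the line after '#----'
    rows = []            # numeric data, one list of floats per row
    labels_next = False  # True right after the '#----' separator line
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line[0] in commentchars:
            body = line[1:].strip()
            if labels_next:
                labels = body.split()          # space-delimited labels
                labels_next = False
            elif line[1:5] == "----":
                labels_next = True             # separator: labels follow
            elif ":" in body:
                key, val = body.split(":", 1)
                attributes[key.strip()] = val.strip()
            continue
        labels_next = False                    # data began without a label line
        rows.append([float(word) for word in line.split()])
    return attributes, labels, rows


# Sample text: the 'beamline' line is a made-up KEY : VAL header for illustration.
SAMPLE = """# room temperature FeO.
# beamline : 20-BM
#------------------------
# energy xmu i0
6911.7671 -0.35992590E-01 280101.00
6916.8730 -0.39081634E-01 278863.00
"""

attributes, labels, rows = parse_column_text(SAMPLE)
print(attributes)
print(labels)
print(len(rows), "rows")
```

A real reader would also convert the rows into arrays and attach them to a
group under the label names, which is what :func:`read_ascii` does for you.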