15.2.2. Differences between Larch and Python

Larch is based on and very similar to Python, but there are some very important differences that need to be noted, especially for those familiar with Python. These differences are not here because we feel Python is somehow inadequate or imperfect, but because Larch is intended as a domain-specific language. The fact that Larch’s syntax is so close to Python is really something of an implementation detail, so straying from the “purity” of Python shouldn’t be seen as an allegation of imperfection on either Larch’s or Python’s part – they have different goals. The principal differences from Python are:

  1. Using ‘end*’ instead of significant white-space for code blocks.

  2. Groups versus Modules

  3. Changing the lookup rules for finding symbol names

  4. Not implementing several Python concepts, notably class and lambda.

Each of these is discussed in more detail below.

Some background on how Larch is implemented may help inform the discussion below, and help explain many of the design decisions made in Larch. First and foremost, Larch is designed to be a domain-specific macro language that makes heavy use of Python’s wonderful tools for processing scientific data. Having a macro language that was similar to Python was not the primary goal. The first version of Larch actually had a much weaker correspondence to Python. It turned out that the implementation was much easier and more robust when using syntax close to Python.

When Larch code is run, the text is first translated into Python code and then parsed by Python’s own ast module, which turns Python code into an abstract syntax tree that is much more convenient for a machine to execute. As a brief illustration of what this module does, the statement:

a*sin(2*b)+c

will be parsed and translated into something like:

Add(Name('c'), Mult(Name('a'), Call(Name('sin'), Args([Mult(Num(2), Name('b'))]))))

Larch then walks through this tree, executing each Add, Mult, etc. on its arguments. If you’ve ever done any text processing or thought about how a compiler works, you’ll see that having this translation step done by proven tools is a huge benefit. For one thing, using Python’s own parser means that Larch simply does not have parsing errors of its own – any problem would be in the translation of Larch code into Python, or in executing the parsed tree above. This also makes the core code implementing Larch much simpler (the core functionality is fewer than 3000 lines of code).
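To make the parse-then-walk idea concrete, here is a minimal sketch that uses only Python’s standard ast module. It is not Larch’s interpreter: the evaluate() function, the BINARY_OPS table, and the symbols dictionary are hypothetical names chosen for illustration, and only a handful of node types are handled.

```python
import ast
import math
import operator

# Map AST operator node types to the functions that implement them.
BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def evaluate(node, symbols):
    """Recursively evaluate a small subset of Python expression nodes."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body, symbols)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        # The name is resolved here, while the tree is being walked.
        return symbols[node.id]
    if isinstance(node, ast.BinOp):
        op = BINARY_OPS[type(node.op)]
        return op(evaluate(node.left, symbols), evaluate(node.right, symbols))
    if isinstance(node, ast.Call):
        func = evaluate(node.func, symbols)
        args = [evaluate(arg, symbols) for arg in node.args]
        return func(*args)
    # Anything else is parsed by Python but has no handler in this sketch.
    raise NotImplementedError(type(node).__name__)

tree = ast.parse("a*sin(2*b)+c", mode="eval")
print(ast.dump(tree))  # the exact text of the dump varies between Python versions

symbols = {"a": 2.0, "b": 0.0, "c": 1.0, "sin": math.sin}
result = evaluate(tree, symbols)
print(result)
```

Because the walker decides what each Name node means, the rules for resolving names can differ from Python’s own without touching the parser.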

Given this main implementation feature of Larch, you can probably see where and how the differences with Python arise:

  • The Larch-to-Python translation step converts the ‘end*’ keywords into significant whitespace (commenting out ‘endif’ etc. if needed).

  • The lookup for symbols in Name(‘c’) is done at run-time, allowing changes from the standard Python name lookup rules.

  • Unimplemented Python constructs (class, lambda, etc.) are parsed, but Larch does not execute them.

You can also see that Python’s syntax is followed very closely, so that the translation from Larch-to-Python is minimal.
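The “parsed” half of the last point can be checked with Python’s ast module alone. The short sketch below only shows that the parser accepts these constructs and produces ordinary tree nodes for them; it says nothing about how Larch itself reacts when it meets such a node. The variable names are illustrative.

```python
import ast

# lambda and class are accepted by Python's parser like any other
# construct; each simply becomes a node type in the resulting tree.
source = "f = lambda x: x + 1\nclass Point:\n    pass\n"
tree = ast.parse(source)

# Collect the names of all node types that appear in the tree.
node_types = {type(node).__name__ for node in ast.walk(tree)}
print(sorted(node_types & {"Lambda", "ClassDef"}))
```

Whether such a node is then executed is entirely up to the code that walks the tree.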

15.2.2.1. Code Block Ends

Unlike Python, Larch does not use significant whitespace to define blocks. There, that was easy. Instead, Larch uses “end blocks”, of the form:

if test:
   <block of statements>
endif

Each of the keywords if, while, for, def, and try must be matched with a corresponding ‘end’ keyword: endif, endwhile, endfor, enddef, and endtry. You do not need an endelse, endelif, endexcept, etc., because the end of those blocks is not ambiguous.

As a special note, you can place a ‘#’ in front of ‘end’. This means exactly one ‘#’, placed immediately in front of ‘end’, so that ‘#endif’ is allowed but not ‘####endif’ or ‘# endfor’. This allows you to follow Python’s indenting rules and write code that is valid Larch and valid Python, which can be useful in translating code:

for i in range(5):
    print(i, i/2.0)
#endfor

This code is both valid Larch and valid Python.
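As a rough illustration of the “commenting out” idea mentioned for the translation step, the sketch below rewrites bare end keywords as comments so that already-indented Larch-like text can be compiled by Python. It is a toy, not Larch’s translator: comment_out_ends() and END_LINE are hypothetical names, and the sketch assumes the input already follows Python’s indentation rules.

```python
import re

# Hypothetical helper, not part of Larch: matches a line that holds
# nothing but one of the five end keywords (with optional indentation).
END_LINE = re.compile(r"^(\s*)(end(?:if|while|for|def|try))\s*$")

def comment_out_ends(source):
    """Return source with each bare 'endXXX' line rewritten as '#endXXX'."""
    lines = []
    for line in source.splitlines():
        match = END_LINE.match(line)
        if match:
            # Keep the indentation, put exactly one '#' before the keyword.
            line = match.group(1) + "#" + match.group(2)
        lines.append(line)
    return "\n".join(lines) + "\n"

larch_like = """total = 0
for i in range(5):
    if i % 2 == 0:
        total = total + i
    endif
endfor
"""

python_source = comment_out_ends(larch_like)

# The rewritten text is ordinary Python and can be compiled and run.
namespace = {}
exec(compile(python_source, "<larch-like>", "exec"), namespace)
print(namespace["total"])
```

A real translator also has to handle input that is not indented at all, which this sketch deliberately ignores.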

15.2.2.2. Groups vs Modules

This is at least partly a semantic distinction. Larch organizes data and code into Groups – simple containers that hold data, functions, and other groups. These are implemented as a simple, empty class that is part of the symbol table.
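A minimal sketch of such a container in plain Python is shown below. The class here is a stand-in written for illustration, not Larch’s own Group class, and the scale() function is a hypothetical example of code stored alongside data.

```python
class Group:
    """An empty container; attributes are attached after creation."""

    def __repr__(self):
        # List the attribute names currently held by this group.
        return "<Group: " + ", ".join(sorted(vars(self))) + ">"

def scale(values, factor):
    """Hypothetical function stored in a group next to the data."""
    return [value * factor for value in values]

# Groups can hold data, functions, and other groups.
cu_01 = Group()
cu_01.data = Group()
cu_01.data.chi = [0.1, 0.2, 0.3]
cu_01.data.npts = len(cu_01.data.chi)
cu_01.scale = scale

print(cu_01)
print(cu_01.data)
```

Nested attribute access such as cu_01.data.chi then works exactly as it does for Python modules and objects.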

15.2.2.3. Symbol Lookup Rules

Looking up symbol names is a key feature of any language. Python and Larch both allow namespaces in which symbols can be nested into a hierarchy, using a syntax of parent.child, with a dot (‘.’) separating the components of the name. Such parent/child relationships for symbol names are used for modules (files of code) and object attributes. Thus, one could have data objects named:

cu_01.data.chi
cu_02.path1.chi
cu_03.model.chi

Here the name chi is used repeatedly, but with different (and multiple) parents. The issue of name lookup is how to know which chi (if any) to use when chi is specified without its parents’ names.

In Python, name lookups are quite straightforward and strict: “local, module”. Here, local means “inside the current function or method” and module means “inside the current module (file of code text)”. More specifically, each function or method and each module is given its own namespace, and symbols are looked for first in the local namespace, and then in the module namespace. These rules are focused on code rather than data, and lead to having a lot of “import” statements at the top of Python modules. For example, to access the sqrt() function from the numpy module, one typically does one of these:

import numpy

def sqrt_array(npts=10):
    x = numpy.arange(npts)/2.
    return numpy.sqrt(x)

or:

import numpy as np

def sqrt_array(npts=10):
    x = np.arange(npts)/2.
    return np.sqrt(x)

In both of these examples the numpy module is brought into the module-level namespace, either named as ‘numpy’ or renamed to ‘np’ (a common convention in scientific Python code). Inside the function sqrt_array(), the names ‘npts’ and ‘x’ are in the local namespace – they are not available outside the function. The functions arange() and sqrt() are taken from the module-level namespace, using the name as defined in the import statement. A third alternative would be to import only the names ‘sqrt’ and ‘arange’ into the module’s namespace:

from numpy import sqrt, arange

def sqrt_array(npts=10):
    x = arange(npts)/2.
    return sqrt(x)

For quick and dirty Python scripts, there is a tendency to use import *, as in:

from numpy import *
def sqrt_array(npts=10):
    x = arange(npts)/2.
    return sqrt(x)

which imports several hundred names into the module level namespace. Many experienced developers will tell you to avoid this like the plague.

In Larch, the general problem of how to look up the names of objects remains, but the rules are changed slightly. Since Group objects are used extensively throughout Larch exactly to provide namespaces as a way to organize data, we might as well use them. Instead of using import *, Larch has a top-level group ‘_math’ in which it stores several hundred names of functions, mostly from the numpy module. It also uses top-level groups ‘_sys’ and ‘_builtin’, which hold non-mathematical builtin functions and data, and many plugins will add top-level groups (such as ‘_plotter’, ‘_xafs’, and ‘_xray’). So, to access sqrt() and arange() in Larch, you could write _math.sqrt() and _math.arange(). But you don’t have to.

Symbol lookup in Larch uses a list of Groups which is searched for names. This list of groups is held in _sys.searchGroups (which holds the group names) and _sys.searchGroupObjects (which holds references to the groups themselves). These will be changed as the program runs. Although you can also change them yourself, this is not encouraged (and can lead to Larch not being able to work well).

Changed in version 0.9.34: _sys.paramGroup is no longer used.

Larch also has two special variables that it uses to hold references to groups that are always included in the search for names. These are ‘_sys.localGroup’, which holds the group for a currently running function while it is running, and ‘_sys.moduleGroup’, which holds the namespace for a module associated with a currently running function.
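The search-list idea can be sketched in plain Python as follows. This is an illustration under stated assumptions, not Larch’s symbol table: lookup(), search_groups, and the ‘_main’ group are hypothetical names, the groups hold only a couple of entries each, and the order used here (local group, then module group, then the search list) is an assumption that the text above does not spell out.

```python
import math
from types import SimpleNamespace

# Stand-ins for top-level groups; contents are illustrative only.
_math = SimpleNamespace(sqrt=math.sqrt, pi=math.pi)
_builtin = SimpleNamespace(len=len)
_main = SimpleNamespace(x=4.0)  # hypothetical group for user data

# Plays the role of the list of group objects that is searched for names.
search_groups = [_main, _math, _builtin]

def lookup(name, local_group=None, module_group=None):
    """Return the first object called `name` found in the candidate groups.

    Assumed order: local group, module group, then each search group.
    """
    candidates = [g for g in (local_group, module_group) if g is not None]
    candidates.extend(search_groups)
    for group in candidates:
        if hasattr(group, name):
            return getattr(group, name)
    raise NameError(f"name '{name}' is not defined")

# 'sqrt' is found in _math without writing _math.sqrt explicitly.
root = lookup("sqrt")(lookup("x"))
print(root)

# A group for a running function is consulted before the search list.
local = SimpleNamespace(x=9.0)
print(lookup("x", local_group=local))
```

In this sketch, changing the contents or order of search_groups changes which object a bare name resolves to, which is why editing the real search list by hand is discouraged.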

15.2.2.4. Unimplemented features

A domain-specific language like Larch does not need to be as full-featured as Python, so we left a few things out. These include (this may not be an exhaustive list):

  • eval – Larch is sort of a Python eval

  • lambda

  • class

  • global

  • generators, yield

  • decorators