How to Develop
When you first come to InDiCA as a developer, you may find the idea of making even small changes to be quite daunting. This guide will try to walk you through how to start developing and explain the steps for adding different types of functionality. Before you continue, it is strongly recommended that you read about the Code Design.
Obtaining a Copy of the Code
The git version control system is used when developing InDiCA. If you are not familiar with it, there are many tutorials available online. The code is hosted on GitHub. You can download a copy with one of the following commands:
# If you have permission to make changes to the repository
git clone git@github.com:ukaea/Indica.git
# If you do not have permission and won't contribute changes upstream
git clone https://github.com/ukaea/Indica.git
If you want to contribute your changes upstream but don’t have permission to do so, fork the repository and do your development there. Then make a pull request (see below) from your forked version.
All development should be performed on a dedicated branch:
git branch my_new_feature
git checkout my_new_feature
Setting up the Development Environment
InDiCA development uses Poetry to manage dependencies, control the testing environment, and handle packaging. Follow the instructions on installing poetry. In the repository, run:
poetry install
This will install all the necessary dependencies in a virtual environment. To run a command from this virtual environment, use:
poetry run <command>
Next you need to set up pre-commit to enable the automatic running of various checks before you can commit your code. Just run:
poetry run pre-commit install
Now open your favourite text editor/IDE and start developing!
Adding Features
InDiCA provides a framework for analysing and performing calculations with diagnostic data. The vast majority of new features will not require any fundamental changes to this framework, only the addition of new classes within it.
Reading from New Databases
To support a new database (e.g., one at another fusion experiment), you must subclass DataReader. The base class provides all of the functionality for assembling data from different diagnostics into DataArray objects. It also handles the creation of provenance. Subclasses must implement logging into the database (if necessary) and fetching the raw data.
An example of implementing such a reader can be seen in PPFReader. You should place your new class in the readers directory and make it available in the __init__.py module of said directory.
Reader Constructor
You should first write the constructor for your class. This must make a call to the parent’s constructor, providing the following information:
- tstart
Time into the pulse at which to start keeping data.
- tend
Time into the pulse at which to stop keeping data.
- max_freq
The maximum frequency of data collection. If data is collected above this frequency, some of it will be dropped.
- selector
A callback allowing the user to select which channels to drop when reading data.
- sess
The Session object for this run of the code.
As such, you will want these to be arguments for your new reader class’s constructor. You will also probably want an argument identifying which pulse to read data for and possibly one indicating the server holding the database. Extra arguments such as these should be passed to the parent constructor as keyword arguments, for use when creating provenance.
Within the constructor you should also do anything required to set up reading of data, such as instantiating the database client.
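The constructor pattern described above might look roughly like the following sketch. The DataReader class here is a stand-in defined only so the example runs on its own; the real base class lives in InDiCA and its exact signature may differ. FakeClient, MyReader, the default server URL and the provenance_kwargs attribute are all hypothetical names chosen for illustration.

```python
from typing import Any, Callable, Optional


class DataReader:
    """Stand-in for InDiCA's DataReader; the real signature may differ."""

    def __init__(self, tstart: float, tend: float, max_freq: float,
                 selector: Optional[Callable], sess: Any, **kwargs: Any):
        self.tstart = tstart
        self.tend = tend
        self.max_freq = max_freq
        self.selector = selector
        self.sess = sess
        # The real base class uses the extra keyword arguments for provenance.
        self.provenance_kwargs = kwargs


class FakeClient:
    """Hypothetical database client, present only to keep the sketch runnable."""

    def __init__(self, server: str):
        self.server = server


class MyReader(DataReader):
    def __init__(self, pulse: int, tstart: float, tend: float,
                 server: str = "https://data.example.org",
                 max_freq: float = 1e6,
                 selector: Optional[Callable] = None,
                 sess: Any = None):
        # Pass the standard arguments positionally and the extra ones
        # (pulse, server) as keywords so they can be recorded in provenance.
        super().__init__(tstart, tend, max_freq, selector, sess,
                         pulse=pulse, server=server)
        self.pulse = pulse
        # Set up whatever is needed to read data, e.g. the database client.
        self._client = FakeClient(server)


reader = MyReader(12345, tstart=40.0, tend=50.0)
```

The point of the sketch is the division of labour: the standard arguments go to the parent unchanged, while reader-specific ones are forwarded as keyword arguments.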
Authentication
Most likely your database will require a username and/or password to log in. If this is the case, you should implement the requires_authentication() property (indicating when this is necessary) and the authenticate() method.
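As a minimal sketch of these two hooks, assuming that authenticate() takes a name and a password and returns whether the login succeeded (check the base class for the actual signature), a reader could track a session token as below. The token handling is purely illustrative; a real reader would call its database client instead.

```python
from typing import Optional


class MyReader:
    """Stand-in reader; only the authentication hooks are shown."""

    def __init__(self) -> None:
        self._token: Optional[str] = None

    @property
    def requires_authentication(self) -> bool:
        # Authentication is needed until a session token has been obtained.
        return self._token is None

    def authenticate(self, name: str, password: str) -> bool:
        # Hypothetical check; a real reader would ask the database here.
        if name and password:
            self._token = f"token-for-{name}"
            return True
        return False


reader = MyReader()
needed_before = reader.requires_authentication
logged_in = reader.authenticate("alice", "secret")
needed_after = reader.requires_authentication
```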
Diagnostic Fetching Methods
Each diagnostic fetcher in the base reader class (e.g., get_radiation(), get_thomson_scattering(), etc.) requires a corresponding private version of the method to be implemented which returns the raw data as NumPy arrays. Each of these private methods has a docstring describing the data it must return. Not all reader classes need to implement all diagnostics.
For each diagnostic you implement, you must provide some information on the sort of data it can return. First, you should define a static/class-level attribute INSTRUMENT_METHODS, which is a dictionary mapping between INSTRUMENT names (in the JET parlance; they are the “instrument” argument to the getter methods) and the specific get-method used to read that data. In effect, this is defining what type of diagnostic each supported instrument provides.
You may also need to provide an _IMPLEMENTATION_QUANTITIES static attribute. This is similar to the _AVAILABLE_QUANTITIES attribute in the base class. The latter describes default quantities which are available for each diagnostic and what datatype they will have. _IMPLEMENTATION_QUANTITIES allows you to override this for specific instruments. It maps from instrument names to dictionaries. These dictionaries have keys that are the names of available quantities and values that are the datatypes of those quantities.
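The relationship between the three attributes can be sketched as follows. The base class is a stand-in, the instrument names ("ts_core", "ts_edge") and datatype values are invented for illustration, and the form of a datatype as a (general, specific) pair of strings is an assumption. The lookup helper is included only to show how an override might take precedence over the defaults.

```python
from typing import Dict, Tuple

# Assumed form of a datatype: (general datatype, specific datatype).
DataType = Tuple[str, str]


class DataReader:
    """Stand-in for the InDiCA base class."""

    # Default quantities per diagnostic method (illustrative values only).
    _AVAILABLE_QUANTITIES: Dict[str, Dict[str, DataType]] = {
        "get_thomson_scattering": {
            "ne": ("number_density", "electrons"),
            "te": ("temperature", "electrons"),
        },
    }
    INSTRUMENT_METHODS: Dict[str, str] = {}
    _IMPLEMENTATION_QUANTITIES: Dict[str, Dict[str, DataType]] = {}

    def available_quantities(self, instrument: str) -> Dict[str, DataType]:
        # Illustrative lookup: use the per-instrument override if present,
        # otherwise fall back to the defaults for that diagnostic.
        if instrument in self._IMPLEMENTATION_QUANTITIES:
            return self._IMPLEMENTATION_QUANTITIES[instrument]
        method = self.INSTRUMENT_METHODS[instrument]
        return self._AVAILABLE_QUANTITIES[method]


class MyReader(DataReader):
    # Which get-method reads each supported instrument.
    INSTRUMENT_METHODS = {
        "ts_core": "get_thomson_scattering",
        "ts_edge": "get_thomson_scattering",
    }
    # "ts_edge" only provides density, so override the defaults for it.
    _IMPLEMENTATION_QUANTITIES = {
        "ts_edge": {"ne": ("number_density", "electrons")},
    }


core = MyReader().available_quantities("ts_core")
edge = MyReader().available_quantities("ts_edge")
```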
Note
You may wish to cache the raw data you have fetched from the database, to speed up future reading. This is done by PPFReader. However, it is not mandatory.
Bad Channels
Sometimes a channel is known to provide bad data. This might be because it corresponds to a line of sight which is facing the divertor. You must implement the private method _get_bad_channels, which will return a list of these channels given a particular instrument and quantity.
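One simple way to implement this, sketched here under the assumption that the method receives only an instrument name and a quantity name (the real base class may pass further arguments), is a lookup table held on the class. The table contents and the "bolo" instrument name are hypothetical.

```python
from typing import Dict, List, Tuple


class MyReader:
    """Stand-in reader; only bad-channel handling is shown."""

    # Hypothetical table of channels known to give bad data.
    _BAD_CHANNELS: Dict[Tuple[str, str], List[int]] = {
        ("bolo", "h"): [3, 17],
    }

    def _get_bad_channels(self, instrument: str, quantity: str) -> List[int]:
        # Return a copy so callers cannot modify the table by accident.
        return list(self._BAD_CHANNELS.get((instrument, quantity), []))


bad = MyReader()._get_bad_channels("bolo", "h")
none_known = MyReader()._get_bad_channels("bolo", "v")
```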
Provenance
Most of the work of generating provenance is handled by the base class. However, you should provide a NAMESPACE attribute on the child class, as either a class or an object attribute. This is a tuple containing a short name for the namespace and a URL. This URL will likely be that of the server you are fetching the data from. (See information on PROV namespaces.)
Additionally, you will see that when implementing the private getter methods you are required to include <quantity_name>_records data in the result. This is a list of strings, each of which should uniquely identify the database records you have accessed. This does not need to contain any information on the database URL, however, as that will be included by the base class.
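The two provenance-related pieces can be sketched together as below. The namespace name, URL, record format and the simplified argument list of the private getter are all assumptions made for illustration; consult the docstrings on the base class for what each getter must actually accept and return.

```python
from typing import Any, Dict, List


class MyReader:
    """Stand-in reader; only the provenance-related parts are shown."""

    # Short namespace name and a URL, here a made-up server address.
    NAMESPACE = ("mydb", "https://data.example.org")

    def __init__(self, pulse: int) -> None:
        self.pulse = pulse

    def _get_thomson_scattering(
        self, instrument: str, quantities: List[str]
    ) -> Dict[str, Any]:
        results: Dict[str, Any] = {}
        for quantity in quantities:
            # Placeholder values standing in for the raw NumPy arrays.
            results[quantity] = [0.0, 0.0]
            # Strings identifying the records read; the database URL is
            # not needed here because the base class adds it.
            results[quantity + "_records"] = [
                f"{self.pulse}/{instrument}/{quantity}"
            ]
        return results


raw = MyReader(12345)._get_thomson_scattering("ts_core", ["ne", "te"])
```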
Supporting New Coordinate Systems
To implement a converter for a new coordinate system, you need to subclass CoordinateTransform. You will then need to provide the methods convert_to_Rz(), convert_from_Rz(), and __eq__(). A good example to start from when creating a new coordinate transform is TransectCoordinates. You should place your new class in the readers directory and make it available in the __init__.py module of said directory.
Standard Functionality
You will most likely also need to provide a constructor, although there are no particular constraints or requirements on what this does. It should just take whatever configurations are needed for your coordinate system. You will also need to declare the attributes x1_name and x2_name, which are the names to be used for the first and second spatial coordinates. These may be class attributes, if they will always be the same for these types of coordinates (e.g., in the TrivialTransform). Alternatively, they may be object attributes if each instance can represent a distinct coordinate system (e.g., TransectCoordinates).
The convert_from_Rz and convert_to_Rz methods are fairly self-explanatory. They should convert from R-z coordinates to your new coordinate system and vice versa, respectively. The equality operator should check whether two transforms describe identical coordinate systems. It must start with a call to the _abstract_equals method, which will check equality of attributes on the base class and the coordinate names.
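To make the required pieces concrete, here is a sketch of a toy polar coordinate system centred on a point (R0, z0). The CoordinateTransform class is a stand-in written for this example, and the behaviour of _abstract_equals is assumed from the description above. The method signatures (scalar floats, an optional time argument t) are simplified guesses; the real methods operate on array data and may take different arguments.

```python
import math
from typing import Optional, Tuple


class CoordinateTransform:
    """Stand-in for InDiCA's CoordinateTransform."""

    x1_name = ""
    x2_name = ""

    def _abstract_equals(self, other: object) -> bool:
        # Assumed behaviour: same class and the same coordinate names.
        return (
            type(self) is type(other)
            and self.x1_name == other.x1_name
            and self.x2_name == other.x2_name
        )


class PolarCoordinates(CoordinateTransform):
    """Hypothetical transform: radius and angle about the point (R0, z0)."""

    # Class attributes, since the names are the same for every instance.
    x1_name = "radius"
    x2_name = "theta"

    def __init__(self, R0: float, z0: float) -> None:
        self.R0 = R0
        self.z0 = z0

    def convert_to_Rz(
        self, x1: float, x2: float, t: Optional[float] = None
    ) -> Tuple[float, float]:
        return self.R0 + x1 * math.cos(x2), self.z0 + x1 * math.sin(x2)

    def convert_from_Rz(
        self, R: float, z: float, t: Optional[float] = None
    ) -> Tuple[float, float]:
        dR, dz = R - self.R0, z - self.z0
        return math.hypot(dR, dz), math.atan2(dz, dR)

    def __eq__(self, other: object) -> bool:
        # Start with the base-class check, then compare our own settings.
        if not self._abstract_equals(other):
            return False
        return self.R0 == other.R0 and self.z0 == other.z0


polar = PolarCoordinates(3.0, 0.0)
R, z = polar.convert_to_Rz(0.5, math.pi / 2)
radius, theta = polar.convert_from_Rz(R, z)
```

Converting to R-z and back should recover the original coordinates, which makes a round trip like the one at the end a convenient unit test for a new transform.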
Shortcut Methods
Some coordinate systems have a natural means of converting between each other (e.g., LinesOfSightTransform and ImpactParameterCoordinates) which will be more efficient than doing so via the R-z system. Often calculation of R-z coordinates will require converting to this other coordinate system first. In those cases you should implement those calculations in separate methods. You can then override the get_converter() method to return one of these “shortcut” methods for converting to the other coordinate system, if such a shortcut is available. Otherwise, it should just return None.
There are some subtleties to this of which you should be wary. First, such a shortcut conversion will often only be possible for a particular instance of the other coordinate system (as is the case for lines of sight and impact parameters: the shortcut only makes sense if the lines of sight coordinates are the same ones for which the impact parameters were calculated). It should also be noted that the get_converter method has the argument reverse, which indicates that you are looking for the reverse conversion (convert from the other coordinate system to this one, instead of from this one to the other). If reverse == False and you could not find a suitable converter on your object, you should always make a call to other.get_converter(self, reverse=True) to see if that object has a suitable conversion method. This is necessary because the information needed for both directions of the conversion is often held by only one of the coordinate systems, which must then implement both of the shortcut methods.
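The lookup logic described above can be sketched as follows. All three classes are stand-ins invented for this example (they are not the real LinesOfSightTransform or ImpactParameterCoordinates), the conversion bodies are placeholders, and the default behaviour of the base class is an assumption based on the description in this section.

```python
from typing import Callable, Optional


class CoordinateTransform:
    """Stand-in base class with no shortcuts of its own."""

    def get_converter(
        self, other: "CoordinateTransform", reverse: bool = False
    ) -> Optional[Callable]:
        if reverse:
            return None
        # Nothing found here, so ask the other system for the reverse.
        return other.get_converter(self, reverse=True)


class LinesOfSight(CoordinateTransform):
    """Hypothetical transform holding no shortcut information."""


class ImpactParameter(CoordinateTransform):
    """Hypothetical transform built for one specific set of lines of sight."""

    def __init__(self, lines_of_sight: LinesOfSight) -> None:
        self.lines_of_sight = lines_of_sight

    def _to_los(self, x1, x2, t=None):
        return x1, x2  # placeholder for the real calculation

    def _from_los(self, x1, x2, t=None):
        return x1, x2  # placeholder for the real calculation

    def get_converter(self, other, reverse=False):
        # The shortcut only applies to the instance we were built from.
        if other is self.lines_of_sight:
            return self._from_los if reverse else self._to_los
        if reverse:
            return None
        return other.get_converter(self, reverse=True)


los = LinesOfSight()
impact = ImpactParameter(los)
unrelated = LinesOfSight()
```

Here only ImpactParameter holds the information needed for the shortcut, so it provides both directions, and LinesOfSight finds the conversion through the reverse lookup.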
Other Notes
In rare cases it may be necessary to implement a custom distance() method. This is the case if, for some reason, the distance between successive points in R-z space does not correspond to the actual distance along the coordinate. This would happen if the coordinate has some component in the toroidal direction, as is the case for the LinesOfSightTransform.
Coordinate transforms do not record provenance.
Performing New Operations
Most development will likely focus on performing new calculations with the data. This will require you to create new subclasses of Operator. You will need to implement a constructor and the return_types() and __call__() methods. CalcZeff provides a simple example of an operator which you can examine. New operator classes should be placed in the operators directory and made available in the __init__.py file in that directory.
Operator Constructor
Your constructor must make a call to the constructor on the parent class. This requires you to pass a Session object, which your subclass’s constructor should also take as an argument. Any other arguments to the subclass’s constructor should be passed as keyword arguments to the superclass’s constructor so they can be included in provenance.
Argument Types and Return Types
All operators must provide an ARGUMENT_TYPES attribute, which is a list of datatypes. This may be either a class attribute or an instance attribute, as appropriate. Datatypes in the list may contain None for the specific datatype and/or, in the case of data arrays, the general datatype as well. This indicates that the type is unconstrained. The final element of the list may be an Ellipsis object (...), which indicates that the operator takes variadic arguments. The type of the variadic arguments will be that of the penultimate item in the list. If that datatype is unconstrained (i.e., contains None) then the type of all variadic arguments must match that of the first variadic argument.
You will also need to implement the return_types() method. This takes datatypes as arguments. These correspond to the datatypes of some hypothetical arguments for the operator. The method will then return a tuple of the datatypes of the results of the operator. The number and types of results will often depend on the number and types of arguments to the operator, which is why a method is needed to determine them.
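As an illustration of both pieces, the sketch below declares a hypothetical operator that sums any number of density profiles. The class name, the datatype strings and the representation of a datatype as a (general, specific) tuple are assumptions made for the example; the class deliberately does not inherit from the real Operator so that it can run on its own.

```python
from typing import Optional, Tuple

# Assumed form of a data array datatype: (general, specific), where
# None means that part of the type is unconstrained.
DataType = Tuple[Optional[str], Optional[str]]


class SumDensities:
    """Hypothetical operator summing any number of density profiles."""

    # One or more number densities of any species. The trailing Ellipsis
    # marks the arguments as variadic, typed like the penultimate entry.
    ARGUMENT_TYPES = [("number_density", None), ...]

    def return_types(self, *args: DataType) -> Tuple[DataType, ...]:
        # Variadic arguments must all match the first one, so its
        # species is also the species of the single result.
        specific = args[0][1] if args else None
        return (("number_density", specific),)


operator = SumDensities()
result_types = operator.return_types(
    ("number_density", "electrons"),
    ("number_density", "electrons"),
)
```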
The Calculation Itself
The operator’s calculation is performed in the __call__() method. These methods break strict static typing, as each operator will take a different number of arguments, with different names. However, all arguments must be positional and none should be optional. Variadic positional arguments are allowed. In order to prevent mypy from complaining that your __call__ method does not match the call signature of the original on the base class, you should add # type: ignore[override] to the method declaration.
The first thing you should do in the method is call validate_arguments(). This will check that all arguments are of the expected type. It will also take note of these arguments for the purpose of generating provenance.
Your operator should then proceed to the calculation. The details of this will vary greatly from case to case. If your calculation is expected to take a long time, it may be worth printing some messages describing the progress. These can also be useful for debugging.
Remember that a great many mathematical operations are available in SciPy and do not need to be implemented from scratch. When performing coordinate transformations, make use of the convert_coords() method (and its equivalent for datasets) so results will be stored in the data array/dataset and be available for later reuse. If you need to perform an interpolation, remember that xarray offers this built in (xarray.DataArray.interp()) and, should you need to perform cubic interpolation over two dimensions, this is available through the interp2d() method.
It is often useful to return intermediate results of the calculation for reuse elsewhere. If there is some output of the calculation that is neither a dataset nor a data array, then it can be assigned as metadata to one of the other results.
Once the calculation is finished, be sure that all of your results have the necessary metadata, such as datatype, coordinate transform, and an equilibrium set. You will also need to assign provenance data to each result. You should do this by calling assign_provenance() for each result.
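Putting the constructor and the calculation together, the overall shape of an operator might resemble the sketch below. The Operator base class here is a heavily simplified stand-in, plain dictionaries stand in for data arrays, and the contents of the provenance record are invented. Only the order of the steps (validate, calculate, attach metadata, assign provenance) is taken from the text above.

```python
from typing import Any, Dict, List


class Operator:
    """Stand-in for InDiCA's Operator; the real class does much more."""

    def __init__(self, sess: Any, **kwargs: Any) -> None:
        self._session = sess
        # Extra constructor arguments are kept for use in provenance.
        self._kwargs = kwargs
        self._inputs: List[Any] = []

    def validate_arguments(self, *args: Any) -> None:
        # The real method also checks datatypes; this only records inputs.
        self._inputs = list(args)

    def assign_provenance(self, result: Dict[str, Any]) -> None:
        result["provenance"] = {
            "operator": type(self).__name__,
            "n_inputs": len(self._inputs),
            **self._kwargs,
        }

    def __call__(self, *args: Any) -> Any:
        raise NotImplementedError


class ScaleBy(Operator):
    """Hypothetical operator multiplying its input by a constant factor."""

    def __init__(self, factor: float, sess: Any = None) -> None:
        # Pass the session on, plus our own argument as a keyword.
        super().__init__(sess, factor=factor)
        self.factor = factor

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:  # type: ignore[override]
        self.validate_arguments(data)
        result = {
            "values": [self.factor * value for value in data["values"]],
            # Carry the metadata over to the result.
            "datatype": data["datatype"],
        }
        self.assign_provenance(result)
        return result


scaled = ScaleBy(2.0)(
    {"values": [1.0, 2.5], "datatype": ("temperature", "electrons")}
)
```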
Contributing Changes Upstream
If you implement new features, you should consider submitting them for inclusion in the official version of InDiCA. You can do this by submitting a pull request on GitHub. In your pull request, please explain what you have implemented, with reference to any of the repository’s issues which it may address.
In order for your pull request to be accepted, it must meet the following standards:
- pass all pre-commit hooks (e.g., it must obey the black formatting style)
- use Python type-hints wherever possible
- pass mypy
- provide unit tests (and ideally integration tests as well)
- not introduce any regressions in existing functionality
- provide docstrings for all functions and classes, using the NumPy style
- depending on the sort of change you make, explain the new features in the sphinx documentation held in the doc/ directory.
Most of these will be checked automatically by the continuous integration system when you create your pull request. You should also expect your code to undergo review and you may be requested to make various stylistic changes or adopt a more idiomatic approach to using InDiCA features.