You need Python >= 3.7.
Depending on your system settings, you might need to run the installation command with root privileges and replace pip with pip3, e.g. sudo pip3 install soil-sdk
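Before installing, you can check that the interpreter you plan to use meets the Python >= 3.7 requirement. Run the check with the same interpreter that pip (or pip3) installs into. This is a minimal standard-library sketch. The helper name python_is_supported is our own and is not part of soil-sdk.

```python
import sys
from typing import Sequence

# Minimum interpreter version required by the SDK (Python >= 3.7).
MIN_VERSION = (3, 7)


def python_is_supported(version: Sequence[int] = tuple(sys.version_info)) -> bool:
    """Return True if the (major, minor) part of `version` is at least MIN_VERSION."""
    return tuple(version[:2]) >= MIN_VERSION


# Report the running interpreter's version and whether it is new enough.
print(sys.version.split()[0], "supported:", python_is_supported())
```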
This will ask for your credentials. You will need an application ID and an API key, both provided by the admin. In most cases you should keep the default provider URL. If you enter a wrong value, run soil configure --reset and re-enter the required information.
An example configuration file should look as follows:
This will ask for your Amalfi credentials and store a token in the folder
Note that you must run login again whenever you change anything in the steps above.
This will generate a folder with the boilerplate elements and some example code.
A SOIL application is built around three main concepts:
- Scripts: These scripts run outside the SOIL platform and are the entry points to the application.
- Modules: They run inside the SOIL platform and contain the instructions to transform the data. They are decorated with @modulify and can either be written by you or imported from the SOIL library.
- Data structures: They contain the data, the metadata and instructions on how to serialize, deserialize and query it.
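The data structure concept can be pictured as a class that bundles the data and its metadata together with the logic to serialize, deserialize and query them. The following is a hypothetical plain-Python sketch. It assumes JSON serialization and a simple keyword filter as the query. ToyDataStructure and its methods are illustrative names, not the SOIL library's API.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyDataStructure:
    """Hypothetical stand-in for a SOIL data structure (not the real base class)."""

    data: List[Dict[str, Any]]
    metadata: Dict[str, Any] = field(default_factory=dict)

    def serialize(self) -> str:
        # Bundle data and metadata into a single JSON string.
        return json.dumps({"data": self.data, "metadata": self.metadata})

    @classmethod
    def deserialize(cls, raw: str) -> "ToyDataStructure":
        # Rebuild the structure from the JSON produced by serialize().
        payload = json.loads(raw)
        return cls(data=payload["data"], metadata=payload["metadata"])

    def query(self, **filters: Any) -> List[Dict[str, Any]]:
        # Return the records whose fields match every keyword filter.
        return [row for row in self.data
                if all(row.get(key) == value for key, value in filters.items())]


ds = ToyDataStructure(
    data=[{"sex": "F", "age": 30}, {"sex": "M", "age": 45}],
    metadata={"source": "example"},
)
restored = ToyDataStructure.deserialize(ds.serialize())
print(restored.query(sex="F"))  # [{'sex': 'F', 'age': 30}]
```

Keeping serialization and querying on the structure itself means the code that transforms the data never needs to know how the data is stored.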
A SOIL application consists of a set of scripts that upload or query data, transform it, and store the result.
A script will contain one or more pipelines that will look like this:
Pipelines are lazily evaluated: they are not run until the data is needed. In the example, the pipeline won't run until the line print(statistics.data). This minimizes data transfer. The calls to a data structure that trigger a pipeline run are ds.data and ds.get_data(**kwargs). The pipeline runs only on the first access to the data, and that access is blocking. This means that print(statistics.data) blocks, but print(statistics.metadata) does not.
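The evaluation rules described above can be modeled with a small toy class. This is a minimal sketch under stated assumptions, not the SOIL implementation. The LazyNode class, its run_count counter and the patients, women and statistics nodes are all hypothetical. In this model, reading .data runs the upstream chain once and caches only that node's result. Reading .metadata never triggers a run. Intermediate results are not kept, so reading an intermediate node later recomputes its part of the chain.

```python
from typing import Any, Callable, Dict, Optional


class LazyNode:
    """Toy model of a lazily evaluated pipeline step (hypothetical, not the SOIL API)."""

    run_count = 0  # total number of transformation steps executed, across all nodes

    def __init__(
        self,
        fn: Callable[..., Any],
        parent: Optional["LazyNode"] = None,
        metadata: Optional[Dict[str, Any]] = None,
    ) -> None:
        self.fn = fn
        self.parent = parent
        self.metadata = metadata or {}  # readable without running anything
        self._result: Any = None
        self._done = False

    def _compute(self) -> Any:
        # Run this step and, recursively, every upstream step.
        # Upstream results are not stored, so they are recomputed every time.
        LazyNode.run_count += 1
        if self.parent is None:
            return self.fn()
        return self.fn(self.parent._compute())

    @property
    def data(self) -> Any:
        # The first access runs the pipeline synchronously (it blocks);
        # later accesses return the cached result of this node only.
        if not self._done:
            self._result = self._compute()
            self._done = True
        return self._result


# Hypothetical pipeline: load records -> keep women -> compute statistics.
patients = LazyNode(lambda: [
    {"sex": "F", "age": 30},
    {"sex": "M", "age": 45},
    {"sex": "F", "age": 52},
])
women = LazyNode(lambda rows: [r for r in rows if r["sex"] == "F"], parent=patients)
statistics = LazyNode(
    lambda rows: {"count": len(rows), "mean_age": sum(r["age"] for r in rows) / len(rows)},
    parent=women,
    metadata={"description": "age statistics for women"},
)

print(statistics.metadata)  # no pipeline run needed
print(LazyNode.run_count)   # 0
print(statistics.data)      # runs patients -> women -> statistics
print(LazyNode.run_count)   # 3
print(women.data)           # intermediate result was not stored: reruns patients -> women
print(LazyNode.run_count)   # 5
```

In this toy model, each node caches only its own result. That mirrors "the pipeline only runs with the first call to the data" while still recomputing intermediate steps.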
Intermediate results are not stored. This means that if we access, for example, women.data, the partial pipeline will run again, even if those intermediate results were already computed to get