Managing Datacards
The Bailo Python client enables intuitive interaction with the Bailo service from within a Python environment. This example notebook runs through the following concepts:
Creating and populating a new datacard on Bailo.
Retrieving datacards from the service.
Making changes to the datacard.
Prerequisites:
Python 3.8.1 or higher (including a notebook environment for this demo).
A local or remote Bailo service (see https://github.com/gchq/Bailo).
Introduction
The Bailo Python client is split into two sub-packages: core and helper.
Core: For direct interactions with the service endpoints.
Helper: For more intuitive interactions with the service, using classes (e.g. Datacard) to handle operations.
In order to create helper classes, you will first need to instantiate a Client() object from the core. By default, this object will not support any authentication. However, Bailo also supports PKI authentication, which you can use from Python by passing a PkiAgent() object into the Client() object when you instantiate it.
[ ]:
# Install dependencies...
! pip install mlflow bailo

# Necessary import statements
from bailo import Datacard, Client

# Instantiating the PkiAgent(), if using (PkiAgent must be imported as well).
# agent = PkiAgent(cert='', key='', auth='')

# Instantiating the Bailo client
client = Client("http://127.0.0.1:8080")  # <- INSERT BAILO URL (if not hosting locally)
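The choice between the unauthenticated default and PKI authentication described above can be wrapped in a small factory. This is a minimal sketch, not part of the Bailo client: pki_settings() and make_client() are hypothetical helper names, and it assumes that PkiAgent can be imported from the top-level bailo package and that Client() accepts the agent through a keyword named agent.

```python
from typing import Optional


def pki_settings(cert: Optional[str], key: Optional[str], auth: Optional[str]) -> Optional[dict]:
    """Return PkiAgent keyword arguments, or None when no PKI values are given."""
    values = {"cert": cert, "key": key, "auth": auth}
    supplied = [name for name, value in values.items() if value]
    if not supplied:
        return None  # fall back to the unauthenticated default
    if len(supplied) != len(values):
        missing = sorted(set(values) - set(supplied))
        raise ValueError(f"Incomplete PKI settings, missing: {', '.join(missing)}")
    return values


def make_client(url: str, cert=None, key=None, auth=None):
    """Build a Client, attaching a PkiAgent only when all PKI values are present."""
    # Imported lazily so pki_settings() can be used without bailo installed.
    from bailo import Client, PkiAgent  # assumption: both are top-level exports

    settings = pki_settings(cert, key, auth)
    if settings is None:
        return Client(url)
    # Assumption: the agent is passed through a keyword argument named `agent`.
    return Client(url, agent=PkiAgent(**settings))
```

Calling make_client("http://127.0.0.1:8080") would then mirror the cell above, while supplying only some of the three PKI values fails early with a ValueError instead of producing a half-configured agent.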
Creating a new datacard in Bailo
Creating and updating the base datacard
In this section, we’ll create a new datacard using the Datacard.create() classmethod. On the Bailo service, a datacard must consist of at least 4 parameters upon creation: name, description, visibility and team_id. Below, we pass the Client() object created earlier when instantiating the new Datacard() object. Note that the example does not pass visibility explicitly.
NOTE: This creates the datacard on your Bailo service too! The datacard_id is assigned by the backend, and we will use this later to retrieve the datacard. As with models on Bailo, the datacard itself has not been populated at this stage.
[ ]:
datacard = Datacard.create(client=client, name="ImageNet", description="ImageNet dataset consisting of images.", team_id="uncategorised")
datacard_id = datacard.datacard_id
You may make changes to these attributes and then call the update() method to relay the changes to the service, as below:
datacard.name = "New Name"
datacard.update()
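When several attributes change at once, the set-then-update pattern above can be grouped so that update() is called a single time. The following is a rough sketch: apply_changes() is a hypothetical helper, the set of editable fields is an assumption, and FakeCard is a stand-in used only so the example runs without a Bailo service.

```python
def apply_changes(card, **changes):
    """Set several attributes on a datacard-like object, then call update() once."""
    allowed = {"name", "description"}  # assumption: extend to match your needs
    unknown = set(changes) - allowed
    if unknown:
        raise AttributeError(f"Unsupported fields: {sorted(unknown)}")
    for field, value in changes.items():
        setattr(card, field, value)
    card.update()  # relay the edited attributes in one call
    return card


class FakeCard:
    """Stand-in for Datacard; it only counts how often update() is called."""

    def __init__(self):
        self.name = "ImageNet"
        self.description = "ImageNet dataset consisting of images."
        self.update_calls = 0

    def update(self):
        self.update_calls += 1


card = apply_changes(FakeCard(), name="New Name", description="Updated description.")
print(card.name, card.update_calls)
```

With a real Datacard() object in place of FakeCard, the same call would be expected to send both changes to the service together.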
Populating the datacard
Before populating a datacard, we first need to generate an empty card using the card_from_schema() method. In this instance, we will use minimal-data-card-v10. You can manage custom schemas using the Schema() helper class, but this is out of scope for this demo.
[ ]:
datacard.card_from_schema(schema_id='minimal-data-card-v10')
print(f"Datacard version is {datacard.data_card_version}.")
If successful, the above will have created a new, empty card, and the data_card_version attribute should be set to 1.
Next, we can populate the data using the update_data_card() method. This can be used any time you want to make changes, and the backend will create a new datacard version each time. We’ll learn how to retrieve datacards later (either the latest, or a specific version).
NOTE: Your datacard must match the schema, otherwise an error will be thrown.
[ ]:
new_card = {
    'overview': {
        'storageLocation': 'S3',
    }
}
datacard.update_data_card(data_card=new_card)
print(f"Datacard version is {datacard.data_card_version}.")
If successful, the data_card_version will now be 2!
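Because a card that does not match the schema is rejected by the service, it can help to check for obviously missing fields locally before calling update_data_card(). The sketch below is illustrative only: missing_fields() is a hypothetical helper, and the REQUIRED tree is a made-up example, not the real contents of minimal-data-card-v10.

```python
def missing_fields(card: dict, required: dict, path: str = "") -> list:
    """Return dotted paths of required keys that are absent from `card`."""
    missing = []
    for key, children in required.items():
        dotted = f"{path}.{key}" if path else key
        if key not in card:
            missing.append(dotted)
        elif children:
            # Descend into nested sections; a non-dict value has no children.
            nested = card[key] if isinstance(card[key], dict) else {}
            missing.extend(missing_fields(nested, children, dotted))
    return missing


# Hypothetical requirement tree; NOT the real minimal-data-card-v10 schema.
REQUIRED = {"overview": {"storageLocation": {}}}

new_card = {"overview": {"storageLocation": "S3"}}
print(missing_fields(new_card, REQUIRED))          # []
print(missing_fields({"overview": {}}, REQUIRED))  # ['overview.storageLocation']
```

A check like this only catches absent keys. The service remains the authority on whether a card is valid.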
Retrieving an existing datacard
Using the .from_id() method
In this section, we’ll retrieve our previous datacard using the Datacard.from_id() classmethod. This will create your Datacard() object as before, but using existing information retrieved from the service.
[ ]:
datacard = Datacard.from_id(client=client, datacard_id=datacard_id)
print(f"Datacard description: {datacard.description}")
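After retrieval, it can be convenient to gather the attributes used in this walkthrough into a plain dictionary, for example for logging. This is a minimal sketch: summarise() is a hypothetical helper, and the SimpleNamespace object with its made-up datacard_id stands in for the object returned by Datacard.from_id() so that the example runs without a service.

```python
from types import SimpleNamespace


def summarise(card) -> dict:
    """Collect the attributes this walkthrough uses into a plain dict."""
    fields = ("datacard_id", "name", "description", "data_card_version")
    # getattr with a default keeps the helper usable on partially populated objects.
    return {field: getattr(card, field, None) for field in fields}


# Stand-in object with illustrative values; in the notebook this would be
# the result of Datacard.from_id(client=client, datacard_id=datacard_id).
stub = SimpleNamespace(
    datacard_id="imagenet-abc123",  # made-up identifier
    name="ImageNet",
    description="ImageNet dataset consisting of images.",
    data_card_version=2,
)
summary = summarise(stub)
print(summary["name"], summary["data_card_version"])
```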