
Install

Dependencies

Before deploying Label Studio, you need at least PostgreSQL.

S3 storage (MinIO) is optional but recommended, so we use it here. The alternative is to manage persistent storage manually.

PostgreSQL requires the monitoring-related CRDs as a dependency.

In this repo we use TopoLVM as the storage backend for PostgreSQL, but it is optional.

TopoLVM in turn adds cert-manager as a dependency.
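If you install these dependencies yourself, one quick way to confirm they are in place is to look for one CRD from each. The sketch below assumes `kubectl` points at the target cluster. The function name `check_crds` is hypothetical, and the CRD names in the usage line are examples of CRDs shipped by kube-prometheus-stack (`podmonitors.monitoring.coreos.com`) and cert-manager (`certificates.cert-manager.io`).

```shell
# Hypothetical helper: succeed only if every CRD passed as an argument exists.
check_crds() {
  missing=0
  for crd in "$@"; do
    # "kubectl get crd <name>" exits non-zero when the CRD is absent.
    if ! kubectl get crd "$crd" >/dev/null 2>&1; then
      printf 'missing CRD: %s\n' "$crd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: monitoring CRDs (for PostgreSQL) and cert-manager CRDs (for TopoLVM).
# check_crds podmonitors.monitoring.coreos.com certificates.cert-manager.io
```

Checking every CRD before returning, rather than stopping at the first missing one, reports all gaps in a single run.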

Deploy using Helmfile

The Helmfile already includes all of the optional dependencies mentioned above:

cd helmfile
helmfile sync -f helmfile.yaml.gotmpl

Deploy using Helm

Alternatively, you can deploy the Helm charts manually, for example if you already run some of these dependencies yourself.

Dependencies

Here is how to deploy the dependencies:

Monitoring CRDs

helm install monitoringcrds ../monitoring/kube-prometheus-stack/charts/crds \
--create-namespace -n monitoring

cert-manager and TopoLVM

helm install cert-manager ../cert-manager/cert-manager \
--create-namespace -n kosmos-system \
-f helmfile/values/cert-manager.yaml
helm install topolvm ../lvm-csi/topolvm \
--create-namespace -n kosmos-system \
-f helmfile/values/lvm-csi.yaml

PostgreSQL

helm install cnpg ../postgresql/cloudnative-pg \
--create-namespace -n kosmos-sql \
-f helmfile/values/psql-operator.yaml
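# helm install requires a release name; <cluster-release-name> is a placeholder
helm install <cluster-release-name> ../postgresql/cluster \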
--create-namespace -n kosmos-sql \
-f helmfile/values/psql-minimal.yaml

S3 MinIO

helm install operator ../s3/operator \
--create-namespace -n kosmos-s3 \
-f helmfile/values/s3-operator-min.yaml
helm install minio-secrets ../s3/minio-secrets \
--create-namespace -n kosmos-s3
helm install s3-tenant ../s3/tenant-5.0.15.tgz \
--create-namespace -n kosmos-s3 \
-f helmfile/values/s3-tenant-min.yaml
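Before moving on, you may want to wait until the operator Deployments in each namespace are available. Below is a minimal sketch built on the standard `kubectl wait` subcommand. The function name and the 300-second timeout are arbitrary choices, not values taken from this repository. Workloads that are not Deployments, such as database or storage pods managed by the operators, are not covered.

```shell
# Hypothetical helper: block until every Deployment in the given namespaces is Available.
wait_for_deployments() {
  for ns in "$@"; do
    printf 'waiting for deployments in %s\n' "$ns"
    kubectl wait --for=condition=Available deployment --all \
      --namespace "$ns" --timeout=300s || return 1
  done
}

# wait_for_deployments kosmos-system kosmos-sql kosmos-s3
```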

Deploy Label Studio

Before deploying Label Studio, we first need to initialize a PostgreSQL database and an S3 bucket. Once they exist, we also need to store the associated credentials in a secret located in the same namespace as Label Studio.

This is done with the initpg, inits3 and label-studio-secrets Helm charts.

cd kosmos-apps/init-datastore

helm install label-studio-initpg initpg \
--namespace kosmos-sql \
--create-namespace \
--set appDbUserPrefix="labelstudio" \
--set appDbName="labelstudio"

helm install label-studio-inits3 inits3 \
--namespace kosmos-s3 \
--create-namespace \
--set appBucketUserPrefix="labelstudio" \
--set appBucketName="labelstudio"


cd ../label-studio  # i.e. kosmos-apps/label-studio
# generate secrets
helm install label-studio-secrets label-studio-secrets \
--namespace kosmos-data \
--create-namespace \
--set DB_PASSWORD="$(kubectl get secret -n kosmos-sql label-studio-initpg-secret -o jsonpath="{.data.app_db_password}" | base64 --decode)" \
--set MINIO_ACCESS_KEY="$(kubectl get secret -n kosmos-s3 label-studio-inits3-secret -o jsonpath="{.data.app_bucket_user}" | base64 --decode)" \
--set MINIO_SECRET_KEY="$(kubectl get secret -n kosmos-s3 label-studio-inits3-secret -o jsonpath="{.data.app_bucket_password}" | base64 --decode)"

# install label-studio
helm install label-studio label-studio \
--namespace kosmos-data
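Each `$(kubectl ... | base64 --decode)` substitution in the secrets step silently yields an empty string if the secret or key does not exist, and Helm then installs the chart with an empty value. The sketch below shows a small helper that fails instead. The function name `read_secret_key` is hypothetical, and the secret and key names in the usage line are those of the initpg step above.

```shell
# Hypothetical helper: print the decoded value of one key of a Secret,
# or fail if the Secret or key is missing or empty.
read_secret_key() {
  # $1 = namespace, $2 = secret name, $3 = key under .data
  value=$(kubectl get secret --namespace "$1" "$2" \
    -o jsonpath="{.data.$3}" | base64 --decode) || return 1
  if [ -z "$value" ]; then
    printf 'secret %s/%s has no key %s\n' "$1" "$2" "$3" >&2
    return 1
  fi
  printf '%s' "$value"
}

# DB_PASSWORD=$(read_secret_key kosmos-sql label-studio-initpg-secret app_db_password) || exit 1
```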

Configuration

Accessing UI

If the UI doesn't load correctly, either LABEL_STUDIO_HOST is not set or you did not type http:// or https:// at the beginning of the URL. Always enter the protocol explicitly instead of letting the browser infer it.
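To rule out the second cause, you can check that the value you use (in LABEL_STUDIO_HOST or in the browser) carries an explicit scheme. Here is a minimal POSIX shell sketch. The function name is hypothetical.

```shell
# Hypothetical helper: succeed only if the URL starts with http:// or https://
# followed by at least one character.
has_explicit_scheme() {
  case "$1" in
    http://?*|https://?*) return 0 ;;
    *) return 1 ;;
  esac
}

# has_explicit_scheme "https://label-studio.wip"   # succeeds
# has_explicit_scheme "label-studio.wip"           # fails
```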

Main parameters

The first thing to know is that the LABEL_STUDIO_HOST variable is very important in a Kubernetes setup, because it holds the external host. It is configured automatically if your ingress is enabled.

When this variable is set, all links are generated as absolute URLs using LABEL_STUDIO_HOST as the host. This also applies to S3 content, even if s3.endpointUrl is set. If the variable isn't set, links are relative and won't work if the browser cannot reach the internal host.
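The rule above can be summarized by the simplified model below. It is not Label Studio's actual code. It only mirrors the behavior described in this section, and the function name and example path are hypothetical.

```shell
# Simplified model (not Label Studio code): shape of a generated link
# depending on whether LABEL_STUDIO_HOST is set.
make_link() {
  # $1 = path generated by the application, e.g. /data/x.png
  if [ -n "${LABEL_STUDIO_HOST:-}" ]; then
    # Host set: absolute link on the external host (trailing slash trimmed).
    printf '%s%s\n' "${LABEL_STUDIO_HOST%/}" "$1"
  else
    # Host unset: relative link, which only works if the browser
    # can reach the internal host.
    printf '%s\n' "$1"
  fi
}
```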

You can configure S3 persistence and the admin username and password as follows.

First, we set up a default user. Remember that the default username must contain an @, because Label Studio expects an email address. We also use an external domain, since we are behind a Kubernetes ingress.

These are the parameters needed for that:

app:
  extraEnvironmentVars:
    LABEL_STUDIO_USERNAME: admin@localhost # or admin@athea.tech
    LABEL_STUDIO_PASSWORD: password
    LABEL_STUDIO_HOST: https://label-studio.wip
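Because Label Studio expects the default username to be an email address, a quick pre-flight check on the value before you put it in the values file can save a failed start-up. This is only a sketch. The function name is hypothetical, and the check is deliberately loose: it only requires text on both sides of a single @.

```shell
# Hypothetical helper: loose check that a username looks like text@text.
is_email_like() {
  case "$1" in
    *@*@*) return 1 ;;   # more than one @
    ?*@?*) return 0 ;;   # something before and after a single @
    *) return 1 ;;       # no @ at all, or nothing on one side of it
  esac
}

# is_email_like admin@localhost   # succeeds
# is_email_like admin             # fails
```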

We use S3 storage for persistence. Keep in mind, however, that not all connections to the S3 backend are made by the server: some are made directly by the client (the browser), so the S3 API must be exposed through an ingress.

Here is the configuration for that:

global:
  persistence:
    enabled: true
    type: s3 # s3, azure, gcs
    config:
      s3:
        accessKeyExistingSecret: "label-studio-secrets"
        accessKeyExistingSecretKey: "MINIO_ACCESS_KEY"
        secretKeyExistingSecret: "label-studio-secrets"
        secretKeyExistingSecretKey: "MINIO_SECRET_KEY"
        region: "us-east-1"
        endpointUrl: "http://minio.kosmos-s3.svc.cluster.local"
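This persistence block points at keys of an existing secret. A typo in the secret or key name only surfaces once Label Studio tries to reach S3. Below is a minimal check, assuming `label-studio-secrets` lives in `kosmos-data` as in the manual install above. The function name is hypothetical.

```shell
# Hypothetical helper: succeed only if every listed key is present (non-empty) in the Secret.
secret_has_keys() {
  # $1 = namespace, $2 = secret name, remaining args = keys under .data
  ns=$1; secret=$2; shift 2
  for key in "$@"; do
    data=$(kubectl get secret --namespace "$ns" "$secret" \
      -o jsonpath="{.data.$key}" 2>/dev/null)
    if [ -z "$data" ]; then
      printf 'missing key %s in %s/%s\n' "$key" "$ns" "$secret" >&2
      return 1
    fi
  done
  return 0
}

# secret_has_keys kosmos-data label-studio-secrets MINIO_ACCESS_KEY MINIO_SECRET_KEY
```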

Considerations

There are a few pitfalls to watch out for here.

Label Studio uses S3 storage in two different ways:

  • Through the browser, to retrieve annotation files via URLs that fill the labeling interface. This goes through the ingress minio.<domain>.
  • Through the container, to retrieve task files, check connections, push outputs, save tasks, etc. This goes through the service minio.<namespace>.svc.cluster.local.

However, both paths use the same URL, and you cannot configure a separate one for each. You can work around this by adding a rewrite entry to CoreDNS so that, when queried from inside the cluster, MinIO's ingress domain name resolves to the MinIO service.
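With the CoreDNS `rewrite` plugin, such a rule takes the form `rewrite name <ingress host> <service FQDN>` inside the server block of the Corefile. The sketch below only prints that line. `minio.example.com` is a placeholder for your `minio.<domain>`. The Corefile's location is usually a ConfigMap in `kube-system`, but the exact name depends on your Kubernetes distribution.

```shell
# Hypothetical helper: print a CoreDNS rewrite rule that maps MinIO's ingress
# host to the in-cluster MinIO service.
minio_rewrite_rule() {
  # $1 = ingress host (minio.<domain>), $2 = namespace of the MinIO service
  printf 'rewrite name %s minio.%s.svc.cluster.local\n' "$1" "$2"
}

# minio_rewrite_rule minio.example.com kosmos-s3
# -> rewrite name minio.example.com minio.kosmos-s3.svc.cluster.local
```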

Once this is handled, you will also run into a mixed-content problem, since Label Studio, served over HTTPS, then contacts the MinIO service over HTTP. Once you work around this by allowing mixed content, you also have to add the ingress CA to Label Studio's trust store through the CUSTOM_CA_CERTS environment variable.
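The ingress CA usually lives in a Kubernetes secret under a `ca.crt` key, as with cert-manager-issued certificates. The sketch below writes that key to a local PEM file, which you can then mount into the Label Studio pod and reference through CUSTOM_CA_CERTS. The secret name and namespace are placeholders, and how you mount the file depends on your values file.

```shell
# Hypothetical helper: write the ca.crt entry of a Secret to a local PEM file.
export_ca_cert() {
  # $1 = namespace, $2 = secret name, $3 = output file
  # The dot in "ca.crt" must be escaped in the JSONPath expression.
  kubectl get secret --namespace "$1" "$2" -o jsonpath='{.data.ca\.crt}' \
    | base64 --decode > "$3" || return 1
  # An empty file means the secret has no ca.crt key.
  [ -s "$3" ]
}

# export_ca_cert <issuer-namespace> <ca-secret-name> ingress-ca.crt
```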

Umbrella Helm Installation

With the "label-studio-umbrella" Helm chart, you can perform an out-of-the-box Label Studio installation from the Rancher UI.

Prerequisites:

  • Zot registry must be available in Rancher registry list (see link )
  • A kosmos PostgreSQL cluster must be available in the kosmos-sql namespace
  • A kosmos S3 cluster must be available in the kosmos-s3 namespace
  • The "kosmos" realm must be present in the main Keycloak (the one in the kosmos-iam namespace)
  • A ClusterIssuer must exist in the Kubernetes cluster

Namespace

Each Label Studio instance needs a dedicated namespace.

In the Rancher interface, create your dedicated namespace in the tenant's project.

You can then install Label Studio in this namespace.

Find label-studio-umbrella in the Zot registry

In the Rancher UI, click Apps, then Charts. Select "zot" among the repositories listed on the left and search for the "Label Studio" chart, which is the display name of the label-studio-umbrella chart.

Complete the form

Select the namespace you just created and define a release name for your Label Studio instance.

attention

Don't use any special characters in your release name: only alphanumeric characters are allowed (for example, no - or _).
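You can check a candidate name before submitting the form. The sketch below combines the rule above with Helm's own constraints on release names (lowercase only, at most 53 characters), so it accepts only lowercase letters and digits. The function name is hypothetical.

```shell
# Hypothetical helper: accept only lowercase letters and digits, 1 to 53 characters.
is_valid_release_name() {
  case "$1" in
    # Empty, or contains anything else (-, _, uppercase letters...).
    ''|*[![:lower:][:digit:]]*) return 1 ;;
  esac
  [ "${#1}" -le 53 ]
}

# is_valid_release_name labelstudio1   # succeeds
# is_valid_release_name label-studio   # fails
```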

As a last step before installing, you need to define a few options exposed by the form:

attention

Options are grouped by functional scope. These option groups are listed on the left of the form.

The way Rancher displays these groups makes them easy to miss, especially when you scroll down the form, so make sure to look for them.

Deployment Configuration:

  • Domain: the last part of your Label Studio URL (https://(your_release_name).(your_domain)/). Make sure it is valid.
  • Enforce network policies: whether your Kubernetes cluster requires network policies to allow intra-cluster communication.
  • Display tile: whether a Label Studio tile should be displayed in the portal. If true:
    • Portal tile name: title of the tile displayed in the portal
    • Portal tile description: description displayed in the tile
  • Certificate manager: the ClusterIssuer to use; list the available ones with the "kubectl get clusterissuer" command

Label Studio Config:

  • Admin email: username of the local Label Studio admin account
  • Bucket Name: name of the S3 bucket used to persist data. It must be defined manually because of limitations of the Label Studio chart.
attention

Give your bucket a name that is very likely to be unique, to avoid overwriting an existing bucket (e.g. labelstudio-(your_release_name)).
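The naming pattern suggested above can be derived and sanity-checked in one step. This sketch applies only the basic S3 bucket naming rules (3 to 63 characters; lowercase letters, digits, dots and hyphens; first and last character a letter or digit). The function name is hypothetical. It does not check whether the bucket already exists in MinIO.

```shell
# Hypothetical helper: build labelstudio-<release> and check it against basic
# S3 bucket naming rules; print the name on success.
bucket_name_for() {
  name="labelstudio-$1"
  case "$name" in
    # Only lowercase letters, digits, '.' and '-' are allowed.
    *[![:lower:][:digit:].-]*) return 1 ;;
    # Must start and end with a lowercase letter or digit.
    [![:lower:][:digit:]]*|*[![:lower:][:digit:]]) return 1 ;;
  esac
  [ "${#name}" -ge 3 ] && [ "${#name}" -le 63 ] || return 1
  printf '%s\n' "$name"
}

# bucket_name_for team1   # prints labelstudio-team1
```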

Role Mappings:

  • Keycloak groups to map to each Label Studio role (groups that don't exist yet will be created)
attention

Make sure to enter the full name of each Keycloak group: if the groups need a suffix, you must append it manually, in the form role_(your_release_name).

If you don't want a Label Studio role to be mapped to any Keycloak group, just leave an empty line under the question. If you completely delete all the entries under a question, the Helm default values are used instead.

Edit the YAML file:

Click the "Edit YAML" button and add:

inits3:
  s3Endpoint: http://minio.kosmos-s3.svc.cluster.local

The inits3 section should already exist; just add s3Endpoint to it.

Install:

Once you have filled in the form, click the "Install" button.

The Label Studio installation will:

  • Create a technical PostgreSQL database
  • Create a technical S3 bucket
  • Create a Keycloak client (as well as realm roles and groups if they don't already exist)
  • If you checked the "Display tile ?" option: write a line in the kosmos 'portal' database, and add a right in the Keycloak 'portal' client (named labelstudio-(your_release_name)-(namespace)_IHM)
attention

None of these elements are deleted when Label Studio is uninstalled.

If the PostgreSQL database is not deleted, a later deployment with the same release name and namespace will likely fail.
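If you plan to redeploy with the same release name and namespace, drop the leftover database first. This page does not document the exact database and role names created by the umbrella chart, so the names in the sketch below are placeholders: check them in your PostgreSQL cluster before running anything. The helper only prints the SQL, with identifiers double-quoted. The pod name in the usage line is also a placeholder.

```shell
# Hypothetical helper: print SQL that removes a leftover database and its role.
cleanup_sql() {
  # $1 = database name, $2 = role name (placeholders: verify them first)
  printf 'DROP DATABASE IF EXISTS "%s";\n' "$1"
  printf 'DROP ROLE IF EXISTS "%s";\n' "$2"
}

# cleanup_sql <db-name> <role-name> | kubectl exec -i --namespace kosmos-sql <postgres-pod> -- psql
```

Printing the statements instead of executing them lets you review exactly what will be dropped.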

Access to Label Studio UI

To access your application in an environment with a WAF reverse proxy, create a WAF entry, add a certificate if needed, and add a DNS entry in the technique.artemis zone.

To grant one or more users access to your application, add them to one of the access groups defined in the "Role Mappings" section of the form.

To access Label Studio as admin, log in with your Keycloak account, then use the admin username you defined in the "Label Studio Config" section of the form. The associated password is stored in a secret deployed by the umbrella chart.

Via Rancher UI

In the namespace selector at the top of the screen, filter on the namespace you deployed Label Studio in.

Then, in the resource menu on the left, click Storage > Secrets.

Search for the secret named "label-studio-secrets"; the password is stored under the key "LABEL_STUDIO_PASSWORD".

With kubectl

kubectl --namespace <your-namespace> get secret label-studio-secrets -o jsonpath='{.data.LABEL_STUDIO_PASSWORD}' | base64 --decode