On the collaborative task and the use of compute facilities
Several sources of compute will be available for the collaborative tasks. Access to these sources follows some rules that are explained here.
The Jean Zay supercomputer
Access to the supercomputer is restricted and granted only to specific individuals (not organisations). As a result, access is currently possible for only a select number of participants.
Current list of participants with access to the supercomputer: [Meta-group] Members with access to Jean Zay
We don’t expect the Jean Zay supercomputer to be openly accessible to all the participants.
Access to this facility will remain restricted to a small number of participants for the reasons detailed below:
- Access to the supercomputer is only possible from a fixed and static IP address located in France
- Access is only granted after a background check by the French government, which can take a few months (no recourse is possible if the decision is negative)
- The project was originally filed with HuggingFace as the applying entity, so any extension must first be validated by the supercomputer administrative staff
- With regard to the supercomputing facility, the project is defined by the scope outlined in the original grant application. Experiments running on the facility as part of the project must in any case stay in line with this proposal.
That said, adding other participants can be discussed on a case-by-case basis, in consultation with the supercomputer administrative staff and the Public Infrastructure Steering Committee.
Other sources for compute budgets
While the supercomputer has the advantage of providing a large number of GPUs usable in parallel, many sub-tasks in the shared task do not require such a large GPU cluster and can be run either on CPU (data processing) or on smaller GPU clusters. Options are currently under investigation, and we will provide updates later.
In general, the following process has been proposed to better organize experiments:
- Experiments are discussed on the discussion forum. Each proposal should cover the scientific question, the hypothesis, the potential outcome, and the next steps if it fails or succeeds.
- If participants can run an experiment on their own server, that's the simplest case. Otherwise, and especially for larger-scale experiments, the experiment should be designed to run either on other credits through OVH (or another provider) or on Jean Zay (JZ), keeping the biggest chunk of the JZ budget for the final scaling runs.
- At any time, the collaborative task should have a "spreadsheet" tracking the current experiments, where each one is running, and by whom.