Hereafter are the founding members who participated in the writing of the original application submitted for access to the Jean Zay supercomputer.
Hugging Face, which initiated the BigScience project, develops open-source research tools that are widely used in the NLP community.
GENCI is the architect of the Jean Zay supercomputing facility. IDRIS is the major centre of very high performance intensive numerical computation for the French National Centre for Scientific Research (CNRS) and operates several supercomputing facilities in France, including the Jean Zay supercomputer.
ESPCI and LAMSADE (Dauphine Université, PSL, CNRS)
I am a member of the MILES team (Machine Intelligence and Learning Systems), and my research focuses on the development of neural models for NLP and their robustness. I have worked on language modelling along with generative models for sequences, learning criteria for large-vocabulary applications, and dynamical models.
MELODI team at IRIT/University of Toulouse
Our team focuses on semantics and pragmatics, including discourse parsing, sentiment analysis, and bias detection. We use contextual embeddings mostly for fine-tuning models and for studies on interpretability.
IRISA - LinkMedia team - IMATAG/CNRS
Our group conducts research in NLP and multimodal analysis (e.g. text/image). The expected outcomes of this project would be very valuable for LinkMedia's current research topics relying on the use of artificially generated content: detection of (textual and multimodal) fake news, and artificial datasets for classification, tagging, and information retrieval.
Université de Lorraine, ATILF - UMR 7118 - CNRS / UL
Our team is interested in resources, normalization, annotation, and exploitation in NLP.
University of Paris
The Laboratoire de Linguistique Formelle is a member of the Labex EFL - Empirical Foundations of Linguistics. The team studies all aspects of language. We participated in the release of the FlauBERT model and adapted the GPT-2 model to French. The paper is currently under submission, and the model, which has over 1 billion parameters, will be released as open source upon acceptance.
GdR TAL (CNRS)
The GdR TAL is the French academic hub for researchers in NLP. It is interested in language in all its forms: written, oral, and signed. It deals with the themes of computational modeling and machine learning of language, its manifestations, and its applications in society, including their ethical issues.
CNRS DR1, INSERM UMR1093, UBFC, Dijon
Language encoding models help explain language processing in the human brain by learning functions that predict brain responses from the language stimuli that elicited them (Jain & Huth 2018 BioRxiv). The proposed project should lead to a significant advance in the quality of language models that we can use to explain brain responses in humans from the same stimuli. Relevant publication: Uchida, T., Lair, N., Ishiguro, H., & Dominey, P. F. (2021) Neurobiology of Language, 2(1), 83-105.