Hereafter are the founding members who participated in the writing of the original application submitted for access to the Jean Zay supercomputer.
Hugging Face, which initiated the BigScience project, develops open-source research tools that are widely used in the NLP community.
GENCI is the architect of the Jean Zay supercomputing facility. IDRIS is the major centre of very high performance intensive numerical computation for the French National Centre for Scientific Research (CNRS) and operates several supercomputing facilities in France, including the Jean Zay supercomputer.
ESPCI and LAMSADE (Dauphine Université, PSL, CNRS)
I am a member of the MILES team (Machine Intelligence and Learning Systems), and my research focuses on the development of neural models for NLP and their robustness. I have worked on language modelling along with generative models for sequences, learning criteria for large-vocabulary applications, and dynamical models.
MELODI team at IRIT/University of Toulouse
Our team focuses on semantics and pragmatics, including discourse parsing, sentiment analysis, and bias detection. We use contextual embeddings mostly for fine-tuning models and for studies on interpretability.
IRISA - LinkMedia team - IMATAG/CNRS
Our group conducts research in NLP and multimodal analysis (e.g. text/image). The expected outcomes of this project would be very valuable for LinkMedia's current research topics relying on the use of artificially generated content: detection of (textual and multimodal) fake news, and artificial datasets for classification, tagging, and information retrieval.
Université de Lorraine, ATILF - UMR 7118 - CNRS / UL
Our team is interested in resources, normalization, annotation, and exploitation in NLP.
University of Paris
The Laboratoire de Linguistique Formelle is a member of the Labex EFL - Empirical Foundations of Linguistics. The team studies all aspects of language. We participated in the release of the FlauBERT model and adapted the GPT-2 model to French. The paper is currently under submission, and the model, which has over 1 billion parameters, will be released as open source upon acceptance.
GdR TAL (CNRS)
The GdR TAL is the French academic hub for researchers in NLP. It is interested in language in all its forms: written, oral, and signed. It deals with the themes of computational modeling and machine learning of language, its manifestations, and its applications in society, including their ethical issues.
CNRS DR1, INSERM UMR1093, UBFC, Dijon
Language encoding models help explain language processing in the human brain by learning functions that predict brain responses from the language stimuli that elicited them (Jain & Huth 2018 BioRxiv). The proposed project should lead to a significant advance in the quality of language models that we can use to explain brain responses in humans from the same stimuli. Relevant publication: Uchida, T., Lair, N., Ishiguro, H., & Dominey, P. F. (2021) Neurobiology of Language, 2(1), 83-105.