This working group is responsible for helping to define language choices and criteria for local and global representativeness, analysing the diversity of existing text sources for each region in terms of the social contexts they represent, and finding diverse sources of text that meet these criteria, including both online and offline text in all available media.
Entry document: Dataset Org: Data Sourcing and Representativeness
Karën Fort, Sam Bowman, Halil Akin, Caiming Xiong, Guillaume Klein, Samson Tan, Myle Ott, Philippe Muller, Ruiqi Zhong, Luke Zettlemoyer, Yacine Jernite, Wietse de Vries, Max Ryabinin, Antoine Neuraz, Tsvetomila Mihaylova, Hady Elsahar, Manan Dey, Shanya Sharma, Minh Quang Pham, Jin Koay, Ari Jankelowitz, Elizabeth Keleshian, Shiyue Zhang, Evan Dufraisse, Edoardo M. Ponti, Han Wang, Ona de Gibert Bonet, Zaid Alyafeai, Md Rabiul Awal, Kaustubh Dhole, Jonathan Chang, Maximin Coavoux, Adrian Popescu, Maraim Masoud, Ben Peters, Tasnim Mohiuddin, Rabin Banjade, Vinay Uday Prabhu, Aitor Soroa, Trishala Neeraj, Luca Soldaini, Rodrigo Wilkens, Canwen Xu, Sheng Shen, Michael McKenna, Rishi Bommasani, Patrick Drouin, Fredrik Olsson, Sadid A. Hasan, Francesco De Toni, Huu Nguyen, Laurent Besacier, Nicolas Hervé, Ludovic Tanguy, Benoît Sagot, Salomey Osei, Alham Fikri Aji, Filip Ginter, Sampo Pyysalo, Gérard Dupont, Aakanksha Naik, Olivier Nguyen, Trieu Le, Emily Reif, Tolga Bolukbasi
Angie McMillan-Major, Pedro Ortiz Suárez, Zeerak Waseem
Scoping out further collaborative tasks within the group.