The purpose of this call for Expressions of Interest (EOI) is to identify projects that will be invited to submit full proposals to develop open and accessible datasets for machine learning (ML) applications that enable natural language processing (NLP) for languages in sub-Saharan Africa. The ability to communicate and be understood in one’s own language is fundamental to digital and societal inclusion. NLP techniques have enabled critical AI applications that facilitate digital inclusion and improvements in numerous fields, including education, finance, healthcare, agriculture, communication, and disaster response. Many advances in both fundamental and applied NLP have stemmed from openly licensed and publicly available datasets.
However, such open, publicly available datasets are scarce to non-existent for many African languages, which means the benefits of NLP are not accessible to speakers of these languages. Where relevant datasets do exist, they are often based on religious, missionary, or judiciary texts, leading to outmoded language and bias. There is a need for openly accessible text, speech, and other datasets to facilitate breakthroughs based on NLP technologies for African languages. Lacuna Fund seeks Expressions of Interest (EOIs) from qualified organizations to develop open and accessible training and evaluation datasets for ML applications for NLP in sub-Saharan Africa. The TAP recognizes the importance of datasets that would create significant impact regardless of the number of speakers of the included language, as well as the need for multilingual datasets.
EOIs may include, but are not limited to:
- Collecting and/or annotating new data;
- Annotating or releasing existing data;
- Augmenting existing datasets in all areas to decrease bias (such as gender bias or other types of bias or discrimination) or to increase the usability of NLP technology in low- and middle-income contexts;
- Creating small, higher-quality benchmark data for NLP tasks in low-resource African languages.
While the focus of Lacuna Fund is primarily on dataset creation, annotation, augmentation, and maintenance, proposals may include the development of a baseline model to ensure the quality of the funded dataset and/or to facilitate the use of the dataset for socially beneficial applications.