Google CHAII Research Challenge: Answering Questions in Indian Languages

India is one of the most populous countries in the world, second only to China. Despite that, major Indian languages such as Hindi and Tamil are under-represented on the web and in public datasets. As a result, the performance of Natural Language Understanding (NLU) models plummets on Indian languages, which in turn affects the users of these models in India. To address this, Google has launched the CHAII Hindi and Tamil Question Answering Research Challenge. CHAII (Challenge in AI for India) is an initiative by Google Research India that focuses on problems faced by India and on addressing them through the application of Artificial Intelligence. Through CHAII, Google aims to enlist the Machine Learning community in developing new Machine Learning solutions and to share new high-quality datasets for addressing these problems.

Google CHAII Challenge
Image by Free-Photos from Pixabay

The Competition

In this competition, you are tasked with answering real-world questions about Wikipedia articles. For this task, you will use chaii-1, a new dataset of question-answer pairs in Hindi and Tamil. Each example in the dataset consists of a passage (the context feature) and a question about that passage (the question feature). You have to train your model to predict the answer to this question (the answer_text feature, which is the target variable). Since the dataset covers both Hindi and Tamil, there is an additional field (the language feature) that gives the language of the question and the answer (i.e., Hindi or Tamil).
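To make the structure of an example concrete, here is a minimal sketch of how one record could be represented in Python. The four field names come from the description above; the class name QAExample, the helper count_by_language, the lowercase language values, and the sample rows are all hypothetical placeholders and are not taken from the real dataset.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class QAExample:
    """One question-answer record (hypothetical container, for illustration)."""
    context: str      # passage the question is asked about
    question: str     # question about the passage
    answer_text: str  # target variable the model must predict
    language: str     # language of the question and the answer


def count_by_language(examples):
    """Return how many examples there are for each language."""
    return Counter(example.language for example in examples)


# Made-up placeholder rows, written in English for readability.
# The real dataset contains Hindi and Tamil text.
examples = [
    QAExample("Passage A ...", "Question about passage A?", "Answer A", "hindi"),
    QAExample("Passage B ...", "Question about passage B?", "Answer B", "tamil"),
    QAExample("Passage C ...", "Question about passage C?", "Answer C", "hindi"),
]

print(count_by_language(examples))
```

Checking the split between the two languages like this is a reasonable first step before training, since a model may behave differently on each language.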

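Solutions are evaluated with the word-level Jaccard score, averaged over the entire test dataset. For a single question, this score is the number of words shared by the predicted and the true answer divided by the number of distinct words appearing in either one. The run-time of the entire notebook, whether it runs on a CPU or a GPU, must not exceed 5 hours.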
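The metric can be sketched in a few lines of Python. This is an illustrative version and not the official scoring code: lowercasing the text, splitting on whitespace, and scoring two empty answers as 1.0 are assumptions made here, and the function names jaccard and mean_jaccard are hypothetical.

```python
def jaccard(prediction: str, truth: str) -> float:
    """Word-level Jaccard score between a predicted and a true answer."""
    # Assumption: words are compared case-insensitively after whitespace splitting.
    predicted_words = set(prediction.lower().split())
    true_words = set(truth.lower().split())
    if not predicted_words and not true_words:
        # Assumption: two empty answers are treated as a perfect match.
        return 1.0
    shared = predicted_words & true_words
    # |A ∩ B| / |A ∪ B|, with the union computed as |A| + |B| - |A ∩ B|.
    return len(shared) / (len(predicted_words) + len(true_words) - len(shared))


def mean_jaccard(predictions, truths):
    """Average the per-question Jaccard score over a list of answers."""
    if len(predictions) != len(truths) or not predictions:
        raise ValueError("predictions and truths must be non-empty and equal in length")
    scores = [jaccard(p, t) for p, t in zip(predictions, truths)]
    return sum(scores) / len(scores)


# "New Delhi" vs "Delhi": 1 shared word out of 2 distinct words.
print(jaccard("New Delhi", "Delhi"))  # 0.5
print(mean_jaccard(["New Delhi", "Chennai"], ["Delhi", "Chennai"]))  # 0.75
```

Because the score works on whole words, predicting extra or missing words around the correct answer lowers the score even when the key word is right.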

You will also be provided with a baseline model and inference code, which you will have to improve upon. The use of free, publicly available datasets and of pre-trained models is allowed in the competition.

Timelines

The competition began on 11th August 2021. The final submission deadline is 15th November 2021, while the deadline for entering the competition and for merging teams is 8th November 2021. So far, 102 competitors from 97 teams have made 335 entries.

Prizes

Although it is a research challenge, it is still a competition, and we all know very well what comes along with a competition: prizes. Google will be giving out $2,000 to each of the top 5 position holders, which brings the total prize sum to $10,000.

To be a part of the challenge, you just need a Kaggle account. To learn more about the challenge and participate in it, visit the competition page on Kaggle.

Good Luck!


Other Challenges

Kaggle, a subsidiary of Google, is by far the biggest community of Data Science and Machine Learning practitioners. It regularly organizes competitions in various domains and on behalf of various organizations.

You can also check out the blog post on the Kaggle Gravitational Wave Detection challenge by G2Net.

You can also check out the blog post on the Kaggle Volatility Forecasting challenge by Optiver.