Building comprehensive datasets that capture real-world code-mixing patterns across diverse linguistic contexts.
Addressing complex linguistic phenomena that make code-mixed NLP particularly challenging.
Developing specialized evaluation frameworks to accurately assess model performance on code-mixed content.
Fostering a global community of researchers and practitioners to advance code-mixed NLP technologies.
A high-quality Hinglish code-mixed dataset with 181,463 instances, manually annotated for LID, MLI, POS tagging, NER, text normalization, and translations.
A multi-sentential Hindi-English code-mixed dataset with 67,007 documents and 84,937 MCTs, sourced from political speeches, press releases, and Hindi news articles.
A large-scale language identification dataset derived from 1.7 million tweets collected from Indian Twitter/X, annotated with coarse and fine-grained language labels.
A large-scale corpus of WhatsApp political discussions collected during the Indian General Elections 2019, consisting of both raw and annotated data, enabling research in political discourse and misinformation.
A high-quality Hindi-English code-mixed dataset for NLG, containing human- and algorithm-generated Hinglish sentences with quality ratings, sourced from the IITB English-Hindi parallel corpus.
A parallel corpus of 13,738 code-mixed English-Hindi sentences and their corresponding English translations.
Seventh Conference on Machine Translation (WMT22): Co-located with EMNLP 2022, this shared task focused on machine translation for code-mixed languages.
Hosted at IIT Gandhinagar as part of INLG 2022, this challenge explored Quality Evaluation of the Low-Resource Synthetically Generated Code-Mixed Hinglish Text.
Held at CODS-COMAD 2022, covering various challenges in code-mixed natural language generation.
Presented at ICON 2021, discussing evaluation metrics for code-mixed text.
A study on code-mixed sentiment classification using candidate sentence generation and selection.
Rajvee Sheth, Senior Research Fellow, IIT Gandhinagar
Pooja Goswami, Technical Assistant, IIT Gandhinagar
Mahesh Kumar, Technical Assistant, IIT Gandhinagar
Rahul Gadhvi, Technical Assistant, IIT Gandhinagar
Samridhi Raj Sinha, SRIP Intern, IIT Gandhinagar
Mahavir Patil, Project Intern, IIT Gandhinagar
Drashti Patel, Project Intern, IIT Gandhinagar
Yash Chopra, Project Intern, IIT Gandhinagar
Shubh Nisar, Software Engineer Intern, North Carolina State University
Heenaben Prajapati, Senior Research Fellow, IIT Gandhinagar
Himanshu Beniwal, PhD student, IIT Gandhinagar
Ronakpuri Goswami, JRF, DAU Gandhinagar
Diksha
Vaidahi
Ravindra Purohit, Research Scholar, DAU Gandhinagar
Dwip
Rahul
Vivek Srivastava, Researcher, TCS Research