NLP for India

Multimodal Audio Processing System

An advanced system that integrates noise reduction, transcription, translation and speech synthesis into a streamlined, efficient audio-processing workflow.

ASR · Multilingualism · TTS · Noise cancellation

Unity AI (Ganga)

Project Unity is an initiative that embraces India's rich linguistic diversity by building a comprehensive resource covering the country's major languages. We aim to achieve state-of-the-art performance in understanding and generating text in Indian languages.

https://lingo.iitgn.ac.in/unityai-guard/

LLM · Hindi · AI model

COMMENTATOR: A Code-mixed Multilingual Text Annotation Framework

As the NLP community increasingly tackles the challenges of multilingualism, robust annotation tools are essential for handling multilingual datasets efficiently. We introduce COMMENTATOR, a framework designed specifically for annotating code-mixed text. It streamlines token-level and sentence-level language annotation, with a focus on Hinglish datasets.

Code-Mixing · Indic Languages · NLP tools · Annotation Frameworks

Curating benchmarks and constructing ML models for code-mixed NLP

This project curates and annotates large-scale Hindi-English code-mixed data to develop NLP tools and ML models for foundational tasks such as language identification, named entity recognition (NER), part-of-speech (POS) tagging, sentiment analysis and translation. It aims to advance low-resource Indic code-mixed NLP and enable state-of-the-art models and tools. It will also establish a public portal with leaderboards to foster collaboration and multilingual NLP research.

Code-Mixing · Code-switching · Indic Languages · Data Annotation · Multilingualism

HinVec

This project aims to develop state-of-the-art word embedding models tailored to the Hindi language, capturing its distinctive grammatical and contextual nuances. It also includes a comprehensive benchmark suite for evaluating model performance across NLP tasks such as text classification, semantic textual similarity (STS) and retrieval.

Embedding · Indic Language

Design and implementation of an ASR system for the Tamil language

In this project, we aim to develop an ASR system tailored to the Tamil language, focusing on collecting a large training corpus and providing seamless Tamil voice-to-text.

Automatic speech recognition · Speech-to-text · STT · Indic Languages

Enhancing ASR for the Marathi language

This project builds an audio dataset for the Marathi language that spans its various dialects, with the goal of providing a smooth Marathi voice-to-text experience.