AI for Research and Education

Scientific Leaderboard Generation Benchmark

LEGOBench is a benchmark for evaluating systems that generate scientific leaderboards. It draws on 22 years of arXiv data and over 11k leaderboards from PapersWithCode. Across both graph-based and language model-based task configurations, state-of-the-art models still show significant performance gaps in automatic leaderboard generation.

Leaderboard generation, RAG, arXiv dataset, Citation network, Comparison network

SciDQA: QA Over Research Papers

SciDQA is a dataset of 2,937 QA pairs designed to test LLMs' deep understanding of scientific articles, with questions sourced from expert peer reviews and answers written by the paper authors. It emphasizes multi-document reasoning and requires analysis of figures, tables, equations, and appendices. Evaluations reveal varied capabilities across open-source and proprietary LLMs.

Scientific question answering, QA dataset, Peer reviews, LLM Evaluation

CoSAEmb: Scientific Document Embeddings

CoSAEmb is a model that learns representations from the full text of 97,402 scientific papers, leveraging a novel supervised contrastive training framework for long documents. It outperforms models trained on titles and abstracts in full-text information retrieval and shows competitive performance on SciRepEval and CSFCube benchmarks.

Scientific embeddings, Representation learning, Long context representations, Contrastive training

Automated Model Card Generation

To improve understanding of ML models, this project created a dataset of 500 QA pairs covering 25 LMs, addressing key aspects such as training configurations, biases, and architecture. The dataset, with answers extracted by annotators, was used to evaluate LMs' ability to automate model card generation.

Model cards, Question answering dataset, Language models

Robustness of Scientific LM Embeddings

This paper evaluates scientific language models' performance in handling short-query texts and their textual neighbors, revealing difficulties in retrieving relevant documents even under relaxed conditions. Experiments show that retrieval is more influenced by surface form than semantics, with perturbations often failing to produce meaningful neighbors in the embedding space.

Text perturbations, Scientific language models, Robustness, Embeddings

Comparison in Peer Reviews

COMPARE introduces a taxonomy and dataset of comparison discussions from peer reviews of experimental deep learning papers, analyzing 1,800 sentences from 117 reviews. The study achieves an F1 score of 0.49 for identifying comparison sentences and pretrains two language models on ML, NLP, and CV paper abstracts and reviews for better peer review representation.

Meaningful comparison, Peer reviews, Dataset, Taxonomy

TweeNLP: Twitter Exploration Portal for NLP

TweeNLP is a platform that organizes 19,395 NLP-related tweets, enabling exploration by topics, visualizing conference activities, discovering popular papers, and tracking submission deadlines. It integrates tweets with the NLPExplorer search engine, serving as a collective memory for the NLP community.

Twitter search, Portal, Science Communication, Research papers

NLPExplorer

NLPExplorer is an automated portal for indexing, searching, and visualizing NLP research, offering insights into papers, authors, venues, and topics through curated categories like tasks, approaches, and datasets. It features tools for exploring trends, popular authors, datasets, and temporal statistics, with accessible data via API calls to support further research.

Scholarly search, Portal, Research papers, Research metadata

Gurukul AI

Gurukul AI is an interactive learning platform designed for school students, promoting engagement and personalized education.

Education portal, RAG

IITGN ChatBot

IITGN ChatBot is a chatbot that gives students easy access to information about the institute. It uses a RAG-based system built on the college advisories to provide efficient and accurate responses.

ChatBot, RAG
