NLPExplorer

Upenn

The University of Pennsylvania (commonly known as Penn or UPenn) is a private Ivy League research university in Philadelphia, Pennsylvania, United States. It is one of nine colonial colleges and was chartered prior to the U.S. Declaration of Independence when Benjamin Franklin, the university's founder and first president, advocated for an educational institution that trained leaders in academia, commerce, and public service. Penn identifies as the fourth-oldest institution of higher education i...

Total links: 670

Total paper mentions: 1485

First ACL Paper: 1998

Latest ACL Paper: 2019

Links

Along with Their Literature Mentions

http://www.ldc.upenn.edu/ldc/online/index.html
Source Domains as Concept Domains in Metaphorical Expressions
ECONOMY IS A PERSON: A Chinese-English Corpora and Ontological-based Comparison Using the Conceptual Mapping Model

https://catalog.ldc.upenn.edu/docs/
Annotating Modality Expressions and Event Factuality for a Japanese Chess Commentary Corpus
A Code-Switching Corpus of Turkish-German Conversations
Multitask Learning for Adaptive Quality Estimation of Automatically Transcribed Utterances
Automatic Extraction of Implicit Interpretations from Modal Constructions
Generative Adversarial Networks for Text Using Word2vec Intermediaries

http://projects.ldc.upenn.edu/TDT3/
On-line Trend Analysis with Topic Models: #twitter Trends Detection Topic Model Online

http://venom.ldc.upenn.edu/
Refactoring Corpora

http://www.ling.upenn.edu/courses/
Unsupervised Authorial Clustering Based on Syntactic Structure
Extracting Relations between Non-Standard Entities using Distant Supervision and Imitation Learning

http://www.cis.upenn.edu/~xwe/semeval2015pit/
Processing and Normalizing Hashtags

http://www.ldc.upenn.edu/tdt/
Using Soundex Codes for Indexing Names in ASR Documents

http://projects.ldc.upenn.edu/TIDES
A New Phase in Annotation Tool Development at the Linguistic Data Consortium: The Evolution of the Annotation Graph Toolkit

http://projects.ldc.upenn.edu/gale/Translation/Editors/
Management of Large Annotation Projects Involving Multiple Human Judges: a Case Study of GALE Machine Translation Post-editing
Evaluation of Machine Translation Errors in English and Iraqi Arabic

http://www.seas.upenn.edu/~ryantm/software/BioTag
Using Dependency Parsing and Probabilistic Inference to Extract Relationships between Genes, Proteins and Malignancies Implicit Among Multiple Biomedical Research Abstracts

http://www.sas.upenn.edu/
Studying the Temporal Dynamics of Word Co-occurrences: An Application to Event Detection
An analysis of the user occupational class through Twitter content

http://www.ling.upenn.edu/hist-corpora
Deterministic natural language generation from meaning representations for machine translation
DynamicPower at SemEval-2016 Task 8: Processing syntactic parse trees with a Dynamic Semantics core
Extending the tool, or how to annotate historical language varieties
Annotation and Representation of a Diachronic Corpus of Spanish

http://www.ldc.upenn.edu/tools
Technical Infrastructure at Linguistic Data Consortium: Software and Hardware Resources for Linguistic Data Creation

http://projects.ldc.upenn.edu/ArabicTreebank/
From Speech to Trees: Applying Treebank Annotation to Arabic Broadcast News
The Revised Arabic PropBank
Expanding Arabic Treebank to Speech: Results from Broadcast News
Enhancing the Arabic Treebank: a Collaborative Effort toward New Annotation Guidelines

http://projects.ldc.upenn.edu/TIDES/Translation/TransAssess
Composing Human and Machine Translation Services: Language Grid for Improving Localization Processes

http://www.ling.upenn.edu/~jason2/
Collection of SLR in the Asian-Pacific Area

http://catalog.ldc.upenn.edu/LDC2009T13
Using Collections of Human Language Intuitions to Measure Corpus Representativeness
Using Word Familiarities and Word Associations to Measure Corpus Representativeness

http://repository.upenn.edu/cis
Learning with Structured Representations for Negation Scope Extraction

https://catalog.ldc.upenn.edu/LDC2008T05
Chinese Tense Labelling and Causal Analysis

http://ldc.upenn.edu
Ways of Evaluation of the Annotators in Building the Prague Czech-English Dependency Treebank
CoNLL 2016 Shared Task on Multilingual Shallow Discourse Parsing
A Joint Syntactic and Semantic Dependency Parsing System based on Maximum Entropy Models
Parsing Syntactic and Semantic Dependencies for Multiple Languages with A Pipeline Approach
The Crotal SRL System : a Generic Tool Based on Tree-structured CRF
Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
An Iterative Approach for Joint Dependency Parsing and Semantic Role Labeling

https://project.ldc.upenn.edu/ace
Exploiting Document Level Information to Improve Event Detection via Recurrent Neural Networks

http://www.cis.upenn.edu/~mpalmer/siglex/online.html
Computational Linguistics at Universiti Sains Malaysia

http://www.ldc.upenn.edu/readme_files/celex.read
Terminological Variants for Document Selection and Question/Answer Matching
Two Levels of Evaluation in a Complex NL System

http://www.ldc.upenn.edu/Catalog/docs/LDC2002T31
Automatic Classification of Geographic Named Entities
Automatic Building Gazetteers of Co-referring Named Entities

https://catalog.ldc.upenn.edu/LDC2010L01
LSTM Neural Reordering Feature for Statistical Machine Translation

http://www.cis.upenn.edu/~dbikel/software.html#comparator
Le corpus Sequoia : annotation syntaxique et exploitation pour l’adaptation d’analyseur par pont lexical (The Sequoia Corpus : Syntactic Annotation and Use for a Parser Lexical Domain Adaptation Method) [in French]
Improving Combinatory Categorial Grammar Parse Reranking with Dependency Grammar Features

http://www.ldc.upenn.edu/Projects/ACE/docs/Chi
The Annotation of Event Schema in Chinese

https://www.seas.upenn.edu/~pdtb/PDTBAPI/pdtb-
Searching in the Penn Discourse Treebank Using the PML-Tree Query
Translating Implicit Discourse Connectives Based on Cross-lingual Annotation and Alignment

http://catalog.ldc.upenn.edu/LDC2012T21
Multiword Expression Identification with Recurring Tree Fragments and Association Measures
Construction of Large-scale English Verbal Multiword Expression Annotated Corpus

http://online.ldc.upenn.edu
Annotation Tool Development for Large-Scale Corpus Creation Projects at the Linguistic Data Consortium

http://www.cis.upenn.edu/dgildea/VerbNet/
Seeing Arguments through Transparent Structures

http://www.seas.upenn.edu/~strctlrn/BioTagger/BioTagger
An Analysis of Biomedical Tokenization: Problems and Strategies

http://acl.ldc.upenn.edu/A/A00/A00-1011.pdf
InfoXtract: A Customizable Intermediate Level Information Extraction Engine

http://itre.cis.upenn.edu/~myl/languagelog/archives/002
How Many Multiword Expressions do People Know?

http://www.ldc.upenn.edu/Papers/ADC2000/adc0
Annotation Graphs and Servers and Multi-Modal Resources: Infrastructure for Interdisciplinary Education, Research and Development

http://acl.ldc.upenn.edu/A/A00/A00-1001.pdf
The influence of written task descriptions in Wizard of Oz experiments

http://www.ldc.upenn.edu/Catalog/ByYear.jsp
Prague Dependency Treebank 2.5 – a Revisited Version of PDT 2.0

http://itre.cis.upenn.edu/~myl/languagelog/archives/005
Privacy Issues in Online Machine Translation Services - European Perspective

http://acl.ldc.upenn.edu/P/P06/P06-
Interactive Machine Translation Based on Partial Statistical Phrase-based Alignments

http://www.cis.upenn.edu/verbnet/
Learning Domain Ontologies from Document Warehouses and Dedicated Web Sites
A Bidirectional Study of Mandarin Conversation Verbs

https://www.ling.upenn.edu/courses/
Multilingual and Cross-Lingual Complex Word Identification
mib at SemEval-2016 Task 4a: Exploiting lexicon based features for Sentiment Analysis in Twitter

http://www.ldc.upenn.edu/documents/1
RESTful Annotation and Efficient Collaboration

http://acl.ldc.upenn.edu/W/W96/W96-0213.pdf
利用統計方法及中文訓練資料處理台語文詞性標記 (Modeling Taiwanese POS tagging with statistical methods and Mandarin training data) [In Chinese]

http://acl.ldc.upenn.edu/W/W95/W95-0101.pdf
Factors Affecting Part-of-Speech Tagging for Tagalog

https://catalog.ldc.upenn.edu/ldc96l14
Learning attention for historical text normalization by learning to pronounce

https://catalog.ldc.upenn.edu/docs/LDC2006T13/readme.txt
DalGTM at SemEval-2016 Task 1: Importance-Aware Compositional Approach to Short Text Similarity

http://projects.ldc.upenn.edu/EARS
A New Phase in Annotation Tool Development at the Linguistic Data Consortium: The Evolution of the Annotation Graph Toolkit

http://acl.ldc.upenn.edu/
Enhancing Electronic Dictionaries with an Index Based on Associations
Extracting and Querying Relations in Scientific Papers on Language Technology
Natural Language Processing: A Terminological and Statistical Approach
Incorporating an Error Corpus into a Spellchecker for Maltese
Temporal Context: Applications and Implications for Computational Linguistics

http://acl.ldc.upenn.edu/J/J93/J93-2004.pdf
Corporate News Classification and Valence Prediction: A Supervised Approach

https://secure.ldc.upenn.edu/intranet/surveyStatsPublic_2007.jsp
The Linguistic Data Consortium Member Survey: Purpose, Execution and Results

http://projects.ldc.upenn.edu/ArabicTreebank
Simultaneous Tokenization and Part-Of-Speech Tagging for Arabic without a Morphological Analyzer
Construct State Modification in the Arabic Treebank

http://www.ldc.upenn.edu/Catalog/-
Automatic Learning of Language Model Structure

http://wave.ldc.upenn.edu/
Language Model Adaptation for Statistical Machine Translation via Structured Query Models

http://www.ldc.upenn.edu/Catalog/LDC2011T07
Social Text Normalization using Contextual Graph Random Walks

http://www.ldc.upenn.edu/Projects/Corpus-Cookbook/-
Acquisition and Annotation of Slovenian Broadcast News Database

http://projects.ldc.upenn.edu/TIDES/tidesmt.html
Translation Adequacy and Preference Evaluation Tool (TAP-ET)
Corpus Support for Machine Translation at LDC
Mining the Correlation between Human and Automatic Evaluation at Sentence Level

http://www.cis.upenn.edu/[normal-wave
Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars

http://www.ldc.upenn.edu/Projects/MDE/
Annotation and analysis of overlapping speech in political interviews

http://www.ldc.upenn.edu/Catalog
Twenty Years of Language Resource Development and Distribution: A Progress Report on LDC Activities
The Fisher Corpus: a Resource for the Next Generations of Speech-to-Text
A Comparative Study of Word Co-occurrence for Term Clustering in Language Model-based Sentence Retrieval
Proposal for the International Standard Language Resource Number
Language Resource Creation and Distribution at the Linguistic Data Consortium: A Progress Report
A Guide for the Production of Reusable Language Resources
The QALL-ME Benchmark: a Multilingual Resource of Annotated Spoken Requests for Question Answering
Adapting to Trends in Language Resource Development: A Progress Report on LDC Activities

http://bioie.ldc.upenn.edu
A Multi-Domain Web-Based Algorithm for POS Tagging of Unknown Words
An Annotation Type System for a Data-Driven NLP Pipeline
Using Dependency Parsing and Probabilistic Inference to Extract Relationships between Genes, Proteins and Malignancies Implicit Among Multiple Biomedical Research Abstracts
Learning Hidden Markov Models with Distributed State Representations for Domain Adaptation
Semi-supervised Representation Learning for Domain Adaptation using Dynamic Dependency Networks

https://catalog.ldc.upenn.edu/LDC94S13A
Real-Time Speech Emotion and Sentiment Recognition for Interactive Dialogue Systems
Zara: A Virtual Interactive Dialogue System Incorporating Emotion, Sentiment and Personality Recognition

https://catalog.ldc.upenn.edu/ldc2015s04
Automatic Token and Turn Level Language Identification for Code-Switched Text Dialog: An Analysis Across Language Pairs and Corpora

https://web.sas.upenn.edu/danielpr/
Current and Future Psychological Health Prediction using Language and Socio-Demographics of Children for the CLPysch 2018 Shared Task

http://www.ling.upenn
The Icelandic Parsed Historical Corpus (IcePaHC)
The Penn Parsed Corpus of Modern British English: First Parsing Results and Analysis
Part-of-Speech Tagging for Historical English

http://languagelog.ldc.upenn.edu/nll/?p=36048
Obituary: Aravind K. Joshi

https://catalog.ldc.upenn.edu/LDC2010T24
Evaluating Two Annotated Corpora of Hindi Using a Verb Class Identifier

http://www.cis.upenn.edu/~cotton/cgibin/pblex_fmt.cgi
From TreeBank to PropBank

http://www.seas.upenn.edu/~pdtb/
From Sentence to Discourse: Building an Annotation Scheme for Discourse Based on Prague Dependency Treebank
TEXT2TABLE: Medical Text Summarization System Based on Named Entity Recognition and Modality Identification
Subordinators with Elaborative Meanings in Czech and English

https://www.seas.upenn.edu/~hongkai1/regsum.html
Improving the Estimation of Word Importance for News Multi-Document Summarization

http://repository.upenn.edu/ircs_reports/38/
Developing Universal Dependencies for Mandarin Chinese

https://catalog.ldc.upenn.edu/LDC2002T07
Discourse Parsing with Attention-based Hierarchical Neural Networks
The RST Spanish-Chinese Treebank

http://acl.ldc.upenn.edu/C/C94/C94-1024.pdf
A Flexible Language Acquisition Tool Kit for Natural Language Processing

http://oracc.museum.upenn.edu/doc/
Word Segmentation for Akkadian Cuneiform

http://psd.museum.upenn.edu/epsd/index.html
Creating Tools for Morphological Analysis of Sumerian

http://www.ircs.upenn.edu/arabic/Jan03release/guidelines-
Automatic Treebank-Based Acquisition of Arabic LFG Dependency Structures

http://papers.ldc.upenn.edu/NEMLAR2004/Dialectal-
Lexicon Development for Varieties of Spoken Colloquial Arabic

http://www.ldc.upenn.edu/Project/GALE
Annotated Corpora for Word Alignment between Japanese and English and its Evaluation with MAP-based Word Aligner

http://www.ling.upenn.edu/hist-corpora/ppcme2-
The Latin Dependency Treebank in a Cultural Heritage Digital Library

http://projects.ldc.upenn.edu/mixer/
Annotation Tool Development for Large-Scale Corpus Creation Projects at the Linguistic Data Consortium

http://www.cis.upenn.edu/dbikel/software.html
Empirical Evaluations of Animacy Annotation
Integrating Graph-Based and Transition-Based Dependency Parsers
Wide-Coverage Deep Statistical Parsing Using Automatic Dependency Structure Annotation
The Effects of Disfluency Detection in Parsing Spoken Language
Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing
Improving data-driven dependency parsing using large-scale LFG grammars
Parsing Noun Phrases in the Penn Treebank
Learning Grammar with Explicit Annotations for Subordinating Conjunctions
Intricacies of Collins’ Parsing Model
Analyzing and Integrating Dependency Parsers
Improved Chinese Parsing Using Named Entity Cue
Overview of BioNLP’09 Shared Task on Event Extraction
All Fragments Count in Parser Evaluation
Detecting Parser Errors Using Web-based Semantic Filters
Linguistic features in data-driven dependency parsing

http://www.ldc.upenn.edu/Catalog/docs/LDC2005T33/
Recognition and Classification of Numerical Entities in Basque

https://www.ldc.upenn.edu/collaborations/past-projects/
Similar but not the Same: Word Sense Disambiguation Improves Event Detection via Neural Representation Matching

http://repository.upenn.edu
Evaluation methodologies in Automatic Question Generation 2013-2018

http://morph.ldc.upenn.edu/Catalog/LDC99T37.ht
A Multilingual Approach To Annotating And Extracting Temporal Information

http://acl.ldc.upenn.edu/C/C96/C96-2182.pdf
Enhancing an English-Polish Electronic Dictionary for Multiword Expression Research

http://ldc.upenn.edu/
A Simple Generative Pipeline Approach to Dependency Parsing and Semantic Role Labeling
Implicitly Supervised Language Model Adaptation for Meeting Transcription
CCG parsing with one syntactic structure per n-gram

http://www.ldc.upenn.edu/Projects/ACE/docs/RDC-
Callisto: A Configurable Annotation Workbench

http://ldc.upenn.edu/projects/tides/
A Maximum Entropy Word Aligner for Arabic-English Machine Translation

https://catalog.ldc.upenn.edu/LDC2014T24
A Tangled Web: The Faint Signals of Deception in Text - Boulder Lies and Truth Corpus (BLT-C)

https://www.ldc.upenn.edu/collaborations/past-
Cross-genre Event Extraction with Knowledge Enrichment
Generating Politically-Relevant Event Data
Recognizing Complex Entity Mentions: A Review and Future Directions
Annotating genericity: a survey, a scheme, and a corpus

http://projects.ldc.upenn.edu/ACE
A New Phase in Annotation Tool Development at the Linguistic Data Consortium: The Evolution of the Annotation Graph Toolkit

http://www.cis.upenn.edu/~pereira/papers/crf.pdf
學術會議資訊之擷取及其應用 (Information Extraction for Academic Conference and It’s Application) [In Chinese]

https://catalog.ldc.upenn.edu/LDC2016T02
Construction and Annotation of the Jordan Comprehensive Contemporary Arabic Corpus (JCCA)

https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/english-
Zero-Shot Transfer Learning for Event Extraction

http://www.cis.upenn.edu/cis639/arabic/info/translit-
Unsupervised Learning of Arabic Stemming Using a Parallel Corpus

http://projects.ldc.upenn.edu/ace/docs/English-Entities-
Entity Translation and Alignment in the ACE-07 ET Task
Using a sledgehammer to crack a nut? Lexical diversity and event coreference resolution

https://catalog.ldc.upenn.edu/LDC2009T07
GRaSP: A Multilayered Annotation Scheme for Perspectives

https://catalog.ldc.upenn.edu
Resource Interoperability for Sustainable Benchmarking: The Case of Events
Review on the Existing Language Resources for Languages of France
Unsupervised AMR-Dependency Parse Alignment
From ‘Solved Problems’ to New Challenges: A Report on LDC Activities

http://www.cis.upenn.edu/~adwait
Classifier Combination for Improved Lexical Disambiguation

https://catalog.ldc.upenn.edu/LDC99T42
Bilingual English-Czech Valency Lexicon Linked to a Parallel Corpus
Generating a Linguistic Model for Requirement Quality Analysis
Coherence Modeling Improves Implicit Discourse Relation Recognition
Survey: Anaphora With Non-nominal Antecedents in Computational Linguistics: a Survey

http://papers.ldc.upenn.edu/LREC2004/
Evaluating the Impact of Alternative Dependency Graph Encodings on Solving Event Extraction Tasks

http://projects.ldc.upenn.edu/kbp/data
Infusion of Labeled Data into Distant Supervision for Relation Extraction

http://oracc.museum.upenn.edu/etcsri/
Towards a Linked Open Data Edition of Sumerian Corpora
Universal Morphologies for the Caucasus region

https://catalog.ldc.upenn.edu/LDC2003T09
An Efficient Cross-lingual Model for Sentence Classification Using Convolutional Neural Network
Dependency-based Gated Recursive Neural Network for Chinese Word Segmentation
Better Modeling of Incomplete Annotations for Named Entity Recognition

http://www.ldc.upenn.edu/Projects/Chinese
Bayesian Semi-Supervised Chinese Word Segmentation for Statistical Machine Translation
HMM Word and Phrase Alignment for Statistical Machine Translation

http://projects.ldc.upenn.edu/Chinese/docs/char2pinyin.txt
Semi-Supervised Lexicon Mining from Parenthetical Expressions in Monolingual Web Pages

http://oracc.museum.upenn.edu/doc/help/
Towards a Linked Open Data Edition of Sumerian Corpora

http://catalog.ldc.upenn.edu/LDC2006T06
Triple based Background Knowledge Ranking for Document Enrichment

http://catalog.ldc.upenn.edu/docs/LDC96L14
Assessing the relative reading level of sentence pairs for text simplification

http://www.ling.upenn.edu/advice/latex/qtree/
A Word-Order Database for Testing Computational Models of Language Acquisition

http://www.ldc.upenn.edu/Catalog/LDC94T5.html
The OLAC Metadata Set and Controlled Vocabularies

http://www.cis.upenn.edu/treebank/tokenizer.sed
A Clustered Global Phrase Reordering Model for Statistical Machine Translation
Down-stream effects of tree-to-dependency conversions

https://catalog.ldc.upenn.edu/LDC2002T31
Cross-Pair Text Representations for Answer Sentence Selection

http://nahuatl.ldc.upenn.edu/
Developing ARET: An NLP-based Educational Tool Set for Arabic Reading Enhancement

http://acl.ldc.upenn.edu/A/A97/A97-2014.pdf
An Evaluation of Adopting Language Model as the Checker of Preposition Usage

http://projects.ldc.upenn.edu/MDE/Guidelines/SimpleMDE_V
Linguistic Resources for Speech Parsing

https://www.ircs.upenn.edu/
Obituary: Aravind K. Joshi

https://catalog.ldc.upenn.edu/LDC2004T19
Enriching ASR Lattices with POS Tags for Dependency Parsing

http://www.cis.upenn.edu/dbikel
Parallel Entity and Treebank Annotation
QuestionBank: Creating a Corpus of Parse-Annotated Questions

https://catalog.ldc.upenn.edu/LDC2009T08
A Japanese Word Segmentation Proposal

https://catalog.ldc.upenn.edu/ldc2008t19
Survey: Anaphora With Non-nominal Antecedents in Computational Linguistics: a Survey

https://catalog.ldc.upenn.edu/LDC2010T06
Generating and Scoring Correction Candidates in Chinese Grammatical Error Diagnosis
NTOU Chinese Spelling Check System in Sighan-8 Bake-off
Detecting Grammatical Errors in the NTOU CGED System by Identifying Frequent Subsentences
A Study on Chinese Spelling Check Using Confusion Sets and N-gram Statistics
NTOU Chinese Grammar Checker for CGED Shared Task
International Journal of Computational Linguistics & Chinese Language Processing, Volume 20, Number 1, June 2015-Special Issue on Chinese as a Foreign Language

http://catalog.ldc.upenn.edu/docs/
Exploring Measures of “Readability” for Spoken Language: Analyzing linguistic features of subtitles to identify age-specific TV programs

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2005T01U01
Constructing a Temporal Relation Tagged Corpus of Chinese Based on Dependency Structure Analysis

http://www.cis.upenn.edu/~mdredze/datasets/sentiment/
Co-Training for Cross-Lingual Sentiment Classification

http://acl.ldc.upenn.edu/P/P93/P93-1035.pdf
Constituent Structure for Filipino: Induction through Probabilistic Approaches

https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/
A Two-stage Approach for Extending Event Detection to New Types via Neural Networks
Event Detection and Domain Adaptation with Convolutional Neural Networks

http://projects.ldc.upenn.edu/ace/docs/English-
Building Chinese Event Type Paradigm Based on Trigger Clustering
Ontology Population from Textual Mentions: Task Definition and Benchmark
Acquiring Topic Features to improve Event Extraction: in Pre-selected and Balanced Collections
Who is Who and What is What: Experiments in Cross-Document Co-Reference
Using Prediction from Sentential Scope to Build a Pseudo Co-Testing Learner for Event Extraction

http://projects.ldc.upenn.edu/TDT
Extractive Summarization using Inter- and Intra- Event Relevance

http://www.cis.upenn.edu/~chinese/posguide.3rd.ch.pdf
Morphological features help POS tagging of unknown words across language varieties
Unsupervised Language Model Adaptation Incorporating Named Entity Information

http://www.ldc.upenn.edu/Catalog/topten.jsp
LDC Language Resource Database: Building a Bibliographic Database

http://languagelog.ldc.upenn.edu/nll/?p=2068
Crisis MT: Developing A Cookbook for MT in Crisis Situations

http://www.cis.upenn.edu/epitler/discourse.html;
Assessing the Discourse Factors that Influence the Quality of Machine Translation

http://www.ldc.upenn.edu/Catalog/catalogEntry.jsp?catalogId=LDC99T42
Construction of a Free Large Part-of-Speech Annotated Corpus in French (Construction d’un large corpus écrit libre annoté morpho-syntaxiquement en français) [in French]

https://www.sas.upenn.edu/
Evaluating Dialogs based on Grice’s Maxims

http://morph.ldc.upenn.edu/TDT
Portability Issues for Speech Recognition Technologies

https://catalog.ldc.upenn.edu/LDC2012T08
Bilingual English-Czech Valency Lexicon Linked to a Parallel Corpus

https://catalog.ldc.upenn.edu/LDC2012T09
Sentiment after Translation: A Case-Study on Arabic Social Media Posts
Sentiment Lexicons for Arabic Social Media

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?cata-
DutchSemCor: Targeting the ideal sense-tagged corpus

http://projects.ldc.upenn.edu/QLDB
A New Phase in Annotation Tool Development at the Linguistic Data Consortium: The Evolution of the Annotation Graph Toolkit

https://catalog.ldc.upenn.edu/LDC2012T05
Developing Universal Dependencies for Mandarin Chinese
A Universal Framework for Inductive Transfer Parsing across Multi-typed Treebanks

https://www.ldc.upenn.edu/
Multi-Domain Neural Machine Translation with Word-Level Domain Context Discrimination
A Corpus Study and Annotation Schema for Named Entity Recognition and Relation Extraction of Business Products
ICE: Rapid Information Extraction Customization for NLP Novices
Improving Event Detection with Dependency Regularization
Induction of a variable granularity property grammar from the Arabic Treebank ATB (Induction d’une grammaire de propriétés à granularité variable à partir du treebank arabe ATB) [in French]
A Comparison of the Events and Relations Across ACE, ERE, TAC-KBP, and FrameNet Annotation Standards
Joint Extraction of Events and Entities within a Document Context
A Hybrid Approach to Features Representation for Fine-grained Arabic Named Entity Recognition
Improving Event Detection with Active Learning
A Quantitative Study of Data in the NLP community
Annotation of Entities and Relations in Spanish Radiology Reports
RDF Representation of Licenses for Language Resources
Constructing an Annotated Corpus for Protest Event Mining
One Sentence One Model for Neural Machine Translation
SPADE: Evaluation Dataset for Monolingual Phrase Alignment
Probabilistic Inference for Cold Start Knowledge Base Population with Prior World Knowledge
Automated Acquisition of Patterns for Coding Political Event Data: Two Case Studies
A High-Quality Multilingual Dataset for Structured Documentation Translation
Improving Domain Adaptation Translation with Domain Invariant and Specific Information
Literary Event Detection
Construction and Annotation of the Jordan Comprehensive Contemporary Arabic Corpus (JCCA)
An annotated dataset of literary entities

http://www.ldc.upenn.edu/Papers/LREC2000/mult
Annotation Graphs and Servers and Multi-Modal Resources: Infrastructure for Interdisciplinary Education, Research and Development

http://oracc.museum.upenn.edu/
Building A Handwritten Cuneiform Character Imageset
Towards a Linked Open Data Edition of Sumerian Corpora
Enhancing Sumerian Lemmatization by Unsupervised Named-Entity Recognition
Machine Translation and Automated Analysis of the Sumerian Language
Experiments in Cuneiform Language Identification

http://www.seas.upenn.edu/lannie/IEval2.html
Automatically Assessing Machine Summary Content Without a Gold Standard

http://www.ldc.upenn.edu/LDC2006T01
Non-projectivity and valency
Inherently Pronominal Verbs in Czech: Description and Conversion Based on Treebank Annotation

http://projects.ldc.upenn.edu/kbp/
Non-Expert Correction of Automatically Generated Relation Annotations
Personal Attributes Extraction in Chinese Text Bakeoff in CLP 2014: Overview

http://www.cis.upenn.edu/~chi-
Consistent Word Segmentation, Part-of-Speech Tagging and Dependency Labelling Annotation for Chinese Language

http://www.ldc.upenn.edu/annotation/
An Efficient and Flexible Format for Linguistic and Semantic Annotation
The MATE Markup Framework

http://www.ldc.upenn.edu/
A Phonotactic Language Model for Spoken Language Identification
COLLATE: Competence Center in Speech and Language Technology
Using Masks, Suffix Array-based Data Structures and Multidimensional Arrays to Compute Positional Ngram Statistics from Corpora
Effectiveness and Efficiency of Open Relation Extraction
Chinese Segmentation and New Word Detection using Conditional Random Fields
Tree Linearization in English: Improving Language Model Based Approaches
Chinese Web Scale Linguistic Datasets and Toolkit
An Empirical Study of the Occurrence and Co-Occurrence of Named Entities in Natural Language Corpora
SAVAS: Collecting, Annotating and Sharing Audiovisual Language Resources for Automatic Subtitling
Construction of an Infrastructure for Providing Users with Suitable Language Resources
Temporal Tagging on Different Domains: Challenges, Strategies, and Gold Standards
Syntactic Reordering for English-Arabic Phrase-Based Machine Translation
Unsupervised Discovery of Morphemes
On Using Linked Data for Language Resource Sharing in the Long Tail of the Localisation Market
OrienTel - Multilingual access to interactive communication services for the Mediterranean and the Middle East
Lexicon Development for Varieties of Spoken Colloquial Arabic
Combining Hierarchical Clustering and Machine Learning to Predict High-Level Discourse Structure
Arabic Word Segmentation for Better Unit of Analysis
Generating Phrasal and Sentential Paraphrases: A Survey of Data-Driven Methods
XQuery as an Annotation Query Language: a Use Case Analysis
Dialectal to Standard Arabic Paraphrasing to Improve Arabic-English Statistical Machine Translation
Method of Selecting Training Data to Build a Compact and Efficient Translation Model
NIST 2007 Language Recognition Evaluation: From the Perspective of IIR
Construction of a Metadata Database for Efficient Development and Use of Language Resources
A Phrase-Based Context-Dependent Joint Probability Model for Named Entity Translation
ONTS: “Optima” News Translation System
Unsupervised Segmentation of Words Using Prior Distributions of Morph Length and Frequency
Memory-Based Morphological Analysis Generation and Part-of-Speech Tagging of Arabic
Unsupervised Induction of Modern Standard Arabic Verb Classes Using Syntactic Frames and LSA
Learning Domain Ontologies from Document Warehouses and Dedicated Web Sites
Lexicon Design for Transcription of Spontaneous Voice Messages
Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition
The OLAC Metadata Set and Controlled Vocabularies
Monolingual Distributional Profiles for Word Substitution in Machine Translation
Embedding Web-Based Statistical Translation Models in Cross-Language Information Retrieval
The Mixer and Transcript Reading Corpora: Resources for Multilingual, Crosschannel Speaker Recognition Research
ELRA’s Services 15 Years on...Sharing and Anticipating the Community
More Data and Tools for More Languages and Research Areas: A Progress Report on LDC Activities
Lexical Semantics and Distribution of Suffixes - A Visual Analysis
15 Years of Language Resource Creation and Sharing: a Progress Report on LDC Activities
Collaborative Annotation of Dialogue Acts: Application of a New ISO Standard to the Switchboard Corpus
Building an Annotated Japanese-Chinese Parallel Corpus - A Part of NICT Multilingual Corpora
The META-SHARE Metadata Schema for the Description of Language Resources
Rel-grams: A Probabilistic Model of Relations in Text
Memory-based Grammatical Error Correction
Annotated Web as corpus
Tools & Resources for Visualising Conversational-Speech Interaction
Combining Formal Concept Analysis and semantic information for building ontological structures from texts : an exploratory study
High WSD Accuracy Using Naive Bayesian Classifier with Rich Features
Speech-Related Technologies - Where Will the Field Go in 10 Years?
Annotating Causality in the TempEval-3 Corpus
Structural semantic interconnection: a knowledge-based approach to Word Sense Disambiguation
Estimation of Speaking Style in Speech Corpora Focusing on speech transcriptions
RESTful Annotation and Efficient Collaboration
Extending the MPC corpus to Chinese and Urdu - A Multiparty Multi-Lingual Chat Corpus for Modeling Social Phenomena in Language
STC-TIMIT: Generation of a Single-channel Telephone Corpus
Extension of Zipf’s Law to Word and Character N-grams for English and Chinese
A Multilingual Natural Stress Emotion Database
On the Way to a Legal Sharing of Web Applications in NLP
A Comparative Study for Query Translation using Linear Combination and Confidence Measure
Adapting an Example-Based Translation System to Chinese
Use of Event Types for Temporal Relation Identification in Chinese Text
Constructing a Temporal Relation Tagged Corpus of Chinese Based on Dependency Structure Analysis
Montague Meets Markov: Deep Semantics with Probabilistic Logical Form
Language Specific and Topic Focused Web Crawling
Conceptual Structure of Automatically Extracted Multi-Word Terms from Domain Specific Corpora: a Case Study for Italian
A Comparative Study of Four Language Identification Systems
Automatic Tagging of Arabic Text: From Raw Text to Base Phrase Chunks
Discriminative Joint Modeling of Lexical Variation and Acoustic Confusion for Automated Narrative Retelling Assessment
Exploring Approaches to Discriminating among Near-Synonyms
Where Should Annotation Stop?
Discovering Relations Between Named Entities from a Large Raw Corpus Using Tree Similarity-Based Clustering
WikiCoref: An English Coreference-annotated Corpus of Wikipedia Articles
Reduced n-gram Models for English and Chinese Corpora
Give me a bug. a framework for a bug report service
CATiB: The Columbia Arabic Treebank
Reduced N-Grams for Chinese Evaluation

http://acl.ldc.upenn.edu/muc7/M98-
Named Entity Recognition System for Urdu

http://www.cis.upenn.edu/mdredze/datasets/sentiment/
Bilingual Co-Training for Sentiment Classification of Chinese Product Reviews

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2005T14
以中文十億詞語料庫為基礎之兩岸詞彙對比研究 (Cross-Strait Lexical Differences: A Comparative Study based on Chinese Gigaword Corpus) [In Chinese]
以中文十億詞語料庫為基礎之兩岸詞彙對比研究 (A Study of Lexical Differences between China and Taiwan based on the Chinese Gigaword Corpus) [In Chinese]
The Polysemy of Da3: An ontology-based lexical semantic study
International Journal of Computational Linguistics & Chinese Language Processing, Volume 18, Number 2, June 2013-Special Issue on Chinese Lexical Resources: Theories and Applications

http://www.cis.upenn.edu/mpalmer/project
Exploiting Syntactic Patterns as Clues in Zero-Anaphora Resolution
Impact of Question Decomposition on the Quality of Answer Summaries

http://projects.ldc.upenn.edu/TIDES/mt2003.html
A Joint Model to Identify and Align Bilingual Named Entities

http://www.ldc.upenn.edu/Catalog/catalogEntry.jsp?catalogId
Compositionality of NN Compounds: A Case Study on [N1+Artifactual-Type Event Nouns]
Building an Arabic Multiword Expressions Repository
The Application of Constraint Rules to Data-driven Parsing
Generalization of Words for Chinese Dependency Parsing
A Hybrid Chinese Spelling Correction Using Language Model and Statistical Machine Translation with Reranking

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalo
A Multiple-Domain Ontology Builder

http://projects.ldc.upenn.edu/lctl
Annotation Tool Development for Large-Scale Corpus Creation Projects at the Linguistic Data Consortium

http://www.cis.upenn.edu/kinyon
A Language Independent Shallow-Parser Compiler

http://ldc.upenn
Genetic Algorithms for Feature Relevance Assignment in Memory-Based Language Processing

http://www.seas.upenn.edu/strctlrn/BioTagger/BioTagger.html
AnnoMarket: An Open Cloud Platform for NLP

http://www.cis.upenn.edu/~treebank/tokenization.html
CU-COMSEM: Exploring Rich Features for Unsupervised Web Personal Name Disambiguation
Construction of a Free Large Part-of-Speech Annotated Corpus in French (Construction d’un large corpus écrit libre annoté morpho-syntaxiquement en français) [in French]

https://catalog.ldc.upenn.edu/LDC2017S01
A Vietnamese Dialog Act Corpus Based on ISO 24617-2 standard

http://projects.ldc.upenn.edu/Chinese/
Chinese Word Segmentation and Named Entity Recognition by Character Tagging
Can Word Segmentation be Considered Harmful for Statistical Machine Translation Tasks between Japanese and Chinese?

http://www.ldc.upenn.edu/Projects/
Do We Need Chinese Word Segmentation for Statistical Machine Translation?
Modelling Entity Instantiations
Detecting Structural Metadata with Decision Trees and Transformation-Based Learning
Corroborating Text Evaluation Results with Heterogeneous Measures
Bilingually Motivated Domain-Adapted Word Segmentation for Statistical Machine Translation
Developing and Using a Pilot Dialectal Arabic Treebank
Error Analysis of Statistical Machine Translation Output

http://www.ldc.upenn.edu/Catalog/docs/LDC2007T21/ontonotes-1.0-documentation.pdf
Word Sense Disambiguation Using Multiple Contextual Features

http://www.ldc.upenn.edu/Projects/TDT5/Annotation/
Integrated Linguistic Resources for Language Exploitation Technologies

https://catalog.ldc.upenn
Transition-Based Chinese AMR Parsing
Coreference in Prague Czech-English Dependency Treebank
Building Universal Dependency Treebanks in Korean
A 2nd Longitudinal Corpus for Children’s Writing with Enhanced Output for Specific Spelling Patterns
SLS at SemEval-2016 Task 3: Neural-based Approaches for Ranking in Community Question Answering
Decoupling Encoder and Decoder Networks for Abstractive Document Summarization
VectorSLU: A Continuous Word Vector Approach to Answer Selection in Community Question Answering Systems
Enriching ASR Lattices with POS Tags for Dependency Parsing
QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages
Corpus for Children’s Writing with Enhanced Output for Specific Spelling Patterns (2nd and 3rd Grade)
The Steep Road to Happily Ever after: an Analysis of Current Visual Storytelling Models
Unsupervised Question Answering by Cloze Translation

http://acl.ldc.upenn.edu/J/
Application of Clause Alignment for Statistical Machine Translation

https://catalog.ldc.upenn.edu/LDC97T19
Multi-Dialect Arabic POS Tagging: A CRF Approach
The WAW Corpus: The First Corpus of Interpreted Speeches and their Translations for English and Arabic

https://catalog.ldc.upenn.edu/ldc97s62
Survey: Anaphora With Non-nominal Antecedents in Computational Linguistics: a Survey

https://catalog.ldc.upenn.edu/ldc99t42
Improving Implicit Semantic Role Labeling by Predicting Semantic Frame Arguments
Facing the most difficult case of Semantic Role Labeling: A collaboration of word embeddings and co-training
Transfer Learning for British Sign Language Modelling

http://www.cis.upenn.edu/xtag
Towards Automatic Generation of Natural Language Generation Systems
Statistical Machine Translation with a Factorized Grammar
Refining the most frequent sense baseline

http://www.ldc.upenn.edu/Catalog/CatologE
Stemming the Qur’an

http://www.cis.upenn.edu/~chin
Constructing a Temporal Relation Tagged Corpus of Chinese Based on Dependency Structure Analysis

https://www.ldc.upenn.edu
The DCU Discourse Parser for Connective, Argument Identification and Explicit Sense Classification
Joint Arabic Segmentation and Part-Of-Speech Tagging
ELRA’s Consolidated Services for the HLT Community
Computational Challenges for Polysynthetic Languages
Developing a Framework for Describing Relations among Language Resources

http://www.ldc.upenn.edu/Catalog/LDC97S42.html
The American National Corpus: More Than the Web Can Provide

https://catalog.ldc.upenn.edu/LDC2015T22
Analyzing Linguistic Complexity and Accuracy in Academic Language Development of German across Elementary and Secondary School

http://ldc.upenn.edu/Projects
A Progress Report from the Linguistic Data Consortium: Recent Activities in Resource Creation and Distribution and the Development of Tools and Standards

http://www.ircs.upenn.edu/
Developing and Using a Pilot Dialectal Arabic Treebank

http://www.cis.upenn.edu
PASSAGE: from French Parser Evaluation to Large Sized Treebank

http://catalog.ldc.upenn.edu/LDC99T42
The Halliday Centre Tagger: An Online Platform for Semi-automatic Text Annotation and Analysis

http://www.cis.upenn.edu/~dbikel/software.html#stat-parser
Enhancing the Arabic Treebank: a Collaborative Effort toward New Annotation Guidelines

http://catalog.ldc.upenn.edu/LDC2013S05
New Directions for Language Resource Development and Distribution

https://catalog.ldc.upenn.edu/ldc2011t12
Named Entity Recognition with Stack Residual LSTM and Trainable Bias Decoding

https://catalog.ldc.upenn.edu/LDC2004T23
The International Corpus of Arabic: Compilation, Analysis and Evaluation

http://www.cis.upenn.edu/~
A Probabilistic Earley Parser as a Psycholinguistic Model

http://projects.ldc.upenn.edu/ace/annotation/
Identifying Untyped Relation Mentions in a Corpus given an Ontology
Inter-sentential Relations in Information Extraction Corpora
Towards the Annotation of Named Entities in the National Corpus of Polish

http://www.seas.upenn.edu/pdtb/
Adversarial Connective-exploiting Networks for Implicit Discourse Relation Classification
What excludes an Alternative in Coherence Relations?
Recovering discourse relations: Varying influence of discourse adverbials
Obituary: Aravind K. Joshi
Ambiguity in Explicit Discourse Connectives

http://www.ldc.upenn.edu/kits/1
RESTful Annotation and Efficient Collaboration

http://lodl.ldc.upenn.edu/MRDL/Tamil
Parallel Creation of Gigaword Corpora for Medium Density Languages - an Interim Report

http://www.seas.upenn
FrameNet+: Fast Paraphrastic Tripling of FrameNet
Most “babies” are “little” and most “problems” are “huge”: Compositional Entailment in Adjective-Nouns

http://www.cis.upenn.edu/R
Using Semantic and Syntactic Graphs for Call Classification

http://www.cis.upenn.edu/~cis530/
NLTK: The Natural Language Toolkit

http://www.cis.upenn.edu/\
Topic Model for Identifying Suicidal Ideation in Chinese Microblog

http://ccat.sas.upenn.edu/
Exploring Demographic Language Variations to Improve Multilingual Sentiment Analysis in Social Media

http://projects.ldc.upenn.edu/TIDES/Translation/TransAssess04.pdf
Hypothesis Refinement Using Agreement Constraints in Machine Translation

https://catalog.ldc.upenn.edu/LDC2005T09
Joint Mention Extraction and Classification with Mention Hypergraphs
Neural Architectures for Nested NER through Linearization

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?cata
Word Domain Disambiguation via Word Sense Disambiguation
Improving Question Recommendation by Exploiting Information Need

https://catalog.ldc.upenn.edu/LDC2005T01
SCTB: A Chinese Treebank in Scientific Domain
Cross-language Projection of Dependency Trees with Constrained Partial Parsing for Tree-to-Tree Machine Translation

https://www.seas.upenn.edu/~pdtb/tools.shtml#annotator
CoNLL 2016 Shared Task on Multilingual Shallow Discourse Parsing

http://www.ldc.upenn.edu/annotation/database/papers/Broeder
SLR Validation: Current Trends and Developments

http://projects.ldc.upenn.edu/ace/annotation/2005Tasks.html
SpatialML: Annotation Scheme, Corpora, and Tools

http://acl.ldc.upenn.edu
Enhancing Electronic Dictionaries with an Index Based on Associations

https://catalog.ldc.upenn.edu/LDC95T21
A Discursive Grid Approach to Model Local Coherence in Multi-document Summaries

http://www.ldc.upenn.edu/AG/
Annotation Tools Based on the Annotation Graph API
Annotation Graphs and Servers and Multi-Modal Resources: Infrastructure for Interdisciplinary Education, Research and Development

http://acl.ldc.upenn.edu/C/C94/C94-
Natural Language Analysis of Patent Claims

http://ccat.sas.upenn.edu/~haroldfs/
Improving Gender Classification of Blog Authors

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogI
The MULINCO corpus and corpus platform
SpatialML: Annotation Scheme, Corpora, and Tools
Mining Key Phrase Translations from Web Corpora

http://www.ldc.upenn.edu/Projects/TDT
Multilingual Topic Detection and Tracking: Successful Research Enabled by Corpora and Evaluation
Quality Control in Large Annotation Projects Involving Multiple Judges: The Case of the TDT Corpora

http://www.cis.upenn.edu/~xtag/gramrelease.html
Detecting and Correcting Syntactic Errors in Machine Translation Using Feature-Based Lexicalized Tree Adjoining Grammars

http://www.seas.upenn.edu/pdtb/tools.shtml#annotator
PDTB-style Discourse Annotation of Chinese Text

http://bioie.ldc.upenn.edu/wiki/index.php/POS_t
A Voting Mechanism for Named Entity Translation in English–Chinese Question Answering

http://morph.ldc.upenn.edu/Catalog/LDC94S13A.html
Evaluation of Pronunciation Variants in the ASR Lexicon for Different Speaking Styles

http://catalog.ldc.upenn.edu/LDC2011T07
The Wisdom of Minority: Unsupervised Slot Filling Validation based on Multi-dimensional Truth-Finding
Two-Stage Hashing for Fast Document Retrieval

https://catalog.ldc.upenn.edu/LDC2002S09
Joint Transition-based Dependency Parsing and Disfluency Detection for Automatic Speech Recognition Texts
Enriching ASR Lattices with POS Tags for Dependency Parsing

https://www.ldc.upenn.edu/language-
Trends in HLT Research: A Survey of LDC’s Data Scholarship Program

https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
Improving Low Resource Machine Translation using Morphological Glosses (Non-archival Extended Abstract)
Can spontaneous spoken language disfluencies help describe syntactic dependencies? An empirical study
Enriching Source for English-to-Urdu Machine Translation

http://catalog.ldc.upenn.edu/LDC2010T21
English to Urdu Statistical Machine Translation: Establishing a Baseline

http://www.ling.upenn.edu/hist-
The goo300k corpus of historical Slovene
Using Derivation Trees for Informative Treebank Inter-Annotator Agreement Evaluation
Rapid Deployment of Phrase Structure Parsing for Related Languages: A Case Study of Insular Scandinavian

http://catalog.ldc.upenn.edu/LDC2005T14
Parsing Chinese Synthetic Words with a Character-based Dependency Model

http://www.ldc.upenn.edu/Projects/ACE
Exploring Various Knowledge in Relation Extraction
Integrated Annotation for Biomedical Information Extraction
Exploiting the Role of Position Feature in Chinese Relation Extraction

http://projects.ldc.upenn.edu/ace/docs/English-Entities
Linguistic Resources and Evaluation Techniques for Evaluation of Cross-Document Automatic Content Extraction
I-CAB: the Italian Content Annotation Bank

http://projects.ldc.upenn.edu
Empirical Lower Bounds on the Complexity of Translational Equivalence

https://dbappserv.cis.upenn.edu/spell/
A Graph Approach to Spelling Correction in Domain-Centric Search

http://www.ldc.upenn.edu/exploration/expl2000/paper
Towards machine-readable lexicons for South African Bantu languages

http://www.ldc.upenn.edu/Projects/TDT3/email/email
Multiple Similarity Measures and Source-Pair Information in Story Link Detection

http://projects.ldc.upenn.edu/ace
Joint Inference for Knowledge Base Population
Joint Event Extraction via Recurrent Neural Networks
Building a Cross-document Event-Event Relation Corpus
Pattern Learning for Relation Extraction with a Hierarchical Topic Model
Seed-Based Event Trigger Labeling: How far can event descriptions get us?
Evaluation of Natural Language Tools for Italian: EVALITA 2007

https://catalog.ldc.upenn.edu/LDC2008T25
Word and Document Embedding with vMF-Mixture Priors on Context Word Vectors

http://ldc.upenn.edu/mirror/Transcriber/
A Multi-Modal Documentation System for Warao

http://www.ling.upenn.edu/courses/Fall_2003/ling001/
Sentiment Analysis of Conditional Sentences

http://acl.ldc.upenn.edu/W/W07/W07-2007.pdf
Semantic Annotation and Terminology Validation in full scientific articles in Social Sciences and Humanities (Annotation sémantique et validation terminologique en texte intégral en SHS) [in French]

http://www.ldc.upenn.edu/example
RESTful Annotation and Efficient Collaboration

http://www.ling.upenn.edu/hist-corpora/PPCME2-RELEASE-3/index.html
Diachronic proximity vs. data sparsity in cross-lingual parser projection. A case study on Germanic

http://www.cis.upenn.edu/dbikel/#stat-parser
Comparing Italian parsers on a common Treebank: the EVALITA experience

http://projects.ldc.upenn.edu/EARS/
Bridging the Gap between Linguists and Technology Developers: Large-Scale, Sociolinguistic Annotation for Dialect and Speaker Recognition

http://acl.ldc.upenn.edu/J/J04/J04-4002.pdf
InterlinguaPlus Machine Translation Approach for Local Languages: Ekegusii & Swahili

https://www.cis.upenn.edu/
On-demand Injection of Lexical Knowledge for Recognising Textual Entailment
Predicting Specificity in Classroom Discussion
Learning Distributed Word Representations For Bidirectional LSTM Recurrent Neural Network

http://www.cis.upenn.edu/~treebank/tokenizer.sed
Linguistic Resources for Speech Parsing

https://catalog.ldc.upenn.edu/LDC2001T02
Author Name Disambiguation in MEDLINE Based on Journal Descriptors and Semantic Types
Survey: Anaphora With Non-nominal Antecedents in Computational Linguistics: a Survey

https://www.cis.upenn.edu/~treebank/
SoMaJo: State-of-the-art tokenization for German web and social media texts
Redefining part-of-speech classes with distributional semantic models

http://www.ldc.upenn.edu/Projects/ACE/docs/
Parallel Entity and Treebank Annotation

http://www.ldc.upenn.edu/Membership/Agreements/memberannouncement.shtml
JMaxAlign: A Maximum Entropy Parallel Sentence Alignment Tool

https://catalog.ldc.upenn.edu/ldc2006t01
Synonymy in Bilingual Context: The CzEngClass Lexicon

http://www.ldc.upenn.edu/Catalog/LDC93S1.html
State-Transition Interpolation and MAP Adaptation for HMM-based Dysarthric Speech Recognition

https://www.ldc.upenn.edu/collaborations/
Global Open Resources and Information for Language and Linguistic Analysis (GORILLA)

http://acl.ldc.upenn.edu/J/J93/J93-1013.pdf
InterlinguaPlus Machine Translation Approach for Local Languages: Ekegusii & Swahili

http://onlinebooks.library.upenn.edu
Hierarchical Text Segmentation from Multi-Scale Lexical Cohesion
Bayesian Unsupervised Topic Segmentation

https://catalog.ldc.upenn.edu/LDC2015S05
PronouncUR: An Urdu Pronunciation Lexicon Generator
Data mining Mandarin tone contour shapes

http://morph.ldc.upenn.edu/Catalog/LDC97L20.html
Evaluation of Pronunciation Variants in the ASR Lexicon for Different Speaking Styles

http://acl.ldc.upenn.edu/E/
A Method of Creating New Bilingual Valency Entries using Alternations

http://projects.ldc.upenn.edu/ace/data/
CoNLL-2011 Shared Task: Modeling Unrestricted Coreference in OntoNotes
CoNLL-2012 Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes

http://www.ldc.upenn.edu/myl/morph/buckwalter.html
Developing an Arabic Treebank: Methods, Guidelines, Procedures, and Tools
Lexicon Acquisition for Dialectal Arabic Using Transductive Learning
Automatic Tagging of Arabic Text: From Raw Text to Base Phrase Chunks
Parsing Arabic Dialects

http://projects.ldc.upenn.edu/TIDES/index.html
Arabic WordNet: Semi-automatic Extensions using Bayesian Inference

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?cat
Preliminary Lexical Framework for English-Arabic Semantic Resource Construction
Supervised Topical Key Phrase Extraction of News Stories using Crowdsourcing, Light Filtering and Co-reference Normalization

http://www.ldc.upenn.edu/annotation
Multi-Level Annotation in MMAX

http://www.seas.upenn.edu/~nikhild/PDTBAPI/
GrAF: A Graph-based Format for Linguistic Annotations

http://www.seas.upenn.edu/~pdtb
A Discourse Resource for Turkish: Annotating Discourse Connectives in the METU Corpus
A Pilot Annotation to Investigate Discourse Connectivity in Biomedical Text
Reflections on the Penn Discourse TreeBank, Comparable Corpora, and Complementary Annotation

http://dla.library.upenn.edu/dla/olac/search.html?fq=subject_language_facet%3ASkolt
Synchronized Mediawiki based analyzer dictionary development

http://www.ldc.upenn.edu/Projects/ACE/
Semi-Supervised Learning for Relation Extraction
Tree Kernel-Based Relation Extraction with Context-Sensitive Structured Parse Tree Information
Exploiting Constituent Dependencies for Tree Kernel-Based Semantic Relation Extraction
Bilingual Active Learning for Relation Classification via Pseudo Parallel Corpora
A Knowledge-Based Approach for Unsupervised Chinese Coreference Resolution
Multi-domain Cross-lingual Information Extraction from Clean and Noisy Texts
Corpus Annotation through Crowdsourcing: Towards Best Practice Guidelines
A Combination of Topic Models with Max-margin Learning for Relation Detection
The Impact of Morphological Stemming on Arabic Mention Detection and Coreference Resolution
Relation Extraction Using Label Propagation Based Semi-Supervised Learning
Modeling Commonality among Related Classes in Relation Extraction
Semi-supervised Relation Extraction with Label Propagation
How to get a Chinese Name(Entity): Segmentation and Combination Issues
Convolution Kernel over Packed Parse Forest
FICO: Web Person Disambiguation Via Weighted Similarity of Entity Contexts
Mining Inter-Entity Semantic Relations Using Improved Transductive Learning
Relation Extraction Using Convolution Tree Kernel Expanded with Entity Features
Extracting Relations with Integrated Information Using Kernel Methods
Adding multi-layer semantics to the Greek Dependency Treebank
Preemptive Information Extraction using Unrestricted Relation Discovery
Unsupervised Feature Selection for Relation Extraction
Analysis and Repair of Name Tagger Errors
The CONCISUS Corpus of Event Summaries
Using Semantic Relations to Refine Coreference Decisions
Convolution Kernels on Constituent, Dependency and Sequential Structures for Relation Extraction
Exploring Syntactic Features for Relation Extraction using a Convolution Tree Kernel
Recognising Sets and Their Elements: Tree Kernels for Entity Instantiation Identification

http://ccat.sas.upenn.edu/plc/tamilweb/hindi.html
Urdu and the Parallel Grammar Project
Morphological Richness Offsets Resource Demand – Experiences in Constructing a POS Tagger for Hindi

https://catalog.ldc.upenn.edu/LDC2017L01
PronouncUR: An Urdu Pronunciation Lexicon Generator

http://www.ling.upenn.edu/ppche-release-
Code-Switching Ubique Est - Language Identification and Part-of-Speech Tagging for Historical Mixed Text

http://catalog.ldc.upenn.edu/LDC2008T19
Splitting of Compound Terms in non-Prototypical Compounding Languages
Distant Supervision for Relation Extraction with Matrix Completion
Low-dimensional Embeddings for Interpretable Anchor-based Topic Inference

http://www.ling.upenn.edu/hist-corpora/PPCMBE-
DynamicPower at SemEval-2016 Task 8: Processing syntactic parse trees with a Dynamic Semantics core

http://www.ldc.upenn.edu/Cata-
Generating Discourse Structures for Written Text

http://catalog.ldc.upenn.edu/docs/LDC2007T21/corefe
CROMER: a Tool for Cross-Document Event and Entity Coreference

http://www.ldc.upenn.edu/Catalog/CatalogList/
Towards Automatic Identification of Discourse Markers in Dialogs: The Case of Like

http://www.ling.upenn.edu/hist-corpora/;
The Icelandic Parsed Historical Corpus (IcePaHC)

http://www.cis.upenn.edu/adwait/statnlp.html
Dependency Tree Kernels for Relation Extraction
A Clustered Global Phrase Reordering Model for Statistical Machine Translation

http://www.ldc.upenn.edu/Catalog/byType.jsp#speech
A Study of the Influence of Speech Type on Automatic Language Recognition Performance

https://catalog.ldc.upenn.edu/LDC97S62
Enriching ASR Lattices with POS Tags for Dependency Parsing

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T09
Improve Parsing Performance by Self-Learning

http://www.ling.upenn.edu/histcorpora/
Modern Chinese Helps Archaic Chinese Processing: Finding and Exploiting the Shared Properties

http://www.cis.upenn.edu/%7Ederry/translations.html
Learning Translations via Matrix Completion

https://catalog.ldc.upenn.edu/ldc2013t19
Neural Machine Translation Incorporating Named Entity
Scoring and Classifying Implicit Positive Interpretations: A Challenge of Class Imbalance
Neural Adaptation Layers for Cross-domain Named Entity Recognition
Survey: Anaphora With Non-nominal Antecedents in Computational Linguistics: a Survey

http://www.ldc.upenn.edu/Projects/TIDES
Partitioning Parallel Documents Using Binary Segmentation

http://www.seas.upenn.edu/pdtb/PDTBAPI
Evaluation of Discourse Relation Annotation in the Hindi Discourse Relation Bank

http://projects.ldc.upenn.edu/tides/translation/transassess04.pdf
Two Phase Evaluation for Selecting Machine Translation Services
Service Composition Scenarios for Task-Oriented Translation

http://projects.ldc.upenn.edu/gale/task_specifications/
Twenty Years of Language Resource Development and Distribution: A Progress Report on LDC Activities
Enriching Word Alignment with Linguistic Tags
Parallel Aligned Treebanks at LDC: New Challenges Interfacing Existing Infrastructures
Quick Rich Transcriptions of Arabic Broadcast News Speech Data
Large Scale Multilingual Broadcast Data Collection to Support Machine Translation and Distillation Technology Development

https://catalog.ldc.upenn.edu/LDC2013T22
Survey: Anaphora With Non-nominal Antecedents in Computational Linguistics: a Survey

https://catalog.ldc.upenn.edu/LDC2013T21
Feature Optimization for Constituent Parsing via Neural Networks
Data mining Mandarin tone contour shapes

http://www.ldc.upenn.edu/Catalog/Catalog
Off-Topic Detection in Conversational Telephone Speech
Collocation Extraction Using Monolingual Word Alignment Method

http://projects.ldc.upenn.edu/Transcription/
RUNDKAST: an Annotated Norwegian Broadcast News Speech Corpus

https://catalog.ldc.upenn.edu/LDC2006T13
An Example-Based Approach to Difficult Pronoun Resolution
ECNU: Using Traditional Similarity Measurements and Word Embedding for Semantic Textual Similarity Estimation
ECNU: One Stone Two Birds: Ensemble of Heterogenous Measures for Semantic Relatedness and Textual Entailment
ECNU: Leveraging Word Embeddings to Boost Performance for Paraphrase in Twitter
Trimming a consistent OWL knowledge base, relying on linguistic evidence
NTOUA at IJCNLP-2017 Task 2: Predicting Sentiment Scores of Chinese Words and Phrases
OCR Post-Processing Text Correction using Simulated Annealing (OPTeCA)

http://acl.ldc.upenn.edu/C/C96/C96-1079.pdf
Extending a multilingual Lexical Resource by bootstrapping Named Entity Classification using Wikipedia’s Category System

http://www.cis.upenn.edu/xtag/gramrelease.html
SemTAG, the LORIA toolbox for TAG-based Parsing and Generation
Supertagged Phrase-Based Statistical Machine Translation

https://catalog.ldc.upenn.edu/LDC2006T16
Multi-Dialect Arabic POS Tagging: A CRF Approach

http://repository.upenn.edu/wharton
Effectively Crowdsourcing Radiology Report Annotations

http://www.ldc.upenn.edu/Projects/ACE/docs/English-
Adding multi-layer semantics to the Greek Dependency Treebank
Proposal for an Extension of Traditional Named Entities: From Guidelines to Evaluation, an Overview

https://catalog.ldc.upenn.edu/LDC2019T05
Ambiguity in Explicit Discourse Connectives

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=
Using a Goodness Measurement for Domain Adaptation: A Case Study on Chinese Word Segmentation
Arabic WordNet: Semi-automatic Extensions using Bayesian Inference
Analyzing the Performance of Automatic Speech Recognition for Ageing Voice: Does it Correlate with Dependency Level?

http://www.cis.upenn.edu/~cis639/docs/xfst.html
Partial Dependency Parsing for Irish

http://www.cis.upenn.edu/dgildea/PropBank/
Seeing Arguments through Transparent Structures

https://catalog.ldc.upenn.edu/LDC2009T14
Chinese Preposition Selection for Grammatical Error Diagnosis

http://oracc.museum.upenn.edu/doc/help/languages/
Machine Translation and Automated Analysis of the Sumerian Language

http://www.ling.upenn.edu/mideng
The Input for Syntactic Acquisition: Solutions from Language Change Modeling

http://www.cis.upenn.edu/sriramv/mywork.html
Detecting Compositionality of Verb-Object Combinations using Selectional Preferences

http://acl.ldc.upenn.edu/W/W03/W03-
Semantic forensics: An application of ontological semantics to information assurance

http://acl.ldc.upenn.edu/P/
Learning Translation Rules for a Bidirectional English-Filipino Machine Translator

http://www.ldc.upenn/edu/
Orthographic Transcription of the Spoken Dutch Corpus

http://www.cis.upenn.edu/%7eungar/eigenwords/
Composition of Word Representations Improves Semantic Role Labelling

http://www.cis.upenn.edu/dbikel/software
Task-oriented Evaluation of Syntactic Parsers and Their Representations

http://www.ldc.upenn.edu/Projects/ACE/docs/Eng-
Using Semantic Relations to Refine Coreference Decisions

http://www.ldc.upenn.edu/Projects/TDT4/
Results of the 2003 Topic Detection and Tracking Evaluation

http://www.ldc.upenn.edu/Projects/TIDES/
Do We Need Chinese Word Segmentation for Statistical Machine Translation?

http://www.seas.upenn.edu/nlp/resources/
Inducing Lexical Style Properties for Paraphrase and Genre Differentiation

http://www.cis.upenn.edu/dbikel/software.html#comparator
Frame-Semantic Parsing
Fast and Accurate Shift-Reduce Constituent Parsing

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2004T05
利用向量支撐機辨識中文基底名詞組的初步研究 (A Preliminary Study on Chinese Base NP Detection using SVM) [In Chinese]

http://www.cis.upenn.edu/xtag/koreantag
Light verb constructions with ‘do’ and ‘be’ in Hindi: A TAG analysis

http://morph.ldc.upenn.edu/Projects/Chinese/LDC
The Alignment Template Approach to Statistical Machine Translation

http://projects.ldc.upenn.edu/ace/docs/EnglishRDCV4-3-
Exploiting Syntactico-Semantic Structures for Relation Extraction
Semi-supervised Relation Extraction with Large-scale Word Clustering

http://www.ldc.upenn.edu/lol/textreadme.html
A Web-based Advanced and User Friendly System: The Oslo Corpus of Tagged Norwegian Texts

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC98T30
Sentence Realization with Unlexicalized Tree Linearization Grammars

https://catalog.ldc.upenn.edu/LDC2012T13
A Legal Perspective on Training Models for Natural Language Processing

http://www.cis.upenn.edu/~chinese/parseguide.3rd.ch.pdf
Chinese Main Verb Identification: From Specification to Realization

http://www.cis.upenn.edu/xtag/tr/tech-report.html
A comparison of the XTAG and CLE Grammars for English

https://catalog.ldc.upenn.edu/ldc2012t21
Deep Recurrent Generative Decoder for Abstractive Text Summarization
Topic-Guided Variational Auto-Encoder for Text Generation

http://morph.ldc.upenn.edu/Projects/TDT3/
Mandarin-English Information (MEI): Investigating Translingual Speech Retrieval

http://ldc.upenn.edu/Catalog/
Automatic Evaluation of Relation Extraction Systems on Large-scale

http://www.ldc.upenn.edu/Projects/TDT/
A Comparative Study of Methods for Topic Modeling in Spoken Document Retrieval

http://acl.ldc.upenn.edu/acl2004/main/pdf/341_pdf_2-col.pdf
Constituent Structure for Filipino: Induction through Probabilistic Approaches

http://psd.museum.upenn
Word Segmentation for Akkadian Cuneiform

http://projects.ldc.upenn.edu/gale
A New Phase in Annotation Tool Development at the Linguistic Data Consortium: The Evolution of the Annotation Graph Toolkit
Statistical Evaluation of Information Distillation Systems

http://www.cis.upenn.edu/ace
Tree Kernels for Semantic Role Labeling
Frame Semantic Enhancement of Lexical-Semantic Resources

http://www.ldc.upenn.edu/annotation/gesture
Spatiotemporal Coding in ANVIL

https://catalog.ldc.upenn.edu/LDC2017S11
Towards Continuous Dialogue Corpus Creation: writing to corpus and generating from it

http://www.asc.upenn.edu/usr/krippendorff/webreliability2.pdf
Annotating Emotions in Meetings

http://www.seas.upenn.edu/pdtb
Annotating Attribution in the Penn Discourse TreeBank
Discourse Annotation in the PDTB: The Next Generation
Genre distinctions for discourse in the Penn TreeBank
Discourse Annotation Working Group Report
The Penn Discourse TreeBank 2.0
Realization of Discourse Relations by Other Means: Alternative Lexicalizations
Real Time Web Text Classification and Analysis of Reading Difficulty
The CoNLL-2015 Shared Task on Shallow Discourse Parsing
Exploiting Scope for Shallow Discourse Parsing
Towards an Annotated Corpus of Discourse Relations in Hindi
Proceedings 14th Joint ACL - ISO Workshop on Interoperable Semantic Annotation

http://bioie.ldc.upenn.edu/
Adaptation of POS Tagging for Multiple BioMedical Domains
Semi-Automated Named Entity Annotation
Efficient Annotation with the Jena ANnotation Environment (JANE)
Learning Representations for Weakly Supervised Natural Language Processing Tasks
Distributional Representations for Handling Sparsity in Supervised Sequence-Labeling
Parallel Entity and Treebank Annotation
Exploring Representation-Learning Approaches to Domain Adaptation
Language Models as Representations for Weakly Supervised NLP Tasks
Flexible Text Segmentation with Structured Multilabel Classification
Building Domain-Specific Taggers without Annotated (Domain) Data
Domain Adaptation with Structural Correspondence Learning
Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models

http://projects.ldc.upenn.edu/TDT-Pilot/
Reduction of Search Space to Annotate Monolingual Corpora

https://catalog.ldc.upenn.edu/ldc2013t21
Developing Universal Dependencies for Mandarin Chinese

http://gosset.wharton.upenn.edu/
Spectral Learning Algorithms for Natural Language Processing

http://ldc.upenn.edu/Projects/EARS
A Progress Report from the Linguistic Data Consortium: Recent Activities in Resource Creation and Distribution and the Development of Tools and Standards

http://acl.ldc.upenn.edu/P/P99/P99-1023.pdf
Design and Implementation of a Semantic Search Engine for Portuguese

http://www.cis.upenn.edu/treebank/tokenization.html
One Million Sense-Tagged Instances for Word Sense Disambiguation and Induction
Better Punctuation Prediction with Dynamic Conditional Random Fields
Detecting Speculative Language Using Syntactic Dependencies and Logistic Regression

http://projects.ldc.upenn.edu/ace/docs/English-Events-
Filtered Ranking for Bootstrapping in Event Extraction
Using Document Level Cross-Event Inference to Improve Event Extraction
Joint Event Extraction via Structured Prediction with Global Features
An Efficient Approach to Gold-Standard Annotation: Decision Points for Complex Tasks

http://itre.cis.upenn.edu/~myl/
Compounds and other oddities in machine translation

http://www.ling.upenn.edu/mideng/
Tree Searching/Rewriting Formalism

http://ldc.upenn.edu/Catalog/docs/LDC2002T31
Improving Question Recommendation by Exploiting Information Need

https://catalog.ldc.upenn.edu/LDC2017T10
Adaptive Knowledge Sharing in Multi-Task Learning: Improving Low-Resource Neural Machine Translation
Annotation of Tense and Aspect Semantics for Sentential AMR
Augmenting Abstract Meaning Representation for Human-Robot Dialogue
Natural Language Generation: Recently Learned Lessons, Directions for Semantic Representation-based Approaches, and the Case of Brazilian Portuguese Language

http://www.ldc.upenn.edu/Projects/ACE/Data
A Clustering Approach for Unsupervised Chinese Coreference Resolution

http://www.cis.upenn.edu/epitler/discourse.html
Cross-lingual Discourse Relation Analysis: A corpus study and a semi-supervised classification system
Automatically Scoring Freshman Writing: A Preliminary Investigation

https://web.sas.upenn.edu/danielpr/resources/
Diachronic degradation of language models: Insights from social media

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002T07
Implicit Discourse Relation Recognition by Selecting Typical Training Examples

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002T01
Une étude en 3D de la paraphrase: types de corpus, langues et techniques (A Study of Paraphrase along 3 Dimensions : Corpus Types, Languages and Techniques) [in French]

https://www.seas.upenn
A Repository of State of the Art and Competitive Baseline Summaries for Generic News Summarization

http://www.seas.upenn.edu/~strctlrn/MSTParser/MSTParser.html
Learnability-Based Syntactic Annotation Design
An empirical study for Vietnamese dependency parsing
Layer-Based Dependency Parsing

http://www.ldc.upenn.edu/Catalog/catalogEntry.jsp?catalogId=
Sentence Rephrasing for Parsing Sentences with OOV Words

http://projects.ldc.upenn.edu/gale/
Creating Sentence-Aligned Parallel Text Corpora from a Large Archive of Potential Parallel Text using BITS and Champollion
Annotation Tool Development for Large-Scale Corpus Creation Projects at the Linguistic Data Consortium
A Semi-supervised Learning Approach to Arabic Named Entity Recognition
Quick Rich Transcriptions of Arabic Broadcast News Speech Data
PPDB: The Paraphrase Database

http://www.cis.upenn.edu/chinese/
Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
Learning Reliable Information for Dependency Parsing Adaptation
Improving Graph-based Dependency Parsing with Decision History
Building the multilingual TUT parallel treebank
The Parallel-TUT: a multilingual and multiformat treebank
Improving Dependency Parsing with Subtrees from Auto-Parsed Data
Automatic extraction of subcategorization frames for Italian
Dependency Parsing with Short Dependency Relations in Unlabeled Data

http://projects.ldc.upenn.edu/ace/docs/
Improving Coreference Resolution by Using Conversational Metadata

http://languagelog.ldc.upenn.edu/myl/C
Computational Linguistics for Enhancing Scientific Reproducibility and Reducing Healthcare Inequities

http://www.cis.upenn.edu/~treebank/tokenization.ht
The Swedish-Turkish Parallel Corpus and Tools for its Creation

https://catalog.ldc.upenn.edu/LDC2017T01
English Multiword Expression-aware Dependency Parsing Including Named Entities

http://www.seas.upenn.edu/strctlrn/MSTParser/MSTParser.html
Beyond Chart Parsing: An Analytic Comparison of Dependency Chart Parsing Algorithms

https://catalog.ldc.upenn.edu/LDC2010T07
A Transition-based Model for Joint Segmentation, POS-tagging and Normalization

http://www.ldc.upenn.edu/Projects/Chinese/LDC_ch.htm
Design and Development of a Bilingual Reading Comprehension Corpus

http://www.cis.upenn.edu/~xtag/tech-
D-Tree Substitution Grammars

https://catalog.ldc.upenn.edu/ldc2011t07
A Language Model based Evaluator for Sentence Compression
SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation

http://www.ldc.upenn.edu/Catalog/docs/LDC2004
Developing an Arabic Treebank: Methods, Guidelines, Procedures, and Tools

http://www.cis.upenn.edu/xtag/
Reranking Translation Hypotheses Using Structural Properties
A Discriminative Approach for Dependency Based Statistical Machine Translation
A Debug Tool for Practical Grammar Development
Evaluation of LTAG Parsing with Supertag Compaction
Automated Rating of ESL Essays
A Formal Proof of Strong Equivalence for a Grammar Conversion from LTAG to HPSG-style
Resource Sharing Amongst HPSG and LTAG Communities by a Method of Grammar Conversion between FB-LTAG and HPSG

http://www.ldc.upenn.edu/Projects/ACE/docs/EDT-
Callisto: A Configurable Annotation Workbench

http://acl.ldc.upenn.edu/acl2004/qarestricteddomain/
Experiments Adapting an Open-Domain Question Answering System to the Geographical Domain Using Scope-Based Resources

http://projects.ldc.upenn.edu/gale/Translation/Editors/GALEpostedit_guidelines-3.0.2.pdf
A Comparative Study of Post-editing Guidelines

https://www.cis.upenn.edu/about-cis/events/joshi-fest/program.php
Obituary: Aravind K. Joshi

https://catalog.ldc.upenn.edu/LDC2004T25
Tools for Building an Interlinked Synonym Lexicon Network
Creating a Verb Synonym Lexicon Based on a Parallel Corpus

http://www.ldc.upenn.edu/ProjectsTDT2004
Aggregating Continuous Word Embeddings for Information Retrieval

http://www.cis.upenn.edu/~bies/manuals/tagguide.pdf
The Interplay of Form and Meaning in Complex Medical Terms: Evidence from a Clinical Corpus

http://ldc.upenn.edu/Catalog/docs/LDC2005S08/BBN-
A Conventional Orthography for Tunisian Arabic

http://projects.ldc.upenn.edu/TDT4
An Unsupervised Approach to Biography Production Using Wikipedia

http://www.ling.upenn.edu/
Comparing linguistic information in treebank annotations
The Icelandic Parsed Historical Corpus (IcePaHC)
The Syntax Student’s Companion: an eLearning Tool designed for (Computational) Linguistics Students
Phrase Dependency Parsing for Opinion Mining
The Penn Parsed Corpus of Modern British English: First Parsing Results and Analysis
Distantly Supervised POS Tagging of Low-Resource Languages under Extreme Data Sparsity: The Case of Hittite

http://repository.upenn.edu/
Efficient parsing with Linear Context-Free Rewriting Systems

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2005T01
Multi-Task Learning in Conditional Random Fields for Chunking in Shallow Semantic Parsing

http://www.ling.upenn.edu/hist-corpora/
The Icelandic Parsed Historical Corpus (IcePaHC)
Lexicon Construction and Corpus Annotation of Historical Language with the CoBaLT Editor

http://www.ldc.upenn.edu/Projects/FORM/
FORM: An Extensible, Kinematically-based Gesture Annotation Scheme.

https://catalog.ldc.upenn.edu/LDC2007T36
Dependency-based Gated Recursive Neural Network for Chinese Word Segmentation

http://www.ldc.upenn.edu/exploration/expl2000/papers/bell/bell
Report on the Revision of the Lexicographical Standard ISO 1951 Presentation/Representation of Entries in Dictionaries

http://catalog.ldc.upenn.edu/LDC95T11
TLAXCALA: a multilingual corpus of independent news

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalog
Automatically Annotating A Five-Billion-Word Corpus of Japanese Blogs for Affect and Sentiment Analysis
The Penn Discourse TreeBank 2.0.

https://catalog.ldc.upenn.edu/LDC2005T14
Word Order Sensitive Embedding Features/Conditional Random Field-based Chinese Grammatical Error Detection
NCTU-NTUT at IJCNLP-2017 Task 2: Deep Phrase Embedding using bi-LSTMs for Valence-Arousal Ratings Prediction of Chinese Phrases
Accurate Linear-Time Chinese Word Segmentation via Embedding Matching
Synthetic Word Parsing Improves Chinese Word Segmentation
NCTU and NTUT’s Entry to CLP-2014 Chinese Spelling Check Evaluation

https://catalog.ldc.upenn.edu/LDC2014T06
String Kernels for Native Language Identification: Insights from Behind the Curtains
Native Language Identification With Classifier Stacking and Ensembles
A Corpus of Non-Native Written English Annotated for Metaphor
Automated Essay Scoring with Discourse-Aware Neural Models

https://catalog.ldc.upenn.edu/LDC2005T19
Enriching ASR Lattices with POS Tags for Dependency Parsing

http://www.ling.upenn.edu/phono_atlas/home.html
The Fisher Corpus: a Resource for the Next Generations of Speech-to-Text
A Progress Report from the Linguistic Data Consortium: Recent Activities in Resource Creation and Distribution and the Development of Tools and Standards

http://projects.ldc.upenn.edu/TDT/
Thread Cleaning and Merging for Microblog Topic Detection

https://catalog.ldc.upenn.edu/ldc2006t06
Low-resource Cross-lingual Event Type Detection via Distant Supervision with Minimal Effort

http://ldc.upenn.edu/Catalog
A Progress Report from the Linguistic Data Consortium: Recent Activities in Resource Creation and Distribution and the Development of Tools and Standards

http://www.cis.upenn.edu/ccb/ppdb/
MIPA: Mutual Information Based Paraphrase Acquisition via Bilingual Pivoting
Paraphrasing Out-of-Vocabulary Words with Word Embeddings and Semantic Lexicons for Low Resource Statistical Machine Translation

http://www.ldc.upenn.edu/tools/XTrans
Transcription Methods for Consistency, Volume and Efficiency
New language resources for the Pashto language

https://catalog.ldc.upenn.edu/LDC2005S14
Multi-Dialect Arabic POS Tagging: A CRF Approach

http://www.ldc.upenn.edu/Catalog/catalogEntry.jsp?catalog
Learning to Order Natural Language Texts

http://www.upenn.edu/cth/
Annotating Information Structures in Chinese Texts Using HowNet

https://catalog.ldc.upenn.edu/ldc2011t03
Scoring and Classifying Implicit Positive Interpretations: A Challenge of Class Imbalance

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp
QuickView: NLP-based Tweet Search
SRL-Based Verb Selection for ESL

http://catalog.ldc.upenn.edu/LDC2011T13
Be Appropriate and Funny: Automatic Entity Morph Encoding

http://projects.ldc.upenn.edu/TIDES/
GLEU: Automatic Evaluation of Sentence-Level Fluency

http://acl.ldc.upenn
Extracting Transfer Rules for Multiword Expressions from Parallel Corpora

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13
Toward Plot Units: Automatic Affect State Analysis
phloat : Integrated Writing Environment for ESL learners
Learning to Find Translations and Transliterations on the Web based on Conditional Random Fields
International Journal of Computational Linguistics & Chinese Language Processing, Volume 18, Number 1, March 2013

http://projects.ldc.upenn.edu/gale/Transcription/XTran
Quick Rich Transcriptions of Arabic Broadcast News Speech Data

http://projects.ldc.upenn.edu/Chinese/LDCch.htm
Cross-Lingual News Group Recommendation Using Cluster-Based Cross-Training

http://projects.ldc.upenn.edu/LCTL/Specifications/SimpleNamedEntityGuidelinesV6.5.pdf
NERSIL - the Named-Entity Recognition System for Iban Language

http://www.seas.upenn.edu/epavlick/data.html
MIPA: Mutual Information Based Paraphrase Acquisition via Bilingual Pivoting

http://projects.ldc.upenn.edu/Chinese/LDC
Cross-lingual Sentiment Lexicon Learning With Bilingual Word Graph Label Propagation
Bilingual Co-Training for Sentiment Classification of Chinese Product Reviews
Cross Language Text Categorization Using a Bilingual Lexicon
Unsupervised Tokenization for Machine Translation

https://catalog.ldc.upenn.edu/LDC2008T13
Coherence Modeling Improves Implicit Discourse Relation Recognition

http://acl.ldc.upenn.edu/P/P02/P02-1053.pdf
Classifying mood in plurks

https://catalog.ldc.upenn.edu/LDC2009T26
Enriching ASR Lattices with POS Tags for Dependency Parsing

http://catalog.ldc.upenn.edu/docs
Language Resources and Annotation Tools for Cross-Sentence Relation Extraction

http://bioie.ldc.upenn
A Proposal for a Configurable Silver Standard

http://catalog.ldc.upenn.edu/LDC2017T03
Gender as a Variable in Natural-Language Processing: Ethical Considerations

http://www.ldc.upenn.edu/Projects/MDE
The LECTRA Corpus - Classroom Lecture Transcriptions in European Portuguese
Effective Use of Prosody in Parsing Conversational Speech
A Progressive Feature Selection Algorithm for Ultra Large Feature Spaces

https://catalog.ldc.upenn.edu/LDC2011T13
Learning to Distill: The Essence Vector Modeling Framework
Neural Word Segmentation with Rich Pretraining
Subword Encoding in Lattice LSTM for Chinese Word Segmentation

http://www.cis.upenn.edu/proj/hlt-naacl-2006-dc/
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Doctoral Consortium

https://www.ldc.upenn
Can We Create a Tool for General Domain Event Analysis?
Distant Supervision for Relation Extraction beyond the Sentence Boundary
Event Linking with Sentential Features from Convolutional Neural Networks
QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages

http://seas.upenn.edu/strctlrn/BioTagger/BioTagger.html
Profiling Medical Journal Articles Using a Gene Ontology Semantic Tagger
A Multi-Platform Annotation Ecosystem for Domain Adaptation

http://www.ldc.upenn.edu/Mixer/
Current Projects in Languages of Military Interest at the Defense Language Institute

http://www.cis.upenn.edu/treebank/home.html
WikiNet: A Very Large Scale Multi-Lingual Concept Network
LPath+: A First-Order Complete Language for Linguistic Tree Query

http://projects.ldc.upenn.edu/ace/tools/jan_2004_tool
C-3: Coherence and Coreference Corpus

http://fave.ling.upenn.edu
A Web Application for Automated Dialect Analysis

http://www.cis.upenn
A Non-negative Matrix Tri-factorization Approach to Sentiment Classification with Lexical Prior Knowledge
Multi-Task Active Learning for Linguistic Annotations
Retrofitting Word Representations for Unsupervised Sense Aware Word Similarities
Dependency Parsing with Undirected Graphs
Towards User-Adaptive Annotation Guidelines
Data-driven, PCFG-based and Pseudo-PCFG-based Models for Chinese Dependency Parsing
Coordinate Structure Analysis with Global Structural Constraints and Alignment-Based Local Features
TuLiPA: Towards a Multi-Formalism Parsing Environment for Grammar Engineering

http://projects.ldc.upenn.edu/gale/Translation/specs/
New Resources for Document Classification, Analysis and Translation Technologies

http://itre.cis.upenn.edu/
No Sentence Is Too Confusing To Ignore
Compounds and other oddities in machine translation
Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies

http://www.cis.upenn.edu/~chinese/ctb.html
A Segmentation Matrix Method for Chinese Segmentation Ambiguity Analysis
Chinese Sketch Engine and the Extraction of Grammatical Collocations
A Study of Applying BTM Model on the Chinese Chunk Bracketing
Use of Event Types for Temporal Relation Identification in Chinese Text
International Journal of Computational Linguistics & Chinese Language Processing, Volume 21, Number 1, June 2016

https://catalog.ldc.upenn.edu/LDC2018T09
SPADE: Evaluation Dataset for Monolingual Phrase Alignment

http://www.ldc.upenn.edu/annotation/AG/
Transcribing with Annotation Graphs

http://www.cis.upenn.edu/treebank
Tree Kernels for Semantic Role Labeling
Learning to Translate: A Query-Specific Combination Approach for Cross-Lingual Information Retrieval
A Common Framework for Syntactic Annotation

https://catalog.ldc.upenn.edu/LDC2012T21
Frustratingly Easy Model Ensemble for Abstractive Summarization
C4Corpus: Multilingual Web-size Corpus with Free License
Language as a Latent Variable: Discrete Generative Models for Sentence Compression
HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization

http://www.cis.upenn.edu/~mpalmer/
Issues in Synchronizing the English Treebank and PropBank

http://www.cis.upenn.edu/~xtag/tech-report/node248.html
Two-Fold Filtering for Chinese Subcategorization Acquisition with Diathesis Alternations Used as Heuristic Information

http://projects.ldc.upenn.edu/TDT5/
Cross-document Temporal and Spatial Person Tracking System Demonstration
One-Class Clustering in the Text Domain

http://wave.ldc.upenn
The OLAC Metadata Set and Controlled Vocabularies

http://www.ldc.upenn.edu/Catalog/docs/LDC2002T31/
Learning Adaptable Patterns for Passage Reranking
iKernels-Core: Tree Kernel Learning for Textual Similarity

https://catalog.ldc.upenn.edu/LDC2008T19
Collective Event Detection via a Hierarchical and Bias Tagging Networks with Gated Multi-level Attention Mechanisms
Identification and Characterization of Newsworthy Verbs in World News
A Joint Model for Semantic Sequences: Frames, Entities, Sentiments
Two Discourse Driven Language Models for Semantics
Improving Temporal Relation Extraction with a Globally Acquired Statistical Resource
ECNU: Using Traditional Similarity Measurements and Word Embedding for Semantic Textual Similarity Estimation
Unsupervised Learning of Distributional Relation Vectors
Reducing Lexical Features in Parsing by Word Embeddings
Leveraging FrameNet to Improve Automatic Event Detection
Exploiting Document Level Information to Improve Event Detection via Recurrent Neural Networks
ECNU: Leveraging Word Embeddings to Boost Performance for Paraphrase in Twitter
Event Extraction via Dynamic Multi-Pooling Convolutional Neural Networks
ECNU at SemEval-2016 Task 1: Leveraging Word Embedding From Macro and Micro Views to Boost Performance for Semantic Textual Similarity
Survey: Anaphora With Non-nominal Antecedents in Computational Linguistics: a Survey
Exploiting Argument Information to Improve Event Detection via Supervised Attention Mechanisms
Event Detection without Triggers
Re-Ranking Words to Improve Interpretability of Automatically Generated Topics
Controlling Grammatical Error Correction Using Word Edit Rate

http://www.seas.upenn.edu/~mdredze/datasets/sentiment/
A Framework of Feature Selection Methods for Text Categorization
SESS: A Self-Supervised and Syntax-Based Method for Sentiment Classification
Sentiment Classification and Polarity Shifting
Sentiment Classification Considering Negation and Contrast Transition
Multi-domain Sentiment Classification
Employing Personal/Impersonal Views in Supervised and Semi-Supervised Sentiment Classification

http://projects.ldc.upenn.edu/ANC/ANC
Shared Corpora Working Group Report

http://www.seas.upenn.edu/~pdtb/PDTBAPI/pdtb-annotation-manual.pdf
Does Tectogrammatics Help the Annotation of Discourse?
Semi-Automatic Annotation of Intra-Sentential Discourse Relations in PDT

http://ww.ldc.upenn.edu/Project/GALE
How Much Can We Gain from Supervised Word Alignment?

http://www.sas.upenn.edu/haroldfs/
Tracking Sentiment in Mail: How Genders Differ on Emotional Axes

http://repository.upenn.edu/cgi/viewcontent.cgi?article=
Using a Corpus of English and Chinese Political Speeches for Metaphor Analysis

https://www.ldc.upenn.edu/language-resources/data/data-
Trends in HLT Research: A Survey of LDC’s Data Scholarship Program

http://www.cis.upenn.edu/%7Enlp/corpora/lrec16spec.html
Improving the Annotation of Sentence Specificity

http://www.Idc.upenn.edu/ldc/about/chjapanese.html
Generation of Adaptive Vocabulary Lexicon for Japanese LVCSR

http://www.ldc.upenn.edu/Projects/Chinese/
Acquiring Compound Word Translations both Automatically and Dynamically
The SVM With Uneven Margins and Chinese Document Categorization

http://www.seas.upenn.edu/nlp/
Identifying 1950s American Jazz Musicians: Fine-Grained IsA Extraction via Modifier Composition

http://www.seas.upenn.edu/
Annotating Discourse Relations with the PDTB Annotator
A Stacking Gated Neural Architecture for Implicit Discourse Relation Classification
Dynamic Feature Selection for Dependency Parsing
Top-down Tree Long Short-Term Memory Networks
A Fast and Accurate Dependency Parser using Neural Networks
An Empirical Study on the Effect of Morphological and Lexical Features in Persian Dependency Parsing
Shallow Convolutional Neural Network for Implicit Discourse Relation Recognition
Edlin: an Easy to Read Linear Learning Framework
An Empirical Analysis of Formality in Online Communication
Model Invertibility Regularization: Sequence Alignment With or Without Parallel Data
A Novel Reordering Model Based on Multi-layer Phrase for Statistical Machine Translation
Head-driven Transition-based Parsing with Top-down Prediction
PDTB XML: the XMLization of the Penn Discourse TreeBank 2.0
Variational Neural Discourse Relation Recognizer
Simple PPDB: A Paraphrase Database for Simplification
Efficient Stacked Dependency Parsing by Forest Reranking
Annotation of Discourse Relations for Conversational Spoken Dialogs
Detecting Hedge Cues and their Scopes with Average Perceptron
The Integration of Dependency Relation Classification and Semantic Role Labeling Using Bilayer Maximum Entropy Markov Models
Initial Explorations of CCG Supertagging for Universal Dependency Parsing
Automatic Question Generation using Discourse Cues
SemEval-2015 Task 15: A CPA dictionary-entry-building task
Automatic identification of general and specific sentences by leveraging discourse annotations
Minimally Supervised Event Causality Identification
Implicit Discourse Relation Recognition with Context-aware Character-enhanced Embeddings
Edge-Linear First-Order Dependency Parsing with Undirected Minimum Spanning Tree Inference
Dear Sir or Madam, May I Introduce the GYAFC Dataset: Corpus, Benchmarks and Metrics for Formality Style Transfer
Shallow Discourse Parsing with Syntactic and (a Few) Semantic Features
Reordering Modeling using Weighted Alignment Matrices

http://repository.upenn.edu/cgi/viewcontent.cgi?article
FOLK-Gold ― A Gold Standard for Part-of-Speech-Tagging of Spoken German

http://www.cis.upenn.edu/~dbikel/software.html
Expanding Arabic Treebank to Speech: Results from Broadcast News
Building a Large Syntactically-Annotated Corpus of Vietnamese
Tagging Spanish Texts: the Problem of “SE”
La reconnaissance des mots composés à l’épreuve de l’analyse syntaxique et vice-versa : évaluation de deux stratégies discriminantes (Recognition of Compound Words Tested against Parsing and Vice-versa : Evaluation of Two Discriminative Approaches) [in French]
Linguistic Resources for Speech Parsing

http://www.ldc.upenn.edu/annotation/database/pap
Towards Metadata Interoperability

http://morph.ldc.upenn.edu/Catalog/LDC99T3
Robust Temporal Processing of News

http://www.cis.upenn.edu/~chinese/segguide.3
Chinese Word Segmentation at Peking University

http://catalog.ldc.upenn.edu/LDC2000T43
TLAXCALA: a multilingual corpus of independent news

https://catalog.ldc.upenn.edu/ldc2006t13
Identification of Flexible Multiword Expressions with the Help of Dependency Structure Annotation
Discovering the Language of Wine Reviews: A Text Mining Account

http://catalog.ldc.upenn.edu/LDC2008T25
Individual Variation in the Choice of Referential Form

http://www.asc.upenn.edu/usr/krippendorff/mw
A Citation Centric Annotation Scheme for Scientific Articles

http://www.ling.upenn.edu/courses/Fall_
On- and Off-Topic Classification and Semantic Annotation of User-Generated Software Requirements

http://www.seas.upenn.edu/~strctlrn/
Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations

http://projects.ldc.upenn.edu/gale/Translation
Corpus Support for Machine Translation at LDC
Enhanced Infrastructure for Creation and Collection of Translation Resources

http://projects.ldc.upenn.edu/HARD/
Supporting Multiple Information-Seeking Strategies in a Single System Framework

http://projects.ldc.upenn.edu/Transcription/quick-tr
Bridging the Gap between Linguists and Technology Developers: Large-Scale, Sociolinguistic Annotation for Dialect and Speaker Recognition

http://www.ldc.upenn.edu/Catalog/docs/LDC2009T24/
Merging Word Senses

http://www.cis.upenn.edu/~chinese/
Customizable Segmentation of Morphologically Derived Words in Chinese
The Construction of a Dictionary for a Two-layer Chinese Morphological Analyzer
A Hybrid Approach to Chinese Base Noun Phrase Chunking
Chinese Chunking Based on Maximum Entropy Markov Models
The Headedness of Mandarin Chinese Serial Verb Constructions: A Corpus-Based Study
Skeleton Parsing in Chinese: Annotation Scheme and Guidelines

http://www.ldc.upenn.edu/Catalog/docs/
Further Meta-Evaluation of Broad-Coverage Surface Realization
Practical Queries of a Massive n-gram Database

http://www.cis.upenn.edu/datamining/software
An Exploration of Features for Recognizing Word Emotion

https://catalog.ldc.upenn.edu/LDC2015T03
Knowledge Base Population for Organization Mentions in Email
Activity Modeling in Email

https://catalog.ldc.upenn.edu/LDC2003T05
Leveraging Entity Linking and Related Language Projection to Improve Name Transliteration
Go Climb a Dependency Tree and Correct the Grammatical Errors
Cohere: A Toolkit for Local Coherence
Are Emojis Predictable?
The Trouble with Machine Translation Coherence
A Transition-based Model for Joint Segmentation, POS-tagging and Normalization
Addressing Domain Adaptation for Chinese Word Segmentation with Global Recurrent Structure

http://www.ldc.upenn.edu/catalog/
The OSU Quake 2004 corpus of two-party situated problem-solving dialogs

http://catalog.ldc.upenn.edu/LDC2017T10
Multitask Parsing Across Semantic Representations

http://www.ldc.upenn.edu/Catalog/docs/LDC2006T13
ECNUCS: Measuring Short Text Semantic Equivalence Using Multiple Similarity Measurements

http://repository.upenn.edu/ircs_reports/53/
Bulgarian-English Parallel Treebank: Word and Semantic Level Alignment

http://www.cis.upenn.edu/jshi/
A Diverse Dirichlet Process Ensemble for Unsupervised Induction of Syntactic Categories

https://catalog.ldc.upenn.edu/LDC2002S28
Zara The Supergirl: An Empathetic Personality Recognition System

https://catalog.ldc.upenn.edu/LDC2002L27
Generating Abbreviations for Chinese Named Entities Using Recurrent Neural Network with Dynamic Dictionary
Inducing Bilingual Lexica From Non-Parallel Data With Earth Mover’s Distance Regularization

http://www.cis.upenn.edu/bikel/software.html
Improving Chinese POS Tagging with Dependency Parsing

http://www.ldc.upenn.edu/ldc/online/treebank/REA
A Test Environment for Natural Language Understanding Systems

http://www.cis.upenn.edu/
Using Machine-Learning to Assign Function Labels to Parser Output for Spanish
A Study of Scientific Writing: Comparing Theoretical Guidelines with Practical Implementation
HamleDT 2.0: Thirty Dependency Treebanks Stanfordized
Improving Part-of-speech Tagging for Context-free Parsing
ECNU at SemEval-2017 Task 1: Leverage Kernel-based Traditional NLP features and Neural Networks to Build a Universal Model for Multilingual and Cross-lingual Semantic Textual Similarity
Towards Identifying the Resolvability of Threads in MOOCs
Word Embeddings through Hellinger PCA
Techniques to Incorporate the Benefits of a Hierarchy in a Modified Hidden Markov Model
Part-of-Speech Tagging of Transcribed Speech
Soft Cross-lingual Syntax Projection for Dependency Parsing
Identification and Characterization of Newsworthy Verbs in World News
A Retrospective Analysis of the Fake News Challenge Stance-Detection Task
Retrofitting Word Vectors to Semantic Lexicons
Semi-Supervised Frame-Semantic Parsing for Unknown Predicates
The NomBank Project: An Interim Report
Annotating Noun Argument Structure for NomBank
‘Category families’ for Categorial Grammars
Abar-Hitz: An Annotation Tool for the Basque Dependency Treebank
Constraint Based Description of Polish Multiword Expressions
ProphetMT: A Tree-based SMT-driven Controlled Language Authoring/Post-Editing Tool
Turning Elementary Trees into Feature Structures
Gaussian LDA for Topic Models with Word Embeddings
Concise Integer Linear Programming Formulations for Dependency Parsing
The MULI Project: Annotation and Analysis of Information Structure in German and English
Beyond Word2Vec: Embedding Words and Phrases in Same Vector Space
Improving Chinese Semantic Role Labeling with Rich Syntactic Features
A Classification of Grammar Development Strategies
Resolving Pronominal References in Chinese with the Hobbs Algorithm
Web Text Corpus for Natural Language Processing
Simple Syntactic and Morphological Processing Can Help English-Hindi Statistical Machine Translation
Determining the Syntactic Structure of Medical Terms in Clinical Notes
Paraphrase Identification as Probabilistic Quasi-Synchronous Recognition
Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking
Refining Grammars for Parsing with Hierarchical Semantic Knowledge
Coupled Sequence Labeling on Heterogeneous Annotations: POS Tagging as a Case Study
Search-Aware Tuning for Machine Translation
Domain Adaptation for Parsing
Unsupervised Induction of Linguistic Categories with Records of Reading, Speaking, and Writing
Ontology-Based Argument Mining and Automatic Essay Scoring
What Makes Writing Great? First Experiments on Article Quality Prediction in the Science Journalism Domain
Edinburgh’s Syntax-Based Machine Translation Systems
Joint Parsing and Named Entity Recognition
A Joint Model for Answer Sentence Ranking and Answer Extraction
Parsing the Penn Chinese Treebank with Semantic Knowledge
Thistle and Interarbora
The Proposition Bank: An Annotated Corpus of Semantic Roles
Ambiguity-aware Ensemble Training for Semi-supervised Dependency Parsing
Context-aware Learning for Sentence-level Sentiment Analysis with Posterior Regularization
Coordinate Structure Analysis with Global Structural Constraints and Alignment-Based Local Features
Cross Lingual Adaptation: An Experiment on Sentiment Classifications
Oracle Summaries of Compressive Summarization
A treebank-based study on the influence of Italian word order on parsing performance
Spectral Learning Algorithms for Natural Language Processing
Training Parsers on Partial Trees: A Cross-language Comparison
Contradiction Detection for Rumorous Claims
Computational Properties of Environment-based Disambiguation
Balancing data-driven and rule-based approaches in the context of a Multimodal Conversational System
UVA: Language Modeling Techniques for Web People Search
A Translation Model for Sentence Retrieval
Joint Chinese Word Segmentation and POS Tagging on Heterogeneous Annotated Corpora with Multiple Task Learning
A Discriminative Learning Model for Coordinate Conjunctions
Utilizing Extra-Sentential Context for Parsing
Parsing Word Clusters
Towards Model Driven Architectures for Human Language Technologies
Hybrid Constituent and Dependency Parsing with Tsinghua Chinese Treebank
It Makes Sense: A Wide-Coverage Word Sense Disambiguation System for Free Text
Using reading behavior to predict grammatical functions
Fully Parsing the Penn Treebank
Extending a Dutch Text-to-Pictograph Converter to English and Spanish
Minimally Supervised Classification to Semantic Categories using Automatically Acquired Symmetric Patterns
The FrameNet Database and Software Tools
K-Best A* Parsing
Development and Evaluation of a Korean Treebank and its Application to NLP
Data point selection for self-training
Learning Composition Models for Phrase Embeddings
BioNLP Shared Task 2011: Supporting Resources
Probabilistic Frame-Semantic Parsing
Handling Unknown Words in Statistical Latent-Variable Parsing Models for Arabic, English and French
Probabilistic Lexical Generalization for French Dependency Parsing
A Comparative Study of Syntactic Parsers for Event Extraction
Stacking Dependency Parsers
An Empirical Study of Chinese Chunking
Incorporating Word Correlation Knowledge into Topic Modeling
Fine-grained Opinion Topic and Polarity Identification
Large Scale Production of Syntactic Annotations to Move Forward
On the Effectiveness of using Sentence Compression Models for Query-Focused Multi-Document Summarization
Transparent combination of rule-based and data-driven approaches in speech understanding
Cross parser evaluation : a French Treebanks study
Penn Korean Treebank : Development and Evaluation
Towards Automatic Topical Question Generation
Third-order Variational Reranking on Packed-Shared Dependency Forests
Reducing Approximation and Estimation Errors for Chinese Lexical Processing with Heterogeneous Annotations
SemEval-2012 Task 5: Chinese Semantic Dependency Parsing
DT_Team at SemEval-2017 Task 1: Semantic Similarity Using Alignments, Sentence-Level Embeddings and Gaussian Mixture Model Output
Question Prediction Language Model
Forest Rescoring: Faster Decoding with Integrated Language Models
Cambridge: Parser Evaluation Using Textual Entailment by Grammatical Relation Comparison
Querying Both Time-aligned and Hierarchical Corpora with NXT Search
Formal Mechanisms for Capturing Regularizations
SemEval-2015 Task 1: Paraphrase and Semantic Similarity in Twitter (PIT)
Accurate Context-Free Parsing with Combinatory Categorial Grammar
Hybrid Document Indexing with Spectral Embedding
Enumeration of Extractive Oracle Summaries
Notes on the Evaluation of Dependency Parsers Obtained Through Cross-Lingual Projection
Max-Violation Perceptron and Forced Decoding for Scalable MT Training
Modeling Semantic Relations Expressed by Prepositions
Combining Morphosyntactic Enriched Representation with n-best Reranking in Statistical Translation
Streaming Analysis of Discourse Participants
Topic Segmentation with Hybrid Document Indexing
Annotation and Data Mining of the Penn Discourse TreeBank
PASSAGE Syntactic Representation: a Minimal Common Ground for Evaluation
Phrase Translation Probabilities with ITG Priors and Smoothing as Learning Objective
Identification and Disambiguation of Lexical Cues of Rhetorical Relations across Different Text Genres
Using Grammatical Relations to Compare Parsers
MAXSIM: A Maximum Similarity Metric for Machine Translation Evaluation
Revisiting the Centroid-based Method: A Strong Baseline for Multi-Document Summarization
Joint Morphological and Syntactic Disambiguation
Dependency Parser for Chinese Constituent Parsing
Evaluating Discourse in Structured Text Representations

http://www.cis.upenn.edu/~treebank/
MWEs in Treebanks: From Survey to Guidelines
HamleDT: To Parse or Not to Parse?
TCtract-A Collocation Extraction Approach for Noun Phrases Using Shallow Parsing Rules and Statistic Models
Building NLP resources for Dzongkha: A Tagset and A Tagged Corpus
Extraction of Translation Unit from Chinese-English Parallel Corpora
Tokenization: Returning to a Long Solved Problem — A Survey, Contrastive Experiment, Recommendations, and Toolkit —
Factored Language Model based on Recurrent Neural Network
Evaluating and Integrating Treebank Parsers on a Biomedical Corpus
Unsupervised Negation Focus Identification with Word-Topic Graph Model
A Grammar-driven Convolution Tree Kernel for Semantic Role Classification
UM-Checker: A Hybrid System for English Grammatical Error Correction
Graphical Annotation for Syntax-Semantics Mapping

http://www.ldc.upenn.edu/Catalog/catalogEntry.jsp?catalogId=LDC2008T19
Combinaison de ressources générales pour une contextualisation implicite de requêtes (Query Contextualization and Reformulation by Combining External Corpora) [in French]

http://acl.ldc.upenn.edu/P/P03/P03-2025.pdf
Unsupervised Named Entity Transliteration Using Temporal and Phonetic Correlation

http://acl.ldc.upenn.edu/hlt-
Segment Predictability as a Cue in Word Segmentation: Application to Modern Greek

http://projects.ldc.upenn.edu/gale/Translation/specs/G
Entity Translation and Alignment in the ACE-07 ET Task

https://catalog.ldc.upenn.edu/LDC2013T19
Beyond Plain Spatial Knowledge: Determining Where Entities Are and Are Not Located, and For How Long
Annotating Temporally-Anchored Spatial Knowledge on Top of OntoNotes Semantic Roles
Exploring Options for Fast Domain Adaptation of Dependency Parsers
Domain Adaptation for Dependency Parsing via Self-Training
A Transition-Based Directed Acyclic Graph Parser for UCCA
Neural Transition Based Parsing of Web Queries: An Entity Based Approach
Unsupervised AMR-Dependency Parse Alignment
Bridging Sentential and Discourse-level Semantics through Clausal Adjuncts
Learning to Map Dependency Parses to Abstract Meaning Representations
Liberal Event Extraction and Event Schema Induction
Large Multi-lingual, Multi-level and Multi-genre Annotation Corpus
Reliability-aware Dynamic Feature Composition for Name Tagging
Incorporating Context and External Knowledge for Pronoun Coreference Resolution
Knowledge-aware Pronoun Coreference Resolution

http://projects.ldc.upenn.edu/
Annotation Tool Development for Large-Scale Corpus Creation Projects at the Linguistic Data Consortium
GAF: A Grounded Annotation Framework for Events

http://www.seas.upenn.edu/strctlrn/StructLearn
Learning with Compositional Semantics as Structural Inference for Subsentential Sentiment Analysis

http://www.ldc.upenn.edu/Catalog/docs/LDC200
When Annotation Schemes Change Rules Help: A Configurable Approach to Coreference Resolution beyond OntoNotes

https://catalog.ldc.upenn.edu/LDC2006T08
Enriching TimeBank: Towards a more precise annotation of temporal relations in a text
Classifying Temporal Relations by Bidirectional LSTM over Dependency Paths
Inducing Temporal Relations from Time Anchor Annotation

http://oracc.museum.upenn.edu/dcclt/
Enhancing Sumerian Lemmatization by Unsupervised Named-Entity Recognition

https://catalog.ldc.upenn.edu/LDC2005T20
The International Corpus of Arabic: Compilation, Analysis and Evaluation

https://catalog.ldc.upenn.edu/LDC2005T23
A Progressive Learning Approach to Chinese SRL Using Heterogeneous Data
Chinese Semantic Role Labeling with Bidirectional Recurrent Neural Networks
Capturing Argument Relationship for Chinese Semantic Role Labeling

https://catalog.ldc.upenn.edu/LDC2006T06
A Case Study on Learning a Unified Encoder of Relations
Adversarial Feature Adaptation for Cross-lingual Relation Classification
Leveraging FrameNet to Improve Automatic Event Detection
Learning Transferable Representation for Bilingual Relation Extraction via Convolutional Neural Networks
Joint Mention Extraction and Classification with Mention Hypergraphs
Exploiting Argument Information to Improve Event Detection via Supervised Attention Mechanisms
Event Detection without Triggers
Distilling Discrimination and Generalization Knowledge for Event Detection via Delta-Representation Learning
Document-Level Event Factuality Identification via Adversarial Neural Network
Content-based Dwell Time Engagement Prediction Model for News Articles

http://projects.ldc.upenn.edu/TDT2/
Combined Use of Speaker- and Tone-Normalized Pitch Reset with Pause Duration for Automatic Story Segmentation in Mandarin Broadcast News
Broadcast News Story Segmentation Using Manifold Learning on Latent Topic Distributions

http://www.ldc.upenn.edu/Catalog/index.jsp
Bilingual Word Spectral Clustering for Statistical Machine Translation
Vocabulary Decomposition for Estonian Open Vocabulary Speech Recognition

http://www.ldc.upenn.edu/exploration/
The OLAC Metadata Set and Controlled Vocabularies

https://catalog.ldc.upenn.edu/
Sheffield Submissions for the WMT18 Quality Estimation Shared Task
Incorporating Relation Paths in Neural Relation Extraction
Real-Time News Summarization with Adaptation to Media Attention
Modeling Linguistic and Personality Adaptation for Natural Language Generation
A New Approach for Idiom Identification Using Meanings and the Web
Adapting Topic Models using Lexical Associations with Tree Priors
Synonymy in Bilingual Context: The CzEngClass Lexicon
Feature Extraction for Native Language Identification Using Language Modeling
Concept Grounding to Multiple Knowledge Bases via Indirect Supervision
Empirical comparison of dependency conversions for RST discourse trees
Chinese NER Using Lattice LSTM
A Neural Layered Model for Nested Named Entity Recognition
Joint Reasoning for Temporal and Causal Relations
Using Twitter to Collect a Multi-Dialectal Corpus of Arabic
Grammatical Error Correction Considering Multi-word Expressions
Co-reference Resolution of Elided Subjects and Possessive Pronouns in Spanish-English Statistical Machine Translation
A review of Spanish corpora annotated with negation
Chinese Spell Checking Based on Noisy Channel Model
Apples to Apples: Learning Semantics of Common Entities Through a Novel Comprehension Task
An enhanced automatic speech recognition system for Arabic
Universal Dependency Parsing with a General Transition-Based DAG Parser
Towards Broad-coverage Meaning Representation: The Case of Comparison Structures
Predicting and Using Implicit Discourse Elements in Chinese-English Translation
Discourse Annotation of Non-native Spontaneous Spoken Responses Using the Rhetorical Structure Theory Framework
Joint Part-of-Speech and Language ID Tagging for Code-Switched Data
Distant Supervision for Relation Extraction beyond the Sentence Boundary
Non-projectivity and valency
DeepPavlov: Open-Source Library for Dialogue Systems
This is how we do it: Answer Reranking for Open-domain How Questions with Paragraph Vectors and Minimal Feature Engineering
Learning to Jointly Predict Ellipsis and Comparison Structures
AMR dependency parsing with a typed semantic algebra
A Call for Clarity in Reporting BLEU Scores
An Improved Tag Dictionary for Faster Part-of-Speech Tagging
CirdoX: an on/off-line multisource speech and sound analysis software
Modelling Pro-drop with the Rational Speech Acts Model
Learning Paraphrasing for Multiword Expressions
Grouping business news stories based on salience of named entities
Refining Word Segmentation Using a Manually Aligned Corpus for Statistical Machine Translation
Acoustic Word Disambiguation with Phonogical Features in Danish ASR
Jointly Multiple Events Extraction via Attention-based Graph Information Aggregation
A Study of Style in Machine Translation: Controlling the Formality of Machine Translation Output
Multi-Level Structured Self-Attentions for Distantly Supervised Relation Extraction
Zero Alignment of Verb Arguments in a Parallel Treebank
A Code-Switching Corpus of Turkish-German Conversations
A Walk-based Model on Entity Graphs for Relation Extraction
Big Data Small Data, In Domain Out-of Domain, Known Word Unknown Word: The Impact of Word Representations on Sequence Labelling Tasks
Learning Connective-based Word Representations for Implicit Discourse Relation Identification
A POS Tagging Model Adapted to Learner English
A Simple Method for Clarifying Sentences with Coordination Ambiguities
Tagging Performance Correlates with Author Age
Coreference and Focus in Reading Times
A Word-Complexity Lexicon and A Neural Readability Ranking Model for Lexical Simplification
Olive Oil is Made of Olives, Baby Oil is Made for Babies: Interpreting Noun Compounds Using Paraphrases in a Neural Model
From Visual Attributes to Adjectives through Decompositional Distributional Semantics
Semi-supervised Convolutional Networks for Translation Adaptation with Tiny Amount of In-domain Data
Nested Named Entity Recognition Revisited
Resolving Shell Nouns
Knowledge-Based Semantic Embedding for Machine Translation
Word Emotion Induction for Multiple Languages as a Deep Multi-Task Learning Problem
Abstract Meaning Representation for Human-Robot Dialogue
Detecting (Un)Important Content for Single-Document News Summarization
Neural Architectures for Nested NER through Linearization
SARAL: A Low-Resource Cross-Lingual Domain-Focused Information Retrieval System for Effective Rapid Document Triage
Parsing Meaning Representations: Is Easier Always Better?
The Unbearable Weight of Generating Artificial Errors for Grammatical Error Correction
Tom Jumbo-Grumbo at SemEval-2019 Task 4: Hyperpartisan News Detection with GloVe vectors and SVM
Using Rhetorical Structure Theory to Assess Discourse Coherence for Non-native Spontaneous Speech
Modeling Financial Analysts’ Decision Making via the Pragmatics and Semantics of Earnings Calls
Open Domain Event Extraction Using Neural Latent Variable Models
Learning to Explain: Answering Why-Questions via Rephrasing
Resolving Gendered Ambiguous Pronouns with BERT
Incorporating Word Attention into Character-Based Word Segmentation

http://acl.ldc.upenn.edu/C/C90/C90-2036.pdf
Concrete Assignments for Teaching NLP in an M.S. Program

http://acl.ldc.upenn.edu/W/W97/W97-1010.pdf
Constituent Structure for Filipino: Induction through Probabilistic Approaches

http://www.ldc.upenn.edu/mirror/Transcriber/
The Italian NESPOLE! Corpus: a Multilingual Database with Interlingua Annotation in Tourism and Medical Domains
國語廣播新聞語料轉述系統之效能評估 (Evaluation of Mandarin Broadcast News Transcription System) [in Chinese]

http://repository.upenn.edu/cgi/viewcontent.cgi?article=1490&context=cis_reports
Constituent Structure for Filipino: Induction through Probabilistic Approaches

http://www.ldc.upenn.edu/Projects/Corpus_Cookbook/trans-
The COST278 Pan-European Broadcast News Database

http://catalog.ldc.upenn.edu/
Generating and Exploiting Large-scale Pseudo Training Data for Zero Pronoun Resolution
A Unified Framework for Grammar Error Correction
Lexical Substitution for the Medical Domain
Encoding Semantic Resources in Syntactic Structures for Passage Reranking
Word Sense Filtering Improves Embedding-Based Lexical Substitution
Deep Reinforcement Learning for Chinese Zero Pronoun Resolution

http://www.cis.upenn.edu/~dbikel/download
CRF tagging for head recognition based on Stanford parser

http://psd.museum.upenn.edu/
Towards a Linked Open Data Edition of Sumerian Corpora
Enhancing Sumerian Lemmatization by Unsupervised Named-Entity Recognition

http://catalog.ldc.upenn
What Substitutes Tell Us - Analysis of an “All-Words” Lexical Substitution Corpus

http://www.cis.upenn.edu/josephr/TIDES
An integrated framework for treebanks and multilayer annotations

https://catalog.ldc.upenn.edu/ldc93s1
Local Monotonic Attention Mechanism for End-to-End Speech And Language Processing

http://catalog.ldc.upenn.edu/LDC2004T14
Reflections on the Penn Discourse TreeBank, Comparable Corpora, and Complementary Annotation

http://catalog.ldc.upenn.edu/LDC2013T19
Zero Pronoun Resolution with Attention-based Neural Network
Chinese Zero Pronoun Resolution with Deep Memory Network

http://www.cis.upenn.edu/dbikel/
Constituent Parsing by Classification

http://www.cis.upenn.edu/~treebank
Semantic Role Labeling via Instance-Based Learning

http://www.ircs.upenn.edu/arabic/Jan03release/
Enlisting the Ghost: Modeling Empty Categories for Machine Translation

http://repository.upenn.edu/ircs_reports/37/
Developing Universal Dependencies for Mandarin Chinese

http://bioie.ldc.upenn.edu/wiki/index.php/Main_Page
System Evaluation on a Named Entity Corpus from Clinical Notes

http://projects.ldc.upenn.edu/ace/docs/English-Even
The Impact of Task and Corpus on Event Extraction Systems

https://www.seas.upenn.edu/~nlp/corpora/sumrepo.html
A Repository of State of the Art and Competitive Baseline Summaries for Generic News Summarization

http://morph.ldc.upenn.edu/Catalog/LDC99T37.html
Guidelines for Annotating Temporal Information

http://www.ldc.upenn.edu/Catalog/catalogEntry.jsp?c
Introduction to CKIP Chinese Spelling Check System for SIGHAN Bakeoff 2013 Evaluation
Automatic Arabic diacritics restoration based on deep nets
Deep Learning Models for Sentiment Analysis in Arabic

http://tools.ldc.upenn.edu
A New Phase in Annotation Tool Development at the Linguistic Data Consortium: The Evolution of the Annotation Graph Toolkit
Annotation Tool Development for Large-Scale Corpus Creation Projects at the Linguistic Data Consortium

http://catalog.ldc.upenn.edu/LDC2007T03
Why Chinese Web-as-Corpus is Wacky? Or: How Big Data is Killing Chinese Corpus Linguistics

http://www.ldc.upenn.edu/Projects/GALE
Integrated Linguistic Resources for Language Exploitation Technologies

http://www.ldc.upenn.edu/Projects/Corpus
Development of Slovenian Broadcast News Speech Database

http://www.ircs.upenn.edu/arabic/Jan03release/arabic-
Joint Arabic Segmentation and Part-Of-Speech Tagging
Adapting Standard Open-Source Resources To Tagging A Morphologically Rich Language: A Case Study With Arabic
POS Tagging of Dialectal Arabic: A Minimally Supervised Approach
A Hybrid Approach for Building Arabic Diacritizer

http://www.ldc.upenn.edu/ldc/service/comp-ie
Determining Recurrent Sound Correspondences by Inducing Translation Models

http://www.ldc.upenn.edu/Catalog/catalog
UM-Checker: A Hybrid System for English Grammatical Error Correction

http://projects.ldc.upenn.edu/ace/docs/English-Event
Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction

http://fave.ling.upenn.edu/downloads/
A Text Normalisation System for Non-Standard English Words

http://www.cis.upenn.edu/lannie/topicS.html
Towards Topic-to-Question Generation

http://www.ldc.upenn.edu/Catalog/
Language Technology Resource Center
A Beam-Search Decoder for Normalization of Social Media Text with Application to Machine Translation
Discriminative Lexical Semantic Segmentation with Gaps: Running the MWE Gamut
Frontiers in Linguistic Annotation for Lower-Density Languages
The ATCOSIM Corpus of Non-Prompted Clean Air Traffic Control Speech
Extending English ACE 2005 Corpus Annotation with Ground-truth Links to Wikipedia
Exploring Representations from Unlabeled Data with Co-training for Chinese Word Segmentation
Iterative Refinement and Quality Checking of Annotation Guidelines — How to Deal Effectively with Semantically Sloppy Named Entity Types, such as Pathological Phenomena
Echoes of Persuasion: The Effect of Euphony in Persuasive Communication
A Corpus of Preposition Supersenses
The IFADV Corpus: a Free Dialog Video Corpus
The potential and limits of lay post-editing in an online community
The LREC Map of Language Resources and Technologies
ASMA: A System for Automatic Segmentation and Morpho-Syntactic Disambiguation of Modern Standard Arabic
Creating Annotation Tools with the Annotation Graph Toolkit
Anaphora Resolution with the ARRAU Corpus
Models and Tools for Collaborative Annotation
Construction of an Idiom Corpus and its Application to Idiom Identification based on WSD Incorporating Idiom-Specific Features
Construction of a Metadata Database for Efficient Development and Use of Language Resources
Wiki-ly Supervised Part-of-Speech Tagging
A Probabilistic Co-Bootstrapping Method for Entity Set Expansion
Google Web 1T 5-Grams Made Easy (but not for the computer)
Measuring the Divergence of Dependency Structures Cross-Linguistically to Improve Syntactic Projection Algorithms
Word Sense Disambiguation Using Sense Examples Automatically Acquired from a Second Language
Single-Document Summarization as a Tree Knapsack Problem
Simplified guidelines for the creation of Large Scale Dialectal Arabic Annotations
Comparing Corpus-based to Web-based Lookup Techniques for Automatic English Inclusion Detection
Djangology: A Light-weight Web-based Tool for Distributed Collaborative Text Annotation
Annotating Anaphoric Shell Nouns with their Antecedents
Exploring Sensorial Features for Metaphor Identification
Large Scale Relation Detection
Coarse-grained Candidate Generation and Fine-grained Re-ranking for Chinese Abbreviation Prediction
Instance-Based Ontology Population Exploiting Named-Entity Substitution
Generalizing Sub-sentential Paraphrase Acquisition across Original Signal Type of Text Pairs
Antelogue: Pronoun Resolution for Text and Dialogue
FBK-irst: Lexical Substitution Task Exploiting Domain and Syntagmatic Coherence
Combining Shallow and Linguistically Motivated Features in Native Language Identification
Automation and Evaluation of the Keyword Method for Second Language Learning
SemEval-2012 Task 5: Chinese Semantic Dependency Parsing
LDC Language Resource Database: Building a Bibliographic Database
NameDat: A Database of English Proper Names Spoken by Native Norwegians
Interpreting Anaphoric Shell Nouns using Antecedents of Cataphoric Shell Nouns as Training Data
Segmentation and Translation of Japanese Multi-word Loanwords
Design and compilation of a specialized Spanish-German parallel corpus
Sentence Level Machine Translation Evaluation as a Ranking
Assessing the relative reading level of sentence pairs for text simplification
Machine Translation with Many Manually Labeled Discourse Connectives
Hybrid Selection of Language Model Training Data Using Linguistic Information and Perplexity
A Synchronous Context Free Grammar for Time Normalization
A Python Toolkit for Universal Transliteration
A Dependency Parser for Tweets
One-Class Clustering in the Text Domain
Automatic Acquisition of English Topic Signatures Based on a Second Language

http://www.ldc.upenn.edu/projects/ACE
The Third International Chinese Language Processing Bakeoff: Word Segmentation and Named Entity Recognition

http://languagelog.ldc.upenn.edu/nll/?p=26223
Illegal is not a Noun: Linguistic Form for Detection of Pejorative Nominalizations

http://projects.ldc.upenn.edu/gale/Transcription/Chine
A Very Large Scale Mandarin Chinese Broadcast Corpus for GALE Project

http://acl.ldc.upenn.edu/C/C92/C92-3146.pdf
Collection, Annotation and Analysis of Gold Standard Corpora for Knowledge-Rich Context Extraction in Russian and German

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2005T34
A Joint Model to Identify and Align Bilingual Named Entities

http://mixer.ldc.upenn.edu
A New Phase in Annotation Tool Development at the Linguistic Data Consortium: The Evolution of the Annotation Graph Toolkit
Low-cost Customized Speech Corpus Creation for Speech Technology Applications

http://www.cis.upenn.edu/xtag/manuals.html
Anchoring a Lexicalized Tree-Adjoining Grammar for Discourse

http://www.ldc.upenn.edu/Projects/GALE/data/
Integrated Linguistic Resources for Language Exploitation Technologies

http://www.cis.upenn.edu/~nrh/klex.html
A Resource-based Korean Morphological Annotation System

http://www.cis.upenn.edu/dbikel/software.html#stat-
Arabic Named Entity Recognition: Using Features Extracted from Noisy Data

http://www.ldc.upenn.edu/Catalog/catalogEntry.jsp?catalogId=LDC2003T05
Combinaison de ressources générales pour une contextualisation implicite de requêtes (Query Contextualization and Reformulation by Combining External Corpora) [in French]

http://acl.ldc.upenn.edu/W/W03/#W03-0900
Proceedings of the 2nd Workshop on Text Meaning and Interpretation

http://www.cis.upenn.edu/~mcollins/
FrameNet-based Semantic Parsing using Maximum Entropy Models

https://catalog.ldc.upenn.edu/LDC2017T07
Multi-Dialect Arabic POS Tagging: A CRF Approach

http://catalog.ldc.upenn.edu/LDC2011T03
Exploring the utility of coreference chains for improved identification of personal names

http://catalog.ldc.upenn.edu
Supervised Phrase Table Triangulation with Neural Word Embeddings for Low-Resource Languages
Refining Word Segmentation Using a Manually Aligned Corpus for Statistical Machine Translation

http://catalog.ldc.upenn.edu/LDC2014T06
Oracle and Human Baselines for Native Language Identification

http://acl.ldc.upenn.edu/P/P07/P07-3004.pdf
Enhancing an English-Polish Electronic Dictionary for Multiword Expression Research

http://www.ircs.upenn.edu/arabic/
Building the multilingual TUT parallel treebank
Repérage des entités nommées pour l’arabe : adaptation non-supervisée et combinaison de systèmes (Named Entity Recognition for Arabic : Unsupervised adaptation and Systems combination) [in French]
A treebank-based study on the influence of Italian word order on parsing performance
Person Name Entity Recognition for Arabic
The Parallel-TUT: a multilingual and multiformat treebank

http://www.cis.upenn.edu/pdtb/
Combining Hierarchical Clustering and Machine Learning to Predict High-Level Discourse Structure
Towards User-Adaptive Annotation Guidelines
Machine-Assisted Rhetorical Structure Annotation
Investigating the Characteristics of Causal Relations in Japanese Text

http://itre.cis.upenn.edu/~myl/languagelog/archives/005514.ht
How Many Multiword Expressions do People Know?

http://www.seas.upenn.edu/~pdtb/PDTBAPI/
PDTB XML: the XMLization of the Penn Discourse TreeBank 2.0

http://projects.ldc.upenn.edu/gale/Transcription/Arabic-XTrans
From Speech to Trees: Applying Treebank Annotation to Arabic Broadcast News

http://languagelog.ldc.upenn.edu/
Squib: Reproducibility in Computational Linguistics: Are We Willing to Share?
Read my points: Effect of animation type when speech-reading from EMA data
CCGweb: a New Annotation Tool and a First Quadrilingual CCG Treebank

http://www.ldc.upenn.edu/Catalog/byType.jsp
Unsupervised Learning of Acoustic Sub-word Units

http://projects.ldc.upenn.edu/LCTL
A New Phase in Annotation Tool Development at the Linguistic Data Consortium: The Evolution of the Annotation Graph Toolkit
Natural Language Processing for Less Privileged Languages: Where do we come from? Where are we going?
Parallel Creation of Gigaword Corpora for Medium Density Languages - an Interim Report

http://www.cis.upenn.edu/pdtb
The Penn Discourse Treebank
Attribution and the (Non-)Alignment of Syntactic and Discourse Arguments of Connectives
Annotating Discourse Connectives and Their Arguments

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2004T18
Machine Transliteration

https://catalog.ldc.upenn.edu/LDC2011T07
Learning to Distill: The Essence Vector Modeling Framework
MIPA: Mutual Information Based Paraphrase Acquisition via Bilingual Pivoting
Discourse Parsing with Attention-based Hierarchical Neural Networks
Fast Gated Neural Domain Adaptation: Language Model as a Case Study
Improving Twitter Named Entity Recognition using Word Representations
ICL-HD at SemEval-2016 Task 10: Improving the Detection of Minimal Semantic Units and their Meanings with an Ontology and Word Embeddings
Context-aware Entity Morph Decoding
FastHybrid: A Hybrid Model for Efficient Answer Selection
A Generalized Framework for Hierarchical Word Sequence Language Model
Vector-space topic models for detecting Alzheimer’s disease
How to Memorize a Random 60-Bit String
D-GloVe: A Feasible Least Squares Model for Estimating Word Embedding Densities
A Multi-media Approach to Cross-lingual Entity Knowledge Transfer
An Improved Hierarchical Word Sequence Language Model Using Directional Information
Generating Topical Poetry
Discourse Relation Sense Classification Systems for CoNLL-2016 Shared Task
Re-Ranking Words to Improve Interpretability of Automatically Generated Topics
Unsupervised Rewriter for Multi-Sentence Compression

https://catalog.ldc.upenn.edu/LDC2011T03
A Multi-classifier Approach to support Coreference Resolution in a Vector Space Model
Adapting Coreference Resolution for Narrative Processing
Visualizing the Content of a Children’s Story in a Virtual World: Lessons Learned
Liberal Event Extraction and Event Schema Induction

https://catalog.ldc.upenn.edu/docs/LDC2013T19/OntoNotes-Release-5.0.pdf
Graph Convolutional Networks for Named Entity Recognition

http://www.ldc.upenn.edu/Catalog/catalogEntry.jsp?
A Survey of Arabic Named Entity Recognition and Classification
Automatically Developing a Fine-grained Arabic Named Entity Corpus and Gazetteer by utilizing Wikipedia
A Robust Shallow Temporal Reasoning System
Scalable Decipherment for Machine Translation via Hash Sampling
Design and compilation of a specialized Spanish-German parallel corpus

http://www.cis.upenn.edu/ircs/colloq/2003/fall/fillmore.html
Tree-Rewriting Models of Multi-Word Expressions

https://catalog.ldc.upenn.edu/docs/LDC2013T19/
Creation of a Balanced State-of-the-Art Multilayer Corpus for NLU
Improving the Robustness of Question Answering Systems to Question Paraphrasing

http://www.ldc.upenn.edu/ctb
Paraphrasing of Chinese Utterances
Developing Guidelines and Ensuring Consistency for Chinese Text Annotation

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId
Generalization of Words for Chinese Dependency Parsing
Predicting Morphological Types of Chinese Bi-Character Words by Machine Learning Approaches

http://acl.ldc.upenn.edu/A/A00/A00-1031.pdf
Combined Word Alignments

http://www.cis.upenn.edu/-adwait
Classifier Combination for Improved Lexical Disambiguation

http://catalog.ldc.upenn.edu/LDC2002L49
TECHLIMED@QALB-Shared Task 2015: a hybrid Arabic Error Correction System

https://catalog.ldc.upenn.edu/LDC2014T12
Aligning English Strings with Abstract Meaning Representation Graphs
Learning to Map Dependency Parses to Abstract Meaning Representations
Augmenting Abstract Meaning Representation for Human-Robot Dialogue

http://www.ldc.upenn
Idioms in Context: The IDIX Corpus
Utilizing Microblogs for Automatic News Highlights Extraction
TableTrans, MultiTrans, InterTrans and TreeTrans: Diverse Tools Built on the Annotation Graph Toolkit
Creating Multilingual Translation Lexicons with Regional Variations Using Web Corpora
A Syntactically Annotated Corpus of Japanese Spoken Monologue
Bridging the Gap between Technology and Users: Leveraging Machine
Spectral Clustering for Example Based Machine Translation
Constructing a Textual Semantic Relation Corpus Using a Discourse Treebank
On-Demand Information Extraction

https://catalog.ldc.upenn.edu/LDC2009T13
Sentiment Analysis and Lexical Cohesion for the Story Cloze Task
Modeling Coherence for Neural Machine Translation with Dynamic and Topic Caches

http://catalog.ldc.upenn.edu/LDC2003T09
TLAXCALA: a multilingual corpus of independent news

http://catalog.ldc.upenn.edu/LDC2003T05
Exploring the utility of coreference chains for improved identification of personal names
“One Entity per Discourse” and “One Entity per Collocation” Improve Named-Entity Disambiguation
Vector-space models for PPDB paraphrase ranking in context

http://www.ldc.upenn.edu/ldc/online/treebank/
Netgraph – Making Searching in Treebanks Easy

http://www.cis.upenn.edu/~cis639/arabic/info/translit-
T-code compression for Arabic computational morphology

http://morph.ldc.upenn.edu/Papers/
Many Uses, Many Annotations for Large Speech Corpora: Switchboard and TDT as Case Studies

http://morph.ldc.upenn.edu/
Lexical Discovery with an Enriched Semantic Network
OLACMS: Comparisons and Applications in Chinese and Formosan Languages

http://www.ldc.upenn.edu/Projects/ACE/Annotation/20
Adding multi-layer semantics to the Greek Dependency Treebank

http://www.ldc.upenn.edu
Filtering Antonymous, Trend-Contrasting, and Polarity-Dissimilar Distributional Paraphrases for Improving Statistical Machine Translation
Dialect Translation: Integrating Bayesian Co-segmentation Models with Pivot-based SMT
The Web as a Parallel Corpus
Constructing an Anaphorically Annotated Corpus with Non-Experts: Assessing the Quality of Collaborative Annotations
Language Model Based Arabic Word Segmentation
A Survey of Arabic Named Entity Recognition and Classification
The FLaReNet Strategic Language Resource Agenda
Many Uses, Many Annotations for Large Speech Corpora: Switchboard and TDT as Case Studies
A Model for Linguistic Resource Description
An Open Architecture for the Construction and Administration of Corpora
SpeechDat across all America: SALA II
Extension of Zipf’s Law to Words and Phrases
New Directions for Language Resource Development and Distribution
Name Origin Recognition Using Maximum Entropy Model and Diverse Features
WikiWars: A New Corpus for Research on Temporal Expressions
Bilingual Connections for Trilingual Corpora: An XML Approach
The Manually Annotated Sub-Corpus: A Community Resource for and by the People
Improving Machine Translation Performance by Exploiting Non-Parallel Corpora
Design, Compilation and Processing of CUCall: A Set of Cantonese Spoken Language Corpora Collected Over Telephone Networks
ANC2Go: A Web Application for Customized Corpus Creation
Unsupervised Induction of Modern Standard Arabic Verb Classes Using Syntactic Frames and LSA
Towards Cross-Lingual Textual Entailment
The Language Application Grid and Galaxy
Cut the noise: Mutually reinforcing reordering and alignments for improved machine translation
Use of Coreference in Automatic Searching for Multiword Discourse Markers in the Prague Dependency Treebank
Generalizing semantic role annotations across syntactically similar verbs
Lattice-based Minimum Error Rate Training for Statistical Machine Translation
Fuzzy Syntactic Reordering for Phrase-based Statistical Machine Translation
Lightly Supervised and Data-Driven Approaches to Mandarin Broadcast News Transcription
Teaching a Weaker Classifier: Named Entity Recognition on Upper Case Text
LAPPS/Galaxy: Current State and Next Steps
風險最小化準則在中文大詞彙連續語音辨識之研究 (Risk Minimization Criterion for Mandarin Large Vocabulary Continuous Speech Recognition) [In Chinese]
Exploiting Syntactic and Distributional Information for Spelling Correction with Web-Scale N-gram Models
Unsupervised and Semi-supervised Learning of Tone and Pitch Accent
OntoNotes: Corpus Cleanup of Mistaken Agreement Using Word Sense Disambiguation
A Translation Model for Sentence Retrieval
The New Edition of the Natural Language Software Registry (an Initiative of ACL hosted at DFKI)
Using Shallow Syntax Information to Improve Word Alignment and Reordering for SMT
CMDMC: A Diachronic Digital Museum of Chinese Mandarin
NIST Rich Transcription 2002 Evaluation: A Preview
Cluster-specific Named Entity Transliteration
Collecting Code-Switched Data from Social Media
Issues in Pre- and Post-translation Document Expansion: Untranslatable Cognates and Missegmented Words
Semantic Annotation of a Japanese Speech Corpus
Unsupervised Constraint Driven Learning For Transliteration Discovery
Rich Morphology Generation Using Statistical Machine Translation
Tor, TorMd: Distributional Profiles of Concepts for Unsupervised Word Sense Disambiguation
An Out-of-Domain Test Suite for Dependency Parsing of German
The Mixer Corpus of Multilingual, Multichannel Speaker Recognition Data
Towards Semi-Automated Annotation for Prepositional Phrase Attachment
Computer Estimation of Spoken Language Skills
LX-Service: Web Services of Language Technology for Portuguese
SINOD - Slovenian non-native speech database
整合邊際資訊於鑑別式聲學模型訓練方法之比較研究 (A Comparative Study on Margin-Based Discriminative Training of Acoustic Models) [In Chinese]
Merging Word Senses
Towards Unsupervised Extraction of Verb Paradigms from Large Corpora
Named Entity Recognition: A Maximum Entropy Approach Using Global Information
改善以最小化音素錯誤為基礎的鑑別式聲學模型訓練於中文連續語音辨識之研究 (Improved Minimum Phone Error based Discriminative Training of Acoustic Models for Chinese Continuous Speech Recognition) [In Chinese]
The RWTH Aachen Machine Translation System for WMT 2012
Four Techniques for Online Handling of Out-of-Vocabulary Words in Arabic-English Statistical Machine Translation
Turn-taking in Mandarin Dialogue: Interactions of Tone and Intonation
BioSec Multimodal Biometric Database in Text-Dependent Speaker Recognition
Corpus Cleanup of Mistaken Agreement Using Word Sense Disambiguation
Word Distributions for Thematic Segmentation in a Support Vector Machine Approach
Bayesian Checking for Topic Models
MATBN: A Mandarin Chinese Broadcast News Corpus
Improving reordering performance using higher order and structural features
Automatic Discovery of Adposition Typology
Automatic Tagging of Arabic Text: From Raw Text to Base Phrase Chunks
On the Importance of Pivot Language Selection for Statistical Machine Translation
Learning to Merge Word Senses
Translation Quality Indicators for Pivot-based Statistical MT
Improving Arabic-Chinese Statistical Machine Translation using English as Pivot Language
Use and Evaluation of Prosodic Annotations in Dutch
Class-Based Ordering of Prenominal Modifiers
Rare Word Translation Extraction from Aligned Comparable Documents
Preface
Building a Large Lexical Databank Which Provides Deep Semantics
SLR Validation: Present State of Affairs and Prospects

http://www.cis.upenn.edu/treebank/
Annotating modals with GraphAnno, a configurable lightweight tool for multi-level annotation
BALLGAME: A Corpus for Computational Semantics
Analysis of TimeBank as a Resource for TimeML Parsing
Comparison of Similarity Models for the Relation Discovery Task
Joint English Spelling Error Correction and POS Tagging for Language Learners Writing
Linguistically Motivated Question Classification
Proceedings of the Ninth International Conference on Computational Semantics (IWCS 2011)

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T05
Towards Weakly Supervised Resolution of Null Instantiations

http://projects.ldc.upenn.edu/gale/data/catalog.html
Recent Improvements in the CMU Large Scale Chinese-English SMT System
Pushdown Automata in Statistical Machine Translation
Hierarchical Phrase-Based Translation with Weighted Finite-State Transducers and Shallow-n Grammars
Hierarchical Phrase-Based Translation Grammars Extracted from Alignment Posterior Probabilities
Hierarchical Phrase-Based Translation with Weighted Finite State Transducers
Lattice-Based Minimum Error Rate Training Using Weighted Finite-State Transducers with Tropical Polynomial Weights
Hierarchical Phrase-based Translation Representations

https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/
Building a Cross-document Event-Event Relation Corpus
Seed-Based Event Trigger Labeling: How far can event descriptions get us?

http://catalog.ldc.upenn.edu/LDC2002T31
Detection of Topic and its Extrinsic Evaluation Through Multi-Document Summarization
Text Categorization by Learning Predominant Sense of Words as Auxiliary Task

http://www.ldc.upenn.edu/-
On the Robustness of Syntactic and Semantic Features for Automatic MT Evaluation

http://www.cis.upenn.edu/treebank/tok-
Words and Word Usage: Newspaper Text versus the Web

https://catalog.ldc.upenn.edu/LDC93T3A
Adapting predominant and novel sense discovery algorithms for identifying corpus-specific sense differences

http://catalog.ldc.upenn.edu/LDC2010T06
Chinese Word Ordering Errors Detection and Correction for Non-Native Chinese Language Learners

http://www.seas.upenn.edu/~strctlrn/MSTParser/MS
NJU-Parser: Achievements on Semantic Dependency Parsing

http://morph.ldc.upenn.edu/annotation/
Semantic Annotation of a Japanese Speech Corpus

http://www.cis.upenn.edu/~melamed/
Book Reviews: Ambiguity Resolution in Language Learning: Computational and Cognitive Models

http://www.ldc.upenn.edu/:
TC-STAR: New language resources for ASR and SLT purposes

http://www.cis.upenn.edu/~dbikel/download.html
An Empirical Study of Translation Rule Extraction with Multiple Parsers

http://projects.ldc.upenn
A Survey of Arabic Named Entity Recognition and Classification
Annotating Participant Reference in English Spoken Conversation
Investigating Statistical Techniques for Sentence-Level Event Classification
Labelling and Spatio-Temporal Grounding of News Events

http://www.ldc.upenn.edu/sb/isle.html
OLACMS: Comparisons and Applications in Chinese and Formosan Languages

http://www.ldc.upenn.edu/exploration/expl2000
Issues in the design, construction and use of Language Resources (LR) for Endangered Languages (ELs)

http://projects.ldc.upenn.edu/gale/Transcription/
Transcription Methods for Consistency, Volume and Efficiency

http://www.ldc.upenn.edu/projects/tdt4/annotation
Summarize What You Are Interested In: An Optimization Framework for Interactive Personalized Summarization

http://www.ldc.upenn.edu/Projects/ACE/intro.htm
Semantically Rich Human-Aided Machine Annotation

https://catalog.ldc.upenn.edu/LDC99L22
PronouncUR: An Urdu Pronunciation Lexicon Generator

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?
Using Morphological and Syntactic Structures for Chinese Opinion Analysis
Large-Scale Noun Compound Interpretation Using Bootstrapping and the Web as a Corpus
Language Models and Reranking for Machine Translation
Sentence Level Discourse Parsing using Syntactic and Lexical Information
Efficient Methods for Incorporating Knowledge into Topic Models
Cross Language Text Classification by Model Translation and Semi-Supervised Learning
Improving Semantic Role Labeling with Word Sense
Generating an Entailment Corpus from News Headlines
The SAWA Corpus: A Parallel Corpus English - Swahili
Chinese Sketch Engine and the Extraction of Grammatical Collocations
Search right and thou shalt find ... Using Web Queries for Learner Error Detection
NAIST at the HOO 2012 Shared Task
ANTUSD: A Large Chinese Sentiment Dictionary
Metric Learning for Synonym Acquisition
A Supervised Learning Approach to Automatic Synonym Identification Based on Distributional Features
Analysis of Morph-Based Speech Recognition and the Modeling of Out-of-Vocabulary Words Across Languages
Single Document Summarization based on Nested Tree Structure
Context Feature Selection for Distributional Similarity
Other-Anaphora Resolution in Biomedical Texts with Automatically Mined Patterns
Anchors Regularized: Adding Robustness and Extensibility to Scalable Topic-Modeling Algorithms
Jointly Modeling WSD and SRL with Markov Logic
Distributed Language Modeling for N-best List Re-ranking
Automatic Extraction of Arabic Multiword Expressions
Arabic Word Generation and Modelling for Spell Checking

http://www.ldc.upenn.edu/Projects/GALE/Translation/
Integrated Linguistic Resources for Language Exploitation Technologies

http://nlpgrid.seas.upenn.edu/PPDB/eng/
Efficient, Compositional, Order-sensitive n-gram Embeddings

http://www.lirig.upenn.edu/miderig-
Finite Structure Query: A Tool for Querying Syntactically Annotated Corpora

http://acl.ldc.upenn.edu/X/X93/
Statistical Identification of English Loanwords in Korean Using Automatically Generated Training Data

http://projects.ldc.upenn.edu/ace/docs/English-Events-Guidelines_v5.4.3.pdf
Domain-Independent Novel Event Discovery and Semi-Automatic Event Annotation

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?c
Twenty Years of Language Resource Development and Distribution: A Progress Report on LDC Activities
The Creation of a Large-Scale LFG-Based Gold Parsebank

http://www.ling.upenn.edu/histcorpora/annotation
DynamicPower at SemEval-2016 Task 8: Processing syntactic parse trees with a Dynamic Semantics core

http://catalog.ldc.upenn.edu/LDC2007T36
Parsing Chinese Synthetic Words with a Character-based Dependency Model

http://acl.ldc.upenn.edu/P/P02/
Taming Structured Perceptrons on Wild Feature Vectors

http://www.ldc.upenn.edu/Cata
Name Matching between Roman and Chinese Scripts: Machine Complements Human

http://projects.ldc.upenn.edu/ace/docs/English-Relatio
Linguistic Resources and Evaluation Techniques for Evaluation of Cross-Document Automatic Content Extraction

http://www.ldc.upenn.edu/CallFriend2/
Cheap, Fast and Good Enough: Automatic Speech Recognition with Non-Expert Transcription

http://nltk.ldc.upenn.edu:9090
Fangorn: A System for Querying very large Treebanks

http://www.ldc.upenn.edu/ldc/service/index.html
Give me a bug. a framework for a bug report service

http://catalog.ldc.upenn.edu/LDC2006T03
Training a Korean SRL System with Rich Morphological Features

http://languagelog.ldc.upenn.edu/nll/?p=3554
A Tangled Web: The Faint Signals of Deception in Text - Boulder Lies and Truth Corpus (BLT-C)

http://www.ldc.upenn.edu/Projects/Transcription/rt-04/RT-04-
Linguistic Resources for Speech Parsing

http://languagelog.ldc.upenn.edu/myl/PennTreebank1995.pdf
What It Takes to Achieve 100% Condition Accuracy on WikiSQL

http://morph.ldc.upenn.edu/ctb/
The First International Chinese Word Segmentation Bakeoff

http://www.cis.upenn.edu/dbikel/download/compare.pl
Automatic Adaptation of Annotations

https://online.ldc.upenn.edu/login.html
The Linguistic Data Consortium Member Survey: Purpose, Execution and Results

http://acl.ldc.upenn.edu/muc7/ne
Influence of Module Order on Rule-Based De-identification of Personal Names in Electronic Patient Records Written in Swedish

http://www.ldc.upenn.edu/Projects/TDT2/
Feature Selection for Trainable Multilingual Broadcast News Segmentation

http://www.ling.upenn.edu/Events/DIGS13/
The Icelandic Parsed Historical Corpus (IcePaHC)

http://oracc.museum.upenn.edu/doc/about/aboutoracc/index.html
Enhancing Sumerian Lemmatization by Unsupervised Named-Entity Recognition

http://microposts2016.seas.upenn
Twitter Named Entity Extraction and Linking Using Differential Evolution

http://languagelog.ldc.upenn.edu/nll/?p=26254
Illegal is not a Noun: Linguistic Form for Detection of Pejorative Nominalizations

http://projects.ldc.upenn.edu/ace/
Compensating for Annotation Errors in Training a Relation Extractor
Collective Event Detection via a Hierarchical and Bias Tagging Networks with Gated Multi-level Attention Mechanisms
Relation Extraction with Relation Topics
Extending English ACE 2005 Corpus Annotation with Ground-truth Links to Wikipedia
Automatic Construction of Predicate-argument Structure Patterns for Biomedical Information Extraction
Exploiting Syntactic Patterns as Clues in Zero-Anaphora Resolution
Augmenting WordNet-based Inference with Argument Mapping
Directional Distributional Similarity for Lexical Expansion
By all these lovely tokens... Merging Conflicting Tokenizations
Learning Entailment Rules for Unary Templates
Medical Relation Extraction with Manifold Models
J-NERD: Joint Named Entity Recognition and Disambiguation with Rich Linguistic Features
The CoNLL 2008 Shared Task on Joint Parsing of Syntactic and Semantic Dependencies
A Study on Dependency Tree Kernels for Automatic Extraction of Protein-Protein Interaction
TEXT2TABLE: Medical Text Summarization System Based on Named Entity Recognition and Modality Identification
Non-Expert Correction of Automatically Generated Relation Annotations
Event Extraction via Dynamic Multi-Pooling Convolutional Neural Networks
An Efficient Cross-lingual Model for Sentence Classification Using Convolutional Neural Network
Automatically Labeled Data Generation for Large Scale Event Extraction
EventWiki: A Knowledge Base of Major Events
Annotating Relations in Scientific Articles
DCFEE: A Document-level Chinese Financial Event Extraction System based on Automatically Labeled Training Data
Towards a Balanced Named Entity Corpus for Dutch
Person Name Entity Recognition for Arabic
A Pronoun Anaphora Resolution System based on Factorial Hidden Markov Models
Multi-Task Transfer Learning for Weakly-Supervised Relation Extraction
Contextual Preferences
Towards Robust Unsupervised Personal Name Disambiguation
Large Corpus-based Semantic Feature Extraction for Pronoun Coreference
A Semantic Feature for Relation Recognition Using a Web-based Corpus
A Seed-driven Bottom-up Machine Learning Framework for Extracting Relations of Various Complexity
Improving Event Detection with Abstract Meaning Representation
MEANTIME, the NewsReader Multilingual Event and Time Corpus
On Learning Subtypes of the Part-Whole Relation: Do Not Mix Your Seeds
Joint Event Extraction via Structured Prediction with Global Features
Semi-supervised Relation Extraction with Large-scale Word Clustering
Generating Entailment Rules from FrameNet

http://projects.ldc.upenn.edu/Chinese/LDC_ch.htm
Using Bilingual Knowledge and Ensemble Techniques for Unsupervised Chinese Sentiment Analysis
Towards Bilingual Term Extraction in Comparable Patents
Mining Large-scale Parallel Corpora from Multilingual Patents: An English-Chinese example and its application to SMT
Lost in Translations? Building Sentiment Lexicons using Context Based Machine Translation

http://acl.ldc.upenn.edu/E/E99/E99-
Multiword Expression-Aware A* TAG Parsing Revisited

http://acl.ldc.upenn.edu/acl2004/emnlp/pdf/Cucerz
Improving search engine retrieval using a compound splitter for Swedish

http://ccat.sas.upenn.edu/plc/
Parsing Aligned Parallel Corpus by Projecting Syntactic Relations from Annotated Source Corpus

https://www.seas.upenn.edu/
TDB 1.1: Extensions on Turkish Discourse Bank
Argument Labeling of Explicit Discourse Relations using LSTM Neural Networks
Resource-Lean Modeling of Coherence in Commonsense Stories
The CoNLL-2015 Shared Task on Shallow Discourse Parsing
Automatic Construction of Discourse Corpora for Dialogue Translation
Discourse Relation Sense Classification Using Cross-argument Semantic Similarity Based on Word Embeddings
Implicit Discourse Relation Identification for Open-domain Dialogues

http://repository.upenn.edu/cis_reports/975/
Metric Learning for Graph-Based Domain Adaptation

http://www.cis.upenn.edu/~treebank/home.html
Discourse-level Annotation for Investigating Information Structure
Application of search algorithms to natural language processing
PBIE: A Data Preparation Toolkit Toward Developing a Parsing-Based Information Extraction System
Automatic clustering of collocation for detecting practical sense boundary

http://www.cis.upenn.edu/~dbikel/download/compare.pl
Improvements to Training an RNN parser
Exploiting Lexical Dependencies from Large-Scale Data for Better Shift-Reduce Constituency Parsing

http://projects.ldc.upenn.edu/kbp/data/
Distant Supervision for Relation Extraction with an Incomplete Knowledge Base

http://oracc.museum.upenn.edu
Towards a Linked Open Data Edition of Sumerian Corpora
The Open Linguistics Working Group
Distantly Supervised POS Tagging of Low-Resource Languages under Extreme Data Sparsity: The Case of Hittite
A Report on the Third VarDial Evaluation Campaign

http://www.ldc.upenn.edu/annotation/database/
Issues in the design, construction and use of Language Resources (LR) for Endangered Languages (ELs)

http://www.ldc.upenn.edu/Projects/DASL
Annotation Graphs and Servers and Multi-Modal Resources: Infrastructure for Interdisciplinary Education, Research and Development

http://www.ldc.upenn.edu/Projects/TIDES/Translation/
Automatic Evaluation of Machine Translation Based on Rate of Accomplishment of Sub-Goals

http://wave.ldc.upenn.edu/Catalog/CatalogEntry.jsp?
An Evaluation of Adopting Language Model as the Checker of Preposition Usage

http://www.ling.upenn.edu/hist-cor-
The Icelandic Parsed Historical Corpus (IcePaHC)

http://acl.ldc.upenn.edu/eacl2006/ws06
An Effective Method of Using Web Based Information for Relation Extraction

https://www.ldc.upenn.edu/sites/www.ldc.upenn
Exploiting Document Level Information to Improve Event Detection via Recurrent Neural Networks

http://www.ldc.upenn.edu/Catalog/catalogEntry
Automatic Arabic diacritics restoration based on deep nets

http://www.ldc.upenn.edu/Projects
Language Resource Creation and Distribution at the Linguistic Data Consortium: A Progress Report
A Composite Kernel to Extract Relations between Entities with Both Flat and Structured Features
Adapting to Trends in Language Resource Development: A Progress Report on LDC Activities

http://www.cis.upenn.edu/mpalmer/
Platform for Full-Syntax Grammar Development Using Meta-grammar Constructs

http://www.ling.upenn.edu/courses/Fall
Using Semantic Distance to Automatically Suggest Transfer Course Equivalencies

http://microposts2016.seas.upenn.edu/
A Feature-based Ensemble Approach to Recognition of Emerging and Rare Named Entities

http://www.cis.upenn.edu/~xtag/
Developing Tools and Building Linguistic Resources for Vietnamese Morpho-syntactic Processing
Tools and resources for Tree Adjoining Grammars

http://www.seas.upenn.edu/~pdtb/PDTBAPI/pdtb-
Chinese Discourse Relation Recognition

https://catalog.ldc.upenn.edu/LDC2003T13
Author Name Disambiguation in MEDLINE Based on Journal Descriptors and Semantic Types

https://catalog.ldc.upenn.edu/LDC2003T12
The International Corpus of Arabic: Compilation, Analysis and Evaluation

http://www.cis.upenn.edu/~xueniwen/
Chinese Sketch Engine and the Extraction of Grammatical Collocations

http://www.ldc.upenn.edu/cgi-bin/aesl/aesl
Issues in Corpus Creation and Distribution: The Evolution of the Linguistic Data Consortium

http://projects.ldc.upenn.edu/LCTL/index.html
Indigenous Languages of Indonesia: Creating Language Resources for Language Preservation

https://catalog.ldc.upenn.edu/docs/LDC2019T05/PDTB3-Annotation-Manual.pdf
Ambiguity in Explicit Discourse Connectives

http://ccat.sas.upenn.edu/gopher/text/religion/biblical/
A Nearest-Neighbor Approach to the Automatic Analysis of Ancient Greek Morphology

https://www.ling.upenn.edu/courses/Fall_
Alleviating Poor Context with Background Knowledge for Named Entity Disambiguation
Bulgarian-English and English-Bulgarian Machine Translation: System Design and Evaluation

http://www.cis.upenn.edu:80/ircs/discourse-
Towards Standards and Tools for Discourse Tagging

http://www.cis.upenn.edu/~ace/
Chinese Sketch Engine and the Extraction of Grammatical Collocations
Semantically Rich Human-Aided Machine Annotation

https://catalog.ldc.upenn.edu/LDC96S36
Survey: Anaphora With Non-nominal Antecedents in Computational Linguistics: a Survey

http://www.cis.upenn.edu/ace/
The Proposition Bank: An Annotated Corpus of Semantic Roles
Towards Robust Semantic Role Labeling
Shallow Semantic Parsing using Support Vector Machines
Semantic Role Labeling Using Different Syntactic Views

https://catalog.ldc.upenn.edu/docs/LDC2004T12/SimpleMDE_V5.0.pdf
Automated speech-unit delimitation in spoken learner English

http://www.ldc.upenn.edu/Papers/CIS9901_1999/r
Towards Metadata Interoperability

http://www.ldc.upenn.edu/Catalog/catalogEntry.jsp?catalogId=LDC2006T06
Relational Structures and Models for Coreference Resolution

https://webann.ldc.upenn.edu/
RESTful Annotation and Efficient Collaboration

http://projects.ldc.upenn.edu/Transcription/quick-trans
Creation of a New Domain and Evaluation of Comparison Generation in a Natural Language Generation System

https://catalog.ldc.upenn.edu/LDC2005T33
Survey: Anaphora With Non-nominal Antecedents in Computational Linguistics: a Survey

http://www.ldc.upenn.edu/Projects/EARS/Arabic
Developing and Using a Pilot Dialectal Arabic Treebank

http://www.cis.upenn.edu/~mpalmer/isle.kickoff.ppt
OLACMS: Comparisons and Applications in Chinese and Formosan Languages

http://wave.ldc.upenn.edu/Catalog/-
Bilingual Parsing with Factored Estimation: Using English to Parse Korean