EMNLP 2018 Highlights

In this post, I share my notes from the Conference on Empirical Methods in Natural Language Processing (EMNLP), which took place in Brussels, Belgium, from October 31st to November 4th, 2018. The tutorials, workshops and co-located conferences took place on the first two days. The main conference took place from November 2nd to November 3rd, 2018.

Day 1: October 31st, 2018

Tutorial 1: Joint models for NLP

The tutorial was presented by Yue Zhang from Westlake Institute for Advanced Study. The presentation makes the case for training joint models for NLP and covers two families of approaches: statistical and deep learning methods. Concepts are explained through examples and papers.

  • Motivation for joint models:
    • In NLP, many tasks are related (e.g. NER, chunking and POS tagging) or pipelined (e.g. tokenization and POS tagging).
    • Joint models allow information exchange between tasks and reduce error propagation.
  • Statistical methods include transition-based and graph-based methods.
  • Deep learning methods:
    • Transition-based models (Joint Learning / Joint Search): predict the next action to perform in order to get the right prediction.
    • Graph-based models (Joint Learning / Separate search): they consist of multi-task learning methods: cross-task, cross-lingual, cross-domain and cross-standard.
      • Cross-tasks: Bear in mind that not all tasks are mutually beneficial.
      • Cross-lingual: standard (e.g. multilingual neural transliteration between morphologically similar languages), regularization (e.g. low-resource dependency parsing: transfer encoding parameters), stacking (e.g. parameter sharing in a Singlish parser), pre-training (e.g. fine-tuning a pre-trained model on a low-resource language) and adversarial training (e.g. cross-lingual sequence labeling).
      • Cross-domain: e.g. multi-domain sentiment classification.
      • Cross-standard: e.g. output results corresponding to various treebanks.

Workshop 5: SCAI - Search-Oriented Conversational AI

Link to the workshop page.

Keynote: Towards natural conversation with machines

A keynote presented by Milica Gašić from University of Cambridge, the Dialogue Systems Group.

  • Motivations: Popularity of virtual assistants (1 billion calls / day), expected revenue 16 billion USD in 2021 (Tractica), weakness of current models (unnatural, narrow domain).
  • Challenges:
    • Immediate: Models that keep track of context cannot scale.
    • Short-term: Operate only on predefined databases, manage only short conversations, user specific models are unrealistic, etc.
    • Long-term: Modeling rich conversations, teaching a machine to talk about a piece of text, supporting large/infinite action sets.
  • The presentation shows how machine learning solves two challenges in dialogue systems:
    • Belief tracking: Tracking every concept requires training data for each one: Reuse knowledge for different concepts, use semantically-constrained word vector embeddings and architectures that share parameters.
    • Response selection: choosing a response from a large set of possibilities, using deep reinforcement learning.

Paper presentations

  • A Reinforcement Learning-driven Translation Model for Search-Oriented Conversational Systems.
    • Learning the mapping between natural language expressed needs and keywords expressed needs, using machine translation paradigms.
    • Inject task objectives within the model using reinforcement learning methods.
    • For more details, check out the paper.
  • Research Challenges in Building a Voice-based Artificial Personal Shopper - Position Paper.
    • An artificial personal shopper is a voice based dialog system used to enrich online shopping, by replicating personal shopping agents in a brick and mortar store.
    • This is a difficult task requiring effective knowledge and understanding: e.g. in order to correctly answer the question “Is the Bose headphone compatible with my phone?”, the agent has to know:
      • What type of phone the customer has / refers to?
      • What is the model of the “Bose headphone”?
      • Whether the headphone is compatible with the customer’s phone?
    • Types of data required: Product information, user information and consumer-generated content.
    • Research challenges:
      1. How to process a voice utterance? ASR is not perfect; we need a robust approach that provides a precise response for a noisy utterance.
      2. How to identify relevant response source(s) for a given utterance?
        • Identifying the relevant response source effectively, while minimizing the missing relevant sources for the product domain.
        • Aggregating the results from various sources.
      3. How to identify key phrases in a user’s utterance?
        • Key phrases in a query contribute significantly to the search results.
        • This requires an effective approach for identifying key phrases, both in the product domain and in the noisy voice transcription domain.
      4. How to infer which product / entity the user refers to?
        • Personalized information must be taken into consideration.
        • Incorporate coreference resolution and anaphora for personal shopper products / entities.
      5. How to generate a natural language response?
        • Generating informative and conversational responses.
        • Generating a multi-facet answer to a subjective question that represents multiple opinions.
      6. How to evaluate an end-to-end personal shopper system?
        • Evaluation based on the criteria of both the relevance towards the user information needs and the replication of a humanlike conversation.
    • For more details, check out the paper.

Panel discussion

The panel brought together Milica Gašić (University of Cambridge), Antoine Bordes (Facebook AI Research), Jason Weston (Facebook) and Bill Dolan (Microsoft Research).

  • Future of conversational AI?
    • Jason Weston: Online learning.
    • Antoine Bordes: End-to-end specialized bots (task-oriented and domain-oriented): not general-purpose chatbots, but bots specialized in a domain to ensure the completion of complex tasks.
    • Bill Dolan: Discussed the importance of chit-chat in task oriented chatbots.
    • Milica Gašić: Bigger challenges for conversational AI: humans don’t talk to systems the way they talk to humans. We need to work on question generation, not only response generation.
  • How to evaluate the quality of dialogue systems?
    • If we use real users, it is impossible to compare results across groups. Milica Gašić: compare to statistical dialogue systems and/or use standard data sets.
    • Bill Dolan: For response generation, BLEU still has an added value, despite its known weaknesses.
    • For human bot evaluation: precision, recall, perplexity. BLEU is for MT but not for dialogue: We would need at least 100 references. Plus, for a more personalized experience, we need a more personalized evaluation metric.
  • Difficulty for conversational AI?
    • Learning in a compliant environment with encrypted data.
    • Seq2seq architectures are suspicious; they come from MT (difficult linguistic task).
    • To ensure fluency: combine other signals with linguistic signal / Importance of ML advancements to have better solutions, models, architectures.
    • Dialogue incorporates a lot of linguistic tasks: go beyond text with gestures (like human conversation) and combine multimodal data.
    • Algorithms are inadequate to work with small amounts of data; it is difficult to annotate and collect data frame by frame.
    • Assess algorithms on computer vision and text. Is it possible to do it directly on speech / image instead of text?
  • Search versus Chatbots?
    • The user experience is formally different:
      • Chatbots should ensure full representation of belief / generate questions to clarify things.
      • Search (over documents) incorporates external knowledge in dialog (Information retrieval challenges).
      • Conversational search should be an integral part of the dialogue.

Keynote: Understanding the user in social bot conversations

This presentation was delivered by Mari Ostendorf (University of Washington).

  • Winning solution of Amazon Alexa Prize: Building a bot that converses coherently and engagingly with people on popular topics and current events.
  • Types of conversational AI systems:
    • Virtual assistant: execute commands, answer questions, limited social back and forth.
    • Social bot: 2-way social and information exchange
    • Chat bot: chitchat, limited content to talk about.
  • Importance of user modeling:
    • User interests vary
    • For text-based search, personalized query completion improves over popular queries.
  • Differences in conversational AI paradigms:
    • Assistant:
      • Speech/Language understanding: Task intents, form filling.
      • Dialog management: Narrow options and execute tasks. Reward = timely task completion.
      • Back-end application: Structured databases.
      • Response generation: Constrained domain.
    • Social bot:
      • Speech/Language understanding: Social and info intents.
      • Dialog management: Learn about interests and make suggestions. Reward = user satisfaction.
      • Back-end application: Unstructured information.
      • Response generation: Open domain.
  • Constraints:
    • Speech recognition is imperfect.
    • No sentence segmentation or pause information.
    • We cannot assume that two conversations coming from the same device correspond to the same user.
  • Design philosophy of the winning solution:
    • Content driven: daily content mining, large and dynamic content collection, etc.
    • User centric: detecting user sentiments, personality, etc.
  • Language understanding and generation in the winning solution:
    • Language understanding: multidimensional utterance representation, different detectors for sentiments, intent, opinion, topic, etc.
    • Generation: Inform AND ground.
  • Hierarchical dialog management:
    • Master (Global):
      • Rank topics, mini-tasks, content
      • Consider: topic coherence, user engagement, content availability
    • Mini-tasks (Local):
      • Greetings, goodbye, menu.
      • Probe user personality.
      • Discuss a new article, movie.
      • Tell a fact, thought, advice, joke, etc.
  • Content management in a social bot:
    • Crawl online content
    • Filter inappropriate / depressing content.
    • Index interesting and uplifting content.
    • Knowledge graph building.
  • Evaluation of social bots: human + duration of conversations.
  • Diagnostic evaluation:
    • User ratings are expensive, sparse and present a high variance.
    • Define sub-dialog rewards using dialog state information (initialization, topic stack, etc.)
  • Content preferences vary with respect to the user’s personality and emotions.
  • Different user goals: information seeking, opinion sharing, getting to know each other.

Brussels NLP meetup

Understanding Structure in Language through Wikipedia Edits

This presentation, delivered by Manaal Faruqui (Google), explains the paper: WikiAtomicEdits: A Multilingual Corpus of Wikipedia Edits for Modeling Language and Discourse.

  • Main idea: using Wikipedia edits to solve downstream NLP problems, including splitting, replacing and rephrasing sentences.
  • Two contributions:
    • Releasing WikiAtomicEdits: a data set of 43 million atomic edits across 8 languages, built from Wikipedia.
    • Analysis of the data set leads to the following conclusions: inserted language differs from general Wikipedia content, and language models trained on edit data capture different aspects of semantics and discourse than models trained on raw, unstructured text.

Link to the paper.

The importance of scaling down: One weird trick to make your NLP projects more successful

This presentation was delivered by Matthew Honnibal (Explosion AI).

  • How to maximize NLP project failure? Imagineer (Unrealistic use cases), Forecast, Outsource data collection (little knowledge about the project’s data requirements), wire (stacking layers with no context understanding), ship (delivering the project as is).
  • Machine learning hierarchy of needs, in order of importance:
    • Understanding how the model will work in the larger application or business process: including tolerance for inaccuracies, latencies, etc.
    • Annotation scheme and corpus construction: categories that will be easy to annotate consistently and easy for the model to learn.
    • Consistent and clean data: attentive annotators, good quality control processes.
    • Model architecture: smart choices, no bugs.
    • Optimization: given by hyper-parameters, initialization tricks, etc.
  • Importance of iteration over data and code:
    • Problem: It is easy to make modeling decisions that are simple, obvious and wrong. Solution: Compose generic models into novel solutions.
    • Problem: Big annotation projects make evidence expensive to collect. Solution: Run your own micro-experiments.
  • A/B evaluation beats BLEU scores blues: don’t settle for proxy metrics and build micro A/B tests.

Rapid NLP Annotation through Binary Decisions, Pattern Bootstrapping and Active Learning

This presentation was delivered by Ines Montani (Explosion AI).

  • Feedback and best practices on annotation pipelines from the developers of Prodigy.
  • Annotation needs iteration: can’t expect to define the task correctly the first time.
  • It has to be semi-automatic: boring tasks would never be performed reliably.
  • Binary annotation is key for golden data sets: faster, more reliable and generalizable.
  • Avoid cold starts: use simple models (e.g. rule based) and leverage active learning.
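
As a rough illustration of the active learning point above, the sketch below ranks unlabeled examples by how unsure a model is about them, so that the annotator's binary accept/reject decisions go to the most informative examples first. This is not Prodigy's API: the function `uncertainty_ranking`, the example texts and the scores are all hypothetical, and the scores are assumed to be probabilities in [0, 1] produced by some simple bootstrap model.

```python
def uncertainty_ranking(scored_examples):
    """Sort (text, probability) pairs so the least certain come first.

    Assumes each probability is the model's score for the positive class.
    A score near 0.5 means the model is unsure, so a human accept/reject
    decision on that example is the most informative.
    """
    return sorted(scored_examples, key=lambda pair: abs(pair[1] - 0.5))


# Hypothetical scores from a simple rule-based or bootstrapped model.
candidates = [
    ("Apple releases a new phone", 0.95),
    ("I ate an apple", 0.52),
    ("Apple pie recipe", 0.10),
]
queue = uncertainty_ranking(candidates)
```

Here the ambiguous "I ate an apple" example is queued first, while the examples the model is already confident about are pushed to the back.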

More on the slides.

Large-scale Fact Extraction and Verification

This presentation was delivered by Arpit Mittal (Amazon), to explain the paper: FEVER: a large-scale dataset for Fact Extraction and VERification.

  • Release of FEVER: a data set for claim verification against textual sources. It consists of 185k+ claims extracted from Wikipedia, verified and annotated with 1 of 3 labels: supported, refuted, not enough info.
  • Two objectives: Transform free-form text to structured information AND verify facts (in order to help combat fake news).

Link to the paper.

Transfer learning with language models

This presentation was delivered by Sebastian Ruder (Aylien).

  • Recent Advances: ELMo, ULMFiT, OpenAI Transformer (12 layers, 8 GPUs, 1 month), and BERT (24 layers, 64 TPUs, 4 days / 8 GPUs, 40-70 days). SOTA on a wide range of tasks (cheaper use cases once LM trained).
  • Tips for LM training: bidirectional architecture, choice of loss/auxiliary loss, etc.
  • LMs capture: structure of the text, syntax, meter, hierarchical relation (e.g. coherence, syntax, POS tagging, constituents).
  • No indication of performance ceiling: fine-tuning LMs will become commonplace in NLP.
  • Future directions: True multilingual NLP, more challenging problems (NLU, common sense inference), better interpretation of LMs.

For more details, check the slides.

Day 2: November 1st, 2018

Tutorial 4: Deep Latent Variable Models of Natural Language

The tutorial was presented by Yoon Kim, Sam Wiseman, and Alexander Rush from Harvard NLP Group.

  • Tractable inference over the latent variables: including neural extensions of tagging and parsing models.
  • Non-tractable inference over the latent variables: restricted to continuous latent variables.
  • Overview of recent developments in neural variational inference (e.g. auto-encoders), the challenges they face and best practices.

Link to the resources shared as part of the tutorial here.

Workshop 2: CoNLL

Invited talk: Semantic spaces across diverse languages

Presentation by Asifa Majid (University of York).

  • Natural languages are NOT equally expressible: some concepts exist in one language but not in another, e.g. there are 421 words for snow in Scots.
  • Culture and language shape our internal representation of concepts / perception of the world:
    • If you ask speakers of different languages to color in different body parts in a picture, the body parts that are associated with each term depend on the language.
    • Most languages lack terms describing specific scents and odors. In contrast, the Jahai have half a dozen terms for different qualities of smell.
  • Interesting to incorporate insights from psycholinguistics in how we model words across languages and different cultures, as cross-lingual embeddings have mostly focused on word-to-word alignment.

Comparing Models of Associative Meaning: An Empirical Investigation of Reference in Simple Language Games

This paper was written by Judy Hanwen Shen, Matthias Hofer, Bjarke Felbo and Roger Levy. It presents the nature of the lexical resources that speakers and listeners can bring to bear in achieving reference through associative meaning alone.

Sequence Classification with Human Attention (special paper award)

This paper was written by Maria Barrett, Joachim Bingel, Nora Hollenstein, Marek Rei and Anders Søgaard. It highlights the fact that human attention provides a good inductive bias for many attention functions in NLP. The authors use human attention, estimated from eye-tracking corpora, to regularize the attention in RNNs.
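
To make the idea of regularizing attention concrete, here is a minimal sketch of a loss that adds a penalty when the model's attention weights stray from human attention weights. The squared-error penalty, the weight `lam`, the function name and the toy numbers are all assumptions made for illustration; the paper may define its objective differently.

```python
def attention_regularized_loss(task_loss, model_attention, human_attention, lam=0.1):
    """Add a penalty for model attention that strays from human attention.

    Both attention vectors are assumed to hold one weight per token.
    The mean squared-error penalty and the weight `lam` are illustrative
    choices, not taken from the paper.
    """
    if len(model_attention) != len(human_attention):
        raise ValueError("attention vectors must have the same length")
    penalty = sum((m - h) ** 2 for m, h in zip(model_attention, human_attention))
    return task_loss + lam * penalty / len(model_attention)


# Hypothetical values: the model over-attends to the first token.
loss = attention_regularized_loss(
    task_loss=0.5,
    model_attention=[0.5, 0.25, 0.25],
    human_attention=[0.25, 0.5, 0.25],
)
```

When the two attention vectors agree, the penalty vanishes and the loss reduces to the task loss alone.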

Tutorial 6: Deep Chit-Chat: Deep Learning for Chatbots

This tutorial was presented by Wei Wu (Microsoft) and Rui Yan (Peking University).

  • Motivation:
    • Hot subject both in academia (e.g. the Alexa Prize) and industry (Virtual Assistants: MS Cortana, Apple Siri, Baidu Duer / Smart speakers: Amazon Alexa, Google Home / Social Bot and customer service)
    • 54% of conversations in chatbots are chit-chat (Study performed in Japan).
    • Chit-chat is fundamentally different from goal oriented chatbots: open domain conversations should be relevant and diverse to keep the user engaged.
    • Services (e.g. Recommendation, search, Q&A) are connected via chat in chatbots.
    • Task oriented vs Non-task oriented chatbots.
    • Non task oriented chatbots are based on retrieval based methods or generation based methods.
  • Deep learning for NLP:
    • Word embedding: Word2Vec (CBOW, Skip-Gram), GloVe, fastText.
    • Sentence embedding.
    • Application in dialogue modeling (seq2seq with attention).
  • Retrieval based chatbots:
    • Message-Response matching for single turn response selection.
    • Context-Response matching for multi-turn response selection.
    • Emerging research directions: matching with better representations, matching with unlabeled data.
  • Generation based chatbots:
    • Single-turn generation: seq2seq, attention, bidirectional (MT inspiration).
    • Multi-turn generation: Hierarchical context modeling.
  • Many other subjects include: diversity in conversations, content introducing, topics and emotion in conversations, persona in chat, reinforcement and adversarial learning in conversations.
  • Evaluation metrics for conversations: Weak correlation between human and automatic evaluations (BLEU, information, e.g. entropy and perplexity, diversity, average response length, ADEM, RUBER).
  • Future trends: Learning methods and representations, reasoning in dialogues (context based, knowledge, and common sense), X-grounded dialogues (labels, multimodal, texts), evaluation and benchmark data sets.

Day 3: November 2nd, 2018

Opening remarks

Presentation of the conference key numbers, program and organization details.

  • EMNLP 2018 is HUGE:
    • 2.1k+ submitted papers (46% increase over 2017): 549 accepted papers: 24.6% acceptance rate.
    • 79 demo submissions (40% increase over 2017): 29 accepted demos (40% acceptance rate).
    • 2.5K attendees: >100% increase over 2017.
  • EMNLP in numbers:
    • 14 workshops, 6 tutorials, 351 long paper presentations, 198 short paper presentations, 10 TACL paper presentations, 29 demo presentations.
    • 60 area chairs, 1436 long/short papers reviewers, 150 demo paper reviewers.
    • Gender diversity: 74.7% male reviewers, 68.3% male area/program chairs.
    • 4 plenaries: 3 keynotes + Best papers awards.
    • 11 parallel sessions: 4 oral sessions, 1 poster session.
  • Companies: Bloomberg, Google, Facebook, Salesforce, Apple, ASAPP, Amazon, Baidu, Grammarly, Naver Labs Europe, KU Leuven, FWO, Megagon Labs, Huawei, eBay, Microsoft, Naver Line, Oracle, PolyAI, Sogou, YITI, Duolingo, Nuance, Shannon.ai, NextAI, textkernel, Allen Institute for AI, text IQ, etc.

Keynote I: Truth or Lie? Spoken Indicators of Deception in Speech

This keynote was presented by Julia Hirschberg (Columbia University). The key ideas are:

  • Creation of a cross-cultural deception corpus: 340 subjects balanced by native language and gender.
  • Hard to control potential indicators of deception.
  • Features such as gender, ethnicity, culture and personality should be included in the models.
  • Other features include: text-based, speech-based, syntactic features, personality features, etc.
  • Classifiers: DNNs, BLSTMs, hybrid methods.
  • Conclusions include: Both humans and classifiers classify long answers as lies.
  • The project is sponsored by the US government.

Find more on the slides.

1A: Social Applications I

  • Privacy-preserving Neural Representations of Text: This paper deals with adversarial privacy attacks on deep learning NLP systems. The idea is to attack hidden layers in a NN to get information about the input. The authors investigate the tradeoff between the utility and the privacy of the hidden representations and suggest defense methods based on the alteration of training objectives.
  • Adversarial Removal of Demographic Attributes from Text Data: In this paper, the authors show that demographic information is encoded in intermediate representations and can alter classifiers’ decisions. They suggest removing these attributes using adversarial training, then retraining high level classifiers. They conclude that adversarial training is not enough to reach invariant representation of sensitive attributes.
  • DeClarE: Debunking Fake News and False Claims using Evidence-Aware Deep Learning: To detect fake news, the trend is to use external sources, which requires extensive feature modeling. This paper suggests a NN, which judiciously aggregates external knowledge and provides human readable explanations for the model’s results.
  • It’s going to be okay: Measuring Access to Support in Online Communities: This paper analyzes the accessibility to online support, when revealing one’s gender. The authors present a data set and a method to assess accessibility to support on online platforms. Moreover, they suggest a strategy to infer gender from text and usernames.
  • Detecting Gang-Involved Escalation on Social Media Using Context: This paper presents a method to detect expressions of violence and grief on social media. The authors use a domain-specific unlabeled corpus and present a contextual and emotional representation of users’ generated content. They made the collected data available upon request.

2C: Multilingual Methods I

  • Distant Supervision from Disparate Sources for Low-Resource Part-of-Speech Tagging: The paper introduces a cross-lingual neural POS tagger, by leveraging annotation projection, instance selection, tag dictionaries, morphological lexicons and distributed representations, without access to any golden data.
  • Unsupervised Bilingual Lexicon Induction via Latent Variable Models: Much of the work on bilingual lexicon extraction is based on aligning monolingual word embeddings. This paper suggests using latent variable models and adversarial training to do so. The method was tested on several language pairs.
  • Learning Unsupervised Word Translations without Adversaries: State-of-the-art methods for unsupervised bilingual dictionary induction use adversarial training (and thus suffer from instability and hyper-parameter sensitivity). This paper presents a statistical method for unsupervised dictionary extraction without adversarial training.
  • Adversarial Training for Multi-task and Multi-lingual Joint Modeling of Utterance Intent Classification: The problem with joint models is that shared networks learn the majority data set features. To solve this, the paper suggests training language-specific task adversarial networks and task-specific language adversarial networks. The goal is to purge language specific and task specific dependencies. The method is tested on 2 languages and 3 tasks.
  • Surprisingly Easy Hard-Attention for Sequence to Sequence Learning: This paper shows that beam approximation of joint distribution between attention and output is efficient for seq2seq learning. The method combines sharp focus in hard attention and implementation ease of soft attention.

3A: Machine Translation I

  • SwitchOut: an Efficient Data Augmentation Algorithm for Neural Machine Translation: This paper presents a data augmentation technique for NMT. It consists of randomly replacing words in both the source sentence and the target sentence with other random words from their corresponding vocabularies.
  • Improving Unsupervised Word-by-Word Translation with Language Model and Denoising Autoencoder: This paper presents a word by word translation strategy that uses cross-lingual word embeddings. For context encoding, they suggest using a language model and for word reordering, they use a denoising autoencoder. An advantage of this method is not using iterative approaches, such as back-translation.
  • Decipherment of Substitution Ciphers with Neural Language Models: This paper proposes a “beam search algorithm that scores the entire candidate plaintext at each step of the decipherment using a neural LM”. The authors suggest augmenting “beam search with a novel rest cost estimation that exploits the prediction power of a neural LM”.
  • Rapid Adaptation of Neural Machine Translation to New Languages: The main idea of this paper is to use high-resource languages to train what the authors call “seed models”, then fine-tune them on low-resource languages (LRLs). They suggest jointly training on similar languages to avoid overfitting on LRLs. Findings include that multilingual models perform well on LRLs, even when the training data doesn’t contain examples in these languages.
  • Compact Personalized Models for Neural Machine Translation: This paper shows that it is possible to freeze the majority of the model’s parameters when adapting a translation model, with no reduction in performance. To ensure the sparsity of the tensors’ offsets, the authors suggest using group lasso regularization.
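
The word replacement idea behind SwitchOut, described in the first item of this session, can be sketched in a few lines. This is a simplified illustration and not the authors' algorithm: it assumes a fixed per-token replacement probability (`replace_prob`), whereas the paper's sampling scheme is more involved, and the function name and toy vocabularies are hypothetical.

```python
import random


def switchout(tokens, vocab, replace_prob, rng):
    """Replace each token with a uniformly sampled vocabulary word
    with probability `replace_prob`; otherwise keep it unchanged."""
    return [rng.choice(vocab) if rng.random() < replace_prob else tok
            for tok in tokens]


rng = random.Random(0)  # seeded so repeated runs give the same augmentation
src_vocab = ["the", "cat", "dog", "sat", "ran"]
tgt_vocab = ["le", "chat", "chien", "assis", "couru"]

source = ["the", "cat", "sat"]
target = ["le", "chat", "assis"]

# Augment both sides independently, each from its own vocabulary.
aug_source = switchout(source, src_vocab, replace_prob=0.3, rng=rng)
aug_target = switchout(target, tgt_vocab, replace_prob=0.3, rng=rng)
```

Sentence lengths are preserved, so the augmented pair can be fed to the same training pipeline as the original pair.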

4A: Language Models

  • Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement: This paper presents an iterative algorithm that considers any sequence generation as a latent variable model and refines it using a denoising approach. The algorithm was tested using the transformer model on two different tasks, machine translation and image caption generation.
  • Large Margin Neural Language Model: In this paper, the authors suggest using a large margin criterion in LM training instead of perplexity minimization. The main idea is to enlarge the margin between good vs bad sequences in a task-specific sense.
  • Targeted Syntactic Evaluation of Language Models: This is a data set paper. The authors present a benchmark data set to evaluate the “grammaticality” of the predictions of a LM. They conclude that there is considerable room for improvement in syntax learning using LMs.
  • Rational Recurrences: This paper shows that RNNs have a connection to WFSAs, the same as CNNs (previously demonstrated). The authors try to transfer intuitions from classical models like WFSAs into neural network architectures and design.
  • Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling: LMs are commonly used for various NLP tasks, thanks to the huge amount of available training data. However, large LMs impose heavy computation constraints at inference time. Thus, the paper presents an approach to LM compression, which keeps only the information that is useful for a specific task.
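
The large margin criterion mentioned in the "Large Margin Neural Language Model" item can be illustrated with a generic pairwise hinge loss. This is a sketch under assumptions and not the paper's exact objective: the function name, the default margin of 1.0 and the toy scores are hypothetical.

```python
def pairwise_margin_loss(good_score, bad_score, margin=1.0):
    """Hinge loss: zero once the good sequence outscores the bad one
    by at least `margin`, and linear in the shortfall otherwise."""
    return max(0.0, margin - (good_score - bad_score))


# Hypothetical LM log-scores for a good and a bad candidate sequence.
well_separated = pairwise_margin_loss(good_score=-2.0, bad_score=-5.0)
too_close = pairwise_margin_loss(good_score=-2.0, bad_score=-2.5)
```

Only pairs that are not yet separated by the margin contribute to the loss, which is what pushes good and bad sequences apart during training.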

Day 4: November 3rd, 2018

5C: IR / Text Mining

  • Improved Semantic-Aware Network Embedding with Fine-Grained Word Alignment: This paper introduces a “word-by-word alignment framework that measures the compatibility of embeddings between word pairs, and then adaptively accumulates these alignment features with a simple yet effective aggregation function”.
  • Learning Context-Sensitive Convolutional Filters for Text Processing: In this paper, the authors present an approach that uses a small meta network to learn context-aware convolutional filters for text processing. This network encodes the contextual information in “input-aware” filters.
  • Deep Relevance Ranking Using Enhanced Document-Query Interactions: This paper presents a document relevance ranking model, which is obtained by augmenting DRMM with rich context encodings.
  • Learning Neural Representation for CLIR with Adversarial Framework: This paper presents a new cross-lingual IR model, built using adversarial learning.
  • AD3: Attentive Deep Document Dater: This paper presents an attention-based neural document dating system which utilizes both context and temporal information in documents.

6D: Multilingual Methods II

  • Sentence Compression for Arbitrary Languages via Multilingual Pivoting: This paper leverages bilingual corpora to perform sentence compression. The method consists of translating a source string into a foreign language and then back-translating it into the source language while controlling the translation length.
  • Unsupervised Cross-lingual Transfer of Word Embedding Spaces: This paper deals with learning mapping functions between embedding spaces of different languages. It proposes an unsupervised learning approach that does not require any cross-lingual labeled data. Instead, the authors suggest optimizing the transformation functions in both directions simultaneously based on distributional matching and the minimization of the back-translation losses.
  • XNLI: Evaluating Cross-lingual Sentence Representations: This is a data set paper: The authors open source a data set for NLI in 15 languages and present different baselines to perform cross-lingual NLI.
  • Joint Multilingual Supervision for Cross-lingual Entity Linking: This paper shows the limitations of XEL and the added value due to multilingual joint learning in low resource settings.
  • Fine-grained Coordinated Cross-lingual Text Stream Alignment for Endless Language Knowledge Acquisition: This paper presents a decipherment algorithm with diverse clues to decipher the networks for accurate text stream alignment.

7A: Dialogue II

  • Session-level Language Modeling for Conversational Speech: This paper generalizes language models for conversational speech recognition to allow them to operate across utterance boundaries and speaker changes, thereby capturing conversation-level phenomena such as adjacency pairs, lexical entrainment, and topical coherence.
  • Towards Less Generic Responses in Neural Conversation Models: A Statistical Re-weighting Method: Seq2seq generation models tend to generate generic/dull responses. Thus, the authors of this paper propose a statistical re-weighting method that assigns different weights for the multiple responses of the same query, and trains the standard neural generation model with the weights.
  • Training Millions of Personalized Dialogue Agents: This paper suggests increasing the engagement level in conversational systems by making them more personalized. The authors open source a dataset of 5 million personas and 700 million persona-based dialogues and show that using personas improves the performance on end-to-end systems.
  • Towards Universal Dialogue State Tracking: The paper introduces “StateNet”, a universal dialogue state tracker: It is independent of the number of values, shares parameters across all slots, and uses pre-trained word vectors instead of explicit semantic dictionaries.
  • Semantic Parsing for Task Oriented Dialog using Hierarchical Representations: Dialog systems usually deal with one query (intent) at a time. In this paper, the authors suggest a hierarchical annotation scheme for semantic parsing that allows the representation of compositional queries. Furthermore, they release a dataset of 44k annotated queries.
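
As an illustration of what a hierarchical representation of a compositional query might look like, here is a small sketch of a tree in which slots can contain nested intents. The `Node` class, the `IN:`/`SL:` label prefixes and the example query are assumptions made for this sketch and are not taken from the released dataset.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Node:
    """An intent or slot node; children are tokens (str) or nested nodes."""
    label: str
    children: List[Union[str, "Node"]] = field(default_factory=list)

def to_brackets(node):
    """Render the tree in a bracketed, single-line notation."""
    parts = [to_brackets(c) if isinstance(c, Node) else c for c in node.children]
    return "[" + " ".join([node.label] + parts) + "]"

def intents(node):
    """Collect every intent label in the tree, outermost first."""
    found = [node.label] if node.label.startswith("IN:") else []
    for child in node.children:
        if isinstance(child, Node):
            found.extend(intents(child))
    return found

# Hypothetical compositional query: asking for directions to an event
# nests a "find the event" intent inside the destination slot.
query = Node("IN:GET_DIRECTIONS", [
    "directions", "to",
    Node("SL:DESTINATION", [
        Node("IN:GET_EVENT", ["the", Node("SL:EVENT_NAME", ["concert"])]),
    ]),
])

rendered = to_brackets(query)
nested_intents = intents(query)
```

A flat intent-plus-slots scheme could only label this query with a single intent, whereas the tree keeps both `IN:GET_DIRECTIONS` and the nested `IN:GET_EVENT`.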

Keynote II: Understanding the News that Moves Markets

This keynote was presented by Gideon Mann (Bloomberg). The key ideas are:

  • The talk covers financial technology, the news that moves markets, and related computing topics.
  • The importance of real-time NLP on news data to understand and influence markets: speed and precision requirements.
  • News generation on-demand.

Find more on the slides.

8A: Text Categorization

  • Zero-shot User Intent Detection via Capsule Neural Networks: This paper aims to solve the annotation problem in intent detection, in order to handle intents for which no labeled utterances are available. The authors propose two capsule-based architectures: INTENTCAPSNET (extracts semantic features from utterances and aggregates them to discriminate existing intents) and INTENTCAPSNET-ZSL (gives INTENTCAPSNET the zero-shot learning ability to discriminate emerging intents via knowledge transfer from existing intents).
  • Hierarchical Neural Networks for Sequential Sentence Classification in Medical Scientific Abstracts: This paper suggests a hierarchical sequential labeling network to make use of the contextual information within surrounding sentences and help classify the current sentence.
  • Investigating Capsule Networks with Dynamic Routing for Text Classification: This paper shows that capsule networks improve significantly over competing models when transferring from single-label to multi-label text classification.
  • Topic Memory Networks for Short Text Classification: This paper proposes topic memory networks for short text classification with a novel topic memory mechanism to encode latent topic representations indicative of class labels.
  • Few-Shot and Zero-Shot Multi-Label Learning for Structured Label Spaces: The authors of this paper perform a fine-grained evaluation to explain how state-of-the-art methods perform on infrequent labels. They also develop few- and zero-shot methods for multi-label text classification when there is a known structure over the label space.
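
Two papers in this session build on capsule networks, so a rough sketch of dynamic routing may help. The code below follows the generic routing-by-agreement procedure from the original capsule network literature, not the specific variants evaluated in these papers. The function names and the toy prediction vectors are assumptions made for illustration.

```python
import math

def squash(vec):
    """Shrink a vector so its length lies in [0, 1) while keeping its direction."""
    sq_norm = sum(x * x for x in vec)
    if sq_norm == 0.0:
        return [0.0 for _ in vec]
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in vec]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_routing(predictions, iterations=3):
    """predictions[i][j] is the vector that lower capsule i predicts for
    upper capsule j. Returns (upper capsule outputs, coupling coefficients)."""
    n_lower, n_upper = len(predictions), len(predictions[0])
    dim = len(predictions[0][0])
    logits = [[0.0] * n_upper for _ in range(n_lower)]
    for _ in range(iterations):
        # Each lower capsule distributes its vote over the upper capsules.
        coupling = [softmax(row) for row in logits]
        outputs = []
        for j in range(n_upper):
            s = [sum(coupling[i][j] * predictions[i][j][d] for i in range(n_lower))
                 for d in range(dim)]
            outputs.append(squash(s))
        # Raise the logit where a prediction agrees with the upper capsule output.
        for i in range(n_lower):
            for j in range(n_upper):
                logits[i][j] += sum(p * v for p, v in zip(predictions[i][j], outputs[j]))
    return outputs, coupling

# Toy input: all three lower capsules agree on upper capsule 0,
# while their predictions for upper capsule 1 point in different directions.
predictions = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [-1.0, 0.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
outputs, coupling = dynamic_routing(predictions)
```

In this toy example the routing shifts coupling weight towards upper capsule 0, where the lower capsules agree, and its output vector ends up longer than that of upper capsule 1.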

Day 5: November 4th, 2018

9B: Sentiment I

  • Sentiment Classification towards Question-Answering with Hierarchical Matching Network: This paper proposes a novel task/method to address QA sentiment analysis. The authors create a high-quality annotated corpus with specially-designed annotation guidelines for QA-style sentiment classification.
  • Cross-topic Argument Mining from Heterogeneous Sources: The authors of this paper propose a new sentential annotation scheme that crowd workers can reliably apply to arbitrary Web texts. They also open-source annotations for over 25k instances covering 8 controversial topics.
  • Summarizing Opinions: Aspect Extraction Meets Sentiment Prediction and They Are Both Weakly Supervised: This paper combines two weakly supervised components to identify salient opinions and form extractive summaries from multiple reviews.
  • CARER: Contextualized Affect Representations for Emotion Recognition: This paper proposes a semi-supervised, graph-based algorithm to produce rich structural descriptors which serve as the building blocks for constructing contextualized affect representations from text.
  • [TACL] Adversarial Deep Averaging Networks for Cross-Lingual Sentiment Classification: This paper proposes an Adversarial Deep Averaging Network (ADAN) to transfer the knowledge learned from labeled data on a resource-rich source language to low-resource languages where only unlabeled data exists.
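
The "deep averaging" part of ADAN can be pictured with a small sketch: a sentence is represented by the average of its word vectors, followed by a feed-forward layer. Only this feature extraction step is shown; the adversarial training against a language discriminator is omitted. The toy bilingual embeddings, weights and function names below are hypothetical.

```python
def average_embedding(tokens, embeddings):
    """Average the vectors of known tokens (the averaging step of a DAN)."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        raise ValueError("no known tokens in input")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def dense_relu(vector, weights, bias):
    """One feed-forward layer with a ReLU activation."""
    return [max(0.0, sum(w * x for w, x in zip(row, vector)) + b)
            for row, b in zip(weights, bias)]

# Hypothetical bilingual embeddings in which translations share a vector.
embeddings = {
    "good": [1.0, 0.0], "bon": [1.0, 0.0],
    "movie": [0.0, 1.0], "film": [0.0, 1.0],
}
weights = [[1.0, -1.0], [0.5, 0.5]]  # toy layer parameters
bias = [0.0, 0.0]

en_features = dense_relu(average_embedding(["good", "movie"], embeddings), weights, bias)
fr_features = dense_relu(average_embedding(["bon", "film"], embeddings), weights, bias)
```

With perfectly aligned toy embeddings the English and French sentences yield identical features. In practice the embeddings are not perfectly aligned, which is where adversarial training is meant to push the features towards language invariance.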

10A: Question Answering III

  • Joint Multitask Learning for Community Question Answering Using Task-Specific Embeddings: The authors of this paper jointly address two tasks for question answering in community forums: given a new question, find related existing questions and find relevant answers to this new question.
  • What Makes Reading Comprehension Questions Easier: The authors of this paper investigate what makes questions easier across 12 recent MRC datasets with three question styles (answer extraction, description, and multiple choice).
  • Commonsense for Generative Multi-Hop Question Answering Tasks: The authors of this paper focus on a challenging multi-hop generative task (NarrativeQA), which requires the model to reason, gather, and synthesize disjoint pieces of information within the context to generate an answer.
  • Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text: The authors of this paper investigate QA over the combination of a KB and entity-linked text, which is appropriate when an incomplete KB is available with a large text corpus.
  • A Nil-Aware Answer Extraction Framework for Question Answering: In this paper, the authors focus on developing QA systems that can extract an answer for a question if and only if the associated passage contains an answer.

Keynote III: The Moment of Meaning and the Future of Computational Semantics

This keynote was presented by Johan Bos (University of Groningen). The key ideas are:

  • Motivations include: Future language technology requires semantic interpretation (Explainable NLP) and better MT evaluation.
  • Integrating lexical with formal semantics, language neutral semantic annotation and multilingual models.
  • Future directions for semantics:
    • Computational semantics: more resources for inference, explainable NLP and thinking more “multilingual”.
    • Adding meaning to MT: outperform BLEU and verify translation with semantic parsing.

Find more on the slides.

Best Paper Awards

  • Best long papers:
    • Linguistically-Informed Self-Attention for Semantic Role Labeling: In this paper, the authors present linguistically-informed self-attention (LISA): a neural network model that combines multi-head self-attention with multi-task learning across dependency parsing, part-of-speech tagging, predicate detection and SRL.
    • Phrase-Based & Neural Unsupervised Machine Translation: This paper investigates how to learn to translate when having access to only large monolingual corpora in each language. The authors propose two model variants, a neural and a phrase-based model.
  • Best short paper:
    • How Much Reading Does Reading Comprehension Require? A Critical Investigation of Popular Benchmarks: This paper gives baselines for the bAbI, SQuAD, CBT, CNN, and Who-did-What datasets, finding that question- and passage-only models often perform surprisingly well.
  • Best resource paper:
    • MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling: This paper introduces the Multi-Domain Wizard-of-Oz dataset (MultiWOZ), a fully-labeled collection of human-human written conversations spanning over multiple domains and topics.

Other resources

  • Link to the “Black Box NLP” tutorial.
  • Link to the “Writing Code for NLP Research” tutorial.

Blog posts:

This article was shared privately on 2018-12-05 and publicly on 2024-05-19.