LingoRobo

To view the demo website, click this link.

CSCI 527 Applied Machine Learning for Games

Enabling AI agents to follow natural language instructions is critical for games. Imagine playing a Pokemon game in which your Pikachu strikes its opponent with thunder when you tell it to. That would be so much fun! But the technology is not there yet. This project aims to bridge the gap between natural language instructions and agents’ actions.

Project Owners:

*equal contribution

Workload Distribution:

Kung-Hsiang Steeve Huang: knowledge incorporation, project website.
Shikhar Singh: multimodal transformer, demo video.

Task

We develop our system on the ALFRED dataset (Action Learning From Realistic Environments and Directives). ALFRED is a benchmark that challenges AI agents to learn a mapping from natural language instructions to sequences of actions.

Methods

We approach the ALFRED dataset from two directions:

Commonsense knowledge incorporation (Steeve)
Multimodal transformer (Shikhar)

Commonsense knowledge incorporation

ConceptNet is a multilingual, open-source knowledge base that contains rich commonsense knowledge. To integrate this knowledge into a text-based model, we need to fetch the most relevant sub-graph for each instruction. First, the spaCy Matcher grounds mentions in the instructions to concepts in ConceptNet. Pairwise relations between the grounded concepts are then found by running the shortest-path algorithm implemented in NetworkX, and tokens in the instructions are linked to their corresponding concepts. Following these steps, we obtain a sub-graph for each instruction. A Graph Convolutional Network (GCN) then updates the token embeddings on each sub-graph. We use the PyTorch Geometric implementation of GCN.
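The sub-graph retrieval step can be illustrated with a minimal sketch. It is not the project's actual code: the toy graph stands in for ConceptNet, the graph is treated as undirected for simplicity, and the function name `build_instruction_subgraph` and the `max_hops` cutoff are hypothetical choices made for this example. Only `nx.shortest_path` and `Graph.subgraph` are NetworkX calls.

```python
import itertools

import networkx as nx


def build_instruction_subgraph(kb, grounded_concepts, max_hops=3):
    """Keep every node that lies on a short shortest path between two grounded concepts.

    `max_hops` is an assumed cutoff for this sketch, not a value from the project.
    """
    # Ignore mentions that could not be grounded to a node in the knowledge base.
    concepts = [c for c in grounded_concepts if c in kb]
    keep = set(concepts)
    for source, target in itertools.combinations(concepts, 2):
        try:
            path = nx.shortest_path(kb, source, target)
        except nx.NetworkXNoPath:
            continue  # the two concepts are not connected at all
        if len(path) - 1 <= max_hops:
            keep.update(path)
    # Copy the induced sub-graph so it can be modified independently of `kb`.
    return kb.subgraph(keep).copy()


# Toy stand-in for ConceptNet (assumed edges, undirected for simplicity).
kb = nx.Graph()
kb.add_edge("apple", "fruit", relation="IsA")
kb.add_edge("fruit", "fridge", relation="AtLocation")
kb.add_edge("fridge", "kitchen", relation="AtLocation")
kb.add_edge("knife", "kitchen", relation="AtLocation")
kb.add_edge("sofa", "living_room", relation="AtLocation")

# Concepts that a matcher might ground from "put the apple in the fridge, then take the knife".
subgraph = build_instruction_subgraph(kb, ["apple", "fridge", "knife"])
print(sorted(subgraph.nodes))
```

In this toy case the sub-graph keeps the three grounded concepts plus the intermediate nodes that connect them, and drops unrelated nodes such as "sofa". The resulting node and edge lists are the kind of input that a graph neural network layer would then consume.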
