Using GPT-3 for Named Entity Recognition

Ricky Ma
Sep 16, 2020


Algomo is an easy-to-use, low-code multilingual chatbot engine. One of our key objectives is to help users create a chatbot with as little data as possible. Our chatbot language model uses zero-shot learning for our classification algorithm, and we wanted to achieve something similar for named entity recognition (NER), a common task for chatbots. NER helps make responses more personalized, but current approaches require massive amounts of data. This article explains how we used OpenAI’s GPT-3 for NER without training an NER model at all.

OpenAI recently released GPT-3 for developers to experiment with. With a whopping 175 billion parameters, GPT-3 is a serious upgrade from the organization’s previous language model, GPT-2, which has “only” 1.5 billion parameters. By now, you have likely seen various mind-boggling tools made with GPT-3 that seem too cool, too funny, too human, to be real, such as designing layouts in plain English or summarizing movies with emoji. While these examples are definitely fun, eye-catching, and impressive, many practical use cases of GPT-3 remain unexplored: applications that can solve real problems while also pushing the boundaries of natural language processing.

Named entity recognition (NER) is one such NLP task. It involves extracting key information, called entities, from blocks of text. These entities are words or series of words that are classified into categories (e.g. “person”, “location”, “company”, “food”). Hence, the two main parts of NER are entity detection and entity categorization.

Use Cases

The main purpose of NER is information extraction. It is used to summarize a piece of text to understand the subject, theme, or other important pieces of information. Some interesting use cases for NER include:

  • Content recommendation: Extracting entities from articles or media descriptions and recommending content based on entity similarity
  • Chatbot optimization and customer support: Extracting key facts from a user’s query that would make the response more personalized
  • Product reviews and sentiment analysis: Finding product names and their relative sentiment to improve sales and marketing
  • Email inbox optimization: Notifying users of flight times, meeting locations, credit card charges, and more without having to open emails
  • Financial market analysis: Extracting key figures from financial news articles that can be used as signals for trading algorithms and market intelligence

Current Approaches

Traditionally, tremendous amounts of data are needed to develop a functional NER model. Current datasets used for NER are very limited. They are either too general (e.g. only names, locations, times, and organizations are labeled) or too specific (e.g. only product brands, model names, and category names are labeled). Because of these dataset limitations, the resulting NER models are also quite limited, since they can only detect the categories they were trained on. You can explore some of these datasets here:

A number of pre-trained NER models are provided by popular open-source NLP libraries. As you can expect, these pre-trained models contain only generic entities.

  • NLTK uses a tokenization and part-of-speech approach. Text is broken down into tokens, and each token is tagged with its part of speech (POS). A parser then chunks the tokens based on their POS tags to find entities. Only three entity types are provided: person, organization, and GPE (geopolitical entity).
  • spaCy uses a shallow feedforward neural network with a single hidden layer to classify text into entity categories. It incorporates a feature engineering technique, called Bloom embedding, to merge neighboring features and give each word its unique context. spaCy’s NER model contains 18 entity types, the most out of the three.
  • Stanford CoreNLP uses a probabilistic model called a conditional random field (CRF). Given a sequence of tokenized text, the CRF, combined with the Viterbi algorithm, decides the most probable sequence of tags. The model has a maximum of 7 entity types.
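
To give a feel for the decoding step described in the last bullet, here is a minimal Viterbi sketch in Python. It is not Stanford CoreNLP's implementation: the tag set, the per-token (emission) scores, and the tag-to-tag (transition) scores are all made-up toy numbers, chosen only to show how sequence context can override a token's locally best tag.

```python
def viterbi(emissions, transitions, tags):
    """Return the highest-scoring tag sequence.

    emissions:   one dict per token, mapping tag -> score
    transitions: dict mapping (previous_tag, tag) -> score
    Scores are additive (as log-probabilities would be); all values are toy numbers.
    """
    # best[i][t] = best score of any tag path that ends in tag t at token i
    best = [{t: emissions[0][t] for t in tags}]
    back = []  # back[i-1][t] = previous tag on the best path ending in t at token i
    for i in range(1, len(emissions)):
        scores, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[(p, t)])
            scores[t] = best[-1][prev] + transitions[(prev, t)] + emissions[i][t]
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)
    # Start from the best final tag and follow the back-pointers
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


TAGS = ["PER", "LOC", "O"]
TOKENS = ["Ada", "Lovelace", "visited", "London"]
# Toy emission scores: "Lovelace" on its own looks slightly more like a location
EMISSIONS = [
    {"PER": 2.0, "LOC": 0.5, "O": 0.0},
    {"PER": 1.0, "LOC": 1.2, "O": 0.0},
    {"PER": 0.0, "LOC": 0.0, "O": 2.0},
    {"PER": 0.2, "LOC": 2.0, "O": 0.0},
]
# Toy transition scores: staying in the same tag is mildly rewarded
TRANSITIONS = {(p, t): 0.0 for p in TAGS for t in TAGS}
TRANSITIONS[("PER", "PER")] = 1.0
TRANSITIONS[("LOC", "LOC")] = 0.5
TRANSITIONS[("O", "O")] = 0.5

decoded = viterbi(EMISSIONS, TRANSITIONS, TAGS)
print(list(zip(TOKENS, decoded)))
```

In this toy setup, the PER-to-PER transition bonus is what pulls "Lovelace" toward the person tag even though its own emission score slightly favors a location.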

Using GPT-3 for NER

GPT-3 shines new light on named entity recognition, providing a model that is adaptive to both general text and specialized documents. Using a simple primer text to incorporate few-shot learning, we were able to have GPT-3 tag the entities it finds in various multilingual documents and text datasets. This labeling is done automatically, without the need to define entity types by hand. If you have a general idea of the input text, for example, a restaurant menu, you can direct the model to find very specific entities like “side dish”, “condiment”, “entree”, or “beverage”. Of course, you can also extract more traditional entities like “person”, “place”, or “time”. We built a simple web app to demonstrate this application that you can explore for yourself here: https://ner.algomo.com/.
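
To make the idea of a primer concrete, here is a minimal Python sketch of how a few-shot prompt for entity tagging might be assembled. The prompt layout, the example sentences, and the `build_prompt` helper are illustrative assumptions, not the primer used in our app; the resulting string is what would be sent to a text-completion model.

```python
# Hypothetical few-shot examples: (text, [(entity, tag), ...])
EXAMPLES = [
    ("Grilled salmon served with fries and a lemonade.",
     [("Grilled salmon", "entree"), ("fries", "side dish"), ("lemonade", "beverage")]),
    ("Maria flew to Lisbon on Friday.",
     [("Maria", "person"), ("Lisbon", "place"), ("Friday", "time")]),
]


def build_prompt(text, examples=EXAMPLES, tags=None):
    """Assemble a few-shot primer followed by the text to tag.

    If `tags` is given, the prompt asks the model to use only those labels;
    otherwise the model is left free to propose its own.
    """
    lines = ["Extract the entities from each text and label them."]
    if tags:
        lines.append("Only use these labels: " + ", ".join(tags) + ".")
    for example_text, entities in examples:
        lines.append("")
        lines.append("Text: " + example_text)
        lines.append("Entities:")
        lines.extend(f"- {entity}: {tag}" for entity, tag in entities)
    # End on an open "Entities:" header so the completion continues the pattern
    lines += ["", "Text: " + text, "Entities:"]
    return "\n".join(lines)


prompt = build_prompt("The burger comes with ketchup.", tags=["entree", "condiment"])
print(prompt)
```

Ending the prompt on an open "Entities:" header is a common few-shot pattern: the model is nudged to continue with lines in the same "- entity: tag" shape as the examples.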

Tagging

The tags for entities can be manually defined: just enter the tags you would like the app to search for into the input bar (e.g. person, location, time). The resulting output only contains the tags that were defined. For automatic tag generation, the input box can just be left blank.
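
As a rough illustration of restricting the output to user-defined tags, the sketch below parses a completion and drops any tag that was not requested. It assumes the model answers with one "- entity: tag" line per entity, which is a hypothetical format chosen for this example; `parse_entities` is likewise an illustrative helper, not code from the app.

```python
def parse_entities(completion, allowed_tags=None):
    """Parse '- entity: tag' lines into (entity, tag) pairs.

    If allowed_tags is empty or None, every tag is kept (automatic tagging).
    """
    allowed = {t.strip().lower() for t in allowed_tags} if allowed_tags else None
    results = []
    for line in completion.splitlines():
        line = line.strip().lstrip("-").strip()
        if ":" not in line:
            continue  # skip blank or malformed lines
        # Split on the last colon so entities containing ':' stay intact
        entity, tag = line.rsplit(":", 1)
        entity, tag = entity.strip(), tag.strip().lower()
        if entity and tag and (allowed is None or tag in allowed):
            results.append((entity, tag))
    return results


# Made-up completion text, for illustration only
completion = (
    "- Maria: person\n"
    "- Lisbon: city\n"
    "- Portugal: location\n"
    "- Friday: time\n"
    "not an entity line"
)
print(parse_entities(completion, ["person", "location", "time"]))
print(parse_entities(completion))  # no tags given: keep everything
```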

Temperature

The temperature slider controls randomness: lowering the temperature results in less random completions. As the temperature approaches zero, the model becomes deterministic and repetitive. Given the task, we do not want a very “creative” model, as this creates issues in the output. I’ve found that a temperature of around 0.5 works well. When the model is more creative, you may notice issues in the tagging where subattributes are separated from main attributes (e.g. “location” and “city”). This can be mitigated with the manual tagging feature, for example, by only allowing “location” and not “city”.
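
The effect of temperature can be illustrated with a small softmax calculation. The sketch below uses made-up scores (logits) for three candidate tags and applies the standard temperature-scaled softmax; it is a generic illustration of the sampling idea, not a description of GPT-3's internals.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing them by the temperature.

    The temperature must be greater than zero in this sketch.
    """
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Toy scores for three competing tags, e.g. "location", "city", "place"
logits = [2.0, 1.0, 0.5]
for temperature in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

As the temperature drops, more and more probability mass concentrates on the top-scoring option, which is why low temperatures give consistent tags and high temperatures give more varied ones.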

Input and Upload

There are two ways to input text for NER tagging. A simple textbox is provided for quick text input, copy-pasting, and experimentation. Datasets can also be uploaded for entity tagging. Simply select your dataset in CSV or Excel format and use the dropdown to select the column of text you would like to tag.
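
For readers who want to reproduce the upload step outside the app, here is a minimal sketch of pulling one text column out of a CSV file using only Python's standard library. The `read_text_column` helper and the sample data are assumptions made for illustration; Excel files would need a third-party library and are not covered here.

```python
import csv
import io


def read_text_column(csv_file, column):
    """Return the non-empty values of one named column from an open CSV file."""
    reader = csv.DictReader(csv_file)
    if column not in (reader.fieldnames or []):
        raise KeyError(f"column {column!r} not found; available: {reader.fieldnames}")
    texts = []
    for row in reader:
        value = (row[column] or "").strip()
        if value:  # skip rows where the chosen column is empty
            texts.append(value)
    return texts


# An in-memory stand-in for an uploaded file (made-up data)
sample = io.StringIO(
    "id,review\n"
    "1,Loved the pizza at Luigi's\n"
    "2,\n"
    "3,Coffee in Berlin was great\n"
)
texts = read_text_column(sample, "review")
print(texts)
```

Each returned string can then be tagged individually, in the same way as text typed into the textbox.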

Output and Analysis

The output table contains values (entities) found by GPT-3 and their respective tags. This table can be exported with the click of a button. The next tab contains a simple histogram showing the relative counts of each tag found.
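
The same kind of summary can be sketched in a few lines of Python. The code below counts tags, renders a simple text histogram, and serializes the table to CSV; the entity list is made-up sample data, and the helper names are illustrative rather than taken from the app.

```python
import csv
import io
from collections import Counter


def tag_counts(entities):
    """Count how many times each tag occurs in a list of (value, tag) pairs."""
    return Counter(tag for _, tag in entities)


def text_histogram(counts, width=20):
    """Render tag counts as text bars, scaled so the most common tag fills `width`."""
    top = max(counts.values(), default=0)
    lines = []
    for tag, n in counts.most_common():
        bar = "#" * max(1, round(width * n / top))
        lines.append(f"{tag:<10} {bar} {n}")
    return "\n".join(lines)


def to_csv(entities):
    """Serialize (value, tag) pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["value", "tag"])
    writer.writerows(entities)
    return buffer.getvalue()


# Made-up tagging results, for illustration only
entities = [("Maria", "person"), ("Lisbon", "location"),
            ("Ricky", "person"), ("Friday", "time")]
counts = tag_counts(entities)
print(text_histogram(counts))
print(to_csv(entities))
```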


Ricky Ma

Ideas about (the future of) A.I., machine learning, cognitive science, philosophy of mind, and more. https://ricky-ma.github.io/