Jeremy Lee / pytorch-hubhub

huggingface_pytorch-pretrained-bert_gpt.md 4.40 KB
layout: hub_detail
background-class: hub-background
body-class: hub
title: GPT
summary: Generative Pre-Training (GPT) models for language understanding
category: researchers
image: huggingface-logo.png
author: HuggingFace Team
tags: nlp
github-link: https://github.com/huggingface/pytorch-pretrained-BERT.git
featured_image_1: GPT1.png
featured_image_2: no-image
accelerator: cuda-optional
order: 10

Model Description

GPT was released together with the paper Improving Language Understanding by Generative Pre-Training by Alec Radford et al. at OpenAI. It combines two ideas: the Transformer architecture and large-scale unsupervised pre-training.

Three models based on OpenAI's pre-trained weights are available, along with the associated tokenizer:

  • openAIGPTModel: raw OpenAI GPT Transformer model (fully pre-trained)
  • openAIGPTLMHeadModel: OpenAI GPT Transformer with the tied language modeling head on top (fully pre-trained)
  • openAIGPTDoubleHeadsModel: OpenAI GPT Transformer with the tied language modeling head and a multiple choice classification head on top (OpenAI GPT Transformer is pre-trained, the multiple choice classification head is only initialized and has to be trained)
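The "tied" language modeling head mentioned above means that the output projection reuses the token embedding matrix instead of learning a separate one. The following is a minimal sketch of that idea only, not the library's implementation: `TinyTiedLM`, its sizes, and the absence of any Transformer layers are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class TinyTiedLM(nn.Module):
    """Hypothetical toy model: an embedding plus an LM head sharing one weight matrix."""

    def __init__(self, vocab_size=10, hidden_size=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        # nn.Linear stores its weight as (out_features, in_features),
        # which matches the embedding's (vocab_size, hidden_size) shape.
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
        self.lm_head.weight = self.embed.weight  # tie: both modules share one Parameter

    def forward(self, token_ids):
        hidden = self.embed(token_ids)   # (batch, seq_len, hidden_size)
        return self.lm_head(hidden)      # (batch, seq_len, vocab_size) logits


toy_model = TinyTiedLM()
toy_logits = toy_model(torch.tensor([[1, 2, 3]]))
```

Because the two modules share a single parameter, updating the embeddings during fine-tuning also updates the output projection.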

Requirements

Unlike most other PyTorch Hub models, GPT requires a few additional Python packages to be installed.

pip install tqdm boto3 requests regex ftfy spacy
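Before loading the models, it can help to confirm that these packages are importable. The sketch below uses only the standard library; the helper name `missing_modules` is hypothetical, and it assumes each package is imported under the same name it is installed with.

```python
import importlib.util

# Import names assumed to match the pip package names listed above.
REQUIRED_MODULES = ["tqdm", "boto3", "requests", "regex", "ftfy", "spacy"]


def missing_modules(module_names):
    """Return the names in module_names that cannot be located for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages: " + " ".join(missing))
else:
    print("All required packages are available.")
```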

Example

Here is an example of how to tokenize text with openAIGPTTokenizer, then either get the hidden states computed by openAIGPTModel or predict the next token using openAIGPTLMHeadModel. Finally, we show how to use openAIGPTDoubleHeadsModel to combine the language modeling head with a multiple choice classification head.

### First, tokenize the input
#############################
import torch
tokenizer = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'openAIGPTTokenizer', 'openai-gpt')

#  Prepare tokenized input
text = "Who was Jim Henson ? Jim Henson was a puppeteer"
tokenized_text = tokenizer.tokenize(text)
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
tokens_tensor = torch.tensor([indexed_tokens])


### Get the hidden states computed by `openAIGPTModel`
######################################################
model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'openAIGPTModel', 'openai-gpt')
model.eval()

# Compute hidden states features for each layer
with torch.no_grad():
    hidden_states = model(tokens_tensor)


### Predict the next token using `openAIGPTLMHeadModel`
#######################################################
lm_model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'openAIGPTLMHeadModel', 'openai-gpt')
lm_model.eval()

# Predict all tokens
with torch.no_grad():
    predictions = lm_model(tokens_tensor)

# Get the last predicted token
predicted_index = torch.argmax(predictions[0, -1, :]).item()
predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]
assert predicted_token == '.</w>'


### Language modeling and multiple choice classification `openAIGPTDoubleHeadsModel`
####################################################################################
double_head_model = torch.hub.load('huggingface/pytorch-pretrained-BERT', 'openAIGPTDoubleHeadsModel', 'openai-gpt')
double_head_model.eval() # Set the model to train mode if used for training

text_bis = "Who was Jim Henson ? Jim Henson was a mysterious young man"
tokenized_text_bis = tokenizer.tokenize(text_bis)
indexed_tokens_bis = tokenizer.convert_tokens_to_ids(tokenized_text_bis)
tokens_tensor = torch.tensor([[indexed_tokens, indexed_tokens_bis]])
mc_token_ids = torch.LongTensor([[len(tokenized_text)-1, len(tokenized_text_bis)-1]])

with torch.no_grad():
    lm_logits, multiple_choice_logits = double_head_model(tokens_tensor, mc_token_ids)
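In the double-heads example, the input has one extra dimension for the candidate choices, and `mc_token_ids` holds, for each choice, the position of its last token, which the classification head reads from. Stacking the two token lists into one tensor only works if they have the same length. The sketch below shows one way to pad choices to a common length and compute those positions in plain Python. The helper `build_choice_batch` and the padding id `0` are hypothetical and are not part of the library.

```python
def build_choice_batch(choices, pad_id=0):
    """Pad tokenized choices to one length and locate each choice's last real token.

    choices: list of token-id lists, one per candidate continuation.
    Returns (input_ids, mc_token_ids) as nested lists with a leading
    batch dimension of size 1, ready to be passed to torch.tensor.
    pad_id is an assumed placeholder value, not a documented pad token.
    """
    if not choices or any(len(choice) == 0 for choice in choices):
        raise ValueError("every choice needs at least one token")
    max_len = max(len(choice) for choice in choices)
    padded = [list(choice) + [pad_id] * (max_len - len(choice))
              for choice in choices]
    # Index of the last non-padding token in each choice.
    last_positions = [len(choice) - 1 for choice in choices]
    return [padded], [last_positions]


# Toy token ids standing in for indexed_tokens and indexed_tokens_bis.
input_ids, mc_token_ids_list = build_choice_batch([[5, 6, 7], [8, 9]])
```

The resulting lists have shapes (1, number of choices, sequence length) and (1, number of choices), matching the two arguments passed to the model in the example.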

Requirement

The model only supports Python 3.

Resources
