English | 简体中文
PaddleNLP is a powerful NLP library with awesome pre-trained Transformer models and an easy-to-use interface, supporting a wide range of NLP tasks from research to industrial applications.
Wide-range NLP Task Support
High Performance Distributed Training
For more information about PaddlePaddle installation, please refer to PaddlePaddle's Website.
pip install --upgrade paddlenlp
We provide 15 network architectures and 67 pretrained models. These include not only all the SOTA models released by Baidu, such as ERNIE, PLATO and SKEP, but also most of the high-quality Chinese pretrained models developed by other organizations. We also welcome developers to contribute their Transformer models! 🤗
```python
from paddlenlp.transformers import *

ernie = ErnieModel.from_pretrained('ernie-1.0')
ernie_gram = ErnieGramModel.from_pretrained('ernie-gram-zh')
bert = BertModel.from_pretrained('bert-wwm-chinese')
albert = AlbertModel.from_pretrained('albert-chinese-tiny')
roberta = RobertaModel.from_pretrained('roberta-wwm-ext')
electra = ElectraModel.from_pretrained('chinese-electra-small')
gpt = GPTForPretraining.from_pretrained('gpt-cpm-large-cn')
```
PaddleNLP also provides a unified API experience for NLP tasks such as semantic representation, text classification, sentence matching, sequence labeling and question answering.
```python
import paddle
from paddlenlp.transformers import (
    ErnieTokenizer,
    ErnieModel,
    ErnieForSequenceClassification,
    ErnieForTokenClassification,
    ErnieForQuestionAnswering,
)

tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
text = tokenizer('natural language understanding')

# Semantic Representation
model = ErnieModel.from_pretrained('ernie-1.0')
sequence_output, pooled_output = model(input_ids=paddle.to_tensor([text['input_ids']]))

# Text Classification and Matching
model = ErnieForSequenceClassification.from_pretrained('ernie-1.0')

# Sequence Labeling
model = ErnieForTokenClassification.from_pretrained('ernie-1.0')

# Question Answering
model = ErnieForQuestionAnswering.from_pretrained('ernie-1.0')
```
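To make the tokenizer output above concrete, here is a toy, framework-free sketch of the kind of dict a Transformer tokenizer returns. The vocabulary and helper below are hypothetical illustrations, not PaddleNLP internals; real tokenizers use learned subword vocabularies.

```python
# Toy illustration of a tokenizer's output structure (hypothetical vocab,
# not PaddleNLP internals). Real tokenizers use subword vocabularies and
# add special tokens such as [CLS]/[SEP] around the input.
TOY_VOCAB = {"[CLS]": 0, "[SEP]": 1, "natural": 2, "language": 3, "understanding": 4}

def toy_tokenize(sentence):
    tokens = ["[CLS]"] + sentence.split() + ["[SEP]"]
    input_ids = [TOY_VOCAB[t] for t in tokens]
    # token_type_ids are all zeros for a single-sentence input
    return {"input_ids": input_ids, "token_type_ids": [0] * len(input_ids)}

text = toy_tokenize("natural language understanding")
print(text["input_ids"])  # [0, 2, 3, 4, 1]
```

The resulting `input_ids` list is what gets wrapped in `paddle.to_tensor` and fed to the model.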
For more usage of pretrained models, please refer to the Transformer API.
```python
from paddlenlp.datasets import load_dataset

train_ds, dev_ds, test_ds = load_dataset("chnsenticorp", splits=["train", "dev", "test"])
```
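As a rough illustration of what these splits represent, a labeled corpus is typically partitioned into train/dev/test subsets. The snippet below is a toy stand-in, not the Dataset API's implementation; `load_dataset` returns ready-made splits for built-in corpora.

```python
# Toy stand-in for train/dev/test splitting (not the Dataset API's code).
corpus = [("great product", 1), ("terrible service", 0)] * 50  # 100 toy examples

def split(data, train=0.8, dev=0.1):
    # Partition sequentially: 80% train, 10% dev, remainder test.
    n = len(data)
    n_train, n_dev = int(n * train), int(n * dev)
    return data[:n_train], data[n_train:n_train + n_dev], data[n_train + n_dev:]

train_ds, dev_ds, test_ds = split(corpus)
print(len(train_ds), len(dev_ds), len(test_ds))  # 80 10 10
```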
For more dataset API usage, please refer to the Dataset API.
```python
from paddlenlp.embeddings import TokenEmbedding

wordemb = TokenEmbedding("fasttext.wiki-news.target.word-word.dim300.en")
wordemb.cosine_sim("king", "queen")
>>> 0.77053076
wordemb.cosine_sim("apple", "rail")
>>> 0.29207364
```
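Under the hood, `cosine_sim` is plain cosine similarity between the two word vectors. A minimal, framework-free sketch of the computation (the 3-d vectors here are made up for illustration; `TokenEmbedding` vectors are 300-dimensional):

```python
import math

def cosine_sim(a, b):
    # Cosine similarity: dot(a, b) / (||a|| * ||b||)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-d vectors for illustration only.
king, queen = [0.9, 0.8, 0.1], [0.85, 0.82, 0.15]
print(cosine_sim(king, queen))  # close to 1.0 for near-parallel vectors
```

Words used in similar contexts get near-parallel embedding vectors, so their cosine similarity is high; unrelated words score lower.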
For more TokenEmbedding usage, please refer to the Embedding API.
Please find more API references in our readthedocs.
PaddleNLP provides rich application examples covering mainstream NLP tasks to help developers accelerate problem solving.
Please refer to our official AI Studio account for more interactive tutorials: PaddleNLP on AI Studio
What's Seq2Vec? shows how to use a simple API to build an LSTM model and solve a sentiment analysis task.
Sentiment Analysis with ERNIE shows how to use the pretrained ERNIE model to solve a sentiment analysis problem.
Waybill Information Extraction with BiGRU-CRF Model shows how to use a Bi-GRU plus CRF model to complete an information extraction task.
Waybill Information Extraction with ERNIE shows how to use ERNIE, a Chinese pre-trained model, to improve information extraction performance.
You are welcome to join the PaddleNLP SIG to contribute, e.g., datasets, models and toolkits.
To connect with other users and contributors, welcome to join our Slack channel.
Join our QQ Technical Group for technical exchange right now! ⬇️
For more details about our releases, please refer to the ChangeLog.
PaddleNLP is provided under the Apache-2.0 License.
Code commit frequency
Responsiveness to issues, PRs, etc.
Well-balanced team members and collaboration
Recent popularity of the project
Star counts, download counts, etc.