A Curated Roundup of the Latest NLP Resources

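This collection gathers up-to-date resources from 2020 across natural language processing topics: general NLP resources (conferences, classic papers, datasets, NLP tasks and recent progress), NLP news and meetups, notable blogs, current benchmarks for NLP tasks, research and industry resources, speech recognition resources, topic modeling resources, and more.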

The resources were compiled from the web. Source:

https://github.com/ivan-bilan/The-NLP-Pandect

Compendiums and awesome lists on the topic of NLP:

•Awesome NLP by keon [GitHub ~10k stars]

•Speech and Natural Language Processing Awesome List by elaboshira [GitHub ~2k stars]

•Awesome Deep Learning for Natural Language Processing (NLP) [GitHub ~1k stars]

•Text Mining and Natural Language Processing Resources by stepthom [GitHub ~300 stars]

•Made with ML List by madewithml.com

•Brainsources for #NLP enthusiasts by Philip Vollet

NLP Conferences, Paper Summaries and Paper Compendiums:

•NLP top 10 conferences Compendium by soulbliss [GitHub ~300 stars]

•NLP Paper Summaries by dair-ai [GitHub ~1k stars]

•Curated collection of papers for the NLP practitioner [GitHub ~1k stars]

•Papers on Textual Adversarial Attack and Defense [GitHub ~500 stars]

•NLP Conferences Calendar

•ICLR 2020 Trends

•The Most Influential NLP Research of 2019

•Recent Deep Learning papers in NLU and RL by Valentin Malykh [GitHub ~300 stars]

NLP Progress and NLP Tasks:

•NLP Progress by sebastianruder [GitHub ~16k stars]

•NLP Tasks by Kyubyong [GitHub ~3k stars]

•Reading list for Awesome Sentiment Analysis papers by declare-lab [GitHub ~100 stars]

•Awesome Sentiment Analysis by xiamx [GitHub ~800 stars]

NLP Datasets:

•NLP Datasets by niderhoff [GitHub ~4k stars]

•Big Bad NLP Database

•25 Best Parallel Text Datasets for Machine Translation Training

•UWA Unambiguous Word Annotations - Word Sense Disambiguation Dataset

•20 Best German Language Datasets for Machine Learning

Word and Sentence embeddings:

•Awesome Embedding Models by Hironsan [GitHub ~1.3k stars]

•Awesome list of Sentence Embeddings by Separius [GitHub ~1.5k stars]

•Awesome BERT by Jiakui [GitHub ~1.5k stars]

Notebooks, Scripts and Repositories

•The Super Duper NLP Repo [Website, 2020]

•NLP Highlights [Years: 2017 - now, Status: active]

•TWIML AI [Years: 2016 - now, Status: active]

•Data Hack Radio [Years: 2018 - now, Status: active]

•The Super Data Science Podcast [Years: 2016 - now, Status: active]

•AI Game Changers [Years: 2020 - now, Status: active]

•NLP News by Sebastian Ruder

•dair.ai Newsletter by dair.ai

•Papers with Code

•The Batch by deeplearning.ai

•Paper Digest by PaperDigest

•NLP Cypher by QuantumStat

•NLP Zurich

•Yannic Kilcher

•HuggingFace

•Kaggle Reading Group

•Rasa Paper Reading

•Stanford CS224N: NLP with Deep Learning

•ML Explained - A.I. Socratic Circles - AISC

•Deeplearning.ai

•Machine Learning Street Talk

•SQuAD - Stanford Question Answering Dataset (SQuAD)

•GLUE - General Language Understanding Evaluation (GLUE) benchmark

•SuperGLUE - benchmark styled after GLUE with a new set of more difficult language understanding tasks

•XTREME - Massively Multilingual Multi-task Benchmark

•decaNLP - The Natural Language Decathlon (decaNLP) for studying general NLP models

•RACE - ReAding Comprehension dataset collected from English Examinations

General

•A Recipe for Training Neural Networks by Andrej Karpathy [Keywords: research, training, 2019]

Embeddings

Repositories

•Pre-trained ELMo Representations for Many Languages [GitHub ~1k stars]

•sense2vec - Contextually-keyed word vectors [GitHub ~1k stars]

•wikipedia2vec [GitHub ~500 stars]

•StarSpace [GitHub ~3k stars]

•fastText [GitHub ~21k stars]

Blogs

•Language Models and Contextualised Word Embeddings by David S. Batista [Blog, 2018]

•An Essential Guide to Pretrained Word Embeddings for NLP Practitioners by AnalyticsVidhya [Blog, 2020]

•Polyglot Word Embeddings Discover Language Clusters [Blog, 2020]

•The Illustrated Word2vec by Jay Alammar [Blog, 2019]

Transformer-based Architectures

General

•The Transformer Family by Lilian Weng [Blog, 2020]

•Keeping up with the BERTs: a review of the main NLP benchmarks by Manuel Tonneau [Blog, 2020]

•Playing the lottery with rewards and multiple languages - about the effect of random initialization [ICLR 2020 Paper]

•Attention? Attention! by Lilian Weng [Blog, 2018]

•the transformer … “explained”? [Blog, 2019]

•Attention is all you need; Attentional Neural Network Models by Łukasz Kaiser [Talk, 2017]

•Understanding and Applying Self-Attention for NLP [Talk, 2018]

Transformer

•The Annotated Transformer by Harvard NLP [Blog, 2018]

•The Illustrated Transformer by Jay Alammar [Blog, 2018]

•Illustrated Guide to Transformers by Hong Jing [Blog, 2020]

•Sequential Transformer with Adaptive Attention Span by Facebook [Blog, 2019]

•Evolution of Representations in the Transformer by Lena Voita [Blog, 2019]

•Reformer: The Efficient Transformer [Blog, 2020]

•T5: the Text-To-Text Transfer Transformer [Blog, 2020]

•Longformer — The Long-Document Transformer by Viktor Karlsson [Blog, 2020]

•TRANSFORMERS FROM SCRATCH [Blog, 2019]

•Universal Transformers by Mostafa Dehghani [Blog, 2019]

BERT

•A Visual Guide to Using BERT for the First Time by Jay Alammar [Blog, 2019]

•The Dark Secrets of BERT by Anna Rogers [Blog, 2020]

•Understanding searches better than ever before [Blog, 2019]

•Demystifying BERT: A Comprehensive Guide to the Groundbreaking NLP Framework [Blog, 2019]

•SemBERT - Semantics-aware BERT for Language Understanding [GitHub ~100 stars]

GPT-family

General

•The Illustrated GPT-2 by Jay Alammar [Blog, 2019]

•The Annotated GPT-2 by Aman Arora

•OpenAI’s GPT-2: the model, the hype, and the controversy by Ryan Lowe [Blog, 2019]

•How to generate text by Patrick von Platen [Blog, 2020]

GPT-3

•Zero Shot Learning for Text Classification by Amit Chaudhary [Blog, 2020]

•GPT-3 A Brief Summary by Leo Gao [Blog, 2020]

•GPT-3, a Giant Step for Deep Learning And NLP by Yoel Zeldes [Blog, June 2020]

•GPT-3 Language Model: A Technical Overview by Chuan Li [Blog, June 2020]

•OpenAI API - API Demo to use GPT-3 for commercial applications

Other

•What is Two-Stream Self-Attention in XLNet by Xu LIANG [Blog, 2019]

•Visual Paper Summary: ALBERT (A Lite BERT) by Amit Chaudhary [Blog, 2020]

•Turing NLG by Microsoft

•Multi-Label Text Classification with XLNet by Josh Xin Jie Lee [Blog, 2019]

•ELECTRA [GitHub ~1k stars]

Distillation, Pruning and Quantization

•Distilling knowledge from Neural Networks to build smaller and faster models by FloydHub [Blog, 2019]

•David over Goliath: towards smaller models for cheaper, faster, and greener NLP by Manuel Tonneau [Blog, 2020]

Automated Summarization

•PEGASUS: A State-of-the-Art Model for Abstractive Text Summarization by Google AI [Blog, June 2020]

Transformer-based Architectures

•Why BERT Fails in Commercial Environments by Intel AI [Blog, 2020]

•Fine Tuning BERT for Text Classification with FARM by Sebastian Guggisberg [Blog, 2020]

•Practical NLP for the Real World [Presentation, 2019]

•From Paper to Product – How we implemented BERT by Christoph Henkelmann [Talk, 2020]

Embeddings as a Service

•embedding-as-service [GitHub ~100 stars]

•Bert-as-service [GitHub ~8k stars]

NLP Recipes Industrial Applications:

•NLP Recipes by microsoft [GitHub ~5k stars]

•NLP with Python by susanli2016 [GitHub ~1.5k stars]

•Basic Utilities for PyTorch NLP by PetrochukM [GitHub ~2k stars]

NLP Applications in Bio, Finance, Legal and other industries

•Blackstone - A spaCy pipeline and model for NLP on unstructured legal text [GitHub ~300 stars]

•Sci spaCy - spaCy pipeline and models for scientific/biomedical documents [GitHub ~600 stars]

•FinBERT: Pre-Trained on SEC Filings for Financial NLP Tasks [GitHub ~100 stars]

•LexNLP - Information retrieval and extraction for real, unstructured legal text [GitHub ~400 stars]

General Speech Recognition

•wav2letter - Automatic Speech Recognition Toolkit [GitHub ~5k stars]

•DeepSpeech - Baidu's DeepSpeech architecture [GitHub ~14k stars]

•Acoustic Word Embeddings by Maria Obedkova [Blog, 2020]

•kaldi - Kaldi is a toolkit for speech recognition [GitHub ~9k stars]

•awesome-kaldi - resources for using Kaldi [GitHub ~300 stars]

Text to Speech

•FastSpeech - The Implementation of FastSpeech based on pytorch [GitHub ~500 stars]

Blogs

•Topic Modelling with PySpark and Spark NLP by Maria Obedkova [Spark, Blog, 2020]

Repositories

•Anchored Correlation Explanation Topic Modeling [GitHub ~300 stars]

•Topic Modeling in Embedding Spaces [GitHub ~200 stars, Paper]

Data Augmentation

•A Visual Survey of Data Augmentation in NLP [Blog, 2020]

•Data augmentation for NLP [GitHub ~1k stars]

•snorkel - Framework to generate training data [GitHub ~4k stars]

Ethics, Bias, and Equality in NLP

•Computational Ethics for NLP - course resources from Carnegie Mellon University [Lecture Notes, Spring 2020]

•Ethics in NLP - resources from ACL's Ethics in NLP track

General Purpose

•transformers by HuggingFace [GitHub ~28k stars]

•spaCy by Explosion AI [GitHub ~17k stars]

•flair by Zalando [GitHub ~9k stars]

•AllenNLP by AI2 [GitHub ~9k stars]

•stanza (formerly Stanford NLP) [GitHub ~4k stars]

•spaCy stanza [GitHub ~400 stars]

•nltk [GitHub ~9k stars]

•NLP Architect - A Deep Learning NLP/NLU library by Intel® AI Lab [GitHub ~2.5k stars]

•Kashgari - Transfer learning with a focus on Chinese [GitHub ~2k stars]

•polyglot - Multi-lingual NLP Framework [GitHub ~2k stars]

•FARM [GitHub ~1k stars]

•gobbli by RTI International [GitHub ~200 stars]

•headliner - training and deployment of seq2seq models [GitHub ~200 stars]

•SyferText - A privacy preserving NLP framework [GitHub ~100 stars]

Dialog Systems and Speech

•DeepPavlov by MIPT [GitHub ~4k stars]

•ParlAI by FAIR [GitHub ~6k stars]

•rasa - Framework for Conversational Agents [GitHub ~9k stars]

•wav2letter - Automatic Speech Recognition Toolkit [GitHub ~5k stars]

Distributed NLP

•Spark NLP [GitHub ~1k stars]

Other NLP Topics

General

•NeuralCoref 4.0: Coreference Resolution in spaCy with Neural Networks by HuggingFace [GitHub ~2k stars]

Tokenization

•tokenizers - Fast State-of-the-Art Tokenizers optimized for Research and Production [GitHub ~3k stars]

•SentencePiece - Unsupervised text tokenizer for Neural Network-based text generation [GitHub ~4k stars]

Books

•Dive into Deep Learning - An interactive deep learning book with code, math, and discussions

•Natural Language Processing and Computational Linguistics - Speech, Morphology and Syntax (Cognitive Science)

Courses

•Choosing the right course for a Practical NLP Engineer

•12 Best Natural Language Processing Courses & Tutorials to Learn Online

Tutorials

•Hands-On NLTK Tutorial [GitHub ~300 stars]

•r/LanguageTechnology - NLP Reddit forum