NLP
- 14. Natural Language Processing: Pretraining
  - 14.1. Word Embedding (word2vec)
  - 14.2. Approximate Training
  - 14.3. The Dataset for Pretraining Word Embedding
  - 14.4. Pretraining word2vec
  - 14.5. Word Embedding with Global Vectors (GloVe)
  - 14.6. Subword Embedding
  - 14.7. Finding Synonyms and Analogies
  - 14.8. Bidirectional Encoder Representations from Transformers (BERT)
  - 14.9. The Dataset for Pretraining BERT
  - 14.10. Pretraining BERT
- 15. Natural Language Processing: Applications
  - 15.1. Sentiment Analysis and the Dataset
  - 15.2. Sentiment Analysis: Using Recurrent Neural Networks
  - 15.3. Sentiment Analysis: Using Convolutional Neural Networks
  - 15.4. Natural Language Inference and the Dataset
  - 15.5. Natural Language Inference: Using Attention
  - 15.6. Fine-Tuning BERT for Sequence-Level and Token-Level Applications
  - 15.7. Natural Language Inference: Fine-Tuning BERT
NLP (Natural Language Processing)
- Lexical Processing
- Semantic Analysis
- Syntactic Analysis
- Neural Network (NN)
- Recurrent NN
- Chatbot Project
Why Natural Language Is Hard for Computers to Parse
May is fun but June bores me.
Do "May" and "June" refer to months or to people?
https://www.toptal.com/machine-learning/google-nlp-tutorial
Natural Language Processing with TensorFlow 2 - Beginner's Course
https://www.freecodecamp.org/news/google-bert-nlp-machine-learning-tutorial
spaCy
Industrial-Strength Natural Language Processing
spaCy · Industrial-strength Natural Language Processing in Python
GitHub - explosion/spaCy: 💫 Industrial-strength Natural Language Processing (NLP) in Python
Gensim (Topic Modeling for Humans)
Gensim is a Python library for topic modeling, document indexing and similarity retrieval with large corpora. Its target audience is the natural language processing (NLP) and information retrieval (IR) community.
https://github.com/parulsethi/gensim
https://radimrehurek.com/gensim
https://www.toptal.com/python/topic-modeling-python
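Similarity retrieval is commonly built on vector-space models such as TF-IDF (Gensim provides TF-IDF models and similarity indexes for this). As a rough illustration of the idea only, the plain-Python sketch below ranks documents against a query by the cosine similarity of their TF-IDF vectors. It does not use Gensim's API; the naive tokenizer, the smoothed IDF formula, and the names `tokenize`, `cosine` and `rank` are assumptions made for the example.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (assumption): lowercase, keep runs of ASCII letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Cosine of the angle between two sparse vectors (term -> weight).
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query: str, docs: list[str]) -> list[tuple[int, float]]:
    """Return (doc_index, score) pairs, most similar document first."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    def idf(term: str) -> float:
        # Smoothed IDF (assumption); terms absent from the corpus get the largest weight.
        return math.log((1 + n_docs) / (1 + df[term])) + 1

    def tfidf(tokens: list[str]) -> dict[str, float]:
        # Raw term count multiplied by IDF.
        return {t: c * idf(t) for t, c in Counter(tokens).items()}

    q = tfidf(tokenize(query))
    scores = [(i, cosine(q, tfidf(toks))) for i, toks in enumerate(tokenized)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = [
    "The cat sat on the mat.",
    "Dogs chase cats in the park.",
    "Stock markets fell sharply today.",
]
print(rank("cat on a mat", docs))
```

Only the first document shares terms with the query, so it ranks first; note that this naive tokenizer treats "cat" and "cats" as unrelated, which is one reason real pipelines add normalization such as lemmatization.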
Text Similarity Methods
- Normalized, metric, similarity and distance
  - (Normalized) similarity and distance
  - Metric distances
- Shingles (n-gram) based similarity and distance
- Levenshtein
- Normalized Levenshtein
- Weighted Levenshtein
- Damerau-Levenshtein
- Optimal String Alignment
- Jaro-Winkler
- Longest Common Subsequence
- Metric Longest Common Subsequence
- N-Gram
- Shingle (n-gram) based algorithms
  - Q-Gram
  - Cosine similarity
  - Jaccard index
  - Sorensen-Dice coefficient
  - Overlap coefficient (i.e., Szymkiewicz-Simpson)
https://github.com/luozhouyang/python-string-similarity#python-string-similarity
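As a rough illustration of a few of the listed measures, here is a plain-Python sketch of Levenshtein distance (plus its normalized variant), and of the Jaccard index and cosine similarity computed over character k-shingles. It is not the API of the linked python-string-similarity library; the function names and the default shingle size `k=2` are assumptions chosen for the example.

```python
from __future__ import annotations

import math
from collections import Counter


def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def normalized_levenshtein(s: str, t: str) -> float:
    """Levenshtein distance divided by the length of the longer string, in [0, 1]."""
    longest = max(len(s), len(t))
    return levenshtein(s, t) / longest if longest else 0.0


def shingles(s: str, k: int = 2) -> Counter:
    """Counts of overlapping k-character substrings (k-shingles)."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def jaccard(s: str, t: str, k: int = 2) -> float:
    """|A & B| / |A | B| over the sets of k-shingles."""
    a, b = set(shingles(s, k)), set(shingles(t, k))
    union = a | b
    if not union:  # both strings are shorter than k
        return 1.0 if s == t else 0.0
    return len(a & b) / len(union)


def cosine(s: str, t: str, k: int = 2) -> float:
    """Cosine similarity of the k-shingle count vectors."""
    a, b = shingles(s, k), shingles(t, k)
    dot = sum(count * b[g] for g, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


print(levenshtein("kitten", "sitting"))  # 3
print(jaccard("night", "nacht"))
print(cosine("night", "nacht"))
```

Edit distance counts character-level edits, so it works well for typos; the shingle-based measures compare sets or counts of substrings regardless of where they occur, and the shingle profiles can be precomputed once per string.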
FlashText
Replace keywords in sentences or extract keywords from sentences
https://pypi.org/project/flashtext
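FlashText's approach is to store the keywords in a trie and scan the sentence once, instead of running a separate regular expression per keyword. The sketch below illustrates that idea in plain Python; it is not FlashText's implementation or API. The class `KeywordTrie` and its `add`/`extract`/`replace` methods are hypothetical, the sketch matches whole `\w+` word tokens case-insensitively (FlashText's own trie works character by character with word-boundary checks), and it keeps the longest match starting at each position.

```python
from __future__ import annotations

import re

_END = object()  # sentinel key marking "a keyword phrase ends at this node"
_WORD = re.compile(r"\w+")


class KeywordTrie:
    """Hypothetical word-level trie mapping keyword phrases to clean names."""

    def __init__(self) -> None:
        self.root: dict = {}

    def add(self, phrase: str, clean_name: str | None = None) -> None:
        node = self.root
        for word in _WORD.findall(phrase.lower()):
            node = node.setdefault(word, {})
        node[_END] = clean_name or phrase

    def _matches(self, words: list[str]):
        """Yield (start, end, clean_name), preferring the longest match at each position."""
        i = 0
        while i < len(words):
            node, j, best = self.root, i, None
            while j < len(words) and words[j].lower() in node:
                node = node[words[j].lower()]
                j += 1
                if _END in node:
                    best = (i, j, node[_END])
            if best:
                yield best
                i = best[1]  # resume scanning after the matched phrase
            else:
                i += 1

    def extract(self, sentence: str) -> list[str]:
        words = _WORD.findall(sentence)
        return [name for _, _, name in self._matches(words)]

    def replace(self, sentence: str) -> str:
        spans = [m.span() for m in _WORD.finditer(sentence)]
        words = [sentence[a:b] for a, b in spans]
        out, last = [], 0
        for start, end, name in self._matches(words):
            out.append(sentence[last:spans[start][0]])  # untouched text before the match
            out.append(name)
            last = spans[end - 1][1]  # character offset just past the match
        out.append(sentence[last:])
        return "".join(out)


kt = KeywordTrie()
kt.add("Big Apple", "New York")
kt.add("NYC", "New York")
kt.add("Bay Area")
print(kt.extract("I love the Big Apple and the Bay Area."))
print(kt.replace("I flew from NYC to the Bay Area."))
```

With the real library, the corresponding entry point is `flashtext.KeywordProcessor`, which offers `add_keyword`, `extract_keywords` and `replace_keywords` methods.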
ML Kit Natural Language APIs
- Language ID
- On-device translation
- Smart reply
- Entity extraction
https://developers.google.com/ml-kit
Haystack
Haystack is the open-source Python framework by deepset for building custom apps with large language models (LLMs). It lets you quickly try out the latest natural language processing (NLP) models and is designed to be flexible and easy to use. Its community of users and builders has helped shape Haystack into a complete framework for building production-ready NLP apps.
References
The Association for Computational Linguistics is the international organization that represents the field of NLP. The ACL website (http://www.aclweb.org) hosts many useful resources, including: information about international and regional conferences and workshops; the ACL Wiki with links to hundreds of useful resources; and the ACL Anthology, which contains most of the NLP research literature from the past 50+ years, fully indexed and freely downloadable.
https://www.freecodecamp.org/news/natural-language-processing-with-spacy-python-full-course