Getting Started with Anaconda Notebooks
Course Overview and Learning Objectives
What is Natural Language Processing?
What Are Large Language Models?
How Computers See Text
Strengths and Limits of Large Language Models
Concerns in Procuring LLM Data
Exercise: Procuring LLM Data
Text Cleaning
Manual Tokenization
Using the Natural Language Toolkit (NLTK)
Stemming
Using spaCy
Exercise: Tokenize Text
Converting Text to Numbers
Word Counts
Word Frequencies
Word Hashing
Binary and Other Parameters
Exercise: Vectorize Text
Data Preparation with a Real-World Dataset
Cleaning and Tokenizing the Data
Vectorizing the Data
Exercise: Data Quality
What Are Word Embeddings?
Word2Vec and GloVe
Word Embedding with Gensim
Word Embedding with Gensim Continued
Exercise: Word Embedding
Summary
End-of-Course Survey