Natural Language Processing Tasks
Common NLP tasks and techniques.
Robert Anderson
December 18, 2025
Process text with AI.
Text Preprocessing
Clean text before using it:
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
nltk.download('stopwords')  # one-time download of the stopword lists
text = "The cats are running in Boston!"
# Lowercase
text = text.lower()
# Remove punctuation
text = re.sub(r'[^\w\s]', '', text)
# Remove stopwords
stop_words = set(stopwords.words('english'))
words = [w for w in text.split() if w not in stop_words]
# Stemming
stemmer = PorterStemmer()
words = [stemmer.stem(w) for w in words]
print(words) # ['cat', 'run', 'boston']
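The same cleaning steps can be wrapped into one reusable function. This is a minimal sketch using only the standard library: the name preprocess and the tiny STOP_WORDS set are assumptions for illustration (NLTK's English list is much longer), and stemming is left out.

```python
import re

# Hypothetical, deliberately tiny stopword set for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "are", "in", "on", "of", "and", "to"}

def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase, strip punctuation, and drop stopwords."""
    text = text.lower()
    # Keep only word characters and whitespace.
    text = re.sub(r"[^\w\s]", "", text)
    return [w for w in text.split() if w not in stop_words]

print(preprocess("The cats are running in Boston!"))
# ['cats', 'running', 'boston']
```

Without a stemmer the words keep their endings, so 'cats' and 'running' are not reduced to 'cat' and 'run'.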
Tokenization
Split text into pieces:
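# word_tokenize needs NLTK's Punkt data: nltk.download('punkt')
# (newer NLTK releases use 'punkt_tab' instead)
from nltk.tokenize import word_tokenize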
text = "Hello from San Francisco!"
tokens = word_tokenize(text)
print(tokens) # ['Hello', 'from', 'San', 'Francisco', '!']
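To see roughly what a tokenizer does, here is a minimal regex-based sketch. It is not how word_tokenize works internally; simple_tokenize is a hypothetical helper that only handles simple cases.

```python
import re

def simple_tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    # \w+ matches runs of letters, digits, and underscores;
    # [^\w\s] matches one punctuation character at a time.
    return re.findall(r"\w+|[^\w\s]", text)

print(simple_tokenize("Hello from San Francisco!"))
# ['Hello', 'from', 'San', 'Francisco', '!']
```

A sketch like this breaks down on contractions: it splits "don't" into three pieces, while word_tokenize returns 'do' and "n't".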
Bag of Words
Convert text to numbers by counting how often each word appears:
from sklearn.feature_extraction.text import CountVectorizer
texts = [
    "I love AI",
    "AI is amazing",
    "I love machine learning"
]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
print(vectorizer.get_feature_names_out())
print(X.toarray())
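To make the output less of a black box, the same counts can be built by hand. This is a sketch that assumes CountVectorizer's default settings (lowercasing, and keeping only tokens of two or more characters); the tokenize helper is hypothetical.

```python
import re
from collections import Counter

texts = ["I love AI", "AI is amazing", "I love machine learning"]

def tokenize(text):
    # Lowercase and keep tokens of two or more word characters,
    # which is why the one-letter word "I" disappears.
    return re.findall(r"\b\w\w+\b", text.lower())

# Vocabulary: every distinct token, in alphabetical order.
vocab = sorted({tok for t in texts for tok in tokenize(t)})

# One row per text, one column per vocabulary word.
matrix = [[Counter(tokenize(t))[word] for word in vocab] for t in texts]

print(vocab)
# ['ai', 'amazing', 'is', 'learning', 'love', 'machine']
for row in matrix:
    print(row)
# [1, 0, 0, 0, 1, 0]
# [1, 1, 1, 0, 0, 0]
# [0, 0, 0, 1, 1, 1]
```

Each row is one text and each column is one word, so word order inside a sentence is lost. That is where the name "bag of words" comes from.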
TF-IDF
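TF-IDF weights each word count by how rare the word is across documents, which often works better than raw bag-of-words counts: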
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)
# Words that appear in few documents get higher weights
# Words that appear in most documents get lower weights
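The "rare words score higher" part comes from the inverse document frequency (IDF). Here is a minimal sketch of that term alone, assuming the smoothed formula ln((1 + n) / (1 + df)) + 1 that scikit-learn documents as its default. It skips the term-frequency multiplication and the final row normalization, and the tokenize helper is hypothetical.

```python
import math
import re

texts = ["I love AI", "AI is amazing", "I love machine learning"]

def tokenize(text):
    # Same assumption as CountVectorizer defaults: lowercase, 2+ characters.
    return re.findall(r"\b\w\w+\b", text.lower())

n_docs = len(texts)
doc_sets = [set(tokenize(t)) for t in texts]
vocab = sorted(set().union(*doc_sets))

# Document frequency: in how many texts does each word appear?
df = {w: sum(w in d for d in doc_sets) for w in vocab}

# Smoothed IDF: words found in fewer texts get a larger weight.
idf = {w: math.log((1 + n_docs) / (1 + df[w])) + 1 for w in vocab}

for w in vocab:
    print(w, df[w], round(idf[w], 3))
# 'ai' and 'love' appear in 2 texts and get about 1.288;
# the other words appear in 1 text and get about 1.693.
```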
Sentiment Analysis
from textblob import TextBlob
text = "The restaurant in Austin was amazing!"
blob = TextBlob(text)
# polarity ranges from -1.0 (most negative) to 1.0 (most positive)
sentiment = blob.sentiment.polarity
if sentiment > 0:
    print("Positive!")
elif sentiment < 0:
    print("Negative!")
else:
    print("Neutral")
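Comparing against exactly 0 means a score of 0.01 already counts as positive. One possible refinement is a small neutral band around zero. This is a sketch: label_polarity is a hypothetical helper and the 0.1 threshold is an arbitrary assumption to tune on your own data.

```python
def label_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1, 1] to a label, with a neutral band."""
    if polarity > threshold:
        return "Positive"
    if polarity < -threshold:
        return "Negative"
    return "Neutral"

for score in (0.75, 0.05, -0.4):
    print(score, label_polarity(score))
# 0.75 Positive
# 0.05 Neutral
# -0.4 Negative
```

With TextBlob you would pass blob.sentiment.polarity as the score.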
Named Entity Recognition
Find names, places, organizations:
import spacy
# Requires the model: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
text = "Apple Inc. is located in Cupertino, California."
doc = nlp(text)
for ent in doc.ents:
    print(f"{ent.text}: {ent.label_}")
# Apple Inc.: ORG
# Cupertino: GPE
# California: GPE
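A common next step is to group the entities by label. Here is a minimal sketch using the three entities from the output above as plain (text, label) pairs, so it runs without spaCy; group_by_label is a hypothetical helper.

```python
from collections import defaultdict

# (text, label) pairs matching the example output above.
entities = [("Apple Inc.", "ORG"), ("Cupertino", "GPE"), ("California", "GPE")]

def group_by_label(pairs):
    """Collect entity texts under their label, keeping input order."""
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)

print(group_by_label(entities))
# {'ORG': ['Apple Inc.'], 'GPE': ['Cupertino', 'California']}
```

With a real spaCy doc, the pairs would be [(ent.text, ent.label_) for ent in doc.ents].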
Text Classification
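from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
# Turn raw text into TF-IDF features, then classify with Naive Bayes
pipeline = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('clf', MultinomialNB())
])
# X_train and X_test are lists of raw text strings; y_train holds the training labels
pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)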
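The pipeline needs training data to run. Here is a minimal end-to-end sketch with a made-up toy dataset; the sentences and the "pos"/"neg" labels are hypothetical and far too small for a real model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data, made up for illustration only.
X_train = [
    "I love this movie",
    "What a great film",
    "Amazing and fun",
    "I hate this movie",
    "What a terrible film",
    "Boring and bad",
]
y_train = ["pos", "pos", "pos", "neg", "neg", "neg"]
X_test = ["great fun", "terrible and boring"]

# Vectorize the raw strings, then fit Naive Bayes on the TF-IDF features.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultinomialNB()),
])
pipeline.fit(X_train, y_train)

predictions = pipeline.predict(X_test)
for text, label in zip(X_test, predictions):
    print(f"{text!r} -> {label}")
# 'great fun' -> pos
# 'terrible and boring' -> neg
```

Because the vectorizer sits inside the pipeline, predict accepts raw strings directly and the test text is transformed with the vocabulary learned during fit.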
Remember
- Clean and normalize text before vectorizing it
- TF-IDF usually works better than raw bag-of-words counts
- Use pre-trained models when possible