AI · 45 min read

AI Interview Questions: 50 Essential Questions for Developers

A comprehensive collection of 50 essential AI interview questions covering machine learning, deep learning, neural networks, NLP, and modern AI concepts, with answers and code examples.

Dr. Sarah Chen
December 16, 2025

This comprehensive guide covers 50 essential AI interview questions that every AI developer should know, spanning fundamental concepts, machine learning algorithms, deep learning, neural networks, natural language processing, and modern AI trends commonly asked about in technical interviews.

Machine Learning Fundamentals

Understanding core machine learning concepts is essential for any AI developer. These questions test your knowledge of supervised/unsupervised learning, algorithms, evaluation metrics, and model training.

Deep Learning & Neural Networks

Deep learning has revolutionized AI. Master these questions to demonstrate your understanding of neural networks, backpropagation, activation functions, and modern architectures.

Natural Language Processing

NLP enables machines to understand human language. These questions cover tokenization, embeddings, transformers, and large language models that power modern AI applications.

Computer Vision

Computer vision allows machines to interpret visual information. These questions cover image processing, convolutional neural networks, object detection, and image classification.

Modern AI & Best Practices

Latest AI trends include transformers, LLMs, generative AI, and ethical considerations. These questions cover cutting-edge AI technologies and best practices for production systems.

Tags: AI, Artificial Intelligence, Machine Learning, ML, Deep Learning, NLP, Interview, AI Interview, AI Tutorial, Neural Networks

Common Questions & Answers

Q1

What is Artificial Intelligence and how does it differ from Machine Learning?

A

Artificial Intelligence is the broad field of creating intelligent machines that can perform tasks requiring human intelligence. Machine Learning is a subset of AI that enables systems to learn from data without explicit programming. AI includes ML, but also rule-based systems, expert systems, and other approaches.
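
As a minimal sketch of the distinction (the spam-filter rule and toy data here are invented for illustration): a rule-based system encodes the decision by hand, while an ML model learns it from examples.

python
from sklearn.linear_model import LogisticRegression

# Rule-based "AI": the decision logic is written by hand
def rule_based_spam_filter(num_links):
    return num_links > 100  # Hand-coded threshold

# Machine learning: the decision boundary is learned from examples
X = [[2], [5], [120], [300]]   # num_links per email (toy data)
y = [0, 0, 1, 1]               # 0 = not spam, 1 = spam
model = LogisticRegression().fit(X, y)
print(model.predict([[150]]))  # Learned, not hand-coded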

Q2

What is the difference between supervised and unsupervised learning?

A

Supervised learning uses labeled data to train models that predict outputs. Unsupervised learning finds patterns in unlabeled data without target outputs. Supervised: classification, regression. Unsupervised: clustering, dimensionality reduction, association.

python
# Supervised Learning
from sklearn.linear_model import LinearRegression
X_train, y_train = labeled_data  # Features with known labels
model = LinearRegression()
model.fit(X_train, y_train)

# Unsupervised Learning
from sklearn.cluster import KMeans
model = KMeans(n_clusters=3)
model.fit(unlabeled_data)  # No labels needed
Q3

What is overfitting and how do you prevent it?

A

Overfitting occurs when a model learns the training data too well, including its noise, and performs poorly on new data. Prevent it with: cross-validation, regularization (L1/L2), dropout, early stopping, more data, feature selection, ensemble methods, and reduced model complexity.

python
# Regularization
from sklearn.linear_model import Ridge
model = Ridge(alpha=1.0)  # L2 regularization

# Early stopping
from tensorflow.keras.callbacks import EarlyStopping
callback = EarlyStopping(monitor='val_loss', patience=5)
Q4

What is cross-validation?

A

Cross-validation splits the data into k folds, trains on k-1 folds, tests on the remaining fold, and repeats k times. It provides a better estimate of model performance than a single train/test split. Common variants: k-fold (k=5 or 10), stratified k-fold, leave-one-out.

python
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
scores = cross_val_score(model, X, y, cv=5)
print(f"Accuracy: {scores.mean():.2f} (+/- {scores.std() * 2:.2f})")
Q5

What is the difference between precision and recall?

A

Precision measures the accuracy of positive predictions (TP / (TP + FP)). Recall measures the ability to find all positives (TP / (TP + FN)). High precision: few false positives. High recall: few false negatives. The F1-score balances both: 2 * (precision * recall) / (precision + recall).

python
from sklearn.metrics import precision_score, recall_score, f1_score

precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
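
To connect the formulas to numbers, here is a hand computation with made-up counts (TP = 8, FP = 2, FN = 4 are illustrative):

python
# Hypothetical confusion-matrix counts
TP, FP, FN = 8, 2, 4

precision = TP / (TP + FP)                          # 0.8
recall = TP / (TP + FN)                             # ~0.667
f1 = 2 * precision * recall / (precision + recall)  # ~0.727
print(precision, recall, f1)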
Q6

What is gradient descent?

A

Gradient descent is an optimization algorithm that minimizes a cost function by iteratively moving in the direction of steepest descent (the negative gradient). It updates parameters as θ = θ - α * ∇J(θ), where α is the learning rate. Variants: batch, stochastic, mini-batch, Adam, RMSprop.

python
# Gradient descent update
def gradient_descent(X, y, theta, alpha, iterations):
    m = len(y)
    for i in range(iterations):
        predictions = X.dot(theta)
        error = predictions - y
        gradient = X.T.dot(error) / m
        theta = theta - alpha * gradient
    return theta
Q7

What is a neural network?

A

A neural network is a computing system inspired by biological neurons. It consists of input, hidden, and output layers. Each neuron applies an activation function to a weighted sum of its inputs. The network learns by adjusting its weights through backpropagation and can approximate any function (universal approximation theorem).

python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(128, activation='relu', input_shape=(784,)),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax')
])
Q8

What is backpropagation?

A

Backpropagation is the algorithm for training neural networks. It calculates the gradient of the loss function with respect to the weights using the chain rule, propagating errors backward from output to input. This enables efficient computation of gradients for all layers and is the core of deep learning training.

python
# Backpropagation computes gradients layer by layer (pseudocode;
# compute_gradient and propagate_error are placeholders)
def backward_pass(loss, model):
    gradients = []
    error = loss
    for layer in reversed(model.layers):
        grad = compute_gradient(layer, error)    # dL/dW for this layer
        gradients.append(grad)
        error = propagate_error(layer, error)    # Chain rule: pass error back
    return gradients
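
Because the helpers above are placeholders, here is a concrete, runnable check of the same chain-rule idea using PyTorch's autograd (the function (3x + 1)^2 is an arbitrary example):

python
import torch

x = torch.tensor(2.0, requires_grad=True)
loss = (3 * x + 1) ** 2
loss.backward()  # Backpropagation via the chain rule
print(x.grad)    # d/dx (3x+1)^2 = 6*(3x+1) = 42 at x=2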
Q9

What are activation functions and why are they important?

A

Activation functions introduce non-linearity into neural networks; without them, the network is just a linear transformation. Common choices: ReLU (most popular), sigmoid (0 to 1), tanh (-1 to 1), softmax (probabilities). ReLU avoids the vanishing gradient and trains faster.

python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(x):
    exp_x = np.exp(x - np.max(x))
    return exp_x / exp_x.sum()
Q10

What is the difference between batch, stochastic, and mini-batch gradient descent?

A

Batch gradient descent uses all training data per update: stable but slow. Stochastic uses one sample per update: fast but noisy. Mini-batch uses a small subset (32-256 samples), balancing speed and stability, and is the most common in practice.

python
# Mini-batch gradient descent (compute_gradients and
# update_weights are placeholders for the actual training step)
batch_size = 32
for epoch in range(epochs):
    for i in range(0, len(X), batch_size):
        batch_X = X[i:i+batch_size]
        batch_y = y[i:i+batch_size]
        gradients = compute_gradients(batch_X, batch_y)
        update_weights(gradients)
Q11

What is a convolutional neural network (CNN)?

A

A CNN is a neural network architecture designed for image data. It uses convolutional layers to detect features (edges, shapes, patterns) through learned filters, plus pooling layers for dimensionality reduction. Excellent for image classification, object detection, and other computer vision tasks.

python
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D(2, 2),
    Conv2D(64, (3, 3), activation='relu'),
    Flatten(),
    Dense(10, activation='softmax')
])
Q12

What is a recurrent neural network (RNN)?

A

An RNN processes sequences by maintaining a hidden state across time steps and can handle variable-length inputs, but it suffers from the vanishing gradient problem. Variants such as LSTM (long short-term memory) and GRU (gated recurrent unit) solve the gradient issues. Used for NLP and time series.

python
from tensorflow.keras.layers import LSTM

model = Sequential([
    LSTM(128, return_sequences=True, input_shape=(timesteps, features)),
    LSTM(64),
    Dense(1, activation='sigmoid')
])
Q13

What is the transformer architecture?

A

The transformer uses an attention mechanism instead of recurrence. Self-attention allows parallel processing and better long-range dependencies. It consists of an encoder-decoder with multi-head attention, feed-forward networks, and layer normalization, and is the basis for BERT, GPT, and modern LLMs.

python
import torch.nn as nn

# PyTorch's built-in encoder-decoder transformer
model = nn.Transformer(
    d_model=512,
    nhead=8,
    num_encoder_layers=6,
    num_decoder_layers=6,
    dim_feedforward=2048
)
Q14

What is attention mechanism?

A

Attention allows the model to focus on relevant parts of the input when making predictions. It computes a weighted sum of values, with weights based on query-key similarity. Self-attention relates different positions within the same sequence, enabling the model to capture context and relationships.

python
import torch
import torch.nn as nn

class Attention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.dim = dim
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)
    
    def forward(self, x):
        Q, K, V = self.query(x), self.key(x), self.value(x)
        # Scaled dot-product attention
        scores = Q @ K.transpose(-2, -1) / (self.dim ** 0.5)
        attention = torch.softmax(scores, dim=-1)
        return attention @ V
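
A quick shape check of the module above, reusing the Attention class just defined (the batch size, sequence length, and dimension are arbitrary):

python
attn = Attention(dim=64)
x = torch.randn(2, 10, 64)   # (batch, seq_len, dim)
out = attn(x)
print(out.shape)             # torch.Size([2, 10, 64])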
Q15

What is transfer learning?

A

Transfer learning reuses a model pre-trained on a large dataset (e.g., ImageNet) for a new task with a smaller dataset. You either fine-tune the last layers or use the model as a feature extractor. It saves time and data and is common in computer vision and NLP (BERT, GPT).

python
from tensorflow.keras.applications import VGG16

base_model = VGG16(weights='imagenet', include_top=False)
base_model.trainable = False  # Freeze base

model = Sequential([
    base_model,
    Flatten(),
    Dense(128, activation='relu'),
    Dense(num_classes, activation='softmax')
])
Q16

What is dropout and why is it used?

A

Dropout randomly sets a fraction of neurons to zero during training. It prevents overfitting by reducing co-adaptation and forces the network to learn redundant representations. Typical rates are 0.2-0.5. Dropout is only active during training; all neurons are used during inference.

python
from tensorflow.keras.layers import Dropout

model = Sequential([
    Dense(128, activation='relu'),
    Dropout(0.5),  # 50% dropout
    Dense(64, activation='relu'),
    Dropout(0.3),
    Dense(10, activation='softmax')
])
Q17

What is batch normalization?

A

Batch normalization normalizes the inputs to each layer by subtracting the batch mean and dividing by the batch standard deviation. It stabilizes training, allows higher learning rates, and reduces internal covariate shift. Usually applied before the activation function.

python
from tensorflow.keras.layers import BatchNormalization, Activation

model = Sequential([
    Dense(128),
    BatchNormalization(),
    Activation('relu'),
    Dense(64),
    BatchNormalization(),
    Activation('relu')
])
Q18

What is the difference between classification and regression?

A

Classification predicts discrete categories (classes). Regression predicts continuous values. Classification: email spam/not spam, image labels. Regression: house prices, temperature prediction. Different loss functions: cross-entropy for classification, MSE for regression.

python
# Classification
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
model.fit(X_train, y_train)  # y_train: categories

# Regression
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor()
model.fit(X_train, y_train)  # y_train: continuous values
Q19

What is feature engineering?

A

Feature engineering creates, transforms, selects features to improve model performance. Includes: scaling, encoding categorical variables, creating polynomial features, handling missing values, feature selection. Critical for model success, often more important than algorithm choice.

python
from sklearn.preprocessing import StandardScaler, LabelEncoder

# Scaling
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Encoding
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y_categorical)
Q20

What is the bias-variance tradeoff?

A

Bias is error from overly simple assumptions; variance is error from sensitivity to small fluctuations in the training data. High bias means underfitting; high variance means overfitting. The goal is to balance the two: complex models have low bias but high variance, simple models the reverse.

python
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# High bias (underfitting) - too simple
model = LinearRegression()  # May miss patterns

# High variance (overfitting) - too complex
model = DecisionTreeClassifier(max_depth=None)  # Memorizes data

# Balanced
model = RandomForestClassifier(n_estimators=100, max_depth=10)
Q21

What is ensemble learning?

A

Ensemble learning combines multiple models for better performance. Types: bagging (parallel, e.g., Random Forest), boosting (sequential, e.g., XGBoost, AdaBoost), stacking (a meta-learner over base models). It reduces variance and improves accuracy, following the "wisdom of crowds" principle.

python
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression

model1 = RandomForestClassifier()
model2 = LogisticRegression()
ensemble = VotingClassifier([('rf', model1), ('lr', model2)], voting='soft')
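
The answer also lists stacking; here is a minimal sketch using scikit-learn's StackingClassifier with the two base models above (the choice of LogisticRegression as meta-learner is illustrative):

python
from sklearn.ensemble import StackingClassifier

stacking = StackingClassifier(
    estimators=[('rf', model1), ('lr', model2)],
    final_estimator=LogisticRegression()  # Meta-learner combines base predictions
)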
Q22

What is natural language processing (NLP)?

A

NLP enables computers to understand, interpret, generate human language. Tasks: text classification, sentiment analysis, machine translation, named entity recognition, question answering. Uses: tokenization, embeddings, transformers, language models.

python
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
result = classifier("I love this product!")
# [{'label': 'POSITIVE', 'score': 0.9998}]
Q23

What are word embeddings?

A

Word embeddings represent words as dense vectors capturing semantic meaning. Similar words have similar vectors. Methods: Word2Vec, GloVe, FastText, contextual embeddings (BERT, ELMo). Enable models to understand word relationships and meaning.

python
from gensim.models import Word2Vec

sentences = [["hello", "world"], ["machine", "learning"]]
model = Word2Vec(sentences, vector_size=100, window=5)
vector = model.wv['hello']  # 100-dimensional vector
Q24

What is BERT?

A

BERT (Bidirectional Encoder Representations from Transformers) is a transformer-based language model pre-trained on masked language modeling and next-sentence prediction. It generates contextual word embeddings and is fine-tuned for various NLP tasks. It revolutionized NLP performance.

python
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

inputs = tokenizer("Hello world", return_tensors="pt")
outputs = model(**inputs)
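
To see the masked-language-modeling objective mentioned above in action, a fill-mask pipeline can be used (the example sentence is illustrative):

python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("The capital of France is [MASK]."))
# Top predictions should include 'paris'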
Q25

What is GPT and how does it work?

A

GPT (Generative Pre-trained Transformer) is an autoregressive language model: it predicts the next token given the previous tokens using a decoder-only transformer. It is pre-trained on a large text corpus and fine-tuned for downstream tasks. GPT-3 and GPT-4 are large-scale versions powering ChatGPT.

python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

inputs = tokenizer("The future of AI is", return_tensors="pt")
outputs = model.generate(inputs.input_ids, max_length=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Q26

What is reinforcement learning?

A

Reinforcement learning learns through interaction with an environment: an agent takes actions, receives rewards or penalties, and learns an optimal policy. Components: agent, environment, state, action, reward, policy. Used in games, robotics, and recommendation systems.

python
import gym

# Classic gym API; 'agent' is a placeholder for any RL algorithm
env = gym.make('CartPole-v1')
state = env.reset()

for _ in range(1000):
    action = agent.choose_action(state)
    next_state, reward, done, _ = env.step(action)
    agent.learn(state, action, reward, next_state)
    state = env.reset() if done else next_state
Q27

What is the difference between generative and discriminative models?

A

Generative models learn joint probability P(X, Y), can generate new data. Discriminative models learn conditional probability P(Y|X), classify data. Generative: Naive Bayes, GANs, VAEs. Discriminative: Logistic Regression, SVMs, Neural Networks.

python
# Generative - learns P(X, Y)
from sklearn.naive_bayes import GaussianNB
generative_model = GaussianNB()

# Discriminative - learns P(Y|X)
from sklearn.linear_model import LogisticRegression
discriminative_model = LogisticRegression()
Q28

What is a GAN (Generative Adversarial Network)?

A

A GAN consists of a generator (creates fake data) and a discriminator (distinguishes real from fake). The two are trained adversarially: the generator tries to fool the discriminator, while the discriminator tries to detect fakes. Used for image generation, data augmentation, and style transfer.

python
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(100, 784)  # Generate 28x28 image
    
    def forward(self, z):
        return torch.tanh(self.fc(z))

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(784, 1)
    
    def forward(self, x):
        return torch.sigmoid(self.fc(x))
Q29

What is the difference between L1 and L2 regularization?

A

L1 (Lasso) adds sum of absolute weights to loss, encourages sparsity (zero weights), feature selection. L2 (Ridge) adds sum of squared weights, prevents large weights, smoother solutions. Elastic Net combines both. L1 for feature selection, L2 for generalization.

python
from sklearn.linear_model import Lasso, Ridge, ElasticNet

# L1 regularization
lasso = Lasso(alpha=1.0)

# L2 regularization
ridge = Ridge(alpha=1.0)

# Both
elastic = ElasticNet(alpha=1.0, l1_ratio=0.5)
Q30

What is the curse of dimensionality?

A

Curse of dimensionality: as dimensions increase, data becomes sparse, distances become similar, volume increases exponentially. Makes learning difficult, requires more data. Solutions: dimensionality reduction (PCA, t-SNE), feature selection, regularization.

python
from sklearn.decomposition import PCA

# Reduce dimensions
pca = PCA(n_components=50)
X_reduced = pca.fit_transform(X_high_dimensional)
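
A small experiment illustrating the effect (the dimensions and random data are illustrative): as dimensionality grows, pairwise distances concentrate, so their spread-to-mean ratio shrinks.

python
import numpy as np

rng = np.random.default_rng(0)
for d in [2, 100, 10000]:
    X = rng.random((100, d))
    dists = np.linalg.norm(X[0] - X[1:], axis=1)
    print(d, dists.std() / dists.mean())  # Ratio shrinks as d grows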
Q31

What is PCA (Principal Component Analysis)?

A

PCA reduces dimensionality by finding principal components (directions of maximum variance). Projects data onto lower-dimensional space. Preserves most variance with fewer dimensions. Unsupervised, linear transformation. Used for visualization, noise reduction, feature extraction.

python
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
print(f"Explained variance: {pca.explained_variance_ratio_}")
Q32

What is the difference between accuracy, precision, recall, and F1-score?

A

Accuracy: (TP + TN) / total, overall correctness. Precision: TP / (TP + FP), positive prediction accuracy. Recall: TP / (TP + FN), ability to find positives. F1: harmonic mean of precision and recall, balances both. Use based on problem requirements.

python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
Q33

What is the ROC curve and AUC?

A

ROC curve plots True Positive Rate vs False Positive Rate at different thresholds. AUC (Area Under Curve) measures classifier performance: 1.0 perfect, 0.5 random, >0.7 good. Higher AUC = better discrimination. Useful for binary classification evaluation.

python
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt

fpr, tpr, thresholds = roc_curve(y_true, y_scores)
roc_auc = auc(fpr, tpr)

plt.plot(fpr, tpr, label=f'AUC = {roc_auc:.2f}')
Q34

What is hyperparameter tuning?

A

Hyperparameter tuning finds optimal hyperparameters (not learned, set before training). Methods: grid search (exhaustive), random search (random sampling), Bayesian optimization (efficient). Examples: learning rate, batch size, number of layers, regularization strength.

python
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [10, 20, 30]
}

grid_search = GridSearchCV(RandomForestClassifier(), param_grid, cv=5)
grid_search.fit(X_train, y_train)
best_params = grid_search.best_params_
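
Random search, also mentioned above, samples candidate settings instead of exhausting the grid; a sketch with RandomizedSearchCV (the distributions chosen are illustrative):

python
from scipy.stats import randint
from sklearn.model_selection import RandomizedSearchCV

param_dist = {
    'n_estimators': randint(100, 500),
    'max_depth': randint(5, 30)
}
random_search = RandomizedSearchCV(
    RandomForestClassifier(), param_dist, n_iter=20, cv=5
)
random_search.fit(X_train, y_train)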
Q35

What is the difference between training, validation, and test sets?

A

Training set: used to train model. Validation set: used to tune hyperparameters, select models, prevent overfitting. Test set: used for final evaluation, never used during training. Typical split: 60% train, 20% validation, 20% test. Test set should be held out completely.

python
from sklearn.model_selection import train_test_split

X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5)

# Train on X_train, tune on X_val, evaluate on X_test
Q36

What is data augmentation?

A

Data augmentation creates new training examples by applying transformations to existing data. For images: rotation, flipping, scaling, color jittering. Increases dataset size, improves generalization, reduces overfitting. Common in computer vision, can apply to text/NLP too.

python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True
)
augmented = datagen.flow(X_train, y_train, batch_size=32)
Q37

What is the difference between batch normalization and layer normalization?

A

Batch normalization normalizes across batch dimension, depends on batch statistics. Layer normalization normalizes across features for each sample, independent of batch. Layer norm better for RNNs, transformers, variable batch sizes. Batch norm common in CNNs.

python
from tensorflow.keras.layers import BatchNormalization, LayerNormalization

# Batch normalization
bn = BatchNormalization()

# Layer normalization
ln = LayerNormalization()
Q38

What is the vanishing gradient problem?

A

Vanishing gradient occurs when gradients become very small during backpropagation through deep networks. Makes early layers learn slowly or not at all. Caused by activation functions like sigmoid/tanh. Solutions: ReLU, residual connections, batch normalization, gradient clipping.

python
# ReLU avoids vanishing gradient
def relu(x):
    return np.maximum(0, x)  # Gradient is 1 for x > 0

# Residual connections help
def residual_block(x):
    identity = x
    out = conv_layer(x)
    out = out + identity  # Skip connection
    return out
Q39

What is the exploding gradient problem?

A

Exploding gradient occurs when gradients become very large, causing unstable training, NaN values. Common in RNNs with long sequences. Solutions: gradient clipping (limit gradient magnitude), better initialization, batch normalization, smaller learning rates.

python
# Gradient clipping
import torch.nn.utils as utils

utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

# Or in TensorFlow
import tensorflow as tf
optimizer = tf.keras.optimizers.Adam(clipnorm=1.0)
Q40

What is the difference between RNN, LSTM, and GRU?

A

RNN: basic recurrent unit, suffers vanishing gradient. LSTM: long short-term memory, has forget gate, input gate, output gate, cell state. GRU: gated recurrent unit, simpler than LSTM, has reset and update gates. LSTM/GRU solve vanishing gradient, better for long sequences.

python
from tensorflow.keras.layers import SimpleRNN, LSTM, GRU

# Basic RNN
rnn = SimpleRNN(64)

# LSTM
lstm = LSTM(64)

# GRU
gru = GRU(64)
Q41

What is object detection?

A

Object detection identifies and locates objects in images with bounding boxes. More complex than classification (which only labels). Methods: YOLO (You Only Look Once), R-CNN, SSD, RetinaNet. Outputs: class labels and bounding box coordinates for each object.

python
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
results = model('image.jpg')
for result in results:
    boxes = result.boxes  # Bounding boxes
    classes = result.boxes.cls  # Class labels
Q42

What is semantic segmentation?

A

Semantic segmentation classifies each pixel in image into categories. Denser prediction than object detection. Used in medical imaging, autonomous vehicles, scene understanding. Architectures: U-Net, FCN, DeepLab, SegNet. Outputs pixel-level class labels.

python
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, concatenate

# Minimal U-Net-style encoder-decoder for pixel-wise classification
def unet(input_shape, num_classes):
    inputs = Input(input_shape)
    c1 = Conv2D(16, 3, activation='relu', padding='same')(inputs)
    p1 = MaxPooling2D(2)(c1)                                    # Encoder: downsample
    c2 = Conv2D(32, 3, activation='relu', padding='same')(p1)
    u1 = UpSampling2D(2)(c2)                                    # Decoder: upsample
    m1 = concatenate([u1, c1])                                  # Skip connection
    outputs = Conv2D(num_classes, 1, activation='softmax')(m1)  # Pixel-wise classes
    return Model(inputs, outputs)
Q43

What is the difference between fine-tuning and feature extraction?

A

Fine-tuning updates all or some layers of pre-trained model on new task. Feature extraction freezes pre-trained layers, only trains new classifier head. Fine-tuning: more data, similar task. Feature extraction: less data, different task. Fine-tuning usually better if data allows.

python
# Feature extraction
base_model.trainable = False
model = Sequential([base_model, Dense(num_classes)])

# Fine-tuning
base_model.trainable = True
for layer in base_model.layers[:-5]:
    layer.trainable = False  # Freeze early layers
Q44

What is the difference between supervised, unsupervised, and semi-supervised learning?

A

Supervised: labeled data, learns input-output mapping. Unsupervised: unlabeled data, finds patterns. Semi-supervised: mix of labeled and unlabeled data, uses both. Semi-supervised useful when labels are expensive/scarce but unlabeled data is abundant.

python
# Semi-supervised learning: unlabeled samples are marked with -1
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X_all = np.vstack([X_labeled, X_unlabeled])
y_all = np.concatenate([y_labeled, -np.ones(len(X_unlabeled), dtype=int)])

model = SelfTrainingClassifier(LogisticRegression())
model.fit(X_all, y_all)
Q45

What is the difference between parametric and non-parametric models?

A

Parametric models have a fixed number of parameters (e.g., linear regression, neural networks). In non-parametric models, the number of parameters grows with the data (e.g., k-NN, decision trees). Parametric: faster, less data needed. Non-parametric: more flexible, needs more data.

python
# Parametric
from sklearn.linear_model import LinearRegression
model = LinearRegression()  # Fixed parameters

# Non-parametric
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier(n_neighbors=5)  # Adapts to data
Q46

What is the difference between bagging and boosting?

A

Bagging trains models in parallel on different data subsets, averages predictions (e.g., Random Forest). Boosting trains models sequentially, each corrects previous errors (e.g., AdaBoost, XGBoost). Bagging reduces variance, boosting reduces bias. Both improve accuracy.

python
# Bagging
from sklearn.ensemble import RandomForestClassifier
bagging = RandomForestClassifier(n_estimators=100)

# Boosting
from sklearn.ensemble import AdaBoostClassifier
boosting = AdaBoostClassifier(n_estimators=100)
Q47

What is XGBoost?

A

XGBoost (Extreme Gradient Boosting) is an optimized gradient boosting implementation. Features: regularization, parallel processing, native handling of missing values, tree pruning. It often wins Kaggle competitions, is fast and accurate, and handles large datasets well. Popular for tabular data.

python
import xgboost as xgb

model = xgb.XGBClassifier(
    n_estimators=100,
    max_depth=6,
    learning_rate=0.1
)
model.fit(X_train, y_train)
Q48

What is the difference between tokenization and stemming?

A

Tokenization splits text into tokens (words, subwords). Stemming reduces words to a root form (running -> run). Both are text preprocessing steps: tokenization comes first; stemming is a normalization step that can be aggressive. Lemmatization is a better alternative to stemming because it preserves meaning.

python
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer
# Requires: nltk.download('punkt')

tokens = word_tokenize("Running quickly")
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]  # ['run', 'quickli']
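
Since the answer recommends lemmatization, here is a comparison using NLTK's WordNetLemmatizer (requires the wordnet corpus), reusing the tokens above:

python
from nltk.stem import WordNetLemmatizer
# Requires: nltk.download('wordnet')

lemmatizer = WordNetLemmatizer()
lemmas = [lemmatizer.lemmatize(t.lower(), pos='v') for t in tokens]
print(lemmas)  # ['run', 'quickly'] - real words, unlike 'quickli'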
Q49

What is the difference between TF-IDF and word embeddings?

A

TF-IDF is sparse vector representation based on term frequency and inverse document frequency. Word embeddings are dense vectors capturing semantic meaning. TF-IDF: interpretable, good for traditional ML. Embeddings: capture relationships, better for deep learning, more powerful.

python
# TF-IDF
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer()
X_tfidf = vectorizer.fit_transform(texts)

# Word embeddings
from gensim.models import Word2Vec
model = Word2Vec(sentences, vector_size=100)
embeddings = model.wv
Q50

What are the ethical considerations in AI?

A

AI ethics includes: bias and fairness (prevent discrimination), transparency (explainable AI), privacy (data protection), accountability (who is responsible), safety (robust systems), job displacement. Important for responsible AI development and deployment. Regulations emerging (GDPR, AI Act).

python
# Fairness metrics
from fairlearn.metrics import demographic_parity_difference

fairness = demographic_parity_difference(
    y_true, y_pred, sensitive_features=gender
)

# Explainability
from shap import TreeExplainer
explainer = TreeExplainer(model)
shap_values = explainer.shap_values(X)