Reinforcement Learning Basics
Learn through rewards and penalties: an agent takes actions, receives feedback, and adjusts its behavior accordingly.
What is Reinforcement Learning?
An agent interacts with an environment, choosing actions to maximize the total reward it collects over time.
**Like training a dog**: Good behavior → Reward!
Key Concepts
- **Agent**: The learner (e.g., a robot)
- **Environment**: The world the agent interacts with
- **State**: The current situation
- **Action**: What the agent can do
- **Reward**: Feedback from the environment (positive or negative)
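These pieces fit together in a simple loop: the agent observes a state, picks an action, and the environment returns a reward and the next state. A minimal sketch using a hypothetical `WalkEnv` (a toy class written here for illustration, not part of any library):

```python
import random

class WalkEnv:
    """Toy environment: the agent walks on positions 0..4; reaching 4 ends
    the episode with reward 1, every other step gives reward 0."""
    def reset(self):
        self.pos = 0
        return self.pos  # initial state

    def step(self, action):
        # action: 0 = move left, 1 = move right (clamped to the track)
        self.pos = max(0, min(4, self.pos + (1 if action == 1 else -1)))
        done = self.pos == 4
        reward = 1.0 if done else 0.0
        return self.pos, reward, done

env = WalkEnv()
state = env.reset()
done = False
total_reward = 0.0
while not done:
    action = random.choice([0, 1])  # a random agent: no learning yet
    state, reward, done = env.step(action)
    total_reward += reward

print(f"Episode finished with total reward {total_reward}")
```

A learning agent would replace `random.choice` with a policy that improves from the rewards it observes.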
Simple RL Example
```python
import gym
import numpy as np

# Create environment (simple grid world)
env = gym.make('FrozenLake-v1')

# Q-learning table: one row per state, one column per action
Q = np.zeros([env.observation_space.n, env.action_space.n])

# Parameters
learning_rate = 0.8
discount_factor = 0.95
epsilon = 0.1  # Exploration rate
episodes = 2000

# Training
for episode in range(episodes):
    state, _ = env.reset()  # gym >= 0.26 returns (observation, info)
    done = False
    while not done:
        # Choose action (explore vs exploit)
        if np.random.uniform(0, 1) < epsilon:
            action = env.action_space.sample()  # Explore
        else:
            action = np.argmax(Q[state, :])  # Exploit

        # Take action (step returns separate terminated/truncated flags)
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated

        # Update Q-table
        old_value = Q[state, action]
        next_max = np.max(Q[next_state, :])
        Q[state, action] = old_value + learning_rate * (reward + discount_factor * next_max - old_value)
        state = next_state

    if episode % 100 == 0:
        print(f"Episode {episode} completed")

print("Training finished!")
```
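To see what one Q-table update actually does, here is the same rule worked with concrete made-up numbers (the values below are illustrative, not taken from a real run):

```python
# Suppose the current estimate, the observed reward, and the best
# next-state value are:
old_value = 0.5        # Q[state, action] before the update
reward = 1.0           # reward observed for this step
next_max = 0.8         # max over Q[next_state, :]
learning_rate = 0.8
discount_factor = 0.95

# TD target: immediate reward plus discounted best future value
target = reward + discount_factor * next_max   # 1.0 + 0.95 * 0.8 = 1.76

# Move the old estimate a fraction of the way toward the target
new_value = old_value + learning_rate * (target - old_value)

print(new_value)  # approximately 1.508
```

The learning rate controls how far each estimate moves toward the target; with `learning_rate = 1.0` the old value would be replaced outright.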
Test Trained Agent
```python
# Test the agent with the learned (greedy) policy
state, _ = env.reset()
done = False
total_reward = 0

while not done:
    action = np.argmax(Q[state, :])  # Use learned policy
    state, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
    total_reward += reward

print(f"Total reward: {total_reward}")
```
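A common refinement, not included in the script above, is to decay `epsilon` over training so the agent explores heavily at first and exploits more as its estimates improve. A sketch with assumed decay values:

```python
epsilon = 1.0           # start fully exploratory
epsilon_min = 0.01      # never stop exploring entirely
epsilon_decay = 0.995   # multiplicative decay per episode (assumed value)

for episode in range(2000):
    # ... run one episode using the current epsilon for action selection ...
    epsilon = max(epsilon_min, epsilon * epsilon_decay)

print(f"Final epsilon: {epsilon:.3f}")
```

After 2000 episodes this schedule has decayed to the floor of 0.01, so late training is almost entirely greedy.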
RL Algorithms
- **Q-Learning**: Off-policy learning of action values
- **SARSA**: On-policy learning of action values
- **DQN**: Deep Q-Network (approximates Q with a neural network)
- **A3C**: Asynchronous Advantage Actor-Critic
- **PPO**: Proximal Policy Optimization
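The on-policy/off-policy distinction between Q-learning and SARSA comes down to one term in the update rule. A side-by-side sketch (function names and defaults are illustrative):

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.8, gamma=0.95):
    # Off-policy: bootstrap from the BEST next action,
    # regardless of which action the policy will actually take
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.8, gamma=0.95):
    # On-policy: bootstrap from the action a_next that the
    # current (e.g., epsilon-greedy) policy actually takes next
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])
```

Because SARSA's target depends on the action actually taken, it accounts for exploration noise and tends to learn more conservative policies than Q-learning.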
Applications
- Game playing (Chess, Go, video games)
- Robotics
- Self-driving cars
- Resource management
Remember
- RL learns by trial and error
- Balance exploration (trying new actions) with exploitation (using what already works)
- Learning typically requires many episodes
- Works best when the environment can be simulated cheaply