Choose a domain and concept for an example. Add some probabilistic behavior and reward to this environment and model it as a Markov Decision Process (MDP).

Markov Decision Processes (MDPs)

This assignment concerns the idea of Markov Decision Processes (MDPs) as a way of formalizing what it means to make optimal decisions in probabilistic domains. MDPs also generalize the idea of having a single goal state to instead having reward, positive or negative, that can accumulate at various states. Choose a domain and concept for an example, add some probabilistic behavior and reward to that environment, and model it as an MDP.
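For reference, an MDP is standardly written as a tuple (S, A, P, R) together with a discount factor γ ∈ [0, 1), and "optimal decisions" means maximizing expected discounted reward. This is captured by the Bellman optimality equation (a textbook definition, restated here for convenience, not something specific to any one example):

V^*(s) = \max_{a \in A(s)} \sum_{s' \in S} P(s' \mid s, a)\,\big[ R(s, a, s') + \gamma V^*(s') \big]

The optimal policy \pi^*(s) then simply picks the maximizing action in each state.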

For example: maybe the environment is slippery, and actions sometimes don’t have the desired effects. Maybe some squares give negative reward some percentage of the time (traps?). Maybe all squares give negative reward some percentage of the time (meteorite?). Maybe some walls are electrified? Etc.
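To make the "slippery" idea concrete, here is a minimal Python sketch of a stochastic transition model. It assumes a hypothetical 4x4 grid where the intended move succeeds with probability 0.8 and the agent slips to each perpendicular direction with probability 0.1; the grid size and probabilities are illustrative assumptions, not part of the assignment.

# Minimal sketch of a slippery 4x4 grid world (sizes and probabilities assumed).
ROWS, COLS = 4, 4
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
PERPENDICULAR = {"up": ("left", "right"), "down": ("left", "right"),
                 "left": ("up", "down"), "right": ("up", "down")}

def move(state, action):
    # Deterministic effect of an action, clipped at the grid walls.
    r, c = state
    dr, dc = ACTIONS[action]
    return (min(max(r + dr, 0), ROWS - 1), min(max(c + dc, 0), COLS - 1))

def transition(state, action):
    # Map each successor state to its probability: 0.8 intended, 0.1 per slip.
    probs = {}
    outcomes = [(action, 0.8)] + [(slip, 0.1) for slip in PERPENDICULAR[action]]
    for a, p in outcomes:
        nxt = move(state, a)
        probs[nxt] = probs.get(nxt, 0.0) + p
    return probs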

You are required to write down how this would be modeled as an MDP (a code sketch of these four ingredients follows the list):

States

Actions in each state

Transition function, i.e., the probability that an action in a state will produce a given successor state

Reward function, i.e., which transitions produce a reward, and how much?
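Reusing the names from the slippery-grid sketch above, the four ingredients might be written down as follows; the goal square, trap square, and reward values (+10 goal, -5 trap, -1 step cost) are illustrative assumptions only.

# One possible encoding of states, actions, and rewards (all values assumed).
STATES = [(r, c) for r in range(ROWS) for c in range(COLS)]
GOAL, TRAP = (0, 3), (1, 3)  # hypothetical terminal squares

def actions(state):
    # Terminal squares allow no further actions; all others allow every move.
    return [] if state in (GOAL, TRAP) else list(ACTIONS)

def reward(state, action, successor):
    # Reward attached to a transition: payouts on entering a terminal square,
    # otherwise a small step cost that encourages short paths.
    if successor == GOAL:
        return 10.0
    if successor == TRAP:
        return -5.0
    return -1.0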

Do you have a guess as to what the optimal value function and policy should look like?
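As one way to check your guess, the value iteration sketch below (a standard solution method, not the only one) computes the optimal value function and a greedy policy for the toy grid defined above; the discount factor and convergence threshold are assumed values.

# Value iteration over the toy MDP above (gamma and threshold are assumptions).
GAMMA, THETA = 0.9, 1e-6

def q_value(V, s, a):
    # Expected discounted return of taking action a in state s.
    return sum(p * (reward(s, a, s2) + GAMMA * V[s2])
               for s2, p in transition(s, a).items())

def value_iteration():
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if not actions(s):
                continue  # terminal squares keep value 0
            best = max(q_value(V, s, a) for a in actions(s))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < THETA:
            break
    # Greedy policy: in each non-terminal state, pick the best action.
    policy = {s: max(actions(s), key=lambda a: q_value(V, s, a))
              for s in STATES if actions(s)}
    return V, policy

V_star, pi_star = value_iteration()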

5 pages, 12 pt Times New Roman, double-spaced, APA format.

Just make sure to define what an MDP is and cover the bullet points mentioned above.
