The effectiveness of rule-based policies as a search control mechanism in planning has been demonstrated in several planners. A key benefit is that a single policy captures the solution to a set of related planning problems. However, it has been observed that a small number of weak rules, common in learned control knowledge, can render a rule system ineffective. As a result, research has focussed on approaches that make the exploitation of (potentially weak) rules in search more robust. In this work we examine two aspects that can lead to weak rules: the language from which the rules are drawn, and the approach used to learn them.

Rules are often expressed using the predicates and actions of the problem models to which the knowledge applies. However, this language is appropriate for expressing the constraints of the planning world, and will not necessarily include the vocabulary required to express a general solution. We present an approach that automatically invokes language enhancements appropriate to particular aspects of the target problems. These enhancements support rules in problems that involve structure interactions, such as graph traversal and block stacking, and optimisation tasks, such as resource management.

Several approaches to learning policies have been explored in the literature. Learning a policy requires a fitness function, which measures the quality of a policy. Previous approaches have relied on a collection of examples generated by a remote planner. However, we have observed that this leads to weak guidance in domains where global optimisation is required for an optimal solution (such as transportation domains). In these domains we expect good, but not optimal, action choices; this conflicts with the assumption that example states can be accurately explained, and ultimately leads to weak rules.
Instead of measuring performance against a set of remotely drawn example situations, we propose measuring progress towards the goal. Our approach is evaluated using rule-based policies to control search in problems from the benchmark planning domains. We demonstrate that domain models can be automatically enhanced, and that the enhanced language can be exploited by both hand-written and learned policies, allowing them to control search effectively. The learning approach is evaluated by learning policies for several of the enhanced domains, and the results are analysed to provide guidance for future work. A key contribution of this work is the demonstration that both hand-written and learned rule-based policies can generate plans of better quality than those produced by domain-independent planners. We also learn effective policies for several domains currently untreated in the literature.
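As a minimal illustration of the proposed fitness measure, the sketch below scores a rule-based policy by how far a bounded greedy rollout advances towards the goal, rather than by agreement with example plans. The toy single-counter domain and all function names are illustrative assumptions, not the implementation described in the thesis.

```python
def apply_rules(state, rules):
    """Return the action chosen by the first rule whose condition matches."""
    for condition, action in rules:
        if condition(state):
            return action
    return None  # no rule applies: the rollout stalls

def goal_progress_fitness(policy_rules, initial_state, goal, step, horizon=20):
    """Fitness = fraction of goal facts satisfied after a bounded rollout."""
    state = set(initial_state)
    for _ in range(horizon):
        if goal <= state:          # all goal facts achieved
            break
        action = apply_rules(state, policy_rules)
        if action is None:
            break
        state = step(state, action)
    return len(goal & state) / len(goal)

# Toy "domain": the single fact ("at", n); the action "right" increments n.
def step(state, action):
    (n,) = [n for (p, n) in state if p == "at"]
    return {("at", n + 1)} if action == "right" else state

rules = [(lambda s: True, "right")]   # a trivial always-move-right policy
fitness = goal_progress_fitness(rules, {("at", 0)}, {("at", 5)}, step)
```

Under this measure a policy earns partial credit for any goal facts it achieves within the horizon, so it never needs individual example states to be perfectly explainable.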
Date of Award: 9 Jun 2015
Awarding Institution: University of Strathclyde
Supervisors: Maria Fox & Derek Long