License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.CP.2022.30
URN: urn:nbn:de:0030-drops-166594
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2022/16659/
Lafleur, Daphné; Chandar, Sarath; Pesant, Gilles
Combining Reinforcement Learning and Constraint Programming for Sequence-Generation Tasks with Hard Constraints
Abstract
While Machine Learning (ML) techniques are good at generating data similar to a dataset, they lack the capacity to enforce constraints. On the other hand, any solution to a Constraint Programming (CP) model satisfies its constraints but has no obligation to imitate a dataset. Yet, we sometimes need both. In this paper we borrow RL-Tuner, a Reinforcement Learning (RL) algorithm introduced to tune neural networks, as our enabling architecture to exploit the respective strengths of ML and CP. RL-Tuner maximizes the sum of a pretrained network’s learned probabilities and of manually-tuned penalties for each violated constraint. We replace the latter with outputs of a CP model representing the marginal probabilities of each value and the number of constraint violations. As was the case for the original RL-Tuner, we apply our algorithm to music generation since it is a highly-constrained domain for which CP is especially suited. We show that combining ML and CP, as opposed to using them individually, allows the agent to reflect the pretrained network while taking into account constraints, leading to melodic lines that respect both the corpus' style and the music theory constraints.
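To make the reward structure described in the abstract concrete, here is a minimal sketch of how a per-step reward might combine the pretrained network's probability with the two CP outputs (the marginal probability of the chosen value and the number of constraint violations). The abstract does not give the exact formula, so the additive log form, the function name `combined_reward`, and the parameters `violation_weight` and `eps` are all assumptions made for illustration; the authors' actual implementation is in the linked source code.

```python
import math


def combined_reward(log_prob_pretrained, cp_marginal, n_violations,
                    violation_weight=1.0, eps=1e-9):
    """Hypothetical per-step reward (an assumption, not the paper's formula).

    log_prob_pretrained: log-probability the pretrained network assigns
        to the chosen value (e.g. the next note).
    cp_marginal: marginal probability of that value according to a CP model.
    n_violations: number of constraints violated by the sequence so far.
    violation_weight: assumed scaling factor for the violation count.
    eps: floor that keeps the logarithm finite when the marginal is 0.
    """
    cp_term = math.log(max(cp_marginal, eps))
    return log_prob_pretrained + cp_term - violation_weight * n_violations


# Example: the same value scores lower once it violates constraints.
r_ok = combined_reward(math.log(0.5), cp_marginal=0.5, n_violations=0)
r_bad = combined_reward(math.log(0.5), cp_marginal=0.5, n_violations=2)
```

Under these assumptions, a value that the network likes but that the CP model considers unlikely or infeasible receives a low reward, which is the trade-off between imitating the corpus and respecting constraints that the abstract describes.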
BibTeX - Entry
@InProceedings{lafleur_et_al:LIPIcs.CP.2022.30,
author = {Lafleur, Daphn\'{e} and Chandar, Sarath and Pesant, Gilles},
title = {{Combining Reinforcement Learning and Constraint Programming for Sequence-Generation Tasks with Hard Constraints}},
booktitle = {28th International Conference on Principles and Practice of Constraint Programming (CP 2022)},
pages = {30:1--30:16},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-240-2},
ISSN = {1868-8969},
year = {2022},
volume = {235},
editor = {Solnon, Christine},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2022/16659},
URN = {urn:nbn:de:0030-drops-166594},
doi = {10.4230/LIPIcs.CP.2022.30},
annote = {Keywords: Constraint programming, reinforcement learning, RNN, music generation}
}
Keywords: Constraint programming, reinforcement learning, RNN, music generation
Collection: 28th International Conference on Principles and Practice of Constraint Programming (CP 2022)
Issue Date: 2022
Date of publication: 23.07.2022
Supplementary Material: Software (Source Code): https://github.com/chandar-lab/RL-Tuner-CP