Transcript – Learning with Safety Constraints
Hi, this is Aria talking about the poster titled "Learning with Safety Constraints: Sample Complexity of Reinforcement Learning for Constrained MDPs."

Usually, Markov decision processes, or MDPs, are used to model real-world problems. But sometimes these problems involve physical limitations, such as in autonomous driving applications. To handle such problems, constrained MDPs (CMDPs) are used. In this poster, we address the problem of learning a generic CMDP by developing two algorithms with theoretical sample complexity results.

On one hand, we present an offline algorithm based on a generative model. This algorithm takes all of its samples at once at the beginning, builds a model, and solves the problem for that model. On the other hand, we present an online algorithm. Unlike the offline algorithm, the online algorithm builds a model from the samples it has collected so far. It then solves that model and deploys the solution in the environment to collect more samples and update the model. This procedure is repeated until all state-action pairs are sampled sufficiently.

Further, we state in the poster that both algorithms have similar theoretical sample complexity, and both results show that the upper bound on sample complexity scales logarithmically with the number of constraints.

Finally, we simulated both algorithms on a five-by-five grid network to compare their performance in terms of the difference in objective value and the constraint violation. The experimental results show that both algorithms have similar performance with regard to the objective function, but the online algorithm requires a smaller sampling budget to satisfy the constraints.
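The online procedure described in the transcript can be sketched as a toy loop. Everything concrete below is an assumption for illustration: the deterministic five-by-five grid environment, the target visit count, and the least-visited action rule, which stands in for the poster's actual optimistic planner (not specified here). The sketch only shows the shape of the loop: estimate a model from samples so far, act, collect new samples, and stop once every state-action pair is sampled sufficiently.

```python
import random

# Hypothetical sketch of the online sampling loop: a 5x5 grid (as in the
# poster's experiment) with deterministic moves up/down/left/right.
N = 5
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def grid_step(state, action):
    """Deterministic toy transition: move and clip to the grid."""
    r, c = state
    dr, dc = action
    return (min(max(r + dr, 0), N - 1), min(max(c + dc, 0), N - 1))

def online_sampling(target=2, seed=0, max_steps=100_000):
    """Repeat: update the empirical model, act, collect a sample,
    until every (state, action) pair has at least `target` samples."""
    rng = random.Random(seed)
    counts = {((r, c), a): 0 for r in range(N) for c in range(N)
              for a in ACTIONS}          # (state, action) -> visit count
    model = {}                           # empirical model: (s, a) -> s'
    state = (0, 0)
    steps = 0
    while min(counts.values()) < target and steps < max_steps:
        # Stand-in for the optimistic planner: half the time explore at
        # random (so the walk covers the grid), otherwise take the
        # least-sampled action at the current state.
        if rng.random() < 0.5:
            action = rng.choice(ACTIONS)
        else:
            action = min(ACTIONS, key=lambda a: counts[(state, a)])
        nxt = grid_step(state, action)
        counts[(state, action)] += 1
        model[(state, action)] = nxt     # update the empirical model
        state = nxt
        steps += 1
    return counts, model, steps
```

Running `online_sampling()` terminates once all 100 state-action pairs of the toy grid have been visited at least twice; the real algorithm would replace the action rule with a planner that solves the estimated CMDP at each round.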