# Bayesian Learning for Lifelong Machine Learning

## Associate director of Bayesian learning team

Seungjin Choi (POSTECH)

Research areas:

(1) Incremental inference for nonparametric Bayesian learning

(2) Bayesian matrix factorization.

## Participant professors of Bayesian learning team

Kee-Eung Kim (KAIST)

Research areas:

(1) Bayesian inverse reinforcement learning

(2) Multi-task Bayesian inverse reinforcement learning

Alice Oh (KAIST)

Research areas:

(1) Online learning for topic models

(2) Distributed inference for topic models

Songhwai Oh (SNU)

Research areas:

(1) Online learning for Gaussian processes

(2) Robust Gaussian processes for robotics

### Incremental Inference for Non-parametric Bayesian Learning

In this research center, we are developing a novel tree-based inference method for MNRM mixture models. It extends Bayesian hierarchical clustering (BHC), which was originally developed as a deterministic approximate inference method for Dirichlet process mixture (DPM) models. We also present an incremental inference method for MNRM mixture models that builds the tree incrementally: the tree structure is partially updated whenever a new data point arrives. A tree constructed this way lets us efficiently perform tree-consistent MAP inference in MNRM mixture models, that is, determine a most probable tree-consistent partition, and also compute an approximate marginal likelihood.
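For context, standard BHC for DPM models scores a candidate merge of subtrees $T_i$ and $T_j$ into $T_k$ as sketched below. The notation is the usual BHC one and is an assumption here, not taken from this page: $D_k$ is the data under node $k$, $H_1^k$ is the hypothesis that $D_k$ forms a single cluster, and $\pi_k$ is the prior probability of that hypothesis. The MNRM extension may compute these terms differently.

```latex
p(D_k \mid T_k) = \pi_k \, p(D_k \mid H_1^k)
                + (1 - \pi_k) \, p(D_i \mid T_i) \, p(D_j \mid T_j),
\qquad
r_k = \frac{\pi_k \, p(D_k \mid H_1^k)}{p(D_k \mid T_k)}
```

In standard BHC, a merge is favored when $r_k > 0.5$, and cutting the tree at nodes with $r_k < 0.5$ yields a tree-consistent partition. The quantity $p(D_k \mid T_k)$ at the root serves as the approximate marginal likelihood.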

### Bayesian Matrix Factorization

In this research center, we are developing an efficient inference algorithm for Bayesian matrix factorization (BMF), a popular method for collaborative prediction because it is robust to overfitting and needs no cross-validation to tune regularization parameters. In practice, however, existing variational inference algorithms for BMF have cubic time complexity in the rank of the factor matrices, so they are not well suited to web-scale datasets with billions of ratings from millions of users. In this work, we present a scalable inference algorithm for variational Bayesian matrix factorization (VBMF) with side information, whose complexity is linear in the rank K of the factor matrices. Moreover, the algorithm can be easily parallelized on multi-core systems. Experiments on large-scale datasets demonstrate the algorithm's scalability, fast learning, and prediction accuracy.

### Model-based Bayesian Reinforcement Learning

In this research center, we are developing a model-based Bayesian reinforcement learning algorithm with tighter value function bounds. Under the Bayesian approach, a reinforcement learning (RL) problem formulated as a Markov decision process (MDP) can be transformed into a partially observable Markov decision process (POMDP) planning problem. Anytime error minimization search (AEMS) is a notable algorithm that solves the online POMDP planning problem using heuristic tree search, so AEMS can be applied to the resulting POMDP planning problem. This combination is called AEMS-BRL. One of our goals is to improve the performance of AEMS-BRL through modifications: by using tighter initial bounds or potential-based reward shaping, we can obtain tighter bounds on the value function.
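Potential-based reward shaping can be illustrated with a minimal sketch. It uses the standard shaping form r'(s, a, s') = r(s, a, s') + γΦ(s') − Φ(s). The toy chain, the reward, and the potential function `potential` are hypothetical choices for illustration and are not part of AEMS-BRL.

```python
def shaped_reward(reward, potential, gamma):
    """Wrap a reward function with a potential-based shaping term."""
    def r_shaped(s, a, s_next):
        # Shaping term: gamma * Phi(s') - Phi(s)
        return reward(s, a, s_next) + gamma * potential(s_next) - potential(s)
    return r_shaped


def discounted_return(reward_fn, trajectory, gamma):
    """Sum of gamma^t * r(s_t, a_t, s_{t+1}) over (s, a, s_next) transitions."""
    return sum(
        (gamma ** t) * reward_fn(s, a, s_next)
        for t, (s, a, s_next) in enumerate(trajectory)
    )


GAMMA = 0.9


def reward(s, a, s_next):
    # Hypothetical sparse reward: 1 on reaching state 3, else 0.
    return 1.0 if s_next == 3 else 0.0


def potential(s):
    # Hypothetical heuristic: states closer to the goal get a higher potential.
    return float(s)


trajectory = [(0, "right", 1), (1, "right", 2), (2, "right", 3)]
plain = discounted_return(reward, trajectory, GAMMA)
shaped = discounted_return(shaped_reward(reward, potential, GAMMA), trajectory, GAMMA)

# The shaping terms telescope, so the two returns differ only by a
# quantity that depends on the start and end states, not on the actions.
offset = GAMMA ** len(trajectory) * potential(3) - potential(0)
```

Because the shaping terms telescope along any trajectory, the shaped and unshaped returns differ only by a term fixed by the start and end states. This is why this form of shaping leaves the optimal policy unchanged while giving the search more informative intermediate rewards.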

### Model-free Bayesian Reinforcement Learning

In this research center, we are developing a model-free Bayesian reinforcement learning algorithm that computes the value function more accurately. Model-free Bayesian reinforcement learning does not attempt to learn a model of the environment explicitly. Traditional Bayesian Q-learning uses the maximum of the expected discounted return as the optimal Q-value. We argue that the expectation of the maximum discounted return should be used instead. Because the maximum of an expectation and the expectation of a maximum are generally different values, changing the definition should give more accurate results.
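The gap between the two quantities can be seen in a small Monte Carlo sketch. The three Gaussian "posteriors" over returns are hypothetical and chosen only for illustration; this is not the proposed algorithm.

```python
import random


def max_of_expectation(samples_per_action):
    """max_a E[R_a], estimated from samples of each action's return."""
    return max(sum(s) / len(s) for s in samples_per_action)


def expectation_of_max(samples_per_action):
    """E[max_a R_a], estimated by taking the max within each joint sample."""
    n = len(samples_per_action[0])
    return sum(max(s[i] for s in samples_per_action) for i in range(n)) / n


rng = random.Random(0)

# Hypothetical (mean, std) of the return posterior for three actions.
posteriors = [(1.0, 1.0), (1.2, 0.5), (0.8, 2.0)]
samples = [
    [rng.gauss(mu, sigma) for _ in range(10000)]
    for mu, sigma in posteriors
]

max_e = max_of_expectation(samples)
e_max = expectation_of_max(samples)
print(f"max of expectation: {max_e:.3f}")
print(f"expectation of max: {e_max:.3f}")
```

The expectation of the maximum is never smaller than the maximum of the expectations, and the gap grows with the uncertainty in the return estimates.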

### Online Learning for Gaussian Process

In this research, our goal is to develop an online learning algorithm for Gaussian process (GP) regression, a non-parametric Bayesian regression method, so that it can handle large data streams, since prediction from a fixed batch of data samples does not scale to big data. Our baseline method, GP, has a major drawback: the high computational cost of inverting a kernel matrix whose size grows with the number of training data. To overcome this issue, we present an online dictionary learning method that updates the kernel basis whenever a new data sample arrives while preserving prediction accuracy. The proposed method is based on a low-rank matrix factorization of the kernel, assuming that the underlying basis of the data is represented by a small number of dominant factors. Our online learning method can also be applied to other methods that handle a matrix with a property or structure similar to that of a kernel matrix.
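As a rough illustration of growing a kernel dictionary online, the sketch below uses the approximate linear dependence (ALD) test known from kernel recursive least squares. This is a swapped-in technique, not the method proposed here. The RBF kernel, the `threshold` value, and the helper names are assumptions made for the example.

```python
import math


def rbf(x, y, length_scale=1.0):
    """Squared-exponential kernel between two points given as tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def update_dictionary(dictionary, x, threshold=0.1):
    """Add x only if current atoms approximate it poorly in feature space.

    Returns True when x was added to the dictionary.
    """
    if not dictionary:
        dictionary.append(x)
        return True
    K = [[rbf(a, b) for b in dictionary] for a in dictionary]
    k = [rbf(a, x) for a in dictionary]
    coeffs = solve(K, k)
    # Squared residual of projecting phi(x) onto the span of the atoms.
    residual = rbf(x, x) - sum(c * ki for c, ki in zip(coeffs, k))
    if residual > threshold:
        dictionary.append(x)
        return True
    return False


# Stream of 1-D inputs on a dense grid over [0, 5].
stream = [(0.05 * i,) for i in range(101)]
dictionary = []
for point in stream:
    update_dictionary(dictionary, point)
print(len(dictionary), "atoms kept out of", len(stream), "samples")
```

In this sketch the cost per new sample depends on the dictionary size rather than on the number of samples seen so far. That is the property an online low-rank kernel method relies on.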

### Robust Gaussian Process for Robotics

In this research, we aim to develop a robust Gaussian process regression method for robotics using two approaches: a low-rank approximation method and a leverage optimization method. The former uses a structured low-rank matrix approximation based on nuclear-norm regularized l1-norm minimization for robust motion prediction of dynamic obstacles. In particular, we find a small number of orthogonal bases of the kernel matrix and apply the resulting low-rank kernel matrix to Gaussian process regression to achieve robustness. The latter adapts the recently proposed leveraged Gaussian process framework, which can incorporate both positive and negative training data in a single regression framework. Under this framework, a sparsity-constrained leverage optimization method is used to find erroneous training data for robust Gaussian process regression. The proposed leveraged Gaussian process regression with leverage optimization is applied to autonomous robot navigation in a dynamic environment.