
The First Korea-Japan Machine Learning Symposium


AI and Brain Science
Keynote Speaker

Prof. Shunichi Amari

RIKEN Brain Science Institute

Short Bio Ph.D. in Dept. of Mathematical Engineering, The University of Tokyo, 1963
MS. in Dept. of Mathematical Engineering, The University of Tokyo, 1960
BS. in Dept. of Mathematical Engineering, The University of Tokyo, 1958
Abstract After short historical notes on both AI and neural network research, I will discuss two mathematical problems, related to feature extraction by self-organization and to singularities in the dynamics of stochastic gradient descent in supervised learning, both of which are involved in deep learning, from the viewpoint of information geometry. I will then talk about the future perspective of the interactions between AI and brain science, focusing on what AI learns from brain science and what brain science learns from AI research.
Bayesian Machine Learning: Two Examples from Our Recent Work

Prof. Seungjin Choi

POSTECH

Department of Computer Science and Engineering

Short Bio Professor, in Dept. of Computer Science and Engineering, POSTECH, 2001-Present
Ph.D. in Dept. of Electrical Engineering, University of Notre Dame, Indiana, USA, 1996
MS. in Dept. of Electrical Engineering, Seoul National University, 1989
BS. in Dept. of Electrical Engineering, Seoul National University, 1987
Abstract Suppose that you are given a statistical model which you wish to use to make predictions. To be Bayesian, you compute the posterior distribution over parameters, rather than seeking point estimates, in order to compute the predictive distribution for unseen examples. For the past decades, Bayesian models and inference have served as a big hammer in machine learning, leading to Bayesian machine learning. In this talk, I will introduce two recent works, emphasizing the benefit of Bayesian model comparison and the role of Bayesian nonparametric priors. The first half of this talk is devoted to showing that Bayesian model comparison allows us to choose the subset of pre-trained deep representations expected to yield the best classification performance on the image data at hand. In the second half, I introduce Bayesian hierarchical clustering with exponential families, where a Bayesian model is used for agglomerative hierarchical clustering.
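The contrast the abstract draws between point estimates and the posterior predictive distribution can be seen in the simplest possible model. Below is a minimal, self-contained sketch (a Beta-Bernoulli toy example chosen for illustration, not taken from the talk):

```python
import numpy as np

# Toy Beta-Bernoulli model: instead of a point estimate of the coin
# bias, keep the full posterior Beta(a, b) over the parameter and
# integrate it out to get the predictive probability of the next flip.
data = np.array([1, 1, 0, 1, 0, 1, 1, 1])  # observed coin flips

a0, b0 = 1.0, 1.0                 # uniform Beta prior
a = a0 + data.sum()               # posterior pseudo-count of heads
b = b0 + len(data) - data.sum()   # posterior pseudo-count of tails

p_mle = data.mean()               # point estimate (MLE)
p_pred = a / (a + b)              # Bayesian posterior predictive

print(p_mle, p_pred)              # the prior pulls the prediction toward 0.5
```

The same pattern, posterior over parameters first, prediction by integration second, underlies the model-comparison and nonparametric-prior examples in the talk.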
Machine Learning from Weak Supervision

Prof. Masashi Sugiyama

The University of Tokyo

Department of Complexity Science and Engineering, Graduate School of Frontier Sciences

Short Bio Ph.D., in Dept. of Computer Science, Tokyo Institute of Technology, 2001
M.S., in Dept. of Computer Science, Tokyo Institute of Technology, 1999
B.S., in Dept. of Computer Science, Tokyo Institute of Technology, 1997
Abstract Machine learning from big training data has achieved great success in various real-world applications such as speech, image, and natural language processing. However, there are various application domains that prohibit the use of massive labeled data. In this talk, I will introduce our recent advances in machine learning from weak supervision, including unsupervised classification and classification from positive and unlabeled data only.
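Classification from positive and unlabeled (PU) data is usually set up through a rewritten risk: the unseen negative class's contribution is expressed via the unlabeled data and the class prior. A minimal sketch of this unbiased PU risk on synthetic 1-D data (the class prior `pi` is assumed known, and a crude grid search stands in for a real optimizer; this is an illustration, not the speaker's exact method):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D data: positives ~ N(+2, 1), negatives ~ N(-2, 1).
pi = 0.5                                  # assumed known class prior
xp = rng.normal(+2.0, 1.0, size=200)      # labeled positive samples
xu = np.concatenate([rng.normal(+2.0, 1.0, size=200),
                     rng.normal(-2.0, 1.0, size=200)])  # unlabeled mix

def loss(z):                              # logistic loss
    return np.log1p(np.exp(-z))

def pu_risk(w, b):
    g_p, g_u = w * xp + b, w * xu + b
    # Unbiased PU risk: pi*E_p[l(g)] + E_u[l(-g)] - pi*E_p[l(-g)],
    # using E_u to stand in for the never-observed negative class.
    return (pi * loss(g_p).mean()
            + loss(-g_u).mean()
            - pi * loss(-g_p).mean())

# Crude grid search over a linear classifier g(x) = w*x + b.
_, w, b = min((pu_risk(w, b), w, b)
              for w in np.linspace(0.1, 2.0, 20)
              for b in np.linspace(-1.0, 1.0, 21))

true_labels = np.array([True] * 200 + [False] * 200)
acc = ((w * xu + b > 0) == true_labels).mean()
print(acc)
```

Despite never seeing a labeled negative example, minimizing the rewritten risk recovers a decision boundary close to the fully supervised one.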
The Relational Automatic Statistician System for Multiple Time-Series Data Analysis

Prof. Jaesik Choi

UNIST
Short Bio Assistant Professor, School of Electrical and Computer Engineering, UNIST, 2013-present
Affiliate Researcher, in Computational Research Division, Lawrence Berkeley National Lab, 2013-present
Postdoctoral Fellow, in Computational Research Division, Lawrence Berkeley National Lab, 2013
Ph.D., in Dept. of Computer Science, University of Illinois at Urbana-Champaign, 2012
B.S., in Dept. of Computer Engineering, Seoul National University, 2004
Abstract Gaussian Processes (GPs) provide a general and analytically tractable way of modeling complex, time-varying, nonparametric functions. The Automatic Bayesian Covariance Discovery (ABCD) system constructs natural-language descriptions of time-series data by treating unknown time series nonparametrically using a GP with a composite covariance kernel function. Unfortunately, learning a composite covariance kernel from a single time-series data set often results in a less informative kernel that may not give qualitative, distinctive descriptions of the data. We address this challenge by proposing two relational kernel learning methods which can model multiple time-series data sets by finding common, shared causes of changes. We show that the relational kernel learning methods find more accurate models for regression problems on several real-world data sets: US stock data, US house price index data, and currency exchange rate data.
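A key ingredient of ABCD-style systems is that composite kernels are built from simple base kernels by sums and products, with each component mapping to a phrase in the natural-language description (e.g. "a smooth trend plus a periodic component"). A hypothetical sketch of such composition (kernel names and hyperparameters are illustrative, not the system's actual grammar):

```python
import numpy as np

# Base kernels: each encodes one qualitative property of the series.
def rbf(x1, x2, ell=1.0):                       # smooth local variation
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def periodic(x1, x2, period=1.0, ell=1.0):      # repeating structure
    d = np.abs(x1[:, None] - x2[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / ell ** 2)

# "Smooth trend + locally varying seasonality" reads as RBF + RBF*Periodic.
def composite(x1, x2):
    return rbf(x1, x2, ell=5.0) + rbf(x1, x2, ell=2.0) * periodic(x1, x2)

x = np.linspace(0, 10, 50)
K = composite(x, x)

# Sums and products of valid kernels stay valid: symmetric and PSD.
assert np.allclose(K, K.T)
assert np.linalg.eigvalsh(K).min() > -1e-8
```

Because sums and products of positive semi-definite kernels remain positive semi-definite, a search over such expressions can explore rich structure while every candidate stays a legal GP covariance.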
Safe Feature/Sample Screening and its Applications to High-order Interaction Modeling and Quick Sensitivity Analysis

Prof. Ichiro Takeuchi

Nagoya Institute of Technology

Department of Computer Science/Scientific and Engineering Simulation

Short Bio Ph.D., in Dept. of Electrical Engineering, Japan, 2000
M.S., in Dept. of Information and Electronics Engineering, Japan, 1998
B.S., Nagoya University, Japan, 1996
Abstract Sparse modeling is one of the key techniques for working with data that is large both in sample size and in dimension. For example, the Lasso is designed to promote feature sparsity, while the SVM exploits sample sparsity. The main computational bottleneck in optimizing these sparse models is identifying the active components (e.g., non-zero coefficients in the Lasso or support vectors in the SVM). Recently, El Ghaoui et al. introduced a promising approach called safe feature screening (SFS) that allows us to find a subset of non-active components without solving the optimization problem. SFS is useful for high-dimensional problems because we can discard a subset of features identified as non-active before solving the optimization problem, without any risk of falsely removing important features. In this talk, we present three of our recent studies extending the idea of SFS in several ways. First, we generalize the SFS approach to handle sample sparsity in the context of the SVM (ICML2013). Second, we extend the approach to sparse high-order interaction modeling, where the model has an exponentially increasing number of high-order interaction features. Finally, we demonstrate that the main algorithmic idea of SFS can be used not only for sparse modeling but also for a variety of other machine learning tasks, by applying it to quick sensitivity analysis under data perturbation (KDD2015).
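The flavor of a safe screening test can be illustrated on the Lasso. The sketch below implements the basic SAFE test in the form popularized by El Ghaoui et al. (a simplified rendering for illustration, not the speaker's exact rules): a feature whose correlation with the response falls below a data-dependent threshold is provably zero at the optimum and can be discarded before any solver runs.

```python
import numpy as np

rng = np.random.default_rng(1)

# Basic SAFE rule for the Lasso  min_w 0.5*||y - Xw||^2 + lam*||w||_1:
# feature j is provably inactive (w_j = 0 at the optimum) whenever
#   |x_j^T y| < lam - ||x_j|| * ||y|| * (lam_max - lam) / lam_max,
# where lam_max = max_j |x_j^T y| is the smallest lam giving w = 0.
n, d = 100, 500
X = rng.standard_normal((n, d))
y = X[:, 0] * 3.0 + rng.standard_normal(n) * 0.1   # only feature 0 matters

corr = np.abs(X.T @ y)
lam_max = corr.max()
lam = 0.9 * lam_max

col_norms = np.linalg.norm(X, axis=0)
threshold = lam - col_norms * np.linalg.norm(y) * (lam_max - lam) / lam_max
screened_out = corr < threshold          # features safe to discard

print(screened_out.sum(), "of", d, "features discarded before solving")
```

The "safe" in the name is the guarantee in the last clause of the abstract: unlike heuristic filters, the test never removes a feature that would have been active in the exact solution.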
Machine Learning for Materials Discovery: Virtual screening and Bayesian optimization

Prof. Koji Tsuda

The University of Tokyo

Short Bio Research Scientist, Max Planck Institute for Biological Cybernetics, Tuebingen, Germany, 2003-2004 and 2006-2008
Visiting Scientist, in GMD FIRST (current Fraunhofer FIRST) in Berlin, Germany, 2000-2001
Ph.D., Kyoto University, 1998
Abstract The scientific process of discovering new knowledge is often characterized as a search over a space of candidates, and machine learning can accelerate the search by properly modelling the data and suggesting which candidates to experiment on. In many cases, experiments can be substituted by first-principles calculations. I review two basic machine learning techniques, virtual screening and Bayesian optimization, for fast discovery. The power of this approach is exemplified by two of our recent studies: one is the discovery of compounds with low lattice thermal conductivity from the Materials Project database; the other is the fast determination of the atomic structure of a crystalline interface.
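A minimal Bayesian optimization loop makes the idea concrete: a GP surrogate stands in for the expensive experiment or first-principles calculation, and an acquisition function picks the next candidate to evaluate. The sketch below uses a toy 1-D objective and a simple upper-confidence-bound acquisition (all details are illustrative, not from the talk):

```python
import numpy as np

def objective(x):                         # the unknown "experiment"
    return -(x - 0.7) ** 2                # true optimum at x = 0.7

def gp_posterior(X, y, Xs, ell=0.2, noise=1e-6):
    """GP posterior mean and std on query points Xs (RBF kernel)."""
    k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)
    K = k(X, X) + noise * np.eye(len(X))
    Ks = k(Xs, X)
    mu = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return mu, np.sqrt(np.maximum(var, 1e-12))

grid = np.linspace(0, 1, 201)             # candidate pool
X_obs = np.array([0.0, 1.0])              # two initial evaluations
y_obs = objective(X_obs)

for _ in range(10):
    mu, sd = gp_posterior(X_obs, y_obs, grid)
    ucb = mu + 2.0 * sd                   # upper confidence bound
    x_next = grid[np.argmax(ucb)]         # most promising candidate
    X_obs = np.append(X_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))

best_x = X_obs[np.argmax(y_obs)]
print(best_x)
```

The loop spends its evaluations where the surrogate is either promising or uncertain, which is exactly why it needs far fewer calls to the expensive objective than exhaustive search.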
A Bit More of Network based ML Algorithms for Intra-relation, Integration, and Inter-relation of Multiple Data and Multiple Layers

Prof. Hyunjung Shin

Ajou University

Short Bio Professor, Ajou University, 2006-present
Research Scientist, Friedrich-Miescher-Laboratory, Max-Planck-Institute, Germany, 2005.03-2006.03
Researcher, Max-Planck-Institute for Biological Cybernetics, Germany, 2004.03-2005.03
Ph.D., Seoul National University
Abstract Novel knowledge has mostly been obtained by identifying "intra-relation," the relation between entities within a specific data layer, and machine learning (ML) research along this line is well established. Nowadays, many heterogeneous types of data have become available, and different spectra of entity granularity have formed multiple strata of data. The former can aid in extracting knowledge by drawing an "integrative" conclusion from many pieces of information collected from diverse data sources. Given multiple layers of data, meanwhile, the latter may provide hints for uncovering unknown knowledge through "inter-relation," the relation between different layers: from a lower layer to a higher layer, and vice versa. It is therefore expected that the next wave of research will focus on how to utilize information from integration and inter-relation. In this talk, prototype ML research schemes for intra-relation, integration, and inter-relation will be discussed. The three schemes will be exemplified with pilot experimental results on diverse prediction problems involving bio-molecular data, clinical data, historical literature data, signal data, and so on.
Regularized Optimal Transport and Applications

Prof. Marco Cuturi

Kyoto University

Yamamoto Cuturi Lab

Graduate School of Informatics

Short Bio Associate Professor, in the Yamamoto-Cuturi Lab, 2013-present
Ph.D., in Ecole des Mines de Paris, 2005
M.S., in ENSAE
Abstract Optimal transport (OT) theory provides geometric tools to compare probability measures. After reviewing the basics of OT distances (a.k.a. Wasserstein or Earth Mover's distances), I will show how an adequate regularization of the OT problem can result in substantially faster (GPU-parallel) and much better behaved (strongly convex) numerical computations. I will then show how this regularization can enable several applications of OT to learning from probability measures. I will focus in particular on the computation of Wasserstein barycenters and on inverse problems in the simplex with the OT geometry, such as regression (the latter part being joint work with G. Peyré and N. Bonneel).
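Entropic regularization turns the OT problem into one solvable by simple alternating scaling (Sinkhorn) updates built from matrix-vector products, which are precisely the GPU-friendly, well-behaved computations the abstract refers to. A minimal sketch (histogram sizes, cost matrix, and the regularization strength `eps` are illustrative):

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iter=500):
    """Entropy-regularized transport plan between histograms a and b."""
    K = np.exp(-C / eps)                  # Gibbs kernel of the cost
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                 # alternate diagonal scalings
        u = a / (K @ v)                   # until marginals match a and b
    return u[:, None] * K * v[None, :]    # plan P = diag(u) K diag(v)

# Two Gaussian-bump histograms on a 1-D grid, squared-distance cost.
x = np.linspace(0, 1, 50)
a = np.exp(-((x - 0.2) ** 2) / 0.01); a /= a.sum()
b = np.exp(-((x - 0.8) ** 2) / 0.01); b /= b.sum()
C = (x[:, None] - x[None, :]) ** 2

P = sinkhorn(a, b, C)
cost = (P * C).sum()                      # regularized OT cost
print(cost)                               # close to the 0.36 squared shift
```

Each iteration is just two matrix-vector products, so large batches of such problems parallelize naturally on a GPU, and the entropic term makes the objective strongly convex in the plan.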
Deep Weakly Supervised Learning in Computer Vision

Prof. Bohyung Han

POSTECH

Department of Computer Science and Engineering

Short Bio Associate Professor in Dept. of Computer Science and Engineering, POSTECH, 2014-present
Assistant Professor in Dept. of Computer Science and Engineering, POSTECH, 2010-2013
Ph.D. in Dept. of Computer Science, University of Maryland at College Park, 2005
M.S. in Dept. of Computer Engineering, Seoul National University, 2000
B.S. in Dept. of Computer Science, Seoul National University, 1997
Abstract The success of deep learning in computer vision is partly attributed to the construction of large-scale annotated datasets such as ImageNet. However, computer vision problems often require a substantial amount of human effort to obtain accurate annotations, due to the dynamic nature of class labels, the need for pixel-level labeling, and annotation ambiguities. Hence, collecting high-quality, large-scale annotated datasets is very time-consuming and can even be unrealistic. This talk mainly discusses several problems in computer vision, including image classification, object detection, and semantic segmentation, which can benefit from weakly supervised learning based on convolutional neural networks.
Statistical Performance and Computational Efficiency of Parametric and Nonparametric Low Rank Tensor Estimators

Prof. Taiji Suzuki

Tokyo Institute of Technology

Department of Mathematical and Computing Sciences

Graduate School of Information Science and Engineering

Short Bio Ph.D., in Dept. of Mathematical Informatics, The University of Tokyo, 2009
M.S., in Dept. of Mathematical Informatics, The University of Tokyo, 2006
B.S., in Dept. of Mathematical Engineering and Information Physics, The University of Tokyo, 2004
Abstract In this talk, we consider the problem of estimating a low-rank tensor and discuss the statistical properties and computational efficiency of several estimators. Low-rank tensor models have a wide range of applications such as recommendation systems, spatio-temporal data analysis, and multi-task learning. Several methods have been proposed for this problem. From a statistical viewpoint, a Bayesian approach achieves the minimax-optimal predictive accuracy. On the other hand, convex approaches and alternating minimization approaches are computationally attractive. We discuss the trade-off between statistical performance and computational efficiency, presenting both theoretical and numerical results.
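The alternating minimization approach mentioned in the abstract is easiest to see in the matrix special case: fix one factor, solve a least-squares problem for the other, and alternate. A toy sketch with a fully observed, exactly low-rank matrix (the tensor case alternates over three or more factor matrices in the same spirit; this is an illustration, not the estimators analyzed in the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth rank-3 matrix M = U_true V_true^T.
n, m, r = 30, 40, 3
U_true = rng.standard_normal((n, r))
V_true = rng.standard_normal((m, r))
M = U_true @ V_true.T

# Alternating least squares: each step is a closed-form linear solve.
U = rng.standard_normal((n, r))
V = rng.standard_normal((m, r))
for _ in range(10):
    U = M @ V @ np.linalg.inv(V.T @ V)    # best U given V (least squares)
    V = M.T @ U @ np.linalg.inv(U.T @ U)  # best V given U (least squares)

err = np.linalg.norm(M - U @ V.T) / np.linalg.norm(M)
print(err)                                # essentially exact recovery
```

Each subproblem is convex and cheap, which is the computational appeal; the statistical side of the trade-off (how such estimators compare to the minimax-optimal Bayesian approach) is the subject of the talk.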
Bayesian Reinforcement Learning with Behavioral Feedback

Prof. Kee-Eung Kim

KAIST
Short Bio Ph.D., in Dept. of Computer Science, Brown University, 2001
M.Sc., in Dept. of Computer Science, Brown University, 1998
B.S., in Dept. of Computer Science, KAIST, 1995
Abstract In the standard reinforcement learning setting, the agent is assumed to learn solely from state transitions and rewards from the environment. We consider the extended setting where there is a trainer providing behavioral feedback to the agent on whether the executed action was desirable or not. The agent thus has access to additional information on how to act optimally, but now has to deal with noise in the feedback signal, since it is not necessarily accurate. In this talk, I present a Bayesian approach to reinforcement learning with behavioral feedback. Specifically, we extend Kalman temporal difference learning to compute the posterior distribution over Q-values given the state transitions and rewards from the environment as well as the feedback signals from the trainer. I will show that the algorithm can significantly improve performance through experiments on standard reinforcement learning tasks.
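The core of the Bayesian treatment is that both the environment's TD targets and the trainer's feedback are noisy observations of the same underlying value, fused by Kalman-style updates. The following is a deliberately simplified scalar illustration (a single Q-value with fixed, known noise levels; not the actual Kalman TD algorithm presented in the talk):

```python
import numpy as np

# Gaussian posterior over a single Q(s, a), updated by Kalman steps.
mu, var = 0.0, 10.0                       # broad prior

def kalman_update(mu, var, obs, obs_var):
    k = var / (var + obs_var)             # Kalman gain
    return mu + k * (obs - mu), (1 - k) * var

rng = np.random.default_rng(0)
true_q = 2.5
for _ in range(50):
    td_target = true_q + rng.normal(0, 1.0)    # environment signal
    mu, var = kalman_update(mu, var, td_target, 1.0)
    feedback = true_q + rng.normal(0, 3.0)     # noisier trainer signal
    mu, var = kalman_update(mu, var, feedback, 9.0)

print(mu, var)                            # posterior concentrates on true_q
```

Because the trainer's observation variance is larger, the update automatically down-weights inaccurate feedback rather than trusting it blindly, which is the benefit of keeping a posterior instead of a point estimate of the Q-values.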