Program
AI and Brain Science
Keynote Speaker
Prof. Shunichi Amari (http://www.brain.riken.go.jp/labs/mns/amari/home-E.html), RIKEN Brain Science Institute
Short Bio
Ph.D., Dept. of Mathematical Engineering, The University of Tokyo, 1963
M.S., Dept. of Mathematical Engineering, The University of Tokyo, 1960
B.S., Dept. of Mathematical Engineering, The University of Tokyo, 1958
Abstract
After brief historical notes on both AI and neural network research, I will discuss two mathematical problems, both involved in deep learning, from the viewpoint of information geometry: feature extraction by self-organization, and singularities in the dynamics of stochastic gradient descent in supervised learning. I will then discuss future perspectives on the interaction between AI and brain science, focusing on what AI can learn from brain science and what brain science can learn from AI research.
Bayesian Machine Learning: Two Examples from Our Recent Work
Speaker
Prof. Seungjin Choi (http://mlg.postech.ac.kr/~seungjin/), POSTECH, Department of Computer Science and Engineering
Short Bio
Professor, Dept. of Computer Science and Engineering, POSTECH, 2001-present
Ph.D., Dept. of Electrical Engineering, University of Notre Dame, Indiana, USA, 1996
M.S., Dept. of Electrical Engineering, Seoul National University, 1989
B.S., Dept. of Electrical Engineering, Seoul National University, 1987
Abstract
Suppose that you are given a statistical model which you wish to use to make predictions. To be Bayesian, you compute the posterior distribution over parameters, rather than seeking point estimates, in order to compute the predictive distribution over unseen examples. For the past decades, Bayesian models and inference have served as a big hammer in machine learning, leading to Bayesian machine learning. In this talk, I will introduce two recent works, emphasizing the benefit of Bayesian model comparison and the role of Bayesian nonparametric priors. The first half of the talk is devoted to showing that Bayesian model comparison allows us to choose the subset of pre-trained deep representations expected to yield the best classification performance on the image data at hand. In the second half, I introduce Bayesian hierarchical clustering with exponential-family models, where a Bayesian model underlies agglomerative hierarchical clustering.
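The Bayesian recipe described above (posterior over parameters, then averaging over it for prediction) can be sketched in its simplest conjugate form. This is a minimal illustration, not the models in the talk: a Bernoulli likelihood with a Beta prior, where the posterior and the predictive probability are available in closed form.

```python
# Minimal Bayesian sketch (illustrative only): Bernoulli model with a
# conjugate Beta(a, b) prior. Instead of a point estimate of the success
# probability, we keep the full posterior and average over it, which
# yields the predictive probability of the next observation.

def posterior(a, b, data):
    """Beta posterior parameters after observing 0/1 outcomes in `data`."""
    successes = sum(data)
    return a + successes, b + len(data) - successes

def predictive(a, b):
    """P(next outcome = 1) under the Beta(a, b) posterior (its mean)."""
    return a / (a + b)

data = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]        # 7 successes out of 10
a_post, b_post = posterior(1.0, 1.0, data)   # uniform Beta(1, 1) prior
print(a_post, b_post)                        # 8.0 4.0
print(predictive(a_post, b_post))            # 8/12 ≈ 0.667
```

The same averaging principle, applied to the marginal likelihood of competing models rather than to parameters, is what drives the Bayesian model comparison mentioned in the abstract.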
Machine Learning from Weak Supervision
Speaker
Prof. Masashi Sugiyama (http://www.ms.k.u-tokyo.ac.jp/sugi/), The University of Tokyo, Department of Complexity Science and Engineering, Graduate School of Frontier Sciences
Short Bio
Ph.D., Dept. of Computer Science, Tokyo Institute of Technology, 2001
M.S., Dept. of Computer Science, Tokyo Institute of Technology, 1999
B.S., Dept. of Computer Science, Tokyo Institute of Technology, 1997
Abstract
Machine learning from big training data has achieved great success in real-world applications such as speech, image, and natural language processing. However, many application domains do not permit the collection of massive labeled data. In this talk, I will introduce our recent advances in machine learning from weak supervision, including unsupervised classification and classification only from positive and unlabeled data.
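"Classification only from positive and unlabeled data" is the PU setting. One widely used formulation (a sketch of a standard risk rewrite, not necessarily the exact estimator presented in the talk) expresses the classification risk using only positive and unlabeled samples, given the class prior pi:

```python
import math

# Sketch of the unbiased risk estimator commonly used in PU learning:
# with class prior pi, the ordinary classification risk of a scorer f
# can be rewritten using positives (p) and unlabeled points (u) only:
#   R(f) = pi * E_p[loss(+f)] + E_u[loss(-f)] - pi * E_p[loss(-f)]
# Names and the toy data below are illustrative.

def logistic_loss(margin):
    return math.log(1.0 + math.exp(-margin))

def pu_risk(f, positives, unlabeled, pi):
    rp_pos = sum(logistic_loss(+f(x)) for x in positives) / len(positives)
    rp_neg = sum(logistic_loss(-f(x)) for x in positives) / len(positives)
    ru_neg = sum(logistic_loss(-f(x)) for x in unlabeled) / len(unlabeled)
    return pi * rp_pos + ru_neg - pi * rp_neg

f = lambda x: 2.0 * x - 1.0          # a fixed linear scorer on 1-D inputs
positives = [0.9, 1.1, 0.8]
unlabeled = [1.0, -0.2, 0.1, -1.0]
print(pu_risk(f, positives, unlabeled, pi=0.4))
```

Minimizing such a risk over a model class allows training an ordinary binary classifier without a single labeled negative example.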
The Relational Automatic Statistician System for Multiple Time-Series Data Analysis
Speaker
Prof. Jaesik Choi (http://pail.unist.ac.kr/), UNIST
Short Bio
Assistant Professor, School of Electrical and Computer Engineering, UNIST, 2013-present
Affiliate Researcher, Computational Research Division, Lawrence Berkeley National Lab, 2013-present
Postdoctoral Fellow, Computational Research Division, Lawrence Berkeley National Lab, 2013
Ph.D., Dept. of Computer Science, University of Illinois at Urbana-Champaign, 2012
B.S., Dept. of Computer Engineering, Seoul National University, 2004
Abstract
Gaussian processes (GPs) provide a general and analytically tractable way of modeling complex, time-varying, nonparametric functions. The Automatic Bayesian Covariance Discovery (ABCD) system constructs natural-language descriptions of time-series data by modeling the data nonparametrically with a GP whose covariance is a composite kernel. Unfortunately, learning a composite covariance kernel from a single time series often results in a less informative kernel that may not give qualitative, distinctive descriptions of the data. We address this challenge by proposing two relational kernel learning methods that model multiple time-series data sets by finding common, shared causes of changes. We show that the relational kernel learning methods find more accurate models for regression problems on several real-world data sets: US stock data, US house price index data, and currency exchange rate data.
Safe Feature/Sample Screening and Its Applications to High-order Interaction Modeling and Quick Sensitivity Analysis
Speaker
Prof. Ichiro Takeuchi (http://www-als.ics.nitech.ac.jp/~takeuchi/), Nagoya Institute of Technology, Department of Computer Science/Scientific and Engineering Simulation
Short Bio
Ph.D., Dept. of Electrical Engineering, Japan, 2000
M.S., Dept. of Information and Electronics Engineering, Japan, 1998
B.S., Nagoya University, Japan, 1996
Abstract
Sparse modeling is one of the key techniques for working with data that are large in both size and dimension. For example, the Lasso promotes feature sparsity, while the SVM exploits sample sparsity. The main computational bottleneck in optimizing these sparse models is identifying the active components (e.g., non-zero coefficients in the Lasso or support vectors in the SVM). Recently, El Ghaoui et al. introduced a promising approach called safe feature screening (SFS) that allows us to find a subset of non-active components without solving the optimization problem. SFS is useful for high-dimensional problems because we can discard a subset of features identified as non-active before solving the optimization problem, without any risk of falsely removing important features. In this talk, we present three of our recent studies extending the idea of SFS. First, we generalize the SFS approach to handle sample sparsity in the context of the SVM (ICML 2013). Second, we extend the approach to sparse high-order interaction modeling, where the model has an exponentially increasing number of high-order interaction features. Finally, we demonstrate that the main algorithmic idea of SFS can be used not only for sparse modeling but also for a variety of other machine learning tasks, by applying it to quick sensitivity analysis under data perturbation (KDD 2015).
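The original SAFE test of El Ghaoui et al. for the Lasso gives the flavor of such screening rules (a sketch of the basic rule on synthetic data, not the extensions presented in the talk):

```python
import numpy as np

# Basic SAFE rule for the Lasso  min_b 0.5*||y - Xb||^2 + lam*||b||_1 :
# feature j is guaranteed inactive (b_j = 0 at the optimum) whenever
#   |x_j' y| < lam - ||x_j|| * ||y|| * (lam_max - lam) / lam_max,
# where lam_max = max_j |x_j' y|, so it can be discarded before solving.
# The data below are synthetic.

def safe_screen(X, y, lam):
    scores = np.abs(X.T @ y)
    lam_max = scores.max()
    thresh = (lam - np.linalg.norm(X, axis=0) * np.linalg.norm(y)
              * (lam_max - lam) / lam_max)
    return scores < thresh            # True -> feature provably inactive

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 200))
y = X[:, 0] + 0.1 * rng.standard_normal(50)   # only feature 0 is relevant
lam_max = np.abs(X.T @ y).max()
inactive = safe_screen(X, y, 0.9 * lam_max)
print(inactive.sum(), "of", X.shape[1], "features discarded")
print("feature 0 kept:", not inactive[0])
```

Because the test is a provable bound rather than a heuristic, the reduced problem has exactly the same solution as the full one, which is what makes the screening "safe".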
Machine Learning for Materials Discovery: Virtual Screening and Bayesian Optimization
Speaker
Prof. Koji Tsuda (http://tsudalab.org/en/member/koji_tsuda/), The University of Tokyo
Short Bio
Research Scientist, Max Planck Institute for Biological Cybernetics, Tuebingen, Germany, 2003-2004 and 2006-2008
Visiting Scientist, GMD FIRST (now Fraunhofer FIRST), Berlin, Germany, 2000-2001
Ph.D., Kyoto University, 1998
Abstract
The scientific process of discovering new knowledge is often characterized as a search over a space of candidates, and machine learning can accelerate the search by properly modeling the data and suggesting which candidates to test next. In many cases, experiments can be substituted by first-principles calculation. I review two basic machine learning techniques for fast discovery: virtual screening and Bayesian optimization. The power of this approach is exemplified by two of our recent studies: one is the discovery of compounds with low lattice thermal conductivity from the Materials Project database; the other is the fast determination of the atomic structure of a crystalline interface.
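The candidate-selection loop at the heart of Bayesian optimization can be sketched over a finite candidate set (an illustrative toy, not the Tsuda lab code): a GP surrogate is fit to the candidates evaluated so far, and the next "experiment" maximizes an upper-confidence-bound (UCB) acquisition.

```python
import numpy as np

# Toy Bayesian-optimization loop over a 1-D grid of candidates.
# gp_posterior computes the GP predictive mean and variance at every
# candidate given the evaluated ones; UCB trades off high predicted
# value against high uncertainty. The objective is a stand-in for a
# costly experiment or first-principles calculation.

def rbf(a, b, ell=0.3):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

def gp_posterior(x_obs, y_obs, x_all, noise=1e-6):
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = rbf(x_all, x_obs)
    mu = Ks @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return mu, np.maximum(var, 0.0)

def objective(x):                       # stand-in for a costly evaluation
    return -(x - 0.7) ** 2              # best candidate is near x = 0.7

candidates = np.linspace(0, 1, 101)
picked = [10, 90]                       # two initial "experiments"
for _ in range(10):
    mu, var = gp_posterior(candidates[picked],
                           objective(candidates[picked]), candidates)
    ucb = mu + 2.0 * np.sqrt(var)
    ucb[picked] = -np.inf               # never re-run an experiment
    picked.append(int(np.argmax(ucb)))

best = candidates[picked[int(np.argmax(objective(candidates[picked])))]]
print("best candidate found:", best)
```

Virtual screening follows the same pattern but ranks the whole candidate list once with the surrogate, instead of iterating the evaluate-and-refit loop.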
A Bit More of Network-based ML Algorithms for Intra-relation, Integration, and Inter-relation of Multiple Data and Multiple Layers
Speaker
Prof. Hyunjung Shin (http://www.alphaminers.net/), Ajou University
Short Bio
Professor, Ajou University, 2006-present
Research Scientist, Friedrich Miescher Laboratory, Max Planck Institute, Germany, 2005.03-2006.03
Researcher, Max Planck Institute for Biological Cybernetics, Germany, 2004.03-2005.03
Ph.D., Seoul National University
Abstract
Novel knowledge has mostly been obtained by identifying "intra-relations," the relations between entities within a single data layer, and much machine learning (ML) research of this kind is well established. Nowadays, heterogeneous types of data have become more widely available, and different levels of entity granularity have formed multiple strata of data. The former can aid in extracting knowledge by drawing an "integrative" conclusion from many pieces of information collected from diverse data sources. Given multiple layers of data, the latter suggests that we can uncover unknown knowledge through "inter-relations," the relations between different layers: from a lower layer to a higher layer, and vice versa. It is therefore expected that future work will focus more on how to exploit integration and inter-relation. In this talk, prototype ML research schemes for intra-relation, integration, and inter-relation will be discussed. The three schemes will be exemplified with pilot experimental results on diverse prediction problems over bio-molecular data, clinical data, historical literature data, signal data, and so on.
Regularized Optimal Transport and Applications
Speaker
Prof. Marco Cuturi (http://www.iip.ist.i.kyoto-u.ac.jp/member/cuturi/), Kyoto University, Yamamoto-Cuturi Lab, Graduate School of Informatics
Short Bio
Associate Professor, Yamamoto-Cuturi Lab, 2013-present
Ph.D., Ecole des Mines de Paris, 2005
M.S., ENSAE
Abstract
Optimal transport (OT) theory provides geometric tools to compare probability measures. After reviewing the basics of OT distances (a.k.a. Wasserstein or Earth Mover's distances), I will show how an adequate regularization of the OT problem can result in substantially faster (GPU-parallel) and much better behaved (strongly convex) numerical computations. I will then show how this regularization enables several applications of OT to learning from probability measures. I will focus in particular on the computation of Wasserstein barycenters and on inverse problems in the simplex with the OT geometry, such as regression (the latter being joint work with G. Peyré and N. Bonneel).
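The regularization referred to here is entropic: adding an entropy term makes the OT problem strongly convex, and the solution becomes a diagonal scaling of the Gibbs kernel exp(-C/reg), computable by Sinkhorn's alternating matrix-vector iterations. A minimal sketch on two small histograms (parameters illustrative):

```python
import numpy as np

# Entropy-regularized OT via Sinkhorn iterations. The plan has the form
# P = diag(u) K diag(v) with K = exp(-C/reg); u and v are found by
# alternating rescalings that match the row and column marginals.
# These matrix-vector products are what parallelize well on GPUs.

def sinkhorn(a, b, C, reg=0.1, iters=1000):
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]    # regularized transport plan
    return P, (P * C).sum()            # plan and its transport cost

x = np.array([0.0, 1.0, 2.0])
a = np.array([0.5, 0.5, 0.0])          # source histogram
b = np.array([0.0, 0.5, 0.5])          # target histogram
C = np.abs(x[:, None] - x[None, :])    # ground cost |x_i - x_j|
P, cost = sinkhorn(a, b, C, reg=0.1)
print(np.round(P, 3))
print(cost)                            # shifting all mass by 1: cost 1.0
```

The same scaling iterations extend directly to Wasserstein barycenters, where several histograms are coupled to a common, unknown center.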
Deep Weakly Supervised Learning in Computer Vision
Speaker
Prof. Bohyung Han (http://cvlab.postech.ac.kr/~bhhan/), POSTECH, Department of Computer Science and Engineering
Short Bio
Associate Professor, Dept. of Computer Science and Engineering, POSTECH, 2014-present
Assistant Professor, Dept. of Computer Science and Engineering, POSTECH, 2010-2013
Ph.D., Dept. of Computer Science, University of Maryland at College Park, 2005
M.S., Dept. of Computer Engineering, Seoul National University, 2000
B.S., Dept. of Computer Science, Seoul National University, 1997
Abstract
The success of deep learning in computer vision is partly attributed to the construction of large-scale annotated datasets such as ImageNet. However, computer vision problems often require a substantial amount of human effort to obtain accurate annotations, due to the dynamic aspects of class labels, the need for pixel-level labeling, and annotation ambiguities. Hence, collecting high-quality, large-scale annotated datasets is very time-consuming and sometimes unrealistic. This talk discusses several problems in computer vision, including image classification, object detection, and semantic segmentation, that can benefit from weakly supervised learning based on convolutional neural networks.
Statistical Performance and Computational Efficiency of Parametric and Nonparametric Low-Rank Tensor Estimators
Speaker
Prof. Taiji Suzuki (http://www.is.titech.ac.jp/~s-taiji/), Tokyo Institute of Technology, Department of Mathematical and Computing Sciences, Graduate School of Information Science and Engineering
Short Bio
Ph.D., Dept. of Mathematical Informatics, The University of Tokyo, 2009
M.S., Dept. of Mathematical Informatics, The University of Tokyo, 2006
B.S., Dept. of Mathematical Engineering and Information Physics, The University of Tokyo, 2004
Abstract
In this talk, we consider the problem of estimating a low-rank tensor and discuss the statistical properties and computational efficiency of several estimators. Low-rank tensor models have a wide range of applications, such as recommender systems, spatio-temporal data analysis, and multi-task learning. Several methods have been proposed for this problem. From a statistical viewpoint, a Bayesian approach achieves the minimax-optimal predictive accuracy; on the other hand, convex relaxation and alternating minimization are computationally attractive. We discuss the trade-off between statistical performance and computational efficiency through theoretical and numerical results.
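The alternating-minimization estimator mentioned above can be sketched in its simplest, order-2 case (an illustration on synthetic data, not the talk's tensor estimators): fit a rank-r factorization X ≈ UVᵀ by alternating between the two least-squares subproblems, each of which is convex even though the joint problem is not.

```python
import numpy as np

# Alternating least squares for a rank-r matrix factorization X ~ U V'.
# With U fixed, solving for V is ordinary least squares, and vice versa;
# iterating the two convex subproblems drives down the joint objective.
# Data are synthetic: an exact rank-2 matrix plus small noise.

rng = np.random.default_rng(1)
n, m, r = 30, 20, 2
X = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))
X += 0.01 * rng.standard_normal((n, m))

U = rng.standard_normal((n, r))
for _ in range(50):
    V = np.linalg.lstsq(U, X, rcond=None)[0].T     # fix U, solve for V
    U = np.linalg.lstsq(V, X.T, rcond=None)[0].T   # fix V, solve for U

err = np.linalg.norm(X - U @ V.T) / np.linalg.norm(X)
print("relative error:", err)    # small: the rank-2 fit explains X
```

For tensors of order three and higher, the same scheme cycles over three or more factor matrices; the statistical-versus-computational trade-off discussed in the talk compares such iterates against Bayesian and convex-relaxation estimators.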
Bayesian Reinforcement Learning with Behavioral Feedback
Speaker
Prof. Kee-Eung Kim (http://ailab.kaist.ac.kr/users/kekim), KAIST
Short Bio
Ph.D., Dept. of Computer Science, Brown University, 2001
M.Sc., Dept. of Computer Science, Brown University, 1998
B.S., Dept. of Computer Science, KAIST, 1995
Abstract
In the standard reinforcement learning setting, the agent is assumed to learn solely from the state transitions and rewards provided by the environment. We consider an extended setting in which a trainer provides behavioral feedback to the agent, indicating whether the executed action was desirable. The agent thus has access to additional information on how to act optimally, but now has to deal with noise in the feedback signal, since it is not necessarily accurate. In this talk, I present a Bayesian approach to reinforcement learning with behavioral feedback. Specifically, we extend Kalman temporal difference learning to compute the posterior distribution over Q-values given the state transitions and rewards from the environment, as well as the feedback signals from the trainer. I will show that the algorithm can significantly improve performance in experiments on standard reinforcement learning tasks.
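The base algorithm being extended, Kalman temporal difference learning, can be sketched for a linear Q-function (this illustrates the generic KTD update only, under simplifying assumptions; the talk's extension additionally folds trainer feedback into the observation model):

```python
import numpy as np

# Generic Kalman TD sketch for Q(s,a) = phi(s,a)' theta. Each transition
# yields a scalar "measurement" of theta through the TD relation
#   r = (phi(s,a) - gamma * phi(s',a'))' theta + noise,
# and a standard Kalman update maintains a Gaussian posterior over the
# Q-weights. Features and rewards below are synthetic.

def ktd_update(mu, P, phi, phi_next, r, gamma=0.9, obs_noise=1.0):
    h = phi - gamma * phi_next           # TD measurement direction
    s = h @ P @ h + obs_noise            # innovation variance
    k = P @ h / s                        # Kalman gain
    mu = mu + k * (r - h @ mu)           # posterior mean update
    P = P - np.outer(k, h @ P)           # posterior covariance update
    return mu, P

d = 3
mu, P = np.zeros(d), 10.0 * np.eye(d)
rng = np.random.default_rng(0)
true_theta = np.array([1.0, -0.5, 0.3])
for _ in range(500):
    phi = rng.standard_normal(d)
    phi_next = rng.standard_normal(d)
    r = (phi - 0.9 * phi_next) @ true_theta + 0.1 * rng.standard_normal()
    mu, P = ktd_update(mu, P, phi, phi_next, r)
print(np.round(mu, 2))    # close to true_theta
```

Because the posterior covariance P is carried along, the agent obtains uncertainty estimates over Q-values for free, which is what makes a principled treatment of noisy trainer feedback possible.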