
LG - Machine Learning CV - Computer Vision CL - Computation and Language

1、[LG]Representational aspects of depth and conditioning in normalizing flows

F Koehler, V Mehta, A Risteski


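A study of the representational aspects of depth and conditioning, the two main sources of difficulty in training normalizing flows, covering both general invertible architectures and affine couplings. For general invertible architectures, the authors prove that invertibility comes at a cost in depth. For affine couplings, they show that the choice of partition is unlikely to be a bottleneck for depth, and that affine couplings are universal approximators, provided the model's Jacobian is allowed to be close to singular.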

Normalizing flows are among the most popular paradigms in generative modeling, especially for images, primarily because we can efficiently evaluate the likelihood of a data point. Normalizing flows also come with difficulties: models which produce good samples typically need to be extremely deep, which comes with accompanying vanishing/exploding gradient problems. Relatedly, they are often poorly conditioned since typical training data like images intuitively are lower-dimensional, and the learned maps often have Jacobians that are close to being singular. In our paper, we tackle representational aspects around depth and conditioning of normalizing flows, both for general invertible architectures and for a particular common architecture, affine couplings. For general invertible architectures, we prove that invertibility comes at a cost in terms of depth: we show examples where a much deeper normalizing flow model may need to be used to match the performance of a non-invertible generator. For affine couplings, we first show that the choice of partitions isn't a likely bottleneck for depth: we show that any invertible linear map (and hence a permutation) can be simulated by a constant number of affine coupling layers, using a fixed partition. This shows that the extra flexibility conferred by 1x1 convolution layers, as in GLOW, can in principle be simulated by increasing the size by a constant factor. Next, in terms of conditioning, we show that affine couplings are universal approximators, provided the Jacobian of the model is allowed to be close to singular. We furthermore empirically explore the benefit of different kinds of padding (a common strategy for improving conditioning) on both synthetic and real-life datasets.
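To make the affine coupling architecture concrete, here is a minimal sketch of a single coupling layer in plain Python. It is an illustration under stated assumptions, not the authors' code: `toy_scale` and `toy_shift` are hypothetical stand-ins for the neural networks that would normally produce the log-scales and shifts, and the partition is simply the first half versus the second half of the input.

```python
import math

def coupling_forward(x, scale_fn, shift_fn):
    """Affine coupling: keep the first half, affinely transform the second half."""
    d = len(x) // 2
    x1, x2 = x[:d], x[d:]
    s = scale_fn(x1)  # log-scales, one per element of x2
    t = shift_fn(x1)  # shifts, one per element of x2
    y2 = [b * math.exp(si) + ti for b, si, ti in zip(x2, s, t)]
    # The Jacobian is triangular, so log|det J| is the sum of the log-scales.
    log_det = sum(s)
    return x1 + y2, log_det

def coupling_inverse(y, scale_fn, shift_fn):
    """Exact inverse: the untouched half lets us recompute s and t."""
    d = len(y) // 2
    y1, y2 = y[:d], y[d:]
    s = scale_fn(y1)
    t = shift_fn(y1)
    x2 = [(b - ti) * math.exp(-si) for b, si, ti in zip(y2, s, t)]
    return y1 + x2

# Hypothetical conditioner functions (a real model would use neural networks).
def toy_scale(x1):
    return [math.tanh(v) for v in x1]

def toy_shift(x1):
    return [0.5 * v for v in x1]

x = [0.3, -1.2, 2.0, 0.7]
y, log_det = coupling_forward(x, toy_scale, toy_shift)
x_rec = coupling_inverse(y, toy_scale, toy_shift)
```

The Jacobian determinant of this layer is `exp(sum(s))`, so strongly negative log-scales push it toward zero. That is the sense in which a coupling layer can become close to singular, the regime the universal approximation result relies on.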




2、[CL]If beam search is the answer, what was the question?

C Meister, T Vieira, R Cotterell

[ETH Zurich & Johns Hopkins University & University of Cambridge]

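The cognitive inductive bias of beam search: the paper frames beam search as the exact solution to a different decoding objective, in order to understand why high probability under a model alone may not indicate adequacy. Motivated by cognitive science, the authors find that beam search enforces uniform information density in text.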



Quite surprisingly, exact maximum a posteriori (MAP) decoding of neural language generators frequently leads to low-quality results. Rather, most state-of-the-art results on language generation tasks are attained using beam search despite its overwhelmingly high search error rate. This implies that the MAP objective alone does not express the properties we desire in text, which merits the question: if beam search is the answer, what was the question? We frame beam search as the exact solution to a different decoding objective in order to gain insights into why high probability under a model alone may not indicate adequacy. We find that beam search enforces uniform information density in text, a property motivated by cognitive science. We suggest a set of decoding objectives that explicitly enforce this property and find that exact decoding with these objectives alleviates the problems encountered when decoding poorly calibrated language generation models. Additionally, we analyze the text produced using various decoding strategies and see that, in our neural machine translation experiments, the extent to which this property is adhered to strongly correlates with BLEU.
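For readers less familiar with the decoding procedure, the following is a minimal sketch of beam search over a toy model, together with one simple way to quantify how uniform the information density of a hypothesis is (the variance of its per-token surprisals). The toy distribution `toy_model`, the `</s>` end token, and the choice of variance as the measure are assumptions made for illustration and are not taken from the paper's code.

```python
import math

def beam_search(next_log_probs, beam_size, max_len, eos="</s>"):
    """Keep the beam_size highest-scoring prefixes at every step."""
    beams = [((), 0.0)]  # (token tuple, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix and prefix[-1] == eos:
                candidates.append((prefix, score))  # finished hypothesis carries over
                continue
            for token, lp in next_log_probs(prefix).items():
                candidates.append((prefix + (token,), score + lp))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        if all(p[-1] == eos for p, _ in beams):
            break
    return beams

def surprisal_variance(prefix, next_log_probs):
    """Variance of per-token surprisals; lower means more uniform information density."""
    surprisals = [-next_log_probs(prefix[:i])[prefix[i]] for i in range(len(prefix))]
    mean = sum(surprisals) / len(surprisals)
    return sum((u - mean) ** 2 for u in surprisals) / len(surprisals)

def toy_model(prefix):
    """Hypothetical next-token distribution over {a, b, </s>}."""
    if not prefix:
        return {"a": math.log(0.6), "b": math.log(0.4)}
    if prefix[-1] == "a":
        return {"a": math.log(0.1), "b": math.log(0.3), "</s>": math.log(0.6)}
    return {"a": math.log(0.45), "</s>": math.log(0.55)}

beams = beam_search(toy_model, beam_size=2, max_len=4)
best_tokens, best_score = beams[0]
best_variance = surprisal_variance(best_tokens, toy_model)
```

In this toy case the best hypothesis spends the same surprisal on each token, so its variance is zero. A decoding objective in the spirit of the abstract would subtract a penalty of this kind from the log-probability rather than ranking on log-probability alone.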



3、[CV]Semi-Supervised Learning for Multi-Task Scene Understanding by Neural Graph Consensus

M Leordeanu, M Pirvu, D Costea, A Marcu, E Slusanschi, R Sukthankar

[University Politehnica of Bucharest]


We address the challenging problem of semi-supervised learning in the context of multiple visual interpretations of the world by finding consensus in a graph of neural networks. Each graph node is a scene interpretation layer, while each edge is a deep net that transforms one layer at one node into another from a different node. During the supervised phase edge networks are trained independently. During the next unsupervised stage edge nets are trained on the pseudo-ground truth provided by consensus among multiple paths that reach the nets’ start and end nodes. These paths act as ensemble teachers for any given edge and strong consensus is used for high-confidence supervisory signal. The unsupervised learning process is repeated over several generations, in which each edge becomes a “student” and also part of different ensemble “teachers” for training other students. By optimizing such consensus between different paths, the graph reaches consistency and robustness over multiple interpretations and generations, in the face of unknown labels. We give theoretical justifications of the proposed idea and validate it on a large dataset. We show how prediction of different representations such as depth, semantic segmentation, surface normals and pose from RGB input could be effectively learned through self-supervised consensus in our graph. We also compare to state-of-the-art methods for multi-task and semi-supervised learning and show superior performance.
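The idea of using agreement among several paths as a pseudo-ground truth can be sketched as follows. This is a simplified illustration, not the paper's method: the per-pixel median as the consensus value, the standard-deviation threshold `max_spread`, and the flat lists standing in for prediction maps are all assumptions.

```python
from statistics import median, pstdev

def consensus_pseudo_label(path_predictions, max_spread):
    """Combine per-pixel predictions from several paths into a pseudo-label.

    Returns (labels, mask): labels[i] is the median over paths, and mask[i]
    is True only where the paths agree closely enough to trust the label.
    """
    labels, mask = [], []
    for values in zip(*path_predictions):
        labels.append(median(values))
        mask.append(pstdev(values) <= max_spread)
    return labels, mask

# Three hypothetical paths predicting the same three-pixel map (e.g. depth).
paths = [
    [1.0, 2.0, 5.0],
    [1.1, 2.1, 9.0],
    [0.9, 1.9, 1.0],
]
labels, mask = consensus_pseudo_label(paths, max_spread=0.5)
```

Only the first two pixels pass the agreement check here, so a student edge would be trained on those and would ignore the third, where the teachers disagree.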





4、[CL]MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models

P Xu, M Patwary, M Shoeybi, R Puri, P Fung, A Anandkumar, B Catanzaro

[The Hong Kong University of Science and Technology & NVIDIA]


Existing pre-trained large language models have shown unparalleled generative capabilities. However, they are not controllable. In this paper, we propose MEGATRON-CNTRL, a novel framework that uses large-scale language models and adds control to text generation by incorporating an external knowledge base. Our framework consists of a keyword predictor, a knowledge retriever, a contextual knowledge ranker, and a conditional text generator. As we do not have access to ground-truth supervision for the knowledge ranker, we make use of weak supervision from sentence embedding. The empirical results show that our model generates more fluent, consistent, and coherent stories with less repetition and higher diversity compared to prior work on the ROC story dataset. We showcase the controllability of our model by replacing the keywords used to generate stories and re-running the generation process. Human evaluation results show that 77.5% of these stories are successfully controlled by the new keywords. Furthermore, by scaling our model from 124 million to 8.3 billion parameters we demonstrate that larger models improve both the quality of generation (from 74.5% to 93.0% for consistency) and controllability (from 77.5% to 91.5%).
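As a rough illustration of how sentence embeddings could supply a ranking signal for retrieved knowledge, the sketch below orders candidate knowledge sentences by cosine similarity to a context embedding. The vectors, sentences, and function names are hypothetical, and the abstract does not say that the ranker works exactly this way; a real system would obtain the embeddings from a sentence encoder.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_knowledge(context_vec, candidates, top_k):
    """Return the top_k candidate sentences most similar to the context."""
    scored = sorted(candidates, key=lambda c: cosine(context_vec, c[1]), reverse=True)
    return [sentence for sentence, _ in scored[:top_k]]

# Hypothetical embeddings for the story context and three knowledge sentences.
context = [1.0, 0.0, 1.0]
candidates = [
    ("a dog is a pet", [0.9, 0.1, 0.8]),
    ("rain is wet", [0.0, 1.0, 0.0]),
    ("a park is outdoors", [0.5, 0.5, 0.5]),
]
top = rank_knowledge(context, candidates, top_k=2)
```

The selected sentences would then be passed, along with the story context, to the conditional text generator.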



5、[LG]Stock2Vec: A Hybrid Deep Learning Framework for Stock Market Prediction with Representation Learning and Temporal Convolutional Network

X Wang, Y Wang, B Weng, A Vinel

[Auburn University & Verizon Media Group (Yahoo!) & Amazon.com Inc]


We have proposed to develop a global hybrid deep learning framework to predict the daily prices in the stock market. With representation learning, we derived an embedding called Stock2Vec, which gives us insight into the relationships among different stocks, while the temporal convolutional layers are used for automatically capturing effective temporal patterns both within and across series. Evaluated on S&P 500, our hybrid framework integrates both advantages and achieves better performance on the stock price prediction task than several popular benchmarked models.
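The building block of a temporal convolutional network is a causal, dilated 1-D convolution: each output depends only on the current and earlier time steps. The sketch below shows that operation on a short price series. The filter weights and dilation are arbitrary values chosen for illustration, and the sketch does not reproduce the Stock2Vec architecture or its embedding.

```python
def causal_dilated_conv(series, weights, dilation):
    """1-D causal convolution: output[t] uses series[t], series[t - d], ...

    Positions before the start of the series are treated as zeros (left padding).
    """
    out = []
    for t in range(len(series)):
        total = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                total += w * series[idx]
        out.append(total)
    return out

prices = [1.0, 2.0, 3.0, 4.0, 5.0]
smoothed = causal_dilated_conv(prices, weights=[0.5, 0.5], dilation=2)
```

Stacking such layers with growing dilation (1, 2, 4, ...) widens the receptive field quickly, which is what lets a temporal convolutional network capture long-range patterns without looking into the future.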



