愛可可 AI Paper Recommendations (October 12)

LG: Machine Learning; CV: Computer Vision; CL: Computation and Language

1. [LG] Representational aspects of depth and conditioning in normalizing flows

F Koehler, V Mehta, A Risteski

[MIT & CMU]

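A representational study of depth and conditioning, two major sources of difficulty in training normalizing flows. The paper examines both general invertible architectures and one common architecture, affine couplings. For general invertible architectures, it proves that invertibility comes at a cost in depth. For affine couplings, it shows that the choice of partition is unlikely to be a bottleneck for depth, and it proves that affine couplings are universal approximators, provided the model's Jacobian is allowed to be close to singular.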

Normalizing flows are among the most popular paradigms in generative modeling, especially for images, primarily because we can efficiently evaluate the likelihood of a data point. Normalizing flows also come with difficulties: models which produce good samples typically need to be extremely deep, which comes with accompanying vanishing/exploding gradient problems. Relatedly, they are often poorly conditioned since typical training data like images intuitively are lower-dimensional, and the learned maps often have Jacobians that are close to being singular. In our paper, we tackle representational aspects around depth and conditioning of normalizing flows, both for general invertible architectures and for a particular common architecture, affine couplings. For general invertible architectures, we prove that invertibility comes at a cost in terms of depth: we show examples where a much deeper normalizing flow model may need to be used to match the performance of a non-invertible generator. For affine couplings, we first show that the choice of partitions isn't a likely bottleneck for depth: we show that any invertible linear map (and hence a permutation) can be simulated by a constant number of affine coupling layers, using a fixed partition. This shows that the extra flexibility conferred by 1x1 convolution layers, as in GLOW, can in principle be simulated by increasing the size by a constant factor. Next, in terms of conditioning, we show that affine couplings are universal approximators, provided the Jacobian of the model is allowed to be close to singular. We furthermore empirically explore the benefit of different kinds of padding (a common strategy for improving conditioning) on both synthetic and real-life datasets.

https://weibo.com/1402400261/JoMMEnyEE
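
For readers unfamiliar with affine couplings, the following is a minimal pure-Python sketch of a single affine coupling layer with a fixed partition. It is not code from the paper. The functions `toy_log_scale` and `toy_shift` are hypothetical stand-ins for the neural networks that would normally produce the scale and shift. The sketch shows why the layer is cheap to invert and why its log-determinant is simply the sum of the log-scales.

```python
import math

def affine_coupling_forward(x, scale_fn, shift_fn, split):
    """Map x -> y. x[:split] passes through unchanged; x[split:] is
    scaled and shifted by quantities computed from x[:split] only."""
    x_a, x_b = x[:split], x[split:]
    log_s = scale_fn(x_a)  # one log-scale per element of x_b
    t = shift_fn(x_a)      # one shift per element of x_b
    y_b = [xb * math.exp(ls) + ti for xb, ls, ti in zip(x_b, log_s, t)]
    # The Jacobian is triangular, so log|det J| is the sum of log-scales.
    return x_a + y_b, sum(log_s)

def affine_coupling_inverse(y, scale_fn, shift_fn, split):
    """Invert the layer; y[:split] equals x[:split], so the same
    scales and shifts can be recomputed exactly."""
    y_a, y_b = y[:split], y[split:]
    log_s = scale_fn(y_a)
    t = shift_fn(y_a)
    x_b = [(yb - ti) * math.exp(-ls) for yb, ls, ti in zip(y_b, log_s, t)]
    return y_a + x_b

# Hypothetical toy conditioners (assumptions, standing in for networks).
def toy_log_scale(x_a):
    return [0.5 * math.tanh(sum(x_a)), -0.25 * math.tanh(x_a[0])]

def toy_shift(x_a):
    return [x_a[0] - x_a[1], 2.0 * x_a[1]]

x = [0.3, -1.2, 0.7, 2.0]
y, log_det = affine_coupling_forward(x, toy_log_scale, toy_shift, split=2)
x_rec = affine_coupling_inverse(y, toy_log_scale, toy_shift, split=2)
```

Very negative log-scales shrink a coordinate toward zero, which is the "Jacobian close to singular" regime that the abstract's universal approximation result relies on.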

2. [CL] If beam search is the answer, what was the question?

C Meister, T Vieira, R Cotterell

[ETH Zurich & Johns Hopkins University & University of Cambridge]

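The cognitive inductive bias of beam search. The paper frames beam search as the exact solution to a different decoding objective, in order to understand why high probability under a model alone may not indicate adequacy. Motivated by cognitive science, it finds that beam search enforces uniform information density in text. The authors design a set of objectives that explicitly encourage uniform information density in text generated by neural probabilistic models, and find that these objectives help mitigate the quality degradation that usually appears as beam width increases. They also analyze text generated with various decoding strategies and find that, in neural machine translation experiments, the degree to which this property holds correlates strongly with BLEU.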

Quite surprisingly, exact maximum a posteriori (MAP) decoding of neural language generators frequently leads to low-quality results. Rather, most state-of-the-art results on language generation tasks are attained using beam search despite its overwhelmingly high search error rate. This implies that the MAP objective alone does not express the properties we desire in text, which merits the question: if beam search is the answer, what was the question? We frame beam search as the exact solution to a different decoding objective in order to gain insights into why high probability under a model alone may not indicate adequacy. We find that beam search enforces uniform information density in text, a property motivated by cognitive science. We suggest a set of decoding objectives that explicitly enforce this property and find that exact decoding with these objectives alleviates the problems encountered when decoding poorly calibrated language generation models. Additionally, we analyze the text produced using various decoding strategies and see that, in our neural machine translation experiments, the extent to which this property is adhered to strongly correlates with BLEU.

https://weibo.com/1402400261/JoMTaiKwO
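
As a rough illustration of the two ideas in this entry, here is a minimal sketch of standard beam search over a toy model, plus a helper that measures how evenly surprisal is spread across tokens. `TOY_MODEL` is a made-up bigram table, and the variance of per-token surprisal is used only as one simple proxy for uniform information density. Neither is taken from the paper, and this is not the authors' decoding objective.

```python
import math

# Hypothetical toy bigram model: P(next token | previous token).
TOY_MODEL = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"a": 0.1, "b": 0.5, "</s>": 0.4},
    "b": {"a": 0.3, "b": 0.1, "</s>": 0.6},
}

def beam_search(model, beam_width, max_len=5, bos="<s>", eos="</s>"):
    """Return finished (tokens, log_prob) hypotheses, best first."""
    beams = [([bos], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for tok, p in model[tokens[-1]].items():
                candidates.append((tokens + [tok], score + math.log(p)))
        # Prune: keep only the beam_width highest-scoring extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == eos:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished

def surprisal_variance(model, tokens):
    """Variance of per-token surprisal (-log p); lower values mean the
    information is spread more uniformly over the sequence."""
    s = [-math.log(model[prev][tok]) for prev, tok in zip(tokens, tokens[1:])]
    mean = sum(s) / len(s)
    return sum((v - mean) ** 2 for v in s) / len(s)

greedy = beam_search(TOY_MODEL, beam_width=1)
wide = beam_search(TOY_MODEL, beam_width=2)
```

On this toy table, a beam of width 1 ends on a sequence with probability 0.18, while width 2 finds one with probability 0.24. Calling `surprisal_variance` on each result lets you compare how uniform their information density is.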

3. [CV] Semi-Supervised Learning for Multi-Task Scene Understanding by Neural Graph Consensus

M Leordeanu, M Pirvu, D Costea, A Marcu, E Slusanschi, R Sukthankar

[University Politehnica of Bucharest]

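Neural Graph Consensus (NGC) for semi-supervised multi-task scene understanding. The method tackles semi-supervised learning by finding consensus in a graph of neural networks. It combines neural networks with a discrete graph structure: multiple deep networks are assembled into a graph, and semi-supervised learning is driven by agreement among the multiple paths through it.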

We address the challenging problem of semi-supervised learning in the context of multiple visual interpretations of the world by finding consensus in a graph of neural networks. Each graph node is a scene interpretation layer, while each edge is a deep net that transforms one layer at one node into another from a different node. During the supervised phase edge networks are trained independently. During the next unsupervised stage edge nets are trained on the pseudo-ground truth provided by consensus among multiple paths that reach the nets' start and end nodes. These paths act as ensemble teachers for any given edge and strong consensus is used for high-confidence supervisory signal. The unsupervised learning process is repeated over several generations, in which each edge becomes a "student" and also part of different ensemble "teachers" for training other students. By optimizing such consensus between different paths, the graph reaches consistency and robustness over multiple interpretations and generations, in the face of unknown labels. We give theoretical justifications of the proposed idea and validate it on a large dataset. We show how prediction of different representations such as depth, semantic segmentation, surface normals and pose from RGB input could be effectively learned through self-supervised consensus in our graph. We also compare to state-of-the-art methods for multi-task and semi-supervised learning and show superior performance.

https://weibo.com/1402400261/JoN0ZxPHt
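
The consensus step described in the abstract can be sketched as follows. This is only an illustration. The per-pixel median and the `max_spread` agreement threshold are assumptions made here, since the abstract says only that consensus among paths provides pseudo-ground truth and that strong consensus is treated as a high-confidence signal.

```python
import statistics

def consensus_pseudo_labels(path_predictions, max_spread=0.1):
    """Combine per-pixel predictions from several graph paths.

    path_predictions: list of equal-length lists, one per path that
    reaches the same target node.
    Returns (labels, mask): the median prediction per pixel, and a
    boolean mask that is True where all paths agree within max_spread.
    """
    labels, mask = [], []
    for values in zip(*path_predictions):
        labels.append(statistics.median(values))
        mask.append(max(values) - min(values) <= max_spread)
    return labels, mask

# Three hypothetical paths predicting the same 3-pixel depth map.
paths = [
    [0.50, 0.20, 0.90],
    [0.52, 0.80, 0.91],
    [0.49, 0.40, 0.88],
]
labels, mask = consensus_pseudo_labels(paths)
```

A student edge network would then be trained on `labels`, with the loss restricted to (or weighted by) the positions where `mask` is true.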

4. [CL] MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models

P Xu, M Patwary, M Shoeybi, R Puri, P Fung, A Anandkumar, B Catanzaro

[The Hong Kong University of Science and Technology & NVIDIA]

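Controllable story generation with large-scale language models and external knowledge (MEGATRON-CNTRL). The paper proposes a new framework that makes text generation more controllable by incorporating external knowledge. MEGATRON-CNTRL first predicts a set of keywords; a knowledge retriever then queries an external knowledge base for triples related to those keywords. A contextual knowledge ranker orders the retrieved knowledge sentences by their relevance to the story context, and the top-ranked sentences are passed to a conditional generator, which produces the next story sentence. Experiments on the ROC story dataset show that, compared with previous state-of-the-art models, the model generates stories that are less repetitive, more diverse, and logically consistent.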

Existing pre-trained large language models have shown unparalleled generative capabilities. However, they are not controllable. In this paper, we propose MEGATRON-CNTRL, a novel framework that uses large-scale language models and adds control to text generation by incorporating an external knowledge base. Our framework consists of a keyword predictor, a knowledge retriever, a contextual knowledge ranker, and a conditional text generator. As we do not have access to ground-truth supervision for the knowledge ranker, we make use of weak supervision from sentence embedding. The empirical results show that our model generates more fluent, consistent, and coherent stories with less repetition and higher diversity compared to prior work on the ROC story dataset. We showcase the controllability of our model by replacing the keywords used to generate stories and re-running the generation process. Human evaluation results show that 77.5% of these stories are successfully controlled by the new keywords. Furthermore, by scaling our model from 124 million to 8.3 billion parameters we demonstrate that larger models improve both the quality of generation (from 74.5% to 93.0% for consistency) and controllability (from 77.5% to 91.5%).

https://weibo.com/1402400261/JoN6zvNvN
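
The retrieve-then-rank part of the pipeline can be sketched in a few lines. Everything here is a toy assumption: `KNOWLEDGE_BASE` is invented, the keywords are supplied by hand rather than by a keyword predictor, and bag-of-words cosine similarity stands in for the paper's contextual knowledge ranker. The conditional generator is omitted.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base of (subject, relation, object) triples.
KNOWLEDGE_BASE = [
    ("dog", "is a", "pet"),
    ("dog", "likes", "playing fetch in the park"),
    ("park", "is used for", "walking"),
    ("piano", "is a", "musical instrument"),
]

def retrieve(keywords, kb):
    """Return triples, rendered as sentences, whose subject is a keyword."""
    return [" ".join(triple) for triple in kb if triple[0] in keywords]

def bag_of_words(text):
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank(context, sentences, top_k=2):
    """Order knowledge sentences by similarity to the story context."""
    ctx = bag_of_words(context)
    scored = sorted(sentences,
                    key=lambda s: cosine(ctx, bag_of_words(s)),
                    reverse=True)
    return scored[:top_k]

context = "Tom took his dog to the park"
keywords = ["dog", "park"]  # assumed output of a keyword predictor
candidates = retrieve(keywords, KNOWLEDGE_BASE)
top_knowledge = rank(context, candidates, top_k=2)
```

In the full framework, `top_knowledge` would be concatenated with the story context and fed to the conditional generator. Swapping in different keywords changes which knowledge is retrieved, which is the control mechanism the abstract evaluates.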

5. [LG] Stock2Vec: A Hybrid Deep Learning Framework for Stock Market Prediction with Representation Learning and Temporal Convolutional Network

X Wang, Y Wang, B Weng, A Vinel

[Auburn University & Verizon Media Group (Yahoo!) & Amazon.com Inc]

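Stock2Vec, a hybrid deep learning framework for stock market prediction. The authors build a hybrid deep learning model to predict S&P stock prices: a Stock2Vec embedding learns the relationships among different stocks, and 1-D dilated causal convolution layers (TCN) extract temporal features from historical data. Entity embeddings of categorical features are learned from data across the whole market, acting as a form of supervised dimensionality reduction that improves predictive performance.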

We have proposed to develop a global hybrid deep learning framework to predict the daily prices in the stock market. With representation learning, we derived an embedding called Stock2Vec, which gives us insight for the relationship among different stocks, while the temporal convolutional layers are used for automatically capturing effective temporal patterns both within and across series. Evaluated on S&P 500, our hybrid framework integrates both advantages and achieves better performance on the stock price prediction task than several popular benchmarked models.

https://weibo.com/1402400261/JoNcxyGxf
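
The temporal convolutional layers mentioned above are built from dilated causal convolutions. The sketch below shows that operation in plain Python, assuming a single channel and hand-picked averaging weights. It is not the Stock2Vec implementation, and the price series is invented.

```python
def causal_dilated_conv1d(series, kernel, dilation=1):
    """Causal 1-D convolution over a single-channel series.

    kernel[0] weights the current step and kernel[j] weights the step
    j * dilation in the past, so output[t] never sees future values.
    Steps before the start of the series count as zeros (left padding).
    """
    out = []
    for t in range(len(series)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * series[idx]
        out.append(acc)
    return out

# Invented daily prices; two stacked layers with dilations 1 and 2.
prices = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
layer1 = causal_dilated_conv1d(prices, kernel=[0.5, 0.5], dilation=1)
layer2 = causal_dilated_conv1d(layer1, kernel=[0.5, 0.5], dilation=2)
```

With kernel size 2 and dilations 1 and 2, each value of `layer2` depends on the four most recent prices. Doubling the dilation at every layer is how TCN-style stacks cover long histories with few layers.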
