星の本棚

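Notes (memoranda) on science, technology, and engineering.

Natural Language Processing (NLP)

Computer Science

Natural language processing [NLP]

These are my notes on natural language processing (NLP).
They focus in particular on NLP with neural networks and deep learning.
More will be added over time.

Neural networks are covered in the following article: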

yagami12.hatenablog.com

More general machine learning is covered in the following article:

yagami12.hatenablog.com

Contents

Natural language processing (NLP)
References

Natural language processing (NLP)

Embedding vectors and embedding matrices

Language model [LM]

Neural network language model [NNLM]

Feed-forward neural network language model (FFNN-LM)

Recurrent neural network language model (RNN-LM)

Distributed representations [distributional / distributed]

Distributed representations of words, word embeddings

Concrete methods for obtaining distributed representations of words (word embeddings)

Obtaining distributed representations with a neural language model

Obtaining distributed representations with a log-bilinear model

The word2vec tool

CBOW [Continuous Bag-of-Words] model

skip-gram model

Reference site: Word2Vec のニューラルネットワーク学習過程を理解する · けんごのお屋敷

Speeding up skip-gram model training with negative sampling

Sequence-to-sequence model [seq2seq]

Model architecture

Encoder - embedding layer

Encoder - recurrent layer

Decoder - embedding layer

Decoder - recurrent layer

Decoder - output layer

Computational cost of the seq2seq model

Training the seq2seq model

Sequence generation with the seq2seq model

Greedy search [greedy algorithm]

Beam search
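As a rough illustration of how the two generation strategies named above differ, here is a minimal sketch in plain Python. The toy probability table, the function names, and the `<s>` / `</s>` markers are all assumptions made for this example; they stand in for a trained seq2seq decoder and are not taken from any particular library.

```python
import math

BOS, EOS = "<s>", "</s>"

# Hypothetical toy "decoder": the next-token distribution depends only on
# the previous token. A real seq2seq decoder would condition on the encoder
# output and on the full output history.
TABLE = {
    BOS: {"a": 0.6, "b": 0.4},
    "a": {EOS: 0.4, "a": 0.3, "b": 0.3},
    "b": {EOS: 0.9, "a": 0.05, "b": 0.05},
}


def next_log_probs(prev):
    """Return log-probabilities of each possible next token."""
    return {tok: math.log(p) for tok, p in TABLE[prev].items()}


def greedy_decode(max_len=10):
    """Pick the single most probable token at every step."""
    seq, score, prev = [], 0.0, BOS
    for _ in range(max_len):
        tok, lp = max(next_log_probs(prev).items(), key=lambda kv: kv[1])
        seq.append(tok)
        score += lp
        if tok == EOS:
            break
        prev = tok
    return seq, score


def beam_search(beam_width=2, max_len=10):
    """Keep the beam_width best partial sequences at every step."""
    beams = [([], 0.0)]  # (tokens, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq and seq[-1] == EOS:
                # Finished hypotheses are carried over unchanged.
                candidates.append((seq, score))
                continue
            prev = seq[-1] if seq else BOS
            for tok, lp in next_log_probs(prev).items():
                candidates.append((seq + [tok], score + lp))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(seq[-1] == EOS for seq, _ in beams):
            break
    return beams[0]


greedy_seq, greedy_score = greedy_decode()
beam_seq, beam_score = beam_search(beam_width=2)
print(greedy_seq, math.exp(greedy_score))
print(beam_seq, math.exp(beam_score))
```

In this toy table, greedy search commits to "a" at the first step because it is locally the most probable token, whereas a beam of width 2 also keeps "b" alive and ends up with a sequence of higher overall probability. Setting `beam_width=1` reduces the beam search to the greedy strategy.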

Reference site: deepage.net

Attention mechanism / seq2seq model

Soft attention mechanism

Soft attention mechanism in the seq2seq model

Soft attention mechanism in more general models

Hard attention mechanism

Writing in progress...

Memory networks [MemN]

Reference site: deeplearning.hatenablog.com

Memory network architecture

Supervised memory networks [strongly supervised memory networks]

I : input feature map

G : generalization

O : output feature map

End-to-end memory networks

I : input feature map

G : generalization

O : output feature map

Dynamic memory networks [DMN]

Techniques for speeding up the output layer of neural language models and seq2seq models
(making the gradient computation of the cross-entropy loss function more efficient)

Importance sampling

Noise-contrastive estimation [NCE]

Reference site: qiita.com

Negative sampling

BlackOut

Hierarchical softmax [HSM]

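Applied tasks in natural language processing

Machine translation [MT]

GroundHog / RNNSearch

Reference site (official)
GitHub - lisa-groundhog/GroundHog: Library for implementing RNNs with Theano (github.com)

A neural machine translation (NMT) tool released by the University of Montreal.
GroundHog's model implementations cover both the case with an attention mechanism and the case without one.

OpenNMT / seq2seq-attn

Reference site (official)
OpenNMT - Open-Source Neural Machine Translation (opennmt.net)
GitHub (official), TensorFlow implementation
github.com

See the reference sites above for how to use the tool.
What follows describes the model architecture adopted by the OpenNMT tool.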

Encoder / 2-layer bidirectional LSTM

Encoder - embedding layer

Encoder - recurrent layer

Decoder / 2-layer LSTM + attention layer

Decoder - embedding layer

Decoder - recurrent layer

Decoder - attention layer

Decoder - output layer / processing at training time

Decoder - output layer / processing at evaluation time

Common problems in machine translation tasks, with countermeasures and improvements

Vocabulary size, unknown words, and input/output units

Over-generation, under-generation, and coverage

Original paper: "Modeling Coverage for Neural Machine Translation"
- arXiv.org : [1601.04811] Modeling Coverage for Neural Machine Translation

Text summarization

Reference site: qiita.com

Headline generation task / short-text generation task

Encoder-decoder approach to text summarization / models using an attention structure

Paper: "A Neural Attention Model for Abstractive Sentence Summarization" / Attention-Based Summarization (ABS)

Original paper: "A Neural Attention Model for Abstractive Sentence Summarization"
- arXiv.org : [1509.00685] A Neural Attention Model for Abstractive Sentence Summarization
Implementation: github.com

Architecture of the feed-forward neural language model side

Architecture of the attention-based encoder side

Extensions and improvements of the ABS model using RNNs

Original paper: "Abstractive Sentence Summarization with Attentive Recurrent Neural Networks"
Original paper: "Sequence-to-sequence RNNs for text summarization"
Original paper: "Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond"
- arXiv.org : [1602.06023] Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond

Dialogue systems [dialog system]

Reference site
- Qiita : 機械学習を使って作る対話システム

Dialogue models

Applying the seq2seq model to dialogue systems

Dialogue models using an attention structure

Original paper: "Neural Responding Machine for Short-Text Conversation"
- arXiv.org : [1503.02364] Neural Responding Machine for Short-Text Conversation

Methods that explicitly incorporate turn-taking and speaker identity, both characteristic of dialogue systems, into the model

Original paper: "Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models"
- arXiv.org : [1507.04808] Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models
Original paper: "A Persona-Based Neural Conversation Model"
- arXiv.org : [1603.06155] A Persona-Based Neural Conversation Model
- GitHub : GitHub - jiweil/Neural-Dialogue-Generation
Original paper: "Addressee and Response Selection for Multi-Party Conversation"

Paper: "Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models"

Paper: "A Persona-Based Neural Conversation Model"

Paper: "Addressee and Response Selection for Multi-Party Conversation"

Automatic evaluation of dialogue systems

Paper: "The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems"

Original paper: "The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems"
- arXiv.org : [1506.08909] The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems

Question answering [QA]

Answer selection task

Evaluation methods for the answer selection problem

End-to-end question answering

References

深層学習 Deep Learning (監修:人工知能学会)

Authors: 麻生英樹, 安田宗樹, 前田新一, 岡野原大輔, 岡谷貴之, 久保陽太郎, ボレガラダヌシカ, 人工知能学会, 神嶌敏弘
Publisher: 近代科学社
Published: 2015/11/05
Format: Book

詳解ディープラーニング ~TensorFlow・Kerasによる時系列データ処理~

Author: 巣籠悠輔
Publisher: マイナビ出版
Published: 2017/05/30
Format: Book (softcover)

深層学習による自然言語処理 (機械学習プロフェッショナルシリーズ)

Authors: 坪井祐太, 海野裕也, 鈴木潤
Publisher: 講談社
Published: 2017/05/25
Format: Book (softcover)

TensorFlow機械学習クックブック Pythonベースの活用レシピ60+ (impress top gear)

Authors: Nick McClure, 株式会社クイープ
Publisher: インプレス
Published: 2017/08/14
Format: Book (softcover)