12. Text Similarity

12.1 Overview

12.2 Paper

Siamese Recurrent Architectures for Learning Sentence Similarity - MIT2016

Code: https://github.com/LuJunru/Sentences_Pair_Similarity_Calculation_Siamese_LSTM (Keras)

Code: Sentence similarity computation based on Siamese LSTM (Keras)

Code: https://github.com/eliorc/Medium/blob/master/MaLSTM.ipynb (Keras)

Article: How to predict Quora Question Pairs using Siamese Manhattan LSTM - 2017

Chinese: Siamese Recurrent Architectures for Learning Sentence Similarity
Learning Text Similarity with Siamese Recurrent Networks - Netherlands2016

Code:

https://github.com/likejazz/Siamese-LSTM (Keras)

https://github.com/eliorc/Medium/blob/master/MaLSTM.ipynb (Keras)

https://github.com/dhwajraj/deep-siamese-text-similarity (Tensorflow)

https://github.com/vishnumani2009/siamese-text-similarity (Tensorflow)

https://github.com/aditya1503/Siamese-LSTM (Theano)
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks - Germany2019

For details, see 07-Pretrained_Model.md
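
The Manhattan LSTM (MaLSTM) model from the first paper above scores a sentence pair as exp(-||h_a - h_b||_1), where h_a and h_b are the final hidden states of two weight-sharing LSTM encoders. Below is a minimal sketch of that scoring step only, assuming the encodings are already available as plain lists of floats. The function name `malstm_similarity` and the sample vectors are hypothetical and are not taken from any of the linked repositories.

```python
import math


def malstm_similarity(h_a, h_b):
    """Return exp(-L1 distance) between two equal-length encodings.

    The result lies in (0, 1]; identical encodings score exactly 1.0.
    """
    if len(h_a) != len(h_b):
        raise ValueError("encodings must have the same length")
    l1_distance = sum(abs(x - y) for x, y in zip(h_a, h_b))
    return math.exp(-l1_distance)


# Hypothetical sentence encodings, for illustration only.
same = malstm_similarity([0.2, -0.1, 0.5], [0.2, -0.1, 0.5])
far = malstm_similarity([0.2, -0.1, 0.5], [-0.9, 0.8, -0.4])
print(same, far)
```

In a full model the two encodings would come from the shared LSTM, and the score would be trained against labeled similarity values.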

12.3 Practice

[Great] https://github.com/RandolphVI/Text-Pairs-Relation-Classification (Tensorflow)

Text Pairs (Sentence Level) Classification (Similarity Modeling) Based on Neural Network

Models: ABCNN, ANN, CNN, CRNN, FastText, HAN, RCNN, RNN, SANN

YAO:

A rich set of models, with visualizations of the model structures. Still to be read.
https://github.com/Vincent131499/TextSim_cn_finetune (Tensorflow)

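Fine-tunes pretrained language models (BERT, Roberta, XLBert, etc.) to compute the similarity between two Chinese texts (cast as a sentence-pair classification task)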
https://github.com/yanqiangmiffy/sentence-similarity (Keras)

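Question sentence similarity: given two sentences written by users in a customer-service setting, determine algorithmically whether they express the same meaning.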

YAO: It mentions 5 text similarity competitions
https://github.com/liuhuanyong/SentenceSimilarity

Sentence similarity computation based on Tongyici Cilin, HowNet, fingerprints, character/word vectors, and the vector space model
https://github.com/ashengtx/CilinSimilarity

Word similarity computation based on Tongyici Cilin
https://github.com/BiLiangLtd/WordSimilarity

Word similarity computation based on the HIT Tongyici Cilin (Extended)

Article: Word similarity computation based on the extended Tongyici Cilin
https://github.com/PengboLiu/Doc2Vec-Document-Similarity

Computing text similarity with Doc2Vec
https://github.com/cjymz886/sentence-similarity

Experiments comparing four sentence/text similarity methods: cosine, cosine+idf, bm25, jaccard
https://github.com/liuhuanyong/SiameseSentenceSimilarity

SiameseSentenceSimilarity: a personal implementation of a similar-sentence detection model based on a Siamese BiLSTM, with training and test datasets provided.
https://github.com/fssqawj/SentenceSim

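Chinese short-sentence similarity. A 2016 project using fairly traditional methods: HowNet-based, one-hot vector model, Word2Vec-based, HIT SDP-based, fusion algorithms, and LSTM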
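
Two of the measures compared in cjymz886/sentence-similarity above, Jaccard and cosine, can be sketched in a few lines. This is an illustrative sketch, not that repository's code. It assumes sentences are already tokenized (here by whitespace, whereas Chinese text would first need a word segmenter) and uses raw term counts without idf weighting. The function names and sample sentences are hypothetical.

```python
import math
from collections import Counter


def jaccard(tokens_a, tokens_b):
    """Size of the token-set intersection divided by the size of the union."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 1.0  # convention chosen here: two empty sentences are identical
    return len(set_a & set_b) / len(set_a | set_b)


def cosine(tokens_a, tokens_b):
    """Cosine of the angle between two bag-of-words count vectors."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical example sentences.
a = "how do i reset my password".split()
b = "how can i reset my password".split()
print(jaccard(a, b), cosine(a, b))
```

Jaccard ignores how often a token occurs, while cosine on counts does not. For short questions the two usually rank pairs similarly, and idf or BM25 weighting matters more when common words dominate.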

12.4 Competition

https://github.com/Leputa/CIKM-AnalytiCup-2018 (Tensorflow)

CIKM AnalytiCup 2018 – Alibaba AliMe (阿里小蜜) chatbot cross-lingual short text matching competition – Rank 12 solution

Determine whether two questions in different languages have the same meaning.
https://github.com/ziweipolaris/atec2018-nlp (Keras, PyTorch)

ATEC2018 NLP task: determine whether two questions have the same meaning
https://github.com/zake7749/CIKM-AnalytiCup-2018 (Tensorflow & Keras)

[ACM-CIKM] 2nd place solution at CIKM AnalytiCup 2018, a task for determining short text similarities

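ATEC 2018 Ant Financial NLP intelligent customer service competition

Given two sentences written by users in a customer-service setting, judge the similarity of the questions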

https://github.com/zle1992/atec (Keras)

Rank 16/2631
https://github.com/Lapis-Hong/atec-nlp (PyTorch)

Kaggle: Quora Question Pairs

Determine whether question pairs are duplicates or not

https://github.com/HouJP/kaggle-quora-question-pairs (TextNet)

Rank 4

12.5 Traditional Method

12.5.1 Simhash

Article

simhash and duplicate content detection - 2011
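
As a rough illustration of the Simhash idea, the sketch below builds a 64-bit fingerprint from a token list and compares fingerprints by Hamming distance. It makes several simplifying assumptions: every token has weight 1, each token is hashed with the first 8 bytes of its MD5 digest, and tokens are split on whitespace. The function names and sample texts are hypothetical, and this is not the code of any library listed in this section.

```python
import hashlib


def simhash(tokens, bits=64):
    """Return a `bits`-wide fingerprint (bits must be a multiple of 8, at most 128)."""
    counts = [0] * bits
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:  # set the bit where the positive votes won
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of bit positions at which two fingerprints differ."""
    return bin(a ^ b).count("1")


# Hypothetical documents.
doc1 = "the quick brown fox jumps over the lazy dog".split()
doc2 = "the quick brown fox jumped over the lazy dog".split()
print(hamming(simhash(doc1), simhash(doc2)))
```

Documents that share most of their tokens tend to produce fingerprints with a small Hamming distance, so near-duplicate detection reduces to comparing fixed-size integers. The distance threshold is application-specific and has to be tuned.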
