.. role:: python(code)
   :language: python

Background
========================================

This section introduces background knowledge on text sentiment analysis models and the algorithms used to evaluate them.


Terminology
-----------

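Cosine distance
~~~~~~~~~~~~~~~~~~~~~~~~

The cosine distance is derived from the cosine similarity, commonly as :math:`1 - \operatorname{CosSim}`. For two non-zero vectors :math:`\mathbf {A}` and :math:`\mathbf {B}` of the same dimension, the cosine similarity is defined as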

   .. math::
      {\displaystyle {\operatorname{CosSim}(\mathbf {A}, \mathbf {B})}={\mathbf {A} \cdot \mathbf {B}  \over \|\mathbf {A} \|\|\mathbf {B} \|}={\frac {\sum \limits_{i=1}^{n}{A_{i}B_{i}}}{{\sqrt {\sum \limits_{i=1}^{n}{A_{i}^{2}}}}{\sqrt {\sum \limits _{i=1}^{n}{B_{i}^{2}}}}}}}
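
The definition above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names ``cosine_similarity`` and ``cosine_distance`` are chosen here for illustration, and the distance uses the common convention :math:`1 - \operatorname{CosSim}`, which is an assumption rather than something this document specifies.

.. code-block:: python

   import math
   from typing import Sequence


   def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
       """Cosine similarity of two non-zero vectors of equal dimension."""
       if len(a) != len(b):
           raise ValueError("vectors must have the same dimension")
       dot = sum(x * y for x, y in zip(a, b))
       norm_a = math.sqrt(sum(x * x for x in a))
       norm_b = math.sqrt(sum(y * y for y in b))
       if norm_a == 0.0 or norm_b == 0.0:
           raise ValueError("cosine similarity is undefined for zero vectors")
       return dot / (norm_a * norm_b)


   def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
       """Cosine distance, assuming the convention 1 - CosSim(a, b)."""
       return 1.0 - cosine_similarity(a, b)

Orthogonal vectors give a similarity of 0, and vectors pointing in the same direction give a similarity of 1 up to floating-point error.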


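Word segmentation
~~~~~~~~~~~~~~~~~~~~~~~~
Word segmentation is a preprocessing step that natural language processing of Chinese text requires and that of English text does not. Chinese has no delimiter comparable to the space, so any word-level operation must first segment the text into words. For example, "我爱北京天安门" ("I love Beijing Tiananmen") should be split by the segmenter into "我", "爱", "北京", and "天安门".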
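
To make the idea concrete, here is a minimal sketch of dictionary-based forward maximum matching, one of the simplest segmentation strategies. It is not how any particular segmenter (such as :code:`jieba`) works internally; the function name ``forward_max_match`` and the two-word toy dictionary are hypothetical and chosen only to reproduce the example above.

.. code-block:: python

   from typing import List, Set


   def forward_max_match(text: str, vocab: Set[str], max_len: int = 4) -> List[str]:
       """Greedy forward maximum matching; unknown characters become single tokens."""
       if max_len < 1:
           raise ValueError("max_len must be at least 1")
       tokens: List[str] = []
       i = 0
       while i < len(text):
           # Try the longest candidate first, falling back to a single character.
           for size in range(min(max_len, len(text) - i), 0, -1):
               piece = text[i:i + size]
               if size == 1 or piece in vocab:
                   tokens.append(piece)
                   i += size
                   break
       return tokens


   toy_vocab = {"北京", "天安门"}  # hypothetical toy dictionary
   print(forward_max_match("我爱北京天安门", toy_vocab))
   # ['我', '爱', '北京', '天安门']

Practical segmenters rely on much larger dictionaries and statistical or neural models to resolve ambiguous boundaries, which greedy matching cannot do.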


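Stop words
~~~~~~~~~~~~~~~~~~~~~~~~
Stop words are words that occur with very high frequency in a language. They include function words that carry no real meaning of their own, such as "了" and "呢", and very general-purpose words such as "然后" ("then"). Natural language processing pipelines often remove these words during preprocessing.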
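
Stop-word removal on an already segmented sentence can be sketched as a simple filter. The stop-word list below is a hypothetical toy list built from the three examples above; real lists contain hundreds of entries.

.. code-block:: python

   from typing import FrozenSet, Iterable, List

   # Hypothetical toy stop-word list built from the examples in the text.
   TOY_STOP_WORDS: FrozenSet[str] = frozenset({"了", "呢", "然后"})


   def remove_stop_words(tokens: Iterable[str],
                         stop_words: FrozenSet[str] = TOY_STOP_WORDS) -> List[str]:
       """Return the tokens that are not stop words, preserving their order."""
       return [tok for tok in tokens if tok not in stop_words]


   print(remove_stop_words(["然后", "我", "去", "了", "北京"]))
   # ['我', '去', '北京']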


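Part of speech
~~~~~~~~~~~~~~~~~~~~~~~~
Part of speech (POS), also called word class, is a linguistic term for the grammatical classification of the words of a language. Words are classified mainly by their grammatical features (syntactic function and morphological change), with lexical meaning as a secondary criterion; examples are nouns, verbs, adjectives, and adverbs. These classes can be subdivided further: nouns, for instance, can be divided into common nouns, direction nouns, place nouns, personal names, location names, and so on. The table below lists a representative tag set: the part-of-speech (POS) and named entity recognition (NER) tags used by the :code:`python` package :code:`jieba` in :code:`paddle` mode.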

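.. csv-table:: Part-of-speech and named-entity tags of the python package jieba in paddle mode
   :header: "Tag", "Meaning", "Tag", "Meaning", "Tag", "Meaning", "Tag", "Meaning"
   :widths: 10, 20, 10, 20, 10, 20, 10, 20

   n, common noun, f, direction noun, s, place noun, t, time
   nr, personal name, ns, location name, nt, organization name, nw, work title
   nz, other proper noun, v, common verb, vd, adverbial verb, vn, nominal verb
   a, adjective, ad, adverbial adjective, an, nominal adjective, d, adverb
   m, numeral-quantifier, q, measure word, r, pronoun, p, preposition
   c, conjunction, u, particle, xc, other function word, w, punctuation
   PER, personal name, LOC, location name, ORG, organization name, TIME, time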


Text attack methods (white-box and black-box)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

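White-box attacks
^^^^^^^^^^^^^^^^^^^^

Some of the attack algorithms in this project are white-box attacks, which require full knowledge of the model, including its architecture and the corresponding gradients.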

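Black-box attacks
^^^^^^^^^^^^^^^^^^^^

Some of the attack algorithms in this project are black-box attacks, which do not require full knowledge of the model's architecture or gradients. They only need the model's predictions for the given inputs, which serve as the input data for the evaluation.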


Text attack methods (targeted and untargeted)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

鐩爣鏀诲嚮
^^^^^^^^

鐩爣鏀诲嚮鏄寚灏嗗師濮嬫牱鏈€氳繃鏀诲嚮鍚庯紝鎸囧畾鏀诲嚮鍚庣殑缁撴灉绫诲埆

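Untargeted attacks
^^^^^^^^^^^^^^^^^^^^

In an untargeted attack, it is enough for the model's prediction on the attacked sample to differ from its prediction on the original sample.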
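
The two success criteria can be written as a single check. This is a minimal sketch; the function name ``attack_succeeded`` and the use of integer class labels are assumptions made for illustration.

.. code-block:: python

   from typing import Optional


   def attack_succeeded(original_label: int,
                        adversarial_label: int,
                        target_label: Optional[int] = None) -> bool:
       """Decide whether an attack succeeded.

       original_label is the model's prediction on the original sample and
       adversarial_label its prediction on the attacked sample.
       """
       if target_label is None:
           # Untargeted: any change of the predicted class counts as success.
           return adversarial_label != original_label
       # Targeted: the prediction must equal the class chosen by the attacker.
       return adversarial_label == target_label

With three classes, changing a prediction from class 0 to class 2 is an untargeted success, but it is a targeted success only if the chosen target was class 2.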


Granularity of text attacks
------------------------------

Character-level attacks
~~~~~~~~~~~~~~~~~~~~~~~~
Character (letter) level attacks include swapping adjacent characters; inserting, deleting, or substituting characters; and repeating certain characters.


Word-level attacks
~~~~~~~~~~~~~~~~~~~~~~~~
Word-level attacks include synonym substitution, word insertion, and word deletion.
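
On a tokenized sentence these operations reduce to list manipulations. The sketch below uses a hypothetical two-entry synonym table; a real attack would draw candidates from a thesaurus or an embedding space and pick the substitution that most affects the model.

.. code-block:: python

   from typing import List, Mapping

   # Hypothetical toy synonym table, for illustration only.
   TOY_SYNONYMS: Mapping[str, str] = {"good": "fine", "movie": "film"}


   def replace_synonyms(tokens: List[str],
                        synonyms: Mapping[str, str] = TOY_SYNONYMS) -> List[str]:
       """Replace every token that has an entry in the synonym table."""
       return [synonyms.get(tok, tok) for tok in tokens]


   def insert_word(tokens: List[str], i: int, word: str) -> List[str]:
       """Return a copy of tokens with word inserted before position i."""
       return tokens[:i] + [word] + tokens[i:]


   def delete_word(tokens: List[str], i: int) -> List[str]:
       """Return a copy of tokens without the token at position i."""
       return tokens[:i] + tokens[i + 1:]


   print(replace_synonyms(["a", "good", "movie"]))  # ['a', 'fine', 'film']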


Sentence-level attacks
~~~~~~~~~~~~~~~~~~~~~~~~
Sentence-level attacks include inserting sentences, reordering sentences, and paraphrasing.


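Constraints on text attacks
------------------------------
Because of the special nature of text, an adversarial example generated to attack a model must do more than keep the perturbation small enough, that is, for the vectorized original input text :math:`x` and the adversarial example :math:`x'`,

   .. math::
      \lVert x - x' \rVert \leqslant \epsilon

where :math:`\epsilon` is the perturbation size. It must also satisfy additional spelling, grammatical, and semantic constraints, for example (to be refined):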

1. Word modification rate
2. Word embedding distance
3. Part-of-speech consistency
4. Edit (Levenshtein) distance
5. Number of grammatical errors (usually checked and counted with a grammar checker)
6. Semantic similarity (usually measured by the cosine distance between the vectors produced by a sentence encoder such as Universal Sentence Encoder)
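
Two of these constraints are easy to compute directly. The sketch below implements the Levenshtein distance with the standard dynamic-programming recurrence and a word modification rate. The latter assumes a substitution-only attack, so that both token lists have the same length; that assumption, and both function names, are illustrative rather than taken from this project.

.. code-block:: python

   from typing import Sequence


   def levenshtein(a: Sequence, b: Sequence) -> int:
       """Minimum number of insertions, deletions, and substitutions turning a into b."""
       prev = list(range(len(b) + 1))
       for i, x in enumerate(a, start=1):
           curr = [i]
           for j, y in enumerate(b, start=1):
               cost = 0 if x == y else 1
               curr.append(min(prev[j] + 1,          # deletion
                               curr[j - 1] + 1,      # insertion
                               prev[j - 1] + cost))  # substitution or match
           prev = curr
       return prev[-1]


   def word_modification_rate(original: Sequence[str],
                              adversarial: Sequence[str]) -> float:
       """Fraction of token positions that differ (substitution-only assumption)."""
       if len(original) != len(adversarial):
           raise ValueError("this sketch assumes token lists of equal length")
       if not original:
           return 0.0
       changed = sum(o != a for o, a in zip(original, adversarial))
       return changed / len(original)


   print(levenshtein("kitten", "sitting"))  # 3
   print(word_modification_rate(["a", "good", "movie", "!"],
                                ["a", "fine", "movie", "!"]))  # 0.25

Because ``levenshtein`` accepts any sequence, it works on strings (character-level distance) as well as on token lists (word-level distance).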


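Evaluating the effectiveness of text attacks
------------------------------------------------
Besides the **attack success rate** and the **attack efficiency** (the average number of attempts per attack and the time spent), most of the items listed under the constraints on text attacks can also serve as metrics for evaluating the effectiveness of a text attack.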
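
As an illustration, the success rate and the two efficiency figures can be aggregated from per-sample attack records. The ``AttackRecord`` type and its fields are hypothetical, and the sketch assumes that every record corresponds to one attacked sample; some evaluations instead count only samples that the model classified correctly before the attack.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Dict, List


   @dataclass
   class AttackRecord:
       """Hypothetical per-sample result of one attack."""
       succeeded: bool
       num_queries: int  # attempts (model queries) spent on this sample
       seconds: float    # wall-clock time spent on this sample


   def summarize(records: List[AttackRecord]) -> Dict[str, float]:
       """Aggregate the attack success rate and the average cost per attack."""
       n = len(records)
       if n == 0:
           raise ValueError("no attack records to summarize")
       return {
           "attack_success_rate": sum(r.succeeded for r in records) / n,
           "avg_queries": sum(r.num_queries for r in records) / n,
           "avg_seconds": sum(r.seconds for r in records) / n,
       }


   print(summarize([AttackRecord(True, 10, 1.0), AttackRecord(False, 30, 3.0)]))
   # {'attack_success_rate': 0.5, 'avg_queries': 20.0, 'avg_seconds': 2.0}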