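.. role:: python(code)
   :language: python

Background Knowledge
========================================

This section introduces background knowledge on text sentiment analysis models and the algorithms used to evaluate them.

Terminology
-----------

Cosine Distance
~~~~~~~~~~~~~~~~~~~~~~~~

For two non-zero vectors :math:`\mathbf{A}` and :math:`\mathbf{B}` of the same dimension, the cosine similarity is defined as

.. math::

    \operatorname{CosSim}(\mathbf{A}, \mathbf{B}) = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\| \|\mathbf{B}\|} = \frac{\sum\limits_{i=1}^{n} A_{i} B_{i}}{\sqrt{\sum\limits_{i=1}^{n} A_{i}^{2}} \sqrt{\sum\limits_{i=1}^{n} B_{i}^{2}}}

The cosine distance is commonly taken to be :math:`1 - \operatorname{CosSim}(\mathbf{A}, \mathbf{B})`.

Word Segmentation
~~~~~~~~~~~~~~~~~~~~~~~~

Word segmentation is a preprocessing step that natural language processing of Chinese text requires and that English text does not. Chinese has no delimiter comparable to the space, so any word-level operation must first segment the text into words. For example, a tokenizer should split “我爱北京天安门” ("I love Beijing Tiananmen") into “我”, “爱”, “北京”, “天安门”.

Stop Words
~~~~~~~~~~~~~~~~~~~~~~~~

Stop words are words that occur with very high frequency in a language. They include function words that carry no real meaning of their own, such as “了” and “呢”, and very general-purpose words such as “然后” ("then"). Natural language processing pipelines often remove these words during preprocessing.

Part of Speech
~~~~~~~~~~~~~~~~~~~~~~~~

Part of speech (POS), also called word class, is a linguistic term for the grammatical classification of the words of a language. Words are classified mainly by their grammatical features (syntactic function and morphological change), with lexical meaning as a secondary criterion. Examples are nouns, verbs, adjectives and adverbs. These classes can be subdivided further: nouns, for example, can be split into common nouns, locative nouns, place nouns, person names, place names and so on. The table below lists a representative tag set: the part-of-speech (POS) and named entity recognition (NER) tags used by the :code:`python` package :code:`jieba` in :code:`paddle` mode.

.. csv-table:: POS and named-entity tags of the Python package jieba in paddle mode
   :header: "Tag", "Meaning", "Tag", "Meaning", "Tag", "Meaning", "Tag", "Meaning"
   :widths: 10, 20, 10, 20, 10, 20, 10, 20

   n, common noun, f, locative noun, s, place noun, t, time
   nr, person name, ns, place name, nt, organization name, nw, work title
   nz, other proper noun, v, common verb, vd, adverbial verb, vn, nominal verb
   a, adjective, ad, adverbial adjective, an, nominal adjective, d, adverb
   m, numeral, q, measure word, r, pronoun, p, preposition
   c, conjunction, u, particle, xc, other function word, w, punctuation
   PER, person name, LOC, place name, ORG, organization name, TIME, time

Text Attack Methods (White-box / Black-box)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

White-box Attacks
^^^^^^^^^^^^^^^^^

Some of the attack algorithms in this project are white-box attacks. They require full knowledge of the model's structure and of related information such as its gradients.

Black-box Attacks
^^^^^^^^^^^^^^^^^

Some of the attack algorithms in this project are black-box attacks. They do not require full knowledge of the model's structure or gradients. They only need the model's predictions, which serve as the input data for the evaluation.

Text Attack Methods (Targeted / Non-targeted)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Targeted Attacks
^^^^^^^^^^^^^^^^

A targeted attack perturbs the original sample so that the model outputs a class specified in advance.

Non-targeted Attacks
^^^^^^^^^^^^^^^^^^^^

A non-targeted attack perturbs the original sample so that the model outputs any class other than the one it originally predicted.

Granularity of Text Attacks
---------------------------

Character-level Attacks
~~~~~~~~~~~~~~~~~~~~~~~~

Character-level (letter-level) attacks include swapping adjacent characters, inserting, deleting or substituting characters, and repeating certain characters.

Word-level Attacks
~~~~~~~~~~~~~~~~~~~~~~~~

Word-level attacks include synonym substitution, word insertion and word deletion.

Sentence-level Attacks
~~~~~~~~~~~~~~~~~~~~~~~~

Sentence-level attacks include adding sentences, reordering sentences and paraphrasing.

Constraints on Text Attacks
---------------------------

Because of the particular nature of text, an adversarial example generated to attack a model must do more than keep the perturbation small. The small-perturbation condition states that the vectorized original input :math:`x` and the adversarial example :math:`x'` satisfy

.. math::

    \lVert x - x' \rVert \leqslant \epsilon

where :math:`\epsilon` is the perturbation size. The adversarial example must also satisfy additional spelling, grammatical and semantic conditions, for example (to be refined):

1. Word modification rate
2. Word embedding distance
3. Part-of-speech consistency
4. Edit (Levenshtein) distance
5. Number of grammatical errors (typically detected and counted with a grammar checker)
6. Semantic similarity (typically measured by the cosine distance between the vectors produced by a sentence-encoding model such as the Universal Sentence Encoder)

Evaluating Text Attacks
-----------------------

Besides the **attack success rate** and the **attack efficiency** (the average number of attempts per attack and the time spent), most of the items listed under the constraints on text attacks can also serve as measures of how effective a text attack is.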
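
Several of the quantities named on this page can be computed in a few lines. The following is a minimal sketch in plain Python, not the project's own implementation. The function names are hypothetical. Two definitions are assumptions made here for illustration: the word modification rate is taken as the token-level edit distance divided by the number of original tokens, and the non-targeted attack success rate counts every sample whose predicted class changed.

```python
import math
from typing import Hashable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """CosSim(A, B) for two non-zero vectors of the same dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def edit_distance(s: Sequence[Hashable], t: Sequence[Hashable]) -> int:
    """Levenshtein distance between two sequences (characters or tokens)."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, x in enumerate(s, start=1):
        curr = [i]
        for j, y in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute x by y (free if equal)
            ))
        prev = curr
    return prev[-1]


def word_modification_rate(original: Sequence[str], adversarial: Sequence[str]) -> float:
    """Assumed definition: token-level edit distance / number of original tokens."""
    if not original:
        raise ValueError("original must contain at least one token")
    return edit_distance(original, adversarial) / len(original)


def attack_success_rate(original_preds: Sequence[int], adversarial_preds: Sequence[int]) -> float:
    """Assumed non-targeted definition: fraction of samples whose predicted class changed."""
    if not original_preds or len(original_preds) != len(adversarial_preds):
        raise ValueError("prediction lists must be non-empty and of equal length")
    changed = sum(o != a for o, a in zip(original_preds, adversarial_preds))
    return changed / len(original_preds)


# Usage: one word of the segmented example sentence is replaced by a synonym.
original = ["我", "爱", "北京", "天安门"]
adversarial = ["我", "喜欢", "北京", "天安门"]
print(word_modification_rate(original, adversarial))   # 0.25
print(edit_distance("kitten", "sitting"))              # 3
print(1 - cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # cosine distance: 1.0
```

In practice the attack success rate is often computed only over samples the model classified correctly before the attack. The sketch leaves that filtering to the caller.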