<p>Hua-Ping Zhang, Qun Liu, Xue-Qi Cheng, Hao Zhang, Hong-Kui Yu. "Chinese Lexical Analysis Using Hierarchical Hidden Markov Model." Second SIGHAN Workshop, affiliated with the 41st ACL, Sapporo, Japan, July 2003, pp. 63-70.</p>
<p><a href="http://www.nlpir.org/wordpress/attachments/2011/04/Chinese Lexical Analysis Using Hierarchical Hidden Markov Model.pdf" target="_blank">Chinese Lexical Analysis Using Hierarchical Hidden Markov Model.pdf (213 KB)</a></p>
<p>This paper presents a unified approach to Chinese lexical analysis using a hierarchical hidden Markov model (HHMM), which incorporates Chinese word segmentation, part-of-speech tagging, disambiguation, and unknown-word recognition into a single theoretical framework. A class-based HMM is applied to word segmentation, and at this level unknown words are treated in the same way as common words listed in the lexicon. Unknown words are recognized reliably by a role-based HMM. For disambiguation, the authors propose an N-shortest-paths strategy that reserves the top N segmentation results as candidates at an early stage, so that more of the ambiguity is covered. Various experiments show that each level of the HHMM contributes to lexical analysis. An HHMM-based system, ICTCLAS, was implemented. The most recent official evaluation indicates that ICTCLAS is one of the best Chinese lexical analyzers. In short, the HHMM is effective for Chinese lexical analysis.</p>
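<p>The N-shortest-paths idea can be sketched as follows. This is a toy illustration, not the paper's implementation: the sentence, lexicon, and unit edge costs below are invented for the example, whereas the paper derives edge weights from class-based HMM probabilities. The sentence is treated as a graph over character boundaries, and instead of committing to a single cheapest segmentation, the top N candidates are kept at every boundary.</p>

```python
import heapq

def n_shortest_segmentations(sentence, lexicon, n=3):
    """Return up to n lowest-cost segmentations of `sentence`.

    The sentence is viewed as a DAG over character boundaries: an edge
    i -> j exists when sentence[i:j] is a lexicon word (single characters
    are always allowed as a fallback). Every edge costs 1, so cheaper
    paths use fewer words -- a toy stand-in for the HMM-derived edge
    weights used in the paper.
    """
    m = len(sentence)
    # paths[i]: candidate (cost, word_list) tuples reaching boundary i
    paths = [[] for _ in range(m + 1)]
    paths[0] = [(0, [])]
    for i in range(m):
        # all edges into boundary i have been added; keep only the n best
        paths[i] = heapq.nsmallest(n, paths[i])
        for j in range(i + 1, m + 1):
            word = sentence[i:j]
            if j == i + 1 or word in lexicon:
                for cost, words in paths[i]:
                    paths[j].append((cost + 1, words + [word]))
    return heapq.nsmallest(n, paths[m])

# A classic ambiguous example: 结合成分子时 can be read as
# 结合/成分/... or 结合成/分子时, among others.
lexicon = {"结合", "结合成", "合成", "成分", "分子", "分子时"}
for cost, words in n_shortest_segmentations("结合成分子时", lexicon, n=3):
    print(cost, "/".join(words))
```

<p>Reserving several candidates rather than one is what lets later stages (POS tagging, unknown-word recognition) rescue a segmentation that a greedy single-best pass would have discarded.</p>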