{"id":6653,"date":"2019-01-15T21:06:43","date_gmt":"2019-01-15T13:06:43","guid":{"rendered":"http:\/\/www.nlpir.org\/wordpress\/?p=6653"},"modified":"2019-03-03T21:06:43","modified_gmt":"2019-03-03T13:06:43","slug":"research-and-implementation-of-chinese-text-automatic-proofreading-system","status":"publish","type":"post","link":"http:\/\/www.nlpir.org\/wordpress\/2019\/01\/15\/research-and-implementation-of-chinese-text-automatic-proofreading-system\/","title":{"rendered":"Research and Implementation of Chinese Text Automatic Proofreading System"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\" style=\"text-align:center\" id=\"mce_0\">NLPIR SEMINAR Y2019#3<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"mce_1\"><strong>INTRO<\/strong><\/h3>\n\n\n\n<p> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; In the new semester, our lab, the Web Search Mining and Security Lab, plans to hold an academic seminar every Wednesday, and each time a keynote speaker will share their understanding of papers published in recent years. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"mce_3\"><strong>Arrangement<\/strong><\/h3>\n\n\n\n<p> This week&#8217;s seminar is organized as follows:<br>1. The seminar will be held at <strong>1 p.m., Wed.<\/strong>, at Zhongguancun Technology Park, Building 5, 1306. <br>2. The lecturer is <strong>Jinjing Wan<\/strong>, and the paper&#8217;s title is <strong>Research and Implementation of Chinese Text Automatic Proofreading System<\/strong>. <br>3. The seminar will be hosted by WangGang. <br>4. The paper for this seminar is attached; please download it in advance.  <br><br><\/p>\n\n\n\n<p> Anyone interested in this topic is welcome to join us. 
The following is the abstract for this week\u2019s paper.<\/p>\n\n\n\n<p>\n\t<div style=\"border:dotted windowtext 1.0pt;padding:1.0pt 4.0pt 1.0pt 4.0pt;\">\n\t\t<p class=\"MsoNormal\" align=\"center\" style=\"text-align:center;\">\n\t\t\tResearch and Implementation of Chinese Text Automatic Proofreading System\n\t\t<\/p>\n\t\t<p class=\"MsoNormal\" align=\"center\" style=\"text-align:center;\">\n\t\t\tYonggang Gong, Junying Fu, Xiaoqin Lian and Yuying Li\n\t\t<\/p>\n\t\t<p class=\"MsoNormal\" align=\"center\" style=\"text-align:center;\">\n\t\t\tAbstract\n\t\t<\/p>\n\t\t<p class=\"MsoNormal\">\n\t\t\t<span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; News media platforms publish a huge volume of original news releases every day, so manually reviewing text for typos is impractical. This paper designs and implements an automatic Chinese text proofreading system for large-scale text content and high-speed processing. The content to be proofread is first analyzed and classified into typos and sensitive information. First, the system uses the n-gram model to statistically analyze the segmented corpus, forming a 2-gram model library and a contextual context library; second, it builds a typo confusion set and then calculates the probability of the target word in the knowledge base to achieve automatic error detection and correction of Chinese text. The system has been successfully applied to proofreading the content of many government news media platforms, and each server can handle one million articles per day. The results show that the recall rate on articles is 78.9% and the accuracy rate is 85.1%. 
It meets the demand for high-speed and accurate processing of errors in massive volumes of text, and has important practical significance and fields of application.<\/span>\n\t\t<\/p>\n\t<\/div>\n<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"814\" height=\"673\" src=\"http:\/\/www.nlpir.org\/wordpress\/wp-content\/uploads\/2019\/01\/2019115-\u4e0b\u5348-092235.jpg\" alt=\"\" class=\"wp-image-6661\" srcset=\"http:\/\/www.nlpir.org\/wordpress\/wp-content\/uploads\/2019\/01\/2019115-\u4e0b\u5348-092235.jpg 814w, http:\/\/www.nlpir.org\/wordpress\/wp-content\/uploads\/2019\/01\/2019115-\u4e0b\u5348-092235-300x248.jpg 300w, http:\/\/www.nlpir.org\/wordpress\/wp-content\/uploads\/2019\/01\/2019115-\u4e0b\u5348-092235-768x635.jpg 768w, http:\/\/www.nlpir.org\/wordpress\/wp-content\/uploads\/2019\/01\/2019115-\u4e0b\u5348-092235-80x66.jpg 80w\" sizes=\"(max-width: 814px) 100vw, 814px\" \/><\/figure><\/div>\n\n\n\n<div class=\"wp-block-file aligncenter\"><a href=\"http:\/\/www.nlpir.org\/wordpress\/wp-content\/uploads\/2019\/01\/Research-and-Implementation-of-Chinese-Text-Automatic-Proofreading-System.pdf\">Research and Implementation of Chinese Text Automatic Proofreading System<\/a><a href=\"http:\/\/www.nlpir.org\/wordpress\/wp-content\/uploads\/2019\/01\/Research-and-Implementation-of-Chinese-Text-Automatic-Proofreading-System.pdf\" class=\"wp-block-file__button\" download>Download<\/a><\/div>\n\n\n\n<!--nextpage-->\n\n\n\n<h2 class=\"wp-block-heading\" style=\"text-align:center\"><strong>NLPIR SEMINAR 16th ISSUE COMPLETED<\/strong><\/h2>\n\n\n\n<p>        On January 16, <strong>Jinjing Wan<\/strong> gave a presentation on the paper <strong>Research and Implementation of Chinese Text Automatic Proofreading System<\/strong> and shared some thoughts on it.<\/p>\n\n\n\n<p>          After the presentation, several questions were asked. 
The Q&amp;As are listed as follows:<\/p>\n\n\n\n<p>Q: How is an absolute error determined?<br>A: It is determined from the segmentation result.<\/p>\n\n\n\n<p>Q: What is the difference between 2-gram and Contextual Context?<br>A: The two-word tuples of the contextual context are constructed in much the same way as those of the 2-gram. The difference is that the contextual context statistic is the probability that a word co-occurs with its left and right neighboring words.<\/p>\n\n\n\n<p>Q: What does \u2018sites of various new media\u2019 mean?<br>A: Websites such as Weibo.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>NLPIR SEMINAR Y2019#3 INTRO &hellip; <a href=\"http:\/\/www.nlpir.org\/wordpress\/2019\/01\/15\/research-and-implementation-of-chinese-text-automatic-proofreading-system\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":862,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[37,38],"tags":[],"_links":{"self":[{"href":"http:\/\/www.nlpir.org\/wordpress\/wp-json\/wp\/v2\/posts\/6653"}],"collection":[{"href":"http:\/\/www.nlpir.org\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.nlpir.org\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.nlpir.org\/wordpress\/wp-json\/wp\/v2\/users\/862"}],"replies":[{"embeddable":true,"href":"http:\/\/www.nlpir.org\/wordpress\/wp-json\/wp\/v2\/comments?post=6653"}],"version-history":[{"count":3,"href":"http:\/\/www.nlpir.org\/wordpress\/wp-json\/wp\/v2\/posts\/6653\/revisions"}],"predecessor-version":[{"id":6715,"href":"http:\/\/www.nlpir.org\/wordpress\/wp-json\/wp\/v2\/posts\/6653\/revisions\/6715"}],"wp:attachment":[{"href":"http:\/\/www.nlpir.org\/wordpress\/wp-json\/wp\/v2\/media?parent=6653"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.nlpir.org\/wordpress\/w
p-json\/wp\/v2\/categories?post=6653"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.nlpir.org\/wordpress\/wp-json\/wp\/v2\/tags?post=6653"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}