In the new semester, our lab, the Web Search Mining and Security Lab, plans to hold an academic seminar every Wednesday. Each time, a keynote speaker will share their understanding of papers published in recent years.
This week’s seminar is organized as follows:
1. The seminar will be held at 1 p.m. on Wednesday at Zhongguancun Technology Park, Building 5, 1306.
2. The lecturer is Jinjing Wan; the paper’s title is "Research and Implementation of Chinese Text Automatic Proofreading System".
3. The seminar will be hosted by WangGang.
4. The paper for this seminar is attached; please download it in advance.
Anyone interested in this topic is welcome to join us. The following is the abstract of this week’s paper.
Research and Implementation of Chinese Text Automatic Proofreading
Junying Fu, Xiaoqin Lian and Yuying Li
News media platforms publish a huge volume of original news releases every day,
so manually reviewing the text for typos is impractical. This paper designs and
implements an automatic Chinese text proofreading system for large-scale text
content and high-speed processing. The content to be proofread is first
analyzed and classified into two types: typos and sensitive information. The
system first uses an n-gram model to statistically analyze the segmented corpus,
forming a 2-gram model library and a context library. It then builds a typo
confusion set and calculates the probability of the target word in the
knowledge base to automatically detect and correct errors in Chinese text. The
system has been successfully applied to checking the content of many government
news media platforms for errors, and each server can handle one million
articles per day. The results show a recall rate of 78.9% and an accuracy rate
of 85.1%. The system meets the demand for high-speed, accurate processing of
errors in massive volumes of text, and has important practical significance and
application value.
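For readers new to the approach, the pipeline described in the abstract (2-gram statistics over a segmented corpus, plus a typo confusion set used to pick the most probable word in context) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy corpus, the confusion set, the add-one smoothing, and the function names `train`, `bigram_prob`, and `correct` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy corpus, already segmented into words.
# A real system would segment a large news corpus first.
CORPUS = [
    ["我们", "在", "公园", "散步"],
    ["他们", "在", "公园", "跑步"],
    ["我们", "再", "见"],
]

# Hypothetical confusion set: words that are easily mistaken for each other.
CONFUSION = {"在": ["再"], "再": ["在"]}


def train(corpus):
    """Count unigrams and bigrams, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """P(word | prev) with add-one smoothing (an assumption of this sketch)."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def correct(sentence, unigrams, bigrams):
    """Greedy left-to-right correction using the confusion set.

    Each word is compared with its confusable alternatives, and the
    candidate that best fits its left and right neighbours is kept.
    On a tie, the original word wins because it is listed first.
    """
    tokens = ["<s>"] + list(sentence) + ["</s>"]
    for i in range(1, len(tokens) - 1):
        prev, nxt = tokens[i - 1], tokens[i + 1]
        candidates = [tokens[i]] + CONFUSION.get(tokens[i], [])
        tokens[i] = max(
            candidates,
            key=lambda w: bigram_prob(prev, w, unigrams, bigrams)
            * bigram_prob(w, nxt, unigrams, bigrams),
        )
    return tokens[1:-1]


if __name__ == "__main__":
    unigrams, bigrams = train(CORPUS)
    print(correct(["我们", "再", "公园", "散步"], unigrams, bigrams))
```

In this toy setting the misused "再" is replaced by "在" because "在 公园" is far more frequent in the corpus than "再 公园", while a correct sentence such as "我们 再 见" is left unchanged. A production system would need a much larger corpus, a proper word segmenter, and a threshold to avoid over-correcting.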