The Web as a Parallel Corpus

Posted May 8, 2011, 22:23

Philip Resnik (University of Maryland)
Noah A. Smith (Johns Hopkins University)
Parallel corpora have become an essential resource for work in multilingual natural language processing. In this article, we report on our work using the STRAND system for mining parallel text on the World Wide Web. We first review the original algorithm and results and then present a set of significant enhancements. These enhancements include the use of supervised learning based on structural features of documents to improve classification performance, a new content-based measure of translational equivalence, and adaptation of the system to take advantage of the Internet Archive for mining parallel text from the Web on a large scale. Finally, the value of these techniques is demonstrated in the construction of a significant parallel corpus for a low-density language pair.
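The abstract names STRAND's structural features but does not define them. The sketch below illustrates the general idea only. It linearizes two HTML pages into markup tokens and text-chunk tokens, aligns the two token sequences with Python's `difflib.SequenceMatcher`, and derives three hypothetical features: `dp` (the share of tokens left unaligned), `n` (the number of aligned text chunks), and `r` (the Pearson correlation of the lengths of aligned chunks). Several choices here are assumptions made for illustration and do not reflect the system's actual implementation: the token scheme, the use of `difflib` as the aligner, and the exact feature formulas.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Illustrative sketch only: the token scheme, aligner, and feature formulas
# are assumptions, not STRAND's actual code.


class Linearizer(HTMLParser):
    """Flatten HTML into ("START", tag), ("END", tag), ("CHUNK", length) tokens."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("START", tag))

    def handle_endtag(self, tag):
        self.tokens.append(("END", tag))

    def handle_data(self, data):
        text = data.strip()
        if text:  # ignore whitespace-only runs between tags
            self.tokens.append(("CHUNK", len(text)))


def linearize(html):
    parser = Linearizer()
    parser.feed(html)
    parser.close()
    return parser.tokens


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when it is undefined."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def structural_features(html_a, html_b):
    a, b = linearize(html_a), linearize(html_b)
    # Align on token type only, so any text chunk may match any other chunk
    # regardless of its length.
    key_a = [t if t[0] != "CHUNK" else ("CHUNK",) for t in a]
    key_b = [t if t[0] != "CHUNK" else ("CHUNK",) for t in b]
    matcher = SequenceMatcher(None, key_a, key_b, autojunk=False)
    aligned = 0
    len_a, len_b = [], []
    for block in matcher.get_matching_blocks():  # last block has size 0
        for k in range(block.size):
            aligned += 1
            ta, tb = a[block.a + k], b[block.b + k]
            if ta[0] == "CHUNK":
                len_a.append(ta[1])
                len_b.append(tb[1])
    total = len(a) + len(b)
    # dp: fraction of all tokens (both pages) that found no alignment partner
    dp = (total - 2 * aligned) / total if total else 0.0
    return {"dp": dp, "n": len(len_a), "r": pearson(len_a, len_b)}


en = "<html><body><h1>Welcome</h1><p>This is our company home page.</p></body></html>"
fr = "<html><body><h1>Bienvenue</h1><p>Ceci est la page d'accueil de notre entreprise.</p></body></html>"
print(structural_features(en, fr))
```

Two pages with identical markup produce `dp == 0.0`, and their aligned chunk lengths rise and fall together. A page with unrelated layout leaves many tokens unaligned. In a supervised setup like the one the abstract describes, feature vectors of this kind would be computed for labeled candidate page pairs and used to train a classifier. Which classifier to use is left open here.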

Philip Resnik, Noah A. Smith. 2007. Computational Linguistics.

PDF: The Web as a Parallel Corpus.pdf (430 KB)

Tags: Corpus, Web