The Web as a Parallel Corpus

Posted: May 8, 2011, 22:23

Philip Resnik (University of Maryland)
Noah A. Smith (Johns Hopkins University)
Parallel corpora have become an essential resource for work in multilingual natural language
processing. In this article, we report on our work using the STRAND system for mining parallel
text on the World Wide Web, first reviewing the original algorithm and results and then presenting
a set of significant enhancements. These enhancements include the use of supervised learning
based on structural features of documents to improve classification performance, a new content-based
measure of translational equivalence, and adaptation of the system to take advantage of the
Internet Archive for mining parallel text from the Web on a large scale. Finally, the value of these
techniques is demonstrated in the construction of a significant parallel corpus for a low-density
language pair.
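
To make the idea of "structural features of documents" concrete, here is a minimal sketch of how two candidate pages might be compared by markup structure alone. The abstract does not spell out the features, so everything below is an illustrative assumption: the token scheme, the names `linearize` and `structural_features`, and the use of Python's `difflib` for alignment are hypothetical choices, not the STRAND implementation. The sketch flattens each page into a sequence of tag and text-chunk tokens, aligns the two sequences, and reports the fraction of unaligned tokens together with the lengths of aligned text chunks.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class Linearizer(HTMLParser):
    """Flatten an HTML page into a sequence of structural tokens."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(f"START:{tag}")

    def handle_endtag(self, tag):
        self.tokens.append(f"END:{tag}")

    def handle_data(self, data):
        text = data.strip()
        if text:
            # Keep only the length of each text chunk, not its content.
            self.tokens.append(("CHUNK", len(text)))


def linearize(html):
    """Return the token sequence for one page (hypothetical helper)."""
    parser = Linearizer()
    parser.feed(html)
    parser.close()
    return parser.tokens


def structural_features(html_a, html_b):
    """Return (fraction of unaligned tokens, aligned chunk-length pairs)."""
    a, b = linearize(html_a), linearize(html_b)
    # Markup tokens must match exactly; any two text chunks may align.
    keys_a = [t if isinstance(t, str) else "CHUNK" for t in a]
    keys_b = [t if isinstance(t, str) else "CHUNK" for t in b]
    matcher = SequenceMatcher(None, keys_a, keys_b, autojunk=False)
    blocks = matcher.get_matching_blocks()
    matched = sum(block.size for block in blocks)
    total = len(keys_a) + len(keys_b)
    diff_fraction = 1 - 2 * matched / total if total else 0.0
    length_pairs = []
    for block in blocks:
        for k in range(block.size):
            tok_a, tok_b = a[block.a + k], b[block.b + k]
            if not isinstance(tok_a, str):
                length_pairs.append((tok_a[1], tok_b[1]))
    return diff_fraction, length_pairs


if __name__ == "__main__":
    english = "<html><body><h1>Hello</h1><p>Good morning everyone</p></body></html>"
    french = "<html><body><h1>Bonjour</h1><p>Bonjour à tous</p></body></html>"
    print(structural_features(english, french))
```

A low unaligned fraction, combined with chunk lengths that rise and fall together, would suggest that one page is a translation of the other. Values like these could serve as inputs to a supervised classifier of the kind the abstract mentions, though the actual feature set and learner are described in the paper itself rather than here.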

Philip Resnik, Noah A. Smith. 2007. Computational Linguistics.

Attachment: The Web as a Parallel Corpus.pdf (430 KB)

Tags: Corpus, Web