Structured POI data Extraction from Internet News

Time: April 17, 2011, 20:03
Hua-Ping Zhang, Qian Mo, He-Yang Huang. Structured POI data Extraction from Internet News. In Proceedings of the 4th International Universal Communication Symposium (IUCS 2010), Beijing, China, October 2010, pp. 115-120 (invited talk).

Abstract: POI (Point of Interest) data is a key resource for GPS applications. Manual POI collection is expensive and time-consuming. This paper presents a novel approach that automatically extracts structured POI data from Internet news articles. The procedure consists of filtering out noisy news documents using POI linguistic features, performing lexical analysis on the remaining texts with ICTCLAS2010, identifying time expressions and the full names of POI locations and organizations, extracting the relationships between entities, and producing structured data for a given POI event based on an extraction model. The POI extraction model is computed from term frequency and word distance, without any syntactic analysis, scenario templates, or relationship induction. Consistency and validity checks are employed to optimize the results. In an open test on 1,000 news articles, precision is 97.30% and recall is 75.48%. The approach has been applied in industrial POI collection. POI-oriented event extraction is effective.
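
The abstract says the extraction model is computed from term frequency and word distance but does not give the formula. The following Python sketch is only an illustration under stated assumptions: the scoring rule `frequency / (1 + distance)`, the function name `score_candidates`, and the toy sentence are all hypothetical and are not taken from the paper. It ranks candidate entities by how often they occur and how close their nearest mention is to an event trigger word.

```python
from collections import Counter


def score_candidates(tokens, trigger_index, candidates):
    """Rank candidate entities for one event trigger.

    Assumed (hypothetical) rule: score = term frequency / (1 + distance),
    where distance is the token gap between the trigger and the nearest
    mention of the entity. This is not the paper's actual model.
    """
    # Term frequency of each candidate entity in the token sequence.
    tf = Counter(t for t in tokens if t in candidates)
    scores = {}
    for entity, freq in tf.items():
        positions = [i for i, t in enumerate(tokens) if t == entity]
        # Word distance from the trigger to the closest mention.
        min_dist = min(abs(i - trigger_index) for i in positions)
        scores[entity] = freq / (1 + min_dist)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy, already-segmented sentence (a real pipeline would use a segmenter).
tokens = ["Starbucks", "opened", "a", "store", "in", "Beijing",
          "yesterday", "and", "Starbucks", "plans", "more"]
ranked = score_candidates(tokens, trigger_index=1,
                          candidates={"Starbucks", "Beijing"})
print(ranked)
```

In this toy input, "Starbucks" occurs twice and sits next to the trigger "opened", so it outranks "Beijing", which occurs once and is four tokens away. No syntactic parsing is involved, which matches the spirit of the abstract, though the actual weighting used by the authors may differ.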

Keywords: information extraction; extraction model; relation extraction; POI; ICTCLAS2010

Paper: POI Extraction.pdf (146 KB)

Research slides: Structured POI data Extraction from Internet News.ppt (1.43 MB)

TAG: Internet