Structured POI data Extraction from Internet News

Posted: April 17, 2011, 20:03
Hua-Ping Zhang, Qian Mo, He-Yang Huang. Structured POI data Extraction from Internet News. In Proceedings of the 4th International Universal Communication Symposium (IUCS 2010), Beijing, China, October 2010, pp. 115-120 (invited talk).

Abstract: POI (Point of Interest) data is a key resource for GPS applications. Manual POI collection is expensive and time-consuming. This paper presents a novel approach that automatically extracts structured POI data from Internet news articles. The procedure filters out noisy news documents using POI linguistic features, performs lexical analysis on the remaining texts with ICTCLAS2010, identifies time expressions and the full names of POI locations and organizations, extracts the relationships between entities, and produces structured data for a given POI event based on an extraction model. The POI extraction model is computed from term frequency and word distance, without any syntactic analysis, scenario templates, or relationship induction. Consistency and validity checks are employed to optimize the results. In an open test on 1,000 news articles, precision is 97.30% and recall is 75.48%. The approach has been applied in industrial POI collection. POI-oriented event extraction is effective.

Keywords: information extraction; extraction model; relation extraction; POI; ICTCLAS2010
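
The abstract says the extraction model is computed from term frequency and word distance, but it does not give the formula. As an illustration only, the following minimal sketch ranks candidate entity mentions around an event trigger word under one assumed scoring rule: the mention's term frequency divided by one plus its token distance to the trigger. The function name `rank_candidates`, the scoring rule, and the sample data are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def rank_candidates(trigger_index, candidates):
    """Rank candidate entities for one POI event trigger.

    trigger_index: token position of the event trigger word.
    candidates: list of (entity_text, token_position) mentions.

    Assumed score (not the paper's formula):
        term_frequency(entity) / (1 + |position - trigger_index|)
    An entity mentioned several times keeps its best-scoring mention.
    """
    # Term frequency: how often each entity is mentioned in the article.
    term_frequency = Counter(text for text, _ in candidates)

    best_score = {}
    for text, position in candidates:
        # Word distance: number of tokens between mention and trigger.
        distance = abs(position - trigger_index)
        score = term_frequency[text] / (1 + distance)
        best_score[text] = max(best_score.get(text, 0.0), score)

    # Highest score first.
    return sorted(best_score.items(), key=lambda item: item[1], reverse=True)


# Hypothetical example: the trigger word (e.g. "opened") is at token 4.
mentions = [("Sunrise Mall", 2), ("City Bank", 9), ("Sunrise Mall", 12)]
ranked = rank_candidates(4, mentions)
for entity, score in ranked:
    print(f"{entity}: {score:.3f}")
```

In this toy input, the entity that is both frequent and close to the trigger ranks first. A scheme of this kind needs only token positions and counts, which is in line with the abstract's point that no syntactic analysis or scenario template is required.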

Paper: POI Extraction.pdf (146 KB)

Slides: Structured POI data Extraction from Internet News.ppt (1.43 MB)

TAG: Internet