{"id":76,"date":"2017-08-05T00:00:00","date_gmt":"2017-08-04T16:00:00","guid":{"rendered":""},"modified":"2018-12-14T11:00:15","modified_gmt":"2018-12-14T03:00:15","slug":"structured-poi-data-extraction-from-internet-news","status":"publish","type":"post","link":"http:\/\/www.nlpir.org\/wordpress\/2017\/08\/05\/structured-poi-data-extraction-from-internet-news\/","title":{"rendered":"Structured POI data Extraction from Internet News"},"content":{"rendered":"

Hua-Ping Zhang, Qian Mo, He-Yang Huang. Structured POI data Extraction from Internet News. In Proceedings of the 4th International Universal Communication Symposium (IUCS 2010), Beijing, China, October 2010, pp. 115-120 (invited talk).
\n
Abstract: POI (Point of Interest) data is a key resource for GPS applications. Manual POI collection is expensive and time-consuming. This paper presents a novel approach that automatically extracts structured POI data from Internet news articles. The procedure consists of filtering out noisy news documents using POI linguistic features, performing lexical analysis on the remaining texts with ICTCLAS2010, identifying time expressions and the full names of POI locations and organizations, extracting the relationships between entities, and producing structured data for a given POI event based on an extraction model. The POI extraction model is computed from term frequency and word distance, without any syntactic analysis, scenario templates, or relationship induction. Consistency and validity checks are employed to optimize the results. In an open test on 1,000 news articles, the approach achieves a precision of 97.30% and a recall of 75.48%. The approach has been applied in industrial POI collection. POI-oriented event extraction proves effective.
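This is a minimal sketch of how a candidate entity might be scored from term frequency and word distance alone. It assumes the text has already been segmented and POS-tagged, for example by ICTCLAS2010. The formula tf / (1 + distance) and the function names score_candidates and best_candidate are hypothetical illustrations. The tag names nt (organization) and ns (place) are also assumptions. None of these reproduce the model published in the paper.

```python
from collections import Counter

# Hypothetical sketch: score = tf / (1 + distance) is an illustrative
# assumption, not the extraction model published in the paper.


def score_candidates(tokens, tags, anchor_index, candidate_tags):
    # Document-level term frequency of every token.
    tf = Counter(tokens)
    scores = {}
    for i, (tok, tag) in enumerate(zip(tokens, tags)):
        if tag not in candidate_tags:
            continue
        # Word distance between this occurrence and the anchor (e.g. an event verb).
        distance = abs(i - anchor_index)
        score = tf[tok] / (1 + distance)
        # A candidate that occurs several times keeps its best-scoring occurrence.
        scores[tok] = max(scores.get(tok, 0.0), score)
    return scores


def best_candidate(tokens, tags, anchor_index, candidate_tags):
    scores = score_candidates(tokens, tags, anchor_index, candidate_tags)
    return max(scores, key=scores.get) if scores else None


# Toy segmented and tagged sentence; nt = organization, ns = place (assumed tags).
tokens = ['AcmeBank', 'will', 'open', 'a', 'branch', 'in', 'CityMall',
          'on', 'Zhongguancun', 'AcmeBank', 'said']
tags = ['nt', 'd', 'v', 'm', 'n', 'p', 'nt', 'p', 'ns', 'nt', 'v']
anchor = tokens.index('open')

print(best_candidate(tokens, tags, anchor, {'nt'}))  # AcmeBank
print(best_candidate(tokens, tags, anchor, {'ns'}))  # Zhongguancun
```

In this toy sentence the organization AcmeBank occurs twice and sits two words from the trigger open. It therefore outscores CityMall, which occurs once and is four words away. No parser or template is involved, which matches the abstract's point that the model relies only on frequency and distance.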
\n
Keywords: information extraction; extraction model; relation extraction; POI; ICTCLAS2010
\n
Paper: POI Extraction.pdf (146 KB)
\n
Slides: Structured POI data Extraction from Internet News.ppt (1.43 MB)\n","protected":false},"excerpt":{"rendered":"
Hua-Ping Zhang, Qian Mo, He-Yang Huang, St …\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[38],"tags":[],"_links":{"self":[{"href":"http:\/\/www.nlpir.org\/wordpress\/wp-json\/wp\/v2\/posts\/76"}],"collection":[{"href":"http:\/\/www.nlpir.org\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.nlpir.org\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.nlpir.org\/wordpress\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.nlpir.org\/wordpress\/wp-json\/wp\/v2\/comments?post=76"}],"version-history":[{"count":1,"href":"http:\/\/www.nlpir.org\/wordpress\/wp-json\/wp\/v2\/posts\/76\/revisions"}],"predecessor-version":[{"id":1508,"href":"http:\/\/www.nlpir.org\/wordpress\/wp-json\/wp\/v2\/posts\/76\/revisions\/1508"}],"wp:attachment":[{"href":"http:\/\/www.nlpir.org\/wordpress\/wp-json\/wp\/v2\/media?parent=76"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.nlpir.org\/wordpress\/wp-json\/wp\/v2\/categories?post=76"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.nlpir.org\/wordpress\/wp-json\/wp\/v2\/tags?post=76"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}