{"id":76,"date":"2017-08-05T00:00:00","date_gmt":"2017-08-04T16:00:00","guid":{"rendered":""},"modified":"2018-12-14T11:00:15","modified_gmt":"2018-12-14T03:00:15","slug":"structured-poi-data-extraction-from-internet-news","status":"publish","type":"post","link":"http:\/\/www.nlpir.org\/wordpress\/2017\/08\/05\/structured-poi-data-extraction-from-internet-news\/","title":{"rendered":"Structured POI data Extraction from Internet News"},"content":{"rendered":"<p><TABLE style=\"WIDTH: 651pt; BORDER-COLLAPSE: collapse\" border=0 cellSpacing=0 cellPadding=0 width=868 x:str><br \/>\n<COLGROUP><br \/>\n<COL style=\"WIDTH: 651pt; mso-width-source: userset; mso-width-alt: 27776\" width=868><br \/>\n<TBODY><br \/>\n<TR style=\"HEIGHT: 35.25pt; mso-height-source: userset\" height=47><br \/>\n<TD style=\"BORDER-BOTTOM: #ffffff; BORDER-LEFT: #ffffff; BACKGROUND-COLOR: transparent; WIDTH: 651pt; HEIGHT: 35.25pt; BORDER-TOP: #ffffff; BORDER-RIGHT: #ffffff\" class=xl24 height=47 width=868><FONT face=\"Times New Roman\">Hua-Ping Zhang, Qian Mo, He-Yang Huang. Structured POI data Extraction from Internet News. In Proceedings of the 4th International Universal Communication Symposium (IUCS 2010), Beijing, China, October 2010<\/FONT><FONT class=font7 face=\u5b8b\u4f53>, <\/FONT><FONT class=font5 face=\"Times New Roman\">pp. 115-120 (<\/FONT><FONT class=font7 face=\u5b8b\u4f53>invited talk<\/FONT><FONT class=font5 face=\"Times New Roman\">)<\/FONT><\/TD><\/TR><\/TBODY><\/TABLE><br \/>\n<P style=\"MARGIN: 0cm 0cm 10pt\" class=Abstract><EM><SPAN class=StyleAbstractItalicChar><SPAN style=\"FONT-FAMILY: 'Times New Roman'; FONT-WEIGHT: normal\" lang=EN-US>Abstract<\/SPAN><\/SPAN><SPAN class=StyleAbstractItalicChar><SPAN style=\"FONT-FAMILY: 'Times New Roman'; FONT-WEIGHT: normal; mso-fareast-language: ZH-CN\" lang=EN-US>:<\/SPAN><\/SPAN><\/EM><FONT face=\"Times New Roman\"><STRONG><SPAN style=\"mso-fareast-language: ZH-CN\" lang=EN-US> <SPAN style=\"mso-spacerun: 
yes\">&nbsp;<\/SPAN><\/SPAN><SPAN lang=EN-US>POI (Point of Interest) data is a key resource for <\/SPAN><SPAN style=\"mso-fareast-language: ZH-CN\" lang=EN-US>GPS <\/SPAN><SPAN lang=EN-US>applications. <\/SPAN><SPAN style=\"mso-fareast-language: ZH-CN\" lang=EN-US>M<\/SPAN><SPAN lang=EN-US>anual POI <\/SPAN><SPAN style=\"mso-fareast-language: ZH-CN\" lang=EN-US>collection<\/SPAN><SPAN lang=EN-US> is expensive and time-consuming. This paper presents a <\/SPAN><SPAN style=\"mso-fareast-language: ZH-CN\" lang=EN-US>novel<\/SPAN><SPAN lang=EN-US> <\/SPAN><SPAN style=\"mso-fareast-language: ZH-CN\" lang=EN-US>approach<\/SPAN><SPAN lang=EN-US> that automatically extracts structured POI data from Internet news articles. The procedure includes filtering out noisy news documents using POI linguistic features, performing lexical analysis <\/SPAN><SPAN style=\"mso-fareast-language: ZH-CN\" lang=EN-US>on the remaining texts using ICTCLAS2010<\/SPAN><SPAN lang=EN-US>, identifying time expressions<\/SPAN><SPAN style=\"mso-fareast-language: ZH-CN\" lang=EN-US> and the full names of POI <\/SPAN><SPAN lang=EN-US>locations and organizations, extracting <\/SPAN><SPAN style=\"mso-fareast-language: ZH-CN\" lang=EN-US>the relationships between entities, and obtaining structured data for a given POI <\/SPAN><SPAN lang=EN-US>event based on <\/SPAN><SPAN style=\"mso-fareast-language: ZH-CN\" lang=EN-US>extraction modeling<\/SPAN><SPAN lang=EN-US>.<\/SPAN><SPAN style=\"mso-fareast-language: ZH-CN\" lang=EN-US> The POI extraction model is computed from term frequency and word distance, without any syntactic analysis, scenario templates, or relationship induction. C<\/SPAN><SPAN lang=EN-US>onsistency<\/SPAN><SPAN style=\"mso-fareast-language: ZH-CN\" lang=EN-US> and validity checks are employed to optimize the <\/SPAN><SPAN lang=EN-US>results. 
In an open test conducted on 1,000 news<\/SPAN><SPAN style=\"mso-fareast-language: ZH-CN\" lang=EN-US> articles<\/SPAN><SPAN lang=EN-US>, the precision is 9<\/SPAN><SPAN style=\"mso-fareast-language: ZH-CN\" lang=EN-US>7<\/SPAN><SPAN lang=EN-US>.<\/SPAN><SPAN style=\"mso-fareast-language: ZH-CN\" lang=EN-US>3<\/SPAN><SPAN lang=EN-US>0% and the recall is 75.48%. The <\/SPAN><SPAN style=\"mso-fareast-language: ZH-CN\" lang=EN-US>approach <\/SPAN><SPAN lang=EN-US>has been applied in industrial POI collection<\/SPAN><SPAN style=\"mso-fareast-language: ZH-CN\" lang=EN-US>. <\/SPAN><SPAN lang=EN-US>POI-oriented event extraction is effective.<\/SPAN><\/STRONG><\/FONT><\/P><br \/>\n<P style=\"MARGIN: 0cm 0cm 6pt\" class=keywords><STRONG><EM><SPAN lang=EN-US><FONT face=\"Times New Roman\">Keywords<\/FONT><\/SPAN><SPAN style=\"FONT-FAMILY: \u5b8b\u4f53; mso-ascii-font-family: 'Times New Roman'; mso-hansi-font-family: 'Times New Roman'\" lang=EN-US>: <\/SPAN><FONT face=\"Times New Roman\"><SPAN lang=EN-US>information extraction; <\/SPAN><SPAN style=\"mso-fareast-language: ZH-CN\" lang=EN-US>extraction model<\/SPAN><SPAN lang=EN-US>; relation extraction; POI; ICTCLAS2010<\/SPAN><\/FONT><\/EM><\/STRONG><\/P><br \/>\n<P style=\"MARGIN: 0cm 0cm 6pt\" class=keywords><STRONG><EM><FONT face=\"Times New Roman\"><SPAN lang=EN-US>Paper: <A href=\"http:\/\/www.nlpir.org\/wordpress\/attachments\/2011\/04\/POI Extraction.pdf\" target=_blank><IMG border=0 src=\"http:\/\/www.nlpir.org\/images\/base\/attachment.gif\"> POI Extraction.pdf (146 KB)<\/A><\/SPAN><\/FONT><\/EM><\/STRONG><\/P><br \/>\n<P style=\"MARGIN: 0cm 0cm 6pt\" class=keywords><STRONG><EM><FONT face=\"Times New Roman\"><SPAN lang=EN-US>Research slides (PPT): <A href=\"http:\/\/www.nlpir.org\/wordpress\/attachments\/2011\/04\/Structured POI data Extraction from Internet News.ppt\" target=_blank><IMG border=0 src=\"http:\/\/www.nlpir.org\/images\/base\/attachment.gif\"> Structured POI data Extraction from 
Internet News.ppt (1.43 MB)<\/A><\/SPAN><\/FONT><\/EM><\/STRONG><\/P><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hua-Ping Zhang, Qian Mo, He-Yang Huang. St &hellip; <a href=\"http:\/\/www.nlpir.org\/wordpress\/2017\/08\/05\/structured-poi-data-extraction-from-internet-news\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[38],"tags":[],"_links":{"self":[{"href":"http:\/\/www.nlpir.org\/wordpress\/wp-json\/wp\/v2\/posts\/76"}],"collection":[{"href":"http:\/\/www.nlpir.org\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.nlpir.org\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.nlpir.org\/wordpress\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.nlpir.org\/wordpress\/wp-json\/wp\/v2\/comments?post=76"}],"version-history":[{"count":1,"href":"http:\/\/www.nlpir.org\/wordpress\/wp-json\/wp\/v2\/posts\/76\/revisions"}],"predecessor-version":[{"id":1508,"href":"http:\/\/www.nlpir.org\/wordpress\/wp-json\/wp\/v2\/posts\/76\/revisions\/1508"}],"wp:attachment":[{"href":"http:\/\/www.nlpir.org\/wordpress\/wp-json\/wp\/v2\/media?parent=76"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.nlpir.org\/wordpress\/wp-json\/wp\/v2\/categories?post=76"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.nlpir.org\/wordpress\/wp-json\/wp\/v2\/tags?post=76"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}