{"id":687,"date":"2011-11-05T18:52:22","date_gmt":"2011-11-05T17:52:22","guid":{"rendered":"http:\/\/visurus.wordpress.com\/?p=687"},"modified":"2013-12-10T11:27:33","modified_gmt":"2013-12-10T11:27:33","slug":"scraping-tabular-data-from-the-web","status":"publish","type":"post","link":"https:\/\/www.ralphstraumann.ch\/blog\/2011\/11\/scraping-tabular-data-from-the-web\/","title":{"rendered":"Scraping tabular data from the web"},"content":{"rendered":"<figure id=\"attachment_689\" aria-describedby=\"caption-attachment-689\" style=\"width: 500px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/www.flickr.com\/photos\/t_buchtele\/3422507814\/\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-689 \" title=\"Needle in haystack\" alt=\"\" src=\"http:\/\/www.ralphstraumann.ch\/blog\/wp-content\/uploads\/2011\/11\/needleinhaystack1.jpg\" width=\"500\" height=\"332\" \/><\/a><figcaption id=\"caption-attachment-689\" class=\"wp-caption-text\">&#8220;Needle in haystack&#8221; CC-BY-NC-ND by t_buchtele<\/figcaption><\/figure>\n<p>I&#8217;ve been looking for a quick and easy way to scrape <a href=\"http:\/\/www.parlamentswahlen-2011.ch\/resultate-a-z.html\">an HTML table<\/a> into a usable format. Of course, there are numerous ways to do that with a small Perl\/PHP\/Python programme, but I found another path especially elegant. It turns out that Google Docs has an <em>importHTML()<\/em> function in Spreadsheets:<\/p>\n<p><em>=importHTML(\"http:\/\/www.parlamentswahlen-2011.ch\/resultate-a-z.html\",\"table\",1)<\/em><\/p>\n<p>scrapes the first (<em>1<\/em>) HTML table element (<em>&#8220;table&#8221;<\/em>) from\u00a0<em>http:\/\/www.parlamentswahlen-2011.ch\/resultate-a-z.html<\/em> into your Google spreadsheet. 
Very nice!<\/p>\n<p>Hat tip to <a href=\"http:\/\/blog.ouseful.info\/2008\/10\/14\/data-scraping-wikipedia-with-google-spreadsheets\">OUseful.Info<\/a> for this trick :)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve been looking for a quick and easy way to scrape an HTML table into a usable format. Of course, there are numerous ways to do that with a small Perl\/PHP\/Python programme, but I found another path especially elegant. It turns out that Google Docs has an importHTML() function in Spreadsheets: =importHTML(\"http:\/\/www.parlamentswahlen-2011.ch\/resultate-a-z.html\",\"table\",1) scrapes the first (1) &hellip; <a href=\"https:\/\/www.ralphstraumann.ch\/blog\/2011\/11\/scraping-tabular-data-from-the-web\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Scraping tabular data from the web<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":689,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[7],"tags":[33,58,63,99],"class_list":["post-687","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-note","tag-data-processing","tag-google-docs","tag-html","tag-scraping"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/www.ralphstraumann.ch\/blog\/wp-content\/uploads\/2011\/11\/needleinhaystack.jpg","jetpack_shortlink":"https:\/\/wp.me\/p3pPwF-b
5","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/posts\/687","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/comments?post=687"}],"version-history":[{"count":2,"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/posts\/687\/revisions"}],"predecessor-version":[{"id":1525,"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/posts\/687\/revisions\/1525"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/media\/689"}],"wp:attachment":[{"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/media?parent=687"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/categories?post=687"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/tags?post=687"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}