{"id":878,"date":"2012-05-04T07:00:00","date_gmt":"2012-05-04T05:00:00","guid":{"rendered":"http:\/\/visurus.wordpress.com\/?p=878"},"modified":"2013-12-10T10:48:26","modified_gmt":"2013-12-10T10:48:26","slug":"how-to-work-with-excel-xls-files-outside-of-excel","status":"publish","type":"post","link":"https:\/\/www.ralphstraumann.ch\/blog\/2012\/05\/how-to-work-with-excel-xls-files-outside-of-excel\/","title":{"rendered":"How to work with Excel XLS files outside of Excel"},"content":{"rendered":"<p>I&#8217;ve previously written briefly about <a href=\"http:\/\/visurus.wordpress.com\/2011\/11\/05\/scraping-tabular-data-from-the-web\/\">scraping tabular data from the web<\/a>. That post pointed to a nice shortcut by Tony Hirst of\u00a0<a href=\"http:\/\/blog.ouseful.info\">OUseful.Info<\/a>\u00a0to import an HTML table into a Google Spreadsheet. In one of his most recent posts, Tony Hirst goes into some detail about using tabular data in Excel&#8217;s XLS format (not the more recent XML-based XLSX format) \u2013 without actually owning a copy of Excel (which applies to me, at least privately). 
The post describes how to use <a href=\"http:\/\/code.google.com\/p\/google-refine\/\">Google Refine<\/a>\u00a0or the <a href=\"https:\/\/secure.simplistix.co.uk\/svn\/xlrd\/trunk\/xlrd\/doc\/xlrd.html\">xlrd Python package<\/a> to digest and manipulate XLS files.<\/p>\n<figure id=\"attachment_879\" aria-describedby=\"caption-attachment-879\" style=\"width: 414px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-879\" title=\"Refining data\" alt=\"\" src=\"http:\/\/www.ralphstraumann.ch\/blog\/wp-content\/uploads\/2012\/05\/refine.jpg\" width=\"414\" height=\"206\" srcset=\"https:\/\/www.ralphstraumann.ch\/blog\/wp-content\/uploads\/2012\/05\/refine.jpg 414w, https:\/\/www.ralphstraumann.ch\/blog\/wp-content\/uploads\/2012\/05\/refine-300x149.jpg 300w\" sizes=\"auto, (max-width: 414px) 100vw, 414px\" \/><figcaption id=\"caption-attachment-879\" class=\"wp-caption-text\">Refining data: From coal to diamond<\/figcaption><\/figure>\n<p>Google Refine is an offline product (so your data does not need to be sent to Google&#8217;s servers) that offers powerful functionality for combing through and correcting data. You can, for example, correct individual typos in category names very quickly. Note, however, that this is a semi-specialist tool; watching the introductory videos on the website is advisable.<\/p>\n<p>xlrd is a Python package that can be used offline like any other Python package. 
As an added value, however, Tony Hirst describes how you can combine xlrd with <a href=\"https:\/\/scraperwiki.com\/\">Scraperwiki<\/a> to deploy a cloud-hosted data scraper for online XLS files.<\/p>\n<p>So, a recommended read if the above sounds interesting to you: <a href=\"http:\/\/blog.ouseful.info\/2012\/04\/30\/working-with-excel-files-without-using-excel\">Working with Excel Files without Using Excel<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve previously written briefly about scraping tabular data from the web. That post pointed to a nice shortcut by Tony Hirst of\u00a0OUseful.Info\u00a0to import an HTML table into a Google Spreadsheet. In one of his most recent posts, Tony Hirst goes into some detail about using tabular data in Excel&#8217;s XLS format (not &hellip; <a href=\"https:\/\/www.ralphstraumann.ch\/blog\/2012\/05\/how-to-work-with-excel-xls-files-outside-of-excel\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">How to work with Excel XLS files outside of Excel<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":879,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"How to work with Excel XLS files outside of Excel: http:\/\/wp.me\/p1qYOj-ea #data 
#tools","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[7],"tags":[47,99,109],"class_list":["post-878","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-note","tag-excel","tag-scraping","tag-tabular-data"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/www.ralphstraumann.ch\/blog\/wp-content\/uploads\/2012\/05\/refine.jpg","jetpack_shortlink":"https:\/\/wp.me\/p3pPwF-ea","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/posts\/878","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/comments?post=878"}],"version-history":[{"count":2,"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/posts\/878\/revisions"}],"predecessor-version":[{"id":1464,"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/posts\/878\/revisions\/1464"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/media\/879"}],"wp:attachment":[{"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/media?parent=878"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/categories?post=878"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/tags?post=878"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{r
el}","templated":true}]}}