{"id":1810,"date":"2014-09-21T15:45:00","date_gmt":"2014-09-21T15:45:00","guid":{"rendered":"http:\/\/www.ralphstraumann.ch\/blog\/?p=1810"},"modified":"2016-06-06T06:41:59","modified_gmt":"2016-06-06T06:41:59","slug":"the-data-workers-manifesto","status":"publish","type":"post","link":"https:\/\/www.ralphstraumann.ch\/blog\/2014\/09\/the-data-workers-manifesto\/","title":{"rendered":"The Data Worker&#8217;s Manifesto"},"content":{"rendered":"<p><em>This article is a re-post of an article that first appeared on <a href=\"http:\/\/geo.ebp.ch\/2014\/09\/19\/the-data-workers-manifesto\/\">www.geo.ebp.ch<\/a>.<\/em><\/p>\n<p><a href=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-2233\" src=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-1-300x215.png\" alt=\"straumann-geobeer8-slide-1\" width=\"400\" height=\"287\" \/><\/a><\/p>\n<p>Last week I gave\u00a0a talk at the 8th instalment of the <a href=\"http:\/\/www.geobeer.ch\">GeoBeer series<\/a>\u00a0on EBP&#8217;s Zurich-Stadelhofen premises and sponsored by <a href=\"http:\/\/www.ebp.ch\/en\">EBP<\/a> and <a href=\"http:\/\/www.crosswind.ch\">Crosswind<\/a>. It was\u00a0titled <strong>State of the Union: Data as Enabling Tech\u203d<\/strong><\/p>\n<p>You can check out the <a href=\"http:\/\/www.ralphstraumann.ch\/downloads\/straumann_geobeer8_2014\">whole slidedeck<\/a> on my <a href=\"http:\/\/www.ralphstraumann.ch\">private website<\/a> (The slides are made with <a href=\"http:\/\/bartaz.github.io\/impress.js\">impress.js<\/a>\u00a0and\u00a0best viewed in Chrome. 
Please ignore my horrible inline CSS.)<\/p>\n<hr \/>\n<p><a href=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-2234\" src=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-2.png\" alt=\"straumann-geobeer8-slide-2\" width=\"400\" height=\"287\" \/><\/a><\/p>\n<p>I&#8217;m quite sure it&#8217;s not best practice to give one&#8217;s\u00a0talk an unintelligible title. Nevertheless, that&#8217;s what I did, so let me explain what the different parts mean:<\/p>\n<p>I chose <strong>&#8220;state of the union&#8221;<\/strong> as a fancy way of expressing that I&#8217;m directing my talk primarily at fellow geoinformation and data people.<\/p>\n<p>With <strong>&#8220;data&#8221;<\/strong> we usually refer to\u00a0raw observations of some phenomenon. We&#8217;ll discuss later how helpful\u00a0that definition turns out to be.<\/p>\n<p><strong>&#8220;Enabling tech&#8221;<\/strong> would usually expand\u00a0to &#8220;technology&#8221; and the term is\u00a0used to denote a technical development that makes novel applications possible in the first place. 
However, in the\u00a0context of this talk it may be worthwhile to keep the 2nd potential meaning of the stub &#8220;tech&#8221; \u2013 &#8220;technique&#8221; \u2013 in mind, as well.<\/p>\n<p>Finally, the\u00a0<strong>\u203d<\/strong> is called an interrobang and nicely reflects\u00a0the semantic ambivalence\u00a0of combining\u00a0<strong>?<\/strong>\u00a0and\u00a0<strong>!<\/strong>\u00a0into one punctuation mark.<\/p>\n<hr \/>\n<p><a href=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-3.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-2235\" src=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-3.png\" alt=\"straumann-geobeer8-slide-3\" width=\"400\" height=\"287\" \/><\/a><\/p>\n<p>Sometime in\u00a0the last decade, we as a society have <strong>moved\u00a0from a situation where data was usually scarce<\/strong>\u00a0<strong>to one\u00a0where (many forms of) data are abundant<\/strong>. Where before,\u00a0the first step of analysis was often one of\u00a0interpolation between valuable data points, we now filter, subsample, and aggregate our data.\u00a0Not all domains are the same in this respect, obviously. But I think the generalisation pretty much holds, as (often ill-applied) labels such as &#8220;big data&#8221; or &#8220;<a href=\"http:\/\/cloudtweaks.com\/2014\/03\/cartoon-humongous-data\/\">humongous data<\/a>&#8221; indicate. 
(Well, the latter is obviously a joke; but think about why it works as such.)<\/p>\n<p>Big drivers of this development are a) the Web and its numerous branches and platforms and b) smartphones, tablets, phablets and what have you, or more broadly speaking:\u00a0embedded sensors, GPS loggers, tracking and fleet management systems, automotive sensors, wearables, &#8216;self-tracking&#8217; or &#8216;quantified-self&#8217; technology, networked hardware such as appliances\u00a0(think\u00a0Internet\u00a0of Things) and the like.<\/p>\n<p>In what follows I&#8217;m going to talk primarily about crowdsourced data. (In other contexts, crowdsourced (geographic) data is also called <a href=\"http:\/\/povesham.files.wordpress.com\/2013\/09\/haklaycrowdsourcinggeographicknowledge.pdf\">e.g.\u00a0<em>Volunteered Geographic Information<\/em><\/a>, VGI, (a term\u00a0fraught with problems), or <em>User-Generated Content<\/em>, UGC.) But some of the assertions also hold for data in general.<\/p>\n<hr \/>\n<p><a href=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-4.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-2213\" src=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-4.png\" alt=\"straumann-geobeer8-slide-4\" width=\"400\" height=\"287\" \/><\/a><\/p>\n<p><a href=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-5.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-2214\" src=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-5.png\" alt=\"straumann-geobeer8-slide-5\" width=\"400\" height=\"287\" \/><\/a><\/p>\n<p><strong><em>Crowdsourced<\/em> data<\/strong>, i.e. 
data that:<\/p>\n<p>\u2013 is gathered from many contributors,<\/p>\n<p>\u2013 in a decentralised fashion,<\/p>\n<p>\u2013 following (at best) informal rules and protocols,<\/p>\n<p>\u2013 voluntarily, unknowingly or with incentives,<\/p>\n<p><strong>has some issues<\/strong>.<\/p>\n<p>The large-scale advent of this crowdsourced data of course coincides with the development of the so-called\u00a0<em><a href=\"http:\/\/oreilly.com\/web2\/archive\/what-is-web-20.html\">Web 2.0<\/a><\/em>\u00a0(in German also referred to as the &#8216;participation Web&#8217;), where anybody could not just be a consumer, but also (<a href=\"http:\/\/geography.oii.ox.ac.uk\">at least, in theory<\/a>) a producer, or: a <em>produser<\/em>. Or so we were told.<\/p>\n<hr \/>\n<p><a href=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-6.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-2215\" src=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-6.png\" alt=\"straumann-geobeer8-slide-6\" width=\"400\" height=\"287\" \/><\/a><\/p>\n<p><strong>But: crowdsourced data is biased<\/strong><\/p>\n<p>This map shows <a href=\"http:\/\/www.openstreetmap.org\/\">OpenStreetMap<\/a>\u00a0(OSM) node density normalised by inhabitants (compiled by my <a href=\"http:\/\/www.oii.ox.ac.uk\">OII<\/a> colleagues <a href=\"http:\/\/www.oii.ox.ac.uk\/people\/?id=321\">Stefano de Sabbata<\/a> and <a href=\"http:\/\/zerogeography.net\">Mark Graham<\/a>).<\/p>\n<p>Assuming (somewhat simplifying)\u00a0that the presence of\u00a0people effects the build-up of infrastructure, in an ideal world this map would feature a uniform\u00a0colour everywhere. However, there are regions where relative data density in OSM\u00a0exceeds that of other regions by 3\u20134 orders of magnitude! 
Compare this to the <a href=\"http:\/\/en.wikipedia.org\/wiki\/File:Geonames4.png\">density of placenames in the GeoNames Gazetteer<\/a>!<\/p>\n<p>Clearly, offering\u00a0an\u00a0&#8220;open platform&#8221; and encouraging\u00a0participation is not enough to really level the playing field in the user generation of content. In some regions people might\u00a0not have the means (spare time, economic freedom, hardware, software, education, technical skills, access to stable (broadband) Internet, motivation) to participate or they might e.g. have reservations\u00a0about this kind of project or the organisations\u00a0behind it.<\/p>\n<p>Spatially heterogeneous density\u00a0is\u00a0just one example of bias we find in crowdsourced data. Another one is termed\u00a0<a href=\"http:\/\/timogrossenbacher.ch\/2014\/04\/truth-and-beauty-in-social-media\/\"><em>user contribution bias<\/em><\/a>, where a very small proportion of contributors (think Twitter users, Flickr photographers, Facebook posters, &#8230;) creates a large proportion of the data. Depending on the platform we see very lopsided distributions with a few percent of users being behind a large share\u00a0of the content. In his Master&#8217;s thesis, <a href=\"http:\/\/timogrossenbacher.ch\">Timo Grossenbacher<\/a> found that in his sample of Twitter, <a href=\"http:\/\/timogrossenbacher.ch\/2014\/04\/truth-and-beauty-in-social-media\/\">7% of the users created 50% of the tweets<\/a>. 
Despite all techno-optimism: clearly, not everyone is a <em>produser<\/em> and clearly not all contributors\u00a0create equal amounts of content!<\/p>\n<hr \/>\n<p><a href=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-7.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-2216\" src=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-7.png\" alt=\"straumann-geobeer8-slide-7\" width=\"400\" height=\"287\" \/><\/a><\/p>\n<p>Talking of different kinds of bias: OSM has also been found <strong>sexist<\/strong>, for example. OSM contributors (like in many crowdsourcing initiatives) are, as a tendency, young, male, technologically minded, with above\u00a0average education. Narrow groups of contributors may, inadvertently or consciously, favour their own interests in creating content.<\/p>\n<p>OSM&#8217;s &#8220;bottom-up data model&#8221; (basically, the community discusses and decides what is mapped how) gives contributors allocative power, i.e. what most people (or the most industrious\u00a0contributors?)\u00a0adopt as their practice has good chances\u00a0to\u00a0evolve into community (best?) 
practice.<\/p>\n<hr \/>\n<p><a href=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-8.png\"><img decoding=\"async\" class=\"alignleft wp-image-2217\" src=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-8.png\" alt=\"straumann-geobeer8-slide-8\" width=\"720\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<hr \/>\n<p><a href=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-9.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-2218\" src=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-9.png\" alt=\"straumann-geobeer8-slide-9\" width=\"400\" height=\"287\" \/><\/a><\/p>\n<p>Further, some patterns in crowdsourced data may be <strong>very surprising<\/strong>.<\/p>\n<p>One example this talk has already touched upon is user contribution bias, where a small group dominates the crowdsourcing activity. A more complicated example of surprising insights hidden in crowdsourced data is in the figure on the left. It is well known that in\u00a0Wikipedia, the self-declared repository for the <i>sum of all human knowledge<\/i>, the spatial distribution of geocoded and &#8220;geocode-able&#8221; articles is strongly biased. A map I made with my colleagues\u00a0at\u00a0the OII shows that <strong><a href=\"http:\/\/geography.oii.ox.ac.uk\/?page=the-geographically-uneven-coverage-of-wikipedia\">a part of Europe features as many Wikipedia articles as the rest of the world<\/a><\/strong>. (By the\u00a0way, there is <a href=\"http:\/\/en.wikipedia.org\/wiki\/Wikipedia:Systemic_bias\">this interesting Wikipedia page that discusses all kinds of biases<\/a> that affect Wikipedia.)<\/p>\n<p>Now, as the figure shows, despite this known severe lack of content e.g. in the <strong>Middle East and North Africa<\/strong> (MENA), only about a third of edits that are made by contributors in that region are about articles in the same region. 
Surprisingly, a large proportion\u00a0of MENA&#8217;s (in absolute terms low) editing activity is geared towards contributing to articles outside their own region, about phenomena in North America, Asia and Europe. If you expected, as many people do, that contributors edit mostly about phenomena in their immediate environment and that they tend to &#8220;fill in gaps&#8221; in content, this insight comes as a surprise.<\/p>\n<p>Cultural, personal (education, careers, family relations, travel, tourism, &#8230;), linguistic, historical, colonial, political, and many more reasons may play into this.<\/p>\n<hr \/>\n<p><a href=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-10.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-2219\" src=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-10.png\" alt=\"straumann-geobeer8-slide-10\" width=\"400\" height=\"287\" \/><\/a><\/p>\n<p>The new <strong>abundance of data<\/strong>, the proliferation of open (government) data, APIs and the current <strong>popularity of information or data visualisation<\/strong> (<em>infoviz\/dataviz<\/em>) <strong>as well as data-driven journalism<\/strong> (DDJ) have led to many more people and institutions obtaining, processing, analysing, visualising and disseminating data.<\/p>\n<p>While this may be welcomed by data-inclined people in general, unfortunately it sometimes leads to people <strong>attaching false meaning to data<\/strong> or reading insights\u00a0into data that are not supported by it.<\/p>\n<p>This\u00a0example shows geocoded tweets in response to the release of a Beyonc\u00e9 album. In my opinion, while technologically interesting, the visualisation has severe flaws in terms of (re)presentation, cartography\u00a0and\u00a0infoviz best practices. 
But: even more importantly, it utterly fails to mention e.g., that a) Twitter users are a highly biased, small subgroup of the general population, that b) the proportion of geocoded tweets is estimated to be in the very low percent numbers (often, &lt; 3% is indicated!), that c) user contribution bias is likely at play, that d) geolocation may be faulty,\u00a0etc. etc.<\/p>\n<hr \/>\n<p><a href=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-11.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-2220\" src=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-11.png\" alt=\"straumann-geobeer8-slide-11\" width=\"400\" height=\"287\" \/><\/a><\/p>\n<p>Finally, this figure shows the result of <a href=\"https:\/\/twitter.com\/achillean\/status\/505049645245288448\">&#8220;ping[ing] all the devices on the internet&#8221;<\/a> according to John Matherly of Shodan. This figure and story went viral, it appeared e.g. on <a href=\"http:\/\/gizmodo.com\/a-map-of-every-device-in-the-world-thats-connected-to-t-1628171291\">Gizmodo<\/a>, <a href=\"http:\/\/thenextweb.com\/shareables\/2014\/08\/29\/looks-like-ping-entire-internet\/\">The Next Web<\/a>, <a href=\"http:\/\/www.iflscience.com\/technology\/map-shows-all-devices-world-connected-internet\">IFLScience!<\/a>, and many more.<\/p>\n<p>Turns out, if you dig a bit deeper, there are <strong>some rather important disclaimers<\/strong>: e.g. a very limited window during which the analysis was reportedly carried out and, more importantly, only pinging devices addressed using IPv4, not considering IPv6. 
You can read about these on this <a href=\"http:\/\/www.reddit.com\/r\/dataisbeautiful\/comments\/2evjkz\/i_pinged_all_devices_on_the_internet_heres_a_map\/\">Reddit thread<\/a>.<\/p>\n<p>Turns out some countries in Asia that have recently invested heavily into broadband Internet infrastructure and also large parts of Africa where the Internet is mainly used on mobile devices, use IPv6 and thus show up as black holes or rather dark regions on this &#8220;map of the Internet&#8221;.<\/p>\n<p>Sadly, the relative lack of access to Internet, content and netizens in Africa is a truth (cf. the OII Wikipedia analyses mentioned above). However, the situation, at least in terms of connected devices is not as dire as this map makes you believe!<\/p>\n<p>However, I think the <strong>very fact that the map played into this common narrative<\/strong>\u00a0of unconnected, offline regions is an important factor in its massive proliferation (a.k.a. &#8216;going viral&#8217;). Unfortunately, it seems all this sharing happened without discussions\u00a0on the data source, data collection method, processing steps, and important disclaimers about the data&#8217;s validity and legitimacy \u2013 and, let&#8217;s face it, <strong>very little critical reception and reflection<\/strong> on part of the audience, i.e. us.<\/p>\n<p>The effects? \u2013 The original tweet has been retweeted more than 5,500 times! Go figure.<\/p>\n<hr \/>\n<p><a href=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-12.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-2221\" src=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-12.png\" alt=\"straumann-geobeer8-slide-12\" width=\"400\" height=\"287\" \/><\/a><\/p>\n<p>With these examples\u00a0in mind, let&#8217;s turn to the classic <strong>Data-Information-Knowledge-Wisdom<\/strong> <a href=\"http:\/\/en.wikipedia.org\/wiki\/DIKW_Pyramid\">workflow or pyramid<\/a>. 
In the DIKW mindset, data is composed of raw observations. Only structuring, pattern-detection, and asking the right questions turn data\u00a0into information. Memorised, recalled and applied in a suitable context, information becomes\u00a0knowledge. And finally, there&#8217;s the wisdom stage that is concerned with &#8216;why&#8217; rather than &#8216;what&#8217;, &#8216;when&#8217;, &#8216;where&#8217; and &#8216;how&#8217; etc.<\/p>\n<hr \/>\n<p><a href=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-13.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-2222\" src=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-13.png\" alt=\"straumann-geobeer8-slide-13\" width=\"400\" height=\"287\" \/><\/a><\/p>\n<p>Well, turns out, one can argue rather well that <strong>&#8216;raw data&#8217; does not, in fact, exist<\/strong>.<\/p>\n<p>Data \u2013 and I would argue also crowdsourced data \u2013 is usually\u00a0collected with an intent, an application in mind or, if not that, at least with a specific\u00a0method, from a certain\u00a0group of people, by a defined\u00a0group of people, using a certain measuring device. Whether this happens implicitly or explicitly and willingly does not matter in this context. Clearly, however, these factors all potentially\u00a0affect the applications the data can sensibly be used for.<\/p>\n<p>So, there goes the title of my talk: &#8216;data&#8217; may not actually be &#8216;raw&#8217;. 
And overly focussing on <em>technology<\/em> and missing out on the underlying <em>technique<\/em>\u00a0can be\u00a0dangerous!<\/p>\n<hr \/>\n<p><a href=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-14.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-2223\" src=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-14.png\" alt=\"straumann-geobeer8-slide-14\" width=\"400\" height=\"287\" \/><\/a><\/p>\n<p>Putting it bluntly: <strong>Unlike this car, data is never general-purpose.<\/strong><\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<hr \/>\n<p><a href=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-15.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-2224\" src=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-15.png\" alt=\"straumann-geobeer8-slide-15\" width=\"400\" height=\"287\" \/><\/a><\/p>\n<p>For all these\u00a0reasons, and because I care about our profession and about what is being done with data in the\u00a0society at large (think: data-driven <span style=\"text-decoration: line-through;\">churnalism<\/span> journalism, evidence-based politics, etc.) 
I would like to propose:<\/p>\n<p><strong>The Data Worker&#8217;s Manifesto<\/strong>.<\/p>\n<p>It\u00a0consists\u00a0of only a few, easily memorised principles:<\/p>\n<hr \/>\n<p><a href=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-16.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-2225\" src=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-16.png\" alt=\"straumann-geobeer8-slide-16\" width=\"400\" height=\"287\" \/><\/a><\/p>\n<p><strong>Know your data!<\/strong><\/p>\n<p>Know the sources of your data, collection methodology, the sample size and composition, consistency, pre-processing steps possibly carried out by others or by yourself, more generally: the lineage, biases, quality issues, limitations, legitimate applications and use cases. Know all these very well. If you don&#8217;t, try to find out. If you can&#8217;t be sure, refrain from using the data.<\/p>\n<hr \/>\n<p><a href=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-17.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-2226\" src=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-17.png\" alt=\"straumann-geobeer8-slide-17\" width=\"400\" height=\"287\" \/><\/a><\/p>\n<p><strong>Discuss data and how it&#8217;s being used.<\/strong><\/p>\n<p>The Internet and social media are wonderful things where thousands of links are shared. Every so often you may see an analysis with un(der)-documented input data or methodology.<\/p>\n<p>Reflect critically on what others may share blindly. If you have questions: remember, the Web is a two-way street these days. 
Gently but firmly ask them and make your sharing of, and investment into, any analysis dependent on the answer.<\/p>\n<hr \/>\n<p><a href=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-18.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-2227\" src=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-18.png\" alt=\"straumann-geobeer8-slide-18\" width=\"400\" height=\"287\" \/><\/a><\/p>\n<p><strong>Create and share metadata!<\/strong><\/p>\n<p>If you do data-based analyses and produce visualisations, always keep track of what you have done with the data: Did you apply filters? Remove (suspected) outliers? Subsample, downsample, disaggregate, aggregate, combine, split, join, clean, purge, merge, &#8230; the data? Document your steps and assumptions\u00a0and share this\u00a0metadata to give your collaborators and your audience insight into data provenance\u00a0and\u00a0your methodology, along with the results.<\/p>\n<p>If you share your insights as social media content (e.g. a map as a PNG file), I recommend burning the metadata into the result, i.e. putting the metadata somewhere in the content so that it&#8217;s hard to remove. Because said content will \u2013 at some point \u2013 be taken, proliferated, received and analysed out of context. Guaranteed.<\/p>\n<hr \/>\n<p><a href=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-19.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-2228\" src=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-19.png\" alt=\"straumann-geobeer8-slide-19\" width=\"400\" height=\"287\" \/><\/a><\/p>\n<p>3b is very similar to 3: <strong>Create and share metadata!<\/strong><\/p>\n<p>Seriously: I know metadata is uncool and not sexy at all to maintain. 
But nothing good comes from not doing it!<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<hr \/>\n<p><a href=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-20.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-2229\" src=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-20.png\" alt=\"straumann-geobeer8-slide-20\" width=\"400\" height=\"287\" \/><\/a><\/p>\n<p><strong>Experts are valuable.<\/strong><\/p>\n<p>While the <a href=\"http:\/\/archive.wired.com\/science\/discoveries\/magazine\/16-07\/pb_theory\">&#8220;end of theory&#8221;<\/a> has been proclaimed, I think the <a href=\"http:\/\/oupacademic.tumblr.com\/post\/48310773463\/misquotation-reports-of-my-death-have-been-greatly\">&#8220;report of [its] death has been greatly exaggerated&#8221;<\/a>.<\/p>\n<p>Being, or being in contact with, a domain specialist is still very valuable. Sometimes,\u00a0especially for harder, i.e. more interesting, analyses, it&#8217;s indispensable. At the very least, expert knowledge may save you from doing something silly with data you don&#8217;t completely understand.<\/p>\n<hr \/>\n<p><a href=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-21.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-2230\" src=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-21.png\" alt=\"straumann-geobeer8-slide-21\" width=\"400\" height=\"287\" \/><\/a><\/p>\n<p><strong>We&#8217;re in this together.<\/strong><\/p>\n<p>I feel we are all still coming to terms with the new opportunities the Web and some of the data-related developments I mentioned provide to us (let alone methodological and computational improvements and societal developments). 
It can be a bumpy, but in any case an exciting, ride, so let&#8217;s buckle up, meet and talk and share our experiences \u2013 but that&#8217;s obviously why all of you have come to this GeoBeer in the first place!<\/p>\n<hr \/>\n<p><a href=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-22.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-2231\" src=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-22.png\" alt=\"straumann-geobeer8-slide-22\" width=\"400\" height=\"287\" \/><\/a><\/p>\n<p><a href=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-23.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-2232\" src=\"http:\/\/geo.ebp.ch\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-23.png\" alt=\"straumann-geobeer8-slide-23\" width=\"400\" height=\"287\" \/><\/a><\/p>\n<p>I feel that despite all these potential pitfalls we should perceive the abundant data, especially new data types such as crowdsourced and open government data, as <strong>huge opportunities!<\/strong><\/p>\n<p>I&#8217;m convinced that, with the right people and\u00a0the right mindset, we can do great things, privately or politically, that have the potential to improve our respective environments ever so slightly.<\/p>\n<p>I feel that Switzerland as a democratic and affluent country provides us with an especially friendly\u00a0environment <strong>to get involved, in business, in research, and in societal goals.<\/strong><\/p>\n<p>Thank you all for your attention!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This article is a re-post of an article that first appeared on www.geo.ebp.ch. Last week I gave\u00a0a talk at the 8th instalment of the GeoBeer series\u00a0on EBP&#8217;s Zurich-Stadelhofen premises and sponsored by EBP and Crosswind. 
It was\u00a0titled State of the Union: Data as Enabling Tech\u203d You can check out the whole slidedeck on my private &hellip; <a href=\"https:\/\/www.ralphstraumann.ch\/blog\/2014\/09\/the-data-workers-manifesto\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">The Data Worker&#8217;s Manifesto<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":1815,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[8],"tags":[23,30,168,141,52,169,101,125,130],"class_list":["post-1810","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-piece","tag-cartography","tag-crowdsourcing","tag-data","tag-ddj","tag-geo","tag-infoviz","tag-social-media","tag-visualization","tag-wikipedia"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/www.ralphstraumann.ch\/blog\/wp-content\/uploads\/2014\/09\/straumann-geobeer8-slide-14.png","jetpack_shortlink":"https:\/\/wp.me\/p3pPwF-tc","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/posts\/1810","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ralphstraumann.ch\/blog
\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/comments?post=1810"}],"version-history":[{"count":7,"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/posts\/1810\/revisions"}],"predecessor-version":[{"id":2089,"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/posts\/1810\/revisions\/2089"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/media\/1815"}],"wp:attachment":[{"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/media?parent=1810"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/categories?post=1810"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/tags?post=1810"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}