Grimoire for the Digital Humanities:
The Forbidden Atlas

2.1 Basic Magic Tricks

Authors: Dr Mitchell Harrop, Dr Jon Garber

This chapter is a gentle introduction to some of the tools, concepts and magic ingredients needed in later chapters to create maps. The chapter includes an introduction to structured data. An appreciation of structured data is needed in order to be able to create the kinds of files that mapping software often uses to render great visualisations. This section is followed by a reminder about spreadsheet concepts... ... PREVIEW ONLY ...

... PREVIEW ONLY ... ... quickest and easiest method to save information was to write each part of the information to a file, one after the other, like so:

1, Tilly, Aston, 100, 1 Bourke Street Melbourne,
2, Dr Vera, Brown, 1000, 123 Swanston Street Melbourne,
3, Annette, Crawford, 101, 246 Elizabeth Street Melbourne,
4, … etc.

And there would be some corresponding code written by a Software Engineer to read the file back into the system and do something useful with the information. Code to open files, code to save ...
... PREVIEW ONLY ...
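As a rough illustration of the kind of reading code a Software Engineer would have written, here is a minimal Python sketch. It rests on assumptions: the sample lines carry no labels, so the column names in FIELDS are hypothetical guesses, and the sample text is embedded in the script rather than read from a real file.

```python
import csv
import io

# The sample lines from the text, embedded here so the sketch is self-contained.
RAW = """\
1, Tilly, Aston, 100, 1 Bourke Street Melbourne,
2, Dr Vera, Brown, 1000, 123 Swanston Street Melbourne,
3, Annette, Crawford, 101, 246 Elizabeth Street Melbourne,
"""

# Hypothetical column names: the file itself does not say what each position means.
FIELDS = ["record_id", "given_name", "family_name", "balance", "address"]


def read_records(text):
    """Turn each comma separated line into a dictionary keyed by FIELDS."""
    records = []
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    for row in reader:
        # Each sample line ends with a comma, which yields an empty final value.
        if row and row[-1] == "":
            row = row[:-1]
        if len(row) != len(FIELDS):
            continue  # skip blank or malformed lines
        records.append(dict(zip(FIELDS, row)))
    return records


records = read_records(RAW)
for record in records:
    print(record["given_name"], record["family_name"], record["balance"])
```

Notice that the code only works because whoever wrote it already knew which position holds which piece of information. Nothing in the file itself says so.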

If you were to open the file in a spreadsheet the last element would get a single column rather than be spread over multiple columns, i.e.:

     A          B     C        D
1    Tragedy    -3.4  1.0198   [-4,-4,-4,-4,-2,-3,-1,-4,-4,-4]
2    Villains   -3.4  0.91652  [-4,-3,-4,-3,-4,-3,-4,-4,-1,-4]
3    Torturers  -3.5  0.67082  [-4,-4,-3,-3,-4,-4,-4,-2,-3,-4]
4    Hell       -3.6  0.66332  [-4,-4,-4,-4,-4,-2,-3,-4,-3,-4]
...  ...        ...   ...      ...

Column D is one column rather than multiple columns for each of the numbers inside the square brackets, despite the numerical values being separated by commas. This is because the file is a Tab Separated file, not a ...... PREVIEW ONLY ...
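The single-column behaviour can be reproduced outside a spreadsheet. The following Python sketch splits two of the lines first on tabs and then on commas. The sample text is typed in by hand with tab characters between the values, which is an assumption about how the real file is laid out based on the description above.

```python
import csv
import io
import json

# Two lines in the layout shown above, with real tab characters (\t) between values.
TSV = (
    "Tragedy\t-3.4\t1.0198\t[-4,-4,-4,-4,-2,-3,-1,-4,-4,-4]\n"
    "Villains\t-3.4\t0.91652\t[-4,-3,-4,-3,-4,-3,-4,-4,-1,-4]\n"
)

# Splitting on tabs gives four values per line, like columns A to D.
tab_rows = list(csv.reader(io.StringIO(TSV), delimiter="\t"))
print(len(tab_rows[0]))

# Splitting the same text on commas cuts the bracketed list apart instead.
comma_rows = list(csv.reader(io.StringIO(TSV), delimiter=","))
print(len(comma_rows[0]))

# The bracketed value happens to be valid JSON, so it can be unpacked afterwards.
ratings = json.loads(tab_rows[0][3])
print(ratings)
```

Read as tabs, the first line yields four values. Read as commas, the same line falls into ten pieces, with the word and the two decimal numbers stuck to the first piece.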

... ... PREVIEW ONLY ...
not a perfect job, as we don’t know, for example, the currency of the BankBalance. Thus a business probably won’t collapse when the curse manifests because the next Software Engineer can (more) easily pick up the mantle and understand more or less what on earth is going on by looking at the first line.

STEWARDSHIP - DATA DICTIONARIES

Author: Karen M Thompson

Image: Lamp. Source: Melbourne History Resources (2019a)

Hello, I’m Karen. In these boxes we will talk about Data Stewardship, which is the role of maximising the value of data for research purposes. These Stewardship boxes are asides that keep an eye on the bigger research picture.

Good data stewardship is your responsibility as a researcher - only you can prevent poor data stewardship.

Why have we assigned an image of the lamps outside Parliament House in Melbourne as the visual marker for these boxes? Well, good data stewardship lights the way for other researchers, and your future self, to make best use of your research data. And if the lights go out, and it's dark, it sometimes can feel a bit spooky and you may temporarily lose your way. So lights felt appropriate.

In this first Stewardship box we’re considering ‘Data Dictionaries’.

You should consider building Data Dictionaries for your work. A Data Dictionary is basically an explanation of each of the columns in your spreadsheet or database: its unique name (best not to have two things with the same, or easily confused, names); its meaning; whether it’s optional or absolutely required; its format; its allowable values; whether it’s raw data or made up from other data components; and more. It can be a simple sheet in your spreadsheet which describes the contents of another sheet. With more complex data, you’d also include descriptions of how each column relates to others.
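To make the idea concrete, here is a minimal sketch of a Data Dictionary written as a Python structure, plus a small check that uses it. Everything in it is illustrative: the column names, the meanings and the AUD currency are assumptions made up for this example, not taken from any real file.

```python
# A hypothetical Data Dictionary: one entry per column of an imagined customer sheet.
DATA_DICTIONARY = [
    {
        "name": "CustomerID",
        "meaning": "Unique number identifying the customer",
        "required": True,
        "format": "whole number",
        "allowed_values": "1 or greater",
        "derived": False,
    },
    {
        "name": "BankBalance",
        "meaning": "Account balance in Australian dollars (AUD), an assumed currency",
        "required": True,
        "format": "whole number",
        "allowed_values": "any whole number",
        "derived": False,
    },
    {
        "name": "Address",
        "meaning": "Street address as supplied by the customer",
        "required": False,
        "format": "free text",
        "allowed_values": "any text",
        "derived": False,
    },
]


def missing_required(row, dictionary):
    """Return the names of required columns that are absent or empty in a row."""
    return [
        entry["name"]
        for entry in dictionary
        if entry["required"] and row.get(entry["name"], "") == ""
    ]


complete = missing_required({"CustomerID": "1", "BankBalance": "100"}, DATA_DICTIONARY)
incomplete = missing_required({"CustomerID": "2", "Address": "123 Swanston Street"}, DATA_DICTIONARY)
print(complete)
print(incomplete)
```

The same information would sit just as happily in a second sheet of your spreadsheet. The point is that the meaning, units and rules for each column are written down somewhere other than your own head.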

But let’s consider the limitations of this flavour of in-built documentation for the second file example. The second file doesn’t need a first line specifying the names of each column of data because there just happens to be a peer-reviewed paper (Hutto and Gilbert, 2014) describing in great detail all aspects of the system that uses this file. Unlike the situation with the CEO exercising poor management skills and demanding that the engineers consistently make tight deadlines and fully document everything at all times and do a million other things (sigh), here the peer-reviewed output is the deadline. The peer-reviewed paper makes for great documentation.

But what would the headings for the second file be? Having read the Hutto and Gilbert paper, perhaps something like this:

Word → Score → Variance → HumanScores
Magnificently → 3.4 → 0.66332 → [3, 3, 3, 4, 4, 2, 4, 4, 3, 4]
Ecstasy → 3.3 → 1.18743 → [4, 4, 3, 4, 4, 0, 3, 3, 4, 4]
... etc.
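The relationship between these columns can be checked with a few lines of Python. The sketch below uses only the two rows listed above and the standard statistics module. For these two rows, the Score equals the average of the HumanScores, and the number in the third column matches their population standard deviation, which is the square root of the variance. Whether that holds for every row of the real file is an assumption this sketch does not test.

```python
import statistics

# The two example rows from the listing above: (Score, third column, HumanScores).
rows = {
    "Magnificently": (3.4, 0.66332, [3, 3, 3, 4, 4, 2, 4, 4, 3, 4]),
    "Ecstasy": (3.3, 1.18743, [4, 4, 3, 4, 4, 0, 3, 3, 4, 4]),
}

results = {}
for word, (score, spread, human_scores) in rows.items():
    mean = statistics.mean(human_scores)
    # pstdev is the population standard deviation: the square root of the variance.
    pstdev = statistics.pstdev(human_scores)
    results[word] = (round(mean, 5), round(pstdev, 5))
    print(word, "calculated:", results[word], "file says:", (score, spread))
```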

If you were encountering such a file for the first time you might not be able to figure out what the HumanScores are all about. It rightly took a lot of ink to explain the concept in the Hutto and Gilbert paper. According to the paper, these are ratings that humans gave to each of the words in the file, ranging from negative to positive and representing the sentiment associated with each word. That probably doesn’t make much sense if you haven’t read the paper. Just trust us. Or go read the Hutto and Gilbert paper yourself. Reading well into the paper one can find that the Variance (strictly speaking a standard deviation, the square root of the variance) was calculated from the HumanScores. But HumanScores isn’t used anywhere within the code that uses this file! It is simply there for completeness and contestability by other researchers. Just imagine trying to work out what HumanScores means without access to or knowledge of the peer-reviewed paper. You wouldn’t be able to find any use of HumanScores in the software because it isn’t used:
... PREVIEW ONLY ...

The nameEntry element contains two part elements. The part elements are said to be the children of the nameEntry parent. This particular standard uses attributes (e.g. localType="familyname") to define the data. The first part element has an attribute called localType, which has the value familyname. Inside the element is Barry, the last name of the person in question. The second part element has a localType value of givenname and contains the first name of the person in question (Redmond). The standard this file uses defines what constitutes a family name and what constitutes a given name. Again, don’t worry if all this terminology goes in one ear and out the other. We’ll remind you on an as-needed basis.

YOUR TURN

  • What other information is in the example above? Read through it.
    • When did Redmond Barry live?
    • Where was he born?
    • Where did he die?
    • What was his occupation?
  • See, it’s not so scary when you look at individual elements and take the time to see what’s going on.
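The parent, child and attribute terminology can also be seen in code. The Python sketch below reads a small fragment with the standard xml.etree.ElementTree module. The fragment is rebuilt from the description of nameEntry and part given earlier, not copied from the real file, and it assumes there are no XML namespaces to deal with, which a real file following a published standard may well have.

```python
import xml.etree.ElementTree as ET

# A fragment rebuilt from the description in the text, not copied from the real file.
XML = """
<nameEntry>
  <part localType="familyname">Barry</part>
  <part localType="givenname">Redmond</part>
</nameEntry>
"""

name_entry = ET.fromstring(XML.strip())  # the parent element

# Walk over the children and index their text by the localType attribute.
parts = {}
for part in name_entry.findall("part"):
    parts[part.get("localType")] = part.text

print(parts["givenname"], parts["familyname"])
```

Here findall("part") returns the children of the parent, get("localType") reads the attribute, and text is whatever sits between the opening and closing tags.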

However, there could just as easily be a different standard which, instead of ...
... PREVIEW ONLY ...

2.3 A Gentle Reminder about Spreadsheets

Spreadsheets are the original “killer app”. They are the software that prompted people and businesses to go out and get their very own personal computers. They were and still are a revolution. Although not a very exciting revolution.

This Grimoire does NOT use Microsoft Excel. You need to pay for Microsoft Excel or be a member of an institution that pays for you. Instead, we use the free Google Sheets in our examples. Google Sheets and Excel are very comparable and compatible spreadsheet programs, but with a few quirky differences. As such, we endeavour, wherever possible, to be spreadsheet agnostic in examples. The examples we provide will often work without any changes in Microsoft Excel.

Google Sheets is part of Google Docs. Google Docs is Google’s answer to the Microsoft Office Suite. Instead of a Microsoft Word document, there’s a Google Doc. Instead of Microsoft Excel, there’s Google Sheets. PowerPoint? Google Slides. And so on. It’s an invasion of doppelgangers. But instead of being a desktop application you install, Google Docs runs in your web browser, so you can use the programs anywhere that you’ve got a computer and the internet. Brilliant, huh? Furthermore, the files are all saved in the cloud, in what Google calls Google Drive. You need a Google account and you need to activate Google Sheets. You can sign up here or use a Gmail account if you already have one:

https://www.google.com.au/docs/about/

Now it’s time for a quick refresher on the general spreadsheet skills you ought to have coming into this book. If you already use spreadsheets extensively in your day-to-day work, you can go ahead and skip the rest of this section.

Recall spreadsheets have rows and columns. The rows have numbers and the columns have letters:

     A      B            C    D
1    Wands  Broomsticks
2    100    200
3    350    450

Recall that there are cells. A1 is a cell. It contains the value Wands in the example above. Cell B1 contains the value Broomsticks. Just like the game Battleships, right? Cell A2 is 100. To make sure you are paying attention, what’s in cell B3?

Yep, 450. You sunk my broomstick.
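If it helps to see the Battleships idea spelled out, here is a minimal Python sketch that looks up a cell reference in the example sheet. The grid list and the cell function are names invented for this illustration. They are not part of any spreadsheet program.

```python
import re

# The example sheet as a list of rows; row 1 is grid[0], column A is index 0.
grid = [
    ["Wands", "Broomsticks"],
    [100, 200],
    [350, 450],
]


def cell(grid, reference):
    """Look up a reference such as 'B3': letters give the column, digits the row."""
    match = re.fullmatch(r"([A-Z]+)([1-9][0-9]*)", reference)
    if match is None:
        raise ValueError(f"not a cell reference: {reference!r}")
    letters, digits = match.groups()
    column = 0
    for letter in letters:
        # A=1 ... Z=26, then AA=27 and so on, as spreadsheets label columns.
        column = column * 26 + (ord(letter) - ord("A") + 1)
    return grid[int(digits) - 1][column - 1]


print(cell(grid, "A1"))
print(cell(grid, "B3"))
```

The letter picks the column, the number picks the row, and the cell is wherever the two cross.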

Spreadsheets can have multiple sheets. A sheet is sometimes called a worksheet, and the file that holds all the sheets is sometimes called a workbook. Generally, access to multiple sheets is via tabs that are found at the bottom of the interface. In this example, the tabs are where it says Sheet 1, Sheet 2 and Sheet 3:

     A             B    C    D    E
1    Crystal Ball
2
3
...  ...           ...  ...  ...  ...
     Sheet 1   Sheet 2   Sheet 3

Recall that spreadsheets have functions and formulas for doing things like calculations, and that they start with an equals sign (=):

     A      B            C                D
1    Wands  Broomsticks
2    100    200          =A2+B2
3    350    450          =SUM(A3,B2,A4)
4    575    675          =AVERAGE(A2:A4)
5    ...    ...          ...              ...

The formula in cell C2 above adds the values in cells A2 and B2 together. As such, the values of 100 and 200 are added together. The formula runs when you hit enter after typing it in. Cell C2 will then appear to be 300, but the formula will still be there, underneath it all, waiting to strike.
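For readers who like to double-check the arithmetic, the three calculations in column C can be mirrored in a few lines of Python. This is only a sketch of what the spreadsheet works out for you. The cells dictionary is a made-up stand-in for the sheet, not how a spreadsheet stores its data.

```python
import statistics

# Cell values from the example sheet, keyed by cell reference.
cells = {
    "A2": 100, "B2": 200,
    "A3": 350, "B3": 450,
    "A4": 575, "B4": 675,
}

# Adding 100 and 200, the calculation described for cell C2.
c2 = cells["A2"] + cells["B2"]

# =SUM(A3,B2,A4) adds the three listed cells.
c3 = sum([cells["A3"], cells["B2"], cells["A4"]])

# =AVERAGE(A2:A4) takes the mean of the range A2, A3 and A4.
c4 = statistics.mean([cells["A2"], cells["A3"], cells["A4"]])

print(c2, c3, c4)
```

Note the difference between the comma in SUM(A3,B2,A4), which lists individual cells, and the colon in AVERAGE(A2:A4), which means every cell from A2 through to A4.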
... PREVIEW ONLY ...
