This chapter aims to improve the detailed XML data files behind eMelbourne encyclopedic entries (eMelbourne.net.au) by deriving location data from the existing text entries. For example, from the entry “ACDC Lane” we can determine that the location for that entry ought to be somewhere midway along ACDC Lane (named for the band AC/DC) in the Melbourne central business district.
The eMelbourne website has been fairly popular over many years. It’s worth a brief digression into how to measure the popularity of sites as many projects need to report on this either as part of grant reporting requirements or to get new grants.
According to Google Analytics, from the 1st of January 2010 through to the 30th of April 2020 the eMelbourne website had 2,699,418 page views. However, page views can be a deceptive measure at the best of times. A “session” more accurately reflects user engagement. A session is defined as a group of user interactions on a website that take place within a given time frame (in this case 30 minutes) and is a standard measure within Google Analytics to track website usage. Thus a single session can contain multiple page views within the timespan. Sessions are a much more authentic way of reporting numbers because, for example, some page “views” are caused by a user clicking back on their browser or closing a tab by mistake and then opening it again. Some page “views” are also caused by poor navigation structure, i.e. a user has to click around everywhere to find what they are looking for because the menu structure is counterintuitive and poorly designed. We have unfortunately seen too many projects over the years that have inflated views because of their poor navigation.
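To make the difference between page views and sessions concrete, here is a minimal sketch that groups one visitor's page views into sessions. It assumes a simplified rule (a new session starts after more than 30 minutes without activity); Google Analytics applies further rules that are ignored here, and the function name and sample timestamps are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumption: a session ends after 30 minutes of inactivity.
SESSION_TIMEOUT = timedelta(minutes=30)


def count_sessions(timestamps, timeout=SESSION_TIMEOUT):
    """Count the sessions in one visitor's page-view timestamps."""
    sessions = 0
    previous = None
    for ts in sorted(timestamps):
        # The first view, or a view after a long gap, opens a new session.
        if previous is None or ts - previous > timeout:
            sessions += 1
        previous = ts
    return sessions


# Hypothetical visitor: three views in the morning, one in the afternoon.
views = [
    datetime(2020, 4, 1, 9, 0),
    datetime(2020, 4, 1, 9, 5),    # e.g. clicked back and reloaded
    datetime(2020, 4, 1, 9, 20),
    datetime(2020, 4, 1, 14, 0),   # hours later, so a new session
]

print(len(views), "page views,", count_sessions(views), "sessions")
```

The accidental reload at 9:05 adds a page view but not a session, which is why sessions are less easily inflated.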
For eMelbourne, 93.39% of page views were for encyclopedic entries. The remaining page views were largely made up of the homepage and search pages. This implies page views are not artificially inflated by a poorly designed searching or browsing interface. The site is popular because of the resources it houses. People want the articles!
The eMelbourne website had 1,507,271 sessions. The average pages per session was 1.79 pages. The site had an average of 12,155 sessions per month, with consistent performance over the 124-month timeframe (see figure 8.0.1 below).
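The averages quoted here can be re-derived from the two totals reported by Google Analytics. The sketch below is just that arithmetic; the variable names are ours. Counting both January 2010 and April 2020 gives 124 months, which is the divisor that reproduces the reported 12,155 sessions per month.

```python
page_views = 2_699_418
sessions = 1_507_271

# Months from January 2010 to April 2020, counting both end months.
months = (2020 - 2010) * 12 + (4 - 1) + 1

pages_per_session = page_views / sessions
sessions_per_month = sessions / months

print("months:", months)
print("pages per session:", round(pages_per_session, 2))
print("sessions per month:", round(sessions_per_month))
```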
The regular sharp low points are in December of each year, while the regular, less sharp drops occur during the mid-year semester break for Australian universities. This indicates usage associated with teaching can be estimated at roughly 5000 sessions a month during the semesters. Other usage is a consistent 7000 sessions a month, every month. Again, we’ve seen projects report on the overall success of their sites in general, but neglect to disclose how much of the success can be explained by regular usage of the site in assignments for large first year undergraduate classes. While engaging undergraduates is important, it’s also important to disclose and understand what primarily drives a site’s popularity.
Page views and sessions on eMelbourne are driven by those who have just stumbled on the site and by heavy users and everyone in between. This is shown in the table below. Although users did come back to the site again and again and again, a small number of users did not account for a huge percentage of the page views. In other words, a few dozen users did not account for most of the site traffic.
How many times did they come to the site? | Sessions (How many sessions were accounted for by users coming back this many times?) | Page Views (How many page views were accounted for by users coming back this many times?) |
---|---|---|
1 | 1,208,734 | 2,020,560 |
2 | 147,257 | 304,600 |
3 | 49,279 | 113,543 |
4 | 24,302 | 59,441 |
5 | 14,570 | 36,393 |
6 | 9,664 | 26,365 |
7 | 6,924 | 19,068 |
8 | 5,197 | 13,523 |
9-14 | 15,674 | 41,397 |
15-25 | 9,786 | 24,880 |
26-50 | 6,934 | 18,694 |
51-100 | 4,290 | 11,918 |
101-200 | 2,518 | 5,922 |
201+ | 2,142 | 3,114 |
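As a rough check on the claim that a handful of heavy users does not dominate the traffic, the table can be turned into shares of the totals. This is a sketch using the figures copied from the table; the choice of which rows count as "heavy" use (51 visits or more) is our assumption.

```python
# (visit count, sessions, page views), copied from the table above.
rows = [
    ("1", 1_208_734, 2_020_560),
    ("2", 147_257, 304_600),
    ("3", 49_279, 113_543),
    ("4", 24_302, 59_441),
    ("5", 14_570, 36_393),
    ("6", 9_664, 26_365),
    ("7", 6_924, 19_068),
    ("8", 5_197, 13_523),
    ("9-14", 15_674, 41_397),
    ("15-25", 9_786, 24_880),
    ("26-50", 6_934, 18_694),
    ("51-100", 4_290, 11_918),
    ("101-200", 2_518, 5_922),
    ("201+", 2_142, 3_114),
]

total_sessions = sum(s for _, s, _ in rows)
total_views = sum(v for _, _, v in rows)

# Share of all sessions in the row for a visit count of 1.
first_visit_share = rows[0][1] / total_sessions

# Assumption: "heavy" use means the rows for 51 or more visits.
heavy_labels = {"51-100", "101-200", "201+"}
heavy_views = sum(v for label, _, v in rows if label in heavy_labels)
heavy_view_share = heavy_views / total_views

print(f"total sessions: {total_sessions:,}")
print(f"total page views: {total_views:,}")
print(f"visit count of 1, share of sessions: {first_visit_share:.1%}")
print(f"51+ visits, share of page views: {heavy_view_share:.2%}")
```

The column totals match the overall session and page-view counts reported for the site, so the table accounts for all of the traffic.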
82% of visits to eMelbourne came as a result of organic searches (i.e. they used Google or Bing and an encyclopedic entry page came up in the search results). 13% of visits were from users going directly to some page on eMelbourne (i.e. they entered the URL from memory or had it bookmarked or even copied the link from an email directly into their browser). Referrals from other sites accounted for 3.4% of visits and social media links accounted for 1% of visits.
There is variety within these referrals. Wikipedia articles draw on eMelbourne articles as a source and people follow those sources/links from Wikipedia to eMelbourne. Links from Wikipedia using a desktop computer accounted for 11,050 sessions, with an average of 2.5 pages viewed per session (meaning users came for one page but looked at other pages too - which might indicate they were not disappointed by what they found). People using Wikipedia on their mobile phones, by contrast, accounted for 1,534 sessions and an average of 1.83 pages per session. Those mobile phone users just don’t have a great attention span!
Other notable referrals (>1000 referrals) came from OnlyMelbourne.com.au (5657 sessions), Buzzfeed.com (1325 but with users only staying for a single page on average - those Buzzfeed readers clearly have a shorter attention span) and libguides.caulfieldgs.vic.edu.au (1111).
But these absolute numbers need context. Are these numbers of views equivalent to a popular home cooking blog? Or are they equivalent to what a newspaper would get? Maybe the finance section of a newspaper? So, for context, the University of Melbourne’s institutional repository for research publications had 6,289,125 item views over the same time period as the analysis above (see Unimelb, 2020). That means that eMelbourne had 42.92% as many page views as the university’s entire institutional repository of publications. eMelbourne was almost half as popular! That’s pretty impressive.
So eMelbourne resources have been pretty popular over the years. In the next section we will see how to download the data behind these resources in order to work on them and create maps.
This section starts by downloading a single encyclopedia entry into a spreadsheet. It then semi-automates the process to work for multiple entries. The single entry shall be the entry for the Young & Jackson Hotel:
http://www.emelbourne.net.au/biogs/EM01672b.htm
It’s time to teach you more about how to be a hacker. We’re going to look at the source code behind the web page to see what the inner workings are and what secrets can be found. The source behind web pages is HTML. It is the language that describes the visuals of the site in plain text. You can see the source by unplugging from the matrix. Alternatively, you can right click on a web page and “View Page Source”. This involves a lot fewer fight scenes.
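The same source can also be fetched by a script, which is one way the process might later be semi-automated. The sketch below uses only the Python standard library; it is not necessarily the approach this chapter goes on to use, and the helper names (`fetch_source`, `page_title`, `TitleParser`) and the sample HTML are made up for illustration. It makes no assumption about the structure of an eMelbourne page beyond the presence of a standard `<title>` element.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# The Young & Jackson Hotel entry mentioned above.
ENTRY_URL = "http://www.emelbourne.net.au/biogs/EM01672b.htm"


class TitleParser(HTMLParser):
    """Collect the text inside a page's <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fetch_source(url):
    """Download a page and return its HTML source as text."""
    with urlopen(url) as response:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def page_title(html):
    """Return the contents of <title> from an HTML string."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


# Made-up HTML, so this demo runs without a network connection.
sample = "<html><head><title> A sample entry </title></head><body></body></html>"
print(page_title(sample))
```

With a network connection, `page_title(fetch_source(ENTRY_URL))` would apply the same steps to the live entry, and `print(fetch_source(ENTRY_URL)[:500])` would show the first part of what “View Page Source” displays.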