Open licenses for archaeological data matter: the case of AustArch
A few days ago Internet Archaeology published a superb data paper featuring AustArch, a database of more than 5000 radiocarbon and other archaeometric dates from Australia. To my surprise, while the data paper is published as open access under a CC-BY license, the dataset itself is available from the Archaeology Data Service under its custom license, which is not open.
I want to take this example as an opportunity to revisit the ADS terms of use. I hope to show that, both in this specific case and in general, standard open licenses offer significant advantages. The ADS is a cornerstone of data sharing in archaeology worldwide, both directly as a repository and indirectly as a leading organisation for other, newer data archives (such as tDAR, DANS and MOD). Their role for the entire community is so important that I feel we’re really missing a lot by having antiquated (no pun intended) terms of use in place.
I will start by quoting some relevant excerpts from the data paper: Williams, A.N., Ulm, S., Smith, M. and Reid J. (2014). AustArch: A Database of 14C and Non-14C Ages from Archaeological Sites in Australia – Composition, Compilation and Review (Data Paper). Internet Archaeology, (36). http://dx.doi.org/10.11141/ia.36.6
The dataset includes all radiocarbon and non-radiocarbon ages associated with archaeological deposits published in the last 60 years of research (Figure 3). The dataset also includes extensive, but not comprehensive, unpublished/grey literature data, mainly from New South Wales and Queensland. Some unpublished/grey literature from Victoria and Western Australia is also included through personal communication and/or other databases
The data sources for AustArch:
Overall, information has been obtained from 1,067 publications in the development of the dataset, with several hundred more being examined but failing to contain pertinent data. Of these publications, 583 (55%) were journal articles; 51 (5%) were books; 159 (15%) were book chapters; 100 (9%) were unpublished undergraduate or postgraduate theses; 164 (15%) were unpublished consulting/commercial reports; and 10 (1%) came from other sources.
Please note that 24% (unpublished theses plus unpublished consulting/commercial reports) comes from unpublished literature.
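For anyone who wants to check that figure, here is a minimal sketch that recomputes the shares from the counts quoted above. The category labels and variable names are my own shorthand, not taken from the paper. Working from the raw counts, the unpublished share comes out at roughly 24.7%, so the 24% above is the sum of the two rounded percentages (9% + 15%).

```python
# Counts of source publications as quoted from the data paper.
# The dictionary keys are shorthand labels, not the paper's own wording.
counts = {
    "journal articles": 583,
    "books": 51,
    "book chapters": 159,
    "unpublished theses": 100,
    "unpublished consulting/commercial reports": 164,
    "other sources": 10,
}

total = sum(counts.values())

# Unpublished literature = theses + consulting/commercial reports.
unpublished = (
    counts["unpublished theses"]
    + counts["unpublished consulting/commercial reports"]
)
unpublished_share = unpublished / total

for label, n in counts.items():
    print(f"{label}: {n} ({n / total:.1%})")
print(f"total: {total}")
print(f"unpublished: {unpublished} ({unpublished_share:.1%})")
```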
What use has been made of the dataset so far?
Since the development and release of various parts of the dataset, it has proved a well-used resource for a range of research and consulting/commercial works, however its main application has been in the development of time-series or summed probability analyses.
The dataset could be improved by incorporating more data from the commercial sector:
In the short-term, the dataset can be significantly improved by the incorporation of all unpublished data, particularly produced in the commercial/consulting sector. The data are not readily available, often contained in State or local repositories and/or by individual companies. Commercial/consulting work has been extensive in the last decade, most notably in Victoria and Western Australia, and the incorporation of data from these States would provide a significant increase in ages for both arid and temperate regions.
Now take all the above and put it together with the non-commercial license used at the Archaeology Data Service: what is wrong? I see two main issues here: standardisation and ethics.
Creative Commons open licenses (that is, CC-BY, CC-BY-SA and CC0) are well known and internationally recognised as marking content that is readily usable for any purpose. They’re immediately recognisable by users, who know the Creative Commons brand. They’re machine readable, allowing automated tools to retrieve metadata about permissions and restrictions (e.g. share-alike). My hero Colleen Morgan says that archaeologists should use Creative Commons (open) licenses for everything. Using anything else, even for content or data that is available for downloading, is not unlike putting it on display without allowing bystanders any actual interaction. While it may seem “more permissive”, I can’t see how requiring signed […] Students’ Undertaking Form is in any way better than a boring, boilerplate Creative Commons deed.
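To illustrate what “machine readable” can mean in practice, here is a rough sketch, assuming a dataset simply declares its license as a standard Creative Commons URL. The function name `describe_license` and the returned fields are hypothetical, made up for this example. Real tools usually read richer metadata, but even the URL alone tells a program which restrictions apply.

```python
from urllib.parse import urlparse

# License codes treated as "open" in the sense used in this post
# (CC-BY, CC-BY-SA and CC0). This set is an assumption of the sketch.
OPEN_CODES = {"by", "by-sa", "zero"}


def describe_license(url):
    """Return a dict of permissions for a Creative Commons URL, else None.

    Hypothetical helper: it only inspects the URL path, e.g.
    /licenses/by-sa/4.0/ or /publicdomain/zero/1.0/.
    """
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host not in ("creativecommons.org", "www.creativecommons.org"):
        return None  # custom terms of use: nothing a program can infer
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2 or parts[0] not in ("licenses", "publicdomain"):
        return None
    code = parts[1]
    elements = set(code.split("-"))
    return {
        "code": code,
        "attribution": "by" in elements,
        "share_alike": "sa" in elements,
        "non_commercial": "nc" in elements,
        "no_derivatives": "nd" in elements,
        "open": code in OPEN_CODES,
    }


print(describe_license("https://creativecommons.org/licenses/by-sa/4.0/"))
print(describe_license("https://creativecommons.org/licenses/by-nc/4.0/"))
print(describe_license("https://example.org/custom-terms-of-use"))
```

The last call uses a placeholder URL standing in for any custom terms page: the sketch returns `None` because there is nothing standard to parse, which is exactly the problem with non-standard licenses.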
In this respect, the ADS terms of use are slightly worse than a (standard) non-commercial license: instead of restricting reuse and distribution, they only allow you
1. To use and to make personal copies of any part of the Data Collections only for the purposes of non-commercial research or teaching, as specified in the accompanying application.
which from an ethics point of view is a very slippery slope: it basically prohibits any professional archaeologist from even downloading the data (“make personal copies”). And don’t forget that teaching is not necessarily non-commercial (even when done by universities). I remember having this discussion with the ADS staff years ago, at least as early as 2010, and I’m sorry we haven’t moved forward much. What is more embarrassing, however, is that data papers as a new form of academic literature were introduced for the purpose of encouraging open data, not as a mere academic exercise in multiplying the number of (open access) journals. Of course there are more “insane” requirements in the following points but that’s not really the point. Instead, can we move on to Creative Commons open licenses, please?
As Colleen Morgan succinctly put it:
What about the professional archaeologists among us? They need media [and data] too.
For the record, and to put my own observations about AustArch in a wider perspective, a short and incomplete list of databases of radiocarbon dates follows. None of the databases below is available as a download at the moment I am writing this.
- http://www.archeometrie.mom.fr/banadora/
- http://www.crt.state.la.us/archaeology/radiocarbonDB.aspx
- http://www.canadianarchaeology.ca/
- http://www.waikato.ac.nz/cgi-bin/nzcd/search.pl
- http://pidba.utk.edu/dating.htm
- http://www.museumwales.ac.uk/en/radiocarbon/database/
- http://context-database.uni-koeln.de/
- http://www.gla.ac.uk/centres/nercrcl/results.htm
- https://sites.google.com/site/matthewboulanger/research/vermont-radiocarbon-database
- http://ees.kuleuven.be/geography/projects/14c-palaeolithic/
- http://c14.kikirpa.be/
- https://www.radiocarbon.org/Info/index.html#databases
This is a great observation about the impact of the choice of data license on CRM workers. Victoria Stodden has also dealt with the question of licenses for research output in her ‘Reproducible Research Standard’, which recommends CC-BY for media (text and figures), MIT for code (data analysis scripts) and CC-0 for data. See here for more details: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1362040
There’s a fairly extended discussion of the issues presented here on the working group mailing list: https://lists.okfn.org/pipermail/open-archaeology/2014-July/thread.html with input from several members and the ADS staff.