Webinars


Our webinar series is a joint initiative with the Language Technology and Data Analysis Laboratory (LADAL) at the School of Languages and Cultures, University of Queensland. LADAL-sponsored webinars take place in alternate months.

All webinars take place at 8:00 PM Brisbane time (UTC+10). Zoom links will be available one week prior to the event.

October 3 2022 - Paweł Kamocki: European Union Data Protection initiatives and their consequences for research

Abstract: The European Union, with its large population and GDP, is a leading force in regulatory globalisation. This webinar will discuss recent developments in legal frameworks affecting research data in Europe. Apart from the General Data Protection Regulation, which has become an international standard of personal data protection since its entry into application in 2018, the recent introduction of statutory copyright exceptions for Text and Data Mining will also be discussed. The webinar will also present the most recent changes in EU law, such as the Data Governance Act and the Artificial Intelligence Act, which are expected to enter into application in the coming years.

Paweł Kamocki is a legal expert at the Leibniz-Institut für Deutsche Sprache, Mannheim. He studied linguistics and law, and in 2017 obtained his doctorate in law from the universities of Paris and Münster for a thesis on legal aspects of data-intensive university research, with a focus on Knowledge Commons. He worked as a research and teaching assistant at Paris Descartes University (now: Université de Paris), then also in the private sector. He is certified to work as an attorney in France. An active member of the CLARIN community since 2012, he currently chairs the CLARIN Legal and Ethical Issues Committee. He has also worked with other projects and initiatives in the field of research data policy (RDA, EUDAT) and co-created several LegalTech tools for researchers. One of his main research interests is legal issues in Machine Translation.

Zoom link

August 1 2022 - Václav Cvrček: The Czech National Corpus

Václav Cvrček is a linguist who deals with the description of the Czech language, especially with the use of large electronic corpora and quantitative methods. From 2013 to 2016 he was the director of the Czech National Corpus project, and since 2016 he has been its deputy director. Recently, he has been focusing on research on textual variability and corpus-based discourse analysis with a focus on online media.

June 6 2022 - Barbara McGillivray: The Journal of Open Humanities Data

Barbara McGillivray is a Turing Research Fellow at The Alan Turing Institute, and Editor in Chief of the Journal of Open Humanities Data. Since September 2021 she has also been a lecturer in Digital Humanities and Cultural Computation at the Department of Digital Humanities of King’s College London. Before joining the Turing, she was a language technologist in the Dictionary division of Oxford University Press and a data scientist in the Open Research Group of Springer Nature. Her research at the Turing is on how words change meaning over time and how to model this change in computational ways. She works on machine-learning models for the change in meaning of words in historical times (Ancient Greek, Latin, eighteenth-century English) and in contemporary texts (Twitter, web archives, emoji). Her interdisciplinary contribution covers Data Science, Natural Language Processing, Historical Linguistics and other humanistic fields, to push the boundaries of what academic disciplines separately have achieved so far on this topic.

April 4 2022 - Keoni Mahelona: A practical approach to Indigenous data sovereignty

Keoni Mahelona is the Chief Technical Officer of Te Hiku Media, where he is part of the team developing the Kaitiakitanga Licence. This licence seeks to balance the importance of publicly accessible data with the reality that Indigenous peoples may not have access to the resources that enable them to benefit from public data. By simply opening access to data and knowledge, Indigenous people could be further colonised and taken advantage of in a digital, modern world. Therefore, Keoni is committed to devising data governance regimes which enable Indigenous people to reclaim and maintain sovereignty over Indigenous data.

Forthcoming workshops

Exploring Digital Text Collections with Juxtorpus: A Taster Webinar on the Latest ATAP Text Analysis Tool

Join us for a hybrid taster webinar on the latest addition to the suite of ATAP text analysis tools - Juxtorpus. Juxtorpus was developed to provide a unified framework for managing and exploring text contents and metadata. It offers a Corpus package that enables flexible building, exploration, and slicing of your corpus while maintaining its shape, and a Jux package that allows easy comparison and highlighting of differences between any two corpora, with off-the-shelf tools and visualisations. During the webinar, we’ll also show you how to use other ATAP tools in combination with the Corpus to create a reusable workflow that will boost your analysis capabilities.

This 1.5-hour webinar will come with minimal hands-on opportunities, and we invite anyone interested in learning how to handle and analyse their digital text collections to join us. No programming knowledge or skills are required.

When: 10:30am – 12pm, Thursday, 25th May 2023

How: Hybrid (In-person for University of Sydney participants, Online for other participants)

Where: Zoom link and location will be sent a few days before the event.


Workshop on Language Corpora in Australia

Over decades of work in Australia, significant collections of language data have been amassed, including varieties of Australian English, Australian migrant languages, Australian Indigenous languages, sign languages and others. These collections represent a trove of knowledge not only of language in Australia, but also of Australia’s social and cultural history. And yet, not all are well known and many lack published descriptions. The purpose of this workshop is to provide an opportunity to share information about existing language corpora in Australia, with a view to producing a special issue of the Australian Journal of Linguistics that introduces a selection of these corpora, explores how they can contribute to our understanding of language, society, and history in Australia, and considers avenues that such corpora open up for future research.

This workshop is being run as part of the Language Data Commons of Australia (LDaCA), which is working to build national research infrastructure for the Humanities and Social Sciences, facilitating access to and use of digital language corpora for linguists, scholars across the Humanities and Social Sciences, and non-academics.

Abstract submission

For a 20-minute presentation, please submit a 250-300 word abstract in English (excluding references). The presentation should include the following information:

  • Speech community/fieldsite: Describe the location of the community and/or their brief history in Australia, the languages spoken and/or signed, and their current status.
  • Corpus design principles: Specify the sample size, sociolinguistic background of the participants, method of data collection and/or genre (e.g. sociolinguistic interviews, natural conversations, oral histories, elicited data, etc.); data format (written/spoken/audio/video, etc.) and where it is stored.
  • Corpus findings and implications: Summarise some key findings from the corpus and discuss other insights that might be obtained from the data in current or future work.

Important dates

22 May Abstracts due

5 June Notification of acceptance

3 July Workshop

How to Submit: Please submit your abstract by 22 May on


Please contact either

Office Hours

We invite Australian researchers working with linguistics, text analytics, digital and computational methods, social media and web archives, and much more to attend our regular online office hours, jointly hosted with the Digital Observatory. Bring your technical questions, research problems and rough ideas and get advice and feedback from the combined expertise of our ARDC research infrastructure projects. No question is too small, and even if we don’t know the answer we are likely to be able to point you to someone who does.

These sessions run over Zoom from 2-3pm (Australia/Sydney time) every second Tuesday - details.