Webinars


Our webinar series is a joint initiative with the Language Technology and Data Analysis Laboratory (LADAL) at the School of Languages and Cultures, University of Queensland. LADAL-sponsored webinars take place in alternate months.

All webinars take place at 8:00PM Brisbane time, which is UTC+10. Zoom links will be available one week prior to the event.

October 3 2022 - Paweł Kamocki: European Union Data Protection initiatives and their consequences for research

Abstract: The European Union, with its large population and GDP, is a leading force in regulatory globalisation. This webinar will discuss recent developments in legal frameworks affecting research data in Europe. Apart from the General Data Protection Regulation which, since its entry into application in 2018, has become an international standard of personal data protection, the recent introduction of statutory copyright exceptions for Text and Data Mining will also be discussed. The webinar will also present the most recent changes in EU law, such as the Data Governance Act and the Artificial Intelligence Act, which are expected to enter into application in the coming years.

Paweł Kamocki is a legal expert at the Leibniz-Institut für Deutsche Sprache, Mannheim. He studied linguistics and law, and in 2017 obtained his doctorate in law from the universities of Paris and Münster for a thesis on legal aspects of data-intensive university research, with a focus on Knowledge Commons. He worked as a research and teaching assistant at Paris Descartes University (now: Université de Paris), and then in the private sector. He is certified to work as an attorney in France. An active member of the CLARIN community since 2012, he currently chairs the CLARIN Legal and Ethical Issues Committee. He has also worked with other projects and initiatives in the field of research data policy (RDA, EUDAT) and co-created several LegalTech tools for researchers. One of his main research interests is legal issues in Machine Translation.

Zoom link

August 1 2022 - Václav Cvrček: The Czech National Corpus

Václav Cvrček is a linguist who deals with the description of the Czech language, especially with the use of large electronic corpora and quantitative methods. From 2013 to 2016 he was the director of the Czech National Corpus project, and since 2016 he has been its deputy director. Recently, he has been focusing on research on textual variability and corpus-based discourse analysis with a focus on online media.

June 6 2022 - Barbara McGillivray: The Journal of Open Humanities Data

Barbara McGillivray is a Turing Research Fellow at The Alan Turing Institute, and Editor in Chief of the Journal of Open Humanities Data. Since September 2021 she has also been a lecturer in Digital Humanities and Cultural Computation at the Department of Digital Humanities of King’s College London. Before joining the Turing, she was a language technologist in the Dictionary division of Oxford University Press and a data scientist in the Open Research Group of Springer Nature. Her research at the Turing is on how words change meaning over time and how to model this change in computational ways. She works on machine-learning models for the change in meaning of words in historical times (Ancient Greek, Latin, eighteenth-century English) and in contemporary texts (Twitter, web archives, emoji). Her interdisciplinary contribution covers Data Science, Natural Language Processing, Historical Linguistics and other humanistic fields, to push the boundaries of what academic disciplines separately have achieved so far on this topic.

April 4 2022 - Keoni Mahelona: A practical approach to Indigenous data sovereignty

Keoni Mahelona is the Chief Technical Officer of Te Hiku Media, where he is part of the team developing the Kaitiakitanga Licence. This licence seeks to balance the importance of publicly accessible data with the reality that Indigenous peoples may not have access to the resources that enable them to benefit from public data. By simply opening access to data and knowledge, Indigenous people could be further colonised and taken advantage of in a digital, modern world. Therefore, Keoni is committed to devising data governance regimes which enable Indigenous people to reclaim and maintain sovereignty over Indigenous data.

Forthcoming workshops

Jefferson Transcript Search Tool

The Search Tool project uses programming to explore how to easily search and manipulate transcripts without the need to ‘clean’ the transcript. A browser-based tool has been developed, designed to be used by researchers unfamiliar with programming.

The workshop will include a presentation about the process of development, and an interactive technical demonstration where you will be able to use the tool with a transcript of your own. After the demonstration, there will be a facilitated discussion about future work on the tool and the implications of using it as part of an analytic workflow, particularly for collections-building.

You won’t need any technical knowledge to enjoy this workshop, but familiarity with basic computer usage will be helpful. It is intended for Ethnomethodology / Conversation Analysis practitioners familiar with Jefferson transcripts; however, anyone with an interest is welcome.

The workshop will be presented by Evelyn Ansell and is an outcome of her Career Development placement with Australia’s Academic and Research Network (AARNet). The Jupyter Notebook tool and this workshop were developed during that placement.

When: Friday 17 March 2023, 1:30PM - 3:30PM (AEST)

Where: University of Queensland, Chamberlain Building 35, Room 104


A hands-on guide to Semantic Tagger for your text data analysis

The Australian Text Analytics Platform (ATAP) project aims to provide researchers with the tools and training for analysing, processing, and exploring text. As part of this project, we have adapted, with permission, a Semantic Tagger developed by the University Centre for Computer Corpus Research on Language (UCREL) at Lancaster University. This tool uses the Python Multilingual UCREL Semantic Analysis System (PyMUSAS) to tag your text data so that you can extract token-level semantic tags from your text. In addition to the USAS tags, this tool can also recognize Multi-Word Expressions (MWE), i.e., expressions formed by two or more words that behave like a unit, such as ‘South Australia’, and it identifies lemmas and Part-of-Speech (POS) tags in the text. For example, in the sentence ‘President Joe Biden attended two meetings today’, the tool will tag each token with its semantic tag as follows: ‘President Joe Biden’: MWE of [Personal names], ‘attended’: [Participating], ‘two’: [Number], ‘meetings’: [Participating] and ‘today’: [Time: Present; simultaneous]. This tool is available in both English and multilingual (Chinese, Italian and Spanish) versions and supports saving the results locally for further analysis, enabling you to gain meaningful insights into your research questions.
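To give a rough feel for what token-level semantic tagging with Multi-Word Expression matching involves, here is a minimal Python sketch built around the example sentence above. It does not use PyMUSAS or the real USAS lexicon: the two small dictionaries, the function name `tag_tokens` and the fallback label "Unmatched" are all hypothetical and exist only for illustration. The actual tool also produces lemmas and POS tags, which this sketch leaves out.

```python
# Toy lexicons for illustration only; these are NOT the USAS lexicon
# and this is NOT the PyMUSAS API.
MWE_LEXICON = {
    ("president", "joe", "biden"): "Personal names",
}
WORD_LEXICON = {
    "attended": "Participating",
    "two": "Number",
    "meetings": "Participating",
    "today": "Time: Present; simultaneous",
}


def tag_tokens(tokens):
    """Return (expression, tag, is_mwe) tuples, trying the longest MWE first."""
    results = []
    longest_mwe = max((len(key) for key in MWE_LEXICON), default=1)
    i = 0
    while i < len(tokens):
        matched = False
        # Try multi-word candidates from longest to shortest (minimum 2 words).
        for n in range(min(longest_mwe, len(tokens) - i), 1, -1):
            key = tuple(token.lower() for token in tokens[i:i + n])
            if key in MWE_LEXICON:
                results.append((" ".join(tokens[i:i + n]), MWE_LEXICON[key], True))
                i += n
                matched = True
                break
        if not matched:
            # Fall back to a single-word lookup; "Unmatched" is a placeholder label.
            tag = WORD_LEXICON.get(tokens[i].lower(), "Unmatched")
            results.append((tokens[i], tag, False))
            i += 1
    return results


sentence = "President Joe Biden attended two meetings today"
tagged = tag_tokens(sentence.split())
for expression, tag, is_mwe in tagged:
    label = f"MWE of [{tag}]" if is_mwe else f"[{tag}]"
    print(f"{expression}: {label}")
```

Matching the longest multi-word candidate before falling back to single words is what keeps ‘President Joe Biden’ together as one unit instead of three separately tagged tokens.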

When: Wednesday 22 March 2023, 3:00PM - 4:30PM (AEDT)

Where: Online

Details and Registration

Office Hours

We invite Australian researchers working with linguistics, text analytics, digital and computational methods, social media and web archives, and much more to attend our regular online office hours, jointly hosted with the Digital Observatory. Bring your technical questions, research problems and rough ideas and get advice and feedback from the combined expertise of our ARDC research infrastructure projects. No question is too small, and even if we don’t know the answer we are likely to be able to point you to someone who does.

These sessions run over Zoom from 2-3pm (Australia/Sydney time) every second Tuesday - details.