Workshop on Data Migration Skills
The Language Data Commons of Australia infrastructure is based on widely accepted standards such as Research Object Crates (RO-Crate) and the Oxford Common File Layout (OCFL). The project has been running for almost two years and the processes and tools for aligning data with these standards are now well-developed. This workshop aims to show the application of those tools to data in a variety of formats to efficiently migrate material to the LDaCA standards.
The workshop is intended for data librarians and other professionals who are interested in these issues, but participation is open to anyone. We hope that participants will have some experience of working with code and/or metadata and that they will be able to bring datasets which they work with (or are responsible for) to use in the practical exercises which will make up a large part of the workshop. However, we will have example datasets available, and we are also open to participants working in teams based on complementary skills.
The workshop will run over two days; the first day will be a hybrid event and the second day will be for in-person attendees only.
Day 1: Presentations / demonstrations (in person preferred, but hybrid available)
- Overview of tech stack
- Survey of tools and demos of individual components
- Intro to PARADISEC practices
- Using spreadsheets for metadata
- Show and tell of datasets from participants
Day 2: Practical (in person): We will run practical sessions with materials provided by the participants where possible, or choose example datasets and provide customised training.
When: July 20 and 21, 2023
Where: Engma Room, Coombs Building, Australian National University
Registration: If you wish to attend both days, you need to register for each day separately.
Our webinar series is a joint initiative with the Language Technology and Data Analysis Laboratory (LADAL) at the School of Languages and Cultures, The University of Queensland.
October 3 2022 - Paweł Kamocki: European Union Data Protection initiatives and their consequences for research
Abstract: The European Union, with its large population and GDP, is a leading force in regulatory globalisation. This webinar will discuss recent developments in legal frameworks affecting research data in Europe. Apart from the General Data Protection Regulation which, since its entry into application in 2018, has become an international standard of personal data protection, the recent introduction of statutory copyright exceptions for Text and Data Mining will also be discussed. The webinar will also include a presentation of the most recent changes in EU law, such as the Data Governance Act and the Artificial Intelligence Act, which are expected to enter into application in the coming years.
Paweł Kamocki is a legal expert at the Leibniz-Institut für Deutsche Sprache, Mannheim. He studied linguistics and law, and in 2017 obtained his doctorate in law from the universities of Paris and Münster for a thesis on legal aspects of data-intensive university research, with a focus on Knowledge Commons. He worked as a research and teaching assistant at the Paris Descartes university (now: Université de Paris), then also in the private sector. He is certified to work as an attorney in France. An active member of the CLARIN community since 2012, he currently chairs the CLARIN Legal and Ethical Issues Committee. He has also worked with other projects and initiatives in the field of research data policy (RDA, EUDAT) and co-created several LegalTech tools for researchers. One of his main research interests is legal issues in Machine Translation.
August 1 2022 - Václav Cvrček: The Czech National Corpus
Václav Cvrček is a linguist who deals with the description of the Czech language, especially with the use of large electronic corpora and quantitative methods. From 2013 to 2016 he worked as the director of the Czech National Corpus project; since 2016 he has been the deputy director. Recently, he has been focusing on research on textual variability and corpus-based discourse analysis with a focus on online media.
June 6 2022 - Barbara McGillivray: The Journal of Open Humanities Data
Barbara McGillivray is a Turing Research Fellow at The Alan Turing Institute, and Editor in Chief of the Journal of Open Humanities Data. Since September 2021 she has also been a lecturer in Digital Humanities and Cultural Computation at the Department of Digital Humanities of King’s College London. Before joining the Turing, she was a language technologist in the Dictionary division of Oxford University Press and a data scientist in the Open Research Group of Springer Nature. Her research at the Turing is on how words change meaning over time and how to model this change in computational ways. She works on machine-learning models for the change in meaning of words in historical times (Ancient Greek, Latin, eighteenth-century English) and in contemporary texts (Twitter, web archives, emoji). Her interdisciplinary contribution covers Data Science, Natural Language Processing, Historical Linguistics and other humanistic fields, to push the boundaries of what academic disciplines separately have achieved so far on this topic.
April 4 2022 - Keoni Mahelona: A practical approach to Indigenous data sovereignty
Keoni Mahelona is the Chief Technical Officer of Te Hiku Media where he is a part of the team developing the Kaitiakitanga Licence. This licence seeks to balance the importance of publicly accessible data with the reality that Indigenous peoples may not have access to the resources that enable them to benefit from public data. By simply opening access to data and knowledge, Indigenous people could be further colonised and taken advantage of in a digital, modern world. Therefore, Keoni is committed to devising data governance regimes which enable Indigenous people to reclaim and maintain sovereignty over Indigenous data.
We invite Australian researchers working with linguistics, text analytics, digital and computational methods, social media and web archives, and much more to attend our regular online office hours, jointly hosted with the Digital Observatory. Bring your technical questions, research problems and rough ideas and get advice and feedback from the combined expertise of our ARDC research infrastructure projects. No question is too small, and even if we don’t know the answer we are likely to be able to point you to someone who does.
These sessions run over Zoom from 2-3pm (Australia/Sydney time) every second Tuesday.