From (variationist) linguistic research to a science of linguistic variation: LIÖ and DiÖ through the lens of reproducibility and replicability


Reproducibility, a concept popularized in computational research, serves as an extension of replicability, which refers to the ability to obtain the same or similar results when an experiment is conducted again. Replicability is an important benchmark of the scientific method and a gold standard for ensuring that results can be considered accurate. However, as has been pointed out before (cf. Bisang 2011), replicability is unattainable in empirical linguistics, as “the examples that constitute the data basis are part of a text that is unique and thus is beyond reproducibility”¹ (Bisang 2011, 253). A 2018 position paper by several linguists (Berez-Kroeker et al. 2018) nevertheless lays out a valid alternative to replicability that lends itself well to (empirical) linguistics: reproducibility. Replicable research methods are methods “which can be recreated elsewhere by other scientists, leading to new data” (Berez-Kroeker et al. 2018, 4). Reproducible research, by contrast, shares with others the original research data from which its conclusions are derived, allowing them to conduct independent analyses of that data and to check whether they reach the same conclusions.

Gezelter (2009) highlighted the need for reproducibility when he argued for open coding practices in computer science: if an experiment relies on commercial, closed-source code, it is reproducible only in theory, because recreating that code would require a tremendous amount of time. A result stemming from closed-source code therefore, in his words, “may be research, and it may be important, but unless enough details of the experimental methodology are made available so that it can be subjected to true reproducibility tests by skeptics, it isn’t Science”. This argument transfers to (empirical) linguistics, where data collection is difficult, expensive, or, in the case of historical data, downright impossible. Moreover, as Bisang (2011) points out, even replicable linguistic research methods are not fully replicable.

The project “Deutsch in Österreich/German in Austria” (DiÖ) (cf. Budin/Elspaß/Lenz/Newerkla/Ziegler 2018a), on the one hand, and the “Wörterbuch der bairischen Mundarten in Österreich/Dictionary of Bavarian Dialects in Austria” (WBÖ) (cf. Stöckle 2021) with its publication platform LIÖ (“Lexikalisches Informationssystem Österreich/Lexical Information System Austria”), on the other, share a similar object of investigation, namely linguistic diversity in Austria. However, almost 100 years lie between the starts of the two projects. This directly affects how standards of reproducibility or replicability can be taken into account, partly because of the methodological and historical background of each project. The presentation will examine how both projects deal with research data in light of the concepts of replicability and reproducibility.

¹ Bisang (like many others) uses the term ‘reproducibility’ in the sense of ‘replicability’.


