Published September 7, 2022 | Version v1
Conference paper Open

Topic modeling and classification of scientific disciplines

  • 1. Czech Academy of Sciences
  • 2. Observatoire sociologique du changement

Description

The outcomes of both experiments suggest that topics derived from purely textual data implicitly capture information about disciplines. This property of topic modelling is especially useful for datasets where disciplinary information is unavailable or unreliable and where citation records are absent (as remains the case especially in the Humanities). Even if topics cannot fully reconstruct the originally assigned disciplines, they still provide reliable pointers to the broader field to which a document belongs. Another advantage is that, once a topic model has been trained, no further models need to be built, because the deterministic divergence measure offers an adequate solution. However, a wider range of metrics and experiments with other datasets will be needed to fully assess the performance of topics as predictors of disciplines.
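As an illustration of how a divergence measure can assign a disciplinary label without training a second model, the following minimal sketch compares a document's topic proportions with the average topic proportions of each discipline and picks the closest one. The description does not name the divergence measure used in the paper; Jensen-Shannon divergence is an assumption here, and the function names, labels, and toy topic proportions are hypothetical.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    Assumption: the source does not say which divergence it uses.
    """
    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def discipline_centroids(doc_topics, labels):
    """Average the topic proportions of the documents in each discipline."""
    sums, counts = {}, {}
    for dist, label in zip(doc_topics, labels):
        acc = sums.setdefault(label, [0.0] * len(dist))
        for i, value in enumerate(dist):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def predict_discipline(dist, centroids):
    """Return the discipline whose centroid diverges least from `dist`."""
    return min(centroids, key=lambda lab: js_divergence(dist, centroids[lab]))

# Toy data (hypothetical): three topics, two disciplines.
doc_topics = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.2, 0.1, 0.7]]
labels = ["History", "History", "Physics", "Physics"]
centroids = discipline_centroids(doc_topics, labels)
prediction = predict_discipline([0.6, 0.3, 0.1], centroids)
```

The prediction is deterministic in the sense that, given a fixed topic model and fixed centroids, the same document always receives the same label.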

The results of this preliminary analysis allow for two possible conjectures. A conservative perspective posits that topic models provide a representation of scholarly documents that researchers can use either to deterministically predict disciplinary labels or to develop clustering solutions that plausibly mirror a system of scientific disciplines. A more radical implication is that proportional assignment of scholarly work to empirically constructed topics provides a viable alternative to strict disciplinary classifications of various provenance. Topic models seem to be well suited to capturing the inevitable fluidity and interdisciplinarity of scientific knowledge production. Methodologically, we propose that nested systems of classification offer an expedient framework for evaluating automated predictions of labels with fuzzy boundaries, such as scientific disciplines.
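One way to read the proposal about nested classification systems is to score predictions at more than one level of a hierarchy, so that a label that misses the exact discipline but lands in the right broader field still counts at the coarser level. The sketch below assumes a two-level hierarchy; the mapping, labels, and function name are hypothetical and are not taken from the paper.

```python
# Hypothetical two-level hierarchy: discipline -> broader field.
FIELD_OF = {
    "History": "Humanities",
    "Philosophy": "Humanities",
    "Physics": "Natural sciences",
    "Chemistry": "Natural sciences",
}

def nested_accuracy(true_labels, predicted_labels, parent=FIELD_OF):
    """Accuracy at the discipline level and at the broader field level."""
    pairs = list(zip(true_labels, predicted_labels))
    exact = sum(t == p for t, p in pairs)
    same_field = sum(parent[t] == parent[p] for t, p in pairs)
    return {"discipline": exact / len(pairs), "field": same_field / len(pairs)}

# Toy labels (hypothetical).
truth = ["History", "Philosophy", "Physics", "Chemistry"]
preds = ["History", "History", "Chemistry", "Philosophy"]
scores = nested_accuracy(truth, preds)
```

In this toy case only one of four predictions matches the exact discipline, while three of four fall within the correct broader field, which is the kind of gap such a nested evaluation is meant to expose.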

Files

223.pdf (494.4 kB)
md5:6a4f014842988b1459079cfb33010aee