OAPEN usage statistics

Author

matti.lassila@tuni.fi

Modified

February 16, 2024

Reproduction of analysis published in Snijder, R., 2023. Measured in a context: making sense of open access book data. Insights: the UKSG journal, 36(1), p.20. https://doi.org/10.1629/uksg.627

Source code available at https://github.com/mjlassila/oapen-usage-stats.

Data downloaded from https://doi.org/10.5281/zenodo.7799222.

Overview of the data

Let’s create some basic descriptive tables to check the structure of the data. From the result tables published in the article, we know which values each variable should take.

The variable language_simple should take only the values English, German and Other, so summarizing the data by language_simple should give three rows. The summary table has 457 rows, however, which suggests that there are errors in the data.

Let’s look at the most frequent values of language_simple.

language_simple n
English 11029
German 4521
Other 1108
720 721
40 6
113 5
239 5
47 5
84 5
110 4

Apparently some rows have erroneous values, at least in the language_simple variable.
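A check of this kind can be sketched as follows. The data frame `oapen_raw` and its rows are made up for illustration; only the column name language_simple and the three expected values come from the analysis.

```r
library(dplyr)

# Hypothetical stand-in for the raw dataset; only the column name is real.
oapen_raw <- data.frame(
  language_simple = c("English", "English", "German", "Other", "720", "720", "40")
)

expected_languages <- c("English", "German", "Other")

# Count rows per value, most frequent first.
language_counts <- oapen_raw %>%
  count(language_simple, sort = TRUE)

# Values outside the expected set point to damaged rows.
unexpected_languages <- language_counts %>%
  filter(!language_simple %in% expected_languages)

print(unexpected_languages)
```

Summing the `n` column of `unexpected_languages` gives the number of rows that need a closer look.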

The classification variable (i.e. the general subject of the publication) should have 13 possible values.

classification n
A The arts 1205
C Language 643
D Literature & literary studies 1446
G Reference, information & interdisciplinary subjects 534
H Humanities 3482
J Society & social sciences 5361
K Economics, finance, business & management 1477
L Law 602
M Medicine 602
P Mathematics & science 668
R Earth sciences, geography, environment, planning 463
T Technology, engineering, agriculture 591
Total_Item_Requests 636
U Computing & information technology 304

Surprisingly, the value Total_Item_Requests appears among the actual classification codes.
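One way to catch stray values like this without listing all 13 codes is to test the shape of the value. The sketch below assumes that every valid classification is a single capital letter, a space, and a label, as in the table above; the sample vector is illustrative.

```r
# Illustrative sample of classification values; labels copied from the table above.
classification <- c(
  "H Humanities", "J Society & social sciences", "L Law",
  "Total_Item_Requests", "M Medicine", "Total_Item_Requests"
)

# Assumption: a valid value is one capital letter, a space, then a label.
looks_valid <- grepl("^[A-Z] \\S", classification)

# Tabulate the values that fail the pattern.
invalid_values <- table(classification[!looks_valid])
print(invalid_values)
```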

Let’s see what the erroneous rows look like. In total, 1356 rows might contain erroneous data.

Sample of erroneous dataset rows
title doi total_2022 classification language_simple
The Prague School and Theories of Structure 10.14220/9783862347049 Knowledge Unlatched,b818ba9d-2dd9-4fd7-a364-7f305aef7ee9 Total_Item_Requests 1800
Financiranje projektov in inovacij v pametnih občinah 10.4335/978-961-6842-38-9 46 J Society & social sciences 720
Minder pretentie, meer ambitie: ontwikkelingshulp die verschil maakt 10.5117/9789089642264 8 J Society & social sciences 720
La storia del MOICA come storia delle casalinghe italiane. Un’analisi storico-sociale del lavoro familiare NA 236 J Society & social sciences 720
Commentaar op de nota Contouren van een toekomstig onderwijsbeste 10.26530/OAPEN_438949 13 J Society & social sciences 720
Van maakbaar naar betekenisvol bestuur - 63 NA 113 J Society & social sciences 720
Language-Learner Computer Interactions 10.1075/lsse.2 Knowledge Unlatched,b818ba9d-2dd9-4fd7-a364-7f305aef7ee9 Total_Item_Requests 804
Spreading the Written Word: Mikael Agricola and the Birth of Literary Finnish 10.21435/sflin.19 Helsinki University Library and SKS,2bce7b2b-181b-47a2-a1b1-2fe3ca87467d Total_Item_Requests 402
Origins of Human Language 10.3726/b12405 Knowledge Unlatched,b818ba9d-2dd9-4fd7-a364-7f305aef7ee9 Total_Item_Requests 599
«Candide», «La fée carabine» et les autres 10.3726/978-3-0352-0258-8 71 J Society & social sciences 720

In the erroneous rows, total_2022 appears either to contain what looks like funding data or to be empty, while language_simple contains numeric values.
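The two symptoms described above can be turned into row-level flags. This is a minimal sketch on made-up rows, not the cleaning code used for the analysis; the flag names and the toy values are assumptions.

```r
library(dplyr)

# Hypothetical rows modelled on the sample above; the real data has more columns.
oapen_raw <- data.frame(
  title = c("Clean title", "Shifted title A", "Shifted title B"),
  total_2022 = c("326", "Some funder,some-identifier", "46"),
  classification = c("H Humanities", "Total_Item_Requests", "J Society & social sciences"),
  language_simple = c("English", "1800", "720")
)

flagged <- oapen_raw %>%
  mutate(
    # TRUE when total_2022 cannot be read as a number (funder text or empty).
    downloads_not_numeric = is.na(suppressWarnings(as.numeric(total_2022))),
    # TRUE when language_simple consists only of digits.
    language_is_numeric = grepl("^[0-9]+$", language_simple),
    suspect = downloads_not_numeric | language_is_numeric
  )

erroneous_rows <- filter(flagged, suspect)
clean_rows <- filter(flagged, !suspect)
```

Keeping the two flags as separate columns makes it possible to see which symptom caught each row before deciding how to repair it.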

Data from Snijder, R., 2023

library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe

# Move the grouping and summary columns to the front.
oapen_downloads_per_group %<>%
  relocate(classification, language_simple, titles_in_lang_subject_grp, min, max, median)

# Number of titles and median 2022 downloads per subject and language group.
knitr::kable(
  oapen %>%
    group_by(classification, language_simple) %>%
    summarise(
      number_of_titles = n(),
      median_downloads = median(total_2022),
      .groups = "drop"
    ),
  col.names = c("Subject", "Language", "N", "Median"),
  caption = "OAPEN Library titles by language and subject, calculated using cleaned dataset"
)
OAPEN Library titles by language and subject, calculated using cleaned dataset
Subject Language N Median
A The arts English 814 326.5
A The arts German 286 164.5
A The arts Other 105 158.0
C Language German 494 101.5
C Language Other 149 186.0
D Literature & literary studies English 821 266.0
D Literature & literary studies German 460 115.5
D Literature & literary studies Other 165 245.0
G Reference, information & interdisciplinary subjects English 408 431.0
G Reference, information & interdisciplinary subjects German 90 156.0
G Reference, information & interdisciplinary subjects Other 36 127.5
H Humanities English 2350 325.0
H Humanities German 785 148.0
H Humanities Other 347 170.0
J Society & social sciences English 3249 382.0
J Society & social sciences German 1392 209.5
K Economics, finance, business & management English 900 347.0
K Economics, finance, business & management German 467 115.0
K Economics, finance, business & management Other 110 33.0
L Law English 324 334.5
L Law German 240 87.5
L Law Other 38 135.0
M Medicine English 513 130.0
M Medicine German 75 238.0
M Medicine Other 14 138.0
P Mathematics & science English 560 202.0
P Mathematics & science German 80 123.0
P Mathematics & science Other 28 185.5
R Earth sciences, geography, environment, planning English 343 316.0
R Earth sciences, geography, environment, planning German 62 133.5
R Earth sciences, geography, environment, planning Other 58 172.5
T Technology, engineering, agriculture English 462 173.0
T Technology, engineering, agriculture German 74 155.0
T Technology, engineering, agriculture Other 55 238.0
U Computing & information technology English 285 308.0
U Computing & information technology German 16 260.5
U Computing & information technology Other 3 95.0

Comparing the results calculated from the cleaned data with Table 1 of the published article, the erroneous rows (n = 1356) appear to belong to two groups: C Language + English, and J Society & social sciences + Other.

Table 1 of the published article lists 636 titles in the C Language + English group and 720 titles in the J Society & social sciences + Other group (636 + 720 = 1356), whereas the tables built from the cleaned data contain neither group.
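The comparison can also be done mechanically with an anti-join on the two grouping columns. In the sketch below the two article counts are the ones quoted in the text and the cleaned-data counts are taken from the table above; the data frame names are hypothetical, and the third article row carries no count because it is not quoted here.

```r
library(dplyr)

# Groups as reported in the article (counts quoted in the text; NA = not quoted here).
article_groups <- data.frame(
  classification = c("C Language", "J Society & social sciences", "C Language"),
  language_simple = c("English", "Other", "German"),
  n_article = c(636, 720, NA)
)

# A few groups from the table calculated using the cleaned dataset.
cleaned_groups <- data.frame(
  classification = c("C Language", "C Language",
                     "J Society & social sciences", "J Society & social sciences"),
  language_simple = c("German", "Other", "English", "German"),
  n_cleaned = c(494, 149, 3249, 1392)
)

# Groups that the article reports but the cleaned data lacks.
missing_groups <- anti_join(
  article_groups, cleaned_groups,
  by = c("classification", "language_simple")
)

print(missing_groups)
sum(missing_groups$n_article)
```

The final sum should equal the number of rows set aside as erroneous if those rows account for the whole difference.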

Comparing publishers

It might be interesting to make comparisons between publishers with similar publishing profiles.

First we must check whether the erroneous part of the dataset contains rows we need for the comparison.

In this case, we are interested in titles published by Tampere University Press and Cappelen Damm Akademisk.

title publisher language classification
Vitenskapelig (u)redelighet Cappelen Damm Akademisk/NOASP (Nordic Open Access Scholarly Publishing) Uncoded languages J Society & social sciences
Chapter 7 Hva sier de til seg selv? Cappelen Damm Akademisk/NOASP (Nordic Open Access Scholarly Publishing) Norwegian J Society & social sciences
Sosiaalipolitiikan lumo Tampere University Press Finnish J Society & social sciences
Fra kollektiv til konnektiv handling? Cappelen Damm Akademisk/NOASP (Nordic Open Access Scholarly Publishing) Norwegian J Society & social sciences
Sosiaaliturvariippuvuus Tampere University Press Finnish J Society & social sciences
Statlig politikk og lokale utfordringer Cappelen Damm Akademisk/NOASP (Nordic Open Access Scholarly Publishing) Norwegian J Society & social sciences
Hvorfor vokser steder? Cappelen Damm Akademisk/NOASP (Nordic Open Access Scholarly Publishing) Uncoded languages J Society & social sciences
Chapter 9 Sammenhengen mellom mindfulness og eksekutiv funksjon hos profesjonelle fotballspillere Cappelen Damm Akademisk/NOASP (Nordic Open Access Scholarly Publishing) Norwegian J Society & social sciences
Folkevalgt og politisk leder Cappelen Damm Akademisk/NOASP (Nordic Open Access Scholarly Publishing) Norwegian J Society & social sciences
Konflikt, fellesskap og forandring Cappelen Damm Akademisk/NOASP (Nordic Open Access Scholarly Publishing) Norwegian J Society & social sciences

Because 76 rows in the erroneous part of the data relate to either Tampere University Press or Cappelen Damm Akademisk, we must fix these rows and include them in our analysis to get a full view of the available data.
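Counting the affected rows per publisher might look like the sketch below. The data frame `erroneous_rows` and its titles are placeholders; the publisher strings follow the table above, and matching is done on a substring because the NOASP publisher name is longer than "Cappelen Damm Akademisk".

```r
library(dplyr)

# Hypothetical erroneous rows; publisher names follow the table above.
erroneous_rows <- data.frame(
  title = c("Title A", "Title B", "Title C"),
  publisher = c(
    "Cappelen Damm Akademisk/NOASP (Nordic Open Access Scholarly Publishing)",
    "Tampere University Press",
    "Some Other Press"
  )
)

publishers_of_interest <- c("Tampere University Press", "Cappelen Damm Akademisk")

# Build one regular expression that matches either publisher name.
publisher_pattern <- paste(publishers_of_interest, collapse = "|")

affected <- erroneous_rows %>%
  filter(grepl(publisher_pattern, publisher))

# Number of affected rows per publisher.
affected_counts <- count(affected, publisher)
print(affected_counts)
```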

Published OAPEN data for Cappelen Damm Akademisk/NOASP 2022

Year 2022 downloads per subject group, items published by NOASP
Subject N Min Max Median downloads Total downloads
A The arts 6 14 221 166.5 799
C Language 4 54 276 122.0 574
D Literature & literary studies 2 73 214 143.5 287
G Reference, information & interdisciplinary subjects 3 109 549 122.0 780
H Humanities 21 23 448 118.0 3183
J Society & social sciences 53 24 1148 97.0 10929
K Economics, finance, business & management 5 94 300 255.0 1143
M Medicine 2 166 480 323.0 646
R Earth sciences, geography, environment, planning 1 451 451 451.0 451

Published OAPEN data: Tampere University Press 2022

Year 2022 downloads per subject group, items published by TUP
Subject N Min Max Median downloads Total downloads
C Language 1 209 209 209 209
D Literature & literary studies 1 49 49 49 49
G Reference, information & interdisciplinary subjects 1 442 442 442 442
H Humanities 5 47 462 117 1007
J Society & social sciences 23 36 4030 204 11378
K Economics, finance, business & management 5 105 417 202 1183
M Medicine 1 246 246 246 246
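Per-subject tables with these columns can be produced with a grouped summary. The sketch below uses made-up download counts and a hypothetical data frame `publisher_titles`; only the column names classification and total_2022 mirror the dataset.

```r
library(dplyr)

# Hypothetical titles for one publisher; download counts are invented.
publisher_titles <- data.frame(
  classification = c("H Humanities", "H Humanities", "H Humanities", "M Medicine"),
  total_2022 = c(40, 100, 160, 250)
)

downloads_by_subject <- publisher_titles %>%
  group_by(classification) %>%
  summarise(
    n_titles = n(),
    min_downloads = min(total_2022),
    max_downloads = max(total_2022),
    median_downloads = median(total_2022),
    total_downloads = sum(total_2022),
    .groups = "drop"
  )

print(downloads_by_subject)
```

With only a handful of titles per subject, as in most rows above, the median and the total are driven by one or two books, so Min and Max are worth reading alongside them.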

Data from publisher statistics dashboard

Let’s create comparison statistics for Tampere University Press using data downloaded from the OAPEN publisher statistics dashboard.

Year 2020 downloads per subject group, items published by TUP
Subject N Min Max Median downloads Total downloads
C Language 1 105 105 105.0 105
D Literature & literary studies 2 29 124 76.5 153
G Reference, information & interdisciplinary subjects 1 300 300 300.0 300
H Humanities 6 24 523 135.0 1143
J Society & social sciences 26 14 887 107.5 4980
K Economics, finance, business & management 5 17 1531 188.0 2061
M Medicine 1 107 107 107.0 107
Year 2021 downloads per subject group, items published by TUP
Subject N Min Max Median downloads Total downloads
C Language 1 226 226 226.0 226
D Literature & literary studies 2 43 459 251.0 502
G Reference, information & interdisciplinary subjects 1 486 486 486.0 486
H Humanities 6 50 976 259.5 2281
J Society & social sciences 27 23 1851 221.0 10999
K Economics, finance, business & management 6 101 2925 203.5 4114
M Medicine 1 359 359 359.0 359
Year 2022 downloads per subject group, items published by TUP
Subject N Min Max Median downloads Total downloads
C Language 1 209 209 209 209
D Literature & literary studies 2 49 439 244 488
G Reference, information & interdisciplinary subjects 1 442 442 442 442
H Humanities 6 47 900 209 1907
J Society & social sciences 27 36 4030 204 12977
K Economics, finance, business & management 6 105 3029 251 4212
M Medicine 1 246 246 246 246
Year 2023 downloads per subject group, items published by TUP
Subject N Min Max Median downloads Total downloads
C Language 1 131 131 131.0 131
D Literature & literary studies 2 40 306 173.0 346
G Reference, information & interdisciplinary subjects 1 496 496 496.0 496
H Humanities 6 27 785 180.5 1548
J Society & social sciences 27 37 3092 197.0 12007
K Economics, finance, business & management 6 104 2917 168.5 3840
M Medicine 1 217 217 217.0 217