OAPEN usage statistics
Reproduction of the analysis published in Snijder, R., 2023. Measured in a context: making sense of open access book data. Insights: the UKSG journal, 36(1), p.20. https://doi.org/10.1629/uksg.627
Source code available at https://github.com/mjlassila/oapen-usage-stats.
Data downloaded from https://doi.org/10.5281/zenodo.7799222.
Overview of the data
Let’s create some basic descriptive tables to check the structure of the data. From the result tables published in the article, we know which values each variable should take.
The variable `language_simple` should only take the values English, German and Other, so summarizing the data by `language_simple` should produce three rows. The summary table has 457 rows, however, which suggests that there are errors in the data.
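The check described above can be sketched in base R. This is a minimal illustration, not the code used for this report: the toy data frame `books` and the names `expected_languages` and `unexpected_languages` are hypothetical. `table()` counts rows per value, and `setdiff()` lists values outside the expected set.

```r
# Hypothetical toy data standing in for the OAPEN dataset
books <- data.frame(
  language_simple = c("English", "German", "Other", "English", "720", "40"),
  stringsAsFactors = FALSE
)

expected_languages <- c("English", "German", "Other")

# Count rows per value, most frequent first
language_counts <- sort(table(books$language_simple), decreasing = TRUE)

# Values that should not occur in language_simple at all
unexpected_languages <- setdiff(unique(books$language_simple), expected_languages)

# With clean data this would be 3
n_distinct_languages <- length(language_counts)

print(language_counts)
print(unexpected_languages)
```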
Let’s take a look at the values of `language_simple`.
language_simple | n |
---|---|
English | 11029 |
German | 4521 |
Other | 1108 |
720 | 721 |
40 | 6 |
113 | 5 |
239 | 5 |
47 | 5 |
84 | 5 |
110 | 4 |
Apparently some rows have erroneous values, at least in the `language_simple` variable.
The `classification` variable (i.e. the general subject of the publication) should have 13 possible values.
classification | n |
---|---|
A The arts | 1205 |
C Language | 643 |
D Literature & literary studies | 1446 |
G Reference, information & interdisciplinary subjects | 534 |
H Humanities | 3482 |
J Society & social sciences | 5361 |
K Economics, finance, business & management | 1477 |
L Law | 602 |
M Medicine | 602 |
P Mathematics & science | 668 |
R Earth sciences, geography, environment, planning | 463 |
T Technology, engineering, agriculture | 591 |
Total_Item_Requests | 636 |
U Computing & information technology | 304 |
Surprisingly, the value `Total_Item_Requests` appears among the actual classification codes.
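Such stray values can be spotted with a pattern check. The sketch below assumes that every valid value starts with a single uppercase subject letter followed by a space, as in the table above; the sample vector `classification` is hypothetical.

```r
# Hypothetical sample of classification values
classification <- c("A The arts", "J Society & social sciences",
                    "Total_Item_Requests", "U Computing & information technology")

# Valid values appear to start with a one-letter subject code and a space
is_valid_code <- grepl("^[A-Z] ", classification)

invalid_classifications <- unique(classification[!is_valid_code])
print(invalid_classifications)
```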
Let’s see what the erroneous rows look like. In total, 1356 rows may contain erroneous data.
title | doi | total_2022 | classification | language_simple |
---|---|---|---|---|
The Prague School and Theories of Structure | 10.14220/9783862347049 | Knowledge Unlatched,b818ba9d-2dd9-4fd7-a364-7f305aef7ee9 | Total_Item_Requests | 1800 |
Financiranje projektov in inovacij v pametnih občinah | 10.4335/978-961-6842-38-9 | 46 | J Society & social sciences | 720 |
Minder pretentie, meer ambitie: ontwikkelingshulp die verschil maakt | 10.5117/9789089642264 | 8 | J Society & social sciences | 720 |
La storia del MOICA come storia delle casalinghe italiane. Un’analisi storico-sociale del lavoro familiare | NA | 236 | J Society & social sciences | 720 |
Commentaar op de nota Contouren van een toekomstig onderwijsbeste | 10.26530/OAPEN_438949 | 13 | J Society & social sciences | 720 |
Van maakbaar naar betekenisvol bestuur - 63 | NA | 113 | J Society & social sciences | 720 |
Language-Learner Computer Interactions | 10.1075/lsse.2 | Knowledge Unlatched,b818ba9d-2dd9-4fd7-a364-7f305aef7ee9 | Total_Item_Requests | 804 |
Spreading the Written Word: Mikael Agricola and the Birth of Literary Finnish | 10.21435/sflin.19 | Helsinki University Library and SKS,2bce7b2b-181b-47a2-a1b1-2fe3ca87467d | Total_Item_Requests | 402 |
Origins of Human Language | 10.3726/b12405 | Knowledge Unlatched,b818ba9d-2dd9-4fd7-a364-7f305aef7ee9 | Total_Item_Requests | 599 |
«Candide», «La fée carabine» et les autres | 10.3726/978-3-0352-0258-8 | 71 | J Society & social sciences | 720 |
In the erroneous rows, `language_simple` contains a number instead of a language. In some of them, `total_2022` also contains what looks like funding information (or is empty) and `classification` has the value `Total_Item_Requests`.
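One way to flag such rows is sketched below in base R, assuming `total_2022` was read as text so that non-numeric content survives. A row is treated as erroneous if its language is not one of the three expected values, its classification does not start with a subject letter, or its download count cannot be parsed as a number. The toy data frame `books` and the column `is_erroneous` are hypothetical.

```r
# Hypothetical toy rows modelled on the table above (total_2022 read as text)
books <- data.frame(
  title = c("Book A", "Book B", "Book C", "Book D"),
  total_2022 = c("120", "Knowledge Unlatched,b818ba9d", "46", "75"),
  classification = c("A The arts", "Total_Item_Requests",
                     "J Society & social sciences", "H Humanities"),
  language_simple = c("English", "1800", "720", "German"),
  stringsAsFactors = FALSE
)

# total_2022 should be a plain number; text or empty strings become NA
downloads <- suppressWarnings(as.numeric(books$total_2022))

books$is_erroneous <- !(books$language_simple %in% c("English", "German", "Other")) |
  !grepl("^[A-Z] ", books$classification) |
  is.na(downloads)

print(books[books$is_erroneous, c("title", "classification", "language_simple")])
```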
Data from Snijder, R., 2023
library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe

# Reorder the columns of the published summary table
oapen_downloads_per_group %<>% relocate(classification, language_simple, titles_in_lang_subject_grp, min, max, median)

# Number of titles and median downloads per subject and language, from the cleaned dataset
knitr::kable(
  oapen %>%
    group_by(classification, language_simple) %>%
    mutate(n = n()) %>%
    reframe(
      number_of_titles = n,
      median_downloads = median(total_2022)
    ) %>%
    distinct_all(),
  col.names = c("Subject", "Language", "N", "Median"),
  caption = "OAPEN Library titles by language and subject, calculated using cleaned dataset"
)
Subject | Language | N | Median |
---|---|---|---|
A The arts | English | 814 | 326.5 |
A The arts | German | 286 | 164.5 |
A The arts | Other | 105 | 158.0 |
C Language | German | 494 | 101.5 |
C Language | Other | 149 | 186.0 |
D Literature & literary studies | English | 821 | 266.0 |
D Literature & literary studies | German | 460 | 115.5 |
D Literature & literary studies | Other | 165 | 245.0 |
G Reference, information & interdisciplinary subjects | English | 408 | 431.0 |
G Reference, information & interdisciplinary subjects | German | 90 | 156.0 |
G Reference, information & interdisciplinary subjects | Other | 36 | 127.5 |
H Humanities | English | 2350 | 325.0 |
H Humanities | German | 785 | 148.0 |
H Humanities | Other | 347 | 170.0 |
J Society & social sciences | English | 3249 | 382.0 |
J Society & social sciences | German | 1392 | 209.5 |
K Economics, finance, business & management | English | 900 | 347.0 |
K Economics, finance, business & management | German | 467 | 115.0 |
K Economics, finance, business & management | Other | 110 | 33.0 |
L Law | English | 324 | 334.5 |
L Law | German | 240 | 87.5 |
L Law | Other | 38 | 135.0 |
M Medicine | English | 513 | 130.0 |
M Medicine | German | 75 | 238.0 |
M Medicine | Other | 14 | 138.0 |
P Mathematics & science | English | 560 | 202.0 |
P Mathematics & science | German | 80 | 123.0 |
P Mathematics & science | Other | 28 | 185.5 |
R Earth sciences, geography, environment, planning | English | 343 | 316.0 |
R Earth sciences, geography, environment, planning | German | 62 | 133.5 |
R Earth sciences, geography, environment, planning | Other | 58 | 172.5 |
T Technology, engineering, agriculture | English | 462 | 173.0 |
T Technology, engineering, agriculture | German | 74 | 155.0 |
T Technology, engineering, agriculture | Other | 55 | 238.0 |
U Computing & information technology | English | 285 | 308.0 |
U Computing & information technology | German | 16 | 260.5 |
U Computing & information technology | Other | 3 | 95.0 |
Comparing the results calculated from the cleaned data with Table 1 of the published article, it seems that the erroneous rows (n = 1356) belong to the groups C Language + English and J Society & social sciences + Other.
Table 1 of the published article lists 636 titles in the C Language + English group and 720 titles in the J Society & social sciences + Other group (636 + 720 = 1356, the number of erroneous rows), whereas the table calculated from the cleaned data contains no titles in either group.
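Finding the groups that disappear after cleaning amounts to a set difference between group keys. A minimal base R sketch, using only a few groups for illustration; `published_groups`, `cleaned_groups` and `group_key()` are hypothetical names.

```r
# Subject/language groups listed in the published Table 1 (abbreviated, illustrative)
published_groups <- data.frame(
  classification = c("C Language", "C Language", "J Society & social sciences",
                     "J Society & social sciences"),
  language_simple = c("English", "German", "English", "Other"),
  stringsAsFactors = FALSE
)

# Groups present in the summary computed from the cleaned data (abbreviated)
cleaned_groups <- data.frame(
  classification = c("C Language", "J Society & social sciences"),
  language_simple = c("German", "English"),
  stringsAsFactors = FALSE
)

# Combine subject and language into one key per group so set operations apply
group_key <- function(df) paste(df$classification, df$language_simple, sep = " + ")

missing_groups <- setdiff(group_key(published_groups), group_key(cleaned_groups))
print(missing_groups)
```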
Comparing publishers
It may be interesting to compare publishers with similar publishing profiles.
First, we must check whether the erroneous part of the dataset contains rows we need for the comparison.
In this case, we are interested in titles published by Tampere University Press and Cappelen Damm Akademisk.
title | publisher | language | classification |
---|---|---|---|
Vitenskapelig (u)redelighet | Cappelen Damm Akademisk/NOASP (Nordic Open Access Scholarly Publishing) | Uncoded languages | J Society & social sciences |
Chapter 7 Hva sier de til seg selv? | Cappelen Damm Akademisk/NOASP (Nordic Open Access Scholarly Publishing) | Norwegian | J Society & social sciences |
Sosiaalipolitiikan lumo | Tampere University Press | Finnish | J Society & social sciences |
Fra kollektiv til konnektiv handling? | Cappelen Damm Akademisk/NOASP (Nordic Open Access Scholarly Publishing) | Norwegian | J Society & social sciences |
Sosiaaliturvariippuvuus | Tampere University Press | Finnish | J Society & social sciences |
Statlig politikk og lokale utfordringer | Cappelen Damm Akademisk/NOASP (Nordic Open Access Scholarly Publishing) | Norwegian | J Society & social sciences |
Hvorfor vokser steder? | Cappelen Damm Akademisk/NOASP (Nordic Open Access Scholarly Publishing) | Uncoded languages | J Society & social sciences |
Chapter 9 Sammenhengen mellom mindfulness og eksekutiv funksjon hos profesjonelle fotballspillere | Cappelen Damm Akademisk/NOASP (Nordic Open Access Scholarly Publishing) | Norwegian | J Society & social sciences |
Folkevalgt og politisk leder | Cappelen Damm Akademisk/NOASP (Nordic Open Access Scholarly Publishing) | Norwegian | J Society & social sciences |
Konflikt, fellesskap og forandring | Cappelen Damm Akademisk/NOASP (Nordic Open Access Scholarly Publishing) | Norwegian | J Society & social sciences |
Because 76 of the erroneous rows relate to either Tampere University Press or Cappelen Damm Akademisk, we must fix these rows and include them in our analysis to get a full view of the available data.
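Selecting the erroneous rows for the two publishers can be done with a prefix match, since Cappelen Damm Akademisk appears in the data with a "/NOASP (...)" suffix. A base R sketch with hypothetical rows; `erroneous`, `rows_to_fix` and "Some Other Press" are illustrative only.

```r
# Hypothetical erroneous rows; publisher names written as they appear in the data
erroneous <- data.frame(
  title = c("Title 1", "Title 2", "Title 3", "Title 4"),
  publisher = c("Tampere University Press",
                "Cappelen Damm Akademisk/NOASP (Nordic Open Access Scholarly Publishing)",
                "Some Other Press",
                "Cappelen Damm Akademisk/NOASP (Nordic Open Access Scholarly Publishing)"),
  stringsAsFactors = FALSE
)

# Match by prefix so that the suffixed Cappelen Damm name is included
is_of_interest <- grepl("^(Tampere University Press|Cappelen Damm Akademisk)",
                        erroneous$publisher)

rows_to_fix <- erroneous[is_of_interest, ]
print(table(rows_to_fix$publisher))
```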
Published OAPEN data for Cappelen Damm Akademisk/NOASP 2022
Subject | N | Min | Max | Median downloads | Total downloads |
---|---|---|---|---|---|
A The arts | 6 | 14 | 221 | 166.5 | 799 |
C Language | 4 | 54 | 276 | 122.0 | 574 |
D Literature & literary studies | 2 | 73 | 214 | 143.5 | 287 |
G Reference, information & interdisciplinary subjects | 3 | 109 | 549 | 122.0 | 780 |
H Humanities | 21 | 23 | 448 | 118.0 | 3183 |
J Society & social sciences | 53 | 24 | 1148 | 97.0 | 10929 |
K Economics, finance, business & management | 5 | 94 | 300 | 255.0 | 1143 |
M Medicine | 2 | 166 | 480 | 323.0 | 646 |
R Earth sciences, geography, environment, planning | 1 | 451 | 451 | 451.0 | 451 |
Published OAPEN data for Tampere University Press 2022
Subject | N | Min | Max | Median downloads | Total downloads |
---|---|---|---|---|---|
C Language | 1 | 209 | 209 | 209 | 209 |
D Literature & literary studies | 1 | 49 | 49 | 49 | 49 |
G Reference, information & interdisciplinary subjects | 1 | 442 | 442 | 442 | 442 |
H Humanities | 5 | 47 | 462 | 117 | 1007 |
J Society & social sciences | 23 | 36 | 4030 | 204 | 11378 |
K Economics, finance, business & management | 5 | 105 | 417 | 202 | 1183 |
M Medicine | 1 | 246 | 246 | 246 | 246 |
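Per-subject tables like the ones above (N, Min, Max, Median downloads, Total downloads) can be computed by splitting the download counts by subject. The report's own code uses dplyr; this is a base R sketch with hypothetical download counts, and `titles`, `by_subject` and `subject_summary` are illustrative names.

```r
# Hypothetical per-title download counts for one publisher
titles <- data.frame(
  classification = c("H Humanities", "H Humanities", "H Humanities", "M Medicine"),
  total_2022 = c(40, 120, 300, 250),
  stringsAsFactors = FALSE
)

# One vector of download counts per subject (subjects sorted alphabetically)
by_subject <- split(titles$total_2022, titles$classification)

# One row per subject with the same statistics as the tables above
subject_summary <- do.call(rbind, lapply(names(by_subject), function(subject) {
  x <- by_subject[[subject]]
  data.frame(Subject = subject, N = length(x), Min = min(x), Max = max(x),
             Median = median(x), Total = sum(x), stringsAsFactors = FALSE)
}))

print(subject_summary)
```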
Data from publisher statistics dashboard
Let’s create comparison statistics for Tampere University Press using data downloaded from the OAPEN publisher statistics dashboard.
Subject | N | Min | Max | Median downloads | Total downloads |
---|---|---|---|---|---|
C Language | 1 | 105 | 105 | 105.0 | 105 |
D Literature & literary studies | 2 | 29 | 124 | 76.5 | 153 |
G Reference, information & interdisciplinary subjects | 1 | 300 | 300 | 300.0 | 300 |
H Humanities | 6 | 24 | 523 | 135.0 | 1143 |
J Society & social sciences | 26 | 14 | 887 | 107.5 | 4980 |
K Economics, finance, business & management | 5 | 17 | 1531 | 188.0 | 2061 |
M Medicine | 1 | 107 | 107 | 107.0 | 107 |
Subject | N | Min | Max | Median downloads | Total downloads |
---|---|---|---|---|---|
C Language | 1 | 226 | 226 | 226.0 | 226 |
D Literature & literary studies | 2 | 43 | 459 | 251.0 | 502 |
G Reference, information & interdisciplinary subjects | 1 | 486 | 486 | 486.0 | 486 |
H Humanities | 6 | 50 | 976 | 259.5 | 2281 |
J Society & social sciences | 27 | 23 | 1851 | 221.0 | 10999 |
K Economics, finance, business & management | 6 | 101 | 2925 | 203.5 | 4114 |
M Medicine | 1 | 359 | 359 | 359.0 | 359 |
Subject | N | Min | Max | Median downloads | Total downloads |
---|---|---|---|---|---|
C Language | 1 | 209 | 209 | 209 | 209 |
D Literature & literary studies | 2 | 49 | 439 | 244 | 488 |
G Reference, information & interdisciplinary subjects | 1 | 442 | 442 | 442 | 442 |
H Humanities | 6 | 47 | 900 | 209 | 1907 |
J Society & social sciences | 27 | 36 | 4030 | 204 | 12977 |
K Economics, finance, business & management | 6 | 105 | 3029 | 251 | 4212 |
M Medicine | 1 | 246 | 246 | 246 | 246 |
Subject | N | Min | Max | Median downloads | Total downloads |
---|---|---|---|---|---|
C Language | 1 | 131 | 131 | 131.0 | 131 |
D Literature & literary studies | 2 | 40 | 306 | 173.0 | 346 |
G Reference, information & interdisciplinary subjects | 1 | 496 | 496 | 496.0 | 496 |
H Humanities | 6 | 27 | 785 | 180.5 | 1548 |
J Society & social sciences | 27 | 37 | 3092 | 197.0 | 12007 |
K Economics, finance, business & management | 6 | 104 | 2917 | 168.5 | 3840 |
M Medicine | 1 | 217 | 217 | 217.0 | 217 |
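Comparing the published figures with dashboard-based statistics subject by subject comes down to joining two per-subject summaries. A minimal base R sketch with hypothetical median values (not taken from the tables above); `published`, `dashboard` and `comparison` are illustrative names.

```r
# Hypothetical per-subject medians from two sources
published <- data.frame(
  Subject = c("C Language", "H Humanities", "J Society & social sciences"),
  Median = c(200, 120, 210),
  stringsAsFactors = FALSE
)
dashboard <- data.frame(
  Subject = c("C Language", "H Humanities", "M Medicine"),
  Median = c(150, 130, 100),
  stringsAsFactors = FALSE
)

# Full outer join on Subject; suffixes mark which source each median comes from
comparison <- merge(published, dashboard, by = "Subject", all = TRUE,
                    suffixes = c("_published", "_dashboard"))
comparison$difference <- comparison$Median_published - comparison$Median_dashboard

print(comparison)
```

Using `all = TRUE` keeps subjects that appear in only one source (their difference is `NA`), which matters when the two sources do not list exactly the same subjects.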