First listings on arXiv are more cited. However, …

Summary

In grad school I heard that papers submitted to arXiv right after the submission window opens end up first in the listing and are therefore cited more. Using hep-th submissions from 2017-2020, I looked into whether this is true.

It turns out that papers that appeared as the first listing are indeed more cited:

Figure: Excess citations as a function of arXiv announcement position

However, if we look only at papers submitted in the first minute after the submission window opens, the citations depend only slightly on where in the listing the paper appeared:

Figure: Excess citations as a function of arXiv announcement position, submissions in the first minute only

Also, the papers that appeared as the first listing but were not submitted in the first minute after the submission window opens receive 4% fewer citations than expected, not the 30+% more that the first graph suggests.

So overall it seems that the "first listing" effect exists, but it is not driven by the position at which the paper appears on arXiv. Instead, it is driven by the demographics of the people who submit their papers right after the submission window opens. Possibly, the submitters who were exposed to the "first listing" lore are better networked than the rest, or were also exposed to other similar (and more useful) advice about how to promote their papers, which gives them an advantage.

Code here.

Analysis details

We decided to focus on papers from hep-th because the number of citations is easily available and because we avoid having to deal with simultaneous submissions of many papers by highly cited collaborations. We exclude papers from 2021, as those have not had enough time to gather citations. The metadata for all arXiv submissions are readily available at Kaggle; we need the arXiv IDs and the submission times.
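As a rough illustration of the filtering step, here is a minimal sketch that streams the Kaggle metadata (one JSON record per line) and keeps hep-th papers from 2017-2020. The field names `id`, `categories` and `versions`, and the convention that the first listed category is the primary one, are our assumptions about the Kaggle snapshot, not something stated above; the function name is hypothetical.

```python
import json
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Iterable, Iterator, Tuple


def hep_th_submissions(
    lines: Iterable[str], years=range(2017, 2021)
) -> Iterator[Tuple[str, datetime]]:
    """Yield (arXiv ID, v1 submission time in UTC) for hep-th papers.

    Assumes each line is a JSON record with 'id', 'categories'
    (space-separated) and 'versions' (list of {'version', 'created'}).
    """
    for line in lines:
        record = json.loads(line)
        categories = record["categories"].split()
        # Assumption: the first category is the primary one. Keeping only
        # primary hep-th papers drops cross-lists from other archives.
        if not categories or categories[0] != "hep-th":
            continue
        # The v1 'created' stamp is the original submission time,
        # e.g. "Mon, 2 Apr 2007 19:18:42 GMT" (parsed as UTC-aware).
        submitted = parsedate_to_datetime(record["versions"][0]["created"])
        if submitted.year in years:
            yield record["id"], submitted
```

Usage would be along the lines of `with open(path) as f: papers = list(hep_th_submissions(f))`, where `path` points at the downloaded snapshot. Note that the year here is taken in UTC; a paper submitted late on Dec 31 (Eastern Time) may land in the next year.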

With the arXiv IDs, we can download the number of citations of each paper from the INSPIRE database. We did this on March 13, 2022.
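A minimal sketch of the citation lookup, assuming INSPIRE's public REST search endpoint (`/api/literature` with a `q=arxiv:<id>` query and a `fields` filter) and a response shaped like `{"hits": {"hits": [{"metadata": {"citation_count": N}}]}}`. The endpoint layout is our understanding of the INSPIRE API and should be checked against its documentation; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed INSPIRE literature search endpoint.
INSPIRE_LITERATURE = "https://inspirehep.net/api/literature"


def inspire_query_url(arxiv_id: str) -> str:
    # Request only the citation count to keep responses small.
    params = {"q": f"arxiv:{arxiv_id}", "fields": "citation_count"}
    return f"{INSPIRE_LITERATURE}?{urllib.parse.urlencode(params)}"


def citation_count_from_response(payload: dict) -> Optional[int]:
    # Assumed layout: {"hits": {"hits": [{"metadata": {"citation_count": N}}]}}
    hits = payload.get("hits", {}).get("hits", [])
    if not hits:
        return None  # arXiv ID not found in INSPIRE
    return hits[0]["metadata"].get("citation_count")


def fetch_citation_count(arxiv_id: str, timeout: float = 30.0) -> Optional[int]:
    # Performs a network request; throttle when looping over many IDs.
    with urllib.request.urlopen(inspire_query_url(arxiv_id), timeout=timeout) as resp:
        return citation_count_from_response(json.load(resp))
```

Keeping the URL building and response parsing separate from the network call makes it easy to cache raw responses and re-parse them later.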

Summary statistics:

Figure: Number of papers per year
Figure: Average number of citations for papers from a given year

From now on we only work with papers from 2017 onwards, because the arXiv submission deadline changed on Jan 2, 2017.

To compare across different years, for each paper we calculate a percentage gain (or loss) in the number of citations relative to the average for the year the paper was submitted. We tried more accurate normalizations, but the results did not significantly change.
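One plausible reading of this normalization, as a sketch: for paper $i$ submitted in year $y$, the gain is $100\,(c_i / \bar c_y - 1)$ percent, where $\bar c_y$ is the mean citation count of all papers from year $y$. The input layout (ID mapped to `(year, citations)`) and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Tuple


def citation_gain(papers: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Map paper ID -> % citation gain over the mean of its submission year.

    `papers` maps ID -> (year, citations). Assumes every year has a
    non-zero mean citation count.
    """
    by_year = defaultdict(list)
    for year, citations in papers.values():
        by_year[year].append(citations)
    year_mean = {year: mean(counts) for year, counts in by_year.items()}
    # +50 means 50% more citations than the average paper from that year.
    return {
        pid: 100.0 * (citations / year_mean[year] - 1.0)
        for pid, (year, citations) in papers.items()
    }
```

For example, with two 2018 papers at 30 and 10 citations (mean 20), the gains come out as +50% and -50%.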

For each paper, we use the submission time to determine the position at which it appeared in the arXiv listing. Averaging the citation gains/losses over the papers listed first, second, … we find the following (the drop in the first plot is due to draws):

Figure: Number of papers listed at n-th position
Figure: Average number of citations of papers listed at n-th position
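A minimal sketch of how a listing position can be derived from submission times, assuming that new submissions are listed in order of submission time, that each announcement collects everything submitted since the previous weekday 2 pm deadline, and that timestamps have already been converted to naive US Eastern datetimes. Holidays and other schedule exceptions are ignored, and the helper names are hypothetical. Papers with identical timestamps (draws) share a position here, so the next position is skipped.

```python
from collections import defaultdict
from datetime import datetime, time, timedelta
from typing import Dict, Iterable, Tuple

DEADLINE = time(14, 0)  # 2 pm, US Eastern Time


def window_start(t: datetime) -> datetime:
    """Most recent weekday 2 pm at or before t (naive US Eastern datetime)."""
    start = datetime.combine(t.date(), DEADLINE)
    if t < start:
        start -= timedelta(days=1)
    while start.weekday() >= 5:  # no deadline on Saturday or Sunday
        start -= timedelta(days=1)
    return start


def listing_positions(submissions: Iterable[Tuple[str, datetime]]) -> Dict[str, int]:
    """Map paper ID -> position within its announcement, by submission time.

    Draws (identical timestamps) share the better position ("1, 1, 3").
    """
    windows = defaultdict(list)
    for pid, t in submissions:
        windows[window_start(t)].append((t, pid))
    positions = {}
    for entries in windows.values():
        entries.sort()
        prev_t, prev_pos = None, 0
        for i, (t, pid) in enumerate(entries, start=1):
            pos = prev_pos if t == prev_t else i
            positions[pid] = pos
            prev_t, prev_pos = t, pos
    return positions
```

Under these assumptions, a paper submitted on Saturday morning falls into the window that opened on Friday at 2 pm and is ranked behind everything submitted on Friday afternoon.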

This looks interesting! But is it really the case that the papers listed as first on arXiv get a strong citation boost? It turns out that many people submit right after the deadline:

Figure: Number of papers submitted at a given hour

Because so many people are trying to get the first listing, we can focus on submissions received within the first minute after the submission window opens. Often there are several such submissions, and how high each one ends up in the arXiv listing is largely a matter of chance. Comparing first-minute papers ranked first, second, … therefore lets us test whether the "first listing" effect is real. What we find:

Figure: Number of papers listed at n-th position, first-minute submissions only
Figure: Average number of citations of papers listed at n-th position, first-minute submissions only
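A sketch of the first-minute comparison, assuming each paper has already been reduced to a hypothetical record `(position, seconds_after_window_opened, gain_percent)`. Only papers submitted less than 60 seconds after the window opened are kept, and their gains are averaged per listing position.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def mean_gain_by_position(
    records: Iterable[Tuple[int, float, float]], max_delay_s: float = 60.0
) -> Dict[int, float]:
    """Average % citation gain per listing position, first-minute papers only.

    Each record is (position, seconds_after_window_opened, gain_percent).
    """
    gains = defaultdict(list)
    for position, delay_s, gain in records:
        if 0 <= delay_s < max_delay_s:  # submitted within the first minute
            gains[position].append(gain)
    return {pos: mean(vals) for pos, vals in sorted(gains.items())}
```

Because all these papers were submitted at nearly the same moment, differences between positions should mostly reflect the listing order itself rather than who the submitters are.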

So there is not much of an effect here, which suggests that what drives the number of citations is something about "submitting right after the deadline", not being the top listing. The argument is further strengthened by the papers that ended up as the top listing despite not being submitted in the first minute after the window opened: these 330 papers actually show a 4% deficit in citations, not a boost.
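The "late first listing" check can be sketched with the same kind of hypothetical per-paper record `(position, seconds_after_window_opened, gain_percent)`: keep papers at position 1 that were submitted 60 seconds or more after the window opened, then count them and average their gains.

```python
from statistics import mean
from typing import Iterable, Optional, Tuple


def late_first_listings(
    records: Iterable[Tuple[int, float, float]], max_delay_s: float = 60.0
) -> Tuple[int, Optional[float]]:
    """Count and mean % gain of first-listed papers NOT submitted in the first minute.

    Each record is (position, seconds_after_window_opened, gain_percent).
    Returns (0, None) if there are no such papers.
    """
    gains = [
        gain
        for position, delay_s, gain in records
        if position == 1 and delay_s >= max_delay_s
    ]
    return len(gains), (mean(gains) if gains else None)
```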

It may be argued that having the first arXiv listing gives a slight advantage of a few percent, but the excess citations seen in the first plot clearly come from something else. Possibly the people who know about the "first listing" effect have stronger professional networks (which told them about the effect in the first place) and are thus more cited, or the people exposed to this lore were simultaneously exposed to other (and more useful) advice on how to promote scientific work.

Just for fun, we also looked at which day of the week is best for submissions, splitting each day into submissions before and after the 2 pm (US Eastern Time) deadline:

Figure: Number of papers submitted on individual days of the week, split by before/after 2 pm
Figure: Average number of citations of papers submitted on individual days of the week, split by before/after 2 pm
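A minimal sketch of the day-of-week split, assuming hypothetical records `(submission_time, gain_percent)` where the time is a naive US Eastern datetime. A submission at exactly 2:00 pm counts as "after". Weekday names come from a fixed tuple rather than `strftime`, so the output does not depend on the locale.

```python
from collections import defaultdict
from datetime import datetime, time
from statistics import mean
from typing import Dict, Iterable, Tuple

DEADLINE = time(14, 0)  # 2 pm, US Eastern Time
DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def gain_by_weekday(
    records: Iterable[Tuple[datetime, float]],
) -> Dict[Tuple[str, str], float]:
    """Average % citation gain keyed by (weekday, 'before' or 'after' 2 pm)."""
    groups = defaultdict(list)
    for t, gain in records:
        half = "before" if t.time() < DEADLINE else "after"
        groups[(DAYS[t.weekday()], half)].append(gain)
    return {key: mean(vals) for key, vals in groups.items()}
```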

The best day turns out to be Monday after 2 pm and the papers submitted over the weekend are the least cited. Interestingly, the submissions after 2 pm have consistently higher numbers of citations than those before 2 pm. Presumably, this all again reflects something about the demographics of the submitters at these particular times.

P.S.: If you know about a job opening in Toronto that might be a good match, please let me know. Thank you!

15 Mar 2022