Explained | What is arXiv, and why is it still relevant for scientific research?
Around 2015, the machine learning community too adopted arXiv with great enthusiasm.

September 13, 2023 12:32 pm | Updated 04:20 pm IST

The science and technology community has been accustomed to reading research papers that are yet to be peer-reviewed. (File) | Photo Credit: AP

The story so far: The science and technology community has been accustomed to reading research papers that are yet to be peer-reviewed. These so-called preprints help scientists keep up with the latest findings. The practice started over six decades ago when two biochemists and an administrator at the National Institutes of Health (NIH) developed a method for researchers to share early work with their peers. While it was largely a paper-based circulation at that point, by the early 90s, this system had moved online. In 1989, Joanne Cohn, a physicist at Los Alamos National Laboratory, started an e-mail list of preprints for her colleagues to serve this purpose.

Over time, the volume of papers became unmanageable as recipients’ mailboxes began to fill up. Paul Ginsparg, a co-worker of Cohn, had a bright idea. Ginsparg turned the mailing list into a central repository that anyone could access. Physicists took to the idea quickly. Within two years, any new paper on particle physics first showed up on the portal, which we now know as arXiv.

What is arXiv?

The preprint server is a research hub for physicists, computer scientists, mathematicians, astronomers and anyone involved in serious research of any kind. Around 2015, the machine learning community too adopted arXiv with great enthusiasm. And by January 3, 2022, the repository had grown to a collection of more than two million research papers. Unlike traditional journals, which take half a year or more to publish a paper, arXiv is lightning fast because it does not have a peer-review process. So, developments are shared fast and at scale.

But arXiv became a bone of contention last week after Emily M. Bender, a prominent professor of natural language processing (NLP) and computational linguistics from the University of Washington, called it “cancer.” Bender tweeted saying that arXiv was disseminating “junk ‘science’ in a format that is indistinguishable from real publications.” She also pointed out that it promotes a hectic “can’t keep up” plus “anything older than 6 months is irrelevant [in computer science]” culture. 

Is arXiv bad for research?

According to Bender, researchers were rushing to publish simply to “flag-plant,” or stake a claim to a research area, because of the immense hype around AI/ML. She decried this practice, noting, “If you’re doing work that you’re worried will get scooped, you’re probably not asking very interesting or original questions.”

Bender also claimed that arXiv had become so popular that it had diminished the value of peer review. She said that she had often heard researchers bragging about papers which had been rejected from conferences but went on to become “highly influential” on arXiv. It’s a “fallacy,” Bender noted, to say that every rejected paper was “worthy.”

Additionally, she singled out arXiv’s endorsement as being biased in favour of “big names and big labs”. 

ArXiv’s endorsement policy maintains that it may give people automatic endorsements based on “their subject area, topic, past work and academic affiliation.” Anyone who wants to endorse a fresh submission can verify if the paper is appropriate for the area of research. 

The new author sends a six-character alphanumeric endorsement code to the endorser, who enters this on an endorsement form if they wish to endorse this person. If they do not wish to endorse them, they can give the reason explicitly in the form, which will be then registered as a negative vote of endorsement. They can also choose to stay away from the endorsement process completely, and not submit the form at all. 

How does arXiv’s approval system work?

ArXiv follows a moderation system that publishes a paper if no moderator raises a flag within a day of its submission. Submissions, meanwhile, have swelled to as many as 1,200 per day. There are only about 200 volunteer moderators across 150 categories to check these papers. Researchers had earlier complained that the moderation process was slow, opaque and inconsistent.

Some of the censure levelled against arXiv is reasonable. The library has been struggling to keep up with the pace of user growth for some time. To cope, Ginsparg had moved the server within Cornell in 2011. And the handful of moderators simply weren’t enough. 

ArXiv’s scientific director Steinn Sigurdsson told Scientific American in January 2022 that arXiv had been “understaffed and underfunded for years.” In 2021, Ginsparg himself called arXiv’s daily turnaround of submissions “unforgiving.”

Lists of moderator names for various sections were available to the public, but not for all of them. (Most moderators for many physics sections are unnamed.)

ArXiv also reserves the right to reject papers without any explanation.

What are the arguments in favour of arXiv?

Despite the cracks, arXiv remains invaluable for most independent researchers.

“It’s very, very important. I am on it every day. Just free access for students and people like me, means a lot. Especially in AI research which is moving so fast. These accusations are surprising to me,” said Rahul Dandwate, a researcher working with the generative AI platform rephrase.ai.

Even if overpublishing is a symptom, some users say arXiv is not its source. “It’s very important for AI researchers because it shortens wait cycles. I don’t think that arXiv is responsible for overpublishing frankly, because all we’re looking for is a citation,” Thomas Scialom, a research scientist with Meta AI, stated.
