Characteristics of Papers that are Highly Cited
In scholarly publishing, most papers attract a modest number of citations, and only a small fraction become “highly cited.” To earn that title, a paper must rank in the top 1% by citations among papers in the same subject area and publication year, measured over the past decade. The designation indicates that the research is widely regarded as a standard of scientific excellence, one that other researchers can use as a benchmark for their own studies. In other words, the author(s) produced a paper with exceptional impact in its field.
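To make the top-1% rule concrete, here is a minimal Python sketch. It assumes you already have the citation counts for every paper in one cohort (the same field and publication year); the function names `top_percent_threshold` and `is_highly_cited` are hypothetical, and real citation indexes define cohorts, ties, and update schedules in their own ways, so treat this as an illustration rather than the official calculation.

```python
import math
from typing import Sequence


def top_percent_threshold(cohort: Sequence[int], percent: float = 1.0) -> int:
    """Minimum citation count needed to reach the top `percent`% of a cohort.

    `cohort` holds the citation counts of every paper in one field and
    publication year (an assumed input; real indexes define cohorts themselves).
    """
    if not cohort:
        raise ValueError("cohort must contain at least one paper")
    ranked = sorted(cohort, reverse=True)
    # Size of the top slice, rounded up so that at least one paper qualifies.
    k = max(1, math.ceil(len(ranked) * percent / 100))
    return ranked[k - 1]


def is_highly_cited(citations: int, cohort: Sequence[int], percent: float = 1.0) -> bool:
    """True if `citations` is at or above the cohort's top-`percent`% cutoff."""
    return citations >= top_percent_threshold(cohort, percent)


if __name__ == "__main__":
    cohort = list(range(1, 201))  # toy cohort: 200 papers cited 1..200 times
    print(top_percent_threshold(cohort))  # top 2 papers -> cutoff of 199
    print(is_highly_cited(199, cohort), is_highly_cited(198, cohort))
```

Papers tied with the cutoff value also pass the check, so the qualifying slice can be slightly larger than 1% when many papers share the same citation count.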

With so many papers published daily in any given subject area, it’s important to have a high standard to aim for. Beyond the quality of the research itself, certain factors make a paper more likely to earn the “highly cited” title. Machine learning has helped researchers understand more clearly which of these factors matter. While authors can aim for some quantitative targets, the designation still depends on qualitative elements as well.

Machine Learning to Understand Highly Cited Papers

Citation databases such as Google Scholar and Web of Science, combined with algorithms including machine learning, make it possible to measure a paper’s impact and follow its citation trajectory over time. Researchers have studied these algorithms, along with other bibliometric methods, to identify common threads among highly cited papers.

Metrics such as the h-index and g-index are used to gauge academic impact, although both describe an author’s body of work rather than a single paper. The h-index combines quantity and citation impact: an author has an h-index of h if h of their papers have each been cited at least h times by other researchers. The g-index gives more weight to an author’s most-cited articles: it is the largest number g such that the author’s top g papers have together received at least g² citations. Like the h-index, it puts the focus on the author rather than on any individual research paper.
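Both definitions can be sketched in a few lines of Python. This is a minimal illustration, assuming the input is a list of per-paper citation counts for one author; the function names are hypothetical, and this g-index variant caps g at the number of papers the author has published (some definitions allow it to exceed that).

```python
from typing import Sequence


def h_index(citations: Sequence[int]) -> int:
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break  # counts only decrease from here, so no later rank can qualify
    return h


def g_index(citations: Sequence[int]) -> int:
    """Largest g such that the top g papers together have at least g**2 citations.

    This variant caps g at the number of papers.
    """
    ranked = sorted(citations, reverse=True)
    g = 0
    total = 0
    for rank, cites in enumerate(ranked, start=1):
        total += cites
        if total >= rank * rank:
            g = rank
        else:
            break  # the running average never rises while rank keeps growing
    return g


if __name__ == "__main__":
    papers = [10, 8, 5, 4, 3]  # illustrative citation counts for one author
    print(h_index(papers), g_index(papers))  # 4 5
```

For an author whose five papers have been cited 10, 8, 5, 4, and 3 times, `h_index` returns 4 (the fifth paper has fewer than 5 citations) and `g_index` returns 5 (the five papers together have 30 citations, which is at least 5² = 25).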

The i10-index, introduced by Google Scholar, offers a simpler milestone for authors working toward “highly cited” status. It counts how many of an author’s published articles have each been cited at least ten times by other published works. Ten citations is only a starting point, however: beyond that threshold, a paper competes with other articles on the same topic for the coveted top 1% mark.
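The i10-index is the simplest of the three to compute. A minimal sketch, again assuming a list of per-paper citation counts; `i10_index` and its `threshold` parameter are hypothetical names used for illustration.

```python
from typing import Sequence


def i10_index(citations: Sequence[int], threshold: int = 10) -> int:
    """Number of papers cited at least `threshold` times (10 for the i10-index)."""
    return sum(1 for cites in citations if cites >= threshold)


if __name__ == "__main__":
    print(i10_index([40, 12, 10, 9, 3]))  # 3
```

Here the papers with 40, 12, and 10 citations meet the threshold, while the ones with 9 and 3 do not.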

Traits of a Highly Cited Paper

The same machine learning techniques that help authors follow the impact of their papers also allow analysts to identify the factors shared by most highly cited papers. None of these characteristics appears in every paper that reaches this level, but together they offer a guideline for authors to keep in mind as they conduct their research and prepare their paper for publication.

●      The number of words in the title. As in the fairy tale Goldilocks and the Three Bears, a title can be too short, too long, or just right. Titles grab the reader’s attention and explain succinctly what the article is about. Analyses of data from Google Scholar, Web of Science, and Altmetric suggest that the ideal length is between 7 and 13 words.

●      The words chosen in the title. How many words you use is one thing; which words you choose is another. Research and algorithms have shown that papers with catchy or interesting titles are more likely to be downloaded, used, and cited.

●      Common themes of words in the paper. Regardless of subject, highly cited scientific papers tend to use certain words consistently. The exact vocabulary varies with the topic and focus of each paper, but the themes remain. Words denoting methods of treatment or study, scientific analysis, and associations with health appear in almost all highly cited papers, likely because such topics affect a wide audience.

●      How many authors published the research. There is a strong correlation between the number of citations a paper receives and the number of authors who wrote it. There could be many reasons for this, but a common explanation is that studies with more authors tend to have stronger methods, more accurate experiments, more funding sources, and higher overall quality.

●      The number of characters (text) in the paper. Longer papers tend to receive more citations. Highly cited papers typically contain more than 33,600 characters, excluding spaces, which works out to roughly 5,000 to 6,000 words.

●      How many figures and tables are used. Readers understand data differently when it is displayed in multiple formats, so it makes sense that papers with more tables and figures are downloaded and cited more frequently than text-only papers.
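The two numeric targets in this list, title length and paper length, are easy to check before submission. The sketch below is a hypothetical helper, not a tool offered by Google Scholar, Web of Science, or Altmetric; the thresholds are the figures quoted above, and counting words by splitting on whitespace is an assumption. It also shows the arithmetic behind the word estimate: 33,600 characters divided by 5,000 to 6,000 words implies an average of about 5.6 to 6.7 non-space characters per word.

```python
from dataclasses import dataclass

# Targets quoted in the article; treat them as rough guidelines, not hard rules.
TITLE_WORDS_MIN = 7
TITLE_WORDS_MAX = 13
MIN_CHARS_NO_SPACES = 33_600


@dataclass
class LengthReport:
    title_words: int
    title_in_range: bool
    chars_no_spaces: int
    long_enough: bool


def count_chars_no_spaces(text: str) -> int:
    """Count every non-whitespace character (letters, digits, punctuation)."""
    return sum(1 for ch in text if not ch.isspace())


def estimate_words(chars_no_spaces: int, avg_chars_per_word: float) -> int:
    """Convert a character count into an approximate word count."""
    return round(chars_no_spaces / avg_chars_per_word)


def check_lengths(title: str, body: str) -> LengthReport:
    """Compare a title and manuscript body against the article's targets."""
    # Assumption: a "word" is any whitespace-separated token.
    title_words = len(title.split())
    chars = count_chars_no_spaces(body)
    return LengthReport(
        title_words=title_words,
        title_in_range=TITLE_WORDS_MIN <= title_words <= TITLE_WORDS_MAX,
        chars_no_spaces=chars,
        long_enough=chars > MIN_CHARS_NO_SPACES,
    )


if __name__ == "__main__":
    report = check_lengths(
        "Machine learning reveals common traits of highly cited papers",
        "Body text of the manuscript goes here.",
    )
    print(report)
    # The word range quoted above, recovered from the 33,600-character target.
    print(estimate_words(MIN_CHARS_NO_SPACES, 6.72), estimate_words(MIN_CHARS_NO_SPACES, 5.6))
```

The number of authors and the number of figures and tables are left out of the check because the list gives no numeric target for them.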

By understanding these quantitative measures, authors can improve their chances of being read and cited. The focus then shifts to ensuring that the quality of the content lives up to what the numbers suggest.

