The General Index consists of three tables derived from 107,233,728 journal articles. A table of n-grams, ranging from unigrams to 5-grams, is extracted using spaCy. Each of the 355,279,820,087 rows of the n-gram table consists of an n-gram coupled with a journal article id. A second table is constructed using YAKE and consists of 19,740,906,314 rows, each with a keyword and an article id. A third table associates an article id with metadata.
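To make the row structure concrete, here is a minimal sketch of how (n-gram, article id) pairs from unigrams to 5-grams could be produced for a single text. It is an illustration only: plain lowercase whitespace splitting stands in for the spaCy tokenization used to build the real index, and the function names and the article id `article-0001` are hypothetical.

```python
from typing import Iterator, List, Tuple


def ngrams(tokens: List[str], max_n: int = 5) -> Iterator[str]:
    """Yield every n-gram of length 1..max_n as a space-joined string."""
    for n in range(1, max_n + 1):
        # Slide a window of width n across the token list.
        for start in range(len(tokens) - n + 1):
            yield " ".join(tokens[start:start + n])


def ngram_rows(article_id: str, text: str, max_n: int = 5) -> List[Tuple[str, str]]:
    """Pair each n-gram of the text with the id of the article it came from."""
    # Assumption: whitespace tokenization, not the spaCy pipeline of the real index.
    tokens = text.lower().split()
    return [(gram, article_id) for gram in ngrams(tokens, max_n)]


rows = ngram_rows("article-0001", "access to knowledge is a human right")
```

Each tuple in `rows` mirrors one row of the n-gram table: the n-gram first, then the article id. Repeated n-grams within an article are kept as separate rows in this sketch; whether the real tables deduplicate them is not stated here.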
- The metadata, readme, and sample files are available on this item in the data downloads area.
- The README.txt file contains more information.
- The corpus of articles has been split into 16 slices, plus an update with additional language files. The keywords and n-grams files, one for each of the slices, are kept as separate items to avoid overloading the servers. Be careful: these files are big and inflate greatly upon unzipping.
- The keywords are located here: [ 0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : a : b : c : d : e : f ]
- The n-grams are located here: [ 0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : a : b : c : d : e : f : supp1 ]
- You can see all the items at this URL: https://archive.org/search.php?query=%22general%20index%22%20AND%20collection%3Amulticasting
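Because the slice files inflate greatly when unzipped, one option is to read rows straight out of the archive instead of extracting it to disk. The sketch below uses only the Python standard library and assumes the download is a zip archive containing UTF-8 text; the helper name `stream_lines` and the member name `sample.txt` are hypothetical, and the demonstration runs against a tiny in-memory archive rather than a real slice.

```python
import io
import zipfile
from itertools import islice
from typing import IO, Iterator, Union


def stream_lines(archive: Union[str, IO[bytes]], member: str) -> Iterator[str]:
    """Yield decoded lines from one member of a zip archive, one at a time."""
    with zipfile.ZipFile(archive) as zf:
        # zf.open decompresses incrementally, so the member never has to
        # be fully inflated on disk or held in memory at once.
        with zf.open(member) as raw:
            for line in io.TextIOWrapper(raw, encoding="utf-8", errors="replace"):
                yield line.rstrip("\n")


# Build a tiny in-memory archive so the sketch runs without any download.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("sample.txt", "first row\nsecond row\nthird row\n")
buf.seek(0)

# Peek at the first two rows only.
first_two = list(islice(stream_lines(buf, "sample.txt"), 2))
```

For a real slice, `archive` would be the path to the downloaded file and `member` one of the names reported by `zipfile.ZipFile(path).namelist()`. The README.txt is the place to check the actual file layout and row format.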
Public Resource has also made available The TDM Today Show and an early release of The Florilegium: A Special Index to Plants. A previous article about our work in this area appeared in Nature. The General Index was also the subject of a more recent article in Nature. There is also a Special Index to Species available.
Declaration of Support for the General Index
“Public Resource, a registered nonprofit organization based in California, has created a General Index to scientific journals. The General Index consists of a listing of n-grams, from unigrams to five-grams, extracted from 107 million journal articles.
The General Index is non-consumptive, in that the underlying articles are not released, and it is transformative in that the release consists of the extraction of facts that are derived from that underlying corpus. The General Index is available for free download with no restrictions on use. This is an initial release, and the hope is to improve the quality of text extraction, broaden the scope of the underlying corpus, provide more sophisticated metrics associated with terms, and other enhancements.
Access to the full corpus of scholarly journals is an essential facility to the practice of science in our modern world. The General Index is an invaluable utility for researchers who wish to search for articles about plants, chemicals, genes, proteins, materials, geographical locations, and other entities of interest. The General Index allows scholars and students all over the world to perform specialized and customized searches within the scope of their disciplines and research over the full corpus.
Access to knowledge is a human right and the increase and diffusion of knowledge depends on our ability to stand on the shoulders of giants. We applaud the release of the General Index and look forward to the progress of this worthy endeavor.”
Signatories to the Declaration of Support
- Dr. Vinton G. Cerf, Internet Pioneer
- Dr. Gitanjali Yadav, National Institute of Plant Genome Research and Cambridge University
- Dr. Ross Mounce, Arcadia
- Dr. Ian T. Foster, University of Chicago
- Dr. Amitabh Joshi, J.C. Bose National Fellow; Jawaharlal Nehru Centre for Advanced Scientific Research
- Heather Joseph, Executive Director, SPARC
- Dr. Corynne McSherry, Legal Director, Electronic Frontier Foundation
- Dr. Lawrence Liang, Ambedkar University School of Law
- Dr. Dinesh Singh, Former Vice Chancellor, University of Delhi
- Dr. Pamela Samuelson, University of California, Berkeley School of Law
- Alexander B. Howard, Director, The Digital Democracy Project
- Blair MacIntyre, Professor, Georgia Tech
- Kaylea Champion, PhD Student, University of Washington
- Samuel Klein, Curator, Knowledge Futures Group
- Eric Brunner, Oregonian
- Peter C. Richardson, Associate Technical Fellow, The Boeing Company
- Federico Leva, Wikimedia Italia
- Nick Shockey, Director of Programs & Engagement, SPARC
- Christof Schöch, Professor of Digital Humanities, Trier University, Germany
- Dave Hansen, Librarian, Duke University
- Lambert Heller, Librarian, Leibniz Information Centre for Science and Technology
- Cameron Neylon, Professor, Curtin University
- Roger Levy, Professor, Massachusetts Institute of Technology
- Lingfei Wu, Professor, The University of Pittsburgh
- Onur Varol, Professor, Sabanci University
- Matthew Elvey, Medical Researcher, Yale University
- Daniel Stökl Ben Ezra, Professor, Ecole Pratique des Hautes Etudes
- James Evans, Professor, University of Chicago
- Peter Suber, Office for Scholarly Communication, Harvard University
- Philip Young, Librarian, Virginia Tech
- Gavin Moodie, Doctor, University of Toronto
- Memo Cordova, Librarian, Boise State University
- Oscar Perea Rodriguez, Lecturer, University of San Francisco
- Kyle K. Courtney, Copyright Advisor, Harvard University
- Agitha T.G, Professor, Retired
- Subbiah Arunachalam, Professor, Indian Institute of Science
- Rochelle Pinto, Independent Researcher
- Rahul Siddharthan, Professor “G”, The Institute of Mathematical Sciences, Chennai
- Dr. Himender Bharti, Professor, Punjabi University, Patiala
- Fernando Gonzalez-Candelas, Professor, University of Valencia, Spain
- Jasjeet Singh Bagla, Professor, IISER Mohali
- Anirudh Gupta, Data Scientist, Thoughtworks
- Michael Travers, Software Engineer, Parker Institute for Cancer Immunotherapy
- M P Gururajan, Professor, IIT Bombay
- Tim O’Reilly, CEO, O’Reilly Media
- Chris Mills, Engineering Manager, Indeed
- Mark Johnson, Technologist and Adjunct Professor, North Carolina State University
- Jeff Cox, Lawyer, UniCourt
- Cable Green, Director of Open Education, Creative Commons
- Ashutosh Sharma, Research Student, The University of Trans-Disciplinary Health Sciences and Technology, India
- David S. Reed, Founder, Center for Public Administrators
- Jean-Claude Guédon, Professor (retired), Université de Montréal
- Chris Hartgerink, Director, Liberate Science GmbH
- Khaeruddin Kiramang, Student, Curtin University
- Ramy Arnaout, Professor, Beth Israel Deaconess Medical Center
- Ian Connor, Industry Supervisor, QUT
- Martin R. Lucas, Lawyer, Chambers of Martin R. Lucas
- Jorge Cortell, Former Associate Professor of Intellectual Property, Polytechnic University of Valencia
- Lane Rasberry, Research Scientist, School of Data Science, University of Virginia
- James Clement, Research Scientist, Betterhumans Inc
- Uri Hasson, Professor of Cognitive Neuroscience, University of Trento Italy
- LJ Eads, Data Scientist, The MITRE Corporation
- Jerry Goldman, Professor Emeritus, Northwestern University
- Alex O. Holcombe, Professor, The University of Sydney
- Rajarshi Das, Research, FatBrain.AI
- M Madhan, Librarian, Jindal Global University
- Mark Hahnel, CEO, Figshare
- Nidhal Selmi, Software Engineer, Arizona State University
- John J. Murphy, Physician/Clinical Informaticist, Veterans Health Administration
- Allen Riddell, Assistant Professor, Indiana University Bloomington
- Derek Hefley, Graduate Student, Missouri S&T
- Antonio Max, Author, Independent Researcher
- Bethaney Hatch, Executive Assistant, ArrantaBio
- Deborah Salerno, Independent Medical Writer, Salerno Scientific
- Geethanjali Sreenivasarao Pavar, PhD Researcher, University of Edinburgh
- Dr. Nimal Chandrasena, Former Associate Professor, University of Colombo
- Carlos Denner, Professor, University of Brasilia
- Álvaro Saladén Roa, Professor, Universidad de Cartagena
- NAFIUL, Student, Mymensingh Medical College
- Vincent Raymond, Professor, Université Laval (Ville de Québec)
- Dr. O. O. Ilori, Head of Department, Obafemi Awolowo University
- Gina Santos Itchon, Professor, Xavier University – Ateneo de Cagayan
- Dr. Marjorie J. Hinds, Independent Researcher
- Rafael Lairet, Professor, Universidad Simón Bolívar
- C. Mitchell Clark, University of Nebraska-Lincoln
- Oladimeji Oluwalasinu, Student, Obafemi Awolowo University Ile-Ife
- Zhiwen Hu, Professor, Zhejiang Gongshang University
- Sahaya G. Selvam, Associate Professor, Marist International University College
- Dr. Johannes Kabisch, Professor, NTNU
- Philip Meier, CTO, Maila Health
- Filip Vukovinski, Researcher, Staatsinstitut
- Catherine Demoliou, Professor, University of Nicosia
- Daniel Mietchen, Researcher, Ronin Institute
- Marc Robinson-Rechavi, Professor, University of Lausanne
- Tristan Henderson, Graduate Student, Mississippi State University
- Ivan Arisi, Scientist, European Brain Research Institute (EBRI)
- V. Jithin, Student, Wildlife Institute of India
- Amos Bairoch, Professor, Swiss Institute of Bioinformatics
- Peter Murray-Rust, Dr, University of Cambridge
- Robert H’obbes’ Zakon, Founding Principal, Zakon Group LLC
- Addeddate: 2021-10-07 17:52:39
- Color: Indigo
- Identifier: GeneralIndex
- In_memoriam: Shamnad Basheer ; Aaron Swartz
- Rights: There are no rights reserved.
- Scanner: Scanning is the new spinning. ia cli 2.1.0
- Sound: Do (Make Your Bread)
- Year: 2021