Tentative outline of a body of knowledge

Kragen Javier Sitaker, 02020-06-06 (updated 02020-10-28) (10 minutes)

A possible ambition for Derctuo is to include all the background information needed to understand it, if I can find freely-licensed sources. So, for example, Pandemic Collapse talks about geography (the US, Tenochtitlán, Cambodia), historical events (the Vietnam War, the 1918 flu, the Bronze Age Collapse), economic concepts (unemployment, insurance, banks), and other institutions (the US DoD, the Mormon church, major corporations). Solar furnace CPC talks about physical properties of common materials, the Stefan–Boltzmann law, manufacturing processes of ceramics, thermodynamics, units of measurement, basic optics, and the structure of the solar system. CCN Streams talks about networked systems architecture, hashing, SHA-256, TCP/IP, disks, telephone networks, and all kinds of programming stuff.

What is the body of knowledge that would be needed to make sense of all this stuff? Consider the Stefan–Boltzmann law. To make any sense of the statement j = σT⁴ you need to know algebraic notation and what energy and temperature are, including the concept of absolute temperature. And you need to understand how solid objects have surface areas.
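To make the prerequisites concrete, here is a minimal Python sketch of the law for an ideal black body. It reads j as radiated power per unit surface area. The function names and the example temperatures and area are illustrative assumptions, not anything from the Solar furnace CPC note.

```python
# Sketch of the Stefan-Boltzmann law, j = sigma * T**4, for an ideal black body.
# Function names and example values below are illustrative assumptions.

SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W / (m^2 K^4)


def celsius_to_kelvin(t_celsius):
    """Convert to absolute temperature, which is what the law requires."""
    return t_celsius + 273.15


def radiant_exitance(t_kelvin):
    """Power radiated per unit surface area, in W/m^2."""
    if t_kelvin < 0:
        raise ValueError("absolute temperature cannot be negative")
    return SIGMA * t_kelvin ** 4


def radiated_power(t_kelvin, area_m2):
    """Total power in watts radiated by a surface of the given area."""
    return radiant_exitance(t_kelvin) * area_m2


if __name__ == "__main__":
    room = celsius_to_kelvin(20.0)
    print(f"j at 20 degrees C: {radiant_exitance(room):.0f} W/m^2")
    print(f"power from 2 m^2 at 300 K: {radiated_power(300.0, 2.0):.0f} W")
    # Doubling the absolute temperature multiplies j by 2**4.
    print(radiant_exitance(600.0) / radiant_exitance(300.0))
```

Feeding the function a Celsius value directly would give a badly wrong answer, which is why the concept of absolute temperature is on the list of prerequisites.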

Geographic and historical knowledge in particular is sort of endless. Tenochtitlán is Mexico City today, with 8.8 million people, 0.11% of the world’s population; Mexico City’s Wikipedia page is 213kB, 33000 words; the destruction of Tenochtitlán (which is what Pandemic Collapse refers to) is mentioned briefly after 9% of the page. If you divided the world into, say, 2048 regions of equal population (4 million or so), and included 4096 words or so on each of these regions, you’d probably cover most of the geographic facts of importance comparable to the ruin of Tenochtitlán, in about 8.4 million words, about 30,000 pages; you could read it all, once, in three to six months.
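The arithmetic behind that estimate can be sketched in a few lines of Python. The world population, words per page, reading speed, and hours of reading per day are my assumptions for illustration; only the region count and words per region come from the paragraph above.

```python
# Back-of-the-envelope check of the regional-coverage estimate.
# Assumptions (not from the text): world population, words per page,
# reading speed, and daily reading hours.

WORLD_POPULATION = 7.8e9   # assumption: rough 2020 figure
REGIONS = 2048             # from the text
WORDS_PER_REGION = 4096    # from the text
WORDS_PER_PAGE = 280       # assumption
WORDS_PER_MINUTE = 250     # assumption

people_per_region = WORLD_POPULATION / REGIONS
total_words = REGIONS * WORDS_PER_REGION
pages = total_words / WORDS_PER_PAGE
reading_hours = total_words / WORDS_PER_MINUTE / 60


def days_to_read(hours_per_day):
    """Days needed to read everything once at the assumed speed."""
    return reading_hours / hours_per_day


if __name__ == "__main__":
    print(f"{people_per_region / 1e6:.1f} million people per region")
    print(f"{total_words / 1e6:.1f} million words, {pages:.0f} pages")
    print(f"{reading_hours:.0f} hours of reading")
    for hours_per_day in (6, 3):
        print(f"{hours_per_day} h/day: {days_to_read(hours_per_day):.0f} days")
```

Under these assumptions the total comes to just under 8.4 million words, and reading three to six hours a day lands in the three-to-six-month range.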

Vital Articles

Wikipedia’s “Vital Articles” constitutes an attempt to codify such a general-purpose body of knowledge. There are ten Level 1 Vital Articles, including “Human History” (21000 words, 137kB, mentions Mexico and the Aztecs, and has a couple of sentences on the European conquest of the Americas); 100 Level 2 Vital Articles, including 10 articles on history (the “early modern period” article has a couple of sentences on the European conquest out of 18000 words and 120kB and mentions the Aztecs, and so does “civilization”) and 11 on geography (the “North America” article’s 18000 words in 123kB does explain, “The Mayan culture was still present in southern Mexico and Guatemala when the Spanish conquistadors arrived, but political dominance in the area had shifted to the Aztec Empire, whose capital city Tenochtitlan was located further north in the Valley of Mexico. The Aztecs were conquered in 1521 by Hernán Cortés.”); and 999 Level 3 Vital Articles, including 80 on history and 99 on geography.

How about the killing fields of Cambodia under the Khmer Rouge, also mentioned in the same note? Among the Level 2 Vital Articles we find “Late Modern Period” (19000 words, 123kB), which mentions the Cambodian genocide, but no more; and “Asia” (15000 words, 104kB), which mentions “the Cambodian Killing Fields”, but no more. We don’t find enough detail to understand the allusions in Pandemic Collapse until Level 3, which sketches the history of the Khmer Rouge in Cambodia in its articles “Vietnam”, “Cold War” (36000 words, 233kB, including multiple paragraphs and a photo of a shelf full of skulls), “Mao Zedong”, “Theravada”, “Dictatorship”, and especially “Genocide” (17000 words, 109kB).

So we can infer that, at least when it comes to understanding my historical references, having read all of Wikipedia’s Level 3 Vital Articles is probably sufficient. This is not true for scientific knowledge; “Temperature”, “Fire”, “Electric light”, and “Electromagnetic radiation” mention black body radiation briefly but do not mention the Stefan–Boltzmann law.

Unfortunately the Level 3 Vital Articles are some 20 million words and would blow out the 20-megabyte download budget for Derctuo, even without any pictures. The thought above of having about 4096 words for every 4 million people would be more than adequate for Cambodia, though, since in the 16384 words on Cambodia, we could surely find space to mention the Khmer Rouge.

Reading Level 3 might take a year at a reasonable reading speed: a bit over 1000 hours if you read it like a novel.
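A rough Python sketch of both estimates follows. The bytes-per-word ratio is derived from the word and kB figures quoted for individual articles in this note, treating each kB figure as the size of the article text at 1000 bytes per kB. That treatment is an assumption, as are the reading speed and the hours per day, and compression is ignored.

```python
# Rough size and reading-time estimate for the Level 3 Vital Articles.
# Assumptions: quoted kB figures measure article text at 1000 bytes per kB;
# reading speed and hours per day are illustrative; no compression.

# (words, kilobytes) for articles whose sizes are quoted in this note.
ARTICLE_SIZES = [
    (33000, 213),  # Mexico City
    (21000, 137),  # Human History
    (18000, 120),  # Early modern period
    (18000, 123),  # North America
    (19000, 123),  # Late Modern Period
    (15000, 104),  # Asia
    (36000, 233),  # Cold War
    (17000, 109),  # Genocide
]

LEVEL3_WORDS = 20e6      # from the text: "some 20 million words"
BUDGET_MB = 20           # from the text: Derctuo's download budget
WORDS_PER_MINUTE = 330   # assumption: novel-reading speed
HOURS_PER_DAY = 3        # assumption

sample_words = sum(words for words, _ in ARTICLE_SIZES)
sample_bytes = sum(kb * 1000 for _, kb in ARTICLE_SIZES)
bytes_per_word = sample_bytes / sample_words

level3_mb = LEVEL3_WORDS * bytes_per_word / 1e6
reading_hours = LEVEL3_WORDS / WORDS_PER_MINUTE / 60
reading_days = reading_hours / HOURS_PER_DAY

if __name__ == "__main__":
    print(f"{bytes_per_word:.1f} bytes per word in the sample")
    print(f"Level 3: about {level3_mb:.0f} MB against a {BUDGET_MB} MB budget")
    print(f"{reading_hours:.0f} hours, or {reading_days:.0f} days "
          f"at {HOURS_PER_DAY} h/day")
```

Even if compression shrank the text several times over, the estimate suggests Level 3 would use up most or all of the budget on its own.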

Possible plethoras of sources

Possible sources include MIT OpenCourseware, Wikipedia, Wikibooks, cnx.org (before it shuts down), OpenStreetMap, Project Gutenberg, the Internet Archive etexts collections, and for recent things, PLoS and arXiv.org. Boundless used to have some open-content textbooks but they seem to have mostly been lost, though fragments like their definition of limits survive in part. OERCommons has a search engine over thousands of freely licensed educational resources, of which nearly a thousand are textbooks, such as Jim Hefferon’s linear algebra book (CC-BY-SA, 7.5MB, 507pp.). (See also the section on “particular textbooks” below about Hefferon’s work.) They also link to OpenStax (which I’d forgotten about), Delft OCW, CMU OLI, and another dozen or so similar initiatives.

GWU has a guide to open textbooks which links to most of the above.

Wikipedia has a list of notable CC works, including Connexions (which I guess is cnx.org), Khan Academy (cc-by-nc-sa), OpenLearn, OCW, something called “The Saylor Foundation”, WikiEducator, 375000 CC0 artworks from the Metropolitan Museum of Art, deviantART, Flickr, Open Game Art, Openclipart, etc.

Many public-domain books, including some nonfiction, are in Project Gutenberg and Wikisource, as well as the Internet Archive’s books collection. Everything up to 1924 is PD in the US now, including Rhapsody in Blue. Parker Higgins collated many striking 1923 works in a zine last year, though I think a more striking work still is Kahlil Gibran’s The Prophet. Also, perhaps, the Russells’ The Prospects of Industrial Civilization. The Hathi Trust catalogues 53940 works published in 1923, of which 33105 are books.

A dismal assessment of OERCommons

The OERCommons textbooks mentioned earlier include 17 history textbooks, but most are too specific to include either of the events I was using as test points above. World Civilizations I (CC-BY) was the only one that seemed broad enough to mention Cambodia, but unfortunately has been lost. Western Civilization: A Concise History, Volume 3 (CC-BY-NC, 105k words, 10MB as .odt, 274 pp.) starts with Napoleon, too late to cover Cortés, but its volume 2 (CC-BY-NC, 87k words, 229 pp.) does devote a few paragraphs to the events.

Particular textbooks to check out

Jim Hefferon’s Linear Algebra, Theory of Computation, and Introduction to Proofs are cc-by-sa 3.0 disjunction GFDL, with LaTeX source. He says the linear algebra text is “a popular text”. I haven’t reviewed the books yet, but some people seem to like them, though others tar them as unrigorous. And they come with exercise solutions and video lectures.

SICP is under cc-by-sa 4.0. I think Structure and Interpretation of Classical Mechanics is under cc-by-nc-sa 4.0. It’s using MathJax.

Mathematics for Computer Science is a cc-by-sa 987-page PDF covering things like graphs, satisfiability, and linear recurrences.

I am greatly enjoying Reuleaux’s presentation of kinematics, which is in the public domain due to its age. However, the idea of reducing it to files of a manageable size seems daunting.

I really liked MacKay’s Sustainable Energy Without the Hot Air (SEWTHA). Disappointingly, his book on information theory is not available under a free license, and neither, as it turns out, is SEWTHA.

PLOS ONE has a systematic reviews category, but most of the 1507 reviews therein are pretty narrow: “Healthcare-associated infection and its determinants in Ethiopia: A systematic review and meta-analysis” and the like, although “Fecal microbiota transplantation in inflammatory bowel disease patients: A systematic review and meta-analysis” sounds pretty interesting.

On the topic of formal logic, Sean Palmer recommends forall x, Tree Proof Generator (usable online at https://www.umsu.de/trees/), and the whole Metamath website, which is in the public domain, including things like the proof that √2 is irrational.

Gwern licensed his entire site under CC0. It mostly discusses IQ, epistemology, pharmacology, IQ, deep learning, other aspects of AI, statistics, genetics, IQ, politics, psychology, biology, programming, economics, and IQ, but occasionally strays from that focus. It uses MathJax. He explains his motivation:

The goal of these pages is not to be a model of concision, maximizing entertainment value per word, or to preach to a choir by elegantly repeating a conclusion. Rather, I am attempting to explain things to my future self, who is intelligent and interested, but has forgotten. What I am doing is explaining why I decided what I did to myself and noting down everything I found interesting about it for future reference. I hope my other readers, whomever they may be, might find the topic as interesting as I found it, and the essay useful or at least entertaining–but the intended audience is my future self.

The source code of his pages (in a Pandoc-implemented language derived from Markdown) is accessible by appending .page to the URL. He says the whole thing is kept in Git, but I don’t know where.
