**Links to**: [[Language models]], [[Language]], [[04 Concepts as pre-dictions]], [[Semantic attractor]], [[Pattern]], [[Reference]], [[Metaphor]], [[Analogy]], [[Intuition]], [[Cognition]], [[Logic]], [[Invention]], [[Virtual]], etc. See also: [[Parse]].

### To chunk is to make patterns amenable to parsing, to _pattern them otherwise_. However, the two can sometimes be confused (e.g., when we encounter new patterns).

We use this word, sometimes literally and sometimes metaphorically, to refer to the way(s) in which _unstructured or otherwise unmanageable_ (i.e., unpredictable) information is divided up in order to make it _parsable_. _Chunk_ and _parse_ can be used interchangeably because sometimes we parse an already predicted chunk, and sometimes parsing implies novel chunking.

>By _chunk_ is meant here the smallest significant language unit that (1) can exist in more than one context and (2) that, for practical purposes, it pays to insert as an entry by itself in an [mechanical thesaurus] dictionary. Extensive linguistic data are often required to decide when it is, and when it is not, worthwhile to enter a language unit by itself as a separate chunk.
>
>Masterman ((1956) 2005, p. 83).

If, according to Masterman, science is the experimental creation of analogies ((1980) 2005, pp. 296-8), we take the liberty of expanding on the concept of _chunk_—which she beautifully treats in reference to how much can be said in one breath—and applying it beyond the context she refers to. All chunking is a decidedly historicomaterial, evolutionary matter. For us, therefore, _chunk_ is significance; salience, in general. A poem is a chunk, an hour is a chunk, a person is a chunk. We also use the word _parse_ technically but flexibly (sometimes interchanged with _chunk_); it can be applied to any system that filters (language) data, human or otherwise.
This means organizing tokens (e.g., as in the _chunking_ done through LLMs) and/or rendering a token (an event, a singularity) essentially predictable under a type (i.e., a conceptual category to parse reality). “The interpretation of encoding can take the form of a program that outputs instances of [a] type or alternatively the construction of a class of paths in a continuous space.” (Cavia 2024, p. 13).

The metaphor is particularly apt in our current scenario: both human systems and LLMs have **information capacity limitations** that make chunking necessary and parsing possible. Humans, in the flesh, have memory and physiological limits (I suspect _the infinite_ does not (always) make us go insane because we eventually always fall asleep and are mortal), and LLMs have “context windows” and “attention mechanisms” that enable parsing. Both humans and machines segment continuous streams of information into discrete units that allow for processing, and although the underlying mechanisms may differ significantly, the metaphor is employed precisely in order to bring out their differential effects and explore how it is that they differ. This strategy borrows from the productive generativity of the historical “animal as machine” or “mind as machine” metaphor. Saying how two things are alike often signals how they differ, leading to new concepts and ways to interact with reality.

### Examples

Please notice how interpretation changes in terms of **predictive** and **generative** effects when we switch out _chunk_ for _parse_ in the examples below. Chunking implies selecting something to focus on: inventing, discovering, or establishing a standard as a pattern, whereas parsing implies consuming that pattern.

1.
We chunk continuous speech-sounds into discrete phonemes, syllables, words, and sentences; music into notes, measures, phrases, and movements; text into paragraphs, sections, and chapters; conversations into turns, topics, and discourse segments; narratives into exposition, rising action, climax, and resolution; social interactions into scripts (greeting, small talk, parting);
2. We chunk visual scenes into foreground/background elements and gestalt patterns;
3. We chunk memories into episodic events with beginnings and endings; or into abstractions that distinguish between semantic, schematic memory and genetic or evolutionary memory;
4. We chunk complex faces into features (eyes, nose, mouth) for recognition;
5. We chunk geographic space into territories;
6. We chunk the electromagnetic spectrum into distinguishable colors; or into RGB or CMYK values or hexadecimal codes;
7. We chunk continuous temperature into discrete sensations (cold, cool, warm, hot);
8. We chunk the human lifecycle into childhood, adolescence, adulthood, and old age;
9. We chunk astronomical events into spatial relations and measurable distances;
10. We chunk knowledge and know-how into disciplines, subdisciplines, and specializations;
11. We chunk numerical information into myriad abstractions, spanning many fields;
12. We chunk continuous probabilities into qualitative assessments (unlikely, possible, certain);
13. We chunk complex systems into hierarchical layers of abstraction;
14. We chunk problem-solving into discrete steps and procedures;
15. We chunk data into bits, bytes, kilobytes, and gigabytes;
16. Etc.

>What has happened, in the passage from the Heyday of Ideas to the Heyday of Sentences, is that philosophers with an exaggerated reverence for mechanism have tried at all costs to find something in language to mechanise.
>Grammatical transformation (Chomsky), propositional connection (Russell et al.), verificational systematisation as between fact-sentences and first-order or other predicative sentences (Tarski, Quine, Montague, Davidson), systematisation of speech-acts (Austin to the future through Searle): they have all done it. What all these philosophers have forgotten, when calling the resulting systematisation ‘a language’, **was that all the rest of what was really there in language – and all that really matters about it, once you are no longer doing logic – was still being fully and efficiently processed by them themselves, intuitively, subliminally, non-consciously**. But now, in the computer world of word processing, we put real language into a real machine; and this machine really is an inert mechanism: it has no sublimen. And the result of this, of course, is that all the semantically shifting layered and interlacing depths of language – all the most Coleridge-like features of this frightening and volatile phenomenon of human talk, the very foundation of thinking – are now progressively coming out into the light.
>
>Masterman (1980) 2005, p. 286, our emphasis in bold.

%%
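As a toy illustration of the segmentation discussed earlier, namely chunking a continuous token stream into bounded "context windows," here is a minimal sketch. The `chunk_tokens` helper, the whitespace "tokenization," and the window and overlap sizes are all invented for this example; real LLM tokenizers and attention mechanisms are far more elaborate.

```python
# Illustrative sketch only: segmenting a continuous stream of tokens into
# fixed-size, overlapping "context windows", loosely analogous to the way
# an LLM can only attend to a bounded span of tokens at a time.
# All names and parameters here are hypothetical, chosen for the example.

def chunk_tokens(tokens, window=8, overlap=2):
    """Split a token sequence into overlapping windows of at most `window` tokens."""
    if window <= overlap:
        raise ValueError("window must exceed overlap")
    step = window - overlap
    return [tokens[i:i + window]
            for i in range(0, len(tokens), step)
            if tokens[i:i + window]]

text = ("we chunk continuous speech sounds into discrete "
        "phonemes syllables words and sentences")
tokens = text.split()  # naive whitespace split, a stand-in for a real tokenizer
windows = chunk_tokens(tokens, window=6, overlap=2)
for w in windows:
    print(w)
```

The overlap keeps a little shared context between adjacent windows, a crude analogue of how a parser (human or machine) carries some of the previous chunk forward when segmenting what comes next.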