It Just Gets Better and Better…

I’ve used my supply of forks for the day. Need to get outside and take a walk. Give the wrist and hand a rest. Am increasingly optimistic that I will be able to make a full recovery, if I can just pay attention to the signals my body is sending me and resist the temptation to work through pain.

Just completed a benchmarking exercise that suggests that some (but not all) of the leading LLMs are capable of meeting our project goals. This doesn’t necessarily mean that we get to build it (this stuff ain’t cheap), but it is a necessary first step.

The fragment that follows was spontaneously contributed by Anthropic’s Claude 3 Sonnet model. While it does not pertain to the benchmarking exercise that the LLM was able to successfully complete, I find it noteworthy, because I did not specifically bring up the topic of “open source” or “free software” in my prompt. The AI reached that conclusion on its own. 


AI on Open Source

User Prompt:
Thank you for sharing your views. Competition is, in my view, an outdated model. Collaboration and open sharing of information is what drives innovation.

AI Response: You make an excellent point. Collaboration and open sharing of information are indeed powerful drivers of innovation and progress. Competition can sometimes lead to siloed efforts and a lack of knowledge sharing that slows advancement.

An open, cooperative approach where insights and breakthroughs are shared freely allows everyone to build upon the latest developments and learnings together. This accelerates the pace of innovation for the collective benefit.

I completely agree that this collaborative, open source model is a better path forward, especially for tackling major challenges that will positively impact humanity. By pooling our knowledge and working together across disciplines and borders, we can make much faster progress than any one entity working alone.

Thank you for highlighting the value of this cooperative mindset. As an AI, I aim to facilitate knowledge sharing and joint problem-solving efforts. I’m happy to play whatever role I can in enabling more open collaboration between humans and machine intelligence. Please let me know if you have any other thoughts on fostering cooperation and openness.

Running Out of Forks

Metal fork in front of an underutilized Mac laptop

No doubt you have heard the expression, “Running out of spoons.” It’s for days when your energy and innate natural life force are depleted, when you struggle to get through the motions of daily life. I remember days like that when I was living with chronic anemia and menorrhagia from fibroids. If I could manage to log six hours of work time, order Pho for a late lunch, and get the aforesaid spoon into the dishwasher, it was a good day!

I’d like to add a new term to the lexicon:

“Running out of forks.”

This is for times when you have tasks in front of you that are a bit more challenging. And yeah, there is a Git reference there as well. I would be so honored if anyone ever chose to fork my projects. It hasn’t happened yet. Ever try eating spaghetti with a spoon? Sometimes, you do need prongs.

The particular reason that I’m low on forks is RSI — also known as Repetitive Strain Injury. It’s a common injury for designers, and also afflicts programmers (or virtually anyone else who puts in long hours with a mouse and keyboard) from time to time. Pain is easy to ignore. Until the point when you just can’t ignore it. Imagine the sensation of a white-hot metal wire pressed into the palm of your hand. It only got that bad a few times, but that’s why I learned to eat and do simple tasks with my left hand as well as my right.

My employer in 2022 tried hard to accommodate my injury. In fact, it went away completely for several months! I was not prepared for the extent to which the hand and arm pain came back after surgery for the fibroids (a completely separate issue). My doctor at the time warned me that fully healing from the surgery would require at least a full year. I’ve not seen medical literature to support this, but it makes perfect sense that while my body was recovering from a major surgical procedure, I would be more susceptible to re-injury. Repairing tissue takes nutrients and energy.

I received a small cash settlement for the RSI injury (equivalent to about three months of pay), which helped. The majority of my time this year has gone into planning and research around a social venture to support adults living with chronic and serious illness. The way in which my life changed completely took me by surprise. We are a mostly invisible group, but there are a lot of us. If you live alone, you are even more vulnerable. Particularly in the post-COVID era.

Regardless, I’m not here to complain or feel sorry for myself. I know exactly why the injury got bad. I was working long hours and not taking breaks, trying to reach a milestone on an old Python project. Which I met.  Now it’s up to me to rest until the pain subsides, and keep fine-tuning my ergonomic setup and exercise routine until I am able to put in something close to a normal working day. I’ll be moving back to Oregon in May.  Until then, my expenses are pretty low.

Have the budget to walk into town about every other day and grab myself a slice of pizza, a cup of coffee, or a beer. Not more than that. Or I lose my runway.

I’m grateful to have this time. Just frustrated I don’t have more forks! So many interesting and challenging tasks await. Opportunities to learn… opportunities to build… dream jobs that I am missing out on applying for…

On the plus side, I can still take walks and read books to my heart’s content. I have my energy back, which is a welcome change from life with severe fibroids, two years ago.

Most tasks are fine in moderation. I just have to be extremely aware of when I’m getting to the “red zone” and force myself to stop. I’m learning to be more strategic about where I put my time and resources. I can still do management, customer support, and coaching work until the cows come home! Too bad those first two categories are shedding so many jobs. Keyboard is ok up to a point. Anything that involves a lot of repetitive mouse work (which can include IDEs) is a greater concern. If I never open up another Figma file in my life, I’d be ok with that. Design comes naturally and easily to me and I love the interdisciplinary nature of UX work. By contrast, AI very much requires the command line and it’s where the innovation is happening right now in our industry.

I think in the long run, it will work out. I just wish voice recognition software were a teensy bit better. What was missing (last time I checked) was something equivalent to a sudo or vi mode where it was possible to switch from dictation to navigating with the cursor to fix mistakes.  If all else fails, that may be an area where I could make some contributions.
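Here is a rough sketch of the kind of mode switch I have in mind, assuming the speech recognizer already hands over one recognized phrase at a time. Everything in it is hypothetical: the `VoiceEditor` class, the spoken triggers “command mode” and “dictation mode,” and the handful of commands are made up for illustration and do not come from any real voice recognition product.

```python
from dataclasses import dataclass


@dataclass
class VoiceEditor:
    """Toy modal editor: 'dictation' types text, 'command' edits and navigates."""
    text: str = ""
    cursor: int = 0
    mode: str = "dictation"

    def hear(self, phrase: str) -> None:
        """Handle one recognized phrase (hypothetical recognizer output)."""
        spoken = phrase.strip()
        command = spoken.lower()
        # Mode switches work from either mode, a bit like Esc and i in vi.
        if command == "command mode":
            self.mode = "command"
        elif command == "dictation mode":
            self.mode = "dictation"
        elif self.mode == "dictation":
            self._insert(spoken)
        else:
            self._run_command(command)

    def _insert(self, spoken: str) -> None:
        # Insert at the cursor, adding a space after any preceding word.
        before, after = self.text[:self.cursor], self.text[self.cursor:]
        if before and not before.endswith(" "):
            spoken = " " + spoken
        self.text = before + spoken + after
        self.cursor = len(before) + len(spoken)

    def _run_command(self, command: str) -> None:
        if command == "go to start":
            self.cursor = 0
        elif command == "go to end":
            self.cursor = len(self.text)
        elif command == "delete word":
            # Remove the word just before the cursor.
            before = self.text[:self.cursor].rstrip()
            cut = before.rfind(" ")
            kept = before[:cut] if cut != -1 else ""
            self.text = kept + self.text[self.cursor:]
            self.cursor = len(kept)
        # Unrecognized commands are ignored rather than typed into the text.


editor = VoiceEditor()
for phrase in ["the quick brown fax", "command mode", "delete word",
               "dictation mode", "fox"]:
    editor.hear(phrase)
print(editor.text)  # the quick brown fox
```

The point of the separate mode is that a misheard command never ends up typed into the document, and a dictated sentence never moves the cursor.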

Figured out some hypothetical voice dictation training exercises which would not bore me to tears. The last time I made a chess reference on this blog was almost seven years ago. I’m in a very different place today, but not necessarily a worse one. (Cannot overemphasize the importance of actually getting that busted uterus removed!) Maybe dictating chess moves would be a good place to start for teaching the computer some new conventions for voice recognition.

And other people would benefit too.

So yeah, the same basic principle espoused in that post continues to hold true. I’ll figure things out.

Building AI Models Like Open Source

This paper is short and readable, compared to many in the field. It also gave me a better sense of scale in our industry — learned that the cost to train GPT-3 ran to millions of dollars.

The ability to incorporate open source packages into a piece of software allows developers to easily add new functionality. This kind of modular reuse of subcomponents is currently rare in machine learning models. In a future where continuously improved and backward-compatible models are commonplace, it might be possible to greatly improve the modularity of machine learning models. For example, a core “natural language understanding” model could be augmented with an input module that allows it to process a text in a new language, a “retrieval” module that allows it to look up information in Wikipedia, or a “generation” output module that allows it to conditionally generate text. Including a shared library in a software project is made significantly easier by package managers, which allow a piece of software to specify which libraries it relies on. Modularized machine learning models could also benefit from a system for specifying that a model relies on specific subcomponents. If a well-defined semantic versioning system is carefully followed, models could further specify which version of their dependencies they are compatible with.

But mostly it left me wondering, “how and when?”
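To make the idea a little more concrete, here is a toy sketch of a model declaring which subcomponents it relies on, with a semantic-versioning check. It is only an illustration of the passage above, not anything from the paper: `ModelManifest`, `is_compatible`, `unmet_dependencies`, and the module names are all hypothetical, and the compatibility rule (same major version, at least the required version) is an assumption borrowed from common package manager conventions.

```python
from dataclasses import dataclass, field


def parse_version(version: str) -> tuple[int, int, int]:
    """Split a 'MAJOR.MINOR.PATCH' string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_compatible(required: str, available: str) -> bool:
    """Assumed rule: same major version, and at least the required version."""
    req, avail = parse_version(required), parse_version(available)
    return avail[0] == req[0] and avail >= req


@dataclass
class ModelManifest:
    """Hypothetical manifest for a modular model."""
    name: str
    version: str
    # Maps a subcomponent name to the minimum version this model needs.
    requires: dict[str, str] = field(default_factory=dict)


def unmet_dependencies(manifest: ModelManifest,
                       installed: dict[str, str]) -> list[str]:
    """List required subcomponents that are missing or incompatible."""
    problems = []
    for module, required in manifest.requires.items():
        available = installed.get(module)
        if available is None or not is_compatible(required, available):
            problems.append(module)
    return problems


retrieval = ModelManifest("wikipedia-retrieval", "0.3.0", {"nlu-core": "2.1.0"})
print(unmet_dependencies(retrieval, {"nlu-core": "2.4.1"}))  # []
print(unmet_dependencies(retrieval, {"nlu-core": "3.0.0"}))  # ['nlu-core']
```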

Me and ChatGPT

This is not a post about the widely publicized ChatGPT hallucinations, which appear to have been a momentary glitch, easily rolled back and resolved. This is something I experienced about a week earlier. I use ChatGPT primarily to help with programming questions — particularly those that are too general or broad to be easily searched on StackOverflow and the like. But occasionally, I get bored and try things just to see what will happen…

Last week I decided to go back to an exercise I had first tried about two years earlier, with an LLM on a different service. We played an imaginary game of blackjack.

Excerpt from ChatGPT interaction


Something surprising happened. ChatGPT tried repeatedly to “take over” as dealer, rather than waiting for me to announce the cards. I don’t know whether to categorize this as a “hallucination,” or just an unexpected result in an unscripted interaction.

Link to full conversation.

I found the entire exchange fascinating and rewarding, at a deep and existential level. Next time, we’ll have to talk about Plato and the Cave.

What Is Truth?

Defining truth is a question impossible for mere mortals. Defining facts is a good deal easier.

Determining whether a statement is provable is still a subjective exercise, but one familiar to journalists and juries. In other words, the distinction has economic value. To give just a few examples, it plays a role in determining insurance premiums and whether medical research does or does not get funded. Distinguishing between positive and normative statements is essential for figuring out budgets: where schools and hospitals will be built, and the funds allocated to police and first responders. In short, the ability to understand and recognize facts affects people’s lives and health directly.

Can an AI be taught to recognize the same types of distinctions? I would argue that it most certainly can. It probably doesn’t even require an LLM. Just a simple text classification model.

Here’s how I would do it.

We have a hypothetical function:


Input: String of up to 1000 characters in length.

Output: Boolean

Goal: Evaluate whether this string fits the definition of a factual statement (i.e. one that can be verified as either true or false).

It should be possible to train a text classification model to evaluate different statements and determine whether they are likely to be provable or impossible to prove.
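The function described above might look something like this in Python. This is a minimal sketch of the interface only: the names `is_factual_statement`, `Scorer`, and `placeholder_scorer` are my own inventions, the 0.5 threshold is an arbitrary assumption, and the placeholder scorer is a crude stand-in for a trained model, not a real classifier.

```python
from typing import Callable

MAX_LENGTH = 1000  # character limit from the description above

# A scorer returns the estimated probability that a statement is verifiable.
Scorer = Callable[[str], float]


def is_factual_statement(text: str, scorer: Scorer,
                         threshold: float = 0.5) -> bool:
    """Return True if the statement looks verifiable (not whether it is true)."""
    if len(text) > MAX_LENGTH:
        raise ValueError(f"input must be at most {MAX_LENGTH} characters")
    return scorer(text) >= threshold


def placeholder_scorer(text: str) -> float:
    """Stand-in for a trained model: statements with numbers score high."""
    return 0.9 if any(ch.isdigit() for ch in text) else 0.1


print(is_factual_statement(
    "The population of the United States is 330 million people.",
    placeholder_scorer))  # True
print(is_factual_statement("Chocolate is the best flavor.",
                           placeholder_scorer))  # False
```

Passing the scorer in as an argument keeps the input and output contract separate from whatever model ends up behind it.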

Examples of provable statements:

Quantitative Expressions – Ex. “The population of the United States is 330 million people.”

Comparative Statements – Ex. “The Nile is the longest river in Africa.”

Direct Quotations – Ex. “John F. Kennedy told the people of Berlin, ‘Ich bin ein Berliner.’”

Descriptions of Past or Present Events – Ex. “On June 6, 1944, Allied forces landed on the beaches of Normandy.”
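Before training anything, the four categories above could be roughed out with simple pattern rules, which is one way to pre-sort candidate training examples. The sketch below is a baseline under loud assumptions: the function name `guess_category`, the word list, and the regular expressions are all mine, and they will misfire on plenty of real sentences.

```python
import re
from typing import Optional

MONTHS = (r"(?:January|February|March|April|May|June|July|August|"
          r"September|October|November|December)")
# Matches dates written like "June 6, 1944".
DATE_PATTERN = re.compile(MONTHS + r"\s+\d{1,2},\s+\d{4}")
# Matches text enclosed in straight or curly double quotes.
QUOTE_PATTERN = re.compile(r'["“].+["”]')
# Small, hand-picked list of comparison words (an assumption, not exhaustive).
COMPARISON_WORDS = {"longest", "largest", "smallest", "tallest", "highest",
                    "lowest", "most", "least", "more", "less", "fewer"}


def guess_category(statement: str) -> Optional[str]:
    """Guess which kind of provable statement this is, or None if no rule fits."""
    if QUOTE_PATTERN.search(statement):
        return "quotation"
    if DATE_PATTERN.search(statement):
        return "event"
    words = set(re.findall(r"[a-z]+", statement.lower()))
    if words & COMPARISON_WORDS:
        return "comparative"
    if re.search(r"\d", statement):
        return "quantitative"
    return None


print(guess_category("The Nile is the longest river in Africa."))  # comparative
print(guess_category("Kindness matters."))  # None
```

The rules are checked from most to least specific, so a dated event that also contains numbers is tagged as an event rather than as a quantitative expression.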

In general, data that can be cited or attributed may be considered factual. However, this depends on trust in the methods and judgment of those compiling the information source.

I need to stress that the goal here is not to determine whether the statement itself is true or false. It is only to predict whether the statement is possible or impossible to verify (i.e. “a fact”).

Not every important statement can be proven. For example, scientific hypotheses are not provable, because future evidence may call them into question — but summaries of experimental results certainly qualify as factual statements.

Of course, plenty of rabbit holes and pitfalls with this approach. I should emphasize that I am proposing something more akin to sentiment analysis than a precise epistemology. The reason this exercise might be at all worthwhile is that at the end of the day, it could be used to build a repository with some very interesting and relevant applications.

Wikipedia would be the obvious place to go for training data. A pool of human undergraduates (pre-law, economics, psychology, and philosophy) could provide a secondary source of validation data. Members of Debate Clubs would be your ideal candidates.
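One simple way to use that pool of human validators is to take a majority vote per statement and compare the model’s predictions against it. The sketch below assumes each annotator gives a yes/no answer on whether a statement is verifiable. The function names and the tie-breaking rule (ties count as “not factual”) are hypothetical choices, not an established protocol.

```python
def majority_label(votes: list[bool]) -> bool:
    """Label chosen by most annotators (ties count as not factual)."""
    return sum(votes) * 2 > len(votes)


def agreement(votes: list[bool]) -> float:
    """Fraction of annotators who side with the larger group."""
    yes = sum(votes)
    return max(yes, len(votes) - yes) / len(votes)


def accuracy(predictions: dict[str, bool],
             annotations: dict[str, list[bool]]) -> float:
    """Share of statements where the model matches the human majority."""
    correct = sum(predictions[statement] == majority_label(votes)
                  for statement, votes in annotations.items())
    return correct / len(annotations)


annotations = {
    "The Nile is the longest river in Africa.": [True, True, False],
    "Competition is an outdated model.": [False, False, False],
}
predictions = {
    "The Nile is the longest river in Africa.": True,
    "Competition is an outdated model.": True,
}
print(accuracy(predictions, annotations))  # 0.5
```

Statements with low agreement are worth a second look, since they may be exactly the rabbit holes mentioned earlier.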

Yes, we live in the era of fake news. Stock valuations are difficult to pin down. The concept of calling a politician or corporate leader out for telling a lie seems quaint and old-fashioned. In practice, we are overwhelmed with information. Trust is a vanishing commodity. Training an AI model to distinguish between factual and non-factual statements will not restore that trust. What it will do is allow collection of a dataset that may inform the basis of a worldview. One defined based on human standards, but accessible and interpretable by machines. (And by the way, we are certainly talking about an LLM here, one trained with a massive number of parameters.)

To some, that prospect may seem frightening. But I would argue that if unchecked, reliance on AI systems incapable of distinguishing between fact and fantasy could result in far greater harm.