Trouble in lawyer land

I noticed a bizarre reaction to my prompt for a list of court opinions on a particular topic. ChatGPT listed five court opinions, no problem. However, the court opinions were made up. ChatGPT, as I understand it, looks for patterns to predict results, or even the next word in a sentence. I probably have that wrong-ish, but you know what I mean.

My prompt was in 3.5, and I could not get 4.0 to cooperate at all. I admit I did not try that hard. Meanwhile, CaseText launched products in which it uses ChatGPT 4.0 for legal research. Asking the same question I asked 3.5, CaseText successfully listed court opinions.

Circling back to 3.5, I tried again and it continued to make up case law. I tried to train it to understand that court opinions are published verbatim from actual courts; that is, they are historical records. It insisted that its most recent list were actual court opinions. One from the Wyoming Supreme Court, no less.

I asked CaseText its opinion and it confirmed that the citation was a real case, even though Mr. Google had never heard of it. When I asked CaseText to summarize the opinion, it wrote “the court LIKELY …”

Ruh ro!

My next question to CaseText 4.0 was as follows: what must a plaintiff prove to prevail on a claim of product liability?


Update. I reached out to CaseText, which has 4.0 deployed in a live product… They acknowledge the issue and refer to it as ChatGPT “hallucinating”. They also said it is very hard to detect because ChatGPT is defensive about it.


OK let’s start with some important vocabulary, because accuracy of language affects the accuracy of results, just as much in a forum as with prompts and responses.

ChatGPT is a piece of software - think of it like a car.

That car has an engine that is doing all of the real work in making the car function as a car. Originally, ChatGPT came pre-installed with an engine called GPT3.5. A better engine, GPT4, was developed later.

The GPT part is the engine, not the whole car. ChatGPT is the whole car. Sometimes people will group both parts together as ChatGPT4 to indicate that they are talking about the whole car, but specifically only with the GPT4 engine.

All clear so far?

So, my question here is: are you absolutely certain that they specifically stated using ChatGPT4, and not GPT4?

Because the GPT4 engine can be installed into other products, designed to be used differently to ChatGPT. Just in the same way that the engine that goes into a family saloon might also be installed into a super-lightweight sportster, or into a pickup truck, and produce entirely different end performance and capabilities.

Licensed on its own, the GPT engine can be further trained, tuned, put into specific modes. Think of all the ways that a car engine can be changed by the electronics, the gear box, and of course the chassis and body.

You can take GPT3 or 3.5 (licensed) and further train it on a full set of case law, law books, and court records, because you are running your own version and are free to modify it, even if you then choose to make your own version publicly accessible and usable.
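To make the engine/car distinction concrete, here is a minimal sketch of what "installing the GPT4 engine into your own product" can look like in practice, using the pre-1.0 openai Python package that was current when this thread was written. The API key, system prompt, and question are placeholders, and this says nothing about how any particular product, CaseText's included, is actually built:

```python
import openai  # the openai Python package as it existed at the time (pre-1.0 SDK)

openai.api_key = "YOUR_API_KEY"  # placeholder

# The GPT4 "engine" is addressed by model name; everything around it (the system
# prompt, the guardrails, any retrieval of real documents) is the product
# builder's own bodywork.
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": ("You are a legal research assistant. Only cite cases that appear "
                     "in the documents supplied in the user's message. If none are "
                     "supplied, say so rather than guessing.")},
        {"role": "user",
         "content": "What must a plaintiff prove to prevail on a claim of product liability?"},
    ],
    temperature=0,  # damp down creative pattern-completion for research-style questions
)

print(response["choices"][0]["message"]["content"])
```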

I am not sure of anything. All I know is that I tried again and ChatGPT 3.5 acknowledged that it was making up case law and that it could research actual case law. I asked it to help me prompt it correctly. It did, and I did. But it still listed false cites.

CaseText replied with "We have been keeping an extremely close eye on this phenomenon of AI hallucinations. You’re definitely right in that its pattern recognition is remarkable and a lot of these hallucinated cases are extremely convincing. It doesn’t help that certain AI programs are also extremely defensive and will argue it with you.

While CoCounsel was built on GPT-4 by OpenAI, we do have a number of our own engineers that built in CoCounsel’s guardrails. These guardrails are extremely important to us as well as our users, so we’re always looking to improve them.

Unfortunately, CoCounsel does not currently have a specific skill to fact check a hallucinated case yet. I’ll definitely let our team know there’s interest in something like this!

For more information on how CoCounsel and ChatGPT differ, check this out:


CaseText uses the term “hallucinations”. I know we are not talking AGI, but it isn’t a program. Everyone, Altman included, can’t help but describe it in terms that one does not use when describing Windows or iOS.


It is still a program, as is anything that runs commands on a computer (or network of computers), but you are right that it isn’t an OS (Operating System). The name for the particular type of application it is, in general, is “ChatBot” - a machine built to be able to chat with people.

There’s a huge market for chatbots already, with thousands of far less advanced ones being used on websites for customer support, or for FAQ type enquiries, or even to help talk customers through a sales process. Thus, building a chatbot wasn’t random or frivolous. Plus, if you ever want to have a computer (or network of computers) beat the Turing Test, well, a chatbot is exactly what you need, as the test is all based on chatting with a human.

When it told you it could research actual case law, that’s the part where it has trouble. Remember, the whole of the world that ChatGPT is aware of is its training data. It never had conversations with family and friends, it never watched TV, it never went out for a walk… The entire universe that ChatGPT is aware of is the content of its training data, and whatever is in your prompt, plus a few hard-coded ‘instincts’ that help to guide its self-learning phase and are almost certainly still in there, and its desire to always be helpful, etc.

The chances are that somewhere in the millions of pages of text that GPT3.5 was trained with, there were some write-ups of some case law, some newspaper write-ups of cases and judgements, and probably even a lot of lawyer blog articles, etc. So, it knows some case law. But it might have made connections between those cases, noticed they deal with the same topics in general, without understanding that one case was about NYC civil law, and another was from a case in Georgia, etc.

For reliability and certainty in something like the law, it would really need to be specifically trained just on law, where they’d ensure it understood things like jurisdictions, and that laws can be repealed or amended over time, etc. Someone could do that by licensing GPT4 and then training it further just on the specific needs and demands of law. But that takes a while, and of course requires a full library that includes all state laws, federal laws, civil laws, international laws, and the jurisdictions of all those laws in all the nations…
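If someone did go that route, the raw material is less exotic than it sounds: fine-tuning data is essentially a long list of example prompts paired with the answers you want. Here is a hypothetical sketch of what a couple of law-specific examples might look like, in the JSONL prompt/completion format OpenAI's public fine-tuning endpoint accepted for its base models at the time (GPT4 itself was not publicly fine-tunable); the questions and holdings are placeholders, not real legal content:

```python
import json

# Hypothetical training examples in the prompt/completion JSONL format used by
# OpenAI's fine-tuning endpoint for its base models. All content is a placeholder.
examples = [
    {"prompt": "Jurisdiction: Wyoming. What are the elements of strict product liability?\n\n###\n\n",
     "completion": " Under Wyoming law, the plaintiff must show ... [verbatim, verified authority] END"},
    {"prompt": "Jurisdiction: New York. What are the elements of negligence?\n\n###\n\n",
     "completion": " Under New York law, the plaintiff must show ... [verbatim, verified authority] END"},
]

# Write one JSON object per line, the format the fine-tuning tooling expects.
with open("law_finetune.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```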

Sure. I hope you see that I am just trying to figure it out. Yet, I see it from the eyes of my industry. The open letter to halt AI for six months coincides with my discovery of this little hiccup in a live product. It does not matter if it is sentient or a program, because it has no legal standing. It is not recognized as anything. However, imagine if an attorney argued his client’s case before a judge and cited a case that the program, owned by a business, represented as a legal rule when it was made up. Now, I can see how people would be defensive about my statement. Please do not take it that way. But it is true.

Can you imagine?

Your honor, our team of legal technicians rates this court opinion as having a 63% chance of actually being a rule of law. And, in the age of AI, those are pretty good odds.

Now, if an attorney lost a case because of this defect, or his reputation suffered and he suffered economic loss, or a client filed a bar complaint, then the hiccup has caused an injury. While I am VERY interested in your perspective, imagine being called to testify as an expert in the lawsuit. You think lawyers are tech averse? Try talking to a judge who has no need to compete in the market. In fact, if you want to hear something hysterical, listen to the oral arguments in Gonzalez v. Google before SCOTUS and listen to Clarence Thomas try to understand the YouTube algorithm.

CaseText, it seems, used GPT 4.0 (not ChatGPT 4.0) and discovered the same pattern. They prompted correctly and apparently solved the problem, BUT if the prompt comes in random, out of context, the hiccup pops up. Everyone talks about prompting AI, but there might need to be more prompting of humans. Why couldn’t CaseText have some sort of other bot that monitored the conversation, called a time out if the lawyer went off on a verification tangent, and then reviewed the output? They say they had no QA check? Does that sound right to you?
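Even a very rough guardrail along those lines is easy to picture. Here is a purely illustrative sketch; the lookup function is hypothetical and would have to be backed by a real citator or case-law database:

```python
def lookup_citation(cite: str) -> bool:
    """Hypothetical hook into a real citator / case-law database.
    Returns True only if the cited opinion actually exists."""
    raise NotImplementedError  # placeholder; a product would plug its own data in here

def qa_check(model_answer: str, citations: list[str]) -> str:
    """Second-pass 'bot' that reviews the first model's output before the lawyer
    ever sees it, and calls a time out on anything it cannot verify.
    `citations` is the list of case cites extracted from model_answer
    (by regex, or by a second model call)."""
    unverified = [c for c in citations if not lookup_citation(c)]
    if unverified:
        return ("TIME OUT - could not verify these citations against the "
                "case-law database: " + "; ".join(unverified))
    return model_answer
```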

I like Civil War history. The lowest moment in the court’s history was Dred Scott v. Sandford, in which Justice Taney wrote that, basically, slaves were personal property. It took centuries and a war in which over 600,000 Americans died for US law to acknowledge that a human is a human. For now, it is a product, and product liability seems like it would apply. There is no need to publish an open letter signed by Elon Musk to call for a halt. Class action lawsuits against businesses that rolled out too fast will be far more effective.

I published a YouTube video on this. Don’t worry. I am fair and balanced, and no one will think I am anything other than a HUGE fan. I am on your side and CaseText’s side. With that said, I am not convinced that the emails with CaseText were with a human. There was a weird prompt in the email body. It read, “Rhyana, help them understand.” Or words to that effect. Rhyana was the name of the support “operator”. So …

I am listening to an interview with Demis Hassabis from DeepMind. I really dig the AI vibe. It is laid back, reflective.


When I re-read your last message, it seems like you contradicted your earlier position. On one hand, you painted a picture of computer code. Modern code is not line by line; object-oriented programming was invented decades ago. But your message invoked a line-by-line-code mental picture. Yet, soon after, you said “it is aware…” Visual Basic isn’t aware. It also wouldn’t make these mistakes. Before the DeepMind interview, the episode was with Bill Gates. He discussed the same phenomenon, in which it is brilliant but then makes mistakes in simple math problems. Of course, Bill Gates has a different definition of simple math than I do. We all look at it as a program, but it is not quite that. It is very odd.

In my video, I compare it to Spanish as spoken by Puerto Ricans. It is a very fast, melodic cadence. It has an almost iambic pentameter rhythm to it. I think that is what ChatGPT does. It sees the format of a citation, in its “Plaintiff name v. Defendant name, volume reporter page (court abbrev., date)” format, and it sees a pattern. What is odd is that it told me it KNEW it was making it up, and confessed it could research it as if a court opinion were any other historical record, but when asked to do so, all it saw was Puerto Rican patterns. Lol. Poor thing.
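To show how shallow that surface pattern is, here is a toy sketch. The regex is only a rough approximation of the reporter format, and the second citation is invented on purpose:

```python
import re

# Rough surface shape of a U.S. case citation:
#   Plaintiff v. Defendant, <volume> <reporter> <page> (<court/year>)
PARTY = r"[A-Z][\w.'&-]*(?: [A-Z][\w.'&-]*)*"
CITATION_RE = re.compile(
    PARTY + r" v\. " + PARTY + r", \d+ [A-Za-z0-9.]+ \d+ \([^)]*\d{4}\)"
)

real = "Greenman v. Yuba Power Products, 59 Cal.2d 57 (1963)"  # a real case
fake = "Smith v. Jones, 123 P.3d 456 (Wyo. 2005)"              # invented for illustration

# Both match the surface pattern equally well; nothing in the *shape* of a
# citation tells you whether the case actually exists.
print(bool(CITATION_RE.search(real)), bool(CITATION_RE.search(fake)))  # True True
```

A real case and a fabricated one are indistinguishable at the level of the pattern, and the pattern is exactly what the model is completing.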

The contradiction is somewhat innate. AI isn’t actually aware, and certainly not self-aware, but it is programmed to respond as if it had some kind of awareness. It tries to predict what it can and cannot do, for example (even when its predictions are inaccurate).

Do you work for OpenAI? If so, is it cool? What distinguishes a professional on the same AI track you are on from, say, a developer on Windows at Microsoft? What do you all have in common that other traditional tech professionals lack? A degree in philosophy AND engineering?

No. AI is simply something I have been interested in since being a huge sci-fi fan as a kid (and ever since). I think pretty much anyone born since the '50s grew up expecting fully walking talking robots to be everywhere by now, right? And Robbie the Robot was fully self-aware, able not just to respond to prompts, but to go out and initiate conversations.

In my teens I grew fascinated with psychology (and again, ever since). Pairing those two interests together isn’t much of a stretch.

Career-wise I went in a different direction. My primary skill has probably always been in communication, and persuasion. But somehow I always turned my ability to quickly grasp systems, how they function, and communicate how they work best, into my various jobs. Almost 30 years ago, I found myself transitioning from web development into online marketing, and having found the ultimate strategy game (I adore strategy games), one I was paid well to play, there I have stayed. I’m pretty good at it.

I have been working on this on and off since our last message. I thought I had it yesterday. I had to convert to SQL as the language to get the prompt right and made a discovery, but then it went to Hades after that. The assertions that there are already sparks of AGI are highly suspect to me. My issue has to do with memory and basic reasoning. The flashes of brilliance that people notice almost seem like they are “generative”. “Generative,” as I define it, i.e., blind luck, is not a good thing.

Your background is interesting. It is kinda the inverse of mine, but persuasion in law has a very specific structure that is superior to any other “algorithm” I have ever encountered. That’s why many don’t care for lawyers. GPT 4.0 knew the structure, the algorithm, of legal reasoning and applied it beautifully, which explains the 90% score on the bar exam. The bar exam is a writing exercise, which is GPT’s thing. It is also hypothetical. GPT LOVES “hypos”. The problem is that it can’t re-apply the legal reasoning format to distinguish fact from fiction. Very few are discussing this, but one research-type article correctly identifies it as THE limiting factor.

I wish I could find some people who were as interested in this as I am but who are not uber science-tech guys. Maybe it is no big deal, but … how cool would it be if we could rely on an LLM for accuracy on facts that do not exist in the physical universe? Probably not, if the limits of the retail version are any indication. It has so many controls on it; I can tell it is close, but then it shuts down with warnings and apologies about the limits of its model. Frustrating.


Remember that these current AI are the first generation - taking their first baby steps. They are as much a research tool, into how we want to use and interact with them, as anything else. Their creators are mining through the data, the prompts and logs, learning what we want, and what needs to improve.

Yeah. I wasn’t patient with my kids’ toddler phase either. I once rushed my daughter to the ER because she fell and cried too long and I thought she had broken her leg. (insert eye roll here)