NEW: AIPRM Live Crawling

AIPRM is getting Live Crawling

:robot: ChatGPT only has info until mid-2021
:female_detective: Current/newer events are unknown
:spider_web: Live Crawling can solve this by crawling web pages in real-time
:new: The Live Crawling Feature allows for injecting fresh content from crawled websites.

Here’s a quick teaser video

I guess I'm confused here. I just asked GPT-3.5 a similar question and got a result not much different from the one in the video.
I know the training data ends in 2021, but how can you tell that it is actually crawling the URL versus reading data from 2021?


I’m 100% in agreement with @GradyMedia on this, in that I need to see that we are getting data that could not possibly have been predicted, and is radically different to that gained any other way.

I know it can be done, you just need to find the right topic to demonstrate that it is being done, and can’t be gained without AIPRM having extracted the data from the URL first to prompt with.

Isn't ChatGPT just guessing the URL content? (without the upcoming feature)


Yes. Without something feeding freshly crawled data into a prompt, all that ChatGPT can do is predict, based on patterns, what is statistically the most probable result. But this video is, I think, meant to be showing AIPRM first taking a URL from the user, then crawling that URL, extracting the text from the page, putting that into a dynamic prompt, and only then passing that prompt, containing the crawled data, over to ChatGPT.
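To make that sequence concrete, here is a minimal sketch in Python of the steps as I understand them. This is not AIPRM's actual implementation: the function names, the template text, and the `max_chars` limit are all my own assumptions for illustration. Only the `[CRAWLEDTEXT]` placeholder name is taken from this thread.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def fetch_html(url):
    """Step 1: download the page (hypothetical helper, no retries or robots.txt handling)."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_text(html):
    """Step 2: reduce the HTML to plain text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def build_prompt(template, crawled_text, max_chars=4000):
    """Step 3: inject the crawled text into the prompt template.

    max_chars is an assumed safety limit so the prompt fits the model's context.
    """
    return template.replace("[CRAWLEDTEXT]", crawled_text[:max_chars])
```

Usage would look something like `build_prompt("Summarize this page:\n[CRAWLEDTEXT]", extract_text(fetch_html(url)))`, and only the resulting string is sent to ChatGPT. The point is that the model never visits the URL itself; it only sees whatever text was pasted into the prompt.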

Incidentally, some kinds of URLs are far easier to predict. Anything in a standard format, for example, follows a standard pattern: press releases and news stories all tend to have very clear, easy-to-predict patterns.

Take something like a news page about one company acquiring another. If the URL is something that included [Company 1] acquires [Company 2] then we can predict tons of information on that page just from the fact those types of stories have such a clear and standard format.

One company acquires another. The one doing the acquiring is obviously the bigger, because it could afford to buy out company 2.

The page will summarize who company 1 are and what they are famous for, which is likely to be the same stuff they were famous for (or working on) just 2 years ago.

The page will summarize who company 2 are in the same sort of way.

They’ll usually stress the thing that both companies are famous for as a shared interest, since statistically that common interest is why company 1 will have bought out company 2. Especially if it is an interest that is a main focus for company 2, and something that company 1 are less known for but known to be expanding in (2 years ago).

There’s always a statement about how Company 1 expects to expand its capabilities in [insert shared interest that was a primary fame point for Company 2] thanks to the acquisition.

See how that pattern is so predictable?

So to show that this is not just prediction, we need to see details that could not be predicted from data 2 years ago.
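One way to run that kind of check, sketched in Python below, is to put a random token on a page you control and see whether it comes back in the answer. A random token cannot be in the 2021 training data and cannot be guessed from the URL. This is only my suggestion, not something AIPRM provides; the function names are hypothetical.

```python
import secrets


def make_canary(prefix="canary"):
    """Create a random token that no model could predict or have seen in training."""
    return f"{prefix}-{secrets.token_hex(8)}"


def answer_shows_live_data(answer, canary):
    """True if the model's answer repeats the token that exists only on the live page."""
    return canary.lower() in answer.lower()
```

Publish the token on the page, ask ChatGPT to quote any unusual identifiers it finds there, and pass the reply to `answer_shows_live_data`. If the token shows up, the text really was crawled and injected into the prompt.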


OK, good points, so the Live Crawling feature will solve our concerns, right?
(by extracting the text from the URL to the [CRAWLEDTEXT] area …)


You are all very right.

Even without this feature the inference in the earliest ChatGPT was so good that people STILL think that ChatGPT can crawl.

I was asked about it just yesterday in a message.

I wrote this explanation for it, two months ago

With the AIPRM Live Crawling it will be possible to inject actual web page data, and that will be especially useful combined with the now larger token context in GPT-4.

ChatGPT does not have internet access. However, it was trained on data up to 2021. So whatever it appears to extract from a given URL is not actually due to a prompt's ability to live crawl. It is because it was trained on a phenomenally large dataset, which most likely included data from these publicly available URLs. Therefore, it appears as if it is live crawling. The fact, however, is that it is not. Hence, it is misleading to say that the prompt is live crawling.

Live Crawling solves that bottleneck. AIPRM Live Crawling extracts data from the URL and feeds it into the prompt, so ChatGPT doesn’t have to rely on its 2021 training data.

Hence, why is it misleading?

Sorry mate, not quite sure what you mean. Please help me understand. Many thanks.

Here’s an example:

with this prompt:

