Let ChatGPT Analyze Source Code from URL [Live Crawled]

Link to Prompt

What the Prompt does

  • Crawl the URL given in the Prompt (selected plans only)
  • Analyze the source code by up to 3 aspects important to the user (see the sketch after this list)
  • Specify three important aspects as VARIABLE1, VARIABLE2, VARIABLE3
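
To make the mechanics concrete, here is a minimal Python sketch of the idea, not the actual AIPRM implementation: fetch the page source, then wrap it in an analysis request for up to three aspects. The function name `build_analysis_prompt` and the use of the `requests` library are illustrative assumptions of mine.

```python
# Minimal sketch of the idea (not the real AIPRM prompt internals):
# fetch the page source, then ask the model to analyze it against up to three aspects.
import requests

def build_analysis_prompt(url: str, aspects: list[str]) -> str:
    """Fetch the page source and wrap it in an analysis request."""
    html = requests.get(url, timeout=10).text
    aspect_list = ", ".join(aspects[:3])  # the prompt supports up to 3 aspects
    return (
        f"Analyze the following HTML source of {url} "
        f"with respect to: {aspect_list}.\n\n{html}"
    )

# Input matching the example below:
prompt = build_analysis_prompt(
    "https://www.aiprm.com/blog/does-aiprm-use-wifi/",
    ["security", "site speed", "code comment"],
)
```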

Example Prompt Output GPT-3.5

Input:

VARIABLE1 = security
VARIABLE2 = site speed
VARIABLE3 = code comment
PROMPT = https://www.aiprm.com/blog/does-aiprm-use-wifi/

Output:

Example Prompt Output GPT-4

Input:

VARIABLE1 = security
VARIABLE2 = site speed
VARIABLE3 = code comment
PROMPT = https://www.aiprm.com/blog/does-aiprm-use-wifi/

Output:

A screenshot is only useful if it shows Emoji, Images or Mermaid charts rendered by ChatGPT;
in all other cases the text is better. Why? Because it’s indexable.

How I came up with it

If we’re crawling URLs, we might as well analyze their source code.

More insights

Based on the first tests, this will most likely always disappoint with GPT-3/3.5.

1 Like

Fascinating that GPT4 didn’t pick up the code comments that 3.5 did.

1 Like

Yes, exactly… it’s all in there.

But you know, it’s source code… not very nice to read :slight_smile:

1 Like

I’m so used to looking at source code that it doesn’t bother me. :rofl:

Neither one made any reference to a robots.txt or X-Robots-Tag headers (or meta robots), and I’d certainly say that making sure they did (as it does pertain to data security) would be an improvement.

1/

robots.txt would be the 2nd URL to fetch, which is not implemented at the moment

PLUS: if we really let ChatGPT interpret it, it would cause a lot of nonsense, too

2/ the HTTP headers would be interesting to inject as well
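
Just to sketch what that 2nd fetch and the header injection might look like (hedged, since neither is implemented; `gather_extra_context` and the use of `requests` are assumptions of mine, not part of the actual prompt):

```python
# Sketch of the two follow-ups above: fetching robots.txt as a second URL and
# capturing the HTTP response headers so they could be injected into the prompt.
# Neither is part of the current prompt; this is only an illustration.
from urllib.parse import urlparse
import requests

def gather_extra_context(url: str) -> dict:
    """Collect the page's HTTP headers and the site's robots.txt."""
    parts = urlparse(url)
    root = f"{parts.scheme}://{parts.netloc}"
    page = requests.get(url, timeout=10)
    robots = requests.get(root + "/robots.txt", timeout=10)
    return {
        "http_headers": dict(page.headers),              # e.g. X-Robots-Tag, caching, security headers
        "robots_txt": robots.text if robots.ok else "",  # empty if the site has none
    }
```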

1 Like

Nice work.
I think you could use this prompt to fix site optimization issues.

1 Like

I’m not a web design novice, so I’m curious what other “aspects” would be useful other than “security, speed, and comments”. Maybe a list or queue of common aspects of websites that would be interesting to other professions might be a cool feature to make this tool even more powerful for the average user.

1 Like

I should also say well done, this is really cool.

1 Like

@KD7JHD

Having tried the prompt, I think the best aspects to consider are:

  • Clean Code
  • Performance
  • Layout / Design
  • Responsiveness
3 Likes

Keep them coming; we will be able to suggest these to users soon

1 Like

Absolutely not. This is like having a set of bathroom scales to test your weight, and someone suggesting you could use that to diagnose illnesses, in place of MRIs, CT scans, Ultrasound, and years of study. :smiley:

@RealityMoez Thank you for the tips. I really appreciate it.

I have conducted experiments using various combinations of three variables, as well as permutations involving no variables or 1 to 2 variables. However, one of the major factors that requires attention is the issue of crawlability. This variable has a significant impact on the outcome and is strongly correlated with source code length and quality, among other factors.

In my opinion, if the source code is excessively lengthy, it may result in an incomplete analysis due to throttling or capping, compromising the quality of the output and results. In summary, the prompts I have used have proven effective within their crawlable limits. However, exceeding this limit brings diminishing returns, since the model can only work with the portion of the source it actually ingested rather than the page as a whole.

3 Likes

Yes, true.

It’s about the context length limit ChatGPT can handle…
(which is 4000 tokens for GPT-3.5, and 8000-32000 for GPT-4)

I think this prompt would work great using GPT-4, for websites with a lot of code.
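
For anyone wondering what that cap means in practice, here is a rough sketch: the fetched HTML has to fit the model’s context window, so anything past the budget is simply never seen. The ~4 characters per token ratio is an approximation, and the limits are the ones quoted above.

```python
# Rough illustration of the capping effect: only the HTML that fits the context
# window ever gets analyzed. Token counts are estimated at ~4 characters/token.
CONTEXT_TOKENS = {"gpt-3.5": 4000, "gpt-4": 8000, "gpt-4-32k": 32000}  # limits quoted in this thread

def truncate_to_budget(html: str, model: str, reserved_for_answer: int = 1000) -> str:
    """Keep only as much HTML as roughly fits the model's context window."""
    budget_tokens = CONTEXT_TOKENS[model] - reserved_for_answer
    approx_chars = budget_tokens * 4   # rough average of 4 characters per token
    return html[:approx_chars]         # anything beyond this is simply never analyzed
```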

3 Likes

I was actually wondering that sort of thing, what with all the talk of control etc. I suspect there will eventually maybe be some sort of legal signature required to be left somewhere on content, i.e. on images, irrespective of whether it is pointless or not. But I actually digress: robots.txt, that’s a legal requirement to abide by and for all public intents it works. Do you think we will have a robots nofollow code for LLMs?

1 Like

If you stood on the scales yesterday and are a stone lighter today, it probably suggests illness though, right? So it can be a ‘ballpark estimate’ in some cases :slight_smile:

To an extent, a robots.txt can indeed block a site from being part of the ‘common crawl’ data that is used in some LLMs, including GPT. Google (and thus almost certainly DeepMind by extension) obviously tend to use their own crawl data since they already have it.

However, as more and more ‘AI powered’ tools come along, you can bet that more of them will disobey or completely ignore the Robots Exclusion Protocols, just as any of the tools that snapshot Google SERPs to track ranking positions do (all search results are blocked to all crawlers).

As for the scales - what if you simply wore more clothing, heavier boots, etc the previous day? Machines, of all kinds, are easily fooled, because they only ever check what they were told to check, and any safety precautions or integrity safeguards all have to be planned out in advance. They don’t just notice when something is wrong, unless they were specifically programmed to notice that exact kind of error.
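
For reference, this is roughly what honouring the Robots Exclusion Protocol looks like in code. `ExampleLLMBot` is a hypothetical user agent of my own invention; as said above, plenty of tools simply skip this check.

```python
# Sketch of a well-behaved crawler checking robots.txt before fetching a page.
# "ExampleLLMBot" is a made-up user agent; real AI crawlers may ignore this entirely.
from urllib import robotparser
from urllib.parse import urlparse

def allowed_to_fetch(url: str, user_agent: str = "ExampleLLMBot") -> bool:
    """Return True if robots.txt permits this user agent to fetch the URL."""
    parts = urlparse(url)
    rp = robotparser.RobotFileParser()
    rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    rp.read()                          # download and parse robots.txt
    return rp.can_fetch(user_agent, url)

print(allowed_to_fetch("https://www.aiprm.com/blog/does-aiprm-use-wifi/"))
```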

2 Likes

good point… in fact sometimes they insist they are right even when you insist they are wrong lol

1 Like