Let ChatGPT Analyze Source Code from URL [Live Crawled]

aiprm-christophc · March 28, 2023, 12:40pm

Link to Prompt

What the Prompt does

Crawl the URL given in the Prompt (selected plans only)
Analyze the source code with respect to up to 3 aspects important to the user
Specify three important aspects as VARIABLE1, VARIABLE2, VARIABLE3
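The flow described above (crawl the URL, then analyze the source by up to three aspects) can be sketched in Python. This is a minimal sketch under stated assumptions. The template wording, the `User-Agent` string and the function names are all hypothetical, because the actual AIPRM prompt text and crawler are not shown in this thread.

```python
import urllib.request

# Hypothetical template: the actual AIPRM prompt text is not shown in this thread.
TEMPLATE = (
    "Analyze the source code of {url} with a focus on: {aspects}.\n"
    "Source code:\n{source}"
)


def fetch_source(url: str, timeout: float = 10.0) -> str:
    """Download the raw HTML of a page (the 'live crawl' step)."""
    req = urllib.request.Request(url, headers={"User-Agent": "source-review-sketch/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def build_prompt(url: str, source: str, aspects: list[str]) -> str:
    """Fill up to three aspects (VARIABLE1..VARIABLE3) into the template."""
    if not 1 <= len(aspects) <= 3:
        raise ValueError("specify between one and three aspects")
    return TEMPLATE.format(url=url, aspects=", ".join(aspects), source=source)
```

With the example input from this post, usage would be `build_prompt(url, fetch_source(url), ["security", "site speed", "code comment"])`, where `url` is the blog post address.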

Example Prompt Output GPT3.5

Input:

VARIABLE1 = security
VARIABLE2 = site speed
VARIABLE3 = code comment
PROMPT = https://www.aiprm.com/blog/does-aiprm-use-wifi/

Output:

Analyzing: “Does AIPRM use the WIFI in my house? · AIPRM”

Source URL: Does AIPRM use the WIFI in my house? · AIPRM

Upon analyzing the source code of the given URL, the following observations can be made:

Security

The website is using the HTTPS protocol, which ensures secure data transmission between client and server.

Content Security Policy (CSP) is implemented to restrict the sources from which the website allows resources to be loaded, thus reducing the risk of cross-site scripting (XSS) attacks.

A script-src CSP header is set which restricts the loading of scripts to only those hosted on the website’s own domain, ensuring that scripts from external sources are not executed on the website.

Site speed

The website appears to be using a responsive design, as the viewport meta tag is used to adjust the layout based on the device’s screen size.

The website is using a minified CSS file, which can improve page loading speed by reducing the file size.

The website is using a minified JavaScript file, which can also improve page loading speed by reducing the file size.

Code comments

The source code has a few comments which provide some information about the purpose of certain sections of code. For example:

: This comment explains the purpose of a link at the top of the page.

: This comment explains the purpose of some meta tags related to social media sharing.

: This comment explains the purpose of a meta tag used to verify ownership of the website in Google Search Console.
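Claims like the CSP ones in this output are easy to check instead of taking them on trust. Note that `script-src` is a directive inside the `Content-Security-Policy` header (or an equivalent `<meta http-equiv>` tag), not a header of its own. A minimal sketch, assuming you already have the policy string; the function names are hypothetical and the check is deliberately simplified.

```python
def parse_csp(policy: str) -> dict[str, list[str]]:
    """Split a Content-Security-Policy value into {directive: [source expressions]}."""
    directives: dict[str, list[str]] = {}
    for part in policy.split(";"):
        tokens = part.split()
        if not tokens:
            continue
        # Directive names are case-insensitive; if a directive repeats, the first one wins.
        directives.setdefault(tokens[0].lower(), tokens[1:])
    return directives


def scripts_limited_to_self(policy: str) -> bool:
    """True if scripts may load only from the page's own origin.

    Falls back to default-src when script-src is absent, as browsers do.
    Simplification: policies using 'none', nonces or hashes return False,
    and script-src-elem is ignored.
    """
    directives = parse_csp(policy)
    sources = directives.get("script-src", directives.get("default-src"))
    if sources is None:
        return False  # no directive restricts scripts at all
    return bool(sources) and all(s.lower() == "'self'" for s in sources)
```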

Example Prompt Output GPT4

Input:

VARIABLE1 = security
VARIABLE2 = site speed
VARIABLE3 = code comment
PROMPT = https://www.aiprm.com/blog/does-aiprm-use-wifi/

Output:

Analyzing: “Does AIPRM use the WIFI in my house? · AIPRM”

Security

HTTPS usage: The website is using HTTPS, which means the connection between the user and the website is encrypted and secure.

External scripts: The site uses several external scripts, such as:

https://productive-legendary.aiprm.com/script.js

https://cdn.dcodes.net/2/payment_icons/dc_payment_icons.css

Google Chrome Webstore, LinkedIn, YouTube, and Twitter

Ensure that these external resources are trustworthy and secure to prevent potential security issues.

Site Speed

Minified resources: The website is using minified CSS and JavaScript files to reduce file size and improve loading time.

Image optimization: The site uses WebP format for images, which is a modern image format providing superior compression and quality compared to traditional formats like JPEG and PNG.

Code Comments

No code comments were found in the provided source code. Including comments in the source code can be helpful for other developers to understand the purpose and functionality of specific parts of the code.

Additional Observations

Meta tags: The site uses various meta tags for SEO purposes and proper display on social media platforms, such as Open Graph and Twitter Card tags.

Schema.org structured data: The site uses JSON-LD for structured data, providing information about the organization, website, and breadcrumbs for search engines.

Responsive design: The site uses the “viewport” meta tag and CSS media queries to adjust the layout for different screen sizes and devices.

Skip-to-content link: A skip-to-content link is provided, which helps improve accessibility for keyboard-only users and screen reader users.
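These "Additional Observations" can also be checked against the raw HTML. A sketch using the standard-library `html.parser` that records a viewport tag, Open Graph and Twitter Card tags, and JSON-LD blocks. The class and function names are hypothetical, and it only sees tags present in the served HTML, not ones injected later by JavaScript.

```python
from html.parser import HTMLParser


class HeadAudit(HTMLParser):
    """Records viewport, Open Graph, Twitter Card and JSON-LD markers."""

    def __init__(self) -> None:
        super().__init__()
        self.has_viewport = False
        self.og_tags: list[str] = []
        self.twitter_tags: list[str] = []
        self.json_ld_blocks = 0

    def handle_starttag(self, tag, attrs):
        # Boolean attributes arrive with value None; normalize to "".
        a = {k: (v or "") for k, v in attrs}
        if tag == "meta":
            name = a.get("name", "").lower()
            prop = a.get("property", "").lower()
            if name == "viewport":
                self.has_viewport = True
            if prop.startswith("og:"):
                self.og_tags.append(prop)
            # Twitter Card tags appear with either name= or property=.
            if name.startswith("twitter:") or prop.startswith("twitter:"):
                self.twitter_tags.append(name or prop)
        elif tag == "script" and a.get("type", "").lower() == "application/ld+json":
            self.json_ld_blocks += 1


def audit_head(html: str) -> HeadAudit:
    parser = HeadAudit()
    parser.feed(html)
    parser.close()
    return parser
```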

Screenshots are only useful for emoji, images, or Mermaid charts rendered by ChatGPT;
in all other cases, text is better. Why? Because it’s indexable.

How I came up with it

If we’re crawling URLs, we may as well analyze their source code.

More insights

Judging from the first tests, this will most likely always disappoint with GPT-3/3.5.

Ammon · March 28, 2023, 12:47pm

Fascinating that GPT4 didn’t pick up the code comments that 3.5 did.
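Whether the page contains comments at all can be settled outside the model. A minimal sketch using Python's standard-library `html.parser`; the class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collects the text of every <!-- ... --> comment in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.comments: list[str] = []

    def handle_comment(self, data: str) -> None:
        # data is the text between "<!--" and "-->".
        self.comments.append(data.strip())


def extract_comments(html: str) -> list[str]:
    parser = CommentCollector()
    parser.feed(html)
    parser.close()
    return parser.comments
```

Comparing this list with what each model reports shows which one actually read the comments.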

aiprm-christophc · March 28, 2023, 12:49pm

Yes, exactly… it’s all in there.

But you know, it’s source code… not very nice to read

Ammon · March 28, 2023, 12:54pm

I’m so used to looking at source code that it doesn’t bother me.

Neither one made any reference to robots.txt or X-Robots-Tag headers (or meta robots), and I’d certainly say that making sure they did (as it does pertain to data security) would be an improvement.
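Two of these signals arrive with a single fetch: the `X-Robots-Tag` response header and `<meta name="robots">` in the HTML (robots.txt needs a separate request). A sketch that merges them into one set of directives. It assumes headers arrive as a plain dict, so repeated headers and user-agent-specific values such as `googlebot: noindex` are not handled; the names are hypothetical.

```python
from html.parser import HTMLParser


class MetaRobots(HTMLParser):
    """Collects the content of <meta name="robots"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.values: list[str] = []

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.values.append(a.get("content", ""))


def _split(value: str) -> set[str]:
    """'noindex, nofollow' -> {'noindex', 'nofollow'}"""
    return {t.strip().lower() for t in value.split(",") if t.strip()}


def robots_directives(headers: dict[str, str], html: str) -> set[str]:
    found: set[str] = set()
    for key, value in headers.items():
        if key.lower() == "x-robots-tag":
            found |= _split(value)
    parser = MetaRobots()
    parser.feed(html)
    parser.close()
    for value in parser.values:
        found |= _split(value)
    return found
```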

aiprm-christophc · March 28, 2023, 1:14pm

1/ robots.txt would be a second URL to fetch, which is not implemented at the moment.

PLUS: if we really let ChatGPT interpret it, it would produce a lot of nonsense, too.

2/ The HTTP headers would be interesting to inject as well.
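Injecting headers could be as simple as appending a filtered text block to the prompt. A sketch, assuming headers come as `(name, value)` pairs, which is the shape `http.client.HTTPResponse.getheaders()` returns. The allow-list of header names and the function name are illustrative choices, not part of the prompt.

```python
# Illustrative allow-list: headers most relevant to security, crawlability and speed.
RELEVANT_HEADERS = {
    "content-security-policy",
    "strict-transport-security",
    "x-content-type-options",
    "x-frame-options",
    "x-robots-tag",
    "cache-control",
    "content-encoding",
}


def headers_block(headers: list[tuple[str, str]]) -> str:
    """Format the relevant response headers as plain text to append to the prompt."""
    lines = [f"{name}: {value}" for name, value in headers
             if name.lower() in RELEVANT_HEADERS]
    if not lines:
        return "HTTP response headers: (none relevant)"
    return "HTTP response headers:\n" + "\n".join(lines)
```

Filtering keeps the prompt short, which matters given the context limits discussed later in the thread.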

RealityMoez · March 28, 2023, 2:29pm

Nice work.
I think you could use this prompt to help fix site optimization issues.

KD7JHD · March 28, 2023, 4:52pm

I’m a web design novice, so I’m curious what other “aspects” would be useful besides “security, speed, and comments”. A list or queue of common website aspects that would interest other professions might be a cool feature to make this tool even more powerful for the average user.

KD7JHD · March 28, 2023, 4:52pm

I should also say well done, this is really cool.

RealityMoez · March 28, 2023, 4:58pm

@KD7JHD

Having tried the prompt, I think the best aspects to consider are:

Clean Code
Performance
Layout / Design
Responsiveness

aiprm-christophc · March 28, 2023, 5:21pm

Keep them coming; we will be able to suggest these to users soon.

Ammon · March 28, 2023, 10:13pm

Absolutely not. This is like having a set of bathroom scales to test your weight, and someone suggesting you could use that to diagnose illnesses, in place of MRIs, CT scans, Ultrasound, and years of study.

KD7JHD · March 28, 2023, 10:41pm

@RealityMoez Thank you for the tips. I really appreciate it.

Chandler_Cates · March 31, 2023, 4:17pm

I have conducted experiments using various combinations of three variables, as well as permutations involving no variables or 1 to 2 variables. However, one of the major factors that requires attention is the issue of crawlability. This variable has a significant impact on the outcome and is strongly correlated with source code length and quality, among other factors.

In my opinion, if the source code is excessively long, the analysis may be incomplete due to throttling or capping, which compromises the quality of the output. In summary, the prompts I have used have proven effective within their crawlable limits. Beyond that limit, however, returns diminish, because the model can only analyze the portion of the code it actually receives, not the page as a whole.

RealityMoez · April 25, 2023, 3:57pm

Yes, true.

It’s about the context length limit ChatGPT can handle…
(which is 4,096 tokens for GPT-3.5, and 8,192 or 32,768 for GPT-4, depending on the variant)

I think this prompt would work great with GPT-4 for websites with large source code.
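One rough way to stay within those limits is to estimate tokens and truncate the source before it goes into the prompt. This sketch assumes the common rule of thumb of about 4 characters per token for English text; HTML and minified code usually tokenize less efficiently, so treat the estimates as optimistic. The 1,000-token reserve for instructions and the answer is an arbitrary assumption, and the function names are hypothetical.

```python
# Context window sizes of the API models in spring 2023.
CONTEXT_TOKENS = {"gpt-3.5": 4096, "gpt-4": 8192, "gpt-4-32k": 32768}
CHARS_PER_TOKEN = 4  # rough rule of thumb for English prose; HTML needs more tokens


def estimate_tokens(text: str) -> int:
    """Approximate token count using the characters-per-token heuristic."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fit_source(source: str, model: str, reserved_tokens: int = 1000) -> tuple[str, bool]:
    """Truncate source so that it plus `reserved_tokens` fits the model's context.

    Returns the (possibly shortened) source and whether truncation happened.
    """
    budget = CONTEXT_TOKENS[model] - reserved_tokens
    if budget <= 0:
        raise ValueError("reserved_tokens leaves no room for the source")
    if estimate_tokens(source) <= budget:
        return source, False
    return source[: budget * CHARS_PER_TOKEN], True
```

For exact counts rather than an estimate, OpenAI's `tiktoken` library can tokenize the text with the model's actual encoding.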

Paz_LeBon · June 2, 2023, 9:48pm

I was actually wondering about that sort of thing, what with all the talk of control, etc. I suspect there will eventually be some sort of legal signature required to be left somewhere on content, e.g. on images, irrespective of whether it’s pointless or not. But I digress. Robots.txt is a legal requirement to abide by, and for all intents and purposes it works. Do you think we will have a robots nofollow code for LLMs?

Paz_LeBon · June 2, 2023, 9:50pm

If you stood on the scales yesterday and are a stone lighter today, that probably suggests illness, right? So it can be a ‘ballpark estimate’ in some cases.

Ammon · June 2, 2023, 10:52pm

To an extent, robots.txt can indeed keep a site out of the Common Crawl data that is used to train some LLMs, including GPT. Google (and thus almost certainly DeepMind by extension) obviously tend to use their own crawl data since they already have it.

However, as more and more ‘AI powered’ tools come along, you can bet that more of them will disobey or completely ignore the Robots Exclusion Protocol, just as the tools that snapshot Google SERPs to track ranking positions do (all search results are blocked to all crawlers).
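Common Crawl’s crawler identifies itself as CCBot, so opting out of it takes one robots.txt rule, which only helps with crawlers that choose to obey it. The sketch below uses Python’s standard `urllib.robotparser` to check such a file offline; the robots.txt content and the `allowed` helper are made-up examples.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt: block Common Crawl's CCBot, allow everyone else.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Would a crawler that honors robots.txt fetch this URL?"""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return rp.can_fetch(user_agent, url)
```

Here `allowed(ROBOTS_TXT, "CCBot/2.0", url)` is false for any URL on the site, while other agents are allowed, but nothing in the protocol forces a crawler to run this check.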

As for the scales - what if you simply wore more clothing, heavier boots, etc the previous day? Machines, of all kinds, are easily fooled, because they only ever check what they were told to check, and any safety precautions or integrity safeguards all have to be planned out in advance. They don’t just notice when something is wrong, unless they were specifically programmed to notice that exact kind of error.

Paz_LeBon · June 3, 2023, 1:24am

Good point… in fact, sometimes they insist they are right even when you insist they are wrong lol