Screenshot is only useful if it is about Emoji, Images or Mermaid charts rendered by ChatGPT,
in all other cases the text is better. Why? Because itâs indexable.
How I came up with it
If weâre crawling URLs, we can as well analyze their source code.
More insights
This will most likely always disappoint with GPT-3/3.5 from the first tests.
Iâm so used to looking at source code that it doesnât bother me.
Neither one made any reference to a robots.txt or xrobots headers (or meta robots), and Iâd certainly say that making sure they did (as it does pertain to data security) would be an improvement.
Iâm not web design novice, so Iâm curious what other âaspectsâ would be useful other than âsecurity, speed, and commentsâ. Maybe a list or queue for common aspects of websites that would be interesting to other professions might be a cool feature to make this tool even more powerful for the average user.
Absolutely not. This is like having a set of bathroom scales to test your weight, and someone suggesting you could use that to diagnose illnesses, in place of MRIs, CT scans, Ultrasound, and years of study.
I have conducted experiments using various combinations of three variables, as well as permutations involving no variables or 1 to 2 variables. However, one of the major factors that requires attention is the issue of crawlability. This variable has a significant impact on the outcome and is strongly correlated with source code length and quality, among other factors.
In my opinion, if the source code is excessively lengthy, it may result in an incomplete analysis due to throttling or capping, leading to a compromise in the quality of output and results. In summary, the prompts I have used have proven to be effective within their crawlable limits. However, exceeding this limit results in a margin of diminishing returns as the program is unable to access anything beyond the sum of its parts, rather than the sum total.
I was actually wondering that sort of thing, what with all the talk of control etc. I suspect there will eventually maybe be some sort of legal signatue required to be left somewhere on content, i.e on images? irrespecive of whther is pontless or not. BUt i actually digress, Robots.txt, thats a legal requirement o abide by and for all public intents it works, do you think we will have a robots no follow code for lllmâs?
If you stood on the scales yesterday and are stone ligher today, it probably suggest illness tho right? so it can be a âballpark estimateâ in some cases
To an extent, a robots.txt can indeed block being a part of the âcommon crawlâ data that is used in some LLMs, including GPT. Google (and thus almost certainly DeepMind by extension) obviously tend to use their own crawl data since they already have it.
However, as more and more âAI poweredâ tools come along, you can bet that more of them will disobey or completely ignore the Robots Exclusion Protocols, just as any of the tools that snapshot Google SERPs to track ranking positions do (all search results are blocked to all crawlers).
As for the scales - what if you simply wore more clothing, heavier boots, etc the previous day? Machines, of all kinds, are easily fooled, because they only ever check what they were told to check, and any safety precautions or integrity safeguards all have to be planned out in advance. They donât just notice when something is wrong, unless they were specifically programmed to notice that exact kind of error.