all 12 comments

[–]wristaction 3 insightful - 1 fun - (0 children)

There was a site called "NewsDiffs" that was shut down almost as fast as it went up.

[–]Site_rly_sux 1 insightful - 1 fun - (10 children)

It's not 'tried to'. The IA bot is still listed right there:

https://www.nytimes.com/robots.txt

So why does the NYT not want the IA crawler?

Let's see what The Intercept thinks.

They think it's so NYT can do stealth edits without the one particular archive noticing. For evidence, they talk about some stealth edits that fucking everyone noticed.

  1. They edited the tone of an article about Bernie

  2. They removed "death" as one way to get rid of a loan, because sometimes death doesn't discharge the loan

Really?

You fucking pathetic baby snowflakes, you think a major publication is making major changes so that one bot service won't notice changes to "six ways to shed your student debts"?

What a pathetic infantile way of looking at the world, exhibited by OP and the Intercept.

Look again at the robots.txt

They also ban the ChatGPT bot. And the bot for some crawler called "Omgili".
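For anyone who wants to check a robots.txt for themselves, here is a minimal sketch using Python's standard `urllib.robotparser`. The sample file below is made up for illustration and is not a copy of the NYT file; the tokens `ia_archiver`, `GPTBot`, and `omgilibot` are assumed to be the user-agent names those crawlers match on, and `SomeBrowser` and the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, NOT the real nytimes.com file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search

User-agent: GPTBot
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: omgilibot
Disallow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ia_archiver", "GPTBot", "omgilibot", "SomeBrowser"):
        ok = allowed(SAMPLE_ROBOTS_TXT, agent, "https://example.com/article")
        print(agent, "allowed" if ok else "blocked")
```

To test a live site instead, `RobotFileParser` also has `set_url()` and `read()`, which download the file before you call `can_fetch()`.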

It's totally infeasible, and totally pathetically paranoid, that OP assumes this is about a cover up, instead of normal web crawler reasons.

Hey, maybe they just don't want to render web pages for non-human visitors. That's up to them. For you to assume it's a conspiracy to hide the six ways to lose your student debt is such pathetic paranoid conspiracy bullshit.

OP, there's something seriously wrong with you if you read and believed the fake news linked here.

[–]SueBoyle 2 insightful - 1 fun - (1 child)

I have written web crawler software before, though only at small scale for my personal use. A web crawler does not need to respect the instructions in the robots.txt file.

So basically whoever's running this crawler could just crawl the New York Times website whether the New York Times likes it or not.

There is no police force out there that's going to arrest you for not obeying the instructions in the robots file.
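To make that concrete, here is a rough sketch of where the robots.txt check lives in a crawler: it is a line of code the crawler author chooses to include, and the HTTP request works the same without it. Everything here is hypothetical: `make_fetcher`, the `respect_robots` flag, the `MyCrawler` user agent, and the stand-in `fake_download` function (used so the example runs without touching the network).

```python
from typing import Callable, Optional
from urllib.robotparser import RobotFileParser


def make_fetcher(
    robots_txt: str,
    user_agent: str,
    download: Callable[[str], str],
    respect_robots: bool = True,
) -> Callable[[str], Optional[str]]:
    """Build a fetch(url) function that optionally honors robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    def fetch(url: str) -> Optional[str]:
        # Compliance is purely voluntary: this check is the only thing
        # standing between the crawler and the page.
        if respect_robots and not parser.can_fetch(user_agent, url):
            return None
        return download(url)

    return fetch


# Hypothetical rules and a fake downloader for demonstration only.
RULES = "User-agent: MyCrawler\nDisallow: /\n"


def fake_download(url: str) -> str:
    return "<html>contents of " + url + "</html>"


polite = make_fetcher(RULES, "MyCrawler", fake_download, respect_robots=True)
rude = make_fetcher(RULES, "MyCrawler", fake_download, respect_robots=False)
```

With these made-up rules, `polite("https://example.com/page")` returns `None` while `rude(...)` returns the page body. A site can still block a crawler by other means, such as filtering on user agent or IP address, but that is separate from robots.txt.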

In fact there's a twist on this: if you analyze the robots.txt file, it might give you clues about where to find sensitive documents that you really want to crawl.
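A small sketch of that kind of analysis: pull out every `Disallow` path, grouped by user agent, so you can see which parts of a site the owner would rather crawlers skip. The function name `disallowed_paths` and the sample rules are invented for illustration, and the parsing is deliberately simplified (no wildcards, no `Sitemap` or `Crawl-delay` handling).

```python
from typing import Dict, List


def disallowed_paths(robots_txt: str) -> Dict[str, List[str]]:
    """Map each user agent to the paths its group disallows."""
    result: Dict[str, List[str]] = {}
    agents: List[str] = []
    seen_rule = False  # True once the current group has an Allow/Disallow line

    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # a User-agent line after rules starts a new group
                agents, seen_rule = [], False
            agents.append(value)
        elif field == "disallow":
            seen_rule = True
            if value:  # an empty Disallow means "nothing is disallowed"
                for agent in agents:
                    result.setdefault(agent, []).append(value)
        elif field == "allow":
            seen_rule = True
    return result


# Hypothetical file contents, not taken from any real site.
SAMPLE = """\
User-agent: *
Disallow: /admin/
Disallow: /drafts/   # hypothetical path

User-agent: GPTBot
User-agent: omgilibot
Disallow: /
"""

print(disallowed_paths(SAMPLE))
```

A listed path only tells you the owner doesn't want it crawled. It doesn't prove anything sensitive is there.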

[–]neolib 1 insightful - 1 fun - (0 children)

Yeah, archive.today (archive.is), for example, doesn't respect robots.txt, unlike archive.org.

[–]JoeyJoeJoe 5 insightful - 3 fun - (7 children)

Was about to engage and rebut, but then I saw the username. Ohhh, it's that guy again.

[–]Site_rly_sux 1 insightful - 1 fun - (6 children)

Go ahead.

Explain it.

It's a giant conspiracy theory and the proof is: six ways to lose your student loan and the "changing tone" on Bernie.

Fuck sakes man. This is puerile.

Explain it, go on.

[–]brimshae 4 insightful - 3 fun - (5 children)

Explain it

The alternative is trusting the New York Times to be honest.

I rest my case.

[–]Site_rly_sux 1 insightful - 1 fun - (4 children)

Trusting them with what exactly?

The only statement they've made, that we've seen, is the robots.txt link above. That's a pretty clear statement. Would you like to flesh out how exactly they might be lying within their robots.txt?

Are you saying that somehow OP and the Intercept are more trustworthy than a technical document? What?

[–]brimshae 1 insightful - 1 fun - (3 children)

Trusting them with what exactly?

"trusting the New York Times to be honest."

[–]Site_rly_sux 1 insightful - 1 fun - (2 children)

They literally haven't made any statement on this topic that we have seen, other than robots.txt.

So you sound like a moron.

Do you believe that their robots.txt is probably a true statement? Are you going to tell me that you have low trust in a machine instruction? You sound stupid.

[–]brimshae 1 insightful - 1 fun - (1 child)

Someone is angry because he can't understand basic words. It's probably the guy screaming "moron", "stupid", and the like who is still trying to figure out a basic dig at the NYT and their general lack of credibility.

[–]Site_rly_sux 1 insightful - 1 fun - (0 children)

There's no matter here on which their credibility is up for debate. They said that certain user agents may not crawl certain areas of their site.

You replied that you didn't believe them.

It IS moronic. Robots dot txt is a machine instruction. There is no sense in which credibility enters the argument. To claim that you don't believe their robots dot txt makes you sound like a retard.