- (8) Why should it matter if we do, when they're feeding the entire network to LLMs live anyway? – Kevin B (commented Jul 9 at 18:14)
- @KevinB Well, you can't just prohibit people like that. If you allow LLM usage with the data dump but state the CC BY-SA license, it might encourage LLM-savvy people to train free (as in freedom) models. In my opinion, if we provide an easier way to do this more ethically, people might be drawn to release their models under CC BY-SA because they realize there's an easier way to train models than scraping. Of course, big companies probably won't care... – John (commented Jul 10 at 2:40)
- (2) @KevinB In this case, it's that I've gone through the 'proper' process, made requests the right way, and am doing so under the same constraints as downloading them one by one. In theory, I (or some future user) would be an independent backup for the data dump, but downloading hundreds of files from hundreds of pages to do things is a bit of a bore. The fact that I've followed the processes laid out and agreed to by the company, and gotten nowhere, is a bit of an annoyance really. – Journeyman Geek (commented Jul 10 at 3:32)
- (3) I think the "quiet part out loud" (maybe not so quiet) piece of this is that increased friction was the primary goal. Improving the experience would be a change of course from why it was made that way in the first place... Make no mistake, I'm firmly with you, and I really hope it changes for the better; gating the data dump was a flagrant violation of the reasoning for the dumps in the first place. Even in the most optimistic read, where the company added friction to avoid removing them altogether... the friction is very much the point. – zcoop98 (commented Jul 10 at 21:54)
- (3) Well yes, but that's also why I kept asking from before the official announcement, and was assured by folks it was possible. It's easier for me to do things the 'wrong' way and either grab it off an unofficial upload (which is likely the best option for the backlog) or run existing tools that automate the process (which have issues due to Cloudflare) to grab the full dump. It's as much about keeping assurances made as the actual dumps. – Journeyman Geek (commented Jul 11 at 0:40)
- Mutual trust, you say. – canon (commented Jul 11 at 19:15)
- (3) Well, I see this as not trusting me to turn around and upload it somewhere where I can train an LLM. I'm not having the company live up to promises made when I'm trying to follow their processes. So yeah, mutual trust. The company has issues trusting its user base, and we have issues trusting the company to live up to its promises. – Journeyman Geek (commented Jul 11 at 23:05)