Monday, October 2, 2023
NEVERFOMOAGAIN

OpenAI Prepares to Unleash Web Crawler to Devour More of the Open Web

by Jose Antonio Lanz
August 8, 2023
in CryptoCurrency News


OpenAI has released a new web crawling bot, GPTBot, to expand its dataset for training its next generation of AI systems, and the next iteration apparently has an official name. The company filed a trademark application for the term “GPT-5,” hinting at an upcoming release, while giving web publishers a heads-up on how to keep their content out of its massive corpus.

The web crawler will collect publicly available data from websites while avoiding paywalled, sensitive, and prohibited content, according to OpenAI. Like the crawlers run by search engines such as Google, Bing, and Yandex, however, the system is opt-out: by default, GPTBot will assume accessible information is fair game. To prevent the OpenAI web crawler from ingesting a website, its owner must add a “disallow” rule for GPTBot to robots.txt, a standard file on the server.

OpenAI ChatGPT in Robots.txt
How to ban OpenAI’s GPTBot. Image: OpenAI
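
For site owners who want to check such a rule before deploying it, here is a minimal sketch using Python's standard-library robots.txt parser. The rule text follows the "disallow" approach described above. The helper name `is_allowed` and the example.com URL are assumptions made for illustration, not anything published by OpenAI.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt rule that blocks GPTBot from the entire site.
# Crawlers with other user agents are not affected by this rule.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration: it parses the rules from a
    string, so nothing is fetched over the network.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/some-post"  # placeholder URL
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", url))
    print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot", url))
```

Note that robots.txt is advisory: the rule only works if the crawler chooses to honor it, which OpenAI says GPTBot does.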

OpenAI also says that GPTBot will preemptively scan scraped data to remove personally identifiable information (PII) and text that violates its policies.

According to some technology ethicists, however, the opt-out approach still raises consent issues.

On Hacker News, some users justified OpenAI’s move by saying that it must gather up everything it can if people want to have a capable generative AI tool in the future. “They still need current data or their GPT models will be stuck at september 2021 forever,” one user said. Another, more privacy-conscious user argued that “OpenAI isn’t even citing in moderation. It’s making a derivative work without citing, thus obscuring it.”

The release of GPTBot follows recent criticism of OpenAI for having scraped data without permission to train large language models (LLMs) like the ones behind ChatGPT. To address such concerns, the company updated its privacy policies in April.

Meanwhile, a recent trademark application for GPT-5 seems to confirm that OpenAI is training its next model for a future launch. The new system would very likely involve large-scale web scraping to update and expand its training data.

This could represent a shift away from OpenAI’s early emphasis on transparency and AI safety, but it is not surprising considering that ChatGPT is the most used LLM in the world, despite an increasingly crowded and high-powered marketplace. OpenAI’s star product, like any LLM, is only as good as the data used to train it.

OpenAI needs more and newer data, and it needs lots of it.

On the other hand, there is an open-source LLM, assembled by social media giant Meta. The tech behemoth has offered up its model for free, as long as you are neither a competitor nor too large a business. Meta has not disclosed which datasets it used to train its model, or which information it has collected. However, the approach makes it possible for users to fine-tune the model using their own datasets.

Whereas OpenAI relies on all of its crawled data to train its models and to build a profitable ecosystem around its AI tools, Meta is vying to build a profitable business around its data. Meta not only uses that data to create better models, but also lets third parties make use of it.

“We don’t sell your information. Instead, based on the information we have, advertisers and other partners pay us to show you personalized ads,” Meta explains. According to Meta’s standard privacy disclosures, some of the data the company collects includes purchases, browser history, IDs, financial info, contacts and undisclosed sensitive information among others.

Meta Threads Privacy Information
Some of the data collected by Meta from users of its Threads app. Image: Meta

ChatGPT now draws about 1.5 billion visits a month, and Microsoft’s $10 billion investment in OpenAI appears prescient, as ChatGPT integration has boosted Bing’s capabilities.

For now, OpenAI leads the red-hot AI space, with tech giants racing to catch up. The company’s new web crawler may further advance its models’ abilities. But expanding internet data collection also raises ethical questions around copyright and consent.

As AI systems grow more sophisticated, weighing transparency, ethics, and capabilities against one another will remain a complex balancing act.

© 2021 By NEVERFOMOAGAIN - All rights reserved.
