DeepSeek-affiliated Hangzhou DeepSeek AI Fundamental Technology Research Co., Ltd. today filed a patent for a new web data collection system designed to improve efficiency and data quality. The patent outlines a method for discovering more webpage links while minimizing the traffic load placed on websites. It assesses downloaded content to predict the quality of links that have not yet been discovered, prioritizing high-value data and reducing redundant downloads. Efficient web data collection is crucial for training large language models (LLMs), which power AI systems like ChatGPT. Existing techniques struggle with incomplete link retrieval, excessive downloads that can crash websites, and poor filtering of low-quality data. DeepSeek’s proposed system aims to solve these issues by optimizing data allocation and maintaining metadata accuracy. [iThome, in Chinese]
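The prioritization idea described above can be illustrated with a small sketch. This is not the method in the patent filing, whose details are not given here. It is a generic priority-queue crawl frontier in Python, and the names `Frontier` and `predict_link_quality`, along with the scoring heuristic, are hypothetical and chosen only for illustration.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Candidate:
    priority: float                   # negated score, so heapq pops the best link first
    url: str = field(compare=False)   # excluded from ordering


class Frontier:
    """Queue of links to fetch, ordered by predicted quality (illustrative only)."""

    def __init__(self):
        self._heap = []
        self._seen = set()

    def add(self, url, predicted_quality):
        # Skip URLs already queued or fetched to avoid redundant downloads.
        if url in self._seen:
            return False
        self._seen.add(url)
        heapq.heappush(self._heap, Candidate(-predicted_quality, url))
        return True

    def pop(self):
        # Return the URL with the highest predicted quality.
        return heapq.heappop(self._heap).url

    def __len__(self):
        return len(self._heap)


def predict_link_quality(parent_quality, anchor_text):
    # Hypothetical heuristic: a link inherits the score of the page it was
    # found on, with a small bonus for descriptive anchor text.
    bonus = 0.1 if len(anchor_text.split()) >= 3 else 0.0
    return min(1.0, parent_quality + bonus)
```

In this sketch, a crawler would score each downloaded page, call `predict_link_quality` for every link on it, and fetch whatever `Frontier.pop` returns next, so links found on high-scoring pages are downloaded first and duplicates are never requested twice.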