BramVanroy/CommonCrawl-CreativeCommons
739M rows • 997 downloads • 34 likes
Raw CommonCrawl crawls, annotated with Creative Commons license information
Note: Retains only samples that are also present in FineWeb or FineWeb-2.
Note: Strongly filtered. Retains only FineWeb data and removes non-commercial data and Wiki data.
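The filters named in the notes above could be approximated with a per-record check like the following. This is a minimal sketch, not the dataset author's pipeline: the field names `found_in_fw`, `license_abbr`, and `url`, and the rule of matching "wiki" in the host name, are assumptions made for illustration.

```python
from urllib.parse import urlparse


def keep_sample(sample: dict) -> bool:
    """Return True if a record passes all three filters (hypothetical schema)."""
    # Assumed field: True when the document is also present in FineWeb.
    if not sample.get("found_in_fw", False):
        return False

    # Assumed field: a Creative Commons abbreviation such as "by" or "by-nc-sa".
    # A license is treated as non-commercial when it has an "nc" component.
    license_abbr = (sample.get("license_abbr") or "").lower()
    if "nc" in license_abbr.split("-"):
        return False

    # Assumed rule: treat any host name containing "wiki" as Wiki data.
    host = (urlparse(sample.get("url", "")).hostname or "").lower()
    if "wiki" in host:
        return False

    return True


def filter_samples(samples: list[dict]) -> list[dict]:
    """Keep only the records that pass keep_sample."""
    return [s for s in samples if keep_sample(s)]
```

The same predicate could be passed to whatever filtering mechanism a data-loading library provides; it is written over plain dictionaries here so that it does not depend on any particular library.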