Web Content Extracting

Libraries for extracting web content.

newspaper 14.3K

News extraction, article extraction and content curation in Python.

requests-html 13.8K

Pythonic HTML Parsing for Humans.

textract 4K

Extract text from any document: Word, PowerPoint, PDFs, etc.

sumy 3.5K

A module for automatic summarization of text documents and HTML pages.
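
To make "automatic summarization" concrete, here is a minimal standard-library sketch of frequency-based extractive summarization. It is not sumy's API or algorithm; the function name `summarize` and the word-length cutoff are assumptions chosen for illustration.

```python
import re
from collections import Counter


def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, kept in their original order.

    Hypothetical helper for illustration only; not part of sumy.
    """
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Count words longer than three letters as a crude stop-word filter.
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if len(w) > 3)

    def score(sentence):
        # A sentence scores the summed document frequency of its words.
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]),
                    reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]


text = "Cats sleep a lot. Cats chase mice and cats climb trees. Dogs bark."
print(summarize(text, max_sentences=1))
```

This scoring favours long sentences, which is one reason real summarizers normalize scores or use other ranking methods.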

toapi 3.5K

Every web site provides APIs.

python-readability 2.7K

Fast Python port of arc90's readability tool.

html2text 1.9K

Convert HTML to Markdown-formatted text.
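
As a rough illustration of what HTML-to-Markdown conversion involves, the sketch below uses only the standard library's `html.parser`. It is not html2text's implementation or API; the class `MarkdownSketch` and function `to_markdown` are hypothetical names, and only headings, paragraphs, bold, italics and links are handled.

```python
from html.parser import HTMLParser

HEADINGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class MarkdownSketch(HTMLParser):
    """Toy converter (hypothetical, for illustration; not html2text)."""

    def __init__(self):
        super().__init__()
        self.parts = []   # output fragments, joined at the end
        self.href = None  # target of the link currently being read

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            # <h2> becomes "## ", and so on.
            self.parts.append("#" * int(tag[1]) + " ")
        elif tag in ("b", "strong"):
            self.parts.append("**")
        elif tag in ("i", "em"):
            self.parts.append("_")
        elif tag == "a":
            self.href = dict(attrs).get("href") or ""
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in HEADINGS or tag == "p":
            # Block elements end with a blank line.
            self.parts.append("\n\n")
        elif tag in ("b", "strong"):
            self.parts.append("**")
        elif tag in ("i", "em"):
            self.parts.append("_")
        elif tag == "a":
            self.parts.append("](%s)" % (self.href or ""))
            self.href = None

    def handle_data(self, data):
        self.parts.append(data)


def to_markdown(html):
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


html = '<h1>Title</h1><p>Some <b>bold</b> and <a href="https://example.com">a link</a>.</p>'
print(to_markdown(html))
```

A production converter also has to deal with lists, nested markup, whitespace normalization and escaping, which this sketch leaves out.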

micawber 640

A small library for extracting rich content from URLs.

lassie 616

Web Content Retrieval for Humans.