Teach a Man to Fish: Downloading Python Books with Multithreaded Python!

2019-10-04 12:02 Category: Programming

Recently I came across a remarkable website, http://www.allitebooks.com/, which offers a large number of free programming e-books, a real boon for technology enthusiasts. Its page looks like this:

NumPy has a built-in np.squeeze(), and TensorFlow has a built-in tf.squeeze().
The inverse operation, which adds a size-1 dimension, is tf.expand_dims(input, axis=None, name=None, dim=None).
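Both operations are easy to verify. Since tf.squeeze and tf.expand_dims mirror their NumPy counterparts, a minimal NumPy sketch suffices:

```python
import numpy as np

# A tensor of shape (1, 2, 1, 3, 1, 1)
t = np.zeros((1, 2, 1, 3, 1, 1))

# squeeze removes all size-1 dimensions by default
print(np.squeeze(t).shape)              # (2, 3)

# expand_dims is the inverse: it inserts a new size-1 dimension at `axis`
v = np.zeros((2, 3))
print(np.expand_dims(v, axis=0).shape)  # (1, 2, 3)
```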


tf.squeeze(input, axis=None, name=None, squeeze_dims=None)

Removes dimensions of size 1 from the shape of a tensor.

Given a tensor input, this operation returns a tensor of the same type with all dimensions of size 1 removed. If you don't want to remove all size 1 dimensions, you can remove specific size 1 dimensions by specifying axis.

For example:

```prettyprint
# 't' is a tensor of shape [1, 2, 1, 3, 1, 1]
shape(squeeze(t)) ==> [2, 3]
```

Or, to remove specific size 1 dimensions:

```prettyprint
# 't' is a tensor of shape [1, 2, 1, 3, 1, 1]
shape(squeeze(t, [2, 4])) ==> [1, 2, 3, 1]
```

Args:
  • input: A Tensor. The input to squeeze.
  • axis: An optional list of ints. Defaults to []. If specified, only squeezes the dimensions listed. The dimension index starts at 0. It is an error to squeeze a dimension that is not 1.
  • name: A name for the operation (optional).
  • squeeze_dims: Deprecated keyword argument that is now axis.

Returns:

A Tensor. Has the same type as input. Contains the same data as input, but has one or more dimensions of size 1 removed.

Raises:
  • ValueError: When both squeeze_dims and axis are specified.
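The axis argument and the size-not-1 error described above can likewise be demonstrated with NumPy's np.squeeze, which follows the same rules:

```python
import numpy as np

t = np.zeros((1, 2, 1, 3, 1, 1))

# Squeeze only axes 2 and 4: shape [1, 2, 1, 3, 1, 1] -> [1, 2, 3, 1]
print(np.squeeze(t, axis=(2, 4)).shape)  # (1, 2, 3, 1)

# Squeezing a dimension whose size is not 1 is an error, as documented
try:
    np.squeeze(t, axis=(1,))             # axis 1 has size 2
except ValueError:
    print('cannot squeeze an axis of size != 1')
```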

So, can we write a Python crawler to download these e-books for us automatically? The answer is yes. In my spare time I wrote a crawler that mainly uses urllib.request.urlretrieve() and multithreading to download them. My idea was to first store the download links of the e-books in a local txt file, so they can be reused permanently. The Python code (Ebooks_spider.py) is below; as a demonstration it only collects the 10 e-books on the first page:

```python
# -*- coding: utf-8 -*-
# Crawler for the e-books on http://www.allitebooks.com/,
# a site that offers free programming e-books.
# The download links are written to a txt file for permanent reuse.

import urllib.request
from bs4 import BeautifulSoup

# Fetch the HTML source of a page
def get_content(url):
    html = urllib.request.urlopen(url)
    content = html.read().decode('utf-8')
    html.close()
    return content

# Store the URLs of all 762 listing pages
base_url = 'http://www.allitebooks.com/'
urls = [base_url]
for i in range(2, 763):
    urls.append(base_url + 'page/%d/' % i)

# Each element stores one book's download URL
book_list = []

# Limit the number of pages crawled so the downloads do not fill up the disk!
# This demo only crawls the first page; change the number in urls[:1]
# (maximum 762) to crawl the pages you want.
for url in urls[:1]:
    try:
        # Collect the book links on this page
        content = get_content(url)
        soup = BeautifulSoup(content, 'lxml')
        book_links = soup.find_all('div', class_='entry-thumbnail hover-thumb')
        book_links = [item.a['href'] for item in book_links]
        print('\nGet page %d successfully!' % (urls.index(url) + 1))
    except Exception:
        book_links = []
        print('\nGet page %d failed!' % (urls.index(url) + 1))

    # If this page's book links were fetched successfully
    if len(book_links):
        for book_link in book_links:
            # Visit each book page and grab its download URL
            try:
                content = get_content(book_link)
                soup = BeautifulSoup(content, 'lxml')
                link = soup.find('span', class_='download-links')
                book_url = link.a['href']
                # If the book's download link was fetched successfully
                if book_url:
                    book_name = book_url.split('/')[-1]
                    print('Getting book: %s' % book_name)
                    book_list.append(book_url)
            except Exception:
                print('Get page %d Book %d failed' % (urls.index(url) + 1,
                                                     book_links.index(book_link)))

# Write the links to a txt file for permanent reuse
directory = 'E:\\Ebooks\\'
with open(directory + 'book.txt', 'w') as f:
    for item in book_list:
        f.write(item + '\n')
print('Finished writing the txt file!')
```

As you can see, the code only crawls static pages, so it runs very fast! Running the program produces the following output:

[Figure 2: spider output]

The download links of these 10 e-books are now stored in book.txt, as follows:

[Figure 3: contents of book.txt]

Next we read these download links back and fetch the e-books with urllib.request.urlretrieve() and multithreading. The Python code (download_ebook.py) is as follows:

```python
# -*- coding: utf-8 -*-
# Read the e-book links already written to the txt file
# and download the books with multiple threads.

import time
from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED
import urllib.request

# Download one PDF with urllib.request.urlretrieve()
def download(url):
    book_name = 'E:\\Ebooks\\' + url.split('/')[-1]
    print('Downloading book: %s' % book_name)
    urllib.request.urlretrieve(url, book_name)
    print('Finish downloading book: %s' % book_name)

def main():
    start_time = time.time()            # start time
    file_path = 'E:\\Ebooks\\book.txt'  # path of the txt file
    # Read the txt file, i.e. the e-book links
    with open(file_path, 'r') as f:
        urls = f.readlines()
    urls = [_.strip() for _ in urls]
    # Download the e-books with Python threads and wait for all of them
    # to finish before moving on. (The pool size was lost in the source;
    # 10 workers, one per book, is assumed here.)
    executor = ThreadPoolExecutor(max_workers=10)
    future_tasks = [executor.submit(download, url) for url in urls]
    wait(future_tasks, return_when=ALL_COMPLETED)
    # Report the total time taken
    end_time = time.time()
    print('Total cost time: %s' % (end_time - start_time))

main()
```

Running the above code gives the following result:

[Figure 4: download progress output]
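The thread-pool pattern used in download_ebook.py (submit every job, then wait for ALL_COMPLETED) can be exercised without touching the network by swapping the real download for a stub; fake_download and its sleep time below are stand-ins, not part of the original script:

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED

# Stand-in for download(): sleeps briefly instead of fetching a PDF
def fake_download(url):
    time.sleep(0.1)
    return url.split('/')[-1]

urls = ['http://example.com/book%d.pdf' % i for i in range(10)]

start = time.time()
executor = ThreadPoolExecutor(max_workers=10)
future_tasks = [executor.submit(fake_download, url) for url in urls]
wait(future_tasks, return_when=ALL_COMPLETED)
elapsed = time.time() - start

print([f.result() for f in future_tasks][0])  # book0.pdf
# With 10 workers the ten 0.1 s jobs overlap, so elapsed stays well under
# the ~1 s a serial loop would need
print(elapsed < 1.0)                          # True
```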

Checking the folder again:

[Figure 5: downloaded files in the folder]

As you can see, all 10 books were downloaded successfully, taking 327 seconds in total, an average of 32.7 seconds (about half a minute) per book, and the books add up to 87.7 MB, so the efficiency is quite good! Seeing how many interesting things a crawler can do, aren't you tempted to try it yourself? Action beats intention, as the saying goes. The code for this article has been uploaded to GitHub.
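The timing figures above are easy to check in a couple of lines:

```python
total_time = 327  # seconds, as reported
books = 10
size_mb = 87.7

print('%.1f s per book' % (total_time / books))      # 32.7 s per book
print('%.2f MB/s overall' % (size_mb / total_time))  # 0.27 MB/s overall
```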
