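Question about parallelizing Python

Author: cldnjs2085 / Posted: Thu, 2017/11/09 - 5:17 PM

I'm trying to download images with Python. The download_album_arts() function below, which downloads the images, seemed too slow, so I timed it and it takes about 40 seconds. I think it could be parallelized, but I don't know how. Please show me how to parallelize it or, if that isn't possible, another way to speed it up.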

import os
import shutil
import requests
from bs4 import BeautifulSoup
from urllib import request
 
 
URL = 'https://music.bugs.co.kr/chart/track/day/total'
PATH = os.getcwd() + '/static/images/'
 
 
# Fetch the HTML source of the target page
def get_html(target_url):
    _html = ""
    response = requests.get(target_url)
    if response.status_code == 200:
        _html = response.text
    return _html
 
 
# Parse the image URLs and collect them in a list
def get_image_url():
    html = get_html(URL)
    soup = BeautifulSoup(html, 'html.parser')
    img_url = []

    for image in soup.select('a.thumbnail > img'):
        # Skip <img> tags that have no src attribute
        if image.has_attr('src'):
            img_url.append(image.get('src'))
    return img_url
 
 
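# Download the album art into the static/images directory
def download_album_arts():
    images = get_image_url()
    # Create the target directory if needed; urlretrieve does not create it
    os.makedirs(PATH, exist_ok=True)
    # Take at most the first 100 URLs; slicing avoids an IndexError when fewer are found
    for i, url in enumerate(images[:100], start=1):
        file_name = PATH + str(i) + '.png'
        request.urlretrieve(url, file_name)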
 
 
# Delete all album art and recreate an empty directory
def delete_album_art():
    path = os.getcwd() + '/static/images'
    if os.path.exists(path):
        shutil.rmtree(path)
    os.mkdir(path)
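
A timing figure like the roughly 40 seconds mentioned above can be measured with a small wrapper around the function call. This is only a minimal sketch: `timed` is a hypothetical helper name that does not appear in the original post, and it measures wall-clock time, which is what matters for network-bound work.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

With the code above it could be used as `_, seconds = timed(download_album_arts)` and then `print(seconds)`.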

Forums: Programming QnA


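You don't have to parallelize it by hand

Author: choi101104 / Posted: Thu, 2017/11/16 - 6:49 PM

You can do this with the multiprocessing module, but there is also joblib, a parallelization library that manages the workers for you. By default it runs worker processes, and it can use threads instead if you pass backend="threading" to Parallel. When I modified your code and tested it, the run time dropped from about 11 seconds to about 5 seconds.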

import os
import shutil
import requests
from bs4 import BeautifulSoup
from urllib import request
from joblib import Parallel, delayed
URL = 'https://music.bugs.co.kr/chart/track/day/total'
PATH = os.getcwd() + '/static/images/'
 
 
# Fetch the HTML source of the target page
def get_html(target_url):
    _html = ""
    response = requests.get(target_url)
    if response.status_code == 200:
        _html = response.text
    return _html
 
 
# Parse the image URLs and collect them in a list
def get_image_url():
    html = get_html(URL)
    soup = BeautifulSoup(html, 'html.parser')
    img_url = []

    for image in soup.select('a.thumbnail > img'):
        # Skip <img> tags that have no src attribute
        if image.has_attr('src'):
            img_url.append(image.get('src'))
    return img_url
 
 
# Download one album art image into the static/images directory
def download_album_arts(index, image):
    file_name = PATH + str(index) + '.png'
    request.urlretrieve(image, file_name)
 
 
# Delete all album art and recreate an empty directory
def delete_album_art():
    path = os.getcwd() + '/static/images'
    if os.path.exists(path):
        shutil.rmtree(path)
    os.mkdir(path)
 
 
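if __name__ == "__main__":
    # Make sure the output directory exists before the workers write into it
    os.makedirs(PATH, exist_ok=True)
    # n_jobs=-1 uses all CPU cores; start=1 keeps the 1.png, 2.png, ... file names
    Parallel(n_jobs=-1)(
        delayed(download_album_arts)(index, image)
        for index, image in enumerate(get_image_url(), start=1)
    )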
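
Because the work here is mostly waiting on the network rather than computing, a thread pool from the standard library is another option alongside multiprocessing and joblib. The following is a minimal sketch and not the method used in the reply above: `download_one`, `download_all`, the `fetch` parameter, and the default of `max_workers=8` are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from urllib import request


def download_one(index, url, out_dir, fetch=request.urlretrieve):
    """Download one image to <out_dir>/<index>.png and return the file path."""
    file_name = str(Path(out_dir) / f"{index}.png")
    fetch(url, file_name)
    return file_name


def download_all(urls, out_dir, max_workers=8, fetch=request.urlretrieve):
    """Download every URL concurrently; paths are returned in input order."""
    # Create the output directory once, before any worker writes to it.
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(download_one, index, url, out_dir, fetch)
            for index, url in enumerate(urls, start=1)
        ]
        # result() re-raises any exception raised inside a worker thread.
        return [future.result() for future in futures]
```

With the functions from the post, this could be called as `download_all(get_image_url(), PATH)`. The `fetch` parameter exists only so the downloader can be swapped out, for example to try the function without network access.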

Comment attachments:

Attachment	File size
bf.PNG	3.55 KB
캡처af.PNG	3.31 KB
