Does anyone here know Python?

Author: lhm7877 / Posted: Thu, 2015/12/24 - 2:21 AM

http://newnovel.aks.ac.kr/Search?keyword=%EC%84%9C%EC%9A%B8&page=1
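I want to parse the search result table on this site and put rows like this one into a CSV file:
서울01㉡nm 셔울도 셔울도 녀 교육이 흥왕엿다 못니가 더구나 시골 녀인이야 말 것 잇나 명월정_053
The last field should be split, so that (명월정_093) becomes 명월정,093.
To get every search result on the site into the CSV, the words that appear in the first search are searched again as keywords.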
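
As a rough illustration of the row format described above, the last column can be split on its final underscore so that 명월정_093 becomes two CSV fields. The helper names `split_source` and `to_csv_line`, and the four-field layout, are assumptions made for this sketch; neither the site nor the script below defines them.

```python
import csv
import io


def split_source(cell):
    """Split a cell such as '명월정_093' into ('명월정', '093').

    rpartition splits on the LAST underscore, so a title that itself
    contains an underscore keeps it.
    """
    book, sep, num = cell.rpartition('_')
    if not sep:
        # No underscore at all: keep the whole text as the title.
        return cell, ''
    return book, num


def to_csv_line(headword, sentence, source):
    """Render one search-result row as a single CSV line (hypothetical layout)."""
    buf = io.StringIO()
    book, num = split_source(source)
    csv.writer(buf, lineterminator='\n').writerow([headword, sentence, book, num])
    return buf.getvalue()


print(to_csv_line('서울01', '셔울도 셔울도', '명월정_093'), end='')
```

Using `csv.writer` rather than joining with commas by hand means a sentence that happens to contain a comma or a quote is still escaped correctly.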

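The source below parses the results and prints them; it does not save anything to CSV.
I want to add a few lines to it so that it writes to a CSV file.
I thought it would be enough to add the lines that are commented out below, either inside
def process_page(keyword): or inside def main(), but that does not work.
How should I go about this?

I tried to save the td values to the CSV, but it does not work the way I expected.

The error is: AttributeError: 'ResultSet' object has no attribute 'find_all'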
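
That error suggests `find_all` is being called on `rows`, the list-like ResultSet returned by an earlier `find_all`, rather than on each individual `row`. Two other things tend to bite here in Python 3: `csv.writer` expects a file opened in text mode (not 'wb'), and opening the file with 'w' inside the loop would truncate it on every row. Below is a minimal sketch of the writing side only, assuming the cell texts of each row have already been extracted (for example with `[td.text for td in row.find_all('td')]`). The function name `save_rows`, the demo data and the demo path are hypothetical.

```python
import csv
import os
import tempfile


def save_rows(path, rows):
    """Write an iterable of cell-text lists to a CSV file.

    The file is opened once, in text mode; newline='' lets the csv module
    control line endings, and utf-8-sig writes a BOM, which helps Excel
    recognise the encoding of the Korean text.
    """
    with open(path, 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.writer(f)
        for cells in rows:
            # Strip stray whitespace around each cell before writing.
            writer.writerow([c.strip() for c in cells])


# Stand-in data; in the real script each inner list would come from one <tr>.
demo_rows = [
    ['1', '서울01', '셔울도 셔울도', '명월정_053'],
    ['2', '서울02', ' 예시 문장 ', '명월정_093'],
]
demo_path = os.path.join(tempfile.gettempdir(), 'parsing_demo.csv')
save_rows(demo_path, demo_rows)
```

In the crawler itself the rows would be collected (or the writer created) once in main(), so that every page appends to the same open file.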

import csv
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlencode
import re
 
# Global Variables
usages = dict()
userinfo = {
    "keyword": None,
    "total_count": 0,
    "current_page": 0,
    "current_position": 0
}
 
def get_total_count(html):
    pat = re.compile(r'([\d,]+)(?=\s*</b> 항목이 검색되었습니다)')
    t = pat.findall(html)[0]
    return int(t.replace(',', ''))
 
 
def make_page_url(keyword, page=1):
    base_url = "http://newnovel.aks.ac.kr/Search?"
    params = {
        "keyword": keyword,
        "page": page
    }
    data = urlencode(params)
    return base_url + data
 
 
def process_page(keyword):
 
    result = {}
 
    if keyword not in usages:
        usages[keyword] = False
    elif usages[keyword] is True:
        return None
 
    if userinfo['keyword'] is None:
        userinfo['keyword'] = keyword
        userinfo['current_page'] = 0
        userinfo['total_count'] = 0 
        userinfo['current_position'] = 0 
 
    if userinfo['keyword'] == keyword:
        userinfo['current_page'] += 1
 
    page = userinfo['current_page']
    html = requests.get(make_page_url(keyword, page)).text
 
    if page == 1:
        userinfo['total_count'] = get_total_count(html)
 
    bs = BeautifulSoup(html, 'html.parser')
 
    info_table = bs.find('table', {"class":"table table-striped table-hover oldkorean"})
    rows = info_table.tbody.find_all('tr')
    for row in rows:
        userinfo['current_position'] += 1
        words = row.find_all('td')[2]
        keywords = words.text.split(' ')
        for kwd in keywords:
            if kwd not in usages:
                usages[kwd] = False
        book, num = row.find_all('td')[-1].text.split('_')
        # num = int(num)
        # The commented-out lines below are the ones I added.
        #write = rows.find_all('td')
        #f = open('parsing2.csv', 'wb') # or 'ab'
        #csvwriter = csv.writer(f)
        #csvwriter.writerow(write)
        #print(tds)
        result[(book, num)] = []
        result[(book, num)] += keywords
 
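    if userinfo['current_position'] == userinfo['total_count']:
        # Every result for this keyword has been read: mark it as done so
        # that main() does not pick the same keyword again.
        usages[keyword] = True
        userinfo['keyword'] = None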
 
    return result
 
def main():
    ckwd = '서울'
    while True:
        r = process_page(ckwd)
        print(r)
        if userinfo['keyword'] is None:
            cands = [x[0] for x in usages.items() if x[1] is False]
            if not cands:
                break
            ckwd = cands[0]
 
main()
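
The `usages` / `userinfo` bookkeeping above amounts to a breadth-first crawl over keywords. For comparison, here is a sketch of the same idea with an explicit queue and a seen set. `fetch_rows` is a hypothetical stand-in for the download-and-parse step; in this demo it only reads from a small in-memory table (`FAKE_SITE`, also made up), so the example runs offline.

```python
from collections import deque


def crawl(start, fetch_rows, max_keywords=1000):
    """Visit keywords breadth-first, yielding (keyword, row) pairs.

    fetch_rows(keyword) must return a list of rows, each a list of words.
    Every word found in a row is queued once as a new keyword.
    """
    seen = {start}
    queue = deque([start])
    visited = 0
    while queue and visited < max_keywords:
        keyword = queue.popleft()
        visited += 1
        for row in fetch_rows(keyword):
            yield keyword, row
            for word in row:
                if word and word not in seen:
                    seen.add(word)
                    queue.append(word)


# Toy data in place of real search results (assumption for the demo).
FAKE_SITE = {
    '서울': [['서울', '시골'], ['서울', '교육']],
    '시골': [['시골', '서울']],
    '교육': [],
}


def fake_fetch(keyword):
    return FAKE_SITE.get(keyword, [])


results = list(crawl('서울', fake_fetch))
for keyword, row in results:
    print(keyword, row)
```

The `max_keywords` cap is there because the set of reachable keywords on a real site may be very large.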

Forums: Programming QnA

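First of all, you should get permission before downloading.

Author: shint / Posted: Thu, 2015/12/24 - 1:25 PM

Unauthorized use, copyright, digital reproduction rights and so on could get you into trouble.
A program in particular can be used on a large scale, so you need to be careful, even though wget, ftp and web browsers already exist.

The best approach would be to ask the site whether it can provide an OpenAPI or an RSS feed in XML.

I tested it here. The download itself works, but I get an index-out-of-range error message.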
https://www.python.org/

t = pat.findall(html)[0]
IndexError: list index out of range
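
The IndexError comes from indexing `[0]` into an empty `findall` result, which happens whenever the downloaded text does not contain the expected count phrase (for instance if the markup differs from what the pattern assumes). Here is a hedged sketch of a more forgiving version that reuses the pattern from the original post. Returning 0 when nothing matches is an assumption about the desired behaviour.

```python
import re

# Same pattern as in the original post.
COUNT_PATTERN = re.compile(r'([\d,]+)(?=\s*</b> 항목이 검색되었습니다)')


def get_total_count(html):
    """Return the number of hits announced on the page, or 0 if not found."""
    match = COUNT_PATTERN.search(html)
    if match is None:
        # Phrase missing: treat as "no results" instead of raising IndexError.
        return 0
    return int(match.group(1).replace(',', ''))


print(get_total_count('<b>1,234</b> 항목이 검색되었습니다'))
print(get_total_count('<html>no count here</html>'))
```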

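To paste a long block of text (in IPython):
type %cpaste and press Enter,
paste the text,
then type -- and press Enter.

'I need a function that can extract the address part' -- a way to get only the address: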
http://kldp.org/node/153979

Extracting a specific part of an HTML document with PHP
http://kin.naver.com/qna/detail.nhn?d1id=1&dirId=1040203&docId=67531414&qb=cHl0aG9uIGh0bWwg66y47J6Q7Je0&enc=utf8&section=kin&rank=4&search_sort=0&spq=0

[H09] Automatic string extraction from websites using Python
http://dibolsm.blog.me/70141238818

I am attaching a file that shows how to split strings read from a local file.

Comment attachments:

Attachment	File size
test 파이썬 Python 으로 HTML에서 HREF의 URL 주소와 값 분리하는 방법.zip	5.55 KB

