[Question] Has anyone processed more than 1 million records with Berkeley DB's sync() or close()?

Posted by: iknights / Posted: Mon, 2011/04/18 - 2:19 PM

Hello.
I have a question about a data-processing problem with Berkeley DB.

While inserting data with Berkeley DB, sync() or close() becomes extremely slow as the number of records grows.
Has anyone else run into this?

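The environment is:
1. An env is used, flags = DB_CREATE | DB_INIT_MPOOL | DB_THREAD
2. Transactions are not used.
3. The database is created and opened as a BTREE.

With that env as the base, test.db is used as follows:
1. Module A stores data in test.db (100,000 records per second)
2. Module B opens test.db through the env read-only (DB_RDONLY) and searches the data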

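In this setup, it seems that module A has to call sync or close after storing data
before module B can read the updated (newly added) data through the env.
The problem is that as the data grows (100,000 -> 200,000 -> 300,000 records), each sync/close call takes longer and longer, and overall throughput drops noticeably.

I tried changing the env's cache_size and the pagesize, but processing still slows down as the data grows.

I would appreciate advice from anyone who has used Berkeley DB.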

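Here is an example I used for reference.

Posted by: shint / Posted: Mon, 2011/04/18 - 5:09 PM

Now that I have posted it, I have a feeling it may not be what you were asking about.
Please treat it as reference material only.

Notes
* Uses DB_DBT_USERMEM
* Not related to sync() or close()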

http://blog.naver.com/daroobil/42908823

#include "db_cxx.h"
#include <stdio.h>
#include <string>
 
using namespace std;
 
//----------------------------------------------------------------------------------------
//TIMER
//----------------------------------------------------------------------------------------
//#define CHECK_TIME_START  __int64 freq, start, end; if (QueryPerformanceFrequency((_LARGE_INTEGER*)&freq))  {QueryPerformanceCounter((_LARGE_INTEGER*)&start); 
//#define CHECK_TIME_END(a,b) QueryPerformanceCounter((_LARGE_INTEGER*)&end);  a=(float)((double)(end - start)/freq*1000); b=TRUE;	}else{ b=FALSE;	}
//LARGE_INTEGER g_ticksPerSecond;
//LARGE_INTEGER g_t_start, g_t_end, g_t_diff;
 
 
#include <windows.h>
//http://kin.naver.com/qna/detail.nhn?d1id=1&dirId=1040101&docId=68235573&qb=dmMg7KCV67CA7Iuc6rCE&enc=utf8&section=kin&rank=1&sort=0&spq=0&pid=f1kLqg331zZssv2GafGssv--017557&sid=S1bM8ELGVksAAHS0DGA
double			timeSecond;
LARGE_INTEGER	ticksPerSecond;
LARGE_INTEGER	t_start, t_end, t_diff;  
 
void Fn_start_timer()
{
	QueryPerformanceCounter(&t_start);
}
 
double Fn_end_timer()
{
	QueryPerformanceCounter(&t_end);
	QueryPerformanceFrequency(&ticksPerSecond);
 
	t_diff.QuadPart = t_end.QuadPart- t_start.QuadPart;
	timeSecond = (double)t_diff.QuadPart/(double)ticksPerSecond.QuadPart;
 
	// QueryPerformanceCounter measures elapsed wall-clock time, not CPU time.
	printf ("Elapsed time:   %3.12f  sec / ticks: %I64d\n", timeSecond, t_diff.QuadPart);
	return timeSecond;
}
 
 
//env.BeginTxn(&txn);
//db.Get(&txn, &key, &data, flag);
//db.Put(&txn, &key, &data, flag);
//db.Erase(&txn, &key, flag);
//txn.Commit();
 
Db db_(NULL, 0);
 
void Open(bool isSecondary)
{
	char dbName[]="test.db";
	db_.open(NULL, dbName, NULL, DB_BTREE, DB_CREATE, 0);
}
 
void Close()
{
	db_.close(0);
}
 
void put()
{
	int nKey	= 1;
	int nValue	= 100;
 
	Dbt key	(&nKey,		sizeof(int));
	Dbt data(&nValue,	sizeof(int));
	db_.put	(NULL, &key, &data, 0);
}
 
void get()
{
	int nKey	= 1;
	Dbt key(&nKey, sizeof(int));
 
	int nValue	= 0;
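	// DB_DBT_USERMEM: the value is copied into the caller-supplied buffer
	// (nValue here); ulen tells Berkeley DB how large that buffer is.
	Dbt data;
	data.set_data(&nValue);
	data.set_ulen(sizeof(int));
	data.set_flags(DB_DBT_USERMEM);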
 
	db_.get(NULL, &key, &data, 0);
//s	printf("%d\n",nValue);
}
 
int main()
{
	Open(false);
//	put();
//	get();
 
	const int max_count = 100000;
	{{
		printf("DB PUT : START\n");
		Fn_start_timer();
		for(int i=0; i<max_count; i++)
		{
			int	 nKey			= i;
			char caValue[100]	= "value!";
			Dbt key	(&nKey,		sizeof(nKey));
			Dbt data(&caValue,	sizeof(caValue));
			db_.put	(NULL, &key, &data, 0);
		}
		Fn_end_timer();
		printf("DB PUT : END\n\n");
	}}
 
	{{
		printf("DB GET : START\n");
		Fn_start_timer();
		for(int i=0; i<max_count; i++)
		{
			int	 nKey			= i;
			char caValue[100]	= {'\0',};
 
			Dbt key	(&nKey,		sizeof(nKey));
 
			Dbt data;
			data.set_data(&caValue);
			data.set_ulen(sizeof(caValue));
			data.set_flags(DB_DBT_USERMEM);
 
			db_.get(NULL, &key, &data, 0);
		//s	printf("%s\n",caValue);
		}
		Fn_end_timer();
		printf("DB GET : END\n\n");
	}}
 
	{{
		printf("DB DELETE : START\n");
		Fn_start_timer();
		for(int i=0; i<max_count; i++)
		{
			int	 nKey			= i;
			Dbt key	(&nKey,		sizeof(nKey));
			db_.del(NULL, &key, 0);
		//	get		(DbTxn *txnid, Dbt *key, Dbt *data, u_int32_t flags);
		//	del		(DbTxn *txnid, Dbt *key, u_int32_t flags);
		//	open	(DbTxn *txnid, const char *, const char *subname, DBTYPE, u_int32_t, int);
		//	remove	(const char *, const char *, u_int32_t);
		}
		Fn_end_timer();
		printf("DB DELETE : END\n\n");
	}}
	Close();
}
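
The timer helpers in the listing above depend on windows.h. A portable variant could look like the sketch below, assuming a C++11 compiler; StopWatch and time_phase are hypothetical names chosen for this sketch and are not part of Berkeley DB or the original example.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Portable stand-in for Fn_start_timer()/Fn_end_timer(), based on
// std::chrono::steady_clock (monotonic wall-clock time).
class StopWatch {
public:
    StopWatch() : start_(std::chrono::steady_clock::now()) {}

    // Resets the reference point to "now".
    void restart() { start_ = std::chrono::steady_clock::now(); }

    // Seconds elapsed since construction or the last restart().
    double elapsed_seconds() const {
        const auto now = std::chrono::steady_clock::now();
        return std::chrono::duration<double>(now - start_).count();
    }

private:
    std::chrono::steady_clock::time_point start_;
};

// Runs fn once, prints the elapsed time under the given label, and returns it.
template <typename Fn>
double time_phase(const char* label, Fn fn) {
    assert(label != nullptr);
    StopWatch watch;
    fn();
    const double seconds = watch.elapsed_seconds();
    std::printf("%s: %.6f sec\n", label, seconds);
    return seconds;
}
```

Each of the PUT, GET, and DELETE loops could then be wrapped in a lambda and passed to time_phase instead of calling the start/end functions in pairs.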

----------------------------------------------------------------------------
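Youth makes everything possible.

I would like to build, together, a program that 100 million people use every day.
I do business with companies that keep regular working hours and have no overtime.

I welcome recommendations of good books, sites, and blogs in any field. shintx@naver.com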

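Thank you for the reply.

Posted by: iknights / Posted: Thu, 2011/04/21 - 10:21 AM

Hello.
Things have been hectic lately, so I only just saw the reply.. ^^;
Thank you for the reply.

While the puts were running, I checked the status with db_stat -d test.db,
and as expected the data is not visible to other processes until sync or close is called.

This is what db_stat -d test.db reports: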

Thu Apr 21 10:15:18 2011 Local time
...
0 Number of unique keys in the tree
0 Number of data items in the tree
...

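On the reader side, both counts show 0.
Using the DB object from within the same process would probably be fine,
but having separate processes look at the same DB is what causes the problem ^^;

I will have to dig a little further. Thanks again for the reply, and if you have any other useful information, please share it.
Have a happy day :)

V~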

With that many records, shouldn't you use HASH instead of BTREE?

Posted by: lnsium / Posted: Thu, 2011/05/26 - 2:49 AM

The manual says the following:

BTree if your keys have some locality of reference. That is, if they sort well and you can expect that a query for a given key will likely be followed by a query for one of its neighbors.

Hash if your dataset is extremely large. For any given access method, DB must maintain a certain amount of internal information. However, the amount of information that DB must maintain for BTree is much greater than for Hash. The result is that as your dataset grows, this internal information can dominate the cache to the point where there is relatively little space left for application data. As a result, BTree can be forced to perform disk I/O much more frequently than would Hash given the same amount of data.

Moreover, if your dataset becomes so large that DB will almost certainly have to perform disk I/O to satisfy a random request, then Hash will definitely out perform BTree because it has fewer internal records to search through than does BTree.

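Thank you for the reply.

Posted by: iknights / Posted: Thu, 2011/05/26 - 9:46 AM

Because my early tests used BTREE with only a small amount of data, I did not test large-volume handling thoroughly enough.
I should run performance tests with HASH as well.

Thank you for the reply.

V~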
