These are apparently called gcc intrinsics... has anyone used them?

shkim wrote:

While coding on Linux I found it quite inconvenient that there is no Interlocked family of functions,

but then I found source on the internet that implements pretty much all of the Interlocked functions.

It was presumably published because it works, but I'd appreciate it if someone more experienced could look it over.

http://svn.assembla.com/svn/iLog/intrin_x86.h

And below is a copy & paste, in case the URL above ever disappears...
--------------------------------------------------------------------------------

/*
	Compatibility <intrin_x86.h> header for GCC -- GCC equivalents of intrinsic
	Microsoft Visual C++ functions. Originally developed for the ReactOS
	(<http://www.reactos.org/>) and TinyKrnl (<http://www.tinykrnl.org/>)
	projects.
 
	Copyright (c) 2006 KJK::Hyperion <hackbunny@reactos.com>
 
	Permission is hereby granted, free of charge, to any person obtaining a
	copy of this software and associated documentation files (the "Software"),
	to deal in the Software without restriction, including without limitation
	the rights to use, copy, modify, merge, publish, distribute, sublicense,
	and/or sell copies of the Software, and to permit persons to whom the
	Software is furnished to do so, subject to the following conditions:
 
	The above copyright notice and this permission notice shall be included in
	all copies or substantial portions of the Software.
 
	THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
	IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
	FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
	AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
	LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
	FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
	DEALINGS IN THE SOFTWARE.
*/
 
#ifndef KJK_INTRIN_X86_H_
#define KJK_INTRIN_X86_H_
 
/*
	FIXME: review all "memory" clobbers, add/remove to match Visual C++
	behavior: some "obvious" memory barriers are not present in the Visual C++
	implementation - e.g. __stosX; on the other hand, some memory barriers that
	*are* present could have been missed
*/
 
/*
	NOTE: this is a *compatibility* header. Some functions may look wrong at
	first, but they're only "as wrong" as they would be on Visual C++. Our
	priority is compatibility
 
	NOTE: unlike most people who write inline asm for GCC, I didn't pull the
	constraints and the uses of __volatile__ out of my... hat. Do not touch
	them. I hate cargo cult programming
 
	NOTE: be very careful with declaring "memory" clobbers. Some "obvious"
	barriers aren't there in Visual C++ (e.g. __stosX)
 
	NOTE: review all intrinsics with a return value, add/remove __volatile__
	where necessary. If an intrinsic whose value is ignored generates a no-op
	under Visual C++, __volatile__ must be omitted; if it always generates code
	(for example, if it has side effects), __volatile__ must be specified. GCC
	will only optimize out non-volatile asm blocks with outputs, so input-only
	blocks are safe. Oddities such as the non-volatile 'rdmsr' are intentional
	and follow Visual C++ behavior
 
	NOTE: on GCC 4.1.0, please use the __sync_* built-ins for barriers and
	atomic operations. Test the version like this:
 
	#if (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__) > 40100
		...
 
	Pay attention to the type of barrier. Make it match with what Visual C++
	would use in the same case
*/
 
/*** Stack frame juggling ***/
#define _ReturnAddress() (__builtin_return_address(0))
#define _AddressOfReturnAddress() (&(((void **)(__builtin_frame_address(0)))[1]))
/* TODO: __getcallerseflags but how??? */
 
 
/*** Atomic operations ***/
 
#if (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__) > 40100
#define _ReadWriteBarrier() __sync_synchronize()
#else
static void __inline__ __attribute__((always_inline)) _MemoryBarrier(void)
{
	__asm__ __volatile__("" : : : "memory");
}
#define _ReadWriteBarrier() _MemoryBarrier()
#endif
 
/* BUGBUG: GCC only supports full barriers */
#define _ReadBarrier _ReadWriteBarrier
#define _WriteBarrier _ReadWriteBarrier
 
#if (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__) > 40100
 
static __inline__ __attribute__((always_inline)) char _InterlockedCompareExchange8(volatile char * const Destination, const char Exchange, const char Comperand)
{
	return __sync_val_compare_and_swap(Destination, Comperand, Exchange);
}
 
static __inline__ __attribute__((always_inline)) short _InterlockedCompareExchange16(volatile short * const Destination, const short Exchange, const short Comperand)
{
	return __sync_val_compare_and_swap(Destination, Comperand, Exchange);
}
 
static __inline__ __attribute__((always_inline)) long _InterlockedCompareExchange(volatile long * const Destination, const long Exchange, const long Comperand)
{
	return __sync_val_compare_and_swap(Destination, Comperand, Exchange);
}
 
static __inline__ __attribute__((always_inline)) long long _InterlockedCompareExchange64(volatile long long * const Destination, const long long Exchange, const long long Comperand)
{
	return __sync_val_compare_and_swap(Destination, Comperand, Exchange);
}
 
static __inline__ __attribute__((always_inline)) void * _InterlockedCompareExchangePointer(void * volatile * const Destination, void * const Exchange, void * const Comperand)
{
	return __sync_val_compare_and_swap(Destination, Comperand, Exchange);
}
 
static __inline__ __attribute__((always_inline)) long _InterlockedExchange(volatile long * const Target, const long Value)
{
	/* NOTE: __sync_lock_test_and_set would be an acquire barrier, so we force a full barrier */
	__sync_synchronize();
	return __sync_lock_test_and_set(Target, Value);
}
 
static __inline__ __attribute__((always_inline)) void * _InterlockedExchangePointer(void * volatile * const Target, void * const Value)
{
	/* NOTE: ditto */
	__sync_synchronize();
	return __sync_lock_test_and_set(Target, Value);
}
 
static __inline__ __attribute__((always_inline)) long _InterlockedExchangeAdd(volatile long * const Addend, const long Value)
{
	return __sync_fetch_and_add(Addend, Value);
}
 
static __inline__ __attribute__((always_inline)) char _InterlockedAnd8(volatile char * const value, const char mask)
{
	return __sync_fetch_and_and(value, mask);
}
 
static __inline__ __attribute__((always_inline)) short _InterlockedAnd16(volatile short * const value, const short mask)
{
	return __sync_fetch_and_and(value, mask);
}
 
static __inline__ __attribute__((always_inline)) long _InterlockedAnd(volatile long * const value, const long mask)
{
	return __sync_fetch_and_and(value, mask);
}
 
static __inline__ __attribute__((always_inline)) char _InterlockedOr8(volatile char * const value, const char mask)
{
	return __sync_fetch_and_or(value, mask);
}
 
static __inline__ __attribute__((always_inline)) short _InterlockedOr16(volatile short * const value, const short mask)
{
	return __sync_fetch_and_or(value, mask);
}
 
static __inline__ __attribute__((always_inline)) long _InterlockedOr(volatile long * const value, const long mask)
{
	return __sync_fetch_and_or(value, mask);
}
 
static __inline__ __attribute__((always_inline)) char _InterlockedXor8(volatile char * const value, const char mask)
{
	return __sync_fetch_and_xor(value, mask);
}
 
static __inline__ __attribute__((always_inline)) short _InterlockedXor16(volatile short * const value, const short mask)
{
	return __sync_fetch_and_xor(value, mask);
}
 
static __inline__ __attribute__((always_inline)) long _InterlockedXor(volatile long * const value, const long mask)
{
	return __sync_fetch_and_xor(value, mask);
}
 
#else
 
static __inline__ __attribute__((always_inline)) char _InterlockedCompareExchange8(volatile char * const Destination, const char Exchange, const char Comperand)
{
	char retval = Comperand;
	__asm__("lock; cmpxchgb %b[Exchange], %[Destination]" : [retval] "+a" (retval) : [Destination] "m" (*Destination), [Exchange] "q" (Exchange) : "memory");
	return retval;
}
 
static __inline__ __attribute__((always_inline)) short _InterlockedCompareExchange16(volatile short * const Destination, const short Exchange, const short Comperand)
{
	short retval = Comperand;
	__asm__("lock; cmpxchgw %w[Exchange], %[Destination]" : [retval] "+a" (retval) : [Destination] "m" (*Destination), [Exchange] "q" (Exchange): "memory");
	return retval;
}
 
static __inline__ __attribute__((always_inline)) long _InterlockedCompareExchange(volatile long * const Destination, const long Exchange, const long Comperand)
{
	long retval = Comperand;
	__asm__("lock; cmpxchgl %k[Exchange], %[Destination]" : [retval] "+a" (retval) : [Destination] "m" (*Destination), [Exchange] "q" (Exchange): "memory");
	return retval;
}
 
static __inline__ __attribute__((always_inline)) long long _InterlockedCompareExchange64(volatile long long * const Destination, const long long Exchange, const long long Comperand)
{
	long long retval = Comperand;
 
	__asm__
	(
		"cmpxchg8b %[Destination]" :
		[retval] "+A" (retval) :
			[Destination] "m" (*Destination),
			"b" ((unsigned long)((Exchange >>  0) & 0xFFFFFFFF)),
			"c" ((unsigned long)((Exchange >> 32) & 0xFFFFFFFF)) :
		"memory"
	);
 
	return retval;
}
 
static __inline__ __attribute__((always_inline)) void * _InterlockedCompareExchangePointer(void * volatile * const Destination, void * const Exchange, void * const Comperand)
{
	void * retval = (void *)Comperand;
	__asm__("lock; cmpxchgl %k[Exchange], %[Destination]" : [retval] "=a" (retval) : "[retval]" (retval), [Destination] "m" (*Destination), [Exchange] "q" (Exchange) : "memory");
	return retval;
}
 
static __inline__ __attribute__((always_inline)) long _InterlockedExchange(volatile long * const Target, const long Value)
{
	long retval = Value;
	__asm__("xchgl %[retval], %[Target]" : [retval] "+r" (retval) : [Target] "m" (*Target) : "memory");
	return retval;
}
 
static __inline__ __attribute__((always_inline)) void * _InterlockedExchangePointer(void * volatile * const Target, void * const Value)
{
	void * retval = Value;
	__asm__("xchgl %[retval], %[Target]" : [retval] "+r" (retval) : [Target] "m" (*Target) : "memory");
	return retval;
}
 
static __inline__ __attribute__((always_inline)) long _InterlockedExchangeAdd(volatile long * const Addend, const long Value)
{
	long retval = Value;
	__asm__("lock; xaddl %[retval], %[Addend]" : [retval] "+r" (retval) : [Addend] "m" (*Addend) : "memory");
	return retval;
}
 
static __inline__ __attribute__((always_inline)) char _InterlockedAnd8(volatile char * const value, const char mask)
{
	char x;
	char y;
 
	y = *value;
 
	do
	{
		x = y;
		y = _InterlockedCompareExchange8(value, x & mask, x);
	}
	while(y != x);
 
	return y;
}
 
static __inline__ __attribute__((always_inline)) short _InterlockedAnd16(volatile short * const value, const short mask)
{
	short x;
	short y;
 
	y = *value;
 
	do
	{
		x = y;
		y = _InterlockedCompareExchange16(value, x & mask, x);
	}
	while(y != x);
 
	return y;
}
 
static __inline__ __attribute__((always_inline)) long _InterlockedAnd(volatile long * const value, const long mask)
{
	long x;
	long y;
 
	y = *value;
 
	do
	{
		x = y;
		y = _InterlockedCompareExchange(value, x & mask, x);
	}
	while(y != x);
 
	return y;
}
 
static __inline__ __attribute__((always_inline)) char _InterlockedOr8(volatile char * const value, const char mask)
{
	char x;
	char y;
 
	y = *value;
 
	do
	{
		x = y;
		y = _InterlockedCompareExchange8(value, x | mask, x);
	}
	while(y != x);
 
	return y;
}
 
static __inline__ __attribute__((always_inline)) short _InterlockedOr16(volatile short * const value, const short mask)
{
	short x;
	short y;
 
	y = *value;
 
	do
	{
		x = y;
		y = _InterlockedCompareExchange16(value, x | mask, x);
	}
	while(y != x);
 
	return y;
}
 
static __inline__ __attribute__((always_inline)) long _InterlockedOr(volatile long * const value, const long mask)
{
	long x;
	long y;
 
	y = *value;
 
	do
	{
		x = y;
		y = _InterlockedCompareExchange(value, x | mask, x);
	}
	while(y != x);
 
	return y;
}
 
static __inline__ __attribute__((always_inline)) char _InterlockedXor8(volatile char * const value, const char mask)
{
	char x;
	char y;
 
	y = *value;
 
	do
	{
		x = y;
		y = _InterlockedCompareExchange8(value, x ^ mask, x);
	}
	while(y != x);
 
	return y;
}
 
static __inline__ __attribute__((always_inline)) short _InterlockedXor16(volatile short * const value, const short mask)
{
	short x;
	short y;
 
	y = *value;
 
	do
	{
		x = y;
		y = _InterlockedCompareExchange16(value, x ^ mask, x);
	}
	while(y != x);
 
	return y;
}
 
 
static __inline__ __attribute__((always_inline)) long _InterlockedXor(volatile long * const value, const long mask)
{
	long x;
	long y;
 
	y = *value;
 
	do
	{
		x = y;
		y = _InterlockedCompareExchange(value, x ^ mask, x);
	}
	while(y != x);
 
	return y;
}
 
#endif
 
static __inline__ __attribute__((always_inline)) long _InterlockedAddLargeStatistic(volatile long long * const Addend, const long Value)
{
	__asm__
	(
		"lock; add %[Value], %[Lo32];"
		"jae LABEL%=;"
		"lock; adc $0, %[Hi32];"
		"LABEL%=:;" :
		[Lo32] "=m" (*((volatile long *)(Addend) + 0)), [Hi32] "=m" (*((volatile long *)(Addend) + 1)) :
		[Value] "ir" (Value)
	);
 
	return Value;
}
 
static __inline__ __attribute__((always_inline)) long _InterlockedDecrement(volatile long * const lpAddend)
{
	return _InterlockedExchangeAdd(lpAddend, -1) - 1;
}
 
static __inline__ __attribute__((always_inline)) long _InterlockedIncrement(volatile long * const lpAddend)
{
	return _InterlockedExchangeAdd(lpAddend, 1) + 1;
}
 
static __inline__ __attribute__((always_inline)) unsigned char _interlockedbittestandreset(volatile long * a, const long b)
{
	unsigned char retval;
	__asm__("lock; btrl %[b], %[a]; setb %b[retval]" : [retval] "=r" (retval), [a] "=m" (a) : [b] "Ir" (b) : "memory");
	return retval;
}
 
static __inline__ __attribute__((always_inline)) unsigned char _interlockedbittestandset(volatile long * a, const long b)
{
	unsigned char retval;
	__asm__("lock; btsl %[b], %[a]; setc %b[retval]" : [retval] "=r" (retval), [a] "=m" (a) : [b] "Ir" (b) : "memory");
	return retval;
}
 
 
/*** String operations ***/
/* NOTE: we don't set a memory clobber in the __stosX functions because Visual C++ doesn't */
static __inline__ __attribute__((always_inline)) void __stosb(unsigned char * Dest, const unsigned char Data, size_t Count)
{
	__asm__ __volatile__
	(
		"rep; stosb" :
		[Dest] "=D" (Dest), [Count] "=c" (Count) :
		"[Dest]" (Dest), "a" (Data), "[Count]" (Count)
	);
}
 
static __inline__ __attribute__((always_inline)) void __stosw(unsigned short * Dest, const unsigned short Data, size_t Count)
{
	__asm__ __volatile__
	(
		"rep; stosw" :
		[Dest] "=D" (Dest), [Count] "=c" (Count) :
		"[Dest]" (Dest), "a" (Data), "[Count]" (Count)
	);
}
 
static __inline__ __attribute__((always_inline)) void __stosd(unsigned long * Dest, const unsigned long Data, size_t Count)
{
	__asm__ __volatile__
	(
		"rep; stosl" :
		[Dest] "=D" (Dest), [Count] "=c" (Count) :
		"[Dest]" (Dest), "a" (Data), "[Count]" (Count)
	);
}
 
static __inline__ __attribute__((always_inline)) void __movsb(unsigned char * Destination, const unsigned char * Source, size_t Count)
{
	__asm__ __volatile__
	(
		"rep; movsb" :
		[Destination] "=D" (Destination), [Source] "=S" (Source), [Count] "=c" (Count) :
		"[Destination]" (Destination), "[Source]" (Source), "[Count]" (Count)
	);
}
 
static __inline__ __attribute__((always_inline)) void __movsw(unsigned short * Destination, const unsigned short * Source, size_t Count)
{
	__asm__ __volatile__
	(
		"rep; movsw" :
		[Destination] "=D" (Destination), [Source] "=S" (Source), [Count] "=c" (Count) :
		"[Destination]" (Destination), "[Source]" (Source), "[Count]" (Count)
	);
}
 
static __inline__ __attribute__((always_inline)) void __movsd(unsigned long * Destination, const unsigned long * Source, size_t Count)
{
	__asm__ __volatile__
	(
		"rep; movsd" :
		[Destination] "=D" (Destination), [Source] "=S" (Source), [Count] "=c" (Count) :
		"[Destination]" (Destination), "[Source]" (Source), "[Count]" (Count)
	);
}
 
 
/*** FS segment addressing ***/
static __inline__ __attribute__((always_inline)) void __writefsbyte(const unsigned long Offset, const unsigned char Data)
{
	__asm__("movb %b[Data], %%fs:%a[Offset]" : : [Offset] "ir" (Offset), [Data] "iq" (Data));
}
 
static __inline__ __attribute__((always_inline)) void __writefsword(const unsigned long Offset, const unsigned short Data)
{
	__asm__("movw %w[Data], %%fs:%a[Offset]" : : [Offset] "ir" (Offset), [Data] "iq" (Data));
}
 
static __inline__ __attribute__((always_inline)) void __writefsdword(const unsigned long Offset, const unsigned long Data)
{
	__asm__("movl %k[Data], %%fs:%a[Offset]" : : [Offset] "ir" (Offset), [Data] "iq" (Data));
}
 
static __inline__ __attribute__((always_inline)) unsigned char __readfsbyte(const unsigned long Offset)
{
	unsigned char value;
	__asm__("movb %%fs:%a[Offset], %b[value]" : [value] "=q" (value) : [Offset] "irm" (Offset));
	return value;
}
 
static __inline__ __attribute__((always_inline)) unsigned short __readfsword(const unsigned long Offset)
{
	unsigned short value;
	__asm__("movw %%fs:%a[Offset], %w[value]" : [value] "=q" (value) : [Offset] "irm" (Offset));
	return value;
}
 
static __inline__ __attribute__((always_inline)) unsigned long __readfsdword(const unsigned long Offset)
{
	unsigned long value;
	__asm__("movl %%fs:%a[Offset], %k[value]" : [value] "=q" (value) : [Offset] "irm" (Offset));
	return value;
}
 
static __inline__ __attribute__((always_inline)) void __incfsbyte(const unsigned long Offset)
{
	__asm__("incb %%fs:%a[Offset]" : : [Offset] "ir" (Offset));
}
 
static __inline__ __attribute__((always_inline)) void __incfsword(const unsigned long Offset)
{
	__asm__("incw %%fs:%a[Offset]" : : [Offset] "ir" (Offset));
}
 
static __inline__ __attribute__((always_inline)) void __incfsdword(const unsigned long Offset)
{
	__asm__("incl %%fs:%a[Offset]" : : [Offset] "ir" (Offset));
}
 
/* NOTE: the bizarre implementation of __addfsxxx mimics the broken Visual C++ behavior */
static __inline__ __attribute__((always_inline)) void __addfsbyte(const unsigned long Offset, const unsigned char Data)
{
	if(!__builtin_constant_p(Offset))
		__asm__("addb %k[Offset], %%fs:%a[Offset]" : : [Offset] "r" (Offset));
	else
		__asm__("addb %b[Data], %%fs:%a[Offset]" : : [Offset] "ir" (Offset), [Data] "iq" (Data));
}
 
static __inline__ __attribute__((always_inline)) void __addfsword(const unsigned long Offset, const unsigned short Data)
{
	if(!__builtin_constant_p(Offset))
		__asm__("addw %k[Offset], %%fs:%a[Offset]" : : [Offset] "r" (Offset));
	else
		__asm__("addw %w[Data], %%fs:%a[Offset]" : : [Offset] "ir" (Offset), [Data] "iq" (Data));
}
 
static __inline__ __attribute__((always_inline)) void __addfsdword(const unsigned long Offset, const unsigned int Data)
{
	if(!__builtin_constant_p(Offset))
		__asm__("addl %k[Offset], %%fs:%a[Offset]" : : [Offset] "r" (Offset));
	else
		__asm__("addl %k[Data], %%fs:%a[Offset]" : : [Offset] "ir" (Offset), [Data] "iq" (Data));
}
 
 
/*** Bit manipulation ***/
static  __inline__ __attribute__((always_inline)) unsigned char _BitScanForward(unsigned long * const Index, const unsigned long Mask)
{
	__asm__("bsfl %[Mask], %<ol>
</ol>
" : <ol>
</ol>
 "=r" (*Index) : [Mask] "mr" (Mask));
	return Mask ? 1 : 0;
}
 
static  __inline__ __attribute__((always_inline)) unsigned char _BitScanReverse(unsigned long * const Index, const unsigned long Mask)
{
	__asm__("bsrl %[Mask], %<ol>
</ol>
" : <ol>
</ol>
 "=r" (*Index) : [Mask] "mr" (Mask));
	return Mask ? 1 : 0;
}
 
/* NOTE: again, the bizarre implementation follows Visual C++ */
static  __inline__ __attribute__((always_inline)) unsigned char _bittest(const long * const a, const long b)
{
	unsigned char retval;
 
	if(__builtin_constant_p(b))
		__asm__("bt %[b], %[a]; setb %b[retval]" : [retval] "=q" (retval) : [a] "mr" (*(a + (b / 32))), [b] "Ir" (b % 32));
	else
		__asm__("bt %[b], %[a]; setb %b[retval]" : [retval] "=q" (retval) : [a] "mr" (*a), [b] "r" (b));
 
	return retval;
}
 
static __inline__ __attribute__((always_inline)) unsigned char _bittestandcomplement(long * const a, const long b)
{
	unsigned char retval;
 
	if(__builtin_constant_p(b))
		__asm__("btc %[b], %[a]; setb %b[retval]" : [retval] "=q" (retval) : [a] "mr" (*(a + (b / 32))), [b] "Ir" (b % 32));
	else
		__asm__("btc %[b], %[a]; setb %b[retval]" : [retval] "=q" (retval) : [a] "mr" (*a), [b] "r" (b));
 
	return retval;
}
 
static __inline__ __attribute__((always_inline)) unsigned char _bittestandreset(long * const a, const long b)
{
	unsigned char retval;
 
	if(__builtin_constant_p(b))
		__asm__("btr %[b], %[a]; setb %b[retval]" : [retval] "=q" (retval) : [a] "mr" (*(a + (b / 32))), [b] "Ir" (b % 32));
	else
		__asm__("btr %[b], %[a]; setb %b[retval]" : [retval] "=q" (retval) : [a] "mr" (*a), [b] "r" (b));
 
	return retval;
}
 
static __inline__ __attribute__((always_inline)) unsigned char _bittestandset(long * const a, const long b)
{
	unsigned char retval;
 
	if(__builtin_constant_p(b))
		__asm__("bts %[b], %[a]; setb %b[retval]" : [retval] "=q" (retval) : [a] "mr" (*(a + (b / 32))), [b] "Ir" (b % 32));
	else
		__asm__("bts %[b], %[a]; setb %b[retval]" : [retval] "=q" (retval) : [a] "mr" (*a), [b] "r" (b));
 
	return retval;
}
 
static __inline__ __attribute__((always_inline)) unsigned char _rotl8(const unsigned char value, const unsigned char shift)
{
	unsigned char retval;
	__asm__("rolb %b[shift], %b[retval]" : [retval] "=rm" (retval) : "[retval]" (value), [shift] "Nc" (shift));
	return retval;
}
 
static __inline__ __attribute__((always_inline)) unsigned short _rotl16(const unsigned short value, const unsigned char shift)
{
	unsigned short retval;
	__asm__("rolw %b[shift], %w[retval]" : [retval] "=rm" (retval) : "[retval]" (value), [shift] "Nc" (shift));
	return retval;
}
 
static __inline__ __attribute__((always_inline)) unsigned char _rotr8(const unsigned char value, const unsigned char shift)
{
	unsigned char retval;
	__asm__("rorb %b[shift], %b[retval]" : [retval] "=rm" (retval) : "[retval]" (value), [shift] "Nc" (shift));
	return retval;
}
 
static __inline__ __attribute__((always_inline)) unsigned short _rotr16(const unsigned short value, const unsigned char shift)
{
	unsigned short retval;
	__asm__("rorw %b[shift], %w[retval]" : [retval] "=rm" (retval) : "[retval]" (value), [shift] "Nc" (shift));
	return retval;
}
 
/*
	NOTE: in __ll_lshift, __ll_rshift and __ull_rshift we use the "A"
	constraint (edx:eax) for the Mask argument, because it's the only way GCC
	can pass 64-bit operands around - passing the two 32 bit parts separately
	just confuses it. Also we declare Bit as an int and then truncate it to
	match Visual C++ behavior
*/
static __inline__ __attribute__((always_inline)) unsigned long long __ll_lshift(const unsigned long long Mask, const int Bit)
{
	unsigned long long retval = Mask;
 
	__asm__
	(
		"shldl %b[Bit], %%eax, %%edx; sall %b[Bit], %%eax" :
		"+A" (retval) :
		[Bit] "Nc" ((unsigned char)((unsigned long)Bit) & 0xFF)
	);
 
	return retval;
}
 
static __inline__ __attribute__((always_inline)) long long __ll_rshift(const long long Mask, const int Bit)
{
	unsigned long long retval = Mask;
 
	__asm__
	(
		"shldl %b[Bit], %%eax, %%edx; sarl %b[Bit], %%eax" :
		"+A" (retval) :
		[Bit] "Nc" ((unsigned char)((unsigned long)Bit) & 0xFF)
	);
 
	return retval;
}
 
static __inline__ __attribute__((always_inline)) unsigned long long __ull_rshift(const unsigned long long Mask, int Bit)
{
	unsigned long long retval = Mask;
 
	__asm__
	(
		"shrdl %b[Bit], %%eax, %%edx; shrl %b[Bit], %%eax" :
		"+A" (retval) :
		[Bit] "Nc" ((unsigned char)((unsigned long)Bit) & 0xFF)
	);
 
	return retval;
}
 
 
/*** 64-bit math ***/
static __inline__ __attribute__((always_inline)) long long __emul(const int a, const int b)
{
	long long retval;
	__asm__("imull %[b]" : "=A" (retval) : [a] "a" (a), [b] "rm" (b));
	return retval;
}
 
static __inline__ __attribute__((always_inline)) unsigned long long __emulu(const unsigned int a, const unsigned int b)
{
	unsigned long long retval;
	__asm__("mull %[b]" : "=A" (retval) : [a] "a" (a), [b] "rm" (b));
	return retval;
}
 
 
/*** Port I/O ***/
static __inline__ __attribute__((always_inline)) unsigned char __inbyte(const unsigned short Port)
{
	unsigned char byte;
	__asm__ __volatile__("inb %w[Port], %b[byte]" : [byte] "=a" (byte) : [Port] "Nd" (Port));
	return byte;
}
 
static __inline__ __attribute__((always_inline)) unsigned short __inword(const unsigned short Port)
{
	unsigned short word;
	__asm__ __volatile__("inw %w[Port], %w[word]" : [word] "=a" (word) : [Port] "Nd" (Port));
	return word;
}
 
static __inline__ __attribute__((always_inline)) unsigned long __indword(const unsigned short Port)
{
	unsigned long dword;
	__asm__ __volatile__("inl %w[Port], %k[dword]" : [dword] "=a" (dword) : [Port] "Nd" (Port));
	return dword;
}
 
static __inline__ __attribute__((always_inline)) void __inbytestring(unsigned short Port, unsigned char * Buffer, unsigned long Count)
{
	__asm__ __volatile__
	(
		"rep; insb" :
		[Buffer] "=D" (Buffer), [Count] "=c" (Count) :
		"d" (Port), "[Buffer]" (Buffer), "[Count]" (Count) :
		"memory"
	);
}
 
static __inline__ __attribute__((always_inline)) void __inwordstring(unsigned short Port, unsigned short * Buffer, unsigned long Count)
{
	__asm__ __volatile__
	(
		"rep; insw" :
		[Buffer] "=D" (Buffer), [Count] "=c" (Count) :
		"d" (Port), "[Buffer]" (Buffer), "[Count]" (Count) :
		"memory"
	);
}
 
static __inline__ __attribute__((always_inline)) void __indwordstring(unsigned short Port, unsigned long * Buffer, unsigned long Count)
{
	__asm__ __volatile__
	(
		"rep; insl" :
		[Buffer] "=D" (Buffer), [Count] "=c" (Count) :
		"d" (Port), "[Buffer]" (Buffer), "[Count]" (Count) :
		"memory"
	);
}
 
static __inline__ __attribute__((always_inline)) void __outbyte(unsigned short const Port, const unsigned char Data)
{
	__asm__ __volatile__("outb %b[Data], %w[Port]" : : [Port] "Nd" (Port), [Data] "a" (Data));
}
 
static __inline__ __attribute__((always_inline)) void __outword(unsigned short const Port, const unsigned short Data)
{
	__asm__ __volatile__("outw %w[Data], %w[Port]" : : [Port] "Nd" (Port), [Data] "a" (Data));
}
 
static __inline__ __attribute__((always_inline)) void __outdword(unsigned short const Port, const unsigned long Data)
{
	__asm__ __volatile__("outl %k[Data], %w[Port]" : : [Port] "Nd" (Port), [Data] "a" (Data));
}
 
static __inline__ __attribute__((always_inline)) void __outbytestring(unsigned short const Port, const unsigned char * const Buffer, const unsigned long Count)
{
	__asm__ __volatile__("rep; outsb" : : [Port] "d" (Port), [Buffer] "S" (Buffer), "c" (Count));
}
 
static __inline__ __attribute__((always_inline)) void __outwordstring(unsigned short const Port, const unsigned short * const Buffer, const unsigned long Count)
{
	__asm__ __volatile__("rep; outsw" : : [Port] "d" (Port), [Buffer] "S" (Buffer), "c" (Count));
}
 
static __inline__ __attribute__((always_inline)) void __outdwordstring(unsigned short const Port, const unsigned long * const Buffer, const unsigned long Count)
{
	__asm__ __volatile__("rep; outsl" : : [Port] "d" (Port), [Buffer] "S" (Buffer), "c" (Count));
}
 
 
/*** System information ***/
static __inline__ __attribute__((always_inline)) void __cpuid(int CPUInfo[], const int InfoType)
{
	__asm__ __volatile__("cpuid" : "=a" (CPUInfo[0]), "=b" (CPUInfo[1]), "=c" (CPUInfo[2]), "=d" (CPUInfo[3]) : "a" (InfoType));
}
 
static __inline__ __attribute__((always_inline)) unsigned long long __rdtsc(void)
{
	unsigned long long retval;
	__asm__ __volatile__("rdtsc" : "=A"(retval));
	return retval;
}
 
 
/*** Interrupts ***/
static __inline__ __attribute__((always_inline)) void __debugbreak(void)
{
	__asm__("int $3");
}
 
static __inline__ __attribute__((always_inline)) void __int2c(void)
{
	__asm__("int $0x2c");
}
 
static __inline__ __attribute__((always_inline)) void _disable(void)
{
	__asm__("cli");
}
 
static __inline__ __attribute__((always_inline)) void _enable(void)
{
	__asm__("sti");
}
 
 
/*** Protected memory management ***/
static __inline__ __attribute__((always_inline)) unsigned long __readcr0(void)
{
	unsigned long value;
	__asm__ __volatile__("mov %%cr0, %[value]" : [value] "=q" (value));
	return value;
}
 
static __inline__ __attribute__((always_inline)) unsigned long __readcr2(void)
{
	unsigned long value;
	__asm__ __volatile__("mov %%cr2, %[value]" : [value] "=q" (value));
	return value;
}
 
static __inline__ __attribute__((always_inline)) unsigned long __readcr3(void)
{
	unsigned long value;
	__asm__ __volatile__("mov %%cr3, %[value]" : [value] "=q" (value));
	return value;
}
 
static __inline__ __attribute__((always_inline)) unsigned long __readcr4(void)
{
	unsigned long value;
	__asm__ __volatile__("mov %%cr4, %[value]" : [value] "=q" (value));
	return value;
}
 
static __inline__ __attribute__((always_inline)) void __writecr0(const unsigned long long Data)
{
	__asm__("mov %[Data], %%cr0" : : [Data] "q" ((const unsigned long)(Data & 0xFFFFFFFF)) : "memory");
}
 
static __inline__ __attribute__((always_inline)) void __writecr3(const unsigned long long Data)
{
	__asm__("mov %[Data], %%cr3" : : [Data] "q" ((const unsigned long)(Data & 0xFFFFFFFF)) : "memory");
}
 
static __inline__ __attribute__((always_inline)) void __writecr4(const unsigned long long Data)
{
	__asm__("mov %[Data], %%cr4" : : [Data] "q" ((const unsigned long)(Data & 0xFFFFFFFF)) : "memory");
}
 
static __inline__ __attribute__((always_inline)) void __invlpg(void * const Address)
{
	__asm__("invlpg %[Address]" : : [Address] "m" (*((unsigned char *)(Address))));
}
 
 
/*** System operations ***/
static __inline__ __attribute__((always_inline)) unsigned long long __readmsr(const int reg)
{
	unsigned long long retval;
	__asm__ __volatile__("rdmsr" : "=A" (retval) : "c" (reg));
	return retval;
}
 
static __inline__ __attribute__((always_inline)) void __writemsr(const unsigned long Register, const unsigned long long Value)
{
	__asm__ __volatile__("wrmsr" : : "A" (Value), "c" (Register));
}
 
static __inline__ __attribute__((always_inline)) unsigned long long __readpmc(const int counter)
{
	unsigned long long retval;
	__asm__ __volatile__("rdpmc" : "=A" (retval) : "c" (counter));
	return retval;
}
 
/* NOTE: an immediate value for 'a' will raise an ICE in Visual C++ */
static __inline__ __attribute__((always_inline)) unsigned long __segmentlimit(const unsigned long a)
{
	unsigned long retval;
	__asm__ __volatile__("lsl %[a], %[retval]" : [retval] "=r" (retval) : [a] "rm" (a));
	return retval;
}
 
static __inline__ __attribute__((always_inline)) void __wbinvd(void)
{
	__asm__ __volatile__("wbinvd");
}
 
#endif /* KJK_INTRIN_X86_H_ */
 
/* EOF */
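For reference, a minimal sketch of how I intend to use it on Linux (assuming the file is saved next to the source as intrin_x86.h; the counter and the exact calls are just for illustration, and it is built 32-bit since the header is IA-32 only):

/* build sketch: gcc -m32 -O2 test_intrin.c -o test_intrin */
#include <stdio.h>
#include "intrin_x86.h"   /* the header pasted above */

static volatile long g_counter = 0;

int main(void)
{
	long after_inc = _InterlockedIncrement(&g_counter);               /* returns 1  */
	long old_value = _InterlockedExchangeAdd(&g_counter, 10);         /* returns 1, counter becomes 11 */
	long swapped   = _InterlockedCompareExchange(&g_counter, 100, 11); /* returns 11, counter becomes 100 */

	printf("%ld %ld %ld %ld\n", after_inc, old_value, swapped, g_counter);
	return 0;
}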
cinsk wrote:

Unless it is genuinely too slow, I'd suggest considering pthread_mutex_t.

Simply working under a single global mutex seems like it would be good enough.
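Something like this is all I have in mind (a minimal sketch; the mutex and counter names are just for illustration, link with -lpthread):

#include <pthread.h>

static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static long g_counter = 0;

/* roughly what _InterlockedIncrement gives you, but through one global mutex */
static long locked_increment(void)
{
	long result;
	pthread_mutex_lock(&g_lock);
	result = ++g_counter;
	pthread_mutex_unlock(&g_lock);
	return result;
}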

--
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
Korean Ver: http://www.cinsk.org/cfaqs/

shkim wrote:

These days the trend in server programming is multi-core CPUs and making use of lock-free algorithms,

so I don't think it's a matter of slow versus fast; these are things you should make good use of from the start.
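For example, the usual lock-free building block is a compare-and-swap retry loop; here is a rough sketch on top of the header's _InterlockedCompareExchange (the "capped add" operation itself is made up purely to show the pattern):

/* lock-free "add, but never exceed cap", built on a CAS retry loop */
static long add_with_cap(volatile long *value, long delta, long cap)
{
	long old_val, new_val;
	do
	{
		old_val = *value;
		new_val = old_val + delta;
		if(new_val > cap)
			new_val = cap;
		/* retry if another thread modified *value in the meantime */
	}
	while(_InterlockedCompareExchange(value, new_val, old_val) != old_val);
	return new_val;
}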

grassman wrote:

It's plastered with inline assembly, so personally I'd rather not use it if I can avoid it...
It should work on IA-32, at least. On other CPUs it obviously won't.

shkim wrote:

Intrinsics are not inline assembly.
In the long run they are more like reserved words that the compiler will support.
In the header above, only GCC versions below 4 fall back to inline assembly.
Even across platforms the intrinsic keywords are almost the same; if a platform can't support one, that can't be helped.
So while inline assembly is best kept to a minimum, intrinsics are fine to use freely.

GCC's and Visual C's intrinsics use different keywords,
but anyway, with a helper header like the one above you can standardize on one of the two, which is a good thing.
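Concretely, I'm picturing a wrapper along these lines (just a sketch; <intrin.h> is the standard Visual C++ intrinsics header, and the refcount is only for illustration):

/* one set of names (the Visual C++ ones) on both compilers */
#ifdef _MSC_VER
#include <intrin.h>         /* MSVC's own intrinsics */
#else
#include "intrin_x86.h"     /* the compatibility header above */
#endif

static volatile long g_refcount = 1;

static long add_ref(void)
{
	return _InterlockedIncrement(&g_refcount);
}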

grassman wrote:

It wasn't about the meaning of intrinsics; I was voicing my displeasure at how much inline assembly is used in the code above. My work is on the embedded side, so I tend to avoid assembly whenever I can, and I overdid it.

It's true that, given the nature of code tied to atomic operations, assembly often has to be used... but in that case I personally think it's better to use the standard functions the operating system provides. If you keep programming only for the PC this won't matter, but for code that needs to be ported to other CPUs it can end up creating more work rather than less.

In any case, it looks suitable for what you want.

shkim wrote:

From what I've investigated, Linux does not seem to provide user-mode atomic functions at the operating-system level.
If you look through /usr/include there is a header with atomic functions that are usable only in kernel mode,
and I couldn't find a way to use them from a user-mode program.
Windows has this as a matter of course, and so does Mac OS X,
so I don't understand why Linux doesn't provide it. :(
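(For GCC 4.1 and later, the __sync_* built-ins that the header above maps onto are effectively compiler-provided user-mode atomics, so something like this works without any kernel support; a sketch, with names chosen just for illustration:)

static volatile long g_count = 0;

static long increment(void)
{
	/* full-barrier atomic add, entirely in user mode, no system call */
	return __sync_add_and_fetch(&g_count, 1);
}

static int try_claim(volatile long *flag)
{
	/* nonzero if this thread is the one that changed *flag from 0 to 1 */
	return __sync_bool_compare_and_swap(flag, 0, 1);
}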

bushi wrote:

I doubt something this CPU-dependent could ever be part of POSIX.
(What are CPUs without such instructions supposed to do... POSIX wasn't designed to be used while keeping track of which CPU you're on...)

At the application level I've never really felt the need for it.
More precisely, I've never had to think about being lockless at the level of individual operations.

At the kernel level it's painfully missed, which is why an API for it has been defined there for a long time.

A microkernel could of course define its own proprietary API and conventions regardless of POSIX.
The examples you mentioned all seem to fall into that category.

OTL

anfl wrote:

As the poster above said, you can think of an intrinsic as a kind of keyword.
Intrinsics are mainly used when you want operations such as SIMD, or instructions that the C language itself does not expose.

There's no reason to avoid something just because it's inline assembly,
but if you really dislike it, you can use intrinsics instead.
The two are conceptually different, but you can regard them as doing broadly similar work.
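For example, SSE is exposed through the same <xmmintrin.h> intrinsic names on both GCC and Visual C++ (a sketch; on 32-bit GCC you would build with -msse):

#include <xmmintrin.h>

/* c[i] = a[i] + b[i] for four floats with a single SSE addps */
static void add4(const float *a, const float *b, float *c)
{
	__m128 va = _mm_loadu_ps(a);
	__m128 vb = _mm_loadu_ps(b);
	_mm_storeu_ps(c, _mm_add_ps(va, vb));
}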

