smp :: broneri homebrew story

smp

2006. 6. 3. 17:43

multicore vs multiprocessor
http://forums.appleinsider.com/showthread.php?s=a4ba8a396f6bb1c5638a6dd6a4df47cd&threadid=65217
---------------------------------------------------
The terminology being used here is really the source of the confusion. Multiple processors (or multi-processors) and the related technical terms, symmetric multi-processing and asymmetric multi-processing (or -processor), were used back in the day to describe systems with multiple processors. Once multi-core processors started to ship, terminology confusion reigned.

For the most part, multi-core simply is a multi-processor technology in the same CPU package. A package consists of the CPU silicon die, the material substrate it's glued onto, the wiring/signaling system, and the housing that goes around it. All you see is the housing and the wiring pins or balls.

A multi-core processor is simply the implementation of a multi-processor system in the same CPU packaging. There are two major variations of this:

1. A multiple processor system can exist on one piece of silicon, including the cores and a communication bus between the cores. Athlon 64 X2, FX, Opteron; Intel Core Duo, Core 2 Duo; IBM PPC 970mp, Power4, Power5 (cores) are all examples of this.

2. A multiple processor system can exist in the same CPU packaging. Two discrete CPU dies (dice, whatever) are placed inside the CPU housing, typically connected by the system's processor bus. Intel Paxville, Clovertown, Kentsfield, IBM Power4, Power5 (the MCM) are all examples of this.

Today, the term "multiple processor system" is increasingly reserved for systems with multiple CPU sockets.

June, 2006 Update: Multithreading, multicores and multiprocessors
http://www.embedded.com/showArticle.jhtml?articleID=173400008

Highly recommended
http://www.embedded.com/showArticle.jhtml?articleID=183702075

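A CPU with its own L1 cache attached forms a core, and several cores sharing an L2 cache can be grouped into one processor.
When threads run on separate CPUs and write shared data, every write goes out to system memory (ping-pong).
In a layout like the one below, where each core has its own L1 and L2 cache, a write from any core on either processor can ping-pong and degrade performance.
Processor1        Processor2
Core0   Core1     Core0   Core1
CPU     CPU       CPU     CPU
L1      L1        L1      L1
L2      L2        L2      L2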
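
A rough way to see the ping-pong effect is to count how often a cache line has to change hands. The sketch below is a deliberately simplified model, not a real coherency protocol: it assumes a 64-byte line and that a core must own a line exclusively before writing it. The function name and the (core, address) write list are made up for illustration.

```python
def count_line_transfers(writes, line_size=64):
    """Count how often a cache line moves from one core to another.

    writes is a list of (core, address) pairs in program order.
    Simplifying assumption: a write needs exclusive ownership of the
    whole line, so a write by a different core than the previous
    writer of that line counts as one transfer.
    """
    owner = {}       # cache line number -> core that wrote it last
    transfers = 0
    for core, address in writes:
        line = address // line_size
        previous = owner.get(line)
        if previous is not None and previous != core:
            transfers += 1
        owner[line] = core
    return transfers


# Two cores alternately writing addresses 0 and 32 (same 64-byte line).
same_line = [(0, 0), (1, 32)] * 3
# The same pattern with the second address moved to the next line.
separate_lines = [(0, 0), (1, 64)] * 3
```

In this model the first pattern moves the line on every write after the first, while the second pattern never moves a line at all.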

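Programming Model
Functional Decomposition
Each functional operation is handled by its own thread.
Because the functions share data held in system memory,
the same data ends up in the cache of each core involved.
In that case ping-ponging can occur whenever two threads read and write the same data.
It is therefore better for the two threads to take turns on successive units of work:
thread 1 processes the current unit of data while thread 2 processes the unit
that thread 1 finished before it.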
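
The turn-taking arrangement can be sketched as a two-stage pipeline. This is only an illustration of the structure: the stage functions (squaring, then summing) and the name run_pipeline are invented here, and under CPython's global interpreter lock the threads do not show the cache behaviour discussed in these notes.

```python
import queue
import threading


def run_pipeline(batches):
    """Stage one squares a batch; stage two sums the batch handed over earlier."""
    handoff = queue.Queue(maxsize=1)   # at most one finished batch waits here
    totals = []

    def stage_one():
        for batch in batches:
            handoff.put([x * x for x in batch])
        handoff.put(None)              # sentinel: no more batches

    def stage_two():
        while True:
            squared = handoff.get()
            if squared is None:
                break
            totals.append(sum(squared))

    workers = [threading.Thread(target=stage_one),
               threading.Thread(target=stage_two)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return totals


pipeline_totals = run_pipeline([[1, 2], [3, 4], [5]])   # [5, 25, 25]
```

Each batch is touched by only one stage at a time, so the two threads never write the same unit of data concurrently.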

If the two cores have private L1 caches and a shared L2 cache, things are somewhat better,
but keeping the cores' L1 caches coherent still carries overhead.

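Domain Decomposition
Also called data decomposition.
Work is divided among threads by the chunk of data to be processed.
Each thread calls the same function, but on different data.
The caches of the cores therefore hold different data, so there is nothing to ping-pong.
This is the better way to write an application.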
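
A minimal sketch of domain decomposition, assuming a sum-of-squares workload chosen purely for illustration: every worker runs the same function on its own slice and returns a private partial result. The helper names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, parts):
    """Split data into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """The same function for every thread; only the data differs."""
    return sum(x * x for x in chunk)


def sum_of_squares(data, workers=4):
    chunks = split_into_chunks(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    return sum(partials)   # combined once, after all workers have finished
```

No worker writes to data that another worker reads, which is the property the model relies on.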

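If the two cores have private L1 caches and a shared L2 cache, performance suffers
slightly when the two threads access data that is written to the same cache-line
location in the L2 cache. Such conflicts arise when the threads copy data with the
same structure and layout. With a separate L2 cache per core, these conflicts would not occur.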

False Sharing
False sharing means that different data items used by two different threads occupy the same cache line.

The selection of a programming model looks at cases where the designer knows that data is shared between threads. False sharing results when separate data items that are accessed by separate threads are allocated to the same cache line.

Since a data access causes an entire cache line to be read into cache from System Memory, if one data item in the cache line is shared, all of the data items in that cache line will be treated as shared by the cache subsystem.

Two data items could be updated in unrelated transactions by two threads running on different cores, but if the two items are in the same cache line, the cache subsystem will have to update the System Memory in order to maintain cache coherency, setting up a condition where pingponging can occur.

An array of structures could be used to organize data that will be accessed by multiple threads. Each thread could access one structure from the array following the domain decomposition model. By following this model, data sharing between threads will not occur and the system avoids the performance impact of maintaining consistency between the caches of each thread’s core unless the structures used by the threads are in the same cache line.

If the cache line size is 64 bytes and each structure in the array is 32 bytes, two structures will occupy one cache line. If the two threads accessing the two structures are running on different cores, an update to one structure in the cache line will force the entire cache line to be written to System Memory. It will also invalidate the cache line in the cache on the second core.

The next time the second structure is accessed, it will have to be read from System Memory. If a sequence of updates is done to the structures, the performance of the system can seriously degrade.
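
The 64-byte line and 32-byte structure case can be checked with a little arithmetic. The sketch assumes the array starts on a cache line boundary; the function names are made up for illustration.

```python
def shared_line_pairs(count, struct_size, line_size=64):
    """Adjacent array elements whose bytes fall in a common cache line.

    Assumes element 0 starts exactly on a cache line boundary.
    """
    pairs = []
    for i in range(count - 1):
        last_line_of_i = (i * struct_size + struct_size - 1) // line_size
        first_line_of_next = ((i + 1) * struct_size) // line_size
        if last_line_of_i == first_line_of_next:
            pairs.append((i, i + 1))
    return pairs


def padded_size(struct_size, line_size=64):
    """Smallest multiple of line_size that holds one structure."""
    return -(-struct_size // line_size) * line_size


print(shared_line_pairs(4, 32))               # [(0, 1), (2, 3)]
print(shared_line_pairs(4, padded_size(32)))  # []
print(padded_size(32) - 32)                   # 32 bytes left unused per element
```

With 32-byte elements, structures 0 and 1 share a line, as do 2 and 3. Rounding the stride up to 64 bytes removes the overlap at the cost of 32 unused bytes per element.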

One technique to avoid false sharing is to align data items on cache line boundaries using compiler alignment directives. However, over-use of this technique can result in cache lines that are only partially used.

True sharing refers to cases where the sharing of data between threads is intended by the software designer. An example would be the structures associated with locking primitives such as semaphores. The locks may be required to synchronize access to a data area but the designer can minimize the potential impact to system performance by ensuring that a minimal number of threads is used to access the data area. In some cases, it may make sense to create copies of the data that can be operated on by several threads and then fed back to a single thread that updates the shared data area.
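
The "private copies fed back to a single thread" idea might look like the following sketch. Word counting is just an assumed workload, and count_words is a hypothetical name: each worker fills its own Counter, and only the calling thread touches the shared result.

```python
import threading
from collections import Counter


def count_words(chunks):
    """Count words with one private Counter per worker, merged by one thread."""
    local_results = [None] * len(chunks)   # one slot per worker

    def worker(slot, words):
        local = Counter()                  # private copy, never shared
        for word in words:
            local[word] += 1
        local_results[slot] = local        # each worker writes only its own slot

    threads = [threading.Thread(target=worker, args=(slot, words))
               for slot, words in enumerate(chunks)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    shared = Counter()                     # updated by the calling thread only
    for local in local_results:
        shared.update(local)
    return shared
```

Because the shared Counter has a single writer, no lock is needed around it; the cost is the extra memory for the per-worker copies.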

SMP
---------------------------------------------------------------
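SMP (Symmetric MultiProcessor) is a form of parallel processing in which several CPUs
share memory; it is a kind of UMA (uniform memory access) architecture.
An OS with SMP support normally assigns threads or tasks to the individual CPUs
so that the CPUs are used as fully as possible.
Every CPU can use the shared resources (memory, IP blocks) symmetrically. Because a single OS normally drives all the CPUs in an SMP system, the CPU design is homogeneous.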
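
On Linux, the set of CPUs the scheduler may place a process on is visible from user space. The sketch below uses os.sched_getaffinity and os.sched_setaffinity, which Python provides only on some Unix platforms, so it falls back to os.cpu_count elsewhere; the wrapper names are hypothetical.

```python
import os


def usable_cpus():
    """CPU numbers this process may currently be scheduled on."""
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))   # 0 means the calling process
    # Fallback for platforms without the affinity calls.
    return list(range(os.cpu_count() or 1))


def pin_to(cpu):
    """Try to restrict this process to one CPU; return True if it took effect."""
    if not hasattr(os, "sched_setaffinity"):
        return False
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0) == {cpu}


print(usable_cpus())
```

Left alone, an SMP scheduler is free to run the process on any CPU in that set; pinning narrows the set to one.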

04BusBasedSMP.ppt
http://www.lrr.in.tum.de/~gerndt/home/Teaching/SS2006/Hochleistungsarchitekturen/04BusBasedSMP.pdf

Shared Memory Multiprocessors Bus Based Shared Memory Private
http://www.iro.umontreal.ca/~aboulham/F3380/new-l17x4.pdf

Multiprocessor Operating Systems
http://www.phptr.com/articles/article.asp?p=26027&seqNum=1&rl=1

SMP Issue (highly recommended)
http://www.cse.unsw.edu.au/~cs9242/02/lectures/10-smp.pdf

Performance and Implementation Complexity in Multiprocessor Operating System Kernels
http://www.bth.se/fou/forskinfo.nsf/7172434ef4f6e8bcc1256f5f00488045/7b3f17aab99e3c2fc125709d0045a969/$FILE/Kagstrom_lic.pdf

Design and Benchmarking of Real-Time Multiprocessor Operating ...
http://www.mrtc.mdh.se/publications/0576.pdf

SMP RTOS
----------------------------------------------------------------
Multiprocessing with real-time operating systems
http://www.embedded.com/showArticle.jhtml?articleID=10700141#1

Putting Multicore Processing in Context
http://www.embedded.com/showArticle.jhtml?articleID=175802474
Dealing with hardware and OS issues
http://www.embedded.com/showArticle.jhtml?articleID=181401320

Priority Inheritance Spin Lock (for Fine grained Spin lock)
http://ieeexplore.ieee.org/iel3/3753/10958/00508963.pdf?arnumber=508963#search=%22A%20Prioritized%20Multiprocessor%20Spin%20Lock%22

Putting multicore processing in context for your power designs
http://www.powermanagementdesignline.com/howto/175802954

Operating systems with SMP support
---------------------------------------------------------------
Windows NT
Linux 2.6

Intel MPS (MultiProcessor Specification)
---------------------------------------------------------------
Bringing SMP to Your UP Operating System
http://www.cheesecake.org/sac/smp.html

How to adapt traditional RTOSes to symmetric multiprocessing
http://www.embedded.com/showArticle.jhtml?articleID=174913768

User Level Spin lock
http://codeproject.com/threads/spinlocks.asp

Windows Driver synchronization
http://islab.hufs.ac.kr/main/sub/seminar_driver/Win2k_driver_ch_5.ppt

Multiprocessor configuration on Hammer processors
http://www.technoa.co.kr/content/View.asp?pPageID=38629

AMD's New Designs on Software: NUMA
http://www.devx.com/amd/Article/31802

FreeBSD
---------------------------------------------------------------
http://www.freebsd.org/smp/

linux SMP
---------------------------------------------------------------
Current state of Linux 2.6
Supports Intel MPS (MultiProcessor Specification)
Supports ARM MPCore
Supports Sparc
Supports PowerPC
Supports Alpha

An Implementation Of Multiprocessor Linux
http://suparum.rz.uni-mannheim.de/Linux/smp/smp.html

Linux SMP HOWTO
http://www.linux.org/docs/ldp/howto/Parallel-Processing-HOWTO.html
http://www.linux.org/docs/ldp/howto/SMP-HOWTO.html

Linux Parallel Processing HOWTO
http://yara.ecn.purdue.edu/~pplinux/PPHOWTO/pphowto.html

Linux Parallel Processing Using SMP
http://yara.ecn.purdue.edu/~pplinux/ppsmp.html

The Linux Symmetrical Multiprocessing (SMP) Model
http://www.ibmpressbooks.com/articles/article.asp?p=389712&seqNum=6&rl=1

Moving to SMP
http://www.linuxjournal.com/article/3515

Kernel Locking Techniques
http://www.linuxjournal.com/article/5833

The Linux Scheduler
http://www.linuxjournal.com/article/3910

Understanding Caching
http://www.linuxjournal.com/article/7105

CPU Affinity
http://www.linuxjournal.com/article/6799

Shielded Processors: Guaranteeing Sub-millisecond Response in Standard Linux
http://linuxdevices.com/articles/AT8610061752.html

Take charge of processor affinity
http://www-128.ibm.com/developerworks/linux/library/l-affinity.html?ca=dgr-lnxw09Affinity

Linux "processor affinity" explained
http://linuxdevices.com/news/NS7543786562.html

Improved Linux* SMP Scaling: User-directed Processor Affinity
http://www.intel.com/cd/ids/developer/asmo-na/eng/188935.htm?page=4

interrupt
http://blog.naver.com/jinkaien?Redirect=Log&logNo=140026060442

spin lock
http://dblab.kmu.ac.kr/%7Egbkwon/study/EmbeddedKernel-KLDP/start.kerenl.lockkernel.html

spin lock, semaphore, global interrupt disabling
http://220.70.2.32/Common/Board/Files/Kenel%20Sync.ppt

Linux Locking (recommended)
http://www2.ustc.edu.cn/%7Ejames/em2005/5.pdf

Memory Ordering in Modern Microprocessors Paul E. McKenney Draft of 2006/03/13 10:47
http://www.rdrop.com/users/paulmck/scalability/paper/ordering.2006.03.13a.pdf

Chapter 3: Tuning CPU resources
http://osr5doc.sco.com:1997/cgi-bin/printchapter/PERFORM/BOOKCHAPTER-3.html

Scaling dcache with RCU
http://www.linuxjournal.com/node/7124/print

chap07-Multiprocessor.ppt
http://dmclab.hanyang.ac.kr/files/courseware/graduate/AdvancedOS/Spring2005/notes/chap07-Multiprocessor.pdf#search=%22Spin%20Lock%20Granularity%22

Synchronization and Multiprocessors
http://ssrnet.snu.ac.kr/course/os2004-1/chap7.ppt

Locking in OS Kernels for SMP Systems (highly recommended)
http://kbs.cs.tu-berlin.de/teaching/ws2005/htos/papers/smp_locking.pdf#search=%22SMP%20lock%20granularity%22

ARM11, MPCore
-----------------------------------------------------------------
ARM11 MPCore Multiprocessor product overview
http://arm.convergencepromotions.com/catalog/753.htm

http://www.arm.com/documentation/
ARM11 MPCore Processor r0p2 Technical Reference Manual ( 4MB .pdf )
http://www.arm.com/pdfs/DDI0360B_arm11_mpcore_r0p2_trm.pdf

arm linux mpcore patch
http://www.arm.com/linux/linux_download.html

asymmetric multiprocessor (AMP)
----------------------------------------------------------------
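The cores run either the same OS or different OSes.
Because the CPUs differ from one another, the system is heterogeneous.
The bus configuration and each CPU's bus structure can differ, so the resources
each CPU can see are different (not symmetric).

Different OSes need a message-passing mechanism between them (this is a distributed-OS environment).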
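
One common shape for such a mechanism is a fixed-size mailbox with one sender and one receiver. The class below is a hypothetical sketch of the index discipline only: on real AMP hardware the buffer would live in shared memory and would also need memory barriers and some kind of notification (an interrupt or doorbell), none of which is modelled here.

```python
class Mailbox:
    """Single-sender, single-receiver ring buffer.

    One slot is always left empty so that head == tail means "empty"
    and can never be confused with "full".
    """

    def __init__(self, slots):
        self._buf = [None] * slots
        self._head = 0   # next slot to read; advanced only by the receiver
        self._tail = 0   # next slot to write; advanced only by the sender

    def send(self, message):
        """Store a message; return False if the mailbox is full."""
        next_tail = (self._tail + 1) % len(self._buf)
        if next_tail == self._head:
            return False
        self._buf[self._tail] = message
        self._tail = next_tail
        return True

    def receive(self):
        """Return the oldest message, or None if the mailbox is empty."""
        if self._head == self._tail:
            return None
        message = self._buf[self._head]
        self._head = (self._head + 1) % len(self._buf)
        return message
```

Each index has exactly one writer, which is why this layout is often used when the two sides cannot share a lock.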

How to make your asymmetric multiprocessor design OS and CPU independent
http://www.embedded.com/showArticle.jhtml?articleID=175007215
