Intel details future Larrabee graphics chip

  #11  
Old 08-05-2008, 01:19 PM
Default Re: Intel details future Larrabee graphics chip

Wilco Dijkstra wrote:
> "Chris M. Thomasson" wrote in message news:kXPlk.7164$QX3.5075-at-newsfe02.iad...
>> "NV55" wrote in message


>>> Larrabee will be a stand-alone chip, meaning it will be very different
>>> than the low-end--but widely used--integrated graphics that Intel now
>>> offers as part of the silicon that accompanies its processors. And
>>> Larrabee will be based on the universal Intel x86 architecture.

>> [...]
>>
>> Are they saying that programming this chip will be easier than programming a GPU because it honors the well
>> established x86 arch?

>
> That's rubbish indeed. The cache coherency seems to be the only advantage
> as other GPU also support C.


The real advantage has been lost in the Page Ranking: Larrabee doesn't just
support C, it supports pthreads (and thus any other concurrency model
that can be built on pthreads). MIMD + cache coherence + x86 is a
significant advantage over CUDA (which I would describe as "C, but not
as we know it").

I noticed recently that Cilk++, TBB, Fortress, and X10 are all using
work-stealing rather than static partitioning. AFAIK MIMD is a
prerequisite for work-stealing, so many of the future parallel
programming languages may not be able to run on conventional GPUs at all.
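
To make the static-partitioning versus work-stealing contrast concrete, here is a minimal single-threaded simulation. It is only a sketch, not how Cilk++ or TBB are implemented: the function names, the "steal from the longest queue" victim choice, and the task costs are all assumptions made for illustration (real schedulers use concurrent deques and usually pick victims at random).

```python
from collections import deque


def static_makespan(costs, workers):
    """Static partitioning: task i is pinned to worker i % workers."""
    loads = [0] * workers
    for i, cost in enumerate(costs):
        loads[i % workers] += cost
    return max(loads)


def stealing_makespan(costs, workers):
    """Discrete-time simulation of work-stealing.

    Queues are seeded exactly as in the static case. A worker with an
    empty queue steals from the opposite end of the longest queue.
    """
    queues = [deque() for _ in range(workers)]
    for i, cost in enumerate(costs):
        queues[i % workers].append(cost)

    remaining = [0] * workers  # ticks left on each worker's current task
    clock = 0
    while any(queues) or any(remaining):
        for w in range(workers):
            if remaining[w] == 0:
                if queues[w]:
                    # Own work is taken from the newest end.
                    remaining[w] = queues[w].pop()
                else:
                    # Hypothetical victim policy: the longest queue.
                    victim = max(range(workers), key=lambda v: len(queues[v]))
                    if queues[victim]:
                        # Stolen work is taken from the oldest end.
                        remaining[w] = queues[victim].popleft()
        for w in range(workers):
            if remaining[w]:
                remaining[w] -= 1
        clock += 1
    return clock


# Uneven workload: every fourth task is eight times more expensive.
costs = [8, 1, 1, 1] * 4
print(static_makespan(costs, 4))    # 32
print(stealing_makespan(costs, 4))  # 12
```

Each simulated worker decides for itself what to run next, which is the part that needs independent instruction streams.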

Wes Felter - wesley-at-felter.org
  #12  
Old 08-05-2008, 01:40 PM
Default Re: Intel details future Larrabee graphics chip


In article <48987d7b$1@kcnews01>, Wes Felter writes:
|>
|> The real advantage has been lost in the Page Ranking: Larrabee doesn't just
|> support C, it supports pthreads (and thus any other concurrency model
|> that can be built on pthreads).

Unfortunately, the very concept of supporting C and pthreads is
ill-formed. The standards are so grossly inconsistent that God
alone knows what they mean. I know for a certainty that nobody
who worked on them does.

The reason that pthreads causes only as much problem as it does
is that users don't use pthreads as such for high-communication
applications, and so the incidence of failing race conditions and
exposed inconsistencies is low. That applies EVEN to codes written
solely for the x86!

If users start using Larrabee or Niagara etc. for high-communication
applications, and use pthreads, all that will change.

|> I noticed recently that Cilk++, TBB, Fortress, and X10 are all using
|> work-stealing rather than static partitioning. AFAIK MIMD is a
|> prerequisite for work-stealing, so many of the future parallel
|> programming languages may not be able to run on conventional GPUs
|> at all.

I notice your implication that those have a future - well, we can
agree that they don't have a past :-)

More seriously, I agree with you, whether it is those languages or
others. SIMD has been proven to be a massively successful model,
for a restricted set of problems. And attempts to extend it to a
very much wider range of problems have failed, over a period of 30+
years. I teach that you should always look at SIMD first, and use
it if at all possible, but don't be surprised if it isn't.


Regards,
Nick Maclaren.
  #13  
Old 08-05-2008, 02:20 PM
Default Re: Intel details future Larrabee graphics chip

Nick Maclaren wrote:
> In article <48987d7b$1@kcnews01>, Wes Felter writes:
> |>
> |> The real advantage has been lost in the Page Ranking: Larrabee doesn't just
> |> support C, it supports pthreads (and thus any other concurrency model
> |> that can be built on pthreads).
>
> Unfortunately, the very concept of supporting C and pthreads is
> ill-formed. The standards are so grossly inconsistent that God
> alone knows what they mean. I know for a certainty that nobody
> who worked on them does.


According to the nice white paper Intel published, they've already
extended pthreads:

http://softwarecommunity.intel.com/U...e_manycore.pdf

"We have extended the API to also allow developers to specify thread
affinity with a particular HW thread or core."

and then they go on to say:

"Although P-threads is a powerful thread programming API, its
thread creation and thread switching costs may be too high for
some application threading. To amortize such costs, Larrabee
Native provides a task scheduling API based on a light weight
distributed task stealing scheduler [Blumofe et al. 1996]. A
production implementation of such a task programming API can
be found in Intel Thread Building Blocks"
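
The amortization argument in that quote can be sketched in a few lines. This is an illustration only and not the Larrabee Native API: the function names are made up, and Python's `ThreadPoolExecutor` feeds its workers from one shared queue rather than a distributed task-stealing scheduler. It shows just the "create the threads once, then push many small tasks through them" part.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def square(n):
    return n * n


def thread_per_task(values):
    """Create and join one thread for every task (creation cost paid per task)."""
    results = [None] * len(values)

    def run(i, v):
        results[i] = square(v)

    threads = [threading.Thread(target=run, args=(i, v))
               for i, v in enumerate(values)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def pooled_tasks(values, workers=4):
    """Create `workers` threads once and feed all tasks through them."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(square, values))


values = list(range(50))
print(thread_per_task(values) == pooled_tasks(values))  # True
```

Both versions compute the same results; the pooled one creates 4 threads instead of 50.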

The key missing item, at least to me, was a specification of the double
vs single precision performance. On the original Cell, double ran at 1/8
the speed of float, but it seems like more recent versions are fixing
this, to the point where you get about 50% of the throughput.

This is an important point for people (like me) who would like to have a
TFlop or so available in single chip and then gang up a cluster of them
to run serious simulation tasks.
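
As a rough worked example of why the ratio matters for cluster sizing: the sketch below assumes 1 TFlop single precision per chip (the "TFlop or so" above), a hypothetical 10 TFlop double-precision target, and perfect scaling with no communication losses. None of these are published Larrabee numbers.

```python
import math


def chips_needed(target_dp_tflops, sp_tflops_per_chip, dp_ratio):
    """Chips required to reach a double-precision target.

    dp_ratio is double-precision throughput as a fraction of single
    precision. Communication and scaling losses are ignored.
    """
    dp_per_chip = sp_tflops_per_chip * dp_ratio
    return math.ceil(target_dp_tflops / dp_per_chip)


print(chips_needed(10, 1.0, 1 / 8))  # 80 chips at the original Cell's 1/8 ratio
print(chips_needed(10, 1.0, 0.5))    # 20 chips at a 50% ratio
```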

Terje

--
-
"almost all programming can be viewed as an exercise in caching"
  #14  
Old 08-05-2008, 02:52 PM
Default Re: Intel details future Larrabee graphics chip


In article ,
Terje Mathisen writes:
|>
|> > Unfortunately, the very concept of supporting C and pthreads is
|> > ill-formed. The standards are so grossly inconsistent that God
|> > alone knows what they mean. I know for a certainty that nobody
|> > who worked on them does.
|>
|> According to the nice white paper Intel published, they've already
|> extended pthreads:
|>
|> http://softwarecommunity.intel.com/U...e_manycore.pdf
|>
|> "We have extended the API to also allow developers to specify thread
|> affinity with a particular HW thread or core."

Clearly useful, but it doesn't address my points. If they had
defined a proper memory model, or sorted out the thread-safety
mess, that would be much more useful.

|> and then they go on to say:
|>
|> "Although P-threads is a powerful thread programming API, its
|> thread creation and thread switching costs may be too high for
|> some application threading. To amortize such costs, Larrabee
|> Native provides a task scheduling API based on a light weight
|> distributed task stealing scheduler [Blumofe et al. 1996]. A
|> production implementation of such a task programming API can
|> be found in Intel Thread Building Blocks"

Well, the actual specification may say something more rational;
as it stands, that is codswallop. Because there is so much state
in C and a pthread, you can't quiesce one section of code and start
another without doing it at the thread level.

|> The key missing item, at least to me, was a specification of the double
|> vs single precision performance. On the original Cell, double ran at 1/8
|> the speed of float, but it seems like more recent versions are fixing
|> this, to the point where you get about 50% of the throughput.

A key point compared with the chip being unprogrammable?

Yes, it's important, but let's see if it is possible to program the
thing and get reliable results even with integers! And that is so
far unproven. Remember the Itanic?


Regards,
Nick Maclaren.
  #15  
Old 08-05-2008, 04:38 PM
Default Re: Intel details future Larrabee graphics chip

On Tue, 05 Aug 2008 08:24:04 -0700, John Larkin
wrote:

>On Tue, 5 Aug 2008 13:30:52 +0200, "Skybuck Flying"
> wrote:
>
>>As the number of cores goes up the watt requirements goes up too ?

>
>Not necessarily, if the technology progresses and the clock rates are
>kept reasonable. And one can always throttle down the CPUs that aren't
>busy.
>
>>
>>Will we need a zillion watts of power soon ?
>>
>>Bye,
>> Skybuck.
>>

>
>I saw suggestions of something like 60 cores, 240 threads in the
>reasonable future.
>


Oops, 4 threads per core is 320 threads.

My XP is currently running 33 processes and maybe a couple dozen
device drivers.

John

  #16  
Old 08-05-2008, 04:42 PM
Default Re: Intel details future Larrabee graphics chip

"Terje Mathisen" wrote in message
news:XZWdnTqNEJhJFgXVnZ2dnUVZ8sDinZ2d-at-giganews.com ...
> Nick Maclaren wrote:
>> In article <48987d7b$1@kcnews01>, Wes Felter writes:
>> |>
>> |> The real advantage has been lost in the Page Ranking: Larrabee doesn't just
>> |> support C, it supports pthreads (and thus any other concurrency model
>> |> that can be built on pthreads).
>>
>> Unfortunately, the very concept of supporting C and pthreads is
>> ill-formed. The standards are so grossly inconsistent that God
>> alone knows what they mean. I know for a certainty that nobody
>> who worked on them does.

>
> According to the nice white paper Intel published, they've already
> extended pthreads:
>
> http://softwarecommunity.intel.com/U...e_manycore.pdf
>
> "We have extended the API to also allow developers to specify thread
> affinity with a particular HW thread or core."
>
> and then they go on to say:
>
> "Although P-threads is a powerful thread programming API, its
> thread creation and thread switching costs may be too high for
> some application threading. To amortize such costs, Larrabee
> Native provides a task scheduling API based on a light weight
> distributed task stealing scheduler [Blumofe et al. 1996]. A
> production implementation of such a task programming API can
> be found in Intel Thread Building Blocks"


FWIW, last time I checked, there was a very nasty race-condition in the TBB
"scheduler":

http://groups.google.com/group/comp....e96ade96038553
(read all...)


Also, there is a much better work-stealing algorithm out there:

http://research.sun.com/scalable/pub...rkstealing.pdf

http://groups.google.com/group/comp....d297f61b369a41

However, knowing Sun, it probably has a patent application...




> The key missing item, at least to me, was a specification of the double vs
> single precision performance. On the original Cell, double ran at 1/8 the
> speed of float, but it seems like more recent versions are fixing this, to
> the point where you get about 50% of the throughput.
>
> This is an important point for people (like me) who would like to have a
> TFlop or so available in single chip and then gang up a cluster of them to
> run serious simulation tasks.


  #17  
Old 08-05-2008, 04:54 PM
Default Re: Intel details future Larrabee graphics chip

"John Larkin" wrote in message
news:rtrg9458spr43ss941mq9p040b2lp6hbgg-at-4ax.com...
> On Tue, 5 Aug 2008 13:30:52 +0200, "Skybuck Flying"
> wrote:
>
>>As the number of cores goes up the watt requirements goes up too ?

>
> Not necessarily, if the technology progresses and the clock rates are
> kept reasonable. And one can always throttle down the CPUs that aren't
> busy.
>
>>
>>Will we need a zillion watts of power soon ?
>>
>>Bye,
>> Skybuck.
>>

>
> I saw suggestions of something like 60 cores, 240 threads in the
> reasonable future.


I can see it now... A mega-core GPU chip that can dedicate 1 core per-pixel.

lol.




> This has got to affect OS design.


They need to completely rethink their multi-threaded synchronization
algorithms. I have a feeling that efficient distributed non-blocking
algorithms, which are comfortable running under a very weak cache coherency
model, will be all the rage. Getting rid of atomic RMW or StoreLoad style
memory barriers is the first step.

  #18  
Old 08-05-2008, 04:57 PM
Default Re: Intel details future Larrabee graphics chip

Chris M. Thomasson wrote:
> "John Larkin" wrote in
> message news:rtrg9458spr43ss941mq9p040b2lp6hbgg-at-4ax.com...
>> On Tue, 5 Aug 2008 13:30:52 +0200, "Skybuck Flying"
>> wrote:
>>
>>> As the number of cores goes up the watt requirements goes up too ?

>>
>> Not necessarily, if the technology progresses and the clock rates are
>> kept reasonable. And one can always throttle down the CPUs that aren't
>> busy.
>>
>>>
>>> Will we need a zillion watts of power soon ?
>>>
>>> Bye,
>>> Skybuck.
>>>

>>
>> I saw suggestions of something like 60 cores, 240 threads in the
>> reasonable future.

>
> I can see it now... A mega-core GPU chip that can dedicate 1 core
> per-pixel.


Why not?
Probably configured as a systolic array
http://en.wikipedia.org/wiki/Systolic_array


--
Dirk

http://www.transcendence.me.uk/ - Transcendence UK
http://www.theconsensus.org/ - A UK political party
http://www.onetribe.me.uk/wordpress/?cat=5 - Our podcasts on weird stuff
  #19  
Old 08-05-2008, 08:13 PM
Default Re: Intel details future Larrabee graphics chip

"Dirk Bruere at NeoPax" wrote in message
news:6fqv72Fcv806U1-at-mid.individual.net...
> Skybuck Flying wrote:
>> As the number of cores goes up the watt requirements goes up too ?
>>
>> Will we need a zillion watts of power soon ?
>>
>> Bye,
>> Skybuck.

>
> Since the ATI Radeon™ HD 4800 series has 800 cores you work it out.


Just note that the 4870 needs TWO of those 6 pin power leads...

Rarius


  #20  
Old 08-06-2008, 11:26 AM
Default Re: Intel details future Larrabee graphics chip

Nick Maclaren wrote:
> In article ,
> Terje Mathisen writes:
> |> The key missing item, at least to me, was a specification of the double
> |> vs single precision performance. On the original Cell, double ran at 1/8
> |> the speed of float, but it seems like more recent versions are fixing
> |> this, to the point where you get about 50% of the throughput.
>
> A key point compared with the chip being unprogrammable?
>
> Yes, it's important, but let's see if it is possible to program the
> thing and get reliable results even with integers! And that is so
> far unproven. Remember the Itanic?


I'm very confident that the chip will actually work, and give useful,
repeatable results, but I don't expect things like fast (or even any?)
denormal handling except flush to zero.
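
For anyone wondering what flush to zero costs numerically, here is a small software emulation. It is a sketch only: `flush_to_zero` is a made-up helper that imitates the hardware mode on IEEE 754 doubles, and it says nothing about what Larrabee actually does.

```python
import math
import sys

# Smallest positive normal double (2**-1022 on IEEE 754 platforms).
SMALLEST_NORMAL = sys.float_info.min


def flush_to_zero(x):
    """Emulate FTZ: replace a subnormal value with a zero of the same sign."""
    if x != 0.0 and abs(x) < SMALLEST_NORMAL:
        return math.copysign(0.0, x)
    return x


x = 1.5 * SMALLEST_NORMAL
y = SMALLEST_NORMAL

# With gradual underflow, x != y guarantees x - y != 0.
print(x != y, x - y != 0.0)              # True True
# With flush to zero, the subnormal difference collapses to 0.0.
print(flush_to_zero(x - y) == 0.0)       # True
```

So code that relies on "x != y implies x - y != 0" (for example, to guard a division) can break under flush to zero.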

Terje

--
-
"almost all programming can be viewed as an exercise in caching"