|
#11
|
| On Aug 26, 12:56 pm, Jon Elson wrote: > John Larkin wrote: > > You can do coarse delays by counting at some modest clock frequency, > > and get fine delays from a fast-clocked shift register or a simple > > external analog vernier. The analog thing can take you down to > > picosecond resolution. > > > You can also double resolution by using both clock edges. > > > But can you tolerate the 1-clock p-p jitter that you'll get from > > asynchronous trigger inputs slamming into a continuous clock? > > I didn't think so, that's why I designed a hideous analog delay circuit, > much like the no-longer-available-at-a-sane-price AD9501. (A current > source, > integrating cap, comparator and DAC) I ended up with 1200 components on > one board for 64 of these delay circuits. And, it uses the difficult to > mount AD CMP603 in the 3 mm square CSP that gave me FITS getting a > couple boards working. Many, many, many shorts and opens! > > But, aparently, 2 ns of jitter is NOT a problem. Only the initial delay > will suffer the jitter, the width of the second timer will always be > synched to the clock, and so the width won't vary. That was the more > critical part of it. > > I'm still researching how you do this with the DDR feature. > > Jon Hello Jon, The Virtex 4 family has input and output SERDES on their IOs. These are not the MGTs of the FX version, and are on all versions of the Virtex 4s. The SERDES can be used with the DDR registers in the IOB to get even faster performance from them. The SERDES can also be used in pairs to get a larger parallel to serial ratio. What you could do for your application is to use a pair of SERDES to get a 10:1 parallel to serial ratio, and combine that with the DDR registers to get the serial IO running at twice your high speed clock. If you have a high speed clock of, say, 400 MHz, your serial IO would be running at 800 MHz, and the parallel data path would be running at only 40 MHz. 
For the output, run your counters at 40 MHz, and scale them to count down by 10. When they are less than ten, use the LSBs to calculate the 10-bit data to give to the output SERDES to put the edge where you want it. Do the same for calculating the duration of the pulse. Since you said you want a pulse as narrow as 10 ns, you might want to use an 8:1 ratio to get rid of the case of two edges within one parallel word. Or run faster with a faster speed grade to get to 500 MHz. For the input, do the inverse. I was not clear if you needed a minimum delay of 10 ns, a minimum pulse width of 10 ns, or both. If you need a minimum delay of 10 ns, you would need to run a smaller ratio to get your parallel clock fast enough. Take a look at the Virtex-4 User Guide sections on the ISERDES and OSERDES for more information about them: http://www.xilinx.com/support/docume...ides/ug070.pdf I also noted that you did not like that they came in BGAs. We have used outside rework shops to place BGAs for us with good results. If I remember correctly, it cost under $100 USD each for just a few boards. I recommend http://www.process-sciences.com/services/default.asp Regards, John McCaskill www.FasterTechnology.com |
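The count-down-by-10 scheme described in this post can be sketched as a small behavioral model. This is only an illustration, not HDL and not anyone's actual design: the function name `oserdes_words`, the choice to measure delay and width in serial bit slots, and the convention that index 0 of each word is the first bit on the wire are all assumptions (check the bit ordering in UG070 before relying on it).

```python
def oserdes_words(delay_bits, width_bits, n=10):
    """Yield one n-bit parallel word per parallel-clock tick.

    delay_bits and width_bits are measured in serial bit slots.
    Index 0 of each word is assumed to be the first bit sent.
    """
    remaining_delay = delay_bits
    remaining_width = width_bits
    while remaining_width > 0:
        if remaining_delay >= n:
            # Coarse phase: count down by one whole word, output stays low.
            remaining_delay -= n
            yield [0] * n
            continue
        # Fine phase: fewer than n slots of delay remain, so the leftover
        # count (the "LSBs") selects where the rising edge sits in this word.
        start = remaining_delay
        stop = min(n, start + remaining_width)
        yield [0] * start + [1] * (stop - start) + [0] * (n - stop)
        remaining_width -= stop - start
        remaining_delay = 0


def flatten(words):
    """Concatenate parallel words into the serial bit stream."""
    return [bit for word in words for bit in word]


if __name__ == "__main__":
    for word in oserdes_words(delay_bits=23, width_bits=8):
        print("".join(str(b) for b in word))
```

A software model like this handles a rising and a falling edge inside the same word without extra effort; in hardware that case needs its own word-generation logic, which is the case the 8:1 suggestion above is meant to avoid.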
|
#12
|
| > So, you think a 13-bit counter feeding a 13-bit identity comparator will > work at 250 MHz? Others have said it may be possible, but what they fail to acknowledge is the large amount of extra design effort and care required to get there. 250 MHz is really pushing the limits in Spartan 3E in my experience. You may have to work very hard to get there: for example, I have just finished a distributed arithmetic filter design that has only 1 LUT level between flops, and after a lot of effort I got it to run at 206 MHz in a sp3 1600e. I can see how to get to 220 MHz, but beyond that I don't know. The longest carry chain is 10 bits. I had to bypass synthesis and instantiate Xilinx primitives directly to guarantee my logic was implemented in 1 LUT level. Then I had to manually floorplan the design, placing each flop with the corresponding LUT by hand (I used RLOCs embedded in the VHDL source). The automatic placer didn't always place the LUT with the flop, so you end up with 2 routes, which kills the timing completely. > Yeah, I really don't think we can handle $2000 IC's. This isn't a real > production project, we might build 5 of them at a time, but we are still > cost-sensitive. If it's such low volumes, just take the unit cost hit and move to a Virtex part. How valuable is your time? |
|
#13
|
| On Tue, 26 Aug 2008 12:56:50 -0500, Jon Elson > > >John Larkin wrote: >> You can do coarse delays by counting at some modest clock frequency, >> and get fine delays from a fast-clocked shift register or a simple >> external analog vernier. The analog thing can take you down to >> picosecond resolution. >> >> You can also double resolution by using both clock edges. >> >> But can you tolerate the 1-clock p-p jitter that you'll get from >> asynchronous trigger inputs slamming into a continuous clock? > > >I didn't think so, that's why I designed a hideous analog delay circuit, >much like the no-longer-available-at-a-sane-price AD9501. (A current >source, >integrating cap, comparator and DAC) I ended up with 1200 components on >one board for 64 of these delay circuits. And, it uses the difficult to >mount AD CMP603 in the 3 mm square CSP that gave me FITS getting a >couple boards working. Many, many, many shorts and opens! > >But, aparently, 2 ns of jitter is NOT a problem. Only the initial delay >will suffer the jitter, the width of the second timer will always be >synched to the clock, and so the width won't vary. That was the more >critical part of it. > >I'm still researching how you do this with the DDR feature. > >Jon You might consider this trick: when a trigger comes in, start an oscillator, and use that to clock the FPGA. Then time out with counters, and use some fractional-clock trick if you need sub-clock delay or width resolution. I've done this with LC oscillators and coaxial ceramic resonator oscillators; both can be started essentially instantly and can have pretty good jitter performance. The resulting FPGA logic can be pretty simple. John |
|
#14
|
| Andrew FPGA wrote: >> So, you think a 13-bit counter feeding a 13-bit identity comparator will >> work at 250 MHz? > Others have said it may be possible but what they fail to acknowledge > is the large amount of extra design effort and care required to get > there. 250 MHz is really pushing the limits in spartan 3e in my > experience. You may have to work very hard to get there: for example I > have just finished a distributed arithmetic filter design, that has > only 1 LUT level between flops and after a lot of effort I got it to > run at 206 MHz in a sp3 1600e. I can see how to get to 220MHz, but > beyond that I don't know. The longest carry chain is 10 bits. > > I had to bypass synthesis and instantiate xilinx primitives directly > to gaurantee my logic was implemented in 1 LUT level. Then I had to > manually floorplan the design - placing each flop with the > corresponding LUT by hand( I uses RLOC's embedded in the VHDL source). > The automatic placer didn't always place the LUT with the FLOP so you > end up with 2 routes which kills the timing completely. > >> Yeah, I really don't think we can handle $2000 IC's. This isn't a real >> production project, we might build 5 of them at a time, but we are still >> cost-sensitive. > If its such low volumes just take the unit cost hit and move to a > Virtex part. How valuable is your time? Time is valuable - a good tradeoff in any cost analysis. I'm afraid his management may look at his time as a fixed cost and not consider the lost opportunity cost of allowing him to get on to new projects quicker. I had no problem running 300 MHz for a 600 Mb/s front end in the S3E (without using the DDR IOBs). The trick is breaking everything down to the 1 level of logic, but through the LUT attached to the register in its slice. I even transferred data between posedge and negedge domains (1.6ns!) with latches (rather than flops) to open up the timing budget. 
If the counter doesn't want to run full-width at 250MHz, the 2 LSbits can run at that rate and a slower counter run at a much lower rate. Or 1 LSbit for a 125MHz rate. It's all good. The high-speed techniques are for those who know how to pipeline and to rearrange algorithms to implement the "lean" logic. The programmable fixed-width counter is an excellent challenge for someone that has the brains but not (yet) the experience. The problem is well confined and the results quickly verifiable. I'd suggest any experienced FPGA designer could do this even if they haven't previously pushed the limits. The experience - to me, at least - is very valuable. For someone who hasn't worked with FPGAs (or fine-grain logic gates) the task might be too much "fresh out of the chute." I'd recommend the challenge to anyone wanting to expand their skills. - John_H |
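The split-counter idea in this post (a couple of LSBs at the full rate, the rest at a much lower rate) can be checked with a short behavioral model. This is a sketch under assumptions: the class name `SplitCounter` and the 13-bit width with 2 fast bits are chosen to match the numbers in the thread, and the model ignores the pipelining a real design would add on the enable to the slow section.

```python
class SplitCounter:
    """Counter split into a small fast section and a wider slow section.

    Only the fast section changes on every clock; the slow section
    advances once each time the fast section wraps around to zero.
    """

    def __init__(self, width=13, fast_bits=2):
        self.fast_bits = fast_bits
        self.fast_mask = (1 << fast_bits) - 1
        self.slow_mask = (1 << (width - fast_bits)) - 1
        self.fast = 0
        self.slow = 0

    def tick(self):
        """Advance by one fast-clock cycle."""
        self.fast = (self.fast + 1) & self.fast_mask
        if self.fast == 0:
            # Wrap of the fast section is the enable for the slow section.
            self.slow = (self.slow + 1) & self.slow_mask

    @property
    def value(self):
        """The full count, reassembled from the two sections."""
        return (self.slow << self.fast_bits) | self.fast

    def matches(self, target):
        """Compare in two pieces: slow bits (can be done early), fast bits."""
        slow_hit = self.slow == (target >> self.fast_bits)
        fast_hit = self.fast == (target & self.fast_mask)
        return slow_hit and fast_hit


if __name__ == "__main__":
    counter = SplitCounter()
    for _ in range(1000):
        counter.tick()
    print(counter.value, counter.matches(1000))
```

The point of the split is that the slow compare only changes once every four fast clocks, so only the 2-bit section has to meet the fast clock's timing on every cycle.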
|
#15
|
| John McCaskill wrote: > The Virtex 4 family has input and output SERDES on their IOs. These > are not the MGTs of the FX version, and are on all versions of the > Virtex 4s. The SERDES can be used with the DDR registers in the IOB to > get even faster performance from them. The SERDES can also be used in > pairs to get a larger parallel to serial ratio. SERDES are on more devices these days, and are the obvious and 'simple' way to get extended timing. If the price/package excludes those, you can use multiple phase clocks to interpolate time: generate as many phases as the device/DLLs can, and capture, then feed into a priority encoder to get a Phase Location, and then on output, a similar converse PhaseSum is used for fractional clock times. More logic, but lower clock speeds. -jg |
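The multi-phase capture and priority-encoder idea in this post can be illustrated with a toy model. Everything here is an assumption made for illustration: the function names, evenly spaced ideal phases, and in particular `output_slot`, which is only one possible reading of the "PhaseSum" step, since the post does not spell it out.

```python
def capture_phases(edge_time, period, n_phases):
    """Sample a rising edge with n_phases evenly spaced clock phases.

    edge_time is the edge position within one clock period, in the same
    units as period. Phase k samples at k * period / n_phases and reads 1
    if the edge has already arrived by then.
    """
    return [1 if k * period / n_phases >= edge_time else 0
            for k in range(n_phases)]


def priority_encode(samples):
    """Return the index of the first phase that saw the input high.

    Returns None when no phase in this period saw it, which means the
    edge will be picked up by phase 0 of the next period.
    """
    for k, sample in enumerate(samples):
        if sample:
            return k
    return None


def output_slot(capture_phase, delay_steps, n_phases):
    """Add a delay (in phase steps) to the captured phase location.

    Returns (whole_clocks, output_phase): count whole_clocks with an
    ordinary counter, then launch the edge on phase output_phase.
    """
    return divmod(capture_phase + delay_steps, n_phases)


if __name__ == "__main__":
    samples = capture_phases(edge_time=3.1, period=8.0, n_phases=4)
    phase = priority_encode(samples)
    print(samples, phase, output_slot(phase, delay_steps=9, n_phases=4))
```

With four phases the time resolution is a quarter of the clock period while every flop still runs at the base clock rate, which is the trade of more logic for lower clock speeds mentioned above.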
|
#16
|
| Andrew FPGA wrote: >>So, you think a 13-bit counter feeding a 13-bit identity comparator will >>work at 250 MHz? > > Others have said it may be possible but what they fail to acknowledge > is the large amount of extra design effort and care required to get > there. 250 MHz is really pushing the limits in spartan 3e in my > experience. You may have to work very hard to get there: for example I > have just finished a distributed arithmetic filter design, that has > only 1 LUT level between flops and after a lot of effort I got it to > run at 206 MHz in a sp3 1600e. I can see how to get to 220MHz, but > beyond that I don't know. The longest carry chain is 10 bits. > > I had to bypass synthesis and instantiate xilinx primitives directly > to gaurantee my logic was implemented in 1 LUT level. Then I had to > manually floorplan the design - placing each flop with the > corresponding LUT by hand( I uses RLOC's embedded in the VHDL source). > The automatic placer didn't always place the LUT with the FLOP so you > end up with 2 routes which kills the timing completely. Yes, with 64 instantiations of the circuit on the FPGA, I really DON'T want to deal with this! > > >>Yeah, I really don't think we can handle $2000 IC's. This isn't a real >>production project, we might build 5 of them at a time, but we are still >>cost-sensitive. > > If its such low volumes just take the unit cost hit and move to a > Virtex part. How valuable is your time? Yes, I think you are right, and I greatly appreciate the data points about the 206 MHz and the 10-bit carry. The circuit I need to implement is REALLY simple, but gets a bit more complicated when you add in the logic to handle the DDR. The SERDES components in the Virtex look like they would be ideal to handle this, and instead of only having an X2 option with DDR, this makes an X8 option nearly as simple. The part cost, all by itself, is not that terrible, the smaller Virtex chips are around $200. 
The other problem, however, is we have no capability or experience with BGAs, and would have to send them out. That at least doubles the cost! Thanks again for the info! Jon |
|
#17
|
| John McCaskill wrote: > I was not clear if you needed a minimum delay of 10ns, a minimum pulse > width or 10 ns, or both. If you need a minimum delay of 10 ns, you > would need to run a smaller ratio to get your parallel clock fast > enough. > Yes, as soon as you mentioned the SERDES in a previous message, I realized this was the best scheme for Virtex 4. I think trying to use Spartan 3E at insane clock rates would be risky, and might lead to a total collapse of the automatic tools when they try to route 64 instances of this module per chip. I don't want to have to do what Andrew ran into with manual placing x64 times. No FUN! > Take a look at the Virtex-4 users guide section on the ISERDES and > OSERDES for more information about them: > http://www.xilinx.com/support/docume...ides/ug070.pdf > > I also noted that you did not like that they came in BGAs. We have > used out side rework shops to place BGAs for us with good results. If > I remember correctly, it cost under $100 USD each for just a few > boards. I recommend http://www.process-sciences.com/services/default.asp The outfits I've seen so far seem to have a high setup charge, which makes them roughly double the component price. There likely are shops with better rates for small jobs. Jon |
|
#18
|
| John Larkin wrote: > You might consider this trick: when a trigger comes in, start an > oscillator, and use that to clock the FPGA. Then time out with > counters, and use some fractional-clock trick if you need sub-clock > delay or width resolution. > > I've done this with LC oscillators and coaxial ceramic resonator > oscillators; both can be started essentially instantly and can have > pretty good jitter performance. The resulting FPGA logic can be pretty > simple. But, we have 32 independent inputs, with no time correlation. Not enough global clock trees to handle that. Jon |
|
#19
|
| On Aug 27, 2:45 pm, Jon Elson wrote: > > Yes, I think you are right, and I greatly appreciate the data points > about the 206 MHz and the 10-bit carry. The circuit I need to implement > is REALLY simple, but gets a bit more complicated when you add in the > logic to handle the DDR. The SERDES components in the Virtex look like > they would be ideal to handle this, and instead of only having an X2 > option with DDR, this makes an X8 option nearly as simple. The part > cost, all by itself, is not that terrible, the smaller Virtex chips are > around $200. The other problem, however, is we have no capability or > experience with BGAs, and would have to send them out. That at least > doubles the cost! > > Thanks again for the info! 1) The Virtex is a good way to go; the SERDES will work for you as long as your minimum delays are met (the same is true of the shorter-delay DDR version). 2) I thought you had 32 channels. 3) One instantiation is almost the same as 64 in complexity. It's one lone module that produces the results for each instance. 4) BGAs can give growing pains, but it's the industry's current sweet spot. If not now, then a few more months down the road. I'd personally be happy to crank out a design like this in the Spartan3x series, but for a 5-up production the added speed and functionality of the Virtex is a good win. |
|
#20
|
| Jon Elson wrote: > Yes, I think you are right, and I greatly appreciate the data points > about the 206 MHz and the 10-bit carry. The circuit I need to implement > is REALLY simple, but gets a bit more complicated when you add in the > logic to handle the DDR. The SERDES components in the Virtex look like > they would be ideal to handle this, and instead of only having an X2 > option with DDR, this makes an X8 option nearly as simple. The part > cost, all by itself, is not that terrible, the smaller Virtex chips are > around $200. The other problem, however, is we have no capability or > experience with BGAs, and would have to send them out. That at least > doubles the cost! Lattice claim to have lowest cost SERDES - but I'm not sure anyone does SERDES in non-BGA packages... Could you target a low cost Eval Board - as a 'FPGA module' ? -jg |