AndAudio.com

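Posted by **JCP** on Thu Jul 03, 2003 2:00 pm

Mainly I'm curious how the Pentium-M compares with the Pentium 4 when running scientific computing software written in Fortran on Linux. I haven't had a chance to get hold of a Pentium-M machine to test myself, so I wanted to ask whether anyone has tried it.
If so, I'd first like to ask how to configure kernel compilation for the Pentium-M.
I'm only curious, though; I probably won't buy an Intel CPU again.
To get the most out of an Intel CPU you also have to pay Intel a lot of money for its compiler, which makes me regret buying a Pentium 4-M notebook.

Also, my wild guess is that the Pentium-M is optimized for Windows, or else there's a WinXP build specially optimized for the Pentium-M, because on Windows the Pentium 4 loses far too badly. But in my own use, the Pentium 4 holds up better under Linux than under Windows.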

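Posted by **RogerShih** on Thu Jul 03, 2003 2:22 pm

The K8 is almost out; there must be some benchmarks by now!! :p

As for the compiler, pay attention to the compiler and linker Intel releases alongside its CPUs. It's usually the linker: turning object code into binary machine code usually still needs tuning.

Maybe the Pentium 4-M maps x86 instructions better.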

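Posted by **lifaung** on Fri Jul 04, 2003 8:19 am

The P-M is a P6-architecture chip, so it has a pretty good chance of beating an oddball architecture like the P4.
Intel created that monster to ramp up clock speeds quickly, a different philosophy from the Athlon's.

The P6 architecture didn't survive from the Pentium Pro to today by accident: after many generations of refinement the technology is mature and very reliable~~~

Actually, on Linux you could try AMD's Athlon line. It's fast without any special optimization, which is one of its strengths; like the Alpha, it's a brute-force CPU built around the EV6 bus.

In terms of how polished a CPU design is, I quite like Intel's products from the P3 onward: L2 bandwidth went up to 256 bits, the L2 cache has now grown to 1 MB, and the TLB has been redesigned for speed and a high hit rate.

The Athlon line is good too; all the machines at my home have been replaced with Athlons.
The Athlon uses its cache far more effectively than the P-series, but its complex architecture makes it hard to raise clock speeds. Still, think about it: advanced as the architecture is, it only has a 64-bit L2 path. For getting this much out of it, AMD's design engineers deserve an award; without their efforts we might still be using a P3-800.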
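To put the L2 path widths mentioned above into numbers, here is a rough worked example of theoretical peak L2 bandwidth, with bus width $w$ in bits and transfer rate $f$. It assumes one transfer per core clock and uses an illustrative 1 GHz clock for both chips; that clock is an assumption chosen only to make the comparison concrete, not a spec figure for any particular part:

$$
B_{\text{peak}} = \frac{w}{8} \times f
$$

$$
\begin{aligned}
\text{256-bit path: } & \tfrac{256}{8}\,\text{B} \times 10^{9}\,\text{s}^{-1} = 32\,\text{GB/s} \\
\text{64-bit path: } & \tfrac{64}{8}\,\text{B} \times 10^{9}\,\text{s}^{-1} = 8\,\text{GB/s}
\end{aligned}
$$

At equal clocks the 256-bit path has four times the peak width, which is the handicap the Athlon's cache design has to make up for.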

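For web servers I'd prefer the K7, because... look carefully at the P4 architecture and you'll see the problem: it handles long runs of large packets well, but falls behind on lots of small packets.
That's why the K7, and the K8 which shares part of its architecture, beat P4-based products here.
With dual CPUs the gap would probably be even larger at similar clock speeds, because of the K7/K8 bus type.
Put crudely, one EV6 feature that many people may already have played with is "running two CPUs of different clock speeds in a dual-CPU system".
That's because a chipset like AMD's MPX is essentially a classic "two northbridges" design:
each CPU can have its own independent bandwidth and memory, which gives much more freedom in processing.
With more CPUs the P4 is likely to be limited by its memory bandwidth, whereas the K7/K8 can increase total bandwidth as the
CPU count grows, until it finally bottlenecks on the bandwidth between CPUs.

Linux doesn't fully support the P4 yet. The current WinXP architecture uses SSE2 for floating-point work.
A P4 3.2G's floating-point performance may not even match an AMD 2200+, and may be lower still.
That's what someone's benchmark results showed last time I looked; go find a program called SYSTEM MARK.
The spots where the P4 scores low in floating point show its real floating-point capability,
whereas the Athlon has just one figure... Conclusion... if it were me, I'd keep supporting my Athlon.
After all, it's cheap, not slow, versatile, and I don't like following the mainstream~~~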
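For readers wondering what "using SSE2 for floating point" means at the instruction level: SSE2 adds instructions that operate on two 64-bit doubles held in one 128-bit XMM register, instead of one value at a time on the x87 stack. The following is a minimal C sketch of packed-double addition using the standard `<emmintrin.h>` intrinsics. The function name `add_pairs` is hypothetical, and the code falls back to a plain scalar loop when the compiler is not targeting SSE2. It illustrates SSE2 arithmetic in general, not how WinXP or any particular benchmark uses it:

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for n doubles. */
void add_pairs(const double *a, const double *b, double *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Each _mm_add_pd adds two doubles with a single SSE2 instruction. */
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);   /* unaligned load of a[i], a[i+1] */
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
    }
#endif
    /* Scalar tail, or the whole loop when SSE2 is not available. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

With five elements and SSE2 enabled, the first four sums are computed two at a time and the last one by the scalar tail.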

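If you really do want to buy Intel, the P-M has more of a future: its floating point was already good (at the same clock it still loses to the Athlon, but it's probably 1.5x or more faster than the P4), and it adds SSE2 support on top. Its advantages should become clear down the road~~~~~~~~~

Otherwise, wait for a revised Athlon core, if that actually happens. If it adds SSE2 and is well designed, it'll be back in top form like the P-M. (Though the P-M's architecture can't reach high clocks either. The P4 architecture can easily pass 5 GHz at this stage, while the Athlon Barton tops out around 3 GHz; going beyond that runs into yield problems. You could say Intel probably discovered this trick long ago; Intel still shouldn't be underestimated.)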

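Posted by **JCP** on Fri Jul 04, 2003 12:20 pm

At home I have a Duron 1100, at the lab a K7-600 plus two dual-K7 machines, and when I installed Linux on them they were frighteningly fast. A couple of days ago Ace's Hardware had an article comparing how different compilers affect P4 speed on Windows. It mentioned that gcc isn't optimized for the P4 because Intel won't publish the details. The same problem applies to the P-M, since Intel is even more protective of P-M details; when the head of Lindows exposed this, it caused quite a stir. I bought a P4-M because I wanted to move to a notebook, and back then K7-M notebooks were rare. Recently I saw HP release some low-priced K7-M notebooks and regretted my choice, but it's too late.

Also, Apple has announced the G5, so I think my next machine will be either a K8 or a G5. AMD will release a mobile K8 this year, which should be exciting. Tom's Hardware has actually already shown a demo machine.

In a while I'll go look for the free Intel Fortran and C compilers for Linux and see whether they achieve Intel's claimed 30% improvement.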
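One way to check a speedup claim like Intel's 30% is to time the same numerical kernel built by each compiler. The sketch below is minimal C and assumes a DAXPY-style loop (y = a*x + y) is a fair stand-in for the real workload, which may not hold for an actual Fortran code. The names `daxpy` and `time_daxpy` are hypothetical, and timing uses the standard `clock()` from `<time.h>`:

```c
#include <stddef.h>
#include <time.h>

/* DAXPY-style kernel: y[i] = a * x[i] + y[i]. */
void daxpy(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Run the kernel `reps` times and return elapsed CPU seconds.
   Results are written through y, which the caller can read,
   so the compiler cannot discard the work. */
double time_daxpy(size_t n, double a, const double *x, double *y, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        daxpy(n, a, x, y);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Compile the same file once with gcc and once with icc (for example `gcc -O2` versus `icc -O3`), call `time_daxpy` with identical arrays and repetition counts in both builds, and compare the reported times. A single kernel says little about a whole program.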

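Posted by **starrer** on Fri Jul 04, 2003 2:40 pm

Intel C++ and Fortran compilers for Linux:
you can apply online for the educational version to try out; it has no time limit.

In our tests,
on some programs we developed ourselves,
icc was more than 50% faster than gcc,
though it depends on the program's characteristics.
I think Intel's compiler really is
very well tuned for Intel's own processors.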

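Posted by **windwalker** on Fri Jul 04, 2003 3:33 pm

Slightly off-topic.........
The reason I've never wanted to use Athlon.........
is the motherboards............ ><
Even now, nForce2 still hasn't given me the motivation to switch......
I still feel K7 motherboards lag behind boards using Intel's own MCH/ICH in stability.....
(I can't afford a Tyan 2460/2466~~~~~ T_T )

btw........ next to the computer organization course I've been studying lately......... this discussion is really interesting ^^||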

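Posted by **JCP** on Fri Jul 04, 2003 6:03 pm

Thanks, starrer. I've filled out the application and will wait for Intel's reply tomorrow.
icc beats gcc presumably because Intel hasn't disclosed the P4's details; according to the Ace's Hardware forum, the difference on the P3 shouldn't be nearly as large. That's why I feel cheated: I already overpaid for the CPU, and now I have to pay again for the compiler. I won't deal with Intel again. Whichever CPU gcc can fully optimize for is the one I'll buy.

Yesterday I saw Intel say Pentium M shipments have passed one million, but probably not many have actually reached consumers, so I still haven't had a chance to borrow a machine running Linux to test. Put another way, nobody I know has bought a Centrino notebook yet. Even if someone had, the chance of it running Linux is under 5%. Oh well, I'll revisit this in a few months.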

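Posted by **gwliao** on Sat Jul 05, 2003 7:07 pm

JCP wrote: Yesterday I saw Intel say Pentium M shipments have passed one million, but probably not many have actually reached consumers, so I still haven't had a chance to borrow a machine running Linux to test. Put another way, nobody I know has bought a Centrino notebook yet. Even if someone had, the chance of it running Linux is under 5%. Oh well, I'll revisit this in a few months.

JCP, are you using gcc 3.x?
If so, you could try the gcc3 optimization flags provided by Gentoo Linux! :bs:

Unless you've already tried them.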

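Posted by **JCP** on Sun Jul 06, 2003 9:51 am

Actually, I don't know how to use a compiler; I just use Makefiles other people wrote. Once the Intel compiler is installed I'll study how to use it properly.
The Ace's Hardware forum article says that even if you select the P4 options in gcc, they don't help much apart from the cache shift, because Intel hasn't published the details. Looking at /usr/src/arch/i386/config.in, when compiling the kernel the P3 and P4 indeed differ only in the cache shift, and even the K7 differs only in this one item. But according to Ace's Hardware, gcc does optimize for the P3 and K7.
What flags does Gentoo Linux provide? Besides -march and -mcpu, are there other flags I can try?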
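In the kernel configuration, the cache shift is the base-2 logarithm of the L1 cache line size the kernel uses for cache-line alignment, so a 32-byte line corresponds to a shift of 5 and a 64-byte line to a shift of 6. Below is a minimal C sketch that derives a shift from a line size and queries the running machine's L1 data-cache line size. `sysconf(_SC_LEVEL1_DCACHE_LINESIZE)` is a glibc extension that may report 0 when the value is unknown, and the helper names are hypothetical:

```c
#include <unistd.h>

/* log2 of a power-of-two cache line size; -1 if not a power of two. */
int cache_shift(long line_size)
{
    if (line_size <= 0 || (line_size & (line_size - 1)) != 0)
        return -1;
    int shift = 0;
    while ((1L << shift) < line_size)
        shift++;
    return shift;
}

/* L1 data-cache line size as reported by glibc, or -1 if unavailable. */
long l1_line_size(void)
{
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    long n = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    return n > 0 ? n : -1;
#else
    return -1;  /* not a glibc system: no portable way to ask */
#endif
}
```

On a system whose L1 data cache uses 64-byte lines, `cache_shift(l1_line_size())` would return 6.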

Posted by **gwliao** on Sun Jul 06, 2003 2:06 pm

JCP wrote: What flags does Gentoo Linux provide? Besides -march and -mcpu, are there other flags I can try?

Safe flags to use for gentoo-1.4
http://www.freehackers.org/gentoo/gccflags/flag_gcc3.html
Experimental flags to use for gentoo-1.4
http://www.freehackers.org/gentoo/gccflags/flag_gcc3opt.html

The P4 flags are the same as the P3's :mad:

Use gcc 3 with the flags above.

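Posted by **gwliao** on Sun Jul 06, 2003 2:10 pm

JCP wrote: Actually, I don't know how to use a compiler; I just use Makefiles other people wrote. Once the Intel compiler is installed I'll study how to use it properly.

Below is an excerpt I took from the SPEC documentation describing the Intel C compiler flags!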
----------------------------------------------------------
Description of compiler flags for Intel C++ Compiler 7.0 and 7.1
----------------------------------------------------------------
-O1 optimize for speed, but disable some optimizations which increase
code size for a small speed benefit. Includes inline expansion
except for intrinsic functions, global optimizations, string
pooling optimizations.

-O2 This is the default level of optimization.
Optimizes for speed. The -O2 option includes O1 optimizations
and in addition enables inlining of intrinsics and more speed
optimizations.

-O3 Builds on -O1 and -O2 optimizations by enabling high-level
optimization. This level does not guarantee higher performance
unless loop and memory access transformations take place. In
conjunction with -QaxK/-QxK and -QaxW/-QxW, this switch causes the
compiler to perform more aggressive data dependency analysis than
for -O2. This may result in longer compilation times.

-Oa[-] assume [do not assume] no aliasing in program

-Qax<codes> generate code specialized for processor extensions
specified by <codes> while also generating generic IA-32 code.
<codes> includes one or more of the following characters:
i Pentium Pro and Pentium II processor instructions
M MMX(TM) instructions
K streaming SIMD extensions (implies i and M above)
W Pentium 4 processor with Streaming SIMD Extensions 2
(implies i, M and K)

-Qx<codes> generate specialized code to run exclusively on processors
supporting the extensions indicated by <codes> as
described above.

-Ob{0|1|2} Controls the compiler's inline expansion.
0: disable inlining.
1: disables inlining unless -Qip or -Ob2 are specified.
2: enables inlining of any function. However, the
compiler decides which functions are inlined. This
option enables interprocedural optimizations and has
the same effect as specifying the -Qip option.

-Qip enable single-file IP optimizations
(within files, same as -Ob2)

-Qipo multi-file ip optimizations that includes:
- inline function expansion
- interprocedural constant propagation
- dead code elimination
- propagation of function characteristics
- passing arguments in registers
- loop-invariant code motion

-Qwp_ipo No longer supported by the compiler; formerly meant
interprocedural optimization with a "whole-program" assumption.

-Qprof_gen instrument program for profiling for the first phase of
two-phase profile-guided optimization

-Qprof_use Instructs the compiler to produce a profile-optimized
executable and merges available dynamic information (.dyn)
files into a pgopti.dpi file. If you perform multiple
executions of the instrumented program, -Qprof_use merges
the dynamic information files again and overwrites the
previous pgopti.dpi file.
Without any other options, the current directory is
searched for .dyn files

-Qrcd The Intel compiler uses the -Qrcd option to improve the
performance of code that requires floating-point-to-integer
conversions.

The system default floating point rounding mode is
round-to-nearest. This means that values are rounded during
floating point calculations. However, the C language requires
floating point values to be truncated when a conversion to an
integer is involved. To do this, the compiler must change the
rounding mode to truncation before each floating
point-to-integer conversion and change it back afterwards.

The -Qrcd option disables the change to truncation of the
rounding mode for all floating point calculations, including
floating point-to-integer conversions. Turning on this option
can improve performance, but floating point conversions to
integer will not conform to C semantics.

-Qunroll[n] Specifies the maximum number of times to unroll a loop. Omit n to
let the compiler decide whether to unroll and automatically choose the
maximum number of times to unroll a loop. Use n = 0 to disable the
unroller.

-GX Enables the full C++ Exception Handling unwind semantics.

-GR Enables C++ Runtime Type Information (RTTI).

-Zp{1|2|4|8|16} Specifies the strictest alignment constraint for structure and union
types as one of the following: 1, 2, 4, 8, or 16 (default) bytes.

shlW32M.lib: MicroQuill SmartHeap Library 5.0 available from
http://www.microquill.com/
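The -Qrcd description above hinges on the gap between C's truncating float-to-int conversion and the FPU's default round-to-nearest mode. This minimal sketch in plain C assumes the IEEE 754 default of round-to-nearest with ties to even, and models both conversions so the difference is visible without touching the FPU control word. The function names are hypothetical, and this is not the code icc emits:

```c
/* Hypothetical helpers contrasting two float-to-int conversions. */

/* C semantics for (long)x: truncate toward zero.
   This is the behaviour -Qrcd gives up. */
long convert_truncate(double x)
{
    return (long)x;
}

/* Model of conversion under the default IEEE 754 rounding mode,
   round-to-nearest with ties to even, written in plain C.
   Assumes x fits in a long. */
long convert_nearest_even(double x)
{
    long t = (long)x;            /* truncated part */
    double r = x - (double)t;    /* remainder: same sign as x, |r| < 1 */
    if (r > 0.5 || (r == 0.5 && (t & 1)))
        t += 1;                  /* round up, or break a tie toward even */
    else if (r < -0.5 || (r == -0.5 && (t & 1)))
        t -= 1;                  /* same for negative values */
    return t;
}
```

For example, 2.7 converts to 2 under C rules but to 3 under round-to-nearest, and 2.5 rounds to its even neighbour 2. A program that depends on truncated values can therefore change its results when built with -Qrcd.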

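Posted by **lifaung** on Sun Jul 06, 2003 4:05 pm

windwalker wrote: Slightly off-topic.........
The reason I've never wanted to use Athlon.........
is the motherboards............ ><
Even now, nForce2 still hasn't given me the motivation to switch......
I still feel K7 motherboards lag behind boards using Intel's own MCH/ICH in stability.....
(I can't afford a Tyan 2460/2466~~~~~ T_T )

btw........ next to the computer organization course I've been studying lately......... this discussion is really interesting ^^||

For certain reasons I'm not fond of NVIDIA. Although I ended up using its NF2 when I had no other choice, that was purely for the integrated graphics.
At this point, in my personal opinion, only VIA motherboards are truly usable among K7 boards.
Since the 8233A the southbridge problems have been solved, and the north/south bridge link performs reasonably well.
In real-world performance SiS keeps getting worse, while VIA has improved a little.
My only wish right now is that NF2 gets a new southbridge revision that pushes VIA
to fix its remaining minor issues~~~~~

As for Intel... honestly, I've made up my mind not to buy its CPUs unless I need to;
things like PDAs can't be helped.
Following Intel has no future, and following AMD has no future either, but if
AMD went under we'd all be paying over US$500 per CPU.
So... I'll keep following AMD.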

--
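A few days ago I put a Barton 2800+ and 512 MB of DDR memory into the 8K3A+ at home,
and the KT333 board got a new lease on life. Compared with last year's setup, a Duron 1.1G
with 256 MB of DDR333, it's nearly twice as fast in benchmarks.

For my KT333 this setup is about as far as it can be upgraded.
To squeeze out more performance from here, I'd probably have to push the multiplier to 13x or above~~~
Actually, VIA's products at least~~~ don't have southbridge issues like nForce does,
and their memory stability is much better.
With the same Transcend DDR266 modules, the KT333 can run them at DDR333, CL=2, synchronous,
while the NF7-M can only keep running DDR266 at CL=2; raising the voltage doesn't help.
I really wanted to buy a KT600 board for my sister, but... as of last week there still weren't any
KT600 products with integrated S-ATA. Too bad~~~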

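If you want to grab one last K7 before it's over, you could buy an EPoX KT600
motherboard now. Performance should be decent, and it has S-ATA. The old 686B problem never
showed up on EPoX boards either (it really only hit a few makers who tweaked parameters to chase
performance and caused odd problems; EPoX used the reference design, so nothing went wrong).

PS... those with blind faith in Intel shouldn't be so stubborn: Intel's server motherboards from late
last year had an AGP slot problem caused by the chipset, and some later products had it while
others didn't... it's the luck of the draw. Also, Intel really is going backwards and has started
flailing around.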

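Posted by **Stevehyc** on Thu Jul 10, 2003 1:53 pm

windwalker wrote: (I can't afford a Tyan 2460/2466~~~~~ T_T )

I just realized the 2460/2466 are both built by our company as an OEM. Too bad I don't have any registered DDR RAM on hand, otherwise .......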

Has anyone done scientific computing with a Pentium-M (Centrino) on Linux?
