computing star

January 18, 2013

PgBouncer

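PgBouncer is a lightweight connection pooler for PostgreSQL. It supports three pooling modes.

Session pooling
The most polite method. When a client connects, a server connection is assigned to it for the whole time it stays connected. When the client disconnects, the server connection is put back into the pool.
Transaction pooling
A server connection is assigned to a client only for the duration of a transaction. When PgBouncer notices that the transaction is over, the server connection is put back into the pool. This is a hack, because it breaks the application's expectations of the backend connection. Use it only when the application cooperates with this usage pattern and avoids the features that break it. The PgBouncer documentation lists the features that break this mode.
Statement pooling
The most aggressive mode. This is transaction pooling with a twist: multi-statement transactions are not allowed. It is meant to enforce "autocommit" mode on the client and is mainly intended for PL/Proxy.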

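Setup on Ubuntu 12.04

Install: sudo apt-get install pgbouncer
Configuration files: located in /etc/pgbouncer. pgbouncer.ini is the main configuration file, and userlist.txt lists the users allowed to connect (its path is specified in pgbouncer.ini).
Running man pgbouncer shows a simple example configuration from the manual.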

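The example configuration from the man page sets up a connection pool for the template1 database on 127.0.0.1. Clients see the pool under the database name pg_template1, which maps to template1, so every request to pg_template1 on pgbouncer is forwarded to template1.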
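The example configuration itself is not reproduced in this post, so here is a rough sketch that rebuilds a comparable pgbouncer.ini with Python's configparser. Only the pg_template1 mapping comes from the description above. The port, file paths, and user names are my own placeholder assumptions, not values taken from the man page.

```python
import configparser
import io

# Only the pg_template1 -> template1 mapping is taken from the text;
# every other value below is an assumption for illustration.
config = configparser.ConfigParser(delimiters=("=",), interpolation=None)
config.optionxform = str  # keep option names exactly as written

config["databases"] = {
    "pg_template1": "host=127.0.0.1 dbname=template1",
}
config["pgbouncer"] = {
    "pool_mode": "session",
    "listen_addr": "127.0.0.1",                       # assumed
    "listen_port": "6543",                            # assumed port
    "auth_type": "md5",
    "auth_file": "/etc/pgbouncer/userlist.txt",
    "logfile": "/var/log/postgresql/pgbouncer.log",   # assumed path
    "pidfile": "/var/run/postgresql/pgbouncer.pid",   # assumed path
    "admin_users": "someuser",                        # hypothetical user
    "stats_users": "stat_collector",                  # hypothetical user
}

# Render the configuration as INI text instead of touching /etc directly.
buffer = io.StringIO()
config.write(buffer)
ini_text = buffer.getvalue()
print(ini_text)
```

Writing to a string first makes it easy to review the result before copying it into /etc/pgbouncer/pgbouncer.ini.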
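pool_mode selects the pooling mode. pgbouncer currently supports three modes: session, transaction, and statement.
a. session: session-level pooling. pgbouncer reclaims the assigned connection only when the client's session ends.
b. transaction: transaction-level pooling. pgbouncer reclaims the assigned connection when the transaction completes. In other words, a client has exclusive use of a connection only while it is inside a transaction; requests made outside a transaction get no dedicated connection.
c. statement: statement-level pooling. pgbouncer reclaims the connection after every request to the database completes. Clients must not use multi-statement transactions in this mode, otherwise the data can become inconsistent.
The default pool_mode is session.
listen_addr and listen_port are the address and port that pgbouncer listens on.
auth_type and auth_file control how pgbouncer authenticates clients. auth_file stores user names and passwords, and what it must contain depends on the authentication method (auth_type):
md5: MD5-based password check; auth_file may contain passwords as plain text or as md5 hashes;
crypt: crypt-based password check (man 3 crypt); auth_file must contain plain-text passwords;
plain: the password is sent in clear text;
trust: no authentication is performed, but the user name must still be present in auth_file;
any: like trust, no authentication is performed, and the user name does not need to be in auth_file. With this method, each database entry must explicitly name the user for logging in to the real database, for example: pg_template1 = host=127.0.0.1 user=exampleuser dbname=template1. Otherwise pgbouncer reports an error.
Note that user names and passwords in auth_file must be enclosed in double quotes, or an error is reported as well.
logfile and pidfile give the paths of the log file and the pid file.
admin_users: a comma-separated list of users who may log in to pgbouncer for administration.
stats_users: a comma-separated list of users who may log in to pgbouncer for read-only operations, such as listing server status and connections, but who cannot run reload.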
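To make the auth_file format concrete, here is a small sketch that builds one userlist.txt line. It assumes the PostgreSQL md5 convention, in which the stored value is the string "md5" followed by the MD5 hex digest of the password concatenated with the user name. The helper name and the sample credentials are hypothetical.

```python
import hashlib


def md5_userlist_entry(username: str, password: str) -> str:
    """Return one auth_file line with a PostgreSQL-style md5 password.

    Assumes the stored value is "md5" + md5(password + username).
    """
    digest = hashlib.md5((password + username).encode("utf-8")).hexdigest()
    # Both fields are wrapped in double quotes, as auth_file requires.
    return f'"{username}" "md5{digest}"'


# Hypothetical credentials, for illustration only.
entry = md5_userlist_entry("exampleuser", "secret")
print(entry)
```

The printed line can be appended to userlist.txt when auth_type is md5.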

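January 17, 2013

PostgreSQL 9.1 installation (Ubuntu 12.04)

Installation

$ sudo apt-get install postgresql
# Check the installed version: psql -V
# Change the password of the postgres system user: sudo passwd postgres
# Switch to the postgres user: su - postgres
# Enter the psql console: psql postgres
# The three steps above can be shortened to: sudo -u postgres psql postgres
Change the password of the postgres database account from within psql: ALTER USER postgres WITH PASSWORD '';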

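Configuration

All configuration files are under /etc/postgresql.
sudo vim /etc/postgresql/9.1/main/postgresql.conf (the main configuration file) and set listen_addresses = 'localhost' (only the local machine can connect)
or listen_addresses = '*' (connections are accepted on all of this machine's interfaces)
Enable password encryption:
change #password_encryption = on to password_encryption = on
pg_hba.conf (the client IP ranges allowed to connect)
Append the following to the end of the file:
# to allow your client visiting postgresql server
host all all 0.0.0.0 0.0.0.0 md5
OS shared memory
http://www.postgresql.org/docs/9.1/static/kernel-resources.html
Set the following in /etc/sysctl.conf:
kernel.shmmax=9663676416 #(9 GiB, 9*2^30 bytes)
kernel.shmall=2359296 #(9 GiB / 4096-byte pages)
When done, restart the PostgreSQL service: service postgresql restart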
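The two sysctl values are just arithmetic, so they are easy to recompute for a different memory size. This sketch assumes a 4096-byte page size (check with getconf PAGE_SIZE) and the same 9 GiB target as above.

```python
PAGE_SIZE = 4096   # assumed page size in bytes
GIB = 2 ** 30      # bytes in one GiB

shmmax = 9 * GIB               # largest single shared memory segment, in bytes
shmall = shmmax // PAGE_SIZE   # total shared memory, counted in pages

print(f"kernel.shmmax={shmmax}")
print(f"kernel.shmall={shmall}")
```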

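December 17, 2012

Theano scan function

http://deeplearning.net/software/theano/tutorial/loop.html The page linked here already explains the benefits of using scan, but its usage deserves a clearer walkthrough. Starting with the first example, the two programs below both compute A to the power k: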

# Plain Python version (A and k are assumed to be defined already)
result = 1
for i in range(k):
    result = result * A

import theano
import theano.tensor as T
theano.config.warn.subtensor_merge_bug = False

k = T.iscalar("k")
A = T.vector("A")

def inner_fct(prior_result, A):
    return prior_result * A

# Symbolic description of the result
result, updates = theano.scan(fn=inner_fct,
                            outputs_info=T.ones_like(A),
                            non_sequences=A, n_steps=k)

'''
Scan has provided us with A ** 1 through A ** k.  
Keep only the last value. Scan notices this and 
does not waste memory saving them.
'''
final_result = result[-1]

power = theano.function(inputs=[A, k], outputs=final_result,
                      updates=updates)

print(power(range(10), 2))
# [  0.   1.   4.   9.  16.  25.  36.  49.  64.  81.]

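fn is the function that scan runs at each step; it can also be defined as a lambda.
The second parameter, outputs_info, is the initial value of the result: here a tensor with the same shape as A in which every element is 1.
non_sequences holds values that do not change inside scan; here A stays the same throughout the loop.
n_steps is the number of iterations to run.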
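To see what these four parameters do without Theano, here is a plain-Python sketch of the same loop. scan_like is a hypothetical helper I wrote for illustration. It only mimics the data flow (feed the previous result back in, keep every step) and says nothing about how Theano implements scan.

```python
def scan_like(fn, initial, non_sequences, n_steps):
    """Call fn n_steps times, feeding each result back in, and keep every step."""
    results = []
    prior = initial
    for _ in range(n_steps):
        prior = fn(prior, *non_sequences)
        results.append(prior)
    return results


def inner_fct(prior_result, A):
    # Element-wise product, the list equivalent of prior_result * A.
    return [p * a for p, a in zip(prior_result, A)]


A = [float(x) for x in range(10)]
steps = scan_like(inner_fct, initial=[1.0] * len(A), non_sequences=[A], n_steps=2)
final_result = steps[-1]  # keep only the last step, like result[-1]
print(final_result)
```

Here steps[0] is A ** 1 and steps[-1] is A ** 2, which mirrors how result[-1] picks the final power in the Theano version.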

Theano shared variable

http://deeplearning.net/software/theano/tutorial/examples.html#using-shared-variables
The reference example there is as follows:

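import theano.tensor as T
from theano import function, shared

state = shared(0)  # shared variable with initial value 0
inc = T.iscalar('inc')
# Each call returns the current state, then replaces it with state + inc.
accumulator = function([inc], state,
                       updates=[(state, state+inc)])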

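What is special here is that state is a shared variable, with 0 as its initial value.
Its value can be shared between several functions. In a program you can read it with state.get_value() and set it with state.set_value(val).

The other part that needs explaining is the updates parameter of function. It takes a list of (shared-variable, new-expression) pairs, or a dict that maps each shared variable to its new expression.
It means that every time the function runs, the value of each shared variable is replaced by the result of the corresponding new expression.

Running the example therefore gives the following results: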

state.get_value() # before any call: array(0)
accumulator(1)    # state: array(0) -> array(1)
state.get_value() # array(1)
accumulator(300)  # state: array(1) -> array(301)
state.get_value() # array(301)
# reset the shared variable
state.set_value(-1)
accumulator(3)
state.get_value() # state: array(-1) -> array(2)

As mentioned above, a shared variable can be used by several functions, so we define another function, decrementor, that also reads and updates state:

decrementor = function([inc], state, updates=[(state, state-inc)])
decrementor(2)
state.get_value() #array(2)->array(0)
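The update behaviour can be modelled in a few lines of plain Python. SharedValue and make_function below are hypothetical stand-ins, not Theano classes. The sketch only assumes what the examples above show: a call returns the current value and then applies the update.

```python
class SharedValue:
    """Hypothetical stand-in for a shared variable: a box holding one value."""

    def __init__(self, value):
        self._value = value

    def get_value(self):
        return self._value

    def set_value(self, value):
        self._value = value


def make_function(shared, update):
    """Return a callable that outputs the current value, then applies update."""
    def call(inc):
        old = shared.get_value()
        shared.set_value(update(old, inc))
        return old
    return call


state = SharedValue(0)
accumulator = make_function(state, lambda s, inc: s + inc)
decrementor = make_function(state, lambda s, inc: s - inc)

print(accumulator(1), state.get_value())    # 0 1
print(accumulator(300), state.get_value())  # 1 301
state.set_value(-1)
print(accumulator(3), state.get_value())    # -1 2
print(decrementor(2), state.get_value())    # 2 0
```

Both callables close over the same SharedValue, which is why decrementor sees the value that accumulator left behind.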

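If you have written an expression that uses a shared variable but do not want to use the variable's current value, use the givens parameter of function, which substitutes another node for it in that one function. For example: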

fn_of_state = state * 2 + inc
# the type (lscalar) must match the shared variable we
# are replacing with the ``givens`` list
foo = T.lscalar() 
skip_shared = function([inc, foo], fn_of_state,
                                   givens=[(state, foo)])
skip_shared(1, 3)  # we're using 3 for the state, not state.value
state.get_value()  # old state still there, but we didn't use it
#array(0)

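These functions are very convenient, but the documentation does not say whether race conditions can occur.
http://deeplearning.net/software/theano/tutorial/aliasing.html
The section "Understanding Memory Aliasing for Speed and Correctness" explains that Theano manages its own memory (a pool) and keeps track of changes to the variables in that pool.

Variables in Theano's pool live in a different memory space from Python variables, so the two do not conflict with each other: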
Theano functions only modify buffers that are in Theano’s memory space.
Theano's memory space includes the buffers allocated to store shared variables and the temporaries used to evaluate functions.
Physically, Theano's memory space may be spread across the host, a GPU device(s), and in the future may even include objects on a remote machine.
The memory allocated for a shared variable buffer is unique: it is never aliased to another shared variable.
Theano's managed memory is constant while Theano functions are not running and Theano's library code is not running.
The default behaviour of a function is to return user-space values for outputs, and to expect user-space values for inputs.

The distinction between Theano-managed memory and user-managed memory can be broken down by some Theano functions (e.g. shared, get_value and the constructors for In and Out) by using a borrow=True flag. This can make those methods faster (by avoiding copy operations) at the expense of risking subtle bugs in the overall program (by aliasing memory).
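The risk that borrow=True introduces is ordinary aliasing, which a plain-Python model can show. SharedBuffer below is a hypothetical class, not Theano's implementation. It assumes the simplest case, where borrowing really does share the caller's object. Real Theano may still copy, for example when the data lives on a GPU.

```python
import copy


class SharedBuffer:
    """Hypothetical model of a shared variable's internal buffer."""

    def __init__(self, value, borrow=False):
        # Without borrow, keep a private copy so the caller cannot alias it.
        self._buffer = value if borrow else copy.deepcopy(value)

    def get_value(self, borrow=False):
        # Without borrow, hand out a copy instead of the internal buffer.
        return self._buffer if borrow else copy.deepcopy(self._buffer)


data = [1.0, 2.0, 3.0]
safe = SharedBuffer(data)                  # copies the list
aliased = SharedBuffer(data, borrow=True)  # shares the list

data[0] = 99.0  # modify the user-side object afterwards

print(safe.get_value())     # [1.0, 2.0, 3.0]
print(aliased.get_value())  # [99.0, 2.0, 3.0]
```

The aliased buffer changed without any Theano-style call touching it, which is the kind of subtle bug the quoted passage warns about.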

Theano gpu setting

Following the configuration documentation on the official site:

http://deeplearning.net/software/theano/library/config.html#libdoc-config

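To compute on the GPU instead of the CPU, the setting must be in place before import theano. There are two ways to do this:

1. Set it in $HOME/.theanorc
2. Set it in the THEANO_FLAGS environment variable

In the Eclipse development environment, method 2 is more flexible if different files need different settings. Configure a run as follows:

Switch to the file you want to run and choose Run -> Run Configurations -> Environment -> New from the top menu. Enter THEANO_FLAGS as the Name and floatX=float32,device=gpu as the Value, then click Apply at the bottom.
Eclipse does not seem to pick up the PATH setting from the environment correctly, so on the same screen also add Name=PATH, Value=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/cuda/bin (adjust to where CUDA is installed).

After that, the program can use the GPU for computation.

You can also run print theano.config to confirm that the settings are correct.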
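Outside Eclipse, the same environment variable can be set from inside the script itself, as long as that happens before Theano is imported. This is a minimal sketch using the flag values from the text above. The import is left commented out so the snippet also runs on a machine without Theano.

```python
import os

# Flag values taken from the text above; set them BEFORE importing theano.
flags = {"floatX": "float32", "device": "gpu"}
os.environ["THEANO_FLAGS"] = ",".join(
    f"{name}={value}" for name, value in flags.items()
)
print(os.environ["THEANO_FLAGS"])

# import theano  # uncomment where Theano is installed; it reads THEANO_FLAGS on import
```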

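October 22, 2011

Linux Filesystem Hierarchy Standard

I came across an FHS structure diagram online that an expert had put together, and it is very handy.

Detailed explanations of each directory can also be found on the FHS website.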

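October 7, 2011

IPv6 address format notes

Recently the computers at school suddenly got IPv6 addresses. Some network services were originally configured for IPv4 and ran into problems after the switch to IPv6, so I decided to sort the whole topic out in one go.
An IPv6 address is 128 bits long and is written as 8 groups separated by ":", in the following format:
2001:0288:75d8:0000:0222:15ff:fe5b:4525

Each digit is hexadecimal (4 bits), and each group is 16 bits.
Leading zeros in a group can be omitted, and a run of consecutive all-zero groups such as 0:0:0, however long, can be abbreviated to ::
However, :: may appear only once in an address (the longest run of zero groups takes priority).
- 2001:0db8:85a3:0000:1319:8a2e:0370:7344 is equivalent to 2001:0db8:85a3::1319:8a2e:0370:7344
- 2001:0DB8:0000:0000:0000:0000:1428:57ab is equivalent to 2001:0DB8::1428:57ab
::1 stands for localhost (lo)
:: stands for any address (equivalent to 0.0.0.0 in IPv4)
Addresses beginning with ff are multicast addresses
ff02::1 is the all-nodes multicast address, which reaches ordinary hosts on the link
ff02::2 is the all-routers multicast address
- ping6 -I eth1 ff02::1 pings to find which ordinary PCs are present
- ping6 -I eth1 ff02::2 pings to find which routers are present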
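The abbreviation rules above can be checked with Python's standard ipaddress module. This is only a quick sanity check on the example addresses from these notes.

```python
import ipaddress

# The long and short spellings parse to the same address.
long_form = ipaddress.IPv6Address("2001:0DB8:0000:0000:0000:0000:1428:57ab")
short_form = ipaddress.IPv6Address("2001:0DB8::1428:57ab")
print(long_form == short_form)  # True

# compressed drops leading zeros and collapses the zero run; exploded undoes both.
print(long_form.compressed)   # 2001:db8::1428:57ab
print(short_form.exploded)    # 2001:0db8:0000:0000:0000:0000:1428:57ab

# A single zero group may also be written as :: on input.
single = ipaddress.IPv6Address("2001:0db8:85a3::1319:8a2e:0370:7344")
print(single == ipaddress.IPv6Address("2001:0db8:85a3:0000:1319:8a2e:0370:7344"))  # True

# The special addresses from the notes.
print(ipaddress.IPv6Address("::1").is_loopback)       # True
print(ipaddress.IPv6Address("::").is_unspecified)     # True
print(ipaddress.IPv6Address("ff02::1").is_multicast)  # True
```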

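IPv6 address types
IPv6 has three address types: Unicast, Multicast, and Anycast. IPv6 no longer uses IPv4-style broadcast for communication; Multicast or Anycast takes its place.
IPv6 Unicast works like IPv4 Unicast and is used to send data from a single node to a single node. Unicast addresses come in the following forms:

Global (Scope:global): a global IPv6 address is like a public address in IPv4. It is unique worldwide, and no other node has the same address.
Link-Local (Scope:link): the address is used only on a single link (within the same subnet) and cannot be routed to other links or to the Internet. It serves the same purpose as an IPv4 APIPA address (169.254.X.X): it is used only within one particular network segment, and packets with such addresses cannot pass through a router.
Site-Local (Scope:site): the address can cross links and be routed within a site, but cannot be routed to the Internet. Link-Local and Site-Local addresses are similar in concept to private addresses in IPv4 and are very useful for automatically setting up temporary communication between hosts and between routers.

Link-local unicast address (Scope:link):
When IPv6 is enabled on a host, every network interface is automatically assigned a link-local unicast address that begins with fe80::/64. The address can be derived from the network card's MAC address.
For example, take a network card whose MAC address is 00-0F-EA-41-59-47. The leftmost byte is 00, which is 0000 0000 in binary. Change the 7th bit from the left (the universal/local bit, which is 0 for a globally assigned MAC address) to 1, so that 00 becomes 02 and the MAC address becomes 02:0f:ea:41:59:47. Then split the MAC address into two halves and insert fffe in the middle to get 020f:eaff:fe41:5947. The IPv6 address is therefore fe80::20f:eaff:fe41:5947, which is the link-local unicast address of the network card.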
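The MAC-to-address steps above can be written as a short function. link_local_from_mac is a hypothetical helper name. The sketch follows the usual modified EUI-64 procedure, which flips the universal/local bit instead of forcing it to 1 (the result is the same for a globally assigned MAC such as the one in the example).

```python
import ipaddress


def link_local_from_mac(mac: str) -> ipaddress.IPv6Address:
    """Derive the fe80:: link-local address from a MAC address (modified EUI-64)."""
    octets = bytearray(int(part, 16) for part in mac.replace("-", ":").split(":"))
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets in MAC address, got {len(octets)}")
    octets[0] ^= 0x02  # flip the universal/local bit (7th bit from the left)
    # Split the MAC in half and insert ff:fe in the middle.
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    prefix = bytes.fromhex("fe80") + bytes(6)  # fe80 followed by 48 zero bits
    return ipaddress.IPv6Address(prefix + interface_id)


print(link_local_from_mac("00-0F-EA-41-59-47"))  # fe80::20f:eaff:fe41:5947
```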

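IPv6 prefixes
The first several bits of the 128-bit IPv6 address form the prefix. Another common notation for IPv6 addresses is "IPv6 address / prefix length". How many bits the prefix has depends on whether the address is Unicast, Multicast, or Anycast.
Unicast prefix and address format

Global: the first 3 bits are the prefix, fixed at "001". The last 64 bits are the Interface ID, which plays the same role as the Host ID in IPv4.
Site-Local: the first 10 bits are the prefix, fixed at "1111111011". They are followed by 38 bits of 0, then a 16-bit Subnet ID, and finally the 64-bit interface address.
Because the 6 bits after the first 10 bits of this kind of address are always 0, the first 16 bits as a whole are "1111111011000000", which is FEC0 in hexadecimal. This is why some people say the prefix of a Site-Local IPv6 address is FEC0. That statement adds six extra 0 bits, since the actual prefix is only 10 bits long. The result is still correct, though, and FEC0 is much easier to read than 1111111011.
Link-Local: the first 10 bits are also the prefix, fixed at "1111111010", followed by 54 consecutive 0 bits, and the last 64 bits are again the interface address. As with Site-Local, the prefix of a Link-Local IPv6 address can loosely be written as FE80.

Multicast prefix and address format

Multicast: the first 8 bits are the prefix, "11111111", and the last 112 bits are the group address. The 8 bits in between hold a 4-bit flags field and a 4-bit scope field.

Anycast prefix and address format

Anycast: the prefix length is not fixed, and all bits outside the prefix are 0.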
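The bit patterns quoted above are easy to verify by printing the leading bits of a few addresses. leading_bits is a hypothetical helper, and the addresses are the examples already used in these notes.

```python
import ipaddress


def leading_bits(address: str, count: int) -> str:
    """Return the first `count` bits of an IPv6 address as a string of 0s and 1s."""
    value = int(ipaddress.IPv6Address(address))
    return format(value, "0128b")[:count]


print(leading_bits("2001:0288:75d8:0000:0222:15ff:fe5b:4525", 3))  # 001
print(leading_bits("fec0::", 10))   # 1111111011
print(leading_bits("fe80::", 10))   # 1111111010
print(leading_bits("ff02::1", 8))   # 11111111

# The "address / prefix length" notation maps directly to a network object.
link_local = ipaddress.IPv6Network("fe80::/10")
print(ipaddress.IPv6Address("fe80::20f:eaff:fe41:5947") in link_local)  # True
```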
