
Dec 26, 2007

Got fever

I have a fever and got an injection at the hospital. I still don't feel well.

Dec 20, 2007

ARP Virus

We could not browse the web normally: a window kept popping up showing "The Earn the QQ...". A link to another website had been embedded in the web pages. I checked and suspected an ARP virus. Running arp -a showed that the gateway's MAC address kept changing. I downloaded a tool to scan all the computers in the company and found that the MAC address of a computer in our sales department matched the result. I asked the salesperson to disconnect it from the LAN, and after that we could browse the web normally.

Dec 19, 2007

simple_idct_armv5te.s

Today I debugged why the armv5te code does not work. I finally traced it to how Multi compiles the following code: it seems Multi does not understand #if 0.
;#if 0
; mov v1, #(1<<(COL_SHIFT-1))
; smlabt v2, ip, a4, v1 ;/* v2 = W4*col[1] + (1<<(COL_SHIFT-1)) */
; smlabb v1, ip, a4, v1 ;/* v1 = W4*col[0] + (1<<(COL_SHIFT-1)) */
; ldr a4, [a1, #(16*4)]
;#else
It works after commenting out that code. I tested the performance against simple_idct_arm.s: the put and add functions are 90% faster. However, when I put the armv5te code into the whole project, performance improved by only 7%. Why does this happen? The test code runs in OCRAM, while the project code runs in SDRAM, so there are probably many memory access stalls. I might need to consider moving the decode_slice MB code into OCRAM.
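Separately from the memory question, a quick Amdahl's-law estimate shows how much of the total run time the IDCT would need to occupy for a local gain to become a 7% overall gain. This is only a rough sketch: it assumes "90% faster" means a 1.9x speedup of the put and add functions and that nothing else changed. The function names are made up for illustration.

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a fraction p of the run time
   is accelerated by a factor s and the rest is unchanged. */
static double overall_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}

/* Inverse: the fraction p implied by an observed overall speedup
   and a known local speedup s (s must differ from 1). */
static double implied_fraction(double overall, double s)
{
    assert(overall > 0.0 && s > 0.0 && s != 1.0);
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / s);
}
```

Under those assumptions, implied_fraction(1.07, 1.9) comes out at roughly 0.14, meaning the IDCT would account for about 14% of the frame time. If profiling shows a much larger share, the SDRAM stalls are the more likely explanation.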

Dec 13, 2007

Current YUV2RGB conversion

Since I think the bottleneck is the decode part, I will not spend more effort on the YUV2RGB conversion. FFmpeg's yuv2rgb is too slow, so I wrote my own, and it is faster than FFmpeg's.

/* Convert planar YUV 4:2:0 to packed RGB565, handling a 2x2 pixel block per step.
   a1..a4 are 256-entry lookup tables defined elsewhere (not shown here). */
void yuv_convert_rgb(AVPicture *dst, const AVPicture *src,
                     int width, int height)
{
    const uint8_t *y1_ptr, *y2_ptr, *cb_ptr, *cr_ptr;
    uint8_t *d, *d1, *d2;
    int w, y, width2;
    int v1,uv,u2;
    int dst_linesize,src_linesize,src_uvlinesize;

    d = dst->data[0];
    y1_ptr = src->data[0];
    cb_ptr = src->data[1];
    cr_ptr = src->data[2];
    width2 = (width + 1) >> 1;
    dst_linesize = dst->linesize[0];
    src_linesize = src->linesize[0];
    src_uvlinesize = src_linesize >> 1;
    /* Two luma rows share one chroma row. */
    for(;height >= 2; height -= 2) {
        d1 = d;
        d2 = d + dst_linesize;
        y2_ptr = y1_ptr + src_linesize;
        for(w = width; w >= 2; w -= 2) {
            /* Chroma terms, shared by the four pixels of the 2x2 block. */
            v1 = a1[cr_ptr[0]];                   /* added to Y for red */
            u2 = a4[cb_ptr[0]];                   /* added to Y for blue */
            uv = a2[cr_ptr[0]] + a3[cb_ptr[0]];   /* subtracted from Y for green */

            /* Write two RGB565 pixels with one 32-bit store: first pixel in the low half. */
            ((uint32_t *)(d1))[0] = ((((y1_ptr[0] + v1) >> 3) << 11) | (((y1_ptr[0] - uv) >> 2) << 5) | ((y1_ptr[0] + u2) >> 3))
                |(((((y1_ptr[1] + v1) >> 3) << 11) | (((y1_ptr[1] - uv) >> 2) << 5) | ((y1_ptr[1] + u2) >> 3)) << 16);

            ((uint32_t *)(d2))[0] = ((((y2_ptr[0] + v1) >> 3) << 11) | (((y2_ptr[0] - uv) >> 2) << 5) | ((y2_ptr[0] + u2) >> 3))
                |(((((y2_ptr[1] + v1) >> 3) << 11) | (((y2_ptr[1] - uv) >> 2) << 5) | ((y2_ptr[1] + u2) >> 3)) << 16);

            d1 += 2 * 2;   /* two pixels, two bytes each */
            d2 += 2 * 2;

            y1_ptr += 2;
            y2_ptr += 2;
            cb_ptr++;
            cr_ptr++;
        }
        /* Advance to the next pair of rows. */
        d += (dst_linesize<<1);
        y1_ptr += (src_linesize<<1) - width;
        cb_ptr += src_uvlinesize - width2;
        cr_ptr += src_uvlinesize - width2;
    }
}
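One thing to watch in a routine like this is saturation. As written, the sums such as y + v1 are shifted and ORed without clamping, so unless the tables or the input range guarantee results within 0..255, a channel can spill into its neighbor's bits. Below is a minimal single-pixel sketch of RGB565 packing with clamping. The helper names are hypothetical, and the coefficients are the common full-range JFIF ones in 16.16 fixed point, not necessarily what the a1..a4 tables contain.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an intermediate value to the 0..255 range of an 8-bit channel. */
static int clamp_u8(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

/* Pack 8-bit R, G, B into RGB565: 5 bits red, 6 bits green, 5 bits blue. */
static uint16_t pack_rgb565(int r, int g, int b)
{
    return (uint16_t)(((clamp_u8(r) >> 3) << 11) |
                      ((clamp_u8(g) >> 2) << 5)  |
                       (clamp_u8(b) >> 3));
}

/* Hypothetical reference conversion for one pixel.
   Constants are 1.402, 0.344136, 0.714136 and 1.772 scaled by 65536. */
static uint16_t yuv_to_rgb565(int y, int cb, int cr)
{
    int d = cb - 128;
    int e = cr - 128;
    int r = y + (91881 * e) / 65536;
    int g = y - (22554 * d + 46802 * e) / 65536;
    int b = y + (116130 * d) / 65536;
    return pack_rgb565(r, g, b);
}
```

A slow reference like this is mostly useful for checking the output of the fast loop on a few test frames; the clamping costs extra compares per pixel, which a saturating lookup table can avoid.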

The time cost

Currently, decoding a frame costs 37ms, yuv2rgb costs 13ms, and the LCD display costs 14ms. However, the LCD display goes through DMA, so its time can overlap with the decode time. We need to consider how to cut down the decode time. I might need to add profiling to check which part costs too much time.
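To put those numbers in terms of frame rate, here is a small sketch of the budget arithmetic. It assumes decode and yuv2rgb run back to back on the CPU and that the DMA transfer either overlaps completely or not at all; the function names are made up for illustration.

```c
#include <assert.h>

/* Time per frame in milliseconds. With overlap, the LCD DMA runs in
   parallel with the CPU work, so the slower of the two sets the pace. */
static double frame_ms(double decode, double convert, double lcd, int lcd_overlaps)
{
    double cpu = decode + convert;
    assert(decode >= 0.0 && convert >= 0.0 && lcd >= 0.0);
    if (lcd_overlaps)
        return cpu > lcd ? cpu : lcd;
    return cpu + lcd;
}

/* Frames per second for a given per-frame time in milliseconds. */
static double fps_from_ms(double ms)
{
    assert(ms > 0.0);
    return 1000.0 / ms;
}
```

With 37, 13 and 14 ms, this gives 50 ms per frame (20 fps) when the DMA overlaps fully and 64 ms per frame (about 15.6 fps) when it does not, so the decode time is what limits the frame rate either way.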

Dec 12, 2007

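Fast YUV2RGB conversion (repost)

What color depth do LCD displays use? I didn't know. People say they all use 16-bit color (rgb565), so I set out to learn how that conversion works. Doing it by formula means multiplications, saturation, shifts and ORs, which is exhausting :(. However, SDL has a nice table-lookup algorithm, so let's analyze it.
Below is the initialization of the tables.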
swdata->pixels = (Uint8 *) malloc(width*height*2);
swdata->colortab = (int *)malloc(4*256*sizeof(int));
Cr_r_tab = &swdata->colortab[0*256];
Cr_g_tab = &swdata->colortab[1*256];
Cb_g_tab = &swdata->colortab[2*256];
Cb_b_tab = &swdata->colortab[3*256];
swdata->rgb_2_pix = (Uint32 *)malloc(3*768*sizeof(Uint32));
r_2_pix_alloc = &swdata->rgb_2_pix[0*768];
g_2_pix_alloc = &swdata->rgb_2_pix[1*768];
b_2_pix_alloc = &swdata->rgb_2_pix[2*768];
for (i=0; i<256; i++) {
/* This table replaces the multiplications */
CB = CR = (i-128);
Cr_r_tab[i] = (int) ( (0.419/0.299) * CR);
Cr_g_tab[i] = (int) (-(0.299/0.419) * CR);
Cb_g_tab[i] = (int) (-(0.114/0.331) * CB);
Cb_b_tab[i] = (int) ( (0.587/0.331) * CB);
}
Rmask = display->format->Rmask;
Gmask = display->format->Gmask;
Bmask = display->format->Bmask;
for ( i=0; i<256; ++i ) {
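/* This table handles saturation, with the shifts already applied, so at lookup time the r, g and b values only need to be ORed together. r is saturated to 0~0xf800 (top 5 bits), g to 0~0x07e0 (middle 6 bits), b to 0~0x001f (low 5 bits) */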
r_2_pix_alloc[i+256] = i >> (8 - number_of_bits_set(Rmask));
r_2_pix_alloc[i+256] <<= free_bits_at_bottom(Rmask);
g_2_pix_alloc[i+256] = i >> (8 - number_of_bits_set(Gmask));
g_2_pix_alloc[i+256] <<= free_bits_at_bottom(Gmask);
b_2_pix_alloc[i+256] = i >> (8 - number_of_bits_set(Bmask));
b_2_pix_alloc[i+256] <<= free_bits_at_bottom(Bmask);
}
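Below is the code that uses the tables, i.e. the lookup itself.
/* The 768 factor in front is really the offset into the second table. Because the y data is used four times, the offset is folded into the cb and cr terms in advance */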
cr_r = 0*768+256 + colortab[ *cr + 0*256 ];
crb_g = 1*768+256 + colortab[ *cr + 1*256 ] + colortab[ *cb + 2*256 ];
cb_b = 2*768+256 + colortab[ *cb + 3*256 ];
++cr; ++cb;
L = *lum++; /* OR the three values together to form rgb565; nothing more to it. */
*row1++ = (rgb_2_pix[ L + cr_r ]|rgb_2_pix[ L + crb_g ]|rgb_2_pix[ L + cb_b ]);
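To see the two tables working together, here is a self-contained sketch for RGB565. It reuses the table layout and coefficients from the excerpt above, but the helper bodies, the fixed masks and the function names fill_channel, init_tables and lookup_pixel are my own stand-ins, not SDL's code. It also fills the lower and upper 256 entries of each 768-entry table, which the excerpt does not show but which saturation needs.

```c
#include <assert.h>
#include <stdint.h>

static int      colortab[4 * 256];   /* Cr->R, Cr->G, Cb->G, Cb->B terms */
static uint32_t rgb_2_pix[3 * 768];  /* saturating, pre-shifted R, G, B tables */

/* Stand-ins for the helpers named in the excerpt (bodies assumed). */
static int number_of_bits_set(uint32_t m)
{
    int n = 0;
    for (; m; m >>= 1) n += (int)(m & 1u);
    return n;
}

static int free_bits_at_bottom(uint32_t m)
{
    int n = 0;
    if (!m) return 0;
    while (!(m & 1u)) { m >>= 1; ++n; }
    return n;
}

/* Fill one 768-entry table: the middle 256 entries map 0..255 to the
   channel's bit field, the outer entries clamp to the minimum and maximum. */
static void fill_channel(uint32_t *tab, uint32_t mask)
{
    int i;
    for (i = 0; i < 256; ++i)
        tab[i + 256] = ((uint32_t)i >> (8 - number_of_bits_set(mask)))
                       << free_bits_at_bottom(mask);
    for (i = 0; i < 256; ++i) {
        tab[i]       = tab[256];  /* results below 0 */
        tab[i + 512] = tab[511];  /* results above 255 */
    }
}

static void init_tables(void)
{
    int i;
    for (i = 0; i < 256; ++i) {
        int c = i - 128;
        colortab[0*256 + i] = (int)( (0.419/0.299) * c);
        colortab[1*256 + i] = (int)(-(0.299/0.419) * c);
        colortab[2*256 + i] = (int)(-(0.114/0.331) * c);
        colortab[3*256 + i] = (int)( (0.587/0.331) * c);
    }
    fill_channel(&rgb_2_pix[0*768], 0xF800u);  /* assumed RGB565 masks */
    fill_channel(&rgb_2_pix[1*768], 0x07E0u);
    fill_channel(&rgb_2_pix[2*768], 0x001Fu);
}

/* One pixel: three lookups and two ORs, no multiply and no clamp. */
static uint16_t lookup_pixel(int L, int cb, int cr)
{
    int cr_r  = 0*768 + 256 + colortab[cr + 0*256];
    int crb_g = 1*768 + 256 + colortab[cr + 1*256] + colortab[cb + 2*256];
    int cb_b  = 2*768 + 256 + colortab[cb + 3*256];
    return (uint16_t)(rgb_2_pix[L + cr_r] | rgb_2_pix[L + crb_g] | rgb_2_pix[L + cb_b]);
}
```

After calling init_tables() once, lookup_pixel(L, cb, cr) can be called per pixel. The 256-entry margins on each side are what let the chroma term push the index below 0 or above 255 without leaving the table.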

Created the blog

I set up this blog at the Hong Kong company. I will see whether I can log in after I get back.