Recent changes to this wiki:
Add q3a + ogt shaders news.
diff --git a/index.mdwn b/index.mdwn
index 8799d58..d16e142 100644
--- a/index.mdwn
+++ b/index.mdwn
@@ -7,6 +7,7 @@ The aim of this driver and others such as [freedreno](http://freedreno.github.co

## News
===
+* 2013-03-18: Q3A now runs with open source generated shaders! Read all about it [on libv's blog](http://libv.livejournal.com/24402.html).
* 2013-02-06: Libv blogs about [Quake 3 Arena running on top of the limare prototype driver](http://libv.livejournal.com/23886.html).
* 2013-02-02: Libv talks about [Open ARM GPU drivers](https://fosdem.org/2013/schedule/event/operating_systems_open_arm_gpu/) at FOSDEM, and Connor talks about [his compiler work in the Xorg DevRoom](https://fosdem.org/2013/schedule/event/maliisa/). Public goes wild for Q3A running on limare, and Connor fills the DevRoom to the brim!
* 2012-12-07: After what can only be described as an eternity, the LinuxTag demo code and tons of other changes have now been pushed to gitorious. Here is [a video of our brand new spinning companion cube](http://www.youtube.com/watch?v=k16ve88d-L0) spinning away at 60Hz.
diff --git a/index.mdwn b/index.mdwn
index 89e6fde..8799d58 100644
--- a/index.mdwn
+++ b/index.mdwn
@@ -65,6 +65,7 @@ Lima Documents
* [[Fragment+Shader+Backend]]
* [[Render State]]
* [[Texel Formats]]
+* [[Compiling Q3A Shaders]]

## Contribute
===
diff --git a/Compiling_Q3A_Shaders.mdwn b/Compiling_Q3A_Shaders.mdwn
new file mode 100644
index 0000000..422a599
--- /dev/null
+++ b/Compiling_Q3A_Shaders.mdwn
@@ -0,0 +1,21 @@
+This page explains how to use open-gpu-tools to generate the shaders that the limare port of Quake 3 Arena needs in order to run without the binary compiler. The shaders have been hand-converted from ESSL (the input of the binary compiler) to a custom assembly/IR, so some playing around, learning and reading of the source is necessary to understand how the shaders work.
+
+## Setting up open-gpu-tools
+
+Clone [my open-gpu-tools tree](https://gitorious.org/~cwabbott/open-gpu-tools/cwabbotts-open-gpu-tools), and switch to the ir branch. Compile libcommon.so by cd'ing to the common directory and running make. Do the same in the ir_tools and assemble directories.
+
+## Fragment shaders
+
+The fragment shaders are written in assembly, meaning that you have to use the assemble tool to generate a working MBS file. To assemble a shader ~/my_shader.in into an MBS file ~/my_shader.mbs, from the assemble directory do:
+
+    ./assemble -a lima_pp -s verbose -t fragment -o ~/my_shader.mbs ~/my_shader.in
+
+## Vertex shaders
+
+The vertex shaders are compiled from gp_ir, meaning you need to use the ir_tools to compile them to MBS. To parse an input shader ~/my_shader.in into a binary gp_ir file ~/my_shader.ir, from the ir_tool directory do:
+
+    ./ir_parse -i lima_gp_ir -o ~/my_shader.ir ~/my_shader.in
+
+And to compile that to MBS, do:
+
+    ./ir_lower -i lima_gp_ir -a lima_gp -f mbs -o ~/my_shader.mbs ~/my_shader.ir
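When converting more than a handful of shaders, the commands from the page above can be wrapped in two small shell functions. This is only a sketch: the function names `fragment_cmd` and `vertex_cmds` and the example file names are made up for illustration, and the functions merely print the command lines (taken verbatim from the page) so they can be reviewed before being run from the matching tool directory.

```shell
#!/bin/sh
# Hypothetical helpers: they only PRINT the open-gpu-tools commands
# documented on the page; nothing is executed.

# fragment_cmd IN OUT: assemble a fragment shader into an MBS file.
fragment_cmd() {
    echo "./assemble -a lima_pp -s verbose -t fragment -o $2 $1"
}

# vertex_cmds IN IR OUT: parse gp_ir text into a binary IR file,
# then lower that IR file to MBS.
vertex_cmds() {
    echo "./ir_parse -i lima_gp_ir -o $2 $1"
    echo "./ir_lower -i lima_gp_ir -a lima_gp -f mbs -o $3 $2"
}

# Example: show what would be run for one shader of each kind.
fragment_cmd my_fs.in my_fs.mbs
vertex_cmds my_vs.in my_vs.ir my_vs.mbs
```

Piping the output into `sh` from the right directory would run the commands; printing first keeps the two-step vertex path (parse, then lower) visible.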
Move into setting up X
diff --git a/OdroidSetup.mdwn b/OdroidSetup.mdwn
index e29580e..4c8b0c3 100644
--- a/OdroidSetup.mdwn
+++ b/OdroidSetup.mdwn
@@ -193,5 +193,42 @@ locale-gen en_US.UTF-8
</pre>
or by whichever locale is listed as LANG when running locale.

+# Setting up X
+
+Create /etc/X11/xorg.conf with the following content:
+<pre>
+Section "Device"
+    identifier "FBDEV"
+    Driver "fbdev"
+    Option "fbdev" "/dev/fb6"
+EndSection
+
+Section "Screen"
+    identifier "Default Screen"
+    Device "FBDEV"
+    DefaultDepth 16
+EndSection
+</pre>
+
+You can now start the display manager:
+<pre>
+lightdm&
+</pre>
+
+I haven't yet figured out how the strange exynos fb drivers can be coaxed into doing 24 bit colour.
+
# Mali binaries

+Install es2gears and es2_info through:
+
+<pre>
+apt-get install mesa-utils-extra
+</pre>
+
+This will drag in the full mesa, which includes an openGLESv2 lib, which we really do not need.
+
+<pre>
+mv /usr/lib/arm-linux-gnueabihf/mesa-egl /usr/lib/arm-linux-gnueabihf/.mesa-egl
+</pre>
+
+Nasty, but works.
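Since the framebuffer device and colour depth are the two values most likely to differ between setups, the xorg.conf from the diff above could also be generated rather than typed. A minimal sketch, assuming a hypothetical helper named `gen_xorg_conf` that prints the same two sections to stdout; the defaults used in the example call (/dev/fb6, depth 16) are the ones from the page.

```shell
#!/bin/sh
# gen_xorg_conf FBDEV DEPTH (hypothetical helper)
# Prints a minimal fbdev xorg.conf; redirect it to /etc/X11/xorg.conf as root.
gen_xorg_conf() {
    cat <<EOF
Section "Device"
    Identifier "FBDEV"
    Driver "fbdev"
    Option "fbdev" "$1"
EndSection

Section "Screen"
    Identifier "Default Screen"
    Device "FBDEV"
    DefaultDepth $2
EndSection
EOF
}

# Example: the values used on the page.
gen_xorg_conf /dev/fb6 16
```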
Add random fluff for getting ALIP running.
diff --git a/OdroidSetup.mdwn b/OdroidSetup.mdwn
index 8fb3b3f..e29580e 100644
--- a/OdroidSetup.mdwn
+++ b/OdroidSetup.mdwn
@@ -138,4 +138,60 @@ Then run the following to create boot.scr, the file that u-boot looks for:
mkimage -A arm -O linux -T script -C none -a 0 -e 0 -n "BOOT Script for ODROID-X2" -d boot.txt boot.scr
</pre>
+# First boot setup
+
+I experienced some resolver issues, as the DHCP nameserver info was apparently not passed on properly (NetworkManager?), so I added the following to /etc/resolv.conf to override things manually:
+
+<pre>
+nameserver 192.168.x.x
+</pre>
+
+I then went on to install the most important package for any network connected device:
+<pre>
+apt-get update
+apt-get install openssh-server
+</pre>
+
+I could then ssh into the device and start changing some things.
+<pre>
+sudo -s
+passwd
+</pre>
+
+Now you can just ssh in as root.
+
+<pre>
+echo "odroid" > /etc/hostname
+</pre>
+
+Log out and in again to see this take effect.
+
+Then drop the linaro user and add your own. Make sure it is added to the video group.
+<pre>
+userdel -r linaro
+adduser user
+adduser user video
+</pre>
+
+You will see loads of locale issues when running any apt things:
+<pre>
+perl: warning: Setting locale failed.
+perl: warning: Please check that your locale settings:
+ LANGUAGE = (unset),
+ LC_ALL = (unset),
+ LANG = "en_US.UTF-8"
+ are supported and installed on your system.
+perl: warning: Falling back to the standard locale ("C").
+locale: Cannot set LC_CTYPE to default locale: No such file or directory
+locale: Cannot set LC_MESSAGES to default locale: No such file or directory
+locale: Cannot set LC_ALL to default locale: No such file or directory
+</pre>
+
+You can fix these by running:
+<pre>
+locale-gen en_US.UTF-8
+</pre>
+or by whichever locale is listed as LANG when running locale.
+
# Mali binaries
+
Add boot.scr creation information, and make the boot partition vfat.
diff --git a/OdroidSetup.mdwn b/OdroidSetup.mdwn
index 475aa35..8fb3b3f 100644
--- a/OdroidSetup.mdwn
+++ b/OdroidSetup.mdwn
@@ -29,17 +29,18 @@ I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x834d1732
         Device Boot      Start         End      Blocks   Id  System
-/dev/mmcblk0p1            3072      527359      262144   83  Linux
+/dev/mmcblk0p1            3072      527359      262144    b  W95 FAT32
 /dev/mmcblk0p2          527360    31116287    15294464   83  Linux
</pre>
-The important thing to note is that the first partition should start at 3072, as the space underneath is used by the u-boot and trustedzone binaries. It also might pay to provide a separate boot partition, with kernel images and u-boot script files. Apart from that, you are free to partition as you like, as long as you update u-boot script accordingly.
+The important thing to note is that the first partition should start at 3072, as the space underneath is used by the u-boot and trustedzone binaries, and it should be a FAT based boot partition. Apart from that, you are free to partition as you like, as long as you update u-boot script accordingly.
Note that this is for a 16GB card; actual offsets and sizes might look different for you. In this setup, 256MB was reserved for the boot partition, and the remainder was given to one big root filesystem.
Now format all partitions:
<pre>
-mkfs.ext3 /dev/mmcblkX
+mkfs.vfat /dev/mmcblkXp1
+mkfs.ext3 /dev/mmcblkXp2
</pre>
# U-boot setup Pt.1
@@ -125,4 +126,16 @@ cp arch/arm/boot/zImage PATH_TO_BOOTFS
</pre>
# U-boot setup Pt.2
+Now we create a file called boot.txt in our boot partition, and it should contain the following:
+<pre>
+setenv bootargs 'root=/dev/mmcblk0p2 rw rootwait console=tty0 console=ttySAC1,115200n8 mem=2047M'
+fatload mmc 0:1 0x40008000 zImage
+bootm 0x40008000
+</pre>
+
+Then run the following to create boot.scr, the file that u-boot looks for:
+<pre>
+mkimage -A arm -O linux -T script -C none -a 0 -e 0 -n "BOOT Script for ODROID-X2" -d boot.txt boot.scr
+</pre>
+
# Mali binaries
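The two boot.scr steps above can be scripted so that the boot script is easy to regenerate after a change. This is a sketch under stated assumptions: `write_boot_txt` and `make_boot_scr` are hypothetical helper names, and the U-Boot load command is passed in as a parameter, because which one works (`ext2load` or `fatload`) depends on the filesystem of the boot partition and on what the vendor U-Boot build supports. The bootargs line and the mkimage invocation are copied from the page.

```shell
#!/bin/sh
# write_boot_txt DIR LOAD_CMD (hypothetical helper)
# Writes DIR/boot.txt; LOAD_CMD is the U-Boot load command to use,
# e.g. ext2load or fatload (which one is right is setup-dependent).
write_boot_txt() {
    cat > "$1/boot.txt" <<EOF
setenv bootargs 'root=/dev/mmcblk0p2 rw rootwait console=tty0 console=ttySAC1,115200n8 mem=2047M'
$2 mmc 0:1 0x40008000 zImage
bootm 0x40008000
EOF
}

# make_boot_scr DIR (hypothetical helper)
# Wraps boot.txt into boot.scr with the mkimage call from the page.
make_boot_scr() {
    (cd "$1" && mkimage -A arm -O linux -T script -C none -a 0 -e 0 \
        -n "BOOT Script for ODROID-X2" -d boot.txt boot.scr)
}

# Example: generate boot.txt in a scratch directory and show it.
scratch=$(mktemp -d)
write_boot_txt "$scratch" fatload
cat "$scratch/boot.txt"
```

Running `make_boot_scr` on the mounted boot partition then produces the boot.scr that u-boot looks for; it is not called in the example because it needs mkimage to be installed.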
Add line for installing zImage.
diff --git a/OdroidSetup.mdwn b/OdroidSetup.mdwn
index c3f370b..475aa35 100644
--- a/OdroidSetup.mdwn
+++ b/OdroidSetup.mdwn
@@ -119,6 +119,10 @@ Once that's done, run:
make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- INSTALL_MOD_PATH=/PATH_TO_ROOTFS/ modules_install
</pre>

+You can now copy the kernel image to the boot partition:
+<pre>
+cp arch/arm/boot/zImage PATH_TO_BOOTFS
+</pre>
# U-boot setup Pt.2

# Mali binaries
Add module installation.
diff --git a/OdroidSetup.mdwn b/OdroidSetup.mdwn
index 41b5f15..c3f370b 100644
--- a/OdroidSetup.mdwn
+++ b/OdroidSetup.mdwn
@@ -114,6 +114,11 @@ make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- -j5 zImage modules

Now go and make some tea :)

+Once that's done, run:
+<pre>
+make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- INSTALL_MOD_PATH=/PATH_TO_ROOTFS/ modules_install
+</pre>
+
# U-boot setup Pt.2

# Mali binaries
Add first part of kernel build info.
diff --git a/OdroidSetup.mdwn b/OdroidSetup.mdwn
index a272e2c..41b5f15 100644
--- a/OdroidSetup.mdwn
+++ b/OdroidSetup.mdwn
@@ -73,6 +73,47 @@ Pick a root filesystem laid out for arm hardfloat. My current preference is a Li

# Kernel build

+First, you need a clone of the odroid kernel.
+
+You can either clone an existing kernel tree, and then fetch the odroid one on top:
+
+<pre>
+git clone /home/user/kernel/linux-2.6/ kernel
+cd kernel/
+git remote rm origin
+git remote add origin https://github.com/hardkernel/linux.git
+git fetch
+git checkout odroid-3.0.y
+</pre>
+
+Or you can just make a quick copy of the top level tree, without downloading a full (and huge) git repository.
+
+<pre>
+git clone --depth 1 https://github.com/hardkernel/linux.git -b odroid-3.0.y kernel
+</pre>
+
+Make sure that your cross toolchain is in your path.
+
+You can now select one of many odroid machine targets, although i personally find "ubuntu" very shortsighted:
+
+<pre>
+ls arch/arm/configs/odroid*ubuntu*
+</pre>
+
+Here, we pick the odroid-x2 with mali enabled:
+
+<pre>
+make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- odroidx2_ubuntu_mali_defconfig
+</pre>
+
+After this we can build our kernel:
+
+<pre>
+make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- -j5 zImage modules
+</pre>
+
+Now go and make some tea :)
+
# U-boot setup Pt.2

# Mali binaries
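The configure and build steps above repeat the same ARCH and CROSS_COMPILE arguments, so a small wrapper can cut down on typos. The following is a sketch, not part of the original page: the helper names `run`, `kmake` and `build_kernel` and the `DRY_RUN`, `DEFCONFIG` and `JOBS` variables are assumptions made for illustration. By default it only prints the make commands; it is meant to be run from the top of the kernel tree with `DRY_RUN=0` once the output looks right.

```shell
#!/bin/sh
# Hypothetical wrapper around the kernel build commands from the page.
: "${CROSS_COMPILE:=arm-linux-gnueabihf-}"
: "${DEFCONFIG:=odroidx2_ubuntu_mali_defconfig}"
: "${JOBS:=5}"

# run CMD...: print the command when DRY_RUN is 1 (the default), else execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# kmake ARGS...: make with the cross-compile settings used on the page.
kmake() {
    run make ARCH=arm "CROSS_COMPILE=$CROSS_COMPILE" "$@"
}

# build_kernel: select the defconfig, then build zImage and modules.
build_kernel() {
    kmake "$DEFCONFIG" &&
    kmake "-j$JOBS" zImage modules
}

build_kernel
```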
Add rootfs description.
diff --git a/OdroidSetup.mdwn b/OdroidSetup.mdwn index a61288f..a272e2c 100644 --- a/OdroidSetup.mdwn +++ b/OdroidSetup.mdwn @@ -1,5 +1,7 @@ This document gathers all the necessary info to set up an SD-Card with a bootable gnu/linux, with your own kernel and with mali binaries installed. +Order your odroid with the uart module. As with any ARM device, serial is indispensable for debugging any boot failures. Make sure that you are using a recent enough driver for cp210x, this module only became useful after linux kernel 3.2, but the fixes to this specific module can easily be backported. Ask libv on irc for more info if you need this. + # SD-Card First off, clear out the first bits of the SD-Card for sanity's sake: @@ -35,10 +37,42 @@ The important thing to note is that the first partition should start at 3072, as Note that this for a 16GB card, actual offsets and sizes might look different for you. In this setup, 256MB was reserved for the boot partition, and the remainder was given for one big root filesystem. -# U-boot setup +Now format all partitions: +<pre> +mkfs.ext3 /dev/mmcblkX +</pre> + +# U-boot setup Pt.1 + +Samsung does currently not provide sources with its build of u-boot, so both Samsung and Hardkernel are violating the GPL. + +You can download a tarball with all the u-boot binaries from [here](http://www.mdrjr.net/odroid/mirror/BSPs/Alpha4/unpacked/boot.tar.gz) + +Untar this: + +<pre> +tar -zxvf boot.tar.gz +</pre> + +Then make the script in there executable: +<pre> +chmod +x sd_fusing.sh +</pre> + +And now make this script install all the blobs to your SD-Card: + +<pre> +./sd_fusing.sh /dev/mmcblkX +</pre> + +After that, your SD-Card should be bootable, if you have a uart, you should be able to see U-boot attempting to load already. + +# Root filesystem + +Pick a root filesystem laid out for arm hardfloat. My current preference is a Linaro ALIP style image, with lightdm and xfce. 
It can be downloaded [here](https://snapshots.linaro.org/quantal/images/alip). Once you have downloaded it, you can simply untar it in the root partition of your sd-card. After untarring, you need to move everything in the binary directory one level up. This is there to protect people from overwriting their main filesystem. Do not forget to remove SHA256SUMS :) # Kernel build -# Root fs +# U-boot setup Pt.2 # Mali binaries
Fill out SD-Card section.
diff --git a/OdroidSetup.mdwn b/OdroidSetup.mdwn
index 33d63c2..a61288f 100644
--- a/OdroidSetup.mdwn
+++ b/OdroidSetup.mdwn
@@ -2,8 +2,43 @@ This document gathers all the necessary info to set up an SD-Card with a bootabl

# SD-Card

+First off, clear out the first bits of the SD-Card for sanity's sake:
+
+<code>
+dd if=/dev/zero of=/dev/mmcblkX bs=1M count=5
+</code>
+
+Then set up some partitions on the SD-Card:
+
+<code>
+fdisk /dev/mmcblkX
+</code>
+
+And work it until it looks somewhat like this:
+
+<pre>
+Command (m for help): p
+
+Disk /dev/mmcblk0: 15.9 GB, 15931539456 bytes
+4 heads, 16 sectors/track, 486192 cylinders, total 31116288 sectors
+Units = sectors of 1 * 512 = 512 bytes
+Sector size (logical/physical): 512 bytes / 512 bytes
+I/O size (minimum/optimal): 512 bytes / 512 bytes
+Disk identifier: 0x834d1732
+
+        Device Boot      Start         End      Blocks   Id  System
+/dev/mmcblk0p1            3072      527359      262144   83  Linux
+/dev/mmcblk0p2          527360    31116287    15294464   83  Linux
+</pre>
+
+The important thing to note is that the first partition should start at 3072, as the space underneath is used by the u-boot and trustedzone binaries. It also might pay to provide a separate boot partition, with kernel images and u-boot script files. Apart from that, you are free to partition as you like, as long as you update u-boot script accordingly.
+
+Note that this is for a 16GB card; actual offsets and sizes might look different for you. In this setup, 256MB was reserved for the boot partition, and the remainder was given to one big root filesystem.
+
# U-boot setup

# Kernel build

+# Root fs
+
# Mali binaries
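For a card of a different size, the sector numbers to type into fdisk follow from simple arithmetic. The sketch below recomputes the values shown in the listing above from the three figures the page gives (512-byte sectors, first partition at sector 3072, 256MB boot partition); the variable names are chosen here for illustration only.

```shell
#!/bin/sh
# Recompute the partition boundaries from the fdisk listing.
SECTOR_BYTES=512     # sector size reported by fdisk
BOOT_START=3072      # space below is used by the u-boot and trustedzone binaries
BOOT_MIB=256         # size reserved for the boot partition
TOTAL_SECTORS=31116288

boot_sectors=$((BOOT_MIB * 1024 * 1024 / SECTOR_BYTES))
boot_end=$((BOOT_START + boot_sectors - 1))
root_start=$((boot_end + 1))
root_end=$((TOTAL_SECTORS - 1))

echo "boot: $BOOT_START-$boot_end ($((boot_sectors / 2)) 1K-blocks)"
echo "root: $root_start-$root_end ($(((root_end - root_start + 1) / 2)) 1K-blocks)"
```

With the numbers from the listing this gives 3072-527359 for the boot partition and 527360-31116287 for the root partition, matching the Start, End and Blocks columns.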
Initial structure.
diff --git a/OdroidSetup.mdwn b/OdroidSetup.mdwn
new file mode 100644
index 0000000..33d63c2
--- /dev/null
+++ b/OdroidSetup.mdwn
@@ -0,0 +1,9 @@
+This document gathers all the necessary info to set up an SD-Card with a bootable gnu/linux, with your own kernel and with mali binaries installed.
+
+# SD-Card
+
+# U-boot setup
+
+# Kernel build
+
+# Mali binaries
Fill out ODROID section. Links to hardkernel are deliberately not put in place, hardkernel does not wish to support the lima driver project.
diff --git a/Devices.mdwn b/Devices.mdwn index c941665..2b612d9 100644 --- a/Devices.mdwn +++ b/Devices.mdwn @@ -35,13 +35,19 @@ According to the [spec sheet](http://www.pointofview-online.com/showroom.php?sho # Exynos 4 (**GPL VIOLATOR**) +These SoCs are the best performing Mali-400 devices out there. They are proper speed-daemons. The exynos 42xx series has a dual A9, whereas the exynos 44xx series has a quad A9. All come with a Mali-400MP4. + All exynos 4 devices come with binary only u-boot. This means that Samsung, and its device makers, are violating the GPL. ## Origen Board (**GPL VIOLATOR**) ## ODROIDs (**GPL VIOLATOR**) +The Odroids are small developer boards with many possible connections. The Odroid-x2/u2 is hyperfast, as it can clock the 4 A9s to 2GHz, and the Mali-400MP4 can clock up to 640MHz. This makes for a nice high-end benchmarker, and a good comparison for the comparatively meek A10. + +Hardkernel tries to portray itself as open source friendly, but they have a lot to learn still. They are providing some sort of crazy android and ubuntu pre-made SD-card images, and even hand out Mali binaries for ubuntu. Hardkernel knows the pain of getting the Mali binaries built and integrated, yet they are not interested in cooperating with our project. They officially claim to have "community based" support, and we all know what that means. Since these devices are developer boards, they have a much longer life span than your average mobile phone or tablet. In the mid to long term, hardkernel, well, hardkernels customers, will end up depending on the support of the lima driver project. +Here is [[some_information|OdroidSetup]] on how to set up your own SD card with a custom built kernel and with mali binaries (which we need for reverse engineering). ## Samsung Galaxy S II (**GPL VIOLATOR**)
Expand allwinner section and mark samsung as a gpl violator.
diff --git a/Devices.mdwn b/Devices.mdwn index da1b79a..c941665 100644 --- a/Devices.mdwn +++ b/Devices.mdwn @@ -4,11 +4,15 @@ Be careful where you buy, most cheap shops will not ship from your country but w # AllWinner A10 -The allwinner A10 and A13 SoCs are currently the easiest and best supported targets for developing an open source driver for the ARM Mali. There is a very active open source community, called [linux-sunxi](http://linux-sunxi.org), to support these SoCs, and device support is growing rapidly. +The allwinner A10 and A13 SoCs are currently the easiest and best supported targets for developing an open source driver for the ARM Mali. -## Cubieboard (Open Source Hardware!) +These devices are a Cortex A8 capable of clocking little over 1GHz, comes with lots of expansion possibilities, even SATA. It features a Mali-400MP1, so it is not a stellar performer, but it more than makes up for that in availability and price, and openness. Allwinner itself is not directly supporting open source software, and would be a GPL violator in itself. But luckily, their lack of control on their device makers made the necessary code fall out through the cracks, and they are the most compliant of any chinese SoC maker today. -The [Cubieboard](http://cubieboard.org) comes with 512 or 1024 MB of DDR3 RAM, 4 GB of NAND flash storage, a microSD card slot, Fast Ethernet, USB host ports, a SATA port, HDMI output and can be had for as low as 49 USD. As of December 2012, it is currently only available for pre-order. +There is a very active open source community, called [linux-sunxi](http://linux-sunxi.org), to support these SoCs, and device support is growing rapidly. Check out [the main linux-sunxi page](http://linux-sunxi.org/Main_Page) to find out about the supported, and sometimes even fully open source, hardware available. 
+ +## Cubieboard (**Open Source Hardware!**) + +The [Cubieboard](http://cubieboard.org) comes with 512 or 1024 MB of DDR3 RAM, 4 GB of NAND flash storage, a microSD card slot, Fast Ethernet, USB host ports, a SATA port, HDMI output and can be had for as low as 49 USD. ## Gooseberry @@ -18,8 +22,6 @@ The [Gooseberry](http://gooseberry.atspace.co.uk/) board is actually a tablet bo The [Hackberry](https://www.miniand.com/products/Hackberry%20A10%20Developer%20Board) development board comes with 1 GB of DDR3 RAM, 4 GB of NAND flash storage, a full-size SDHC card slot, Fast Ethernet, USB host ports, built-in 802.11n Wi-Fi, HDMI output and can be had for 65 USD. -## Mele A1000 - # AMLogic 8726-M (Mali 400) ## Zenithink ZT-280 (**GPL VIOLATOR**) @@ -31,15 +33,19 @@ The ZT-280 range includes the C71, a 7" tablet with a capacitive display. Can be According to the [spec sheet](http://www.pointofview-online.com/showroom.php?shop_mode=product_detail&product_id=308) provided by its manufacturer/reseller, the ProTab 2XXL features a Mali-400 GPU. This tablet features a 10" capacitive touch-screen, and is very competetively priced - it retails for [about EUR 170](http://geizhals.eu/713232). Point of View publishes "Firmware Updates" in its somewhat chaotic [download area](http://downloads.pointofview-online.com/Drivers/), but there's no source code in sight anywhere. -# Exynos 4 +# Exynos 4 (**GPL VIOLATOR**) + +All exynos 4 devices come with binary only u-boot. This means that Samsung, and its device makers, are violating the GPL. + +## Origen Board (**GPL VIOLATOR**) + +## ODROIDs (**GPL VIOLATOR**) -## Origen Board -## ODROID -## Samsung Galaxy S II +## Samsung Galaxy S II (**GPL VIOLATOR**) -## Samsung Galaxy S III +## Samsung Galaxy S III (**GPL VIOLATOR**) # Exynos 5
Add link to linux-sunxi, and list allwinner first. It is our prime target today.
diff --git a/Devices.mdwn b/Devices.mdwn index 5f4e858..da1b79a 100644 --- a/Devices.mdwn +++ b/Devices.mdwn @@ -2,20 +2,11 @@ This page lists some of the available devices with a Mali GPU, together with som Be careful where you buy, most cheap shops will not ship from your country but will ship from China. This means that you might end up paying customs, and end up wasting some time at the customs office. -# AMLogic 8726-M (Mali 400) - -## Zenithink ZT-280 (**GPL VIOLATOR**) - -The ZT-280 range includes the C71, a 7" tablet with a capacitive display. Can be had for under EUR 100 these days, but add customs and postage to that. - - -## Point of View ProTab 2XXL (**GPL VIOLATOR**) - -According to the [spec sheet](http://www.pointofview-online.com/showroom.php?shop_mode=product_detail&product_id=308) provided by its manufacturer/reseller, the ProTab 2XXL features a Mali-400 GPU. This tablet features a 10" capacitive touch-screen, and is very competetively priced - it retails for [about EUR 170](http://geizhals.eu/713232). Point of View publishes "Firmware Updates" in its somewhat chaotic [download area](http://downloads.pointofview-online.com/Drivers/), but there's no source code in sight anywhere. - # AllWinner A10 -## Cubieboard +The allwinner A10 and A13 SoCs are currently the easiest and best supported targets for developing an open source driver for the ARM Mali. There is a very active open source community, called [linux-sunxi](http://linux-sunxi.org), to support these SoCs, and device support is growing rapidly. + +## Cubieboard (Open Source Hardware!) The [Cubieboard](http://cubieboard.org) comes with 512 or 1024 MB of DDR3 RAM, 4 GB of NAND flash storage, a microSD card slot, Fast Ethernet, USB host ports, a SATA port, HDMI output and can be had for as low as 49 USD. As of December 2012, it is currently only available for pre-order. 
@@ -29,6 +20,17 @@ The [Hackberry](https://www.miniand.com/products/Hackberry%20A10%20Developer%20B ## Mele A1000 +# AMLogic 8726-M (Mali 400) + +## Zenithink ZT-280 (**GPL VIOLATOR**) + +The ZT-280 range includes the C71, a 7" tablet with a capacitive display. Can be had for under EUR 100 these days, but add customs and postage to that. + + +## Point of View ProTab 2XXL (**GPL VIOLATOR**) + +According to the [spec sheet](http://www.pointofview-online.com/showroom.php?shop_mode=product_detail&product_id=308) provided by its manufacturer/reseller, the ProTab 2XXL features a Mali-400 GPU. This tablet features a 10" capacitive touch-screen, and is very competetively priced - it retails for [about EUR 170](http://geizhals.eu/713232). Point of View publishes "Firmware Updates" in its somewhat chaotic [download area](http://downloads.pointofview-online.com/Drivers/), but there's no source code in sight anywhere. + # Exynos 4 ## Origen Board
Carlos is not an anonymous user, but just a useless user.
This reverts commit 71a2ca94db7a71c268455a744d2708bb090ed774
diff --git a/index.mdwn b/index.mdwn index 150cc4f..89e6fde 100644 --- a/index.mdwn +++ b/index.mdwn @@ -3,7 +3,7 @@ Lima is an open source graphics driver which supports Mali-200 and Mali-400 GPUs. -The aim of this driver and others such as [freedreno](http://freedreno.github.com) is to finally bring all the advantages of open source software to ARM SoC graphics drivers. Currently, the sole availability of binary xcvdrivers is increasing development and maintenance overhead, while also reducing portability, compatibility and limiting choice. Anyone who has dealt with GPU support on ARM, be it for a linux with a GNU stack, or for an android, knows the pain of dealing with these binaries. +The aim of this driver and others such as [freedreno](http://freedreno.github.com) is to finally bring all the advantages of open source software to ARM SoC graphics drivers. Currently, the sole availability of binary drivers is increasing development and maintenance overhead, while also reducing portability, compatibility and limiting choice. Anyone who has dealt with GPU support on ARM, be it for a linux with a GNU stack, or for an android, knows the pain of dealing with these binaries. ## News ===
Carlos is not an anonymous user, but just a useless user.
This reverts commit da881376fedc3e8ed8dcc4377b62f6ae656a643e
diff --git a/index.mdwn b/index.mdwn index 32fedd9..150cc4f 100644 --- a/index.mdwn +++ b/index.mdwn @@ -3,7 +3,7 @@ Lima is an open source graphics driver which supports Mali-200 and Mali-400 GPUs. -The aim of this driver and others such as [freedreno](http://freedreno.github.com) is to finally bring all the advantages of open source software to ARM SoC graphics drivers. Currently, the sole availability of binary drivers is increasing development and maintenance overhead, while also reducing portability, compatibility and limiting choice. Anyone who has dealt with GPU support on ARM, be it for a linux with a GNU stack, or for an android, knows the pain of dealing with these binaries. +The aim of this driver and others such as [freedreno](http://freedreno.github.com) is to finally bring all the advantages of open source software to ARM SoC graphics drivers. Currently, the sole availability of binary xcvdrivers is increasing development and maintenance overhead, while also reducing portability, compatibility and limiting choice. Anyone who has dealt with GPU support on ARM, be it for a linux with a GNU stack, or for an android, knows the pain of dealing with these binaries. ## News === @@ -82,7 +82,3 @@ Please subscribe to our [mailinglist](http://vlists.pepperfish.net/cgi-bin/mailm === -PLEASE PROTECT THIS CONTENT FROM ANONYMOUS USERS - -GREETINGS, -ANONYMOUS USER
Carlos is not an anonymous user, but just a useless user.
This reverts commit f80dab29f9403a8e1c6da8028d61db2c71a938b8
diff --git a/index.mdwn b/index.mdwn index 246f68c..32fedd9 100644 --- a/index.mdwn +++ b/index.mdwn @@ -5,27 +5,11 @@ Lima is an open source graphics driver which supports Mali-200 and Mali-400 GPUs The aim of this driver and others such as [freedreno](http://freedreno.github.com) is to finally bring all the advantages of open source software to ARM SoC graphics drivers. Currently, the sole availability of binary drivers is increasing development and maintenance overhead, while also reducing portability, compatibility and limiting choice. Anyone who has dealt with GPU support on ARM, be it for a linux with a GNU stack, or for an android, knows the pain of dealing with these binaries. -=== - -PLEASE PROTECT THIS CONTENT FROM ANONYMOUS USERS - -GREETINGS, -ANONYMOUS USER - -=== ## News === * 2013-02-06: Libv blogs about [Quake 3 Arena running on top of the limare prototype driver](http://libv.livejournal.com/23886.html). * 2013-02-02: Libv talks about [Open ARM GPU drivers](https://fosdem.org/2013/schedule/event/operating_systems_open_arm_gpu/) at FOSDEM, and Connor talks about [his compiler work in the Xorg DevRoom](https://fosdem.org/2013/schedule/event/maliisa/). Public goes wild for Q3A running on limare, and Connor fills the DevRoom to the brim! * 2012-12-07: After what can only be described as an eternity, the LinuxTag demo code and tons of other changes have now been pushed to gitorious. Here is [a video of our brand new spinning companion cube](http://www.youtube.com/watch?v=k16ve88d-L0) spinning away at 60Hz. -=== - -PLEASE PROTECT THIS CONTENT FROM ANONYMOUS USERS - -GREETINGS, -ANONYMOUS USER - -=== * 2012-05-27: Linuxtag talk slides and a separate demo of limare was posted on [phoronix](http://www.phoronix.com/scan.php?page=news_item&px=MTEwODA). 
* 2012-05-26: Lima talk at [Linuxtag Berlin](http://www.linuxtag.org/2012/de/program/program/vortragsdetails.html?no_cache=1&talkid=481): Textured, lighted portal cube, spins away correctly [(full video)](http://blip.tv/opensuse/linuxtag2012-lima-liberating-arm-s-mali-gpu-6166702)! * 2012-04-14: Rob Clark announces the [freedreno project](http://bloggingthemonkey.blogspot.co.uk/2012/04/fighting-back-against-binary-blobs.html) inspired by the Lima approach @@ -37,14 +21,6 @@ ANONYMOUS USER * 2012-02-03: First public renders of [smoothed triangle, smoothed strip, smoothed fan, flat quad, triangle quad, smoothed lighted rotated cube](http://limadriver.org/content) * 2012-01-24: A new name has been chosen for the project: remali now becomes Lima! We now have a gitorious project, there is the #lima channel on freenode. A mailing list will be created soon. * 2012-01-23: [Codethink](http://www.codethink.co.uk/) puts out a [press release](http://www.prweb.com/releases/2012/1/prweb9130318.htm) for the business world. This is definitely not vaporware! -=== - -PLEASE PROTECT THIS CONTENT FROM ANONYMOUS USERS - -GREETINGS, -ANONYMOUS USER - -=== * 2012-01-21: Talk appears on [the FOSDEM schedule.](http://fosdem.org/2012/schedule/event/mali "Liberating ARM's Mali GPU")[The cat is out of the bag!](http://twitter.com/#!/codethink/status/160803588929626112) Story published by [phoronix](http://www.phoronix.com/vr.php?view=16971), hits [slashdot](http://linux.slashdot.org/story/12/01/21/0935248/coming-soon-an-open-source-reverse-engineered-mali-gpu-driver), [golem](http://www.golem.de/1201/89274.html), [pro-linux](http://www.pro-linux.de/news/1/17948/freier-treiber-fuer-mali-grafikprozessoren-angekuendigt.html) and [tweakers](http://tweakers.net/nieuws/79485/opensourcedriver-voor-arms-mali-gpu-in-ontwikkeling.html). 
## Software @@ -58,14 +34,7 @@ Documentation for the shader compiler, and the initial investigation of the inst === ### [Mali-400](Hardware#Mali-400): -=== - -PLEASE PROTECT THIS CONTENT FROM ANONYMOUS USERS - -GREETINGS, -ANONYMOUS USER -=== * [AMLogic 8726-M](Hardware#AMLogic+8726-M) (Zenithink C71) * [Allwinner A10](Hardware#Allwinner+A10) (Mele A1000, MK802) * [ST-Ericsson Novathor](Hardware#ST-Ericsson+Novathor) @@ -77,14 +46,7 @@ ANONYMOUS USER ## Documentation === -=== - -PLEASE PROTECT THIS CONTENT FROM ANONYMOUS USERS - -GREETINGS, -ANONYMOUS USER -=== The documentation is currently kept in the wiki, pages of interest are: Original (Falanx) datasheets: @@ -106,14 +68,6 @@ Lima Documents ## Contribute === -=== - -PLEASE PROTECT THIS CONTENT FROM ANONYMOUS USERS - -GREETINGS, -ANONYMOUS USER - -=== The Lima driver currently only has some preliminary and highly experimental support. This experimental phase is necessary to gain a full and complete understanding of how the Mali GPUs work. Once more is known, an actual graphics driver (most likely based off of Mesa/Gallium) can be written. There is a lot of interesting work that still needs to be done! @@ -132,5 +86,3 @@ PLEASE PROTECT THIS CONTENT FROM ANONYMOUS USERS GREETINGS, ANONYMOUS USER - -===
Carlos is not an anonymous user, but just a useless user.
This reverts commit a23a8009ccc54368db33c8c1b96a68c4de3e4d9e
diff --git a/index.mdwn b/index.mdwn index 9a6cd4a..246f68c 100644 --- a/index.mdwn +++ b/index.mdwn @@ -1,11 +1,3 @@ -=== - -PLEASE PROTECT THIS CONTENT FROM ANONYMOUS USERS - -GREETINGS, -ANONYMOUS USER - -=== # **Lima**: An open source graphics driver for ARM Mali GPUs ===
Carlos is not an anonymous user, but just a useless user.
This reverts commit ae53600143425a7d6bf7abebb26c9e6dc16797ae
diff --git a/index.mdwn b/index.mdwn index 3573463..9a6cd4a 100644 --- a/index.mdwn +++ b/index.mdwn @@ -23,6 +23,8 @@ ANONYMOUS USER === ## News === +* 2013-02-06: Libv blogs about [Quake 3 Arena running on top of the limare prototype driver](http://libv.livejournal.com/23886.html). +* 2013-02-02: Libv talks about [Open ARM GPU drivers](https://fosdem.org/2013/schedule/event/operating_systems_open_arm_gpu/) at FOSDEM, and Connor talks about [his compiler work in the Xorg DevRoom](https://fosdem.org/2013/schedule/event/maliisa/). Public goes wild for Q3A running on limare, and Connor fills the DevRoom to the brim! * 2012-12-07: After what can only be described as an eternity, the LinuxTag demo code and tons of other changes have now been pushed to gitorious. Here is [a video of our brand new spinning companion cube](http://www.youtube.com/watch?v=k16ve88d-L0) spinning away at 60Hz. ===
This reverts commit 8927230e9df7a8c63c997baa3bf707108e600842
diff --git a/index.mdwn b/index.mdwn index 9a6cd4a..3573463 100644 --- a/index.mdwn +++ b/index.mdwn @@ -23,8 +23,6 @@ ANONYMOUS USER === ## News === -* 2013-02-06: Libv blogs about [Quake 3 Arena running on top of the limare prototype driver](http://libv.livejournal.com/23886.html). -* 2013-02-02: Libv talks about [Open ARM GPU drivers](https://fosdem.org/2013/schedule/event/operating_systems_open_arm_gpu/) at FOSDEM, and Connor talks about [his compiler work in the Xorg DevRoom](https://fosdem.org/2013/schedule/event/maliisa/). Public goes wild for Q3A running on limare, and Connor fills the DevRoom to the brim! * 2012-12-07: After what can only be described as an eternity, the LinuxTag demo code and tons of other changes have now been pushed to gitorious. Here is [a video of our brand new spinning companion cube](http://www.youtube.com/watch?v=k16ve88d-L0) spinning away at 60Hz. ===
diff --git a/index.mdwn b/index.mdwn index 246f68c..9a6cd4a 100644 --- a/index.mdwn +++ b/index.mdwn @@ -1,3 +1,11 @@ +=== + +PLEASE PROTECT THIS CONTENT FROM ANONYMOUS USERS + +GREETINGS, +ANONYMOUS USER + +=== # **Lima**: An open source graphics driver for ARM Mali GPUs ===
diff --git a/index.mdwn b/index.mdwn index 32fedd9..246f68c 100644 --- a/index.mdwn +++ b/index.mdwn @@ -5,11 +5,27 @@ Lima is an open source graphics driver which supports Mali-200 and Mali-400 GPUs The aim of this driver and others such as [freedreno](http://freedreno.github.com) is to finally bring all the advantages of open source software to ARM SoC graphics drivers. Currently, the sole availability of binary drivers is increasing development and maintenance overhead, while also reducing portability, compatibility and limiting choice. Anyone who has dealt with GPU support on ARM, be it for a linux with a GNU stack, or for an android, knows the pain of dealing with these binaries. +=== + +PLEASE PROTECT THIS CONTENT FROM ANONYMOUS USERS + +GREETINGS, +ANONYMOUS USER + +=== ## News === * 2013-02-06: Libv blogs about [Quake 3 Arena running on top of the limare prototype driver](http://libv.livejournal.com/23886.html). * 2013-02-02: Libv talks about [Open ARM GPU drivers](https://fosdem.org/2013/schedule/event/operating_systems_open_arm_gpu/) at FOSDEM, and Connor talks about [his compiler work in the Xorg DevRoom](https://fosdem.org/2013/schedule/event/maliisa/). Public goes wild for Q3A running on limare, and Connor fills the DevRoom to the brim! * 2012-12-07: After what can only be described as an eternity, the LinuxTag demo code and tons of other changes have now been pushed to gitorious. Here is [a video of our brand new spinning companion cube](http://www.youtube.com/watch?v=k16ve88d-L0) spinning away at 60Hz. +=== + +PLEASE PROTECT THIS CONTENT FROM ANONYMOUS USERS + +GREETINGS, +ANONYMOUS USER + +=== * 2012-05-27: Linuxtag talk slides and a separate demo of limare was posted on [phoronix](http://www.phoronix.com/scan.php?page=news_item&px=MTEwODA). 
* 2012-05-26: Lima talk at [Linuxtag Berlin](http://www.linuxtag.org/2012/de/program/program/vortragsdetails.html?no_cache=1&talkid=481): Textured, lighted portal cube, spins away correctly [(full video)](http://blip.tv/opensuse/linuxtag2012-lima-liberating-arm-s-mali-gpu-6166702)! * 2012-04-14: Rob Clark announces the [freedreno project](http://bloggingthemonkey.blogspot.co.uk/2012/04/fighting-back-against-binary-blobs.html) inspired by the Lima approach @@ -21,6 +37,14 @@ The aim of this driver and others such as [freedreno](http://freedreno.github.co * 2012-02-03: First public renders of [smoothed triangle, smoothed strip, smoothed fan, flat quad, triangle quad, smoothed lighted rotated cube](http://limadriver.org/content) * 2012-01-24: A new name has been chosen for the project: remali now becomes Lima! We now have a gitorious project, there is the #lima channel on freenode. A mailing list will be created soon. * 2012-01-23: [Codethink](http://www.codethink.co.uk/) puts out a [press release](http://www.prweb.com/releases/2012/1/prweb9130318.htm) for the business world. This is definitely not vaporware! +=== + +PLEASE PROTECT THIS CONTENT FROM ANONYMOUS USERS + +GREETINGS, +ANONYMOUS USER + +=== * 2012-01-21: Talk appears on [the FOSDEM schedule.](http://fosdem.org/2012/schedule/event/mali "Liberating ARM's Mali GPU")[The cat is out of the bag!](http://twitter.com/#!/codethink/status/160803588929626112) Story published by [phoronix](http://www.phoronix.com/vr.php?view=16971), hits [slashdot](http://linux.slashdot.org/story/12/01/21/0935248/coming-soon-an-open-source-reverse-engineered-mali-gpu-driver), [golem](http://www.golem.de/1201/89274.html), [pro-linux](http://www.pro-linux.de/news/1/17948/freier-treiber-fuer-mali-grafikprozessoren-angekuendigt.html) and [tweakers](http://tweakers.net/nieuws/79485/opensourcedriver-voor-arms-mali-gpu-in-ontwikkeling.html). 
## Software @@ -34,7 +58,14 @@ Documentation for the shader compiler, and the initial investigation of the inst === ### [Mali-400](Hardware#Mali-400): +=== + +PLEASE PROTECT THIS CONTENT FROM ANONYMOUS USERS + +GREETINGS, +ANONYMOUS USER +=== * [AMLogic 8726-M](Hardware#AMLogic+8726-M) (Zenithink C71) * [Allwinner A10](Hardware#Allwinner+A10) (Mele A1000, MK802) * [ST-Ericsson Novathor](Hardware#ST-Ericsson+Novathor) @@ -46,7 +77,14 @@ Documentation for the shader compiler, and the initial investigation of the inst ## Documentation === +=== + +PLEASE PROTECT THIS CONTENT FROM ANONYMOUS USERS + +GREETINGS, +ANONYMOUS USER +=== The documentation is currently kept in the wiki, pages of interest are: Original (Falanx) datasheets: @@ -68,6 +106,14 @@ Lima Documents ## Contribute === +=== + +PLEASE PROTECT THIS CONTENT FROM ANONYMOUS USERS + +GREETINGS, +ANONYMOUS USER + +=== The Lima driver currently only has some preliminary and highly experimental support. This experimental phase is necessary to gain a full and complete understanding of how the Mali GPUs work. Once more is known, an actual graphics driver (most likely based off of Mesa/Gallium) can be written. There is a lot of interesting work that still needs to be done! @@ -86,3 +132,5 @@ PLEASE PROTECT THIS CONTENT FROM ANONYMOUS USERS GREETINGS, ANONYMOUS USER + +===
diff --git a/index.mdwn b/index.mdwn index 150cc4f..32fedd9 100644 --- a/index.mdwn +++ b/index.mdwn @@ -3,7 +3,7 @@ Lima is an open source graphics driver which supports Mali-200 and Mali-400 GPUs. -The aim of this driver and others such as [freedreno](http://freedreno.github.com) is to finally bring all the advantages of open source software to ARM SoC graphics drivers. Currently, the sole availability of binary xcvdrivers is increasing development and maintenance overhead, while also reducing portability, compatibility and limiting choice. Anyone who has dealt with GPU support on ARM, be it for a linux with a GNU stack, or for an android, knows the pain of dealing with these binaries. +The aim of this driver and others such as [freedreno](http://freedreno.github.com) is to finally bring all the advantages of open source software to ARM SoC graphics drivers. Currently, the sole availability of binary drivers is increasing development and maintenance overhead, while also reducing portability, compatibility and limiting choice. Anyone who has dealt with GPU support on ARM, be it for a linux with a GNU stack, or for an android, knows the pain of dealing with these binaries. ## News === @@ -82,3 +82,7 @@ Please subscribe to our [mailinglist](http://vlists.pepperfish.net/cgi-bin/mailm === +PLEASE PROTECT THIS CONTENT FROM ANONYMOUS USERS + +GREETINGS, +ANONYMOUS USER
diff --git a/index.mdwn b/index.mdwn index 89e6fde..150cc4f 100644 --- a/index.mdwn +++ b/index.mdwn @@ -3,7 +3,7 @@ Lima is an open source graphics driver which supports Mali-200 and Mali-400 GPUs. -The aim of this driver and others such as [freedreno](http://freedreno.github.com) is to finally bring all the advantages of open source software to ARM SoC graphics drivers. Currently, the sole availability of binary drivers is increasing development and maintenance overhead, while also reducing portability, compatibility and limiting choice. Anyone who has dealt with GPU support on ARM, be it for a linux with a GNU stack, or for an android, knows the pain of dealing with these binaries. +The aim of this driver and others such as [freedreno](http://freedreno.github.com) is to finally bring all the advantages of open source software to ARM SoC graphics drivers. Currently, the sole availability of binary xcvdrivers is increasing development and maintenance overhead, while also reducing portability, compatibility and limiting choice. Anyone who has dealt with GPU support on ARM, be it for a linux with a GNU stack, or for an android, knows the pain of dealing with these binaries. ## News ===
Add FOSDEM news.
diff --git a/index.mdwn b/index.mdwn index 51d2f5b..89e6fde 100644 --- a/index.mdwn +++ b/index.mdwn @@ -7,6 +7,8 @@ The aim of this driver and others such as [freedreno](http://freedreno.github.co ## News === +* 2013-02-06: Libv blogs about [Quake 3 Arena running on top of the limare prototype driver](http://libv.livejournal.com/23886.html). +* 2013-02-02: Libv talks about [Open ARM GPU drivers](https://fosdem.org/2013/schedule/event/operating_systems_open_arm_gpu/) at FOSDEM, and Connor talks about [his compiler work in the Xorg DevRoom](https://fosdem.org/2013/schedule/event/maliisa/). Public goes wild for Q3A running on limare, and Connor fills the DevRoom to the brim! * 2012-12-07: After what can only be described as an eternity, the LinuxTag demo code and tons of other changes have now been pushed to gitorious. Here is [a video of our brand new spinning companion cube](http://www.youtube.com/watch?v=k16ve88d-L0) spinning away at 60Hz. * 2012-05-27: Linuxtag talk slides and a separate demo of limare was posted on [phoronix](http://www.phoronix.com/scan.php?page=news_item&px=MTEwODA). * 2012-05-26: Lima talk at [Linuxtag Berlin](http://www.linuxtag.org/2012/de/program/program/vortragsdetails.html?no_cache=1&talkid=481): Textured, lighted portal cube, spins away correctly [(full video)](http://blip.tv/opensuse/linuxtag2012-lima-liberating-arm-s-mali-gpu-6166702)!
add a section on latencies (possibly incomplete?)
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index b1dbb60..2f01bb2 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -434,7 +434,7 @@ Unlike a normal CPU, there are no explicit output registers for the ALU's, nor a
# Temporaries
-Akin to the fragment shader, there are also temporaries, which unlike registers can be indexed using a base register/input. They share the same namespace and method of loading as uniforms. Storing temporaries uses the same fields as storing a register/varying does, except that the "temporary store flag" is enabled, the unknown field is changed, and the complex ALU is used to select the store address instead of the "varying/register store 0" and "varying/register store 1" fields. Also, read-after-write has a latency of 4 cycles (i.e. a temporary cannot be read until 4 instructions after it is written).
+Akin to the fragment shader, there are also temporaries, which unlike registers can be indexed using a base register/input. They share the same namespace and method of loading as uniforms. Storing temporaries uses the same fields as storing a register/varying does, except that the "temporary store flag" is enabled, the unknown field is changed, and the complex ALU is used to select the store address instead of the "varying/register store 0" and "varying/register store 1" fields, which are set to 0.
## Output Transformation
@@ -488,6 +488,10 @@ These are the known inputs:
28-31: Register 0 Output [-1, last instruction] (Register/Attribute)
Note: If attribute_load_en is disabled then the attribute slot can be used to load registers too.
+## Latencies
+
+Temporaries have a latency of 4 instructions, i.e. writes take 4 cycles to appear. Registers have a similar latency of 3 instructions. Writes to address registers 1-3 have a latency of 4 instructions. Writes to address register 0 (temporary store) have no latency though, so it can be set in the same instruction as the temporary store itself. The complex1 operation has a latency of 2 cycles.
+
Instruction format:
0-4: Multiply 0 Input A
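The latencies added in this change can be turned into a small scheduling check. This is only an illustrative sketch: `LATENCY`, `earliest_read` and `violates_latency` are hypothetical names, and it assumes that a latency of N means a value written in instruction i is first visible in instruction i + N, which the wiki text does not spell out.

```python
# Latencies in instructions, copied from the new "Latencies" section.
# The dictionary keys are made-up labels, not names used by any Lima tool.
LATENCY = {
    "temporary": 4,
    "register": 3,
    "address_register_1_to_3": 4,
    "address_register_0": 0,  # may be set in the same instruction as the store
    "complex1": 2,
}


def earliest_read(write_index, kind):
    """Return the first instruction index at which the written value is visible.

    Assumption: "latency of N" is taken to mean write_index + N.
    """
    return write_index + LATENCY[kind]


def violates_latency(write_index, read_index, kind):
    """Return True if a read at read_index would come too early."""
    return read_index < earliest_read(write_index, kind)
```

A scheduler could call `violates_latency` for every write/read pair and insert nop instructions until it returns False.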
remove the codethink logo
diff --git a/index.mdwn b/index.mdwn index fc6f7c6..51d2f5b 100644 --- a/index.mdwn +++ b/index.mdwn @@ -80,5 +80,3 @@ Please subscribe to our [mailinglist](http://vlists.pepperfish.net/cgi-bin/mailm === -<p class="alignright">The Lima driver is sponsored by <a href="http://www.codethink.co.uk/2012/01/23/open-source-graphics-drivers/"><img border="0" src="/codethink.png" alt="Codethink" /> -</a></p>
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index ad7e95b..b1dbb60 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -471,12 +471,13 @@ These are the known inputs:
0-3: Register 0 Output [0, current] (Register/Attribute)
4-7: Register 1 Output [0, current] (Register)
- 8-11: Unknown (Never seen)
+ 8: Unused, same as 21? (seen in m200_hw_workarounds.c nop shader)
+ 9-11: Unknown
12-15: Load Result [0, current] (Uniform/Temporary)
16,17: Accumulator 0,1 Output [-1, last instruction]
18,19: Multiplier 0,1 Output [-1, last instruction]
20: Passthrough Output [-1, last instruction]
- 21: Unused
+ 21: Unused/nop (i.e. this ALU is not used during this instruction)
22: Complex Output [-1, last instruction]
22: Identity/Passthrough (0 for add, 1 for multiply)
Accumulator 0,1 Input 1: add(a, -ident) means pass(a)
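The input list touched by this change reads naturally as a lookup table for the 5-bit ALU input fields. The sketch below is a hedged illustration, not code from the Lima tools: `INPUT_RANGES` and `describe_input` are hypothetical names, selectors 23-27 are not described in this log and are reported as "not listed", and what the returned offset selects (an ALU number, or presumably a component) is an interpretation.

```python
# (first, last, description) ranges copied from the input list in this diff.
INPUT_RANGES = [
    (0, 3, "Register 0 Output [0, current]"),
    (4, 7, "Register 1 Output [0, current]"),
    (8, 8, "Unused, same as 21?"),
    (9, 11, "Unknown"),
    (12, 15, "Load Result [0, current]"),
    (16, 17, "Accumulator Output [-1, last instruction]"),
    (18, 19, "Multiplier Output [-1, last instruction]"),
    (20, 20, "Passthrough Output [-1, last instruction]"),
    (21, 21, "Unused/nop"),
    (22, 22, "Complex Output [-1, last instruction]"),
    (28, 31, "Register 0 Output [-1, last instruction]"),
]


def describe_input(selector):
    """Map a 5-bit input selector to (description, offset within its range)."""
    if not 0 <= selector <= 31:
        raise ValueError("input selector is a 5-bit field")
    for first, last, name in INPUT_RANGES:
        if first <= selector <= last:
            return name, selector - first
    # Values that this log does not document.
    return "not listed", 0
```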
Add glAlphaFunc reference value.
diff --git a/Render_State.mdwn b/Render_State.mdwn
index ff8992e..3595aa5 100644
--- a/Render_State.mdwn
+++ b/Render_State.mdwn
@@ -47,6 +47,7 @@ The Mali render state is a record of 16 32-bit words (64 bytes). It consists of
0x1C [7] stencil test
00000000 00000000 11111111 11111111 GL_STENCIL_TEST (either all bits are set or not)
+ 00000000 11111111 00000000 00000000 glAlphaFunc reference value: 0.5 = 0x80, 1.0 = 0xFF.
0x20 [8] multisample
00000000 00000000 00000000 00000111 always set? could be another CompareFunc
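As a rough illustration of the new glAlphaFunc field, the reference value can be packed into bits 16-23 of render state word 7. This is a sketch under assumptions: the helper names are hypothetical, and the round-to-nearest conversion is a guess that merely agrees with the two data points given in the diff (0.5 = 0x80, 1.0 = 0xFF).

```python
ALPHA_REF_MASK = 0x00FF0000  # 00000000 11111111 00000000 00000000
ALPHA_REF_SHIFT = 16


def set_alpha_ref(word7, ref):
    """Return word 7 with the alpha reference field replaced.

    Assumption: ref in [0.0, 1.0] is scaled to 0..255 with round-to-nearest.
    """
    if not 0.0 <= ref <= 1.0:
        raise ValueError("alpha reference must be within [0.0, 1.0]")
    value = int(ref * 255 + 0.5)
    cleared = word7 & ~ALPHA_REF_MASK & 0xFFFFFFFF
    return cleared | (value << ALPHA_REF_SHIFT)


def get_alpha_ref(word7):
    """Extract the raw 8-bit alpha reference value from word 7."""
    return (word7 & ALPHA_REF_MASK) >> ALPHA_REF_SHIFT
```

The lower 16 bits (the GL_STENCIL_TEST field) are left untouched by `set_alpha_ref`.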
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index 0f19801..ad7e95b 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -113,7 +113,7 @@ There also exists various "pipeline registers" (four of them listed above) which
It seems that varyings (floats) can be loaded in aligned groups of 1, 2, or 4.
This specifies how many to load at once. Note that the alignment affects the addressing;
for example, loading from an index of x at an alignment of 4 is equivalent to loading from 2*x
- at an alignment of 2.
+ and 2*x+1 at an alignment of 2.
00 - no alignment (load 1 float)
01 - alignment by 2 (load 2 floats)
11 - alignment by 4 (load 4 floats)
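The corrected addressing rule can be checked with a few lines of arithmetic. The sketch below assumes that an index at a given alignment simply addresses whole groups of that size; `FLOATS_PER_LOAD` and `loaded_floats` are hypothetical names used only for illustration.

```python
# Alignment codes from the field description: 00, 01 and 11.
FLOATS_PER_LOAD = {0b00: 1, 0b01: 2, 0b11: 4}


def loaded_floats(index, alignment_code):
    """Return the float offsets covered by one varying load.

    Assumption: index counts groups of the selected size, so index x at
    alignment 4 starts at float 4*x.
    """
    count = FLOATS_PER_LOAD[alignment_code]
    start = index * count
    return list(range(start, start + count))
```

With this model, index x at alignment 4 covers the same floats as indices 2*x and 2*x+1 at alignment 2, which is what the fixed sentence states.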
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index dbe8fd2..0f19801 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -434,7 +434,7 @@ Unlike a normal CPU, there are no explicit output registers for the ALU's, nor a
# Temporaries
-Akin to the fragment shader, there are also temporaries, which unlike registers can be indexed using a base register/input. They share the same namespace and method of loading as uniforms. Storing temporaries uses the same fields as storing a register/varying does, except that the "temporary store flag" is enabled, the unknown field is changed, and the complex ALU is used to select the store address instead of the "varying/register store 0" and "varying/register store 1" fields. Also, it seems they have a latency of 4 cycles (i.e. a temporary cannot be read until 4 instructions after it is written).
+Akin to the fragment shader, there are also temporaries, which unlike registers can be indexed using a base register/input. They share the same namespace and method of loading as uniforms. Storing temporaries uses the same fields as storing a register/varying does, except that the "temporary store flag" is enabled, the unknown field is changed, and the complex ALU is used to select the store address instead of the "varying/register store 0" and "varying/register store 1" fields. Also, read-after-write has a latency of 4 cycles (i.e. a temporary cannot be read until 4 instructions after it is written).
## Output Transformation
@@ -560,6 +560,7 @@ Instruction format:
0 - multiply (out = a * b)
1 - complex 1 (inverse, inverse sqrt, etc.)
takes all four inputs as arguments
+ This instruction has a latency of 2 cycles.
3 - complex 2 (inverse, inverse sqrt, etc.)
takes first two inputs as arguments,
the other two are normal (multiply)
whoops
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn index ace9bf9..dbe8fd2 100644 --- a/Lima+ISA.mdwn +++ b/Lima+ISA.mdwn @@ -434,7 +434,7 @@ Unlike a normal CPU, there are no explicit output registers for the ALU's, nor a # Temporaries -Akin to the fragment shader, there are also temporaries, which unlike registers can be indexed using a base register/input. They share the same namespace and method of loading as uniforms. Storing temporaries uses the same fields as storing a register/varying does, except that the "temporary store flag" is enabled, the unknown field is changed, and the complex ALU is used to select the store address instead of the "varying/register store 0" and "varying/register store 1" fields. Also, it seems they have a latency of 6 cycles (i.e. a temporary cannot be read until 6 instructions after it is written). +Akin to the fragment shader, there are also temporaries, which unlike registers can be indexed using a base register/input. They share the same namespace and method of loading as uniforms. Storing temporaries uses the same fields as storing a register/varying does, except that the "temporary store flag" is enabled, the unknown field is changed, and the complex ALU is used to select the store address instead of the "varying/register store 0" and "varying/register store 1" fields. Also, it seems they have a latency of 4 cycles (i.e. a temporary cannot be read until 4 instructions after it is written). ## Output Transformation
mali gp temporary stuff
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index c7ae062..ace9bf9 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -434,7 +434,7 @@ Unlike a normal CPU, there are no explicit output registers for the ALU's, nor a
# Temporaries
-Akin to the fragment shader, there are also temporaries, which unlike registers can be indexed using a base register/input. They share the same namespace and method of loading as uniforms. Storing temporaries uses the same fields as storing a register/varying does, except that the "temporary store flag" is enabled, the unknown field is changed, and the complex ALU is used to select the store address instead of the "varying/register store 0" and "varying/register store 1" fields.
+Akin to the fragment shader, there are also temporaries, which unlike registers can be indexed using a base register/input. They share the same namespace and method of loading as uniforms. Storing temporaries uses the same fields as storing a register/varying does, except that the "temporary store flag" is enabled, the unknown field is changed, and the complex ALU is used to select the store address instead of the "varying/register store 0" and "varying/register store 1" fields. Also, it seems they have a latency of 6 cycles (i.e. a temporary cannot be read until 6 instructions after it is written).
## Output Transformation
@@ -547,6 +547,7 @@ Instruction format:
4 - inverse sqrt (Partial)
5 - reciprocal (Partial)
9 - passthrough
+ 10 - Set Address Register 0 & Address Register 1 from result of passthrough unit
12 - Set Address Register 0 (Temporary Store address)
13 - Set Address Register 1
14 - Set Address Register 2
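A disassembler could treat the Complex OpCode field as a plain lookup. The following is a minimal sketch, not the actual Lima disassembler: the names are hypothetical, the instruction is modelled as one 128-bit integer (4 words), and it assumes bit 0 is the least significant bit of that integer. Only opcodes that appear in this log are listed.

```python
# Complex OpCode values (bits 86-89) as they appear in this log.
COMPLEX_OPCODES = {
    0: "unused",
    2: "exp2 (partial)",
    3: "log2 (partial)",
    4: "inverse sqrt (partial)",
    5: "reciprocal (partial)",
    9: "passthrough",
    10: "set address registers 0 and 1 from passthrough result",
    12: "set address register 0 (temporary store address)",
    13: "set address register 1",
    14: "set address register 2",
}


def complex_opcode(instruction):
    """Extract bits 86-89, assuming bit 0 is the least significant bit."""
    return (instruction >> 86) & 0xF


def describe_complex(instruction):
    """Return a readable name for the complex opcode of an instruction."""
    op = complex_opcode(instruction)
    return COMPLEX_OPCODES.get(op, "not listed (%d)" % op)
```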
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index acc5951..c7ae062 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -471,7 +471,7 @@ These are the known inputs:
0-3: Register 0 Output [0, current] (Register/Attribute)
4-7: Register 1 Output [0, current] (Register)
- 9-11: Unknown (Never seen)
+ 8-11: Unknown (Never seen)
12-15: Load Result [0, current] (Uniform/Temporary)
16,17: Accumulator 0,1 Output [-1, last instruction]
18,19: Multiplier 0,1 Output [-1, last instruction]
diff --git a/index.mdwn b/index.mdwn index c34b4b3..fc6f7c6 100644 --- a/index.mdwn +++ b/index.mdwn @@ -36,7 +36,7 @@ Documentation for the shader compiler, and the initial investigation of the inst * [AMLogic 8726-M](Hardware#AMLogic+8726-M) (Zenithink C71) * [Allwinner A10](Hardware#Allwinner+A10) (Mele A1000, MK802) * [ST-Ericsson Novathor](Hardware#ST-Ericsson+Novathor) -* [Samsung Exynos](Hardware#Samsung+Exynos) (Galaxy S2/S3/Tab/Note) +* [Samsung Exynos](Hardware#Samsung+Exynos) (Galaxy S2/S3/Tab/Note, Samsung Chromebook) ### [Mali-200](Hardware#Mali-200):
Add new code push
diff --git a/index.mdwn b/index.mdwn index b6e2c65..c34b4b3 100644 --- a/index.mdwn +++ b/index.mdwn @@ -7,6 +7,7 @@ The aim of this driver and others such as [freedreno](http://freedreno.github.co ## News === +* 2012-12-07: After what can only be described as an eternity, the LinuxTag demo code and tons of other changes have now been pushed to gitorious. Here is [a video of our brand new spinning companion cube](http://www.youtube.com/watch?v=k16ve88d-L0) spinning away at 60Hz. * 2012-05-27: Linuxtag talk slides and a separate demo of limare was posted on [phoronix](http://www.phoronix.com/scan.php?page=news_item&px=MTEwODA). * 2012-05-26: Lima talk at [Linuxtag Berlin](http://www.linuxtag.org/2012/de/program/program/vortragsdetails.html?no_cache=1&talkid=481): Textured, lighted portal cube, spins away correctly [(full video)](http://blip.tv/opensuse/linuxtag2012-lima-liberating-arm-s-mali-gpu-6166702)! * 2012-04-14: Rob Clark announces the [freedreno project](http://bloggingthemonkey.blogspot.co.uk/2012/04/fighting-back-against-binary-blobs.html) inspired by the Lima approach
added Mele A1000
diff --git a/Devices.mdwn b/Devices.mdwn index 966b6c1..5f4e858 100644 --- a/Devices.mdwn +++ b/Devices.mdwn @@ -27,6 +27,8 @@ The [Gooseberry](http://gooseberry.atspace.co.uk/) board is actually a tablet bo The [Hackberry](https://www.miniand.com/products/Hackberry%20A10%20Developer%20Board) development board comes with 1 GB of DDR3 RAM, 4 GB of NAND flash storage, a full-size SDHC card slot, Fast Ethernet, USB host ports, built-in 802.11n Wi-Fi, HDMI output and can be had for 65 USD. +## Mele A1000 + # Exynos 4 ## Origen Board
added some Exynos 4 and 5 devices
diff --git a/Devices.mdwn b/Devices.mdwn index 7fe9c86..966b6c1 100644 --- a/Devices.mdwn +++ b/Devices.mdwn @@ -26,3 +26,25 @@ The [Gooseberry](http://gooseberry.atspace.co.uk/) board is actually a tablet bo ## Hackberry The [Hackberry](https://www.miniand.com/products/Hackberry%20A10%20Developer%20Board) development board comes with 1 GB of DDR3 RAM, 4 GB of NAND flash storage, a full-size SDHC card slot, Fast Ethernet, USB host ports, built-in 802.11n Wi-Fi, HDMI output and can be had for 65 USD. + +# Exynos 4 + +## Origen Board + +## ODROID + +## Samsung Galaxy S II + +## Samsung Galaxy S III + +# Exynos 5 + +This SoC incorporates the Mali-T604 GPU along with 2 Cortex-A15 cores. + +## Arndale Board + +## Samsung Chromebook XE303C12 + +This is, as of December 2012, the only ARM-based Chromebook. It costs 249 USD. + +## Google Nexus 10
added some AllWinner A10 boards
diff --git a/Devices.mdwn b/Devices.mdwn index b0de43f..7fe9c86 100644 --- a/Devices.mdwn +++ b/Devices.mdwn @@ -1,16 +1,28 @@ -This page lists some of the available devices with a mali GPU, together with some useful info about them. The GPL VIOLATOR status for most of the devices is pretty much a given at this point, so let's just mark devices as such unless proven otherwise. +This page lists some of the available devices with a Mali GPU, together with some useful info about them. The GPL VIOLATOR status for most of the devices is pretty much a given at this point, so let's just mark devices as such unless proven otherwise. Be careful where you buy, most cheap shops will not ship from your country but will ship from China. This means that you might end up paying customs, and end up wasting some time at the customs office. # AMLogic 8726-M (Mali 400) ## Zenithink ZT-280 (**GPL VIOLATOR**) -=== The ZT-280 range includes the C71, a 7" tablet with a capacitive display. Can be had for under EUR 100 these days, but add customs and postage to that. ## Point of View ProTab 2XXL (**GPL VIOLATOR**) -=== According to the [spec sheet](http://www.pointofview-online.com/showroom.php?shop_mode=product_detail&product_id=308) provided by its manufacturer/reseller, the ProTab 2XXL features a Mali-400 GPU. This tablet features a 10" capacitive touch-screen, and is very competetively priced - it retails for [about EUR 170](http://geizhals.eu/713232). Point of View publishes "Firmware Updates" in its somewhat chaotic [download area](http://downloads.pointofview-online.com/Drivers/), but there's no source code in sight anywhere. + +# AllWinner A10 + +## Cubieboard + +The [Cubieboard](http://cubieboard.org) comes with 512 or 1024 MB of DDR3 RAM, 4 GB of NAND flash storage, a microSD card slot, Fast Ethernet, USB host ports, a SATA port, HDMI output and can be had for as low as 49 USD. As of December 2012, it is currently only available for pre-order. 
+ +## Gooseberry + +The [Gooseberry](http://gooseberry.atspace.co.uk/) board is actually a tablet board. It comes with 4 GB of on-board storage, 802.11n Wi-Fi, HDMI, and a microSD card slot. Android 4.0 "Ice Cream Sandwich" is officially supported. + +## Hackberry + +The [Hackberry](https://www.miniand.com/products/Hackberry%20A10%20Developer%20Board) development board comes with 1 GB of DDR3 RAM, 4 GB of NAND flash storage, a full-size SDHC card slot, Fast Ethernet, USB host ports, built-in 802.11n Wi-Fi, HDMI output and can be had for 65 USD.
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index 8de419e..acc5951 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -541,8 +541,6 @@ Instruction format:
7 - max/logical or (a || b)
note: abs(a) is implemented as max(a, -a)
86-89: Complex OpCode
- For complex functions (rcp, sqrt, etc.), the inputs to the multiply ALU0 and
- the input to the complex ALU are the same value.
0 - unused
2 - exp2 (Partial)
3 - log2 (Partial)
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn index a28496d..8de419e 100644 --- a/Lima+ISA.mdwn +++ b/Lima+ISA.mdwn @@ -430,7 +430,7 @@ For more information on the disassembly/decompiling tools see [[Vertex+Disassemb The vertex shader has a scalar VLIW architecture. Each instruction has a field for 2 addition ALU's, 2 multiplication ALU's, a complex ALU, a passthrough ALU, an attribute load unit, a register load unit, a uniform/temporary load unit, and a varying/register/temporary store unit. Instructions are fixed-length - each instruction consists of 4 words. Constants are implemented internally by uniforms. -Unlike a normal CPU, there are no explicit output registers for the ALU's, nor are there any explicit input registers. Instead, the input field(s) for each ALU can directly reference the ALU results from previous instructions (see below). However, there are 16 registers (maybe less?) that can be used when two instructions are too far apart for one to reference the result of the other, or for special cases such as loops. Only one (4-component) register may be loaded & stored per instruction, and storing registers and temporaries shares some of the same fields as storing varyings. +Unlike a normal CPU, there are no explicit output registers for the ALU's, nor are there any explicit input registers. Instead, the input field(s) for each ALU can directly reference the ALU results from previous instructions (see below). However, there are 16 registers that can be used when two instructions cannot be scheduled so that one references the result of the other (either directly, or through one or more passthroughs), or for special cases such as loops. Only one (4-component) register may be loaded & stored per instruction, and storing registers and temporaries shares some of the same fields as storing varyings. # Temporaries
Added some new Exynos options.
diff --git a/Hardware.mdwn b/Hardware.mdwn index e240b13..c6c6cff 100644 --- a/Hardware.mdwn +++ b/Hardware.mdwn @@ -35,7 +35,7 @@ From a driver point of view, very few infrastructural changes are needed for sup ## Mali-T604/T658 -These unified shader designs were announced by ARM but are not currently shipping. Once this hardware is available to lima developers, support for it can be evaluated. +The T604 was first released in November 2012 as part of the Exynos 5250 chipset by Samsung, integrated in the Google Nexus 10 tablet and Samsung Chromebook. This and the as of yet unreleased T658 are of a unified shader design. Once this hardware is available to lima developers, support for it can be evaluated. # SoCs # === @@ -63,7 +63,9 @@ There is a pre-built image of Linaro Android with the Lima(re) demo included. Th ## Samsung Exynos -The [Samsung Exynos](http://en.wikipedia.org/wiki/Exynos) 42xx is a range of ARM Cortex A9 devices clocked between 1.2 and 1.8GHz. They are the only devices currently carrying a Mali-400MP4. The Exynos of course stars in the top selling, high end Samsung android based smartphones and tablets. The best sold phone of 2011, the Samsung Galaxy S II, comes with an Exynos. A [Single Board Computer with a 4210, called origen,](http://www.origenboard.org/) is available with android and ubuntu support. +The [Samsung Exynos](http://en.wikipedia.org/wiki/Exynos) 42xx is a range of ARM Cortex A9 devices clocked between 1.2 and 1.8GHz. They are the only devices currently carrying a Mali-400MP4. The Exynos of course stars in the top selling, high end Samsung android based smartphones and tablets. The best sold phone of 2011, the Samsung Galaxy S II, comes with an Exynos. A [Single Board Computer with a 4210, called origen,](http://www.origenboard.org/) is available with android and ubuntu support. Another option is the [Cotton Candy](http://www.cstick.com) by FXI Tech, a USB/HDMI thumb computer, which in its initial revision has the 4210. 
+ +Exynos 5250 (also seen as Exynos 5) is a dual-core ARM Cortex A15 device clocked at 1.7GHz with a Mali T604. The first releases were part of the first Google Nexus 10, and the first ARM-based Chromebook, both by Samsung. ## Telechips 8902
lower bits of frag shader address
diff --git a/Render_State.mdwn b/Render_State.mdwn
index 61d37d5..ff8992e 100644
--- a/Render_State.mdwn
+++ b/Render_State.mdwn
@@ -59,7 +59,10 @@ The Mali render state is a record of 16 32-bit words (64 bytes). It consists of
00000000 00000000 11110000 00000111 (default in GLES2)
00000000 00000000 11111000 00000111 (default in lima)
- 0x24 [9] shader address (16-aligned)
+ 0x24 [9] shader address
+
+ 11111111 11111111 11111111 11100000 Fragment shader address
+ 00000000 00000000 00000000 00011111 Size of first instruction
0x28 [10] varying types
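The split of word 9 into an address and a size field can be sketched as follows. This is an assumption-laden illustration: the helper names are hypothetical, the unit of the "size of first instruction" field is not given in the diff, and the 32-byte alignment check is inferred from the 27-bit address mask rather than stated anywhere.

```python
ADDRESS_MASK = 0xFFFFFFE0     # 11111111 11111111 11111111 11100000
FIRST_SIZE_MASK = 0x0000001F  # 00000000 00000000 00000000 00011111


def pack_shader_word(address, first_instruction_size):
    """Combine the fragment shader address and first instruction size."""
    if not 0 <= address <= 0xFFFFFFFF or address & FIRST_SIZE_MASK:
        # The low 5 bits are reused, so the address cannot carry them.
        raise ValueError("address must be a 32-bit value with low 5 bits clear")
    if not 0 <= first_instruction_size <= FIRST_SIZE_MASK:
        raise ValueError("size must fit in 5 bits")
    return address | first_instruction_size


def unpack_shader_word(word):
    """Split word 9 into (address, first_instruction_size)."""
    return word & ADDRESS_MASK, word & FIRST_SIZE_MASK
```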
and/or/xor are for scalar multiply ALU too
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index de1e2df..a28496d 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -207,6 +207,9 @@ There also exists various "pipeline registers" (four of them listed above) which
o - opcode:
00xxx - arg0 * arg1 * 2^x where x is in two's complement format
01000 - not(arg0)
+ 01001 - and(arg0, arg1)
+ 01010 - or(arg0, arg1)
+ 01011 - xor(arg0, arg1)
01100 - notEqual(arg0, arg1)
01101 - lessThan(arg0, arg1)
01110 - lessThanEqual(arg0, arg1)
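The opcode list for the scalar multiply ALU, including the three new logical operations, could be decoded along these lines. This is a hedged sketch with hypothetical names: it reads the `xxx` bits of `00xxx` as a 3-bit two's complement exponent (range -4 to 3), as the description says, and reports any opcode that does not appear in this diff as "not listed".

```python
# Named opcodes as they appear in this diff.
NAMED_OPCODES = {
    0b01000: "not",
    0b01001: "and",
    0b01010: "or",
    0b01011: "xor",
    0b01100: "notEqual",
    0b01101: "lessThan",
    0b01110: "lessThanEqual",
}


def decode_opcode(opcode):
    """Return (name, exponent) for a 5-bit opcode.

    For 00xxx the result is ("mul", x), meaning arg0 * arg1 * 2**x with x
    decoded as 3-bit two's complement. Other opcodes return (name, None).
    """
    if not 0 <= opcode <= 0b11111:
        raise ValueError("opcode is a 5-bit field")
    if opcode >> 3 == 0:
        x = opcode & 0b111
        if x & 0b100:
            x -= 8  # sign-extend the 3-bit value
        return "mul", x
    return NAMED_OPCODES.get(opcode, "not listed"), None
```

For example, `decode_opcode(0b00111)` yields a multiply scaled by 2**-1, i.e. the product halved.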
add link to list of texel formats
diff --git a/index.mdwn b/index.mdwn index 0dd4519..b6e2c65 100644 --- a/index.mdwn +++ b/index.mdwn @@ -61,6 +61,7 @@ Lima Documents * [[MBS+File+Format]] * [[Fragment+Shader+Backend]] * [[Render State]] +* [[Texel Formats]] ## Contribute ===
Big list of texel formats
diff --git a/Texel_Formats.mdwn b/Texel_Formats.mdwn new file mode 100644 index 0000000..dfbf9f2 --- /dev/null +++ b/Texel_Formats.mdwn @@ -0,0 +1,55 @@ +Like many GPUs, the Mali GPU supports a wide range of texel formats: + + alpha flags components + id bpp byt ia ha rb ro r g b a d s l i note + + 00 1 1 1 + 01 1 1 + + 1 + 02 1 1 1 + 03 2 1 + + 1 1 + 04 4 1 4 + 05 4 1 + + 4 + 06 4 1 4 + 07 4 1 + + + 1 1 1 1 + 08 8 1 + + 4 4 + 09 8 1 8 + 0A 8 1 + + 8 + 0B 8 1 8 + 0C 8 1 + 3 3 2 + 0D 8 1 + + + 2 2 2 2 + 0E 16 2 + 5 6 5 + 0F 16 2 + + + 5 5 5 1 + 10 16 2 + + + 4 4 4 4 + 11 16 1 + + 8 8 + 12 16 2 16 + 13 16 2 + + 16 + 14 16 2 16 + 15 N/A 1 8 8 8 + 16 32 1 + + + 8 8 8 8 + 17 32 1 + 8 8 8 + 18 32 4 + + + 10 10 10 2 + 19 32 4 + 11 11 10 + 1A 32 4 + 10 12 10 + 1B 32 2 + + 16 16 + 1C 64 2 + + + 16 16 16 16 + 1D 4 1 Paletted? + 1E 8 1 Paletted? + 20 4 1 ETC1_RGB8 (Ericcon Texture Compression) + 22 16 2 16 Float + 23 16 2 + + 16 Float + 24 16 2 16 Float + 25 32 2 + + 16 16 Float + 26 64 2 + + + 16 16 16 16 Float + 2C 32 4 24 8 Depth/stencil + 2D 64 4 + 2E 48 2 + 16 16 16 + 2F 48 2 + 16 16 16 Float + 32 32 4 ? + 3F 0 0 INVALID + + bpp: bits per pixel + byt: bytes per copy element + ia: is alpha + ha: has alpha + rb/ro: ? + r/g/b/a/d/s/l/i: red/green/blue/alpha/depth/stencil/luminance/intensity
some new bits
diff --git a/Render_State.mdwn b/Render_State.mdwn
index 959771b..61d37d5 100644
--- a/Render_State.mdwn
+++ b/Render_State.mdwn
@@ -22,6 +22,8 @@ The Mali render state is a record of 16 32-bit words (64 bytes). It consists of
0x0C [3] depth test
00000000 00000000 00000000 00000001 GL_DEPTH_TEST
00000000 00000000 00000000 00001110 depthFunc (CompareFunc)
+ 00000000 11111111 00000000 00000000 polygonOffset factor
+ 11111111 00000000 00000000 00000000 polygonOffset units
0x10 [4] depth range
11111111 11111111 00000000 00000000 max(nearVal, farVal)
@@ -47,27 +49,35 @@ The Mali render state is a record of 16 32-bit words (64 bytes). It consists of
00000000 00000000 11111111 11111111 GL_STENCIL_TEST (either all bits are set or not)
0x20 [8] multisample
+ 00000000 00000000 00000000 00000111 always set? could be another CompareFunc
+ 00000000 00000000 00000000 01101000 (0x00006800 "4x MSAA" in lima)
00000000 00000000 00000000 10000000 GL_SAMPLE_ALPHA_TO_COVERAGE
+ 00000000 00000000 00000001 00000000 GL_SAMPLE_ALPHA_TO_ONE
00000000 00000000 11110000 00000000 sampleCoverage (SampleCoverage)
+ 00000000 11000000 00000000 00000000 vertex selector? (00 GL_POINTS 01 GL_LINE* 10 GL_TRIANGLE*)
00000000 00000000 11110000 00000111 (default in GLES2)
00000000 00000000 11111000 00000111 (default in lima)
- 00000000 00000000 00000000 01101000 (0x00006800 "4x MSAA" in lima)
- 0x24 [9] shader address
+ 0x24 [9] shader address (16-aligned)
0x28 [10] varying types
- 0x2C [11] uniforms address
+ 0x2C [11] uniforms address (16-aligned)
- 0x30 [12] textures address
+ 0x30 [12] textures address (16-aligned)
0x34 [13] ?
+ 00000000 00000000 00000001 00000000 ? usually 1
+ 00000000 00000000 00000010 00000000 Enable early Z
+ 00000000 00000000 00010000 00000000 Enable pixel kill
- 0x38 [14] dither (and maybe more)
+ 0x38 [14] dither etc
+ 00000000 00000000 00010000 00000000 glFrontFace (0=GL_CCW, 1=GL_CW)
00000000 00000000 00100000 00000000 GL_DITHER
+ 00000000 00000001 00000000 00000000 set if(uniform_size) in Lima
- 0x3C [15] varyings address
+ 0x3C [15] varyings address (16-aligned)
## Bitfields
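The depth test word (0x0C) from the hunks above can be packed roughly like this. A sketch only: `pack_depth_word` is a hypothetical helper, and since the encoding of the polygonOffset fields (signedness, scaling) isn't documented here, it just masks the raw 8-bit values into place.

```python
# CompareFunc values as listed in the Bitfields section of the
# Render State page (3-bit field).
GL_LESS = 0b001
GL_ALWAYS = 0b111


def pack_depth_word(depth_test, depth_func, offset_factor=0, offset_units=0):
    """Pack render state word 3 (offset 0x0C). Hypothetical helper."""
    word = 0
    word |= 1 if depth_test else 0          # bit 0: GL_DEPTH_TEST
    word |= (depth_func & 0x7) << 1         # bits 1-3: depthFunc
    # Assumption: factor/units are stored as raw 8-bit fields; how the
    # GL float values map onto them is not documented above.
    word |= (offset_factor & 0xFF) << 16    # bits 16-23: polygonOffset factor
    word |= (offset_units & 0xFF) << 24     # bits 24-31: polygonOffset units
    return word
```

With depth testing enabled and `GL_LESS`, only the low nibble is populated (`0x00000003`).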
clarify first sentence a bit
diff --git a/Render_State.mdwn b/Render_State.mdwn
index 891a734..959771b 100644
--- a/Render_State.mdwn
+++ b/Render_State.mdwn
@@ -1,6 +1,6 @@
# Render state
-The Mali render state is a record of 16 32-bit words (64 bytes). It consists of mainly rasterizer state. When queuing the draw command it is passed `LIMA_PLBU_CMD_RSW_VERTEX_ARRAY` (see `vs_commands_draw_add` in the Lima source).
+The Mali render state is a record of 16 32-bit words (64 bytes). It consists of mainly rasterizer state. When queuing a draw command an address of such a structure is passed with `LIMA_PLBU_CMD_RSW_VERTEX_ARRAY` (see `vs_commands_draw_add` in the Lima source).
0x00 [0] blend color
00000000 00000000 00000000 11111111 blendColor blue component
add link to new Render State page
diff --git a/index.mdwn b/index.mdwn
index c07c28b..0dd4519 100644
--- a/index.mdwn
+++ b/index.mdwn
@@ -60,6 +60,7 @@ Lima Documents
* [[Mali_Offline_Shader_Compiler]]
* [[MBS+File+Format]]
* [[Fragment+Shader+Backend]]
+* [[Render State]]

## Contribute
===
Add my findings about the Mali render state word
diff --git a/Render_State.mdwn b/Render_State.mdwn
new file mode 100644
index 0000000..891a734
--- /dev/null
+++ b/Render_State.mdwn
@@ -0,0 +1,132 @@
+# Render state
+
+The Mali render state is a record of 16 32-bit words (64 bytes). It consists of mainly rasterizer state. When queuing the draw command it is passed `LIMA_PLBU_CMD_RSW_VERTEX_ARRAY` (see `vs_commands_draw_add` in the Lima source).
+
+ 0x00 [0] blend color
+ 00000000 00000000 00000000 11111111 blendColor blue component
+ 00000000 11111111 00000000 00000000 blendColor green component
+
+ 0x04 [1] blend color
+ 00000000 00000000 00000000 11111111 blendColor red component
+ 00000000 11111111 00000000 00000000 blendColor alpha component
+
+ 0x08 [2] alpha blend
+ 00000000 00000000 00000000 00000111 modeRGB (BlendEquation)
+ 00000000 00000000 00000000 00111000 modeAlpha (BlendEquation)
+ 00000000 00000000 00000111 11000000 srcRGB (ColorBlendFunc)
+ 00000000 00000000 11111000 00000000 dstRGB (ColorBlendFunc)
+ 00000000 00001111 00000000 00000000 srcAlpha (AlphaBlendFunc)
+ 00000000 11110000 00000000 00000000 dstAlpha (AlphaBlendFunc)
+ ???????? 00000000 00000000 00000000 always 11111100? (TODO: check whether this is GLES1 glAlphaFunc)
+
+ 0x0C [3] depth test
+ 00000000 00000000 00000000 00000001 GL_DEPTH_TEST
+ 00000000 00000000 00000000 00001110 depthFunc (CompareFunc)
+
+ 0x10 [4] depth range
+ 11111111 11111111 00000000 00000000 max(nearVal, farVal)
+ 00000000 00000000 11111111 11111111 min(nearVal, farVal)
+
+ 0x14 [5] stencil GL_FRONT
+ 00000000 00000000 00000000 00000111 func (CompareFunc)
+ 00000000 00000000 00000000 00111000 sfail (StencilOp)
+ 00000000 00000000 00000001 11000000 dpfail (StencilOp)
+ 00000000 00000000 00001110 00000000 dppass (StencilOp)
+ 00000000 11111111 00000000 00000000 ref
+ 11111111 00000000 00000000 00000000 mask
+
+ 0x18 [6] stencil GL_BACK
+ 00000000 00000000 00000000 00000111 func (CompareFunc)
+ 00000000 00000000 00000000 00111000 sfail (StencilOp)
+ 00000000 00000000 00000001 11000000 dpfail (StencilOp)
+ 00000000 00000000 00001110 00000000 dppass (StencilOp)
+ 00000000 11111111 00000000 00000000 ref
+ 11111111 00000000 00000000 00000000 mask
+
+ 0x1C [7] stencil test
+ 00000000 00000000 11111111 11111111 GL_STENCIL_TEST (either all bits are set or not)
+
+ 0x20 [8] multisample
+ 00000000 00000000 00000000 10000000 GL_SAMPLE_ALPHA_TO_COVERAGE
+ 00000000 00000000 11110000 00000000 sampleCoverage (SampleCoverage)
+
+ 00000000 00000000 11110000 00000111 (default in GLES2)
+ 00000000 00000000 11111000 00000111 (default in lima)
+ 00000000 00000000 00000000 01101000 (0x00006800 "4x MSAA" in lima)
+
+ 0x24 [9] shader address
+
+ 0x28 [10] varying types
+
+ 0x2C [11] uniforms address
+
+ 0x30 [12] textures address
+
+ 0x34 [13] ?
+
+ 0x38 [14] dither (and maybe more)
+ 00000000 00000000 00100000 00000000 GL_DITHER
+
+ 0x3C [15] varyings address
+
+## Bitfields
+
+ CompareFunc:
+ 000 GL_NEVER
+ 001 GL_LESS
+ 010 GL_EQUAL
+ 011 GL_LEQUAL
+ 100 GL_GREATER
+ 101 GL_NOTEQUAL
+ 110 GL_GEQUAL
+ 111 GL_ALWAYS
+
+ StencilOp:
+ 000 GL_KEEP
+ 001 GL_REPLACE
+ 010 GL_ZERO
+ 011 GL_INVERT
+ 100 GL_INCR_WRAP
+ 101 GL_DECR_WRAP
+ 110 GL_INCR
+ 111 GL_DECR
+
+ BlendEquation:
+ 000 GL_FUNC_SUBTRACT
+ 001 GL_FUNC_REVERSE_SUBTRACT
+ 010 GL_FUNC_ADD
+ 100 GL_MIN_EXT
+ 101 GL_MAX_EXT
+
+ ColorBlendFunc:
+ 00000 GL_SRC_COLOR
+ 00001 GL_DST_COLOR
+ 00010 GL_CONSTANT_COLOR
+ 00011 GL_ZERO
+ 00111 GL_SRC_ALPHA_SATURATE
+ 01000 GL_ONE_MINUS_SRC_COLOR
+ 01001 GL_ONE_MINUS_DST_COLOR
+ 01010 GL_ONE_MINUS_CONSTANT_COLOR
+ 01011 GL_ONE
+ 10000 GL_SRC_ALPHA
+ 10001 GL_DST_ALPHA
+ 11000 GL_ONE_MINUS_SRC_ALPHA
+ 11001 GL_ONE_MINUS_DST_ALPHA
+ 10010 GL_CONSTANT_ALPHA
+ 11010 GL_ONE_MINUS_CONSTANT_ALPHA
+
+ AlphaBlendFunc is the same as ColorBlendFunc, except that the upper bit is missing.
+ This can be the case because the upper bit determines _ALPHA or _COLOR, and for the alpha factor
+ these are equivalent.
+
+ SampleCoverage:
+ 0000 value=0.00 inverted=FALSE
+ 0001 value=0.25 inverted=FALSE
+ 0011 value=0.50 inverted=FALSE
+ 0111 value=0.75 inverted=FALSE
+ 1111 value=1.0 inverted=FALSE
+ 1111 value=0.00 inverted=TRUE
+ 1110 value=0.25 inverted=TRUE
+ 1100 value=0.50 inverted=TRUE
+ 1000 value=0.75 inverted=TRUE
+ 0000 value=1.00 inverted=TRUE
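The SampleCoverage table in the new page looks like a thermometer code over four samples, with the inverted case being the bitwise complement. Here is a sketch that reproduces the ten listed rows. `sample_coverage_bits` is a hypothetical name, and rounding to the nearest quarter is an assumption: only the five listed values are known.

```python
def sample_coverage_bits(value, inverted=False):
    """Return the 4-bit SampleCoverage field for glSampleCoverage(value, inverted).

    Hypothetical helper; reproduces the table on the Render State page.
    """
    # Assumption: the coverage value is quantised to quarters.
    n = round(value * 4)
    if not 0 <= n <= 4:
        raise ValueError("coverage value must be within [0, 1]")
    mask = (1 << n) - 1        # n low bits set: 0000, 0001, 0011, 0111, 1111
    if inverted:
        mask ^= 0xF            # complement within the 4-bit field
    return mask
```

According to the bit layout of word 8, the field would then be shifted left by 12 before being OR-ed into the multisample word.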
fix multiplier comparison opcode
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index aa697cf..de1e2df 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -193,7 +193,7 @@ There also exists various "pipeline registers" (four of them listed above) which
01010 - or(arg0, arg1)
01011 - xor(arg0, arg1)
01100 - notEqual(arg0, arg1)
- 01101 - greaterThan(arg0, arg1)
+ 01101 - lessThan(arg0, arg1)
01110 - lessThanEqual(arg0, arg1)
01111 - equal(arg0, arg1)
10000 - min(arg0, arg1)
@@ -208,7 +208,7 @@ There also exists various "pipeline registers" (four of them listed above) which
00xxx - arg0 * arg1 * 2^x where x is in two's complement format
01000 - not(arg0)
01100 - notEqual(arg0, arg1)
- 01101 - greaterThan(arg0, arg1)
+ 01101 - lessThan(arg0, arg1)
01110 - lessThanEqual(arg0, arg1)
10001 - max(arg0, arg1)
10000 - min(arg0, arg1)
SATT is also a possibility for table
diff --git a/MBS+File+Format.mdwn b/MBS+File+Format.mdwn
index 7fa28ee..e54a9a2 100644
--- a/MBS+File+Format.mdwn
+++ b/MBS+File+Format.mdwn
@@ -59,7 +59,7 @@
}
table {
- chunk header; // ="SUNI"/"SVAR"
+ chunk header; // ="SUNI"/"SVAR"/"SATT"
uint32_t count;
symbol symbols[count];
}
add implementation of asin and acos
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index 6e61dda..aa697cf 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -339,6 +339,10 @@ There also exists various "pipeline registers" (four of them listed above) which
$temp.x *= $temp.y;
result = atan_pt2 $temp;
+ asin and acos are implemented using atan2, as follows:
+ asin(x) = atan2(x, sqrt(1 - x^2))
+ acos(x) = atan2(sqrt(1 - x^2), x)
+
atan_pt1:
dd ddmm mmaa aaaa AAbb bbbb BBoo oo01
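The asin/acos identities added in this hunk are easy to sanity-check on the host. The sketch below uses Python's `math.atan2` as a stand-in for the hardware atan2 sequence; the function names are made up for illustration.

```python
import math


def asin_via_atan2(x):
    # asin(x) = atan2(x, sqrt(1 - x^2)), valid for -1 <= x <= 1
    return math.atan2(x, math.sqrt(1.0 - x * x))


def acos_via_atan2(x):
    # acos(x) = atan2(sqrt(1 - x^2), x), valid for -1 <= x <= 1
    return math.atan2(math.sqrt(1.0 - x * x), x)
```

Both forms stay well defined at the endpoints x = -1 and x = 1, where the square root is zero and atan2 still returns the expected angle.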
fill in more of the vertex shader format (mainly from mbs_dump)
diff --git a/MBS+File+Format.mdwn b/MBS+File+Format.mdwn
index 38ed1b5..7fa28ee 100644
--- a/MBS+File+Format.mdwn
+++ b/MBS+File+Format.mdwn
@@ -34,7 +34,7 @@
}
symbol {
- chunk header; // ="VUNI"/"VVAR"
+ chunk header; // ="VUNI"/"VVAR"/"VATT"
string symbol;
uint8_t unknown_0; // =0x00
// type:
@@ -71,6 +71,9 @@
frag {
chunk header; // ="CFRA"
+ // version (seems _mali_core_type from mali_ioctl.h)
+ // 0x05 MALI_200
+ // 0x07 MALI_400_PP
uint32_t version; // =5
frag_sta sta;
frag_dis dis;
@@ -80,8 +83,28 @@
dbin code;
}
+ vert_fins {
+ chunk header; // ="FINS"
+ uint32_t unknown_0;
+ uint32_t instructions;
+ uint32_t attrib_prefetch;
+ }
+
+ vert {
+ chunk header; // ="CVER"
+ // version (seems _mali_core_type from mali_ioctl.h)
+ // 0x02 MALI_GP2
+ // 0x06 MALI_400_GP
+ uint32_t version;
+ vert_fins fins;
+ table uniforms; // ="SUNI"
+ table attributes; // ="SATT"
+ table varyings; // ="SVAR"
+ dbin code;
+ }
file {
chunk header; // ="MBS1"
frag fragment;
+ vert vertex;
}
fill in FSTA, FDIS, FBUU
diff --git a/MBS+File+Format.mdwn b/MBS+File+Format.mdwn
index 03bcf8c..38ed1b5 100644
--- a/MBS+File+Format.mdwn
+++ b/MBS+File+Format.mdwn
@@ -12,19 +12,25 @@
frag_sta {
chunk header; // ="FSTA"
- uint32_t unknown_0; // =1
- uint32_t unknown_1; // =1
+ uint32_t stacksize; // fragment stack size
+ uint32_t stackofs; // starting offset
}
frag_dis {
chunk header; // ="FDIS"
- uint32_t unknown_0; // =0
+ uint32_t discard; // 1 if shader has discard instruction
}
frag_buu {
chunk header; // ="FBUU"
- uint32_t unknown_0; // =256
- uint32_t unknown_1; // =0
+ uint8_t reads_color; // gl_FBColor
+ uint8_t writes_color; // gl_FragColor
+ uint8_t reads_depth; // gl_FBDepth
+ uint8_t writes_depth; // ? gl_FragDepth (not supported in GLES2)
+ uint8_t reads_stencil; // gl_FBStencil
+ uint8_t writes_stencil; // ? gl_FragStencil (not supported in GLES2)
+ uint8_t unknown_0;
+ uint8_t unknown_1;
}
symbol {
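The FBUU payload from the hunk above is eight single-byte flags, so it can be unpacked directly. This is a sketch: `parse_fbuu_payload` and `FragBuu` are hypothetical names, and it assumes the caller has already stripped the chunk header, whose layout isn't shown in this diff.

```python
import struct
from collections import namedtuple

# Field names follow the frag_buu description in the hunk above.
FragBuu = namedtuple(
    "FragBuu",
    "reads_color writes_color reads_depth writes_depth "
    "reads_stencil writes_stencil unknown_0 unknown_1",
)


def parse_fbuu_payload(payload):
    """Unpack the 8 payload bytes of an FBUU chunk (header already removed)."""
    if len(payload) != 8:
        raise ValueError("FBUU payload is expected to be 8 bytes")
    return FragBuu(*struct.unpack("8B", payload))
```

If the file is little-endian (an assumption), the old `unknown_0 = 256`, `unknown_1 = 0` reading corresponds to the bytes `00 01 00 00 00 00 00 00`, i.e. `writes_color = 1` and everything else zero, which fits a shader that only writes gl_FragColor.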
describe types: add struct, samplerExternalOES
diff --git a/MBS+File+Format.mdwn b/MBS+File+Format.mdwn
index 56d3e0e..03bcf8c 100644
--- a/MBS+File+Format.mdwn
+++ b/MBS+File+Format.mdwn
@@ -31,7 +31,16 @@
chunk header; // ="VUNI"/"VVAR"
string symbol;
uint8_t unknown_0; // =0x00
- uint8_t type; // =0x00
+ // type:
+ // 0x01 float
+ // 0x02 int
+ // 0x03 bool
+ // 0x04 matrix
+ // 0x05 sampler2D
+ // 0x06 samplerCube
+ // 0x08 struct
+ // 0x09 samplerExternalOES
+ uint8_t type;
uint16_t component_count;
uint16_t component_size;
uint16_t entry_count;
@@ -40,7 +49,7 @@
uint8_t precision;
uint32_t invariant; // 1 if "invariant" keyword specified, otherwise 0
uint16_t offset;
- uint16_t index; // Usually -1 (0xFFFF)
+ uint16_t index; // Usually -1 (0xFFFF) otherwise index of parent struct
}
table {
chunk vuni/vvar: invariant
diff --git a/MBS+File+Format.mdwn b/MBS+File+Format.mdwn
index 3127f20..56d3e0e 100644
--- a/MBS+File+Format.mdwn
+++ b/MBS+File+Format.mdwn
@@ -30,15 +30,15 @@
symbol {
chunk header; // ="VUNI"/"VVAR"
string symbol;
- uint8_t type; // =0x00
uint8_t unknown_0; // =0x00
+ uint8_t type; // =0x00
uint16_t component_count;
uint16_t component_size;
uint16_t entry_count;
uint16_t src_stride;
uint8_t dst_stride;
uint8_t precision;
- uint32_t unknown_1; // =0x00000000
+ uint32_t invariant; // 1 if "invariant" keyword specified, otherwise 0
uint16_t offset;
uint16_t index; // Usually -1 (0xFFFF)
}
add logical and/or/xor
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index 2c06e6a..6e61dda 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -189,6 +189,9 @@ There also exists various "pipeline registers" (four of them listed above) which
Opcode:
00xxx - arg0 * arg1 * 2^x, where x is in two's-complement format
01000 - not(arg0)
+ 01001 - and(arg0, arg1)
+ 01010 - or(arg0, arg1)
+ 01011 - xor(arg0, arg1)
01100 - notEqual(arg0, arg1)
01101 - greaterThan(arg0, arg1)
01110 - lessThanEqual(arg0, arg1)
note additional multiply for atan2
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index 348bed8..2c06e6a 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -331,6 +331,11 @@ There also exists various "pipeline registers" (four of them listed above) which
atan_pt1 takes the (scalar) input and produces a 3-component vector.
atan_pt2 takes the vector and produces the final output.
+ Unlike atan, atan2 requires an additional multiply between atan2_pt1 and atan_pt2:
+ $temp.xyz = atan2_pt1 y, x;
+ $temp.x *= $temp.y;
+ result = atan_pt2 $temp;
+
atan_pt1:
dd ddmm mmaa aaaa AAbb bbbb BBoo oo01
note special varying source values for inputting into textureCube()
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index e5994dd..348bed8 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -127,8 +127,10 @@ There also exists various "pipeline registers" (four of them listed above) which
However, I haven't been able to test this theory because I haven't gotten the compiler
to produce a value for O other than 11.
s - source:
- 00pp - varying
+ 00pp - normal varying
01pp - register (see second instruction format)
+ 1000 - varying, input to textureCube()
+ 1001 - register, input to textureCube()
1011 - gl_FragCoord
1100 - gl_PointCoord
1101 - gl_FrontFacing
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index b639a37..e5994dd 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -381,7 +381,7 @@ There also exists various "pipeline registers" (four of them listed above) which
control[16], Branch/Discard
Branch:
- 0 0011 tttt tttt tttt tttt tttt tttt ttt0 0000 0000 0000 0000 0000 0ccc aaaa aabb bbbb 0000
+ 0 0011 tttt tttt tttt tttt tttt tttt ttt0 0000 0000 0000 0000 0000 0ccc aaaa aabb bbbb 0000
Discard:
0 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0111 1111 0000 0000 0000 0011
add discard
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index c33dc9c..b639a37 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -32,7 +32,7 @@ Apparently, most GPU's processes fragments in groups of 2x2; I suspect ours does
13: {31, 1} Scalar Addition ALU
14: {30, 1} Vec4-Scalar Multiply/Transcendental Scalar ALU
15: {41, 1} Temporary Write/Framebuffer Read
- 16: {73, 2} Branch
+ 16: {73, 2} Branch/Discard
17: {64, 2} Vec4 Constant Fetch 0
18: {64, 2} Vec4 Constant Fetch 1
19..24: { } Scheduling
@@ -378,10 +378,14 @@ There also exists various "pipeline registers" (four of them listed above) which
Note: since gl_FBDepth is a float, and the alignment is set to 1,
this instr will always set the x component of the specified destination register.
- control[16], branch
+ control[16], Branch/Discard
+ Branch:
0 0011 tttt tttt tttt tttt tttt tttt ttt0 0000 0000 0000 0000 0000 0ccc aaaa aabb bbbb 0000
+ Discard:
+ 0 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0111 1111 0000 0000 0000 0011
+
c - condition:
bit 0 - jump if a > b
bit 1 - jump if a = b
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index 995a488..c33dc9c 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -101,7 +101,6 @@ There also exists various "pipeline registers" (four of them listed above) which
control[7], Varying Fetch
- 0xmdii3C60
00 mmmm dddd iiii iiOO 00oo oo00 0aa0 ssss
Or, for loading from a register (used for loading texture coordinates from a register):
00 mmmm dddd SSSS SSSS Anrr rr00 0000 01pp
Galaxy Note (GT-N7000) also has Mali-400
diff --git a/index.mdwn b/index.mdwn
index 8fe3c3a..c07c28b 100644
--- a/index.mdwn
+++ b/index.mdwn
@@ -35,7 +35,7 @@ Documentation for the shader compiler, and the initial investigation of the inst
* [AMLogic 8726-M](Hardware#AMLogic+8726-M) (Zenithink C71)
* [Allwinner A10](Hardware#Allwinner+A10) (Mele A1000, MK802)
* [ST-Ericsson Novathor](Hardware#ST-Ericsson+Novathor)
-* [Samsung Exynos](Hardware#Samsung+Exynos) (Galaxy S2/S3/Tab)
+* [Samsung Exynos](Hardware#Samsung+Exynos) (Galaxy S2/S3/Tab/Note)

### [Mali-200](Hardware#Mali-200):
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index 85a6ff4..995a488 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -155,13 +155,6 @@ There also exists various "pipeline registers" (four of them listed above) which
The coordinates for the texture fetch are always the output of the varying load.
- The actual sampler index (i.e. which sampler unit to use) is passed in by the driver
- as a uniform vec2, as indicated in the symbol table, and read by the sampler unit before
- actually performing the texture sample. I suspect the use of 2 indices may have to do with
- the "virtualized textures" feature (see datasheet on front page), but the driver doesn't
- seem to implement this. Note that the index is aligned, just like for varying vec2 loads,
- so, for example, an index of 1 tells the processor to load uniform[0].zw.
-
s - sampler index (offset into uniform table)
o - sampler index register offset enable
c - sampler index offset register
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index cdbaf01..85a6ff4 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -247,7 +247,7 @@ There also exists various "pipeline registers" (four of them listed above) which
01110 - min(arg0, arg1)
10000 - sum3 - dest.xyzw = sum of first 3 components of arg1
10001 - sum4 - dest.xyzw = sum of all components of arg1
- Note: the output is broadcast to all channels -
+ Note: for sum3 and sum4, the output is broadcast to all channels -
you can use the write mask to select which component to write to
10100 - dFdx(arg0, arg1)
10101 - dFdy(arg0, arg1)
@@ -383,8 +383,8 @@ There also exists various "pipeline registers" (four of them listed above) which
s - source
11 - gl_FBColor
10 - gl_FBDepth
- Note: since gl_FBDepth is a float, this instr will always set the x component
- of the specified destination register.
+ Note: since gl_FBDepth is a float, and the alignment is set to 1,
+ this instr will always set the x component of the specified destination register.
control[16], branch
update sum3 and sum4
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index c17bd74..cdbaf01 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -245,7 +245,10 @@ There also exists various "pipeline registers" (four of them listed above) which
01010 - lessThanEqual(arg0, arg1)
01111 - max(arg0, arg1)
01110 - min(arg0, arg1)
- 10001 - dest.w = sum of all components of arg1
+ 10000 - sum3 - dest.xyzw = sum of first 3 components of arg1
+ 10001 - sum4 - dest.xyzw = sum of all components of arg1
+ Note: the output is broadcast to all channels -
+ you can use the write mask to select which component to write to
10100 - dFdx(arg0, arg1)
10101 - dFdy(arg0, arg1)
Note: dFdx(x) is actually implemented as dFdx(-x, x) (same for dFdy)
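The broadcast-plus-write-mask behaviour of sum3 and sum4 described in this hunk can be modelled as follows. A sketch with made-up names; it only models the data flow, not the instruction encoding.

```python
def sum_op(arg1, write_mask, dest, three=False):
    """Model sum3/sum4: the scalar sum is broadcast to every channel and the
    write mask picks which destination components actually receive it.

    arg1, dest: sequences of 4 numbers (x, y, z, w)
    write_mask: sequence of 4 booleans (x, y, z, w)
    """
    total = sum(arg1[:3]) if three else sum(arg1[:4])
    return [total if enabled else old for old, enabled in zip(dest, write_mask)]
```

So the older "dest.w = sum of all components" reading is just sum4 with a write mask that enables only w.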
diff --git a/index.mdwn b/index.mdwn
index 64e9df3..8fe3c3a 100644
--- a/index.mdwn
+++ b/index.mdwn
@@ -33,7 +33,7 @@ Documentation for the shader compiler, and the initial investigation of the inst
### [Mali-400](Hardware#Mali-400):

* [AMLogic 8726-M](Hardware#AMLogic+8726-M) (Zenithink C71)
-* [Allwinner A10](Hardware#Allwinner+A10)
+* [Allwinner A10](Hardware#Allwinner+A10) (Mele A1000, MK802)
* [ST-Ericsson Novathor](Hardware#ST-Ericsson+Novathor)
* [Samsung Exynos](Hardware#Samsung+Exynos) (Galaxy S2/S3/Tab)
describe the texture sampler unit more thoroughly
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index 3824551..c17bd74 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -151,11 +151,18 @@ There also exists various "pipeline registers" (four of them listed above) which
control[8], Texture fetch
- 00111001000000000001ssssssssssssottttt00000b000000ccccccrrrrrr
+ 00 1110 0100 0000 0000 01ss ssss ssss ssot tttt 0000 0b00 0000 cccc ccrr rrrr
The coordinates for the texture fetch are always the output of the varying load.
- s - sampler index
+ The actual sampler index (i.e. which sampler unit to use) is passed in by the driver
+ as a uniform vec2, as indicated in the symbol table, and read by the sampler unit before
+ actually performing the texture sample. I suspect the use of 2 indices may have to do with
+ the "virtualized textures" feature (see datasheet on front page), but the driver doesn't
+ seem to implement this. Note that the index is aligned, just like for varying vec2 loads,
+ so, for example, an index of 1 tells the processor to load uniform[0].zw.
+
+ s - sampler index (offset into uniform table)
o - sampler index register offset enable
c - sampler index offset register
t - sampler type
@@ -223,8 +230,7 @@ There also exists various "pipeline registers" (four of them listed above) which
11 - round to integer
control[12], Vec4 Addition ALU
-
- 0xoaaA?b??
+
iooo ooMM mmmm dddd CCaa aaaa aaAA AADD bbbb bbbb BBBB
i - whether to get Argument 1 from the multiplication ALU (below)
added framebuffer read stuff
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index 2f8afb3..3824551 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -31,7 +31,7 @@ Apparently, most GPU's processes fragments in groups of 2x2; I suspect ours does
12: {44, 1} Vec4 Addition ALU
13: {31, 1} Scalar Addition ALU
14: {30, 1} Vec4-Scalar Multiply/Transcendental Scalar ALU
- 15: {41, 1} Temporary Write
+ 15: {41, 1} Temporary Write/Framebuffer Read
16: {73, 2} Branch
17: {64, 2} Vec4 Constant Fetch 0
18: {64, 2} Vec4 Constant Fetch 1
@@ -348,7 +348,9 @@ There also exists various "pipeline registers" (four of them listed above) which
a - source (vector)
A - swizzle descriptor
- control[15], Temporary Write
+ control[15], Temporary Write/Framebuffer Read
+
+ Temporary Write:
i iiii iiii iiii iiio rrrr rr00 0000 a0ss ssss 00dd
@@ -364,6 +366,17 @@ There also exists various "pipeline registers" (four of them listed above) which
o - register offset enable
r - offset register
+ Framebuffer Read:
+
+ 0 0000 0000 0000 0000 0000 0000 0000 10dd dd00 11ss
+
+ d - destination register
+ s - source
+ 11 - gl_FBColor
+ 10 - gl_FBDepth
+ Note: since gl_FBDepth is a float, this instr will always set the x component
+ of the specified destination register.
+
control[16], branch
0 0011 tttt tttt tttt tttt tttt tttt ttt0 0000 0000 0000 0000 0000 0ccc aaaa aabb bbbb 0000
add link to freedreno's swanky new site
diff --git a/index.mdwn b/index.mdwn
index 5544f70..64e9df3 100644
--- a/index.mdwn
+++ b/index.mdwn
@@ -3,7 +3,7 @@

Lima is an open source graphics driver which supports Mali-200 and Mali-400 GPUs.

-The aim of this driver is to finally bring all the advantages of open source software to ARM SoC graphics drivers. Currently, the sole availability of binary drivers is increasing development and maintenance overhead, while also reducing portability, compatibility and limiting choice. Anyone who has dealt with GPU support on ARM, be it for a linux with a GNU stack, or for an android, knows the pain of dealing with these binaries. Lima is going to solve this for you, but some time is needed still to get there.
+The aim of this driver and others such as [freedreno](http://freedreno.github.com) is to finally bring all the advantages of open source software to ARM SoC graphics drivers. Currently, the sole availability of binary drivers is increasing development and maintenance overhead, while also reducing portability, compatibility and limiting choice. Anyone who has dealt with GPU support on ARM, be it for a linux with a GNU stack, or for an android, knows the pain of dealing with these binaries.

## News
===
turns out there are 128 stages
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index 3f00398..2f8afb3 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -2,7 +2,7 @@

## Fragment Shader Architecture

-The architecture consists of a large pipeline, consisting of a number of vector and scalar units which can be enabled and disabled by the control word. In particular, there are two vector ALU's, one of which can do addition, another which can do multiplication. There are also two similar scalar ALU's, and the "combiner" capable of executing scalar-vector multiplies as well as various complex/transcendental functions (sqrt, rcp, sin, cos, exp2, etc.). Furthermore, there are varying, uniform/temporary, and texture load units, a temporary store unit, and a branching unit for implementing jumps. Each unit can affect/produce results which are used by all later units, as if all 6 registers are passed between each unit in the pipeline (see "Lima Fragment Pipeline" below). Furthermore, to reduce register pressure, there are a number of "pipeline registers". A pipeline register is a direct connection between two units in the pipeline, in addition to the normal registers which are passed between every unit. For more details on registers (including pipeline registers), see the "Registers" section below. To overcome the pipeline stall issues inherent in such a long pipeline (~256 stages), the architecture is likely barrelled and interleaves execution of a large number (~256) of fragments at once, and scheduling is done by the machine in order to minimize stalls.
+The architecture consists of a large pipeline, consisting of a number of vector and scalar units which can be enabled and disabled by the control word. In particular, there are two vector ALU's, one of which can do addition, another which can do multiplication. There are also two similar scalar ALU's, and the "combiner" capable of executing scalar-vector multiplies as well as various complex/transcendental functions (sqrt, rcp, sin, cos, exp2, etc.). Furthermore, there are varying, uniform/temporary, and texture load units, a temporary store unit, and a branching unit for implementing jumps. Each unit can affect/produce results which are used by all later units, as if all 6 registers are passed between each unit in the pipeline (see "Lima Fragment Pipeline" below). Furthermore, to reduce register pressure, there are a number of "pipeline registers". A pipeline register is a direct connection between two units in the pipeline, in addition to the normal registers which are passed between every unit. For more details on registers (including pipeline registers), see the "Registers" section below. To overcome the pipeline stall issues inherent in such a long pipeline (128 stages for Mali-200, see [this page](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka12787.html)), the architecture is likely barrelled and interleaves execution of a large number of fragments at once, and scheduling is done by the machine in order to minimize stalls.

The instruction stream is compressed down from a maximum of 18-words per instruction dependent on what units are in use. The remaining bits give each unit individual instructions and constants.
Updated and cleaned up vertex terminology, now clearer and the same as the latest source.
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index 833ebd6..3f00398 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -434,115 +434,116 @@ it would seem that pass.op5 performs the opposite of pass.op4.
These are the known inputs:
- 0-3: current instruction, attribute load result
- 4-7: current instruction, register load result
- 12-15: current instruction, uniform load result
- 16, 17: last instruction, acc ALU0/1 results
- 18, 19: last instruction, mul ALU0/1 results
- 20: last instruction, passthrough unit (bits 111-115)
- 21: unused
- 22: identity/passthrough (0 for add, 1 for multiply)
- For addition some_reg + -r22 means to passthrough some_reg.
- For multiplication some_reg * r22 also means to passthrough some_reg.
- This register also means last instruction, complex ALU result
- when it is in the "input 0" field (instead of input 1)
- as well as when used in the complex/passthrough ALU's.
- 23: two instructions ago, passthrough unit
- 24, 25: two instructions ago, acc ALU0/1 results
- 26, 27: two instructions ago, mul ALU0/1 results
- 28-31: last instruction, attribute load result
+ 0-3: Register 0 Output [0, current] (Register/Attribute)
+ 4-7: Register 1 Output [0, current] (Register)
+ 9-11: Unknown (Never seen)
+ 12-15: Load Result [0, current] (Uniform/Temporary)
+ 16,17: Accumulator 0,1 Output [-1, last instruction]
+ 18,19: Multiplier 0,1 Output [-1, last instruction]
+ 20: Passthrough Output [-1, last instruction]
+ 21: Unused
+ 22: Complex Output [-1, last instruction]
+ 22: Identity/Passthrough (0 for add, 1 for multiply)
+ Accumulator 0,1 Input 1: add(a, -ident) means pass(a)
+ Multiplier 0,1 Input 1: mul(a, ident) means pass(a)
+ 23: Passthrough Output [-2, two instructions ago]
+ 24,25: Accumulator 0,1 Output [-2, two instructions ago]
+ 26,27: Multiplier 0,1 Output [-2, two instructions ago]
+ 28-31: Register 0 Output [-1, last instruction] (Register/Attribute)
Note: If attribute_load_en is disabled then the attribute slot can be used to load registers too.
Instruction format:
- 0-4: multiply ALU0 input 0
- 5-9: multiply ALU0 input 1
- 10-14: multiply ALU1 input 0
- 15-19: multiply ALU1 input 1
- 20: multiply ALU0 negate
- 21: multiply ALU1 negate
- 22-26: add ALU0 input 0
- 27-31: add ALU0 input 1
- 32-36: add ALU1 input 0
- 37-41: add ALU1 input 1
- 42: add ALU0 input 0 negate
- 43: add ALU0 input 1 negate
- 44: add ALU1 input 0 negate
- 45: add ALU1 input 1 negate
- 46-54: uniform/temporary/global (max 304) load
- 55-57: uniform offset register select
- 1 - temporary/uniform load offset 0
- 2 - temporary/uniform load offset 1
- 3 - temporary/uniform load offset 2
- 7 - no offset
- 58-61: attribute/register load
- 62: attribute load enable (load attribute in attribute slot)
- 63-66: register load
- 67: temporary store 0 enable
- 68: temporary store 1 enable
- 69: branch
- 70: branch target low (< 0x100)
- 71-73: varying/register/temporary store input 0
- 74-76: varying/register/temporary store input 1
- 77-79: varying/register/temporary store input 2
- 80-82: varying/register/temporary store input 3
- 0 - add ALU0 output
- 1 - add ALU1 output
- 2 - mul ALU0 output
- 3 - mul ALU1 output
- 4 - passthrough ALU output
- 6 - complex ALU output
- 7 - no input (do not store)
- 83-85: add ALU0/1 opcode
+ 0-4: Multiply 0 Input A
+ 5-9: Multiply 0 Input B
+ 10-14: Multiply 1 Input A (Wide-Operation Input C)
+ 15-19: Multiply 1 Input B (Wide-Operation Input D)
+ 20: Multiply 0 Output Negate
+ 21: Multiply 1 Output Negate
+ 22-26: Accumulator 0 Input A
+ 27-31: Accumulator 0 Input B
+ 32-36: Accumulator 1 Input A
+ 37-41: Accumulator 1 Input B
+ 42: Accumulator 0 Input A Negate
+ 43: Accumulator 0 Input B Negate
+ 44: Accumulator 1 Input A Negate
+ 45: Accumulator 1 Input B Negate
+ 46-54: Load Address (Uniform/Temporary)
+ 55-57: Load Offset (Uniform/Temporary)
+ 0 - Address Register 0? (Never seen)
+ 1 - Address Register 1
+ 2 - Address Register 2
+ 3 - Address Register 3
+ 4-6 - Unknown (Never seen)
+ 7 - Unused (No offset)
+ 58-61: Register 0 Address (Register/Attribute)
+ 62: Register 0 Attribute (Load attribute in Register 0 unit)
+ 63-66: Register 1 Address
+ 67: Store 0 Temporary (Store Temporary in Store 0)
+ 68: Store 1 Temporary (Store Temporary in Store 1)
+ 69: Branch
+ 70: Branch Target Low (< 0x100)
+ 71-73: Store 0 Input X (Register/Varying/Temporary)
+ 74-76: Store 0 Input Y (Register/Varying/Temporary)
+ 77-79: Store 1 Input Z (Register/Varying/Temporary)
+ 80-82: Store 1 Input W (Register/Varying/Temporary)
+ 0 - Accumulator 0 Output
+ 1 - Accumulator 1 Output
+ 2 - Multiplier 0 Output
+ 3 - Multiplier 1 Output
+ 4 - Passthrough Output
+ 5 - Unknown
+ 6 - Complex Output
+ 7 - Unused (Don't store)
+ 83-85: Accumulator (0 & 1) opcode
0 - add
1 - floor
2 - sign
- 4 - src0 >= src1 / step(src1, src0)
- 5 - src0 < src1
- 6 - min/and
- 7 - max/or
+ 3 - unknown
+ 4 - greater-equal/step (a >= b)
+ 5 - less-than (src0 < src1)
+ 6 - min/logical and (a && b)
+ 7 - max/logical or (a || b)
note: abs(a) is implemented as max(a, -a)
- 86-89: complex ALU opcode
+ 86-89: Complex OpCode
For complex functions (rcp, sqrt, etc.), the inputs to the multiply ALU0 and
the input to the complex ALU are the same value.
0 - unused
- 2 - exp2
- 3 - log2
- 4 - inverse sqrt
- 5 - inverse
+ 2 - exp2 (Partial)
+ 3 - log2 (Partial)
+ 4 - inverse sqrt (Partial)
+ 5 - reciprocal (Partial)
9 - passthrough
- 12 - temporary store address
- 13 - temporary/uniform load offset 0 set
- 14 - temporary/uniform load offset 1 set
- 15 - temporary/uniform load offset 2 set
- 90-93: varying/register store 0
- 94: varying/register/temporary store 0 destination
- 0 - temporary/register
- 1 - varying
- 95-98: varying/register store 1
- 99: varying/register/temporary store 1 destination
- 0 - temporary/register
- 1 - varying
- 100-102: multiply ALU opcode
- 0 - multiply
+ 12 - Set Address Register 0 (Temporary Store address)
+ 13 - Set Address Register 1
+ 14 - Set Address Register 2
+ 15 - Set Address Register 3
+ 90-93: Store 0 Address (Varying/Register/Temporary)
+ 94: Store 0 Varying (Store Varying in Store 0)
+ 95-98: Store 1 Address (Varying/Register/Temporary)
+ 99: Store 1 Varying (Store Varying in Store 1)
+ 100-102: Multiply (0 & 1) OpCode
+ 0 - multiply (out = a * b)
1 - complex 1 (inverse, inverse sqrt, etc.)
takes all four inputs as arguments
3 - complex 2 (inverse, inverse sqrt, etc.)
takes first two inputs as arguments,
the other two are normal (multiply)
- 4 - mul0_src1 ? mul1_src0 : mul0_src0 (note: mul1_src1 = 21 because it is unused)
- 103-105: passthrough opcode
- 2 - pass
- 6 - clamp(input, uniform.x, uniform.y)
- 106-110: complex ALU input
- 111-115: passthrough input
- 116-119: unknown
- 0 - normal
- 12 - temporary write
- 13 - branch
- 120-127: branch target (absolute, 0 is 1st instruction of program)
+ 4 - select (out = (b ? c : a), wide operation)
+ 5-7: unknown
+ 103-105: Passthrough OpCode
+ 2 - passthrough (out = in)
+ 6 - clamp (out = min(max(in, uniform.x), uniform.y))
+ 0-1,3-5,7: unknown
+ 106-110: Complex Input
(Diff truncated)
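The renamed bit ranges above can be read as a field table for one 128-bit vertex shader instruction. The following is a minimal sketch of packing and unpacking those fields. It assumes that bit 0 is the least significant bit of the instruction read as a little-endian integer, which the listing does not state. The field names (`mul0_a`, `acc_op`, and so on) are hypothetical shorthand for the names in the listing, and only the ranges visible before the truncation (bits 0-110) are covered.

```python
# Bit ranges (inclusive) taken from the listing above. The short names are
# hypothetical; only fields visible before the diff truncation are included.
FIELDS = {
    "mul0_a": (0, 4), "mul0_b": (5, 9),
    "mul1_a": (10, 14), "mul1_b": (15, 19),
    "mul0_neg": (20, 20), "mul1_neg": (21, 21),
    "acc0_a": (22, 26), "acc0_b": (27, 31),
    "acc1_a": (32, 36), "acc1_b": (37, 41),
    "acc0_a_neg": (42, 42), "acc0_b_neg": (43, 43),
    "acc1_a_neg": (44, 44), "acc1_b_neg": (45, 45),
    "load_addr": (46, 54), "load_offset": (55, 57),
    "reg0_addr": (58, 61), "reg0_attr": (62, 62),
    "reg1_addr": (63, 66),
    "store0_temp": (67, 67), "store1_temp": (68, 68),
    "branch": (69, 69), "branch_target_low": (70, 70),
    "store0_x": (71, 73), "store0_y": (74, 76),
    "store1_z": (77, 79), "store1_w": (80, 82),
    "acc_op": (83, 85), "complex_op": (86, 89),
    "store0_addr": (90, 93), "store0_varying": (94, 94),
    "store1_addr": (95, 98), "store1_varying": (99, 99),
    "mul_op": (100, 102), "pass_op": (103, 105),
    "complex_in": (106, 110),
}


def extract(word, lo, hi):
    """Return bits lo..hi (inclusive) of an integer instruction word."""
    width = hi - lo + 1
    return (word >> lo) & ((1 << width) - 1)


def decode(raw):
    """Decode a 16-byte instruction into a dict of field values.

    Assumption: the bytes form a little-endian 128-bit integer.
    """
    if len(raw) != 16:
        raise ValueError("a vertex shader instruction is 128 bits (16 bytes)")
    word = int.from_bytes(raw, "little")
    return {name: extract(word, lo, hi) for name, (lo, hi) in FIELDS.items()}


def encode(fields):
    """Pack named field values into a 16-byte instruction (unset bits are 0)."""
    word = 0
    for name, value in fields.items():
        lo, hi = FIELDS[name]
        width = hi - lo + 1
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << lo
    return word.to_bytes(16, "little")
```

For example, `decode(encode({"load_offset": 7, "acc_op": 4}))` round-trips a "no offset" load select together with the greater-equal accumulator opcode. Leaving unset fields at zero is a convenience of the sketch, not a statement about what the hardware expects in unused slots.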
Added a really pretty compiler diagram.
diff --git a/Fragment+Shader+Backend.mdwn b/Fragment+Shader+Backend.mdwn index a327b67..140f130 100644 --- a/Fragment+Shader+Backend.mdwn +++ b/Fragment+Shader+Backend.mdwn @@ -13,3 +13,7 @@ Allocation for Irregular Architectures](http://user.it.uu.se/~svenolof/wpo/Alloc It seems to me that the main problem isn't scheduling (mostly an issue of finding the right heuristics and doing the actual grunt work to see if you can add an instruction to a packet) or register allocation (thinking of using the above-linked algorithm), but how the two should interact. Mainly, the issue has to do with how to deal with register coalescing and spills. Due to the architecture's pipelined nature and abundance of pipeline registers, scheduling has to be able to change the semantics of the program. Furthermore, scheduling would be constrained in how it could pipeline together operations (replacing normal registers with pipeline registers) if register allocation were to be performed first, because it would be harder to determine if it's legal to replace a normal register with a pipeline register. On the other side, scheduling will have the tendency to "hide" certain reads and writes, either because a register was replaced with a pipeline register, or because the instruction writes to a register that isn't the overall destination for an instruction packet (for example, a varying load unit when an ALU is also being used). Certainly, the register allocator will want to take advantage of that reduction in register pressure. Therefore, it seems that the best option is to have an instruction scheduling pass before register allocation. The difficulty with that, though, is that modern graph-coloring register allocators expect to be able to change the program semantics, even in the middle of allocation. Iterated register coalescing, for example, interleaves register coalescing/copy folding passes into the process of reducing the interference graph. 
However, doing so changes the way that instructions can be ordered and gives new opportunities to the scheduler, and therefore can change the interference graph. Again, adding spill code (temporary reads and writes) can once again change the structure of the program and therefore the interference graph. This breaks the guarantee implicit in both stages that the interference graph won't be changed + +# Pretty Compiler Picture + +[<img src="http://img545.imageshack.us/img545/3179/compilerb.png">](http://img545.imageshack.us/img545/3179/compilerb.png)
Added attribute register loading.
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index a4c77c0..833ebd6 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -451,6 +451,7 @@ These are the known inputs:
24, 25: two instructions ago, acc ALU0/1 results
26, 27: two instructions ago, mul ALU0/1 results
28-31: last instruction, attribute load result
+Note: If attribute_load_en is disabled then the attribute slot can be used to load registers too.
Instruction format:
@@ -474,8 +475,8 @@ Instruction format:
2 - temporary/uniform load offset 1
3 - temporary/uniform load offset 2
7 - no offset
- 58-61: attribute load
- 62: attribute load enable
+ 58-61: attribute/register load
+ 62: attribute load enable (load attribute in attribute slot)
63-66: register load
67: temporary store 0 enable
68: temporary store 1 enable
Updated vertex pipeline diagram (minor fix).
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn index 03cd653..a4c77c0 100644 --- a/Lima+ISA.mdwn +++ b/Lima+ISA.mdwn @@ -544,4 +544,4 @@ Instruction format: ## Lima Vertex Pipeline -[<img src="http://img191.imageshack.us/img191/6044/limavertexpipeline.png">](http://img191.imageshack.us/img191/6044/limavertexpipeline.png) +[<img src="http://img441.imageshack.us/img441/7590/limavertexpipelinen.png">](http://img441.imageshack.us/img441/7590/limavertexpipelinen.png)
Updated vertex pipeline diagram (minor fix).
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn index cd8ac10..03cd653 100644 --- a/Lima+ISA.mdwn +++ b/Lima+ISA.mdwn @@ -544,4 +544,4 @@ Instruction format: ## Lima Vertex Pipeline -[<img src="http://img98.imageshack.us/img98/6044/limavertexpipeline.png">](http://img98.imageshack.us/img98/6044/limavertexpipeline.png) +[<img src="http://img191.imageshack.us/img191/6044/limavertexpipeline.png">](http://img191.imageshack.us/img191/6044/limavertexpipeline.png)
Updated vertex pipeline diagram.
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn index 88fe378..cd8ac10 100644 --- a/Lima+ISA.mdwn +++ b/Lima+ISA.mdwn @@ -544,4 +544,4 @@ Instruction format: ## Lima Vertex Pipeline -[<img src="http://img440.imageshack.us/img440/6044/limavertexpipeline.png">](http://img440.imageshack.us/img440/6044/limavertexpipeline.png) +[<img src="http://img98.imageshack.us/img98/6044/limavertexpipeline.png">](http://img98.imageshack.us/img98/6044/limavertexpipeline.png)
Added 3rd load address register.
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index 6018d57..88fe378 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -472,6 +472,7 @@ Instruction format:
55-57: uniform offset register select
1 - temporary/uniform load offset 0
2 - temporary/uniform load offset 1
+ 3 - temporary/uniform load offset 2
7 - no offset
58-61: attribute load
62: attribute load enable
@@ -512,6 +513,7 @@ Instruction format:
12 - temporary store address
13 - temporary/uniform load offset 0 set
14 - temporary/uniform load offset 1 set
+ 15 - temporary/uniform load offset 2 set
90-93: varying/register store 0
94: varying/register/temporary store 0 destination
0 - temporary/register
Updated vertex pipeline diagram (minor fix).
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn index ca10b66..6018d57 100644 --- a/Lima+ISA.mdwn +++ b/Lima+ISA.mdwn @@ -542,4 +542,4 @@ Instruction format: ## Lima Vertex Pipeline -[<img src="http://img546.imageshack.us/img546/6044/limavertexpipeline.png">](http://img546.imageshack.us/img546/6044/limavertexpipeline.png) +[<img src="http://img440.imageshack.us/img440/6044/limavertexpipeline.png">](http://img440.imageshack.us/img440/6044/limavertexpipeline.png)
Updated vertex pipeline diagram.
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn index 84f482c..ca10b66 100644 --- a/Lima+ISA.mdwn +++ b/Lima+ISA.mdwn @@ -542,4 +542,4 @@ Instruction format: ## Lima Vertex Pipeline -[<img src="http://img13.imageshack.us/img13/6044/limavertexpipeline.png">](http://img13.imageshack.us/img13/6044/limavertexpipeline.png) +[<img src="http://img546.imageshack.us/img546/6044/limavertexpipeline.png">](http://img546.imageshack.us/img546/6044/limavertexpipeline.png)
Added multiple load offset registers.
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index 096fab9..84f482c 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -470,7 +470,8 @@ Instruction format:
45: add ALU1 input 1 negate
46-54: uniform/temporary/global (max 304) load
55-57: uniform offset register select
- 1 - temporary/uniform load offset
+ 1 - temporary/uniform load offset 0
+ 2 - temporary/uniform load offset 1
7 - no offset
58-61: attribute load
62: attribute load enable
@@ -509,7 +510,8 @@ Instruction format:
5 - inverse
9 - passthrough
12 - temporary store address
- 13 - temporary/uniform load offset
+ 13 - temporary/uniform load offset 0 set
+ 14 - temporary/uniform load offset 1 set
90-93: varying/register store 0
94: varying/register/temporary store 0 destination
0 - temporary/register
Changed vertex control opcodes to flags, updated vertex pipeline diagram.
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index 9f4706b..096fab9 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -475,12 +475,10 @@ Instruction format:
58-61: attribute load
62: attribute load enable
63-66: register load
- 67-70: control opcode
- 0 - nop
- 1 - temporary store
- 2 - ??? (something to do with temporaries...)
- 4 - branch to branch target + 256
- 12 - branch
+ 67: temporary store 0 enable
+ 68: temporary store 1 enable
+ 69: branch
+ 70: branch target low (< 0x100)
71-73: varying/register/temporary store input 0
74-76: varying/register/temporary store input 1
77-79: varying/register/temporary store input 2
@@ -542,4 +540,4 @@ Instruction format:
## Lima Vertex Pipeline
-[<img src="http://img72.imageshack.us/img72/6044/limavertexpipeline.png">](http://img72.imageshack.us/img72/6044/limavertexpipeline.png)
+[<img src="http://img13.imageshack.us/img13/6044/limavertexpipeline.png">](http://img13.imageshack.us/img13/6044/limavertexpipeline.png)
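The change from a 4-bit control opcode to four flags is consistent with the old opcode values if bit 67 is the least significant bit of that opcode: 1 sets only "temporary store 0 enable", 4 sets only "branch", and 12 sets "branch" plus "branch target low". A minimal sketch of that reading follows. The bit ordering is an assumption, the function names are hypothetical, and the target byte is taken to be the 8-bit field at bits 120-127.

```python
def control_flags(opcode):
    """Split the former 4-bit control opcode (bits 67-70) into flags.

    Assumption: bit 67 is the least significant bit of the old opcode.
    """
    return {
        "temp_store0": bool(opcode & 0x1),        # bit 67
        "temp_store1": bool(opcode & 0x2),        # bit 68
        "branch": bool(opcode & 0x4),             # bit 69
        "branch_target_low": bool(opcode & 0x8),  # bit 70
    }


def branch_target(flags, target_byte):
    """Return the absolute branch target, or None if no branch is taken.

    With "branch target low" set the target is below 0x100; without it,
    256 is added, matching the old "branch to branch target + 256" opcode.
    """
    if not flags["branch"]:
        return None
    return target_byte if flags["branch_target_low"] else target_byte + 0x100
```

Under this reading the old opcode 2, described as "something to do with temporaries", is simply "temporary store 1 enable".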
add/modify control opcode for vertex shaders
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index 8499f2e..9f4706b 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -475,10 +475,12 @@ Instruction format:
58-61: attribute load
62: attribute load enable
63-66: register load
- 67: temporary store enable
- 68-70: control opcode
+ 67-70: control opcode
0 - nop
- 6 - branch
+ 1 - temporary store
+ 2 - ??? (something to do with temporaries...)
+ 4 - branch to branch target + 256
+ 12 - branch
71-73: varying/register/temporary store input 0
74-76: varying/register/temporary store input 1
77-79: varying/register/temporary store input 2
Updated vertex pipeline diagram.
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn index 91ba0c6..8499f2e 100644 --- a/Lima+ISA.mdwn +++ b/Lima+ISA.mdwn @@ -540,4 +540,4 @@ Instruction format: ## Lima Vertex Pipeline -[<img src="http://img441.imageshack.us/img441/6044/limavertexpipeline.png">](http://img441.imageshack.us/img441/6044/limavertexpipeline.png) +[<img src="http://img72.imageshack.us/img72/6044/limavertexpipeline.png">](http://img72.imageshack.us/img72/6044/limavertexpipeline.png)
Added vertex pipeline diagram.
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index 3401d0f..91ba0c6 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -536,3 +536,8 @@ Instruction format:
12 - temporary write
13 - branch
120-127: branch target (absolute, 0 is 1st instruction of program)
+
+
+
+## Lima Vertex Pipeline
+[<img src="http://img441.imageshack.us/img441/6044/limavertexpipeline.png">](http://img441.imageshack.us/img441/6044/limavertexpipeline.png)
fixed texture fetch coordinate load, again
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index 10bbb4e..3401d0f 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -108,8 +108,7 @@ There also exists various "pipeline registers" (four of them listed above) which
m - Mask, (0001 = float, 0011 = vec2, 0111 = vec3, 1111 = vec4)
d - Destination Register
- Mali200: Writing to register 15 here loads coordinates for the texture sampler.
- Mali400: The input to the sampler is always the output of this unit.
+ Note: writing to register 15 discards the output (used for loading texture coordinates)
i - Varying Index
a - alignment
It seems that varyings (floats) can be loaded in aligned groups of 1, 2, or 4.
@@ -154,6 +153,8 @@ There also exists various "pipeline registers" (four of them listed above) which
00111001000000000001ssssssssssssottttt00000b000000ccccccrrrrrr
+ The coordinates for the texture fetch are always the output of the varying load.
+
s - sampler index
o - sampler index register offset enable
c - sampler index offset register
Mali200/400 sampler co-ordinate difference documented.
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index 86978b5..10bbb4e 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -108,7 +108,8 @@ There also exists various "pipeline registers" (four of them listed above) which
m - Mask, (0001 = float, 0011 = vec2, 0111 = vec3, 1111 = vec4)
d - Destination Register
- Note: writing to register 15 here loads coordinates for the texture sampler.
+ Mali200: Writing to register 15 here loads coordinates for the texture sampler.
+ Mali400: The input to the sampler is always the output of this unit.
i - Varying Index
a - alignment
It seems that varyings (floats) can be loaded in aligned groups of 1, 2, or 4.
remove bogus restriction from fragment shader intro, added section on pipeline registers
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn index 406b7ed..86978b5 100644 --- a/Lima+ISA.mdwn +++ b/Lima+ISA.mdwn @@ -2,7 +2,7 @@ ## Fragment Shader Architecture -The architecture consists of a large pipeline, consisting of a number of vector and scalar units which can be enabled and disabled by the control word. Usually, each unit can affect/produce results which are used by all later units (see "Lima Fragment Pipeline" below). In particular, there are two vector ALU's, one of which can do addition, another which can do multiplication. The result of the multiplication unit can be used as the input of the addition unit, in order to implement Fused Multiply-Add and other combinations. There are also two similar scalar ALU's, and the "combiner" capable of executing scalar-vector multiplies as well as various complex/transcdental functions (sqrt, rcp, sin, cos, exp2, etc.). Furthermore, there are varying, uniform/temporary, and texture load units, a temporary store unit, and a branching unit for implementing jumps. Note that, as shown by the pipeline diagram, there is only one write port, and therefore only one (vector) register can be written to per instruction. The register written for the entire instruction is chosen by the machine as the register written by the last enabled unit. Although earlier units can write to different registers, the effects of those writes will be ignored beyond the current instruction. To overcome the pipeline stall issues inherent in such a long pipeline, the architecture is likely barrelled and interleaves execution of a large number of fragments at once, and scheduling is done by the machine in order to minimize stalls. +The architecture consists of a large pipeline, consisting of a number of vector and scalar units which can be enabled and disabled by the control word. In particular, there are two vector ALU's, one of which can do addition, another which can do multiplication. 
There are also two similar scalar ALU's, and the "combiner" capable of executing scalar-vector multiplies as well as various complex/transcendental functions (sqrt, rcp, sin, cos, exp2, etc.). Furthermore, there are varying, uniform/temporary, and texture load units, a temporary store unit, and a branching unit for implementing jumps. Each unit can affect/produce results which are used by all later units, as if all 6 registers are passed between each unit in the pipeline (see "Lima Fragment Pipeline" below). Furthermore, to reduce register pressure, there are a number of "pipeline registers". A pipeline register is a direct connection between two units in the pipeline, in addition to the normal registers which are passed between every unit. For more details on registers (including pipeline registers), see the "Registers" section below. To overcome the pipeline stall issues inherent in such a long pipeline (~256 stages), the architecture is likely barrelled and interleaves execution of a large number (~256) of fragments at once, and scheduling is done by the machine in order to minimize stalls. The instruction stream is compressed down from a maximum of 18 words per instruction, depending on which units are in use. The remaining bits give each unit individual instructions and constants.
diff --git a/index.mdwn b/index.mdwn index 151079d..5544f70 100644 --- a/index.mdwn +++ b/index.mdwn @@ -46,6 +46,13 @@ Documentation for the shader compiler, and the initial investigation of the inst The documentation is currently kept in the wiki, pages of interest are: +Original (Falanx) datasheets: + +* [Mali200 Product](http://web.archive.org/web/20060515063019/http://www.falanx.no/download/Mali200_Product.pdf) +* [Mali Geometry Product Spec](http://web.archive.org/web/20060515063211/http://www.falanx.no/download/Mali%20Geometry%20Product%20Spec%20USL.pdf) + +Lima Documents + * [[Lima+Assembler]] * [[Lima+ISA]] * [[Fragment+Assembly+Syntax]]
Added info on dumping malisc symbols
diff --git a/Mali_Offline_Shader_Compiler.mdwn b/Mali_Offline_Shader_Compiler.mdwn index f9b759c..350f472 100644 --- a/Mali_Offline_Shader_Compiler.mdwn +++ b/Mali_Offline_Shader_Compiler.mdwn @@ -14,3 +14,11 @@ Full documentation can be found at [[MBS+File+Format]]. There's a tool in our git tree called mbs_dump which will dump out an MBS file in a readable form, it also takes various options for how to disassemble/decompile the fragment/vertex code. For more info on this tool and how to use it read [[Lima+Assembler]]. + +#Reverse Engineering + +It turns out that the Mali developers were kind enough to leave all the debug symbols in the final binary, and their underlying code is clean enough that it's possible to see a lot of what's going on just via the function/symbol names. + +To view the symbols do the following: + + objdump -t `which malisc`
add gl_PointSize
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn
index 0b2ae31..406b7ed 100644
--- a/Lima+ISA.mdwn
+++ b/Lima+ISA.mdwn
@@ -403,9 +403,15 @@ Akin to the fragment shader, there are also temporaries, which unlike registers
gl_Position is implemented internally as a varying; it seems that it is hard-coded to varying 0. The compiler implements some transforms internally to convert the value calculated for gl_Position in the shader to the actual value sent to the hardware. In particular (in pseudocode):
+ uniform vec4 gl_mali_ViewportTransform[2];
gl_position_actual.w = clamp(1.0 / gl_Position.w, -1e10, 1e10);
gl_position_actual.xyz = gl_Position.xyz * gl_position_actual.w * gl_mali_ViewportTransform[0].xyz + gl_mali_ViewportTransform[1].xyz;
+gl_PointSize is also implemented internally as a varying. However, its position doesn't appear to be fixed. There are also some transforms involved:
+
+ uniform vec4 gl_mali_PointSizeParameters;
+ gl_PointSize_actual = clamp(gl_PointSize, gl_mali_PointSizeParameters.x, gl_mali_PointSizeParameters.y) * gl_mali_PointSizeParameters.z;
+
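The gl_Position and gl_PointSize pseudocode above can be checked numerically with a small sketch. This only restates the two formulas in Python. The function names are hypothetical, the two viewport vectors stand in for `gl_mali_ViewportTransform[0]` and `[1]`, and `w` is assumed to be nonzero (how the hardware treats `w == 0` is not covered by the notes).

```python
def clamp(x, lo, hi):
    """GLSL-style clamp: min(max(x, lo), hi)."""
    return min(max(x, lo), hi)


def transform_position(position, viewport_scale, viewport_offset):
    """Apply the gl_Position transform from the pseudocode above.

    position        -- (x, y, z, w) as written by the shader
    viewport_scale  -- stands in for gl_mali_ViewportTransform[0]
    viewport_offset -- stands in for gl_mali_ViewportTransform[1]
    Assumption: w is nonzero; this sketch does not model division by zero.
    """
    x, y, z, w = position
    w_actual = clamp(1.0 / w, -1e10, 1e10)
    xyz = tuple(
        c * w_actual * s + o
        for c, s, o in zip((x, y, z), viewport_scale[:3], viewport_offset[:3])
    )
    return xyz + (w_actual,)


def transform_point_size(point_size, params):
    """Apply the gl_PointSize transform; params = (min, max, scale, unused)."""
    return clamp(point_size, params[0], params[1]) * params[2]
```

For instance, with a scale and offset of `(100.0, 100.0, 0.5, 0.0)` each, the position `(2.0, 4.0, 6.0, 2.0)` becomes `(200.0, 300.0, 2.0, 0.5)`: the stored `w` is the clamped reciprocal, and `xyz` is the perspective-divided value mapped into the viewport.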
## Complex functions
Complex functions are implemented using multiply ALU opcodes 1 and 3, as well as various complex ALU opcodes. The computation looks like this:
Uploaded malisc symbol table.
diff --git a/malisc+symbols.c b/malisc+symbols.c new file mode 100644 index 0000000..abc076a --- /dev/null +++ b/malisc+symbols.c @@ -0,0 +1,1091 @@ + +/bin/malisc: file format elf32-i386 + +SYMBOL TABLE: +08048134 l d .interp 00000000 .interp +08048148 l d .note.ABI-tag 00000000 .note.ABI-tag +08048168 l d .note.gnu.build-id 00000000 .note.gnu.build-id +0804818c l d .hash 00000000 .hash +080482d4 l d .gnu.hash 00000000 .gnu.hash +08048300 l d .dynsym 00000000 .dynsym +080485b0 l d .dynstr 00000000 .dynstr +08048732 l d .gnu.version 00000000 .gnu.version +08048788 l d .gnu.version_r 00000000 .gnu.version_r +080487f8 l d .rel.dyn 00000000 .rel.dyn +08048810 l d .rel.plt 00000000 .rel.plt +08048940 l d .init 00000000 .init +08048970 l d .plt 00000000 .plt +08048be0 l d .text 00000000 .text +0808fd7c l d .fini 00000000 .fini +0808fd98 l d .rodata 00000000 .rodata +08096c18 l d .eh_frame 00000000 .eh_frame +08097efc l d .ctors 00000000 .ctors +08097f04 l d .dtors 00000000 .dtors +08097f0c l d .jcr 00000000 .jcr +08097f10 l d .dynamic 00000000 .dynamic +08097ff0 l d .got 00000000 .got +08097ff4 l d .got.plt 00000000 .got.plt +08098098 l d .data 00000000 .data +080980a0 l d .bss 00000000 .bss +00000000 l d .comment 00000000 .comment +00000000 l df *ABS* 00000000 crtstuff.c +08097efc l O .ctors 00000000 __CTOR_LIST__ +08097f04 l O .dtors 00000000 __DTOR_LIST__ +08097f0c l O .jcr 00000000 __JCR_LIST__ +08048c10 l F .text 00000000 __do_global_dtors_aux +080980c4 l O .bss 00000001 completed.7021 +080980c8 l O .bss 00000004 dtor_idx.7023 +08048c70 l F .text 00000000 frame_dummy +00000000 l df *ABS* 00000000 crtstuff.c +08097f00 l O .ctors 00000000 __CTOR_END__ +08096c18 l O .eh_frame 00000000 __FRAME_END__ +08097f0c l O .jcr 00000000 __JCR_END__ +0808fd50 l F .text 00000000 __do_global_ctors_aux +00000000 l df *ABS* 00000000 driver.c +00000000 l df *ABS* 00000000 commandline.c +08049064 l F .text 0000022e parse_hardware_revision +08090720 l O .rodata 0000000c CSWTCH.41 
+00000000 l df *ABS* 00000000 essl_test_system.c +00000000 l df *ABS* 00000000 compiler.c +08049d78 l F .text 00000059 examine_error +08049edf l F .text 00000088 allocate_compiler_context +08090748 l O .rodata 00000008 CSWTCH.7 +00000000 l df *ABS* 00000000 error_reporting.c +08090bc0 l O .rodata 00000020 CSWTCH.76 +0804a234 l F .text 0000003d increase_buf +0804a271 l F .text 0000005d write_internal_compiler_error +08090a58 l O .rodata 00000168 CSWTCH.73 +00000000 l df *ABS* 00000000 essl_list.c +0804aa4b l F .text 000000b6 split_and_merge +00000000 l df *ABS* 00000000 essl_mem.c +0804abc7 l F .text 0000004f allocate_block +00000000 l df *ABS* 00000000 compiler_options.c +00000000 l df *ABS* 00000000 essl_stringbuffer.c +0804af7b l F .text 0000008a _essl_string_buffer_reserve +00000000 l df *ABS* 00000000 essl_target.c +00000000 l df *ABS* 00000000 output_buffer.c +00000000 l df *ABS* 00000000 frontend.c +0804b85e l F .text 000000a9 function_partial_sort +00000000 l df *ABS* 00000000 typecheck.c +0804bf00 l F .text 00000047 type_is_or_has_sampler +0804bf47 l F .text 0000003c type_is_or_has_array +0804c012 l F .text 0000015e check_lvalue +0804c170 l F .text 00000102 typecheck_array_size +0804e5f7 l F .text 0000009a typecheck +00000000 l df *ABS* 00000000 preprocessor.c +0804e7a1 l F .text 000000db read_scanner_token +0804e87c l F .text 00000045 push_if_stack_entry +0804e8c1 l F .text 0000004a encounter_command +08092818 l O .rodata 0000009c command_strings +0804e90b l F .text 0000007d add_predefined_macro +0804ea5a l F .text 0000011c unary +0804efec l F .text 0000004f logical_inclusive_or +0804eb76 l F .text 0000010e multiplicative +0804ec84 l F .text 00000072 additive +0804ecf6 l F .text 0000009c bitwise_shift +0804ed92 l F .text 000000b2 relational +0804ee44 l F .text 00000081 equality +0804eec5 l F .text 00000046 bitwise_and +0804ef0b l F .text 00000046 bitwise_exclusive_or +0804ef51 l F .text 00000046 bitwise_inclusive_or +0804ef97 l F .text 00000055 logical_and 
+0804f03b l F .text 0000006a get_pp_token +0804f0a5 l F .text 000000aa peek_pp_token +0804f14f l F .text 00000137 defined_operator +0804f286 l F .text 00000065 generate_integer_token +0804f2eb l F .text 00000cda replace_macro +0804ffc5 l F .text 0000037e directive_constant_expression +08050343 l F .text 000003da skip_tokens +00000000 l df *ABS* 00000000 lang.c +08092ac0 l O .rodata 00000024 extension_names +08092ae4 l O .rodata 0000000c CSWTCH.12 +00000000 l df *ABS* 00000000 callgraph.c +08052434 l F .text 00000061 record_func +08052495 l F .text 00000167 note_calls +00000000 l df *ABS* 00000000 precision.c +08052740 l F .text 00000022 type_has_precision_qualification +08092b38 l O .rodata 00000024 CSWTCH.24 +08052762 l F .text 00000053 get_default_precision_for_type +080527b5 l F .text 0000004e new_type_conversion +08052803 l F .text 000000a3 insert_bitwise_casts_for_children_with_specific_type +080528a6 l F .text 0000005c insert_bitwise_casts_for_children +08052902 l F .text 0000033b insert_bitwise_casts +08052c3d l F .text 0000014f get_type_with_set_precision +08052d8c l F .text 0000006e set_precision_qualifier_for_node +08052dfa l F .text 000000b4 propagate_precision_upward +08052eae l F .text 000000f7 propagate_default_precision_upward +080534ae l F .text 000000e8 calculate_precision +00000000 l df *ABS* 00000000 global_variable_inlining.c +0805367c l F .text 000000d5 find_and_rewrite_nodes +08053751 l F .text 0000032d visit_function +08092bf0 l O .rodata 00000008 CSWTCH.8 +00000000 l df *ABS* 00000000 middle.c +00000000 l df *ABS* 00000000 control_deps.c +08053ea4 l F .text 00000065 symbol_for_node +08053f09 l F .text 0000003c add_dependency +080543a4 l F .text 000000df addresses_identical +00000000 l df *ABS* 00000000 optimise_loop_entry.c +080548e0 l F .text 00000071 clone_exp +08054951 l F .text 00000192 optimise_loop_entry_stmt +00000000 l df *ABS* 00000000 optimise_inline_functions.c +08054b2f l F .text 00000268 clone_node +08054d97 l F .text 000001fe 
clone_basic_block +08054f95 l F .text 00000045 remove_control_dependent_op_node +00000000 l df *ABS* 00000000 optimise_basic_blocks.c +00000000 l df *ABS* 00000000 optimise_constant_fold.c +08055a20 l F .text 00000141 constant_fold +00000000 l df *ABS* 00000000 eliminate_complex_ops.c +08055f98 l F .text 0000005c is_expensive_matrix_result +08055ff4 l F .text 000000f6 replace_returns +080560ea l F .text 00000066 create_index_int_constant +08056150 l F .text 00000d4c process_single_node +08092c50 l O .rodata 00000078 CSWTCH.131 +080573af l F .text 000001a2 explode_struct_comparison +08057551 l F .text 0000011f store_reload_variable +08057670 l F .text 000001e0 rewrite_component_wise_matrix_op +08056e9c l F .text 0000009c process_node +00000000 l df *ABS* 00000000 ssa.c +08057850 l F .text 0000009d var_hash_fun +080578ed l F .text 0000003c node_stack_push +08057929 l F .text 000000e0 insert_phi_node +08057a09 l F .text 0000005e clone_address +08057a67 l F .text 00000064 create_dummy_symbol +08057acb l F .text 0000005b node_stack_get_or_create +08057b26 l F .text 00000059 node_stack_node_get_or_create +08057b7f l F .text 00000050 node_stack_get_or_create_top +08057bcf l F .text 00000334 ssa_rename +08058265 l F .text 00000129 var_equal_fun +00000000 l df *ABS* 00000000 conditional_select.c +00000000 l df *ABS* 00000000 static_cycle_count.c +00000000 l df *ABS* 00000000 mali200_target.c +08058b0c l F .text 00000007 cycles_for_jump +08058b13 l F .text 0000002b cycles_for_block +08058b3e l F .text 0000000a is_variable_in_indexable_memory +00000000 l df *ABS* 00000000 mali200_type.c +08058ca4 l F .text 00000078 internal_type_alignment +00000000 l df *ABS* 00000000 mali200_driver.c +00000000 l df *ABS* 00000000 mali200_instruction.c +08092e78 l O .rodata 00000064 CSWTCH.113 +08092edc l O .rodata 00000040 CSWTCH.116 +08059443 l F .text 000004fa handle_input +00000000 l df *ABS* 00000000 mali200_slot.c +0805a810 l F .text 00000148 can_be_replaced_by +08092f1c l O .rodata 
00000010 CSWTCH.12 +00000000 l df *ABS* 00000000 mali200_regalloc.c +0805af54 l F .text 0000008f init_regalloc_context +0805afe3 l F .text 00000086 reset_allocations +0805b069 l F .text 0000006b prepare_ranges_for_coloring +0805b155 l F .text 00000110 allocate_all_ranges +00000000 l df *ABS* 00000000 mali200_register_integration.c +0805b518 l F .text 00000390 integrate_instruction +08092f60 l O .rodata 0000015c CSWTCH.9 +00000000 l df *ABS* 00000000 mali200_spilling.c +0805b95c l F .text 00000143 put_load +0805ba9f l F .text 000000ab put_store +0805be5f l F .text 0000012e complete_spill_range +080930bc l O .rodata 00000006 spillname +080930c4 l O .rodata 00000040 mask_n_comps +00000000 l df *ABS* 00000000 mali200_word_insertion.c +0805c37c l F .text 0000007a insert_cycle_into_instructions +00000000 l df *ABS* 00000000 mali200_emit.c +0805c7a4 l F .text 00000065 in_sub_reg +0805c809 l F .text 000000c0 opcode_of_mult +080933a8 l O .rodata 00000014 CSWTCH.93 +0805c8c9 l F .text 00000154 opcode_of_add (Diff truncated)
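Since the symbol table lists each source file as a `df *ABS*` entry followed by that file's local symbols, the function names can be grouped per file to get a rough map of the compiler. Below is a minimal sketch of such a grouping, assuming the whitespace-separated column layout seen in the dump above (address, flags, section, size, name). The function name is hypothetical, and the input is expected to be the text lines of `objdump -t` output.

```python
def functions_by_file(lines):
    """Group function symbols under the source file entry that precedes them.

    Each symbol line is split on whitespace: the first token is the address,
    the last three are section, size and name, and whatever sits in between
    is the flag column (for example "l F", "l O" or "l df").
    """
    result = {}
    current_file = None
    for line in lines:
        tokens = line.split()
        if len(tokens) < 5:
            continue  # headers such as "SYMBOL TABLE:" or blank lines
        flags = tokens[1:-3]
        section, name = tokens[-3], tokens[-1]
        if "df" in flags and section == "*ABS*":
            # A file entry: later local symbols belong to this file.
            current_file = name
            result.setdefault(current_file, [])
        elif "F" in flags and current_file is not None:
            result[current_file].append(name)
    return result
```

Run over the dump above, this would list `parse_hardware_revision` under `commandline.c` and `opcode_of_mult` under `mali200_emit.c`, which gives a quick overview of where the backend stages live.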
index: add link to fragment shader backend doc
diff --git a/index.mdwn b/index.mdwn index 2561862..151079d 100644 --- a/index.mdwn +++ b/index.mdwn @@ -52,6 +52,7 @@ The documentation is currently kept in the wiki, pages of interest are: * [[Vertex+Disassembly]] * [[Mali_Offline_Shader_Compiler]] * [[MBS+File+Format]] +* [[Fragment+Shader+Backend]] ## Contribute ===
add page on fragment shader backend
diff --git a/Fragment+Shader+Backend.mdwn b/Fragment+Shader+Backend.mdwn new file mode 100644 index 0000000..a327b67 --- /dev/null +++ b/Fragment+Shader+Backend.mdwn @@ -0,0 +1,15 @@ +## Notes on fragment shader backend + +I'm just using this page for now to collect my thoughts on how to write the backend for the fragment processor, and to collect links that may be useful for future reference. I'm focusing on the backend, since it's the most difficult part of the compiler due to the fragment shader's novel architecture. + +# Links + +* [Retargetable Graph-Coloring Register +Allocation for Irregular Architectures](http://user.it.uu.se/~svenolof/wpo/AllocSCOPES2003.20030626b.pdf) - used by various mesa backends, should work well for us +* [Iterated Register Coalescing](http://www.cs.cmu.edu/afs/cs/academic/class/15745-s07/www/papers/george.pdf) - standard technique for register coalescing + +# Thoughts + +It seems to me that the main problem isn't scheduling (mostly an issue of finding the right heuristics and doing the actual grunt work to see if you can add an instruction to a packet) or register allocation (thinking of using the above-linked algorithm), but how the two should interact. Mainly, the issue has to do with how to deal with register coalescing and spills. Due to the architecture's pipelined nature and abundance of pipeline registers, scheduling has to be able to change the semantics of the program. Furthermore, scheduling would be constrained in how it could pipeline together operations (replacing normal registers with pipeline registers) if register allocation were to be performed first, because it would be harder to determine if it's legal to replace a normal register with a pipeline register. 
On the other side, scheduling will have the tendency to "hide" certain reads and writes, either because a register was replaced with a pipeline register, or because the instruction writes to a register that isn't the overall destination for an instruction packet (for example, a varying load unit when an ALU is also being used). Certainly, the register allocator will want to take advantage of that reduction in register pressure. Therefore, it seems that the best option is to have an instruction scheduling pass before register allocation. + +The difficulty with that, though, is that modern graph-coloring register allocators expect to be able to change the program semantics, even in the middle of allocation. Iterated register coalescing, for example, interleaves register coalescing/copy folding passes into the process of reducing the interference graph. However, doing so changes the way that instructions can be ordered and gives new opportunities to the scheduler, and therefore can change the interference graph. Again, adding spill code (temporary reads and writes) can once again change the structure of the program and therefore the interference graph. This breaks the guarantee implicit in both stages that the interference graph won't be changed
added note about restriction on register writes
diff --git a/Lima+ISA.mdwn b/Lima+ISA.mdwn index 0cf5cbf..0b2ae31 100644 --- a/Lima+ISA.mdwn +++ b/Lima+ISA.mdwn @@ -2,7 +2,7 @@ ## Fragment Shader Architecture -The architecture consists of a large pipeline, consisting of a number of vector and scalar units which can be enabled and disabled by the control word. Usually, each unit can affect/produce results which are used by all later units (see "Lima Fragment Pipeline" below). In particular, there are two vector ALU's, one of which can do addition, another which can do multiplication. The result of the multiplication unit can be used as the input of the addition unit, in order to implement Fused Multiply-Add and other combinations. There are also two similar scalar ALU's, and the "combiner" capable of executing scalar-vector multiplies as well as various complex/transcendental functions (sqrt, rcp, sin, cos, exp2, etc.). Furthermore, there are varying, uniform/temporary, and texture load units, a temporary store unit, and a branching unit for implementing jumps. To overcome the pipeline stall issues inherent in such a long pipeline, the architecture is likely barrelled and interleaves execution of a large number of fragments at once, and scheduling is done by the machine in order to minimize stalls. +The architecture consists of a large pipeline, consisting of a number of vector and scalar units which can be enabled and disabled by the control word. Usually, each unit can affect/produce results which are used by all later units (see "Lima Fragment Pipeline" below). In particular, there are two vector ALU's, one of which can do addition, another which can do multiplication. The result of the multiplication unit can be used as the input of the addition unit, in order to implement Fused Multiply-Add and other combinations. There are also two similar scalar ALU's, and the "combiner" capable of executing scalar-vector multiplies as well as various complex/transcendental functions (sqrt, rcp, sin, cos, exp2, etc.). 
Furthermore, there are varying, uniform/temporary, and texture load units, a temporary store unit, and a branching unit for implementing jumps. Note that, as shown by the pipeline diagram, there is only one write port, and therefore only one (vector) register can be written to per instruction. The register written for the entire instruction is chosen by the machine as the register written by the last enabled unit. Although earlier units can write to different registers, the effects of those writes will be ignored beyond the current instruction. To overcome the pipeline stall issues inherent in such a long pipeline, the architecture is likely barrelled and interleaves execution of a large number of fragments at once, and scheduling is done by the machine in order to minimize stalls. The instruction stream is compressed down from a maximum of 18 words per instruction, depending on which units are in use. The remaining bits give each unit individual instructions and constants.
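Editorial sketch (not part of the change above): the single-write-port rule can be modelled in a few lines of Python. The `Unit` record, the unit names, and the register names are hypothetical; the only behaviour taken from the text is that the register kept for the whole instruction is the one written by the last enabled unit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Unit:
    name: str             # illustrative unit name, not a hardware identifier
    enabled: bool         # whether the control word enables this unit
    dest: Optional[str]   # register this unit writes, if any


def committed_register(units: List[Unit]) -> Optional[str]:
    """Return the register that survives past the instruction.

    `units` is assumed to be listed in pipeline order, earliest first.
    Writes by earlier enabled units are visible only within the instruction.
    """
    enabled = [u for u in units if u.enabled]
    return enabled[-1].dest if enabled else None


instruction = [
    Unit("varying_load", True, "r0"),
    Unit("vector_mul", True, "r1"),
    Unit("vector_add", False, "r2"),
]
result = committed_register(instruction)
```

Here `result` is `"r1"`: the varying load's write to `r0` can feed later units in the same instruction, but under this model it does not persist afterwards.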
diff --git a/Hardware.mdwn b/Hardware.mdwn index 7e9fb86..e240b13 100644 --- a/Hardware.mdwn +++ b/Hardware.mdwn @@ -58,6 +58,9 @@ A Dual Core ARM Cortex A9 running at 1GHz, which includes a Mali-400 MP1. Most e Plenty of information, which might be very snowball specific, can be found on [the igloo community website](http://igloocommunity.org/). +There is a pre-built image of Linaro Android with the Lima(re) demo included. There are three files, [system.tar.bz2](http://snapshots.linaro.org/android/~joe-burmeister/test-lima-snowball/27/target/product/snowball/system.tar.bz2), [boot.tar.bz2](http://snapshots.linaro.org/android/~joe-burmeister/test-lima-snowball/27/target/product/snowball/boot.tar.bz2) and [userdata.tar.bz2](http://snapshots.linaro.org/android/~joe-burmeister/test-lima-snowball/27/target/product/snowball/userdata.tar.bz2), that you put onto an SD card following [Linaro's image installation instructions](https://wiki.linaro.org/Platform/Android/ImageInstallation). + + ## Samsung Exynos The [Samsung Exynos](http://en.wikipedia.org/wiki/Exynos) 42xx is a range of ARM Cortex A9 devices clocked between 1.2 and 1.8GHz. They are the only devices currently carrying a Mali-400MP4. The Exynos of course stars in the top selling, high end Samsung android based smartphones and tablets. The best sold phone of 2011, the Samsung Galaxy S II, comes with an Exynos. A [Single Board Computer with a 4210, called origen,](http://www.origenboard.org/) is available with android and ubuntu support.
diff --git a/Mali_Offline_Shader_Compiler.mdwn b/Mali_Offline_Shader_Compiler.mdwn index af2a33f..f9b759c 100644 --- a/Mali_Offline_Shader_Compiler.mdwn +++ b/Mali_Offline_Shader_Compiler.mdwn @@ -12,4 +12,5 @@ Full documentation can be found at [[MBS+File+Format]]. #Extracting the program binary -So far, I've written a very simple C program which extracts the program itself from the malisc output - it just gets the data from the DBIN tag. I've uploaded it [here](http://pastebin.com/aF9c1GKG) for now. +There's a tool in our git tree called mbs_dump which will dump out an MBS file in a readable form. It also takes various options for how to disassemble/decompile the fragment/vertex code. +For more info on this tool and how to use it, read [[Lima+Assembler]].
diff --git a/index.mdwn b/index.mdwn index 9923b7e..2561862 100644 --- a/index.mdwn +++ b/index.mdwn @@ -32,14 +32,14 @@ Documentation for the shader compiler, and the initial investigation of the inst ### [Mali-400](Hardware#Mali-400): -* [AMLogic 8726-M](Hardware#AMLogic+8726-M) +* [AMLogic 8726-M](Hardware#AMLogic+8726-M) (Zenithink C71) * [Allwinner A10](Hardware#Allwinner+A10) * [ST-Ericsson Novathor](Hardware#ST-Ericsson+Novathor) -* [Samsung Exynos](Hardware#Samsung+Exynos) +* [Samsung Exynos](Hardware#Samsung+Exynos) (Galaxy S2/S3/Tab) ### [Mali-200](Hardware#Mali-200): -* [Telechips 8902](Hardware#Telechips+8902), [8803](Hardware#Telechips+8803) +* [Telechips 8902](Hardware#Telechips+8902), [8803](Hardware#Telechips+8803) (Haipad MID701) ## Documentation ===