Art 3D/VR challenge – week 2 - looking for performance
Introduction
This article is a continuation of the Art 3D/VR challenge.
Please note that I do not describe every technology, model and approach you can find with a search engine.
THE FOLLOWING INFORMATION IS MY PERSONAL EXPERIENCE, GAINED OVER THE LAST 10+ YEARS OF MY CAREER. BY APPLYING THE PRESENTED MODELS AND STACK I CAN GUARANTEE FAIR ENOUGH RESULTS.
Performance utilisation
As we are going to process a lot of data, we have to ensure high availability, reliability and scalability at each level of the hardware and software stack.
These are the most important characteristics to keep in mind when selecting the right platform:
- Short response time for a given piece of work
- High throughput (rate of processing work)
- Low utilization of computing resource(s)
- High availability of the computing system or application
- Fast (or highly compact) data compression and decompression
- High bandwidth
- Short data transmission time
Each of these characteristics affects both the hardware and the software.
REQUIREMENTS
In the previous blogpost, "Abstract overview", I listed and compared some of the hardware and software technologies, techniques and standards we could consider in this challenge. In this blogpost I will shorten the list to the most important hardware and software components we should include in the VR device.
Hardware
The device has to be a very efficient bridge, converter and gateway for media. To achieve this goal we have to make sure the interfaces provide the highest possible throughput. Here is the list of basic interfaces we should consider:
Interfaces | Priority | Description |
CSI-2 | HIGH | min. 2 for v1 |
USB 3.0 | HIGH | min. 2 for v1 |
HDMI output | HIGH | min. 1 for v1 |
HDMI input | LOW | is optional in v1 |
AUDIO output | LOW | is optional in v1 |
AUDIO input | HIGH | min. 1 for v1 |
Ethernet | HIGH | min. 1 for v1 |
PCIe | MEDIUM | is optional in v1; min. 1 for v2 |
GPIO | HIGH | min. 8 pins for v1 |
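The HIGH-priority rows of the table can be turned into a quick checklist for any candidate board. The sketch below is only an illustration: the `missing_interfaces` helper and the `example_board` figures are hypothetical and do not describe any real product.

```python
# Minimum interface counts for v1, taken from the HIGH-priority rows of the table above.
REQUIRED_V1 = {
    "CSI-2": 2,
    "USB 3.0": 2,
    "HDMI output": 1,
    "AUDIO input": 1,
    "Ethernet": 1,
    "GPIO": 8,
}


def missing_interfaces(board):
    """Return {interface: (available, required)} for every unmet v1 requirement."""
    return {
        name: (board.get(name, 0), minimum)
        for name, minimum in REQUIRED_V1.items()
        if board.get(name, 0) < minimum
    }


# Hypothetical board: the numbers are invented purely for illustration.
example_board = {
    "CSI-2": 1,
    "USB 3.0": 1,
    "HDMI output": 1,
    "AUDIO input": 1,
    "Ethernet": 1,
    "GPIO": 8,
}

print(missing_interfaces(example_board))
```

An empty result means the board meets every HIGH-priority interface requirement for v1; MEDIUM and LOW rows are deliberately left out of this check.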
After collecting RAW data in memory we have to process it and push it out at the highest possible rate. As we are going to capture multiple FullHD RAW streams, then process, encode and push them out, we require extremely high bus/RAM throughput and hardware acceleration at each level of media processing.
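To get a feeling for the numbers, here is a rough estimate of the uncompressed data rate. It is a sketch under stated assumptions: two 1080p streams at 30 fps with 2 bytes per pixel (for example a YUV 4:2:2 layout). The real pixel format depends on the camera sensor, and the `raw_stream_rate` helper is my own naming.

```python
def raw_stream_rate(width, height, fps, bytes_per_pixel):
    """Return the uncompressed data rate of one video stream in bytes per second."""
    return width * height * bytes_per_pixel * fps


# Assumed figures: 1080p at 30 fps, 2 bytes per pixel (e.g. YUV 4:2:2).
STREAMS = 2
per_stream = raw_stream_rate(1920, 1080, 30, 2)
total = per_stream * STREAMS

print(f"one stream: {per_stream / 1e6:.1f} MB/s")
print(f"{STREAMS} streams: {total / 1e6:.1f} MB/s ({total * 8 / 1e9:.2f} Gbit/s)")
```

This is only the capture side. Every processing stage that reads a frame from RAM and writes it back adds to the memory traffic, which is why bus throughput and hardware acceleration matter so much here.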
Computing | Priority | Description |
CPU multicore | HIGH | min. 4 cores for v1 |
GPU | HIGH | highly efficient architecture is required |
Dedicated Video Chip | HIGH | can be integrated in CPU or GPU chip for v1 |
DDR4-RAM | MEDIUM | faster is better |
eMMC | HIGH | high IO embedded storage |
Ideally, the hardware setup should come with a full software stack. Several vendors, such as Intel, Nvidia and AMD, provide such solutions.
Software
The software stack is the most important part. We should choose the platform whose software implements the most advanced and most up-to-date research.
Frameworks | Priority | Description |
Graphics | HIGH | OpenGL preferred |
Vision | HIGH | VisionWorks and OpenCV preferred |
Parallel computing | HIGH | CUDA and OpenCL preferred |
Multimedia | HIGH | GStreamer, OpenMAX preferred |
Deep learning | LOW | optional for v1, preferred cuDNN |
Standards | Priority | Description |
VAAPI | HIGH | Ubuntu 14.04 or higher |
VDPAU | HIGH | Ubuntu 14.04 or higher |
GLX | HIGH | Ubuntu 14.04 or higher |
TCP/UDP | HIGH | Ubuntu 14.04 or higher |
As the operating system I would recommend the latest version of Ubuntu (15.04/Vivid). It comes with up-to-date libraries for all technologies required by this challenge.
Physical
Smaller is better! The challenge is about building a portable, energy-efficient, plug&play device. We should at least focus on size, noise and power consumption.
Component | Priority | Description |
Small size motherboard | HIGH | best is ITX or smaller |
FAN and radiator | HIGH | best without FAN, small radiator |
Support
Ideally, the hardware supplier provides full worldwide support for all segments.
Manufacture
Ideally, the component supplier provides a long-term product life cycle.
Budget
We should focus on low-budget solutions aimed at the consumer market.
MARKET OVERVIEW
Please have a look at this very short list of available mini computers. They differ in size, power consumption, number of cores, GPU model, hardware multimedia support, interface performance, storage and price!
I will briefly comment on the pros and cons of each one. Please follow the URLs to learn more.
96Boards HiKey Board (LeMaker)
See more: http://www.96boards.org/products/ce/hikey/start/
Pros: Octa-Core 64bit CPU, 2GB DDR3, Bluetooth
Cons: missing GPU, only USB 2.0, no CSI-2, no Ethernet
Verdict: NO, missing graphic acceleration and efficient camera ports
SNAPDRAGON 805 DEVELOPMENT KIT
See more: http://shop.intrinsyc.com/collections/product-development-kits
Pros: Adreno™ 420 GPU, Hexagon™ DSP, Krait® 450 CPU quad-core 2.5GHz, 2x MIPI-DSI 4-lane, MIPI-CSI 4-lane
Cons: no VP8 hardware encoding, single USB 3.0, high price
Verdict: YES, this platform provides 90% of features we need for 3D/VR device
Nvidia Development Kit - Jetson TK1
See more: http://www.nvidia.com/object/jetson-tk1-embedded-dev-kit.html
Pros: NVIDIA 4-Plus-1™ Quad-Core ARM® Cortex™-A15 CPU, NVIDIA Kepler GPU with 192 CUDA Cores, hardware VP8 encoding, low price
Cons: single 2 lane CSI-2, single USB 3.0
Verdict: YES, this platform provides 75% of features we need for 3D/VR device
Nvidia Development Kit - Jetson TX1
See more: http://www.nvidia.com/object/jetson-tx1-module.html
Pros: 64-bit ARM® A57 CPUs, 1 TFLOP/s 256-core with NVIDIA Maxwell™ Architecture, 4 GB LPDDR4 | 25.6 GB/s, Up to 6 cameras | 1400 Mpix/s, VP8 hardware encoding
Cons: high price
Verdict: YES, this platform provides more than 100% of features we need for 3D/VR device
Gigabyte GeForce Mini-ITX with ITX platform
See more: http://www.gigabyte.com/products/product-page.aspx?pid=5252
This is a very flexible and configurable ITX platform. We can install many types of CPU, RAM and GPU.
Pros: configurable, supports all features
Cons: high price, quite big
Verdict: YES, this platform provides more than 100% of features we need for 3D/VR
Mali-T604 Low-cost Development Board
See more: http://malideveloper.arm.com/news/mali-t604-low-cost-development-board/
Pros: Quad-core Mali-T604 GPU
Cons: Dual-core Cortex-A15 CPU, only OpenGL ES 2.0, no USB 3.0, only CSI v1, no VP8 hardware encoder
Verdict: NO, missing OpenGL 3+ and efficient camera ports
QUICK BENCHMARK
My research led me to try the Nvidia Development Kit - Jetson TK1. Compared with most of the development kits available on the market right now, the Jetson TK1 seems to be a good match for initial benchmarking. It is built with mobile performance in mind, is supported by a huge community and is under continuous development by Nvidia.
Jetson TK1 is built on top of the Tegra K1 chipset, based on an "NVIDIA Kepler GPU with 192 CUDA Cores". Together with the "NVIDIA 4-Plus-1™ Quad-Core ARM® Cortex™-A15 CPU", it gives a lot of flexibility and performance in media processing, with decoding and encoding up to 2160p.
Nvidia’s Tegra K1 (codenamed "Logan") features ARM Cortex-A15 cores in a 4+1 configuration similar to Tegra 4, or Nvidia's 64-bit Project Denver dual-core processor as well as a Kepler graphics processing unit with support for Direct3D 12, OpenGL ES 3.1, CUDA 6.5 and OpenGL 4.4/OpenGL 4.5. Nvidia claims that it outperforms both the Xbox 360 and the PS3, whilst consuming significantly less power.
What does it mean for us?
In short, the goal is to capture 2 RAW 1080p video streams, stitch them, run them through video filters, add OpenGL features and output the result both as VP8/Opus over the network and as digital media over HDMI.
The Development Kit comes with Ubuntu for Tegra R21.4 pre-installed. It includes the core of the GStreamer multimedia framework in version 1.2.4. Nvidia also delivers gst-omx plugins, which expose the Tegra hardware encoding/decoding features and make it easy to work with the CSI-2/USB (camera) and HDMI interface endpoints.
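To make the target data flow more concrete, here is a minimal sketch that only builds a `gst-launch-1.0` pipeline description for two cameras placed side by side. It is not the pipeline used on the Jetson: it assumes V4L2 cameras at `/dev/video0` and `/dev/video1`, uses the generic software elements `videomixer`, `vp8enc` and `webmmux` instead of the Tegra-specific gst-omx elements, and leaves out audio. The `build_pipeline` helper and the port number are my own placeholders.

```python
def build_pipeline(devices, width=1920, height=1080, fps=30, port=5000):
    """Build a gst-launch-1.0 description that puts the camera streams side by side,
    encodes the result as VP8 and serves it as WebM over TCP."""
    caps = f"video/x-raw,width={width},height={height},framerate={fps}/1"
    # videomixer positions each input through the xpos property of its sink pad.
    offsets = " ".join(f"sink_{i}::xpos={i * width}" for i in range(len(devices)))
    output = (
        f"videomixer name=mix {offsets} ! videoconvert ! "
        f"vp8enc deadline=1 ! webmmux streamable=true ! "
        f"tcpserversink host=0.0.0.0 port={port}"
    )
    # One capture branch per camera, each linked to the mixer by name.
    inputs = [
        f"v4l2src device={dev} ! {caps} ! videoconvert ! mix."
        for dev in devices
    ]
    return " ".join([output] + inputs)


# Assumed device nodes; adjust to the cameras actually attached.
pipeline = build_pipeline(["/dev/video0", "/dev/video1"])
print("gst-launch-1.0 " + pipeline)
```

The printed line can be pasted into a shell for a first experiment. On the Tegra the software encoder would be swapped for the hardware one provided by gst-omx, which is where the real performance gain is expected.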
In my opinion, Nvidia's solution is exactly what is needed for a portable 3D/VR device. I decided to take the next step and settle on...
MY FINAL CHOICE
I have done very intensive testing and benchmarking of the Jetson TK1. It looks very promising at every level. The hardware is efficient enough to take in RAW media over its interfaces and to process, encode and stream video/audio in real time at up to 30 fps.
Taking all aspects and facts into account, I am going to bet on the next generation of Jetson, called TX1.
NVIDIA Jetson TX1 with GPU-accelerated parallel processing is the world’s leading embedded visual computing platform. It features high-performance, low-energy computing for deep learning and computer vision making the Jetson platform ideal for compute-intensive embedded projects like drones, autonomous robotic systems, Advanced Driver Assistance Systems (ADAS), mobile medical imaging, and Intelligent Video Analytics (IVA). OEMs, independent developers, makers and hobbyists can use the NVIDIA Jetson TX1 to explore the future of embedded computing.
Have a brief look at the facts about the Jetson TX1 platform, which fits all requirements of the VR device challenge:
SUPER COMPUTING PLATFORM...
...AND VERY TINY...
...AND WITH INCREDIBLE SOFTWARE STACK...
Jetson TX1 looks like a perfect match for a stereoscopic device. It will be released to the market in the middle of March 2016. It brings higher efficiency, the next generation of ARM CPU, Nvidia GPU, DDR RAM, CSI, USB, Ubuntu for Tegra and the software stack, plus many other improvements!
SUMMARY
I am very happy with the current state of the project. Taking all of my research into account, I am fairly confident I can deliver an advanced 3D/VR device in the next 12-16 weeks.
I am about to order the Jetson TX1, which will be released in Europe in March 2016. Hopefully it can be delivered on time to London (where I am relocating in 3 days)!
For now, I will focus on more detailed benchmarking of the Jetson TK1.
Next step
Next week I will focus on detailed benchmarking and the performance characteristics of the Jetson platforms...
Topic: Art 3D/VR challenge – week 3 - FullHD processing on portable GPU vs CPU
Contribution
Feel free to contact me if you are interested in meeting the team and contributing to this project in any programming language (go, php, ruby, js, node.js, objective-c, java...). This project is hosted on Github.
See my contact page for details.
Resources
Several parts of this blogpost are based on external articles, blogposts and wikis to which I have no rights. I am not able to link every external source in this blogpost. I would like to thank everyone who shares their knowledge publicly. If you think I have improperly used any of your ideas, products or patents, please let me know and I will fix the issue asap.
© COPYRIGHT KRZYSZTOF STASIAK 2016. ALL RIGHTS RESERVED