Learning-based System Prototyping and Performance Optimization for Large-Scale Networked Computer Systems
Our research interests are in the area of large-scale networked computer systems, ranging from social media networks and cloud computing platforms to green data centres and big data systems. Leveraging a unique background of rigorous analytical training at MIT and system engineering experience at Cisco, I have been leading our team to bridge the gap between theory and practice by extending theoretical insights into system prototyping and performance optimization. At NTU, we have applied machine learning, optimization theory, queuing theory and information theory to tackle practical challenges in a variety of Internet-scale system projects, including a social TV platform, a modular data centre testbed and a big-data solution over GPGPU virtualization, to name a few. In this process, we have refined a research framework, called learning-based system prototyping and performance optimization for large-scale networked computing systems; some of the resulting systems have leaped over the barrier between academia and industry to become commercial products and services.
Learning-based System Prototyping and Performance Optimization
At NTU, we have been refining a learning-based approach to system prototyping and performance optimization for large-scale networked computing systems. Its workflow, as illustrated within the box, pivots on an innovation cycle of agile system prototyping and data-driven performance optimization. It starts with an architectural concept, which is substantiated with a reference system design; this design in turn serves as a blueprint for a system prototype built upon open-source libraries and proprietary implementations. The resulting prototype is then released for public trial to collect live operational data, which is supplemented with related datasets from its ecosystem. This combined set of public and private data is fed into a learning engine to generate models for performance optimization. Actionable guidelines from this joint learning-optimization theory are then applied to the reference design, via the emerging paradigm of software-defined systems. We expect the final deliverables of this research practice to be refined products or services that can be commercialized with little additional effort. As we go through each cycle, high-quality publications can be obtained as by-products.
Toward Green Data Centre as an Interruptible Load for Grid Stabilization
The data centre has emerged as critical infrastructure to fuel Internet innovations (e.g., cloud computing, big data). However, a data centre typically consumes a huge amount of electrical power, wasting energy because of its low utilization and aggravating the instability of power grids with volatile yields. In this research, we propose a ground-breaking concept of flipping the data centre’s power burden into an opportunity to stabilize a power grid with fluctuating supply. Specifically, we aim to develop technical and scientific solutions with an arbitrage-free economic model to enable the data centre to act as an interruptible load (i.e., a power load that can be scaled down temporally and spatially) to stabilize a power grid whose renewable supply is volatile due to varying weather conditions. The technical solution leverages a transformative power analytics framework, i.e., embedded software as sensors, in which software hooks are embedded into a range of data centre subsystems, from chip to system to application level, to log ICT activities and power usage in a fine-grained, real-time manner. The collected data are then analyzed, via whitebox (e.g., kernel methods) and blackbox (e.g., deep learning networks) approaches, to construct system power models, which are used to develop (near-)optimal algorithms for energy-efficient data centre operations across computing, power distribution and cooling subsystems. This holistic system monitoring and optimization framework strives to reduce the overall power consumption of the data centre, and to enable spatial and temporal shifting of workloads in a network of geo-distributed data centres to mitigate the grid instability resulting from stochastic renewable yields.
Data Centre Energy Map
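To make the idea of a system power model concrete, the following is a minimal sketch that fits a power model from logged utilization and power samples. It assumes a simple linear relationship between CPU utilization and server power, which is a common first approximation and not necessarily the whitebox or blackbox models used in this project. The function names and the sample values are hypothetical.

```python
from statistics import mean


def fit_linear_power_model(utilization, power_watts):
    """Least-squares fit of power = idle + slope * utilization (assumed linear model)."""
    if len(utilization) != len(power_watts) or len(utilization) < 2:
        raise ValueError("need at least two paired samples")
    u_bar, p_bar = mean(utilization), mean(power_watts)
    sxx = sum((u - u_bar) ** 2 for u in utilization)
    if sxx == 0:
        raise ValueError("utilization samples must not all be equal")
    sxy = sum((u - u_bar) * (p - p_bar) for u, p in zip(utilization, power_watts))
    slope = sxy / sxx
    idle = p_bar - slope * u_bar
    return idle, slope


def predict_power(model, utilization):
    """Predict power draw (watts) at a given utilization in [0, 1]."""
    idle, slope = model
    return idle + slope * utilization


# Hypothetical samples, as might be logged by embedded software hooks.
samples_u = [0.0, 0.25, 0.5, 0.75, 1.0]
samples_p = [100.0, 125.0, 150.0, 175.0, 200.0]
model = fit_linear_power_model(samples_u, samples_p)
print(model, predict_power(model, 0.6))
```

Once such a model is available, an operator could estimate how much power is freed by shedding or shifting a given amount of load, which is the quantity an interruptible-load scheme would need.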
Towards Outside Air Cooling and Energy-Efficient ICT Operations for Modular Data Center in Tropical Environment
The primary objective of this research program is to demonstrate the practicality and cost-effectiveness of a modular data center, equipped with outside air cooling and energy-efficient ICT operations, in a tropical environment like Singapore. It takes an integrated approach towards a data center design for the future in terms of its major sub-systems: IT equipment (servers, storage and network), power supply infrastructure (including UPS and back-up power) and cooling systems (mode of cooling, type of equipment and systems design). Such an integrated approach enables matching of the IT needs with the individual sub-system requirements with minimal overprovisioning of resources. It also allows the key concerns of data center operations (response to IT demands with minimal latency, adequacy of resources to ensure continuous equipment uptime, and operational conditions which minimize failure rates) to be addressed concurrently without compromising on energy efficiency. As the industry is marked by high variability in its operational environment, a data center tends to operate at a low part-load for a substantial proportion of time; hence a highly modularized design, coupled with the ability to ramp up resources within a short time, is another desirable feature. The effectiveness of this integrated approach is being demonstrated via two leading applications: HTTP video streaming and big-data analytics.
Multi-Screen Cloud Social TV for Value-Added Content Services
This project aims to develop our patent-pending cloud-centric media technologies into a multi-screen cloud-based social TV platform, for which a system prototype will be implemented for feasibility and usability studies. Research on big-data analytics on metadata and social data will be pursued to further improve user experience. Two value-added applications (i.e., video streaming over multiple screens and real-time TV advertisement tracking) will be studied to establish the business value of this technology.
Our multi-screen Social TV technology has been touted, by global media (1600+ news articles from 29+ countries), as an innovative technology to transform the traditional “laid-back” TV viewing behavior into a proactive “lean-forward” social networking experience, marrying TV to the social networking lifestyle of today. This platform, when fully developed and commercialized, would transform the value of TV and potentially save it from a downfall similar to that of newspapers. In our system, examples of salient and sticky features include, but are not limited to, a virtual living room experience that allows remote viewers to watch TV programs together with text, audio and video communication modalities, and a video teleportation experience that allows viewers to seamlessly migrate programs across different screens (e.g., TV, smartphone and tablet) with minimum learning. Moreover, to meet the requirements of various customers, the platform will provide a set of Application Programming Interfaces (APIs) for other developers to design, implement and deploy novel value-added content services for specific customer needs (e.g., elderly home care, TV ad workflow redesign, real-time TV shopping, collaborative e-learning, autism diagnosis and assistive treatment, to name a few). In this research, we plan to customize our solution for two high-value TV applications, an immersive TV watching service across multiple screens and a real-time TV advertisement tracking service, for which a campus-wide trial will be conducted at NTU.
Our novel technology outperforms other commercially available solutions of similar usage, by providing the most comprehensive features to meet end users’ needs on every occasion, from social networking to potentially home care monitoring, while offering the required scalability to support a large number of concurrent users. Adoption will be extremely easy through highly intuitive human-computer interfaces. Initial discussions have generated high commercial interest in our technology, in the form of collaboration enquiries with specific customer needs from TV vendors, service providers and OTT content providers. These needs dictate new features to be introduced into our system prototype and additional R&D efforts on big data analytics on social data and metadata to provide higher value to our customers.
Toward Learning-based Thermal Comfort Models to Instill Behavioral Changes for Greener, Smarter and Healthier Building in the Tropics via Pervasive Sensing
This research proposes to develop online thermal comfort models, via a deep-learning approach, and apply them in behavioral studies to drive “greener, smarter and healthier buildings” in the tropics (e.g., Singapore). Leveraging privacy-preserving data analytics over information acquired from smartphone crowdsourcing and in-situ wearable measurements, we plan to develop and validate an integrative, economical and scalable thermal comfort management system, with the following technical aims:
· To validate the canonical PMV model in the tropics via privacy-preserving data mining;
· To develop an online personalized thermal comfort model via a deep-learning approach; and
· To derive a unified utility mechanism for thermal comfort to instill behavioral changes in building occupants for greener, smarter and healthier buildings in Singapore.
Our solution builds upon our expertise in pervasive sensing and data analytics, and focuses on applied R&D with commercialization interest in smart buildings. First, we will develop a human-centric solution to leverage wearable devices (i.e., wristband) and mobile devices (e.g., smartphone) for crowdsourcing user preference and in-situ measurements. Second, we will perform privacy-preserving data analytics to transform the canonical thermal comfort model (i.e., PMV) into an online paradigm for behavioral studies. Finally, analytical insights will be validated in the SinBerBest testbed for energy efficiency. Working with local and international partners, we will showcase our R&D outcomes locally and globally. Our expected deliverables include an integrated thermal comfort management system, as well as a light-weight mobile application, which would have been well tested in Singapore and can scale up via a cloud data service for mass adoption with our commercialization partners globally.
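As a small illustration of how the canonical PMV model could be checked against crowdsourced votes, the sketch below uses the standard PPD relation from ISO 7730, PPD = 100 − 95·exp(−0.03353·PMV⁴ − 0.2179·PMV²), together with the seven-point thermal sensation scale. The `mean_bias` helper and the sample values are hypothetical and only indicate one possible validation metric; the actual validation procedure is not specified here.

```python
import math

# Seven-point thermal sensation scale used with PMV.
SENSATION_SCALE = {
    -3: "cold", -2: "cool", -1: "slightly cool", 0: "neutral",
    1: "slightly warm", 2: "warm", 3: "hot",
}


def ppd_from_pmv(pmv):
    """Predicted Percentage of Dissatisfied (%) for a given PMV (ISO 7730 relation)."""
    return 100.0 - 95.0 * math.exp(-0.03353 * pmv ** 4 - 0.2179 * pmv ** 2)


def sensation_label(pmv):
    """Map a PMV value to the nearest label on the seven-point scale."""
    clamped = max(-3, min(3, round(pmv)))
    return SENSATION_SCALE[clamped]


def mean_bias(predicted_pmv, reported_votes):
    """Average of (predicted PMV - reported vote); hypothetical validation metric."""
    if len(predicted_pmv) != len(reported_votes) or not predicted_pmv:
        raise ValueError("need equally sized, non-empty sequences")
    diffs = [p - v for p, v in zip(predicted_pmv, reported_votes)]
    return sum(diffs) / len(diffs)


# Hypothetical data: model predictions versus votes collected from a mobile app.
predicted = [0.5, 1.0, -0.5]
votes = [0, 0, -1]
print(ppd_from_pmv(0.5), sensation_label(1.4), mean_bias(predicted, votes))
```

A positive mean bias in this sketch would indicate that the model predicts warmer sensations than occupants report, which is the kind of systematic offset a personalized online model would aim to remove.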
Toward Joint IT-Thermal Optimization to Improve Energy Efficiency for High-Ambient Temperature Data Centre in the Tropics via Learning-based Algorithms
In this research, we propose to develop learning-based algorithms for joint IT-thermal optimization to improve energy efficiency for high-ambient temperature (enterprise) data centers in the tropics. To tackle the paramount challenge of the siloed approach to IT and facility systems in data centers, we plan to adopt an interdisciplinary approach to develop advanced energy-efficient technologies, with the following specific aims:
· To develop a data-driven mathematical framework, based on the highly-touted Deep Q-Networks (DQN), for controlling and optimizing large-scale systems with unknown dynamics and objectives. It first optimizes a low-cost hybrid sensing technique (i.e., UbiSense), combining our patented “software-as-sensors” technology (for ICT system performance counters, e.g., CPU usage, memory, I/O, etc.) with strategically-deployed physical sensors (for ambient temperature, humidity, noise and airflow), for data centre monitoring. The hybrid dataset is then fed into a deep-learning engine to train a set of sophisticated, domain-specific models that capture the relationship between IT and non-IT systems.
· To apply and validate robust learning-based algorithms for joint IT-Thermal optimization, in harmony with the complex interplay between IT systems and non-IT systems. The joint optimization aims to increase the energy efficiency of enterprise data center operations while providing the desired system reliability and performance to ensure business continuity, under the extreme system dynamics of Singapore’s tropical weather.
· To conduct system trials for technology validation and commercialization with our private data center testbed at NTU and a public data center testbed provided by our government partner (i.e., National Supercomputing Centre). These trials will prepare us well for potential technology licenses and spin-off opportunities.
We believe that our holistic approach, based on emerging machine-learning approaches, stands out as a novel and practical solution to address the technical and operational challenges in running data centers in a higher ambient temperature environment. We pioneer learning-based algorithms for joint IT-thermal optimization, adopting a data-centric approach in contrast to existing model-based approaches. The practicality of our proposed solution has been previously endorsed by the data center industry with the 2015 Data Centre Dynamics Awards - APAC. We expect with good confidence that our solution, at its full potential, will reduce the energy consumption of data centers in Singapore by 20%, leading to significant economic savings and environmental benefits for Singapore as it pursues its envisioned digital economy transformation.
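The learning-based control loop described above can be illustrated with a minimal sketch. It uses tabular Q-learning, the simpler algorithm that DQN extends by replacing the table with a neural network, on a toy cooling problem. The environment (five temperature bands, two cooling levels, the cost and penalty values) is entirely hypothetical and stands in for the real IT-thermal dynamics.

```python
import random

N_TEMPS = 5              # temperature bands: 0 (cool) .. 4 (overheated); toy assumption
ACTIONS = (0, 1)         # 0 = low cooling, 1 = high cooling
ENERGY_COST = (1.0, 3.0) # hypothetical energy cost per step for each action
OVERHEAT_PENALTY = 10.0  # hypothetical penalty for entering the hottest band


def step(temp, action):
    """Toy dynamics: low cooling lets temperature rise, high cooling lowers it."""
    next_temp = min(temp + 1, N_TEMPS - 1) if action == 0 else max(temp - 1, 0)
    reward = -ENERGY_COST[action]
    if next_temp == N_TEMPS - 1:
        reward -= OVERHEAT_PENALTY
    return next_temp, reward


def train(episodes=2000, steps=20, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_TEMPS)]
    for _ in range(episodes):
        temp = rng.randrange(N_TEMPS)  # random start state for coverage
        for _ in range(steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[temp][a])
            next_temp, reward = step(temp, action)
            target = reward + gamma * max(q[next_temp])
            q[temp][action] += alpha * (target - q[temp][action])
            temp = next_temp
    return q


def greedy_policy(q):
    """Best action per temperature band according to the learned Q-table."""
    return [max(ACTIONS, key=lambda a: row[a]) for row in q]


q_table = train()
policy = greedy_policy(q_table)
print(policy)
```

In this toy setting the learned policy saves energy in the cooler bands and switches to high cooling only when the next step would risk overheating; a DQN would play the same role when the state is a high-dimensional sensor vector rather than a handful of bands.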
Cost-Optimal Mobile Computing in the Cloud
This research aims to resolve a prominent tension between the growing usage of smartphones and their resource-constrained nature, by dynamically offloading computation tasks from handsets to a cloud infrastructure. In particular, we introduce a new concept of VMlet, which represents a service container on a virtual machine, executing a set of computing tasks dynamically offloaded from smartphones. The research challenge is how to optimally decompose a mobile application, represented by a directed graph for its workflow, into a set of virtual machines for cost-optimal execution. We formulate this challenge as a constrained optimization problem, with the objective of minimizing a chosen cost metric (e.g., monetary cost or energy usage) for either the mobile user or the mobile service provider, under the constraint of quality of service (QoS) requirements (e.g., delay deadline). Our analytical framework builds on our previous work in task offloading for mobile cloud. Moreover, by solving a series of progressively more challenging sub-problems, we will develop distributed algorithms to solve the optimization problem and implement a prototype of the platform service for feature verification and performance optimization. Our expected deliverables will include a suite of middleware, in which the front-end software will be developed for popular mobile platforms and the back-end software will run over public cloud platforms, and two pilot applications to showcase the platform capabilities. The software package will be offered via a Platform-as-a-Service (PaaS) model for application development, under an open-source license. The flexibility it offers to application developers would help to improve the brand awareness of emerging mobile devices and infrastructure cloud platforms.
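The constrained optimization described above can be sketched on a tiny workflow. The example below enumerates every device/cloud placement of a four-task graph and keeps the placement with the lowest device energy that still meets a delay deadline. It is a brute-force illustration under simplifying assumptions (tasks execute sequentially, cloud execution costs the device no energy, a cut edge adds a fixed transfer time and energy); it is not the distributed algorithm proposed in this research, and all task names and numbers are hypothetical.

```python
from itertools import product

# Hypothetical workflow: name -> (device_time_s, cloud_time_s, device_energy_j)
TASKS = {
    "capture": (0.1, 0.1, 0.2),
    "detect": (2.0, 0.3, 4.0),
    "classify": (3.0, 0.4, 6.0),
    "render": (0.2, 0.2, 0.4),
}
PINNED_TO_DEVICE = {"capture", "render"}  # e.g. camera and display access

# Directed edges: (src, dst) -> (transfer_time_s, transfer_energy_j),
# paid only when src and dst run in different places.
EDGES = {
    ("capture", "detect"): (3.0, 1.0),
    ("detect", "classify"): (0.2, 0.4),
    ("classify", "render"): (0.1, 0.2),
}


def evaluate(placement):
    """Return (total_delay_s, device_energy_j) for a placement, assuming sequential execution."""
    delay = energy = 0.0
    for name, (device_t, cloud_t, device_e) in TASKS.items():
        if placement[name] == "device":
            delay += device_t
            energy += device_e
        else:
            delay += cloud_t
    for (src, dst), (transfer_t, transfer_e) in EDGES.items():
        if placement[src] != placement[dst]:
            delay += transfer_t
            energy += transfer_e
    return delay, energy


def best_placement(deadline_s):
    """Minimise device energy subject to delay <= deadline; None if infeasible."""
    free = [name for name in TASKS if name not in PINNED_TO_DEVICE]
    best = None
    for choice in product(("device", "cloud"), repeat=len(free)):
        placement = {name: "device" for name in PINNED_TO_DEVICE}
        placement.update(zip(free, choice))
        delay, energy = evaluate(placement)
        if delay <= deadline_s and (best is None or energy < best[0]):
            best = (energy, delay, placement)
    return best


for deadline in (2.0, 3.5, 5.0):
    print(deadline, best_placement(deadline))
```

Exhaustive search grows exponentially with the number of tasks, which is why realistic workflows call for the structured or distributed algorithms that the project sets out to develop.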
by Yonggang Wen