5/03 Speaker:
This week we have invited Prof. I-Hsiang Wang from the Department of Electrical Engineering at National Taiwan University to give a talk.
Prof. Wang holds a Ph.D. from UC Berkeley, and his expertise includes information theory, wireless communications, and networked information and data processing. This is a rare opportunity, so please join us.
The speaker's biography follows:
Bio:
I-Hsiang Wang received his Ph.D. in Electrical Engineering and Computer Sciences from the University of California at Berkeley, USA, in 2011. From 2011 to 2013, he was a postdoctoral research associate in the School of Computer and Communication Sciences (IC) at École Polytechnique Fédérale de Lausanne (EPFL), Switzerland. In Fall 2013, he joined National Taiwan University, where he is now an associate professor. Prof. Wang’s expertise lies in information theory, statistical learning, and networked data processing. He received the Berkeley Vodafone Fellowship in 2006 and 2007. He was a finalist for the Best Student Paper Award at the 2011 IEEE International Symposium on Information Theory. He won the 2017 IEEE Information Theory Society Taipei Chapter and IEEE Communications Society Taipei/Tainan Chapters Best Paper Award for Young Scholars, and the 2016 National Taiwan University Distinguished Teaching Award. He has served on the technical program committees of flagship conferences in information theory, including the IEEE International Symposium on Information Theory (ISIT) and the IEEE Information Theory Workshop (ITW).
The title and abstract follow:
Title: On Source Anonymity in Heterogeneous Statistical Inference
Abstract:
Statistical inference is a fundamental task in data science, where a decision maker aims to determine a hidden parameter based on the data it collects and on how the data depends statistically on the target parameter. In many modern applications such as crowdsourcing and sensor networks, data is heterogeneous and collected from various sources following different distributions. These sources, however, may be anonymous to the decision maker due to identification costs and privacy considerations. Since the distribution of the data then becomes unknown, it is unclear how to carry out optimal inference, and hence the impact of source anonymity on the performance of statistical inference remains elusive. In this talk, I will present our recent work towards settling this question for binary hypothesis testing. Given the anonymity of the data sources, it is natural to formulate the problem as a composite hypothesis testing problem. First, we propose an optimal test called the mixture likelihood ratio test, a randomized threshold test based on the ratio of the uniform mixture of all possible distributions under one hypothesis to that under the other. Second, we focus on the Neyman-Pearson setting and characterize the error exponent of the worst-case type-II error probability as the dimension of the data tends to infinity while the proportions among the dimensions of the different data sources remain constant. It turns out that the optimal exponent is a generalized divergence between the two families of distributions under the two hypotheses. Our results elucidate the price of anonymity in heterogeneous hypothesis testing and can be extended to more general inference tasks.
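To make the mixture likelihood ratio test concrete, below is a minimal Python sketch under a simplified toy model: two anonymous Bernoulli sources whose assignment to the observed data blocks is unknown, so each hypothesis becomes a uniform mixture over all possible source-to-block assignments. The block structure, function names, and parameter values here are illustrative assumptions for this sketch, not the exact formulation in the talk; in particular, the randomization at the threshold used by the optimal test is omitted.

    import math
    import numpy as np
    from itertools import permutations

    def mixture_likelihood(x_blocks, params):
        # Uniform mixture of the Bernoulli likelihood over all
        # source-to-block assignments (the sources are anonymous).
        total = 0.0
        for perm in permutations(params):
            lik = 1.0
            for block, p in zip(x_blocks, perm):
                lik *= np.prod(np.where(block == 1, p, 1.0 - p))
            total += lik
        return total / math.factorial(len(params))

    def mixture_lrt(x_blocks, params_h0, params_h1, threshold=1.0):
        # Declare H1 iff the mixture likelihood ratio exceeds the threshold.
        # (The optimal test also randomizes exactly at the threshold; that
        # tie-breaking step is left out of this toy sketch.)
        ratio = (mixture_likelihood(x_blocks, params_h1)
                 / mixture_likelihood(x_blocks, params_h0))
        return ratio > threshold

    # Toy usage: two anonymous sources, 50 binary observations each,
    # generated here from the H1 parameter pair (0.2, 0.7).
    rng = np.random.default_rng(0)
    blocks = [rng.binomial(1, 0.2, 50), rng.binomial(1, 0.7, 50)]
    print(mixture_lrt(blocks, params_h0=(0.4, 0.5), params_h1=(0.2, 0.7)))

Averaging the likelihood over all assignments before thresholding is what distinguishes this test from a plain likelihood ratio test with known source labels, and it is exactly this averaging that makes the test well defined under anonymity.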