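Having passed through the stages of experimental, theoretical, and computational science, scientific research has entered the stage of data-intensive science, accompanied by the arrival of the big data era. Big data broadly refers to data at the scale of hundreds of terabytes or even petabytes①, and is typically distributed, heterogeneous, and of low quality. Although traditional database management technology, especially commercial relational databases, has achieved great success over the past 40 years, these technologies and systems cannot effectively manage the big data that underpins Data-Intensive Science and Engineering (DISE). This paper discusses the specific requirements and practical challenges of data-intensive science and engineering. Its scope spans four layers: data storage and organization, computing methods, data analysis, and user interface technology. Data quality, data security, and data curation also need attention at every layer. The paper attempts to outline an overall architecture for data-intensive science and engineering, reviews recent developments in related fields, analyzes the challenges faced, and discusses future research directions.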
As one kind of social media, microblogs are widely used for sensing the real world. The popularity of a microblog is an important measure for evaluating the influence of a piece of information. This paper studies models and modeling techniques for the popularity of microblogs, using a large data set from Sina Weibo, one of the most popular microblogging services. First, two types of popularity, namely the number of retweets and the number of possible views, are defined, and their relationship is discussed. Then, the temporal dynamics of tweets' popularity, including lifecycles and tipping points, are studied. A piecewise sigmoid model is used to model the temporal dynamics. Empirical studies show the effectiveness of the modeling methods.
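To make the idea of a piecewise sigmoid concrete, the following is a minimal sketch of one possible form, not the paper's actual model, which the abstract does not specify. It assumes that cumulative popularity (for example, the retweet count) is described by consecutive time segments, each with its own sigmoid whose midpoint plays the role of a tipping point. The names `Segment`, `sigmoid`, and `piecewise_sigmoid`, and all parameters, are hypothetical and chosen for illustration only.

```python
import math
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Segment:
    """One growth phase of a tweet's popularity (hypothetical parameterization)."""
    start: float     # time at which this phase begins, e.g. hours since posting
    height: float    # popularity gained once this phase saturates
    rate: float      # steepness of the growth in this phase
    midpoint: float  # time of fastest growth, i.e. the assumed tipping point


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def piecewise_sigmoid(t: float, segments: list[Segment]) -> float:
    """Cumulative popularity at time t.

    Segments must be sorted by `start`. The phase active at time t contributes
    a partial sigmoid; every earlier phase is treated as fully saturated.
    This is an illustrative assumption, not a fitted model.
    """
    starts = [s.start for s in segments]
    k = bisect_right(starts, t) - 1  # index of the phase active at time t
    if k < 0:
        return 0.0                   # before the first phase begins
    base = sum(s.height for s in segments[:k])
    seg = segments[k]
    return base + seg.height * sigmoid(seg.rate * (t - seg.midpoint))
```

In this sketch, a tweet with an initial burst followed by a later, smaller revival would be described by two `Segment` entries. In practice the segment boundaries and parameters would have to be estimated from observed retweet time series, which this example does not attempt.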