
Update 3: 7 Years Of YouTube Scalability Lessons In 30 Minutes and YouTube Strategy: Adding Jitter Isn't A Bug

Update 2: YouTube Reaches One Billion Views Per Day. That’s at least 11,574 views per second, 694,444 views per minute, and 41,666,667 views per hour. 
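
The per-second, per-minute, and per-hour figures follow from dividing one billion by the number of seconds, minutes, and hours in a day. A quick check, with illustrative variable names that are not from the original:

```python
# Convert one billion views per day into smaller time units.
VIEWS_PER_DAY = 1_000_000_000

views_per_hour = VIEWS_PER_DAY / 24
views_per_minute = VIEWS_PER_DAY / (24 * 60)
views_per_second = VIEWS_PER_DAY / (24 * 60 * 60)

print(f"{views_per_hour:,.0f} per hour")      # 41,666,667 per hour
print(f"{views_per_minute:,.0f} per minute")  # 694,444 per minute
print(f"{views_per_second:,.0f} per second")  # 11,574 per second
```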

Update: YouTube: The Platform. YouTube adds a new rich set of APIs in order to become your video platform leader--all for free. Upload, edit, watch, search, and comment on video from your own site without visiting YouTube. Compose your site internally from APIs because you'll need to expose them later anyway.

YouTube grew incredibly fast, to over 100 million video views per day, with only a handful of people responsible for scaling the site. How did they manage to deliver all that video to all those users? And how have they evolved since being acquired by Google?

Information Sources

  1. Google Video

Platform

  1. Apache
  2. Python
  3. Linux (SuSE)
  4. MySQL
  5. psyco, a dynamic python->C compiler
  6. lighttpd for video instead of Apache

What's Inside?

The Stats

  1. Supports the delivery of over 100 million videos per day.
  2. Founded 2/2005
  3. 3/2006 30 million video views/day
  4. 7/2006 100 million video views/day
  5. 2 sysadmins, 2 scalability software architects
  6. 2 feature developers, 2 network engineers, 1 DBA

Recipe for handling rapid growth

while (true)
{
    identify_and_fix_bottlenecks();
    drink();
    sleep();
    notice_new_bottleneck();
}


This loop runs many times a day.

Web Servers

  1. NetScaler is used for load balancing and caching static content.
  2. Run Apache with mod_fastcgi.
  3. Requests are routed for handling by a Python application server.
  4. The application server talks to various databases and other information sources to get all the data, then formats the HTML page.
  5. Can usually scale web tier by adding more machines.
  6. The Python web code is usually NOT the bottleneck; it spends most of its time blocked on RPCs.
  7. Python allows rapid flexible development and deployment. This is critical given the competition they face.
  8. Usually less than 100 ms page service times.
  9. Use psyco, a dynamic python->C compiler that uses a JIT compiler approach to optimize inner loops.
  10. For CPU-intensive activities like encryption, they use C extensions.
  11. Some pre-generated cached HTML for expensive-to-render blocks.
  12. Row level caching in the database.
  13. Fully formed Python objects are cached.
  14. Some data are calculated and sent to each application so the values are cached in local memory. This is an underused strategy. The fastest cache is in your application server and it doesn't take much time to send precalculated data to all your servers. Just have an agent that watches for changes, precalculates, and sends.
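
The last point, pushing precalculated data into each application server's local memory, can be sketched roughly as follows. This is a minimal single-process illustration, not YouTube's implementation: `LocalCache`, `PrecalcAgent`, and `top_video_ids` are hypothetical names, and the "servers" are plain objects standing in for remote application servers.

```python
import threading


class LocalCache:
    """In-process cache holding values that were precalculated elsewhere."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {}

    def apply_update(self, values):
        # Called when the agent pushes a fresh batch of precalculated values.
        with self._lock:
            self._values.update(values)

    def get(self, key, default=None):
        # A plain dictionary lookup: no RPC and no database query.
        with self._lock:
            return self._values.get(key, default)


class PrecalcAgent:
    """Hypothetical agent: recomputes values on change and pushes them out."""

    def __init__(self, compute, servers):
        self._compute = compute   # function: source data -> dict of values
        self._servers = servers   # caches to update (stand-ins for app servers)

    def on_change(self, source_data):
        values = self._compute(source_data)
        for cache in self._servers:
            cache.apply_update(values)


def top_video_ids(view_counts, n=2):
    # Example precalculation: the n most viewed video ids.
    ranked = sorted(view_counts, key=view_counts.get, reverse=True)
    return {"top_videos": ranked[:n]}


servers = [LocalCache(), LocalCache()]
agent = PrecalcAgent(top_video_ids, servers)
agent.on_change({"a": 10, "b": 50, "c": 30})
print(servers[0].get("top_videos"))  # ['b', 'c']
```

In a real deployment the push would cross the network, so the agent would also need to handle servers that are down or slow; that part is left out here.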

Video Serving

  • Costs include bandwidth, hardware, and power consumption.
  • Each video is hosted by a mini-cluster, so each video is served by more than one machine.
  • Using a cluster means:
    - More disks serving content which means more speed.
    - Headroom. If a machine goes down others can take over.
    - There are online backups.
  • Servers use the lighttpd web server for video:
    - Apache had too much overhead.
    - Uses epoll to wait on multiple fds.
    - Switched from a single-process to a multi-process configuration to handle more connections.
  • Most popular content is moved to a CDN (content delivery network):
    - CDNs replicate content in multiple places. There's a better chance of content being closer to the user, with fewer hops, and content will run over a more friendly network.
    - CDN machines mostly serve out of memory because the content is so popular there's little thrashing of content into and out of memory.
  • Less popular content (1-20 views per day) uses YouTube servers in various colo sites.
    - There's a long tail effect. Any one video may have only a few plays, but lots of videos are being played, so random disk blocks are being accessed.
    - Caching doesn't do a lot of good in this scenario, so spending money on more cache may not make sense. This is a very interesting point. If you have a long tail product caching won't always be your performance savior.
    - Tune RAID controller and pay attention to other lower level issues to help.
    - Tune memory on each machine so there's not too much and not too little.
  • Serving Video Key Points

    1. Keep it simple and cheap.
    2. Keep a simple network path. Not too many devices between content and users. Routers, switches, and other appliances may not be able to keep up with so much load.
    3. Use commodity hardware. The more expensive the hardware gets, the more expensive everything else gets too (support contracts, for example). You are also less likely to find help on the net.
    4. Use simple common tools. They mostly use tools built into Linux and layer on top of those.
    5. Handle random seeks well (SATA, tweaks).
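
The long tail point above, that caching helps popular content far more than rarely watched content, can be illustrated with a small simulation. This is a sketch under made-up numbers (cache capacity, catalog size, request count), and it uses a simple LRU cache, which the source does not name as the policy actually used.

```python
import random
from collections import OrderedDict


def lru_hit_rate(requests, capacity):
    """Replay requests through an LRU cache and return the fraction of hits."""
    cache = OrderedDict()
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used
    return hits / len(requests)


rng = random.Random(0)
CAPACITY = 1_000    # videos that fit in memory (made-up number)
REQUESTS = 50_000

# Popular content: requests concentrate on a small set of videos.
hot = [rng.randrange(500) for _ in range(REQUESTS)]
# Long tail: requests spread evenly over a catalog far larger than the cache.
tail = [rng.randrange(1_000_000) for _ in range(REQUESTS)]

hot_rate = lru_hit_rate(hot, CAPACITY)
tail_rate = lru_hit_rate(tail, CAPACITY)
print(f"hot set hit rate:   {hot_rate:.1%}")
print(f"long tail hit rate: {tail_rate:.1%}")
```

With the hot set smaller than the cache, nearly every request is a hit. With a catalog a thousand times larger than the cache, almost none are, so adding a bit more memory changes little.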

Serving Thumbnails

  • Surprisingly difficult to do efficiently.
  • There are about 4 thumbnails for each video, so there are a lot more thumbnails than videos.
  • Thumbnails are hosted on just a few machines.
  • Saw problems associated with serving a lot of small objects:
    - Lots of disk seeks and problems with inode caches and page caches at OS level.
    - Ran into a per-directory file limit, with Ext3 in particular. Moved to a more hierarchical structure. Recent improvements in the 2.6 kernel may improve Ext3 large directory handling up to 100 times, yet storing lots of files in a file system is still not a good idea.
    - A high number of requests/sec, as web pages can display 60 thumbnails on a page.
    - Under such high loads Apache performed badly.
    - Used squid (reverse proxy) in front of Apache. This worked for a while, but as load increased performance eventually decreased. Went from 300 requests/second to 20.
    - Tried using lighttpd, but in single-threaded mode it stalled. Ran into problems with multiprocess mode because each process would keep a separate cache.
    - With so many images, setting up a new machine took over 24 hours.
    - After rebooting a machine, it took 6-10 hours for the cache to warm up enough to avoid going to disk.
  • To solve all their problems they started using Google's BigTable, a distributed data store:
    - Avoids small file problem because it clumps files together.
    - Fast, fault tolerant. Assumes it's working on an unreliable network.
    - Lower latency because it uses a distributed multilevel cache. This cache works across different colocation sites.
    - For more information on BigTable take a look at Google Architecture, GoogleTalk Architecture, and BigTable.
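
Before the move to BigTable, the per-directory file limit was worked around with a more hierarchical directory structure. One common way to build such a layout is to derive nested directory names from a hash of the file's id. The sketch below assumes that approach; the source does not describe the actual layout, and `thumbnail_path`, the `/thumbs` root, and the file naming are all hypothetical.

```python
import hashlib
from pathlib import PurePosixPath


def thumbnail_path(video_id, index, root="/thumbs"):
    """Map a thumbnail to a nested directory derived from a hash of its id."""
    digest = hashlib.md5(video_id.encode("utf-8")).hexdigest()
    # Two levels of 256 buckets each: 65,536 leaf directories in total.
    level1, level2 = digest[:2], digest[2:4]
    return PurePosixPath(root) / level1 / level2 / f"{video_id}_{index}.jpg"


path = thumbnail_path("abc123", 0)
print(path)
```

Hashing spreads files evenly across the leaf directories, so no single directory grows past the file system's comfortable size. It does nothing for the seek and inode cache problems listed above.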
Databases

    1. The Early Years
      - Used MySQL to store metadata like users, tags, and descriptions.
      - Served data off a monolithic RAID 10 volume with 10 disks.
      - Living off credit cards so they leased hardware. When they needed more hardware to handle load it took a few days to order and get delivered.
      - They went through a common evolution: a single server, then a single master with multiple read slaves, then a partitioned database, and finally a sharding approach.
      - Suffered from replica lag. The master is multi-threaded and runs on a large machine so it can handle a lot of work. Slaves are single threaded and usually run on lesser machines and replication is asynchronous, so the slaves can lag significantly behind the master.
      - Updates cause cache misses, which go to disk, where slow I/O causes slow replication.
      - Using a replicating architecture, you need to spend a lot of money for incremental bits of write performance.
      - One of their solutions was to prioritize traffic by splitting the data into two clusters: a video watch pool and a general cluster. The idea is that people want to watch video, so that function should get the most resources. The social networking features of YouTube are less important, so they can be routed to a less capable cluster.
    2. The later years:
      - Went to database partitioning.
      - Split into shards with users assigned to different shards.
      - Spreads writes and reads.
      - Much better cache locality which means less IO.
      - Resulted in a 30% hardware reduction.
      - Reduced replica lag to 0.
      - Can now scale database almost arbitrarily.
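
Assigning users to shards can be sketched as a function from a user id to a shard name. This is a minimal illustration that assumes hash-based assignment. The source does not say how YouTube mapped users to shards, and the shard names and `shard_for_user` are hypothetical.

```python
import zlib

# Hypothetical shard names; a real deployment would hold connection details.
SHARDS = ["shard0", "shard1", "shard2", "shard3"]


def shard_for_user(user_id, shards=SHARDS):
    """Pick the shard that owns all rows for this user."""
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    bucket = zlib.crc32(user_id.encode("utf-8")) % len(shards)
    return shards[bucket]


for user in ("alice", "bob", "carol"):
    print(user, "->", shard_for_user(user))
```

Because every read and write for a user goes to one shard, each shard's working set is smaller, which is where the better cache locality comes from. A plain modulo scheme like this one moves most users when the shard count changes, so a lookup table or consistent hashing is often preferred once shards need to be added.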

Data Center Strategy

    1. Used managed hosting providers at first. Living off credit cards, so it was the only way.
    2. Managed hosting can't scale with you. You can't control hardware or make favorable networking agreements.
    3. So they went to a colocation arrangement. Now they can customize everything and negotiate their own contracts.
    4. Use 5 or 6 data centers plus the CDN.
    5. Videos come out of any data center. Not closest match or anything. If a video is popular enough it will move into the CDN.
    6. Video is bandwidth dependent, not really latency dependent, so it can come from any colo.
    7. For images latency matters, especially when you have 60 images on a page.
    8. Images are replicated to different data centers using BigTable. Code looks at different metrics to know who is closest.
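
The routing rules above can be sketched as two small functions. This is an illustration under stated assumptions, not the actual selection code. The cutoff of 20 views per day is borrowed from the "less popular content" range mentioned earlier, measured latency stands in for the unspecified "different metrics", and the function and colo names are hypothetical.

```python
import random

CDN_THRESHOLD = 20  # views/day; assumed cutoff for moving a video to the CDN


def pick_video_source(views_per_day, colos, rng=random):
    """Popular videos come from the CDN; the rest from any colo site."""
    if views_per_day > CDN_THRESHOLD:
        return "cdn"
    return rng.choice(colos)  # video is bandwidth bound, so any colo will do


def pick_image_source(latency_ms_by_colo):
    """Images are latency sensitive, so choose the closest colo."""
    return min(latency_ms_by_colo, key=latency_ms_by_colo.get)


colos = ["colo-a", "colo-b", "colo-c"]
print(pick_video_source(5_000, colos))                    # cdn
print(pick_video_source(3, colos))                        # one of the colos
print(pick_image_source({"colo-a": 40, "colo-b": 12}))    # colo-b
```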

Lessons Learned

    1. Stall for time. Creative and risky tricks can help you cope in the short term while you work out longer term solutions.
    2. Prioritize. Know what's essential to your service and prioritize your resources and efforts around those priorities.
    3. Pick your battles. Don't be afraid to outsource some essential services. YouTube uses a CDN to distribute their most popular content. Creating their own network would have taken too long and cost too much. You may have similar opportunities in your system. Take a look at Software as a Service for more ideas.
    4. Keep it simple! Simplicity allows you to rearchitect more quickly so you can respond to problems. It's true that nobody really knows what simplicity is, but if you aren't afraid to make changes then that's a good sign simplicity is happening.
    5. Shard. Sharding helps to isolate and constrain storage, CPU, memory, and IO. It's not just about getting more write performance.
    6. Constant iteration on bottlenecks:
      - Software: DB, caching
      - OS: disk I/O
      - Hardware: memory, RAID
    7. You succeed as a team. Have a good cross discipline team that understands the whole system and what's underneath the system. People who can set up printers, machines, install networks, and so on. With a good team all things are possible.

     

