the old way

-----------

 

chained queries

 

tightly coupled code -- db abstraction layer is great, but avoid mysql, too

no abstract scaling

 

 

the new way

-----------

data access separated from your data storage

services oriented architecture

 

data is requested from a service

data requests are run in parallel

data requests are asynchronous

data layer is loosely coupled

scalability is abstracted
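
A minimal sketch of the "parallel and asynchronous" idea, written in Python rather than the PHP the talk is about. The service names and delays are made up for illustration, and asyncio.sleep stands in for a real network call to a data service.

```python
import asyncio


async def fetch(service, delay):
    """Stand-in for a request to a data service; the sleep simulates latency."""
    await asyncio.sleep(delay)
    return f"{service} data"


async def load_page():
    # All three requests are in flight at once, so the total wait is roughly
    # the slowest request rather than the sum of all three.
    profile, friends, diggs = await asyncio.gather(
        fetch("profile", 0.02),
        fetch("friends", 0.03),
        fetch("diggs", 0.01),
    )
    return {"profile": profile, "friends": friends, "diggs": diggs}


result = asyncio.run(load_page())
```

The caller only sees "ask a service, get data back"; where the data lives and how it scales stays behind the service.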

 

options

-------

requests over HTTP

NY Times DBSlayer

Danga's Gearman, a queue or weird version of map reduce. difficult to explain, kinda like explaining memcache in 2002.

DIY

 

HTTP w/PHP

----------

1. Group requests for data at the top

2. Open a socket foreach request

    a. sockets must be non-blocking

    b. make sure to set TCP_NODELAY

3. Use __get() to block for results

4. see services digg request
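
The steps above can be sketched roughly as follows, in Python rather than PHP. This is an illustration under assumptions, not the code from the talk: open_request_socket and PendingResults are hypothetical names, __getattr__ plays the role of PHP's __get(), and a thread pool stands in for the socket I/O so the example runs without a server.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def open_request_socket():
    """Create a TCP socket with the options from step 2 (not yet connected)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)  # 2a: non-blocking
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # 2b
    return sock


class PendingResults:
    """Starts every request up front; blocks only when a result is read."""

    def __init__(self, requests):
        # Step 1: all requests are grouped and started together.
        self._pool = ThreadPoolExecutor(max_workers=max(1, len(requests)))
        self._futures = {
            name: self._pool.submit(fn) for name, fn in requests.items()
        }

    def __getattr__(self, name):
        # Step 3: called only for names not found normally, like __get().
        futures = self.__dict__.get("_futures", {})
        if name not in futures:
            raise AttributeError(name)
        return futures[name].result()  # blocks until this request is done

    def close(self):
        self._pool.shutdown()


page = PendingResults({
    "user": lambda: {"name": "alice"},
    "diggs": lambda: [1, 2, 3],
})
```

Reading page.user waits for that one request only; page.diggs may already be finished by the time it is read.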

 

DBSlayer

--------

Small HTTP daemon written in C

Uses JSON for communications

Connection pooling

Load balancing and failover

Tightly coupled to MySQL

Tightly coupled to SQL

no intelligence

 

Gearman

-------

Highly scalable queuing system

Simple/Efficient binary protocol

Jobs can return results (e.g. data)

Sets of jobs are run in parallel

Queue can scale linearly

PHP, Perl, Python, Ruby, C clients

Poorly documented

Not very "robust" -- opportunity for coding

Great for logging and crawling

 

Do It Yourself

--------------

Highly customized solutions (Flickr)

Extremely efficient for custom cases

Customize your protocols

Requires more resources

 

What goes in the services layer?

--------------------------------

smart caching

data mapping and distribution -- 1) db hash file, 2) mysql, 3) finances in Oracle

intelligent grouping of data results

partitioning logic
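
A rough sketch of data mapping plus partitioning living in the services layer, in Python. Only "finances in Oracle" comes from the notes; the other kinds in BACKENDS, the shard count, and the function names are hypothetical, and hashing the key is just one possible partitioning scheme.

```python
import hashlib

# Which backend holds which kind of data. Only the finances entry is from
# the notes; "sessions" and "users" are made-up examples.
BACKENDS = {
    "sessions": "db hash file",
    "users": "mysql",
    "finances": "oracle",
}


def shard_for(key, shard_count):
    """Map a key to a shard number; the same key always gets the same shard."""
    digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def locate(kind, key, shard_count=4):
    """Return (backend, shard) for a piece of data."""
    return BACKENDS[kind], shard_for(key, shard_count)


backend, shard = locate("users", 42)
```

Callers ask the service for "user 42" and never learn which backend or shard answered, so the mapping can change without touching application code.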

 

DO WANT!

--------

Intelligently group data into endpoints

 

    User End Point

    user settings

    user profile data

    10 most recent friends

    10 most recent diggs

 

Version Your endpoints

 

Bundle and group requests
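
A small sketch of a bundled, versioned user endpoint, in Python. The store layout, the registry, and the function names are assumptions for illustration; lists are assumed to be ordered newest first.

```python
def user_endpoint_v1(user_id, store):
    """Bundle everything a user page needs into one response."""
    return {
        "version": 1,
        "settings": store["settings"][user_id],
        "profile": store["profiles"][user_id],
        # Lists are assumed to be ordered newest first.
        "recent_friends": store["friends"][user_id][:10],
        "recent_diggs": store["diggs"][user_id][:10],
    }


# Endpoints are looked up by (name, version), so a v2 can be added
# without breaking callers that still ask for v1.
ENDPOINTS = {("user", 1): user_endpoint_v1}


def call_endpoint(name, version, *args):
    return ENDPOINTS[(name, version)](*args)


store = {
    "settings": {7: {"theme": "dark"}},
    "profiles": {7: {"name": "alice"}},
    "friends": {7: list(range(12))},
    "diggs": {7: list(range(3))},
}
response = call_endpoint("user", 1, 7, store)
```

One call returns the whole bundle; the caller carves out what it needs instead of making four small requests.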

 

EPIC FAIL!

----------

no teeny endpoints -- send lots o'data and carve it

Not running SOA requests in parallel

 

Net_Gearman

 

 

How do you transition over?

---------------------------

one framework

data access layer

abstracted query

migrate data by making the user's data read-only

chain of responsibility pattern -- apc, memcache, http, mysql

dependency injection
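
The chain of responsibility idea can be sketched like this, in Python. The Layer class is hypothetical, in-memory dicts stand in for apc, memcache, http and mysql, and backfilling the faster layers on the way back is an added assumption rather than something stated in the talk. The next layer is passed in through the constructor, which is the dependency injection part.

```python
class Layer:
    """One link in the lookup chain: answer locally or ask the next link."""

    def __init__(self, name, next_layer=None, data=None):
        self.name = name
        self.next_layer = next_layer  # injected, so layers can be swapped
        self.data = dict(data or {})

    def get(self, key):
        """Return (value, name of the layer that had it)."""
        if key in self.data:
            return self.data[key], self.name
        if self.next_layer is None:
            raise KeyError(key)
        value, source = self.next_layer.get(key)
        self.data[key] = value  # assumption: backfill on the way back
        return value, source


# Build the chain from the slowest layer up: apc -> memcache -> http -> mysql.
mysql = Layer("mysql", data={"user:1": {"name": "alice"}})
http = Layer("http", next_layer=mysql)
memcache = Layer("memcache", next_layer=http)
apc = Layer("apc", next_layer=memcache)

first = apc.get("user:1")   # falls all the way through to mysql
second = apc.get("user:1")  # now answered by apc
```

Because each layer only knows about "the next one", a layer can be added, removed, or replaced during the migration without changing the calling code.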

 

What caused digg to go for SOA? Write saturation on the master. Sysadmins were not happy with site performance.

 

10,000 requests per second

 

phpcs for sniffing php for proper documentation

