Keynote - Social Coding Contract

  • startups consume opensource
  • poor contributors
  • opensource - intersection between philosophy and economics
  • awfulness vs progress - both grow as time goes on until there's a breaking point
  • opensource is progress
    • .c and .h files out there randomly
    • makefiles help
    • jar files
    • gemfile (bundler) - dependencies discovered and resolved
    • npm - huge tree of dependencies - windows max file path limit
  • dependencies are hard to update
    • come back a year later and try to update everything
    • downstream changes with loose versioning
    • need -> convenience -> complexity -> risk -> mystery
  • maintainer
    • early adopter - awesome, then disappears
    • late adopter - stable vs stale - ask for unseen features, ask for more
      • better customers than users? paid options / dual license?
    • should be OK to say no (but help them to do it)
    • trolls :-(
    • no maintainer is forever
    • what happens when the maintainer leaves, goes away, etc.?
  • cloud services
    • all are going to shut down at some point
    • most opensource is centralized for convenience
    • what if rubygems/npm disappear? what would a decentralized repo look like?
  • adoption requires trust
    • explicit trust vs implicit trust
    • marketing
  • security
    • open
    • an uncomfortable number of people read the source in order to exploit it
    • as software gets more popular, the importance of an audit goes up, but the motivation to audit goes down because of implicit trust in the popular project
  • communication
    • scale up fidelity - chat, pairing, video, in person
  • systems programmers
    • conservative, cautious
    • embedded and realtime failures have grave consequences
  • using dependencies
    • understanding debt - outsourcing understanding of a problem

@schneems - Richard Schneeman - Text distance - github.com/schneems/going_the_distance bit.ly/going_the_distance

  • Edit distance = cost to change one word into another
  • Hamming distance (signal distance)
    • measures errors in a string, only if same length
    • detect and correct errors in binary data and telecommunications
    • does not include insert/delete, only substitution
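
    not from the talk - a minimal Ruby sketch of Hamming distance (substitutions only, equal-length strings):

      # Hamming distance: number of positions where two equal-length strings differ
      def hamming(a, b)
        raise ArgumentError, "strings must be the same length" unless a.length == b.length
        a.chars.zip(b.chars).count { |x, y| x != y }
      end

      hamming("karolin", "kathrin") # => 3
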
  • Levenshtein distance - http://en.wikipedia.org/wiki/Levenshtein_distance
    • calculate deletion/insertion
    • calculate distance between every substring
    • take minimum
    • naive recursive version is expensive - O(n^n), ~1647 iterations for this example
    • has a lot of repeats
    • calculate with a matrix (sketch below)
      • rough example: rows (1->7) build up "" => "saturday", columns (1->6) build up "" => "sunday"
      • each cell is the distance between two prefixes: "s" -> "a" = 1, "su" -> "s" = (0)+1, "sa" -> "su" = (0)+1, "sat" -> "sun" = (1)+1 = 2
      • final cost is last entry in the matrix
      • 48 iterations (m*n)
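
    a minimal Ruby sketch of the matrix approach (mine, not the speaker's code):

      # Levenshtein distance via an (m+1) x (n+1) matrix: cell [i][j] is the
      # distance between the first i chars of a and the first j chars of b.
      def levenshtein(a, b)
        m, n = a.length, b.length
        d = Array.new(m + 1) { |i| Array.new(n + 1) { |j| i.zero? ? j : (j.zero? ? i : 0) } }
        (1..m).each do |i|
          (1..n).each do |j|
            cost = a[i - 1] == b[j - 1] ? 0 : 1
            d[i][j] = [d[i - 1][j] + 1,        # deletion
                       d[i][j - 1] + 1,        # insertion
                       d[i - 1][j - 1] + cost  # substitution
                      ].min
          end
        end
        d[m][n] # final cost is the last entry in the matrix
      end

      levenshtein("saturday", "sunday") # => 3
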
  • Peter Norvig has a write-up on how Google did spelling correction
    • word counts - higher count higher probability
    • edit distance between words, weighted with probability
    • cache found corrections
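
    a rough Ruby sketch of the idea, reusing the levenshtein method above (word list and counts are made up, not from the talk):

      # hypothetical word counts from a corpus
      WORD_COUNTS = { "the" => 80_000, "they" => 9_000, "then" => 8_000, "than" => 5_000 }

      # suggest the candidate with the lowest edit distance, preferring the
      # more frequent (more probable) word on ties, and cache the result
      def correct(word, counts = WORD_COUNTS)
        @corrections ||= {}
        @corrections[word] ||= counts.keys.min_by { |w| [levenshtein(word, w), -counts[w]] }
      end

      correct("thene") # => "then"
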
  • did_you_mean gem
    • in better_errors - helps find NoMethodErrors
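
    for example (did_you_mean is bundled with Ruby 2.3+; exact message wording varies by version):

      "hello".uppcase
      # NoMethodError: undefined method `uppcase' for "hello":String
      # Did you mean?  upcase
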
  • more algorithms
    • Jaro-Winkler distance - for large numbers of distance calculations
    • Rosetta Code has reference implementations

Benchmarking Ruby - github.com/davy/benchmarking_ruby - @davystevenson

  • why benchmark
    • certainty in performance
    • benchmark in gems, ruby itself
  • require 'benchmark'
    • Benchmark.bm { |x| x.report { n.times { stuff } } } (fuller example below)
    • Pros: easy, in stdlib
    • Cons: variable fiddling, difficult output, boilerplate code
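
    a fuller stdlib example (mine, not from the talk):

      require 'benchmark'

      n = 100_000
      Benchmark.bm(10) do |x|
        x.report("String#+")  { n.times { "a" + "b" } }
        x.report("String#<<") { n.times { "a" << "b" } }
      end
      # prints user/system/total/real columns per report
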
  • benchmark/ips - iterations per second
    • like benchmark
    • how many iterations in 100ms
    • run for 5 seconds
    • comparison
    • Pros: a bit less fiddly, measures iterations per second, same syntax, compare
    • Cons: gem, snapshot view into performance
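
    same example with benchmark-ips and compare (mine, not from the talk):

      require 'benchmark/ips'

      Benchmark.ips do |x|
        x.report("String#+")  { "a" + "b" }
        x.report("String#<<") { "a" << "b" }
        x.compare! # prints which report is faster and by how much
      end
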
  • benchmark/bigo - determine big-o of operation
    • generator, generate array etc
    • size changing over time
    • print/generate charts
    • Pros: range of inputs, charts
    • Cons: gem, longer runtime, not always applicable
    • compare - compares results to known big-O reference lines
    • can constrain parameters of test set
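
    the gem automates this and renders charts; as a rough illustration of the underlying idea (not the benchmark-bigo API), a hand-rolled loop over input sizes with benchmark-ips:

      require 'benchmark/ips'

      # time Array#include? (worst case: element absent) at several sizes
      [1_000, 10_000, 100_000].each do |size|
        array = (0...size).to_a.shuffle
        Benchmark.ips do |x|
          x.report("include? n=#{size}") { array.include?(-1) }
        end
      end
      # iterations/sec should drop roughly 10x per 10x size increase (linear)
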
  • how to benchmark effectively
    • is environment consistent?
    • verify/test actual results
    • don't change behavior over runs
    • only modifying one thing?
    • accidentally mutating objects?
    • set controls (understand how the setup code around the benchmark affects results)
    • using random effectively? (will randomness change behavior between runs?)
    • what are you benchmarking - best case, worst case, or average case?
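
    one concrete pitfall as a sketch (my example, not from the talk) - a destructive method mutates the shared object between iterations:

      require 'benchmark/ips'

      array = (1..10_000).to_a.shuffle

      Benchmark.ips do |x|
        # bug: sort! mutates `array`, so every iteration after the first
        # sorts an already-sorted array and measures different work
        x.report("sort!")     { array.sort! }

        # better: sort a fresh copy each time (and set a control for the
        # dup cost, which is now part of the measurement)
        x.report("dup.sort!") { array.dup.sort! }
      end
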
  • Terraformer
    • converts types
    • geography related tasks
    • convex hull algorithm
      • Jarvis march (simple code) - cost depends on input, worse if points are on the hull edge
      • monotone chain (better, doesn't change based on the types of objects)
      • test average and worst cases to pick (sketch below)
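
      a hypothetical sketch of testing both cases (jarvis_march and monotone_chain are placeholder names, not Terraformer's real methods):

        require 'benchmark/ips'

        average_case = Array.new(5_000) { [rand, rand] } # random cloud: few hull points
        worst_case   = Array.new(5_000) { |i| a = 2 * Math::PI * i / 5_000.0; [Math.cos(a), Math.sin(a)] } # every point on the hull

        { "average" => average_case, "worst" => worst_case }.each do |label, points|
          Benchmark.ips do |x|
            x.report("jarvis march (#{label})")   { jarvis_march(points) }   # placeholder
            x.report("monotone chain (#{label})") { monotone_chain(points) } # placeholder
            x.compare!
          end
        end
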
  • Conclusion
    • Verify assumptions, control changes
    • Learn more about code, ruby
    • When Benchmark? just use benchmark.ips
    • When Benchmark.ips? all the time? good for spot checks and expansive analysis
    • When Benchmark.bigo? range of sizes, results in chart form, ok if it takes a while
  • More
    • Writing Fast Ruby - Erik Michaels-Ober
    • fast-ruby repo
    • derailed_benchmarks - for Rails, RAM required over time

Ruby performance secrets - Alexander Dymo

  • ruby-performance-book.com
  • how to understand what's wrong, how to find your own perf tips & best practices
  • why is Ruby slow sometimes? garbage collection
  • how to check object creation / allocation
    • allocation profiler - needs patched ruby (railsexpress) - no patch for 2.1.4 right now
    • gem install ruby-prof
    • kcachegrind/qcachegrind
    • ruby-prof -p call-tree --mode=allocations > out
    • look at source code
    • gdb - r -e 'ruby code'
    • memoization and block conversion (&block) are gotchas - they create objects
  • func! methods modify in place (except when they don't), should be faster (except when they aren't)
    • ruby-prof -p call-tree --mode=memory > out
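
    for example (mine, not the speaker's) - a bang method can return nil instead of the receiver when nothing changed:

      s = "hello"
      s.sub!("x", "y")   # => nil (no match, nothing modified in place)
      s.upcase!          # => "HELLO" (modified in place)
      s.upcase!          # => nil (already upcased - easy to break method chains)
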
  • challenge all tips/tricks/practices
  • don't guess, profile
  • don't need a patched interpreter for CPU profiling with ruby-prof, only for memory/allocations
  • ObjectSpace - a profiler is better for the whole view (counting sketch below)
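
    a quick allocation count with plain ObjectSpace (my sketch; no patched interpreter needed):

      # count string allocations across a block; GC is disabled so freed
      # objects don't hide allocations
      def count_strings
        GC.disable
        before = ObjectSpace.count_objects[:T_STRING]
        yield
        ObjectSpace.count_objects[:T_STRING] - before
      ensure
        GC.enable
      end

      count_strings { 1_000.times { "a" + "b" } }
      # => a few thousand (each iteration allocates the literals plus the result)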

Real World Ruby Performance at Scale - Aaron Quint - @aq - Paperless Post - github/quirkey / paperlesspost

  • tips and tricks are the CliffsNotes of tech learning
  • step 1: acceptance - it's your fault
    • "X doesn't scale" is BS: scale for what? to what degree? with what hardware? etc.
  • step 2: diagnosis
    • metrics - measurements - mmmnumbers - milliseconds
    • playing golf (lowest num lines/code)
    • vertical = a request descending through the stack, horizontal = the tiers
    • vertical - single action or code path
    • horizontal - more nodes, faster hardware
  • step 3: treatment
  • features vs speed - opposing forces
    • fast means being stable too
  • example - json for days - paper browser
    • json -> javascript
    • cache invalidated
    • shared cached sections
    • uncached perf still a problem
    • ppprofiler
      • benchmark
      • rblineprof
      • notification counts
      • memoryprofiler
      • markdown output
  • profiling numbers are relative
  • stackprof and stackprof-remote
    • read between the lines, other stuff going on
    • samples whats on stack at a given point
    • doesn't allocate heap memory, doesn't greatly affect perf
    • flamegraphs - tall and wide are places to optimize
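
    basic stackprof usage looks roughly like this (my sketch; stackprof-remote builds on this for running processes):

      require 'stackprof'

      # sampling profiler: records what's on the stack at intervals rather
      # than tracing every call, so overhead stays low
      StackProf.run(mode: :wall, out: 'stackprof-wall.dump', raw: true) do
        10_000.times { { a: 1, b: 2 }.map { |k, v| "#{k}=#{v}" }.join("&") }
      end

      # then inspect the dump from the shell, e.g.:
      #   stackprof stackprof-wall.dump --text
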
  • sometimes it is your fault, for being cheap, sometimes you can throw money at the problem
  • start with a hitlist
    • start at the top, work way down (most used -> least used)
    • number of req * 90% response time = “total worst time”
    • like newrelic already has
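
    the scoring is simple to sketch (endpoint names and numbers below are made up):

      # requests and 90th-percentile response time (ms) per endpoint
      endpoints = {
        "POST /cards"    => { requests: 120_000, p90_ms: 310 },
        "GET /cards/:id" => { requests: 900_000, p90_ms: 45  },
        "GET /feed"      => { requests: 250_000, p90_ms: 180 },
      }

      # "total worst time" = number of requests * 90% response time;
      # work the hitlist from the biggest number down
      hitlist = endpoints.sort_by { |_, s| -(s[:requests] * s[:p90_ms]) }
      hitlist.each { |path, s| puts "#{path}\t#{s[:requests] * s[:p90_ms]}" }
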
  • big wins are not the point
    • small changes over time
    • keep doing it
  • if you’re not failing, you’re not being honest
  • don't just make tools, learn to use them