RubyConf 2014 - Day 3
Keynote - Social Coding Contract
- startups consume opensource
- poor contributors
- opensource - intersection between philosophy and economics
- awfulness vs progress - more as time goes on until there's a breaking point
- opensource is progress
- .c and .h files out there randomly
- makefiles help
- jar files
- gemfile (bundler) - dependencies discovered and resolved
- npm - huge tree of dependencies - windows max file path limit
- dependencies are hard to update
- come back a year and try to update everything
- downstream changes with loose versioning
- need -> convenience -> complexity -> risk -> mystery
- maintainer
- early adopter - awesome, then disappears
- late adopter - stable vs stale - ask for unseen features, ask for more
- better customers than users? paid options/dual license?
- should be ok to say no, (but help you to do it)
- trolls :-(
- no maintainer is forever
- what happens when the maintainer leaves, goes away, etc.?
- cloud services
- all are going to shut down at some point
- most opensource is centralized for convenience
- what if rubygems/npm disappear? what would decentralized repo look like?
- adoption requires trust
- explicit trust vs implicit trust
- marketing
- security
- open
- uncomfortable number of people read source to exploit it
- as software gets more popular, importance of audit goes up, but motivation to audit goes down because implicit trust of popular project
- communication
- scale up fidelity - chat, pairing, video, in person
- systems programmers
- conservative, cautious
- embedded and realtime failures have grave consequences
- using dependencies
- understanding debt - outsourcing understanding of a problem
Text Distance - Richard Schneeman (@schneems) - github.com/schneems/going_the_distance - bit.ly/going_the_distance
- Edit distance = cost to change one word into another
- Hamming distance (signal distance)
- measures errors in a string, only if same length
- detect and correct errors in binary and telecommunications
- does not include insert/delete, only substitution
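- a quick Ruby sketch of Hamming distance (my own illustration, not from the talk):

# Hamming distance: count positions where two equal-length strings differ
def hamming(a, b)
  raise ArgumentError, "strings must be the same length" unless a.length == b.length
  a.chars.zip(b.chars).count { |x, y| x != y }
end

hamming("karolin", "kathrin") # => 3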
- Levenshtein distance - http://en.wikipedia.org/wiki/Levenshtein_distance
- calculate deletion/insertion
- calculate distance between every substring
- take minimum
- naive recursive approach - O(n^n), ~1647 iterations for this example?
- has a lot of repeats
- calculate with a matrix
- sorta example
"" => "saturday" = row 1->7 "" => "sunday" = col 1->6 s -> a = 1 su -> s = (0)+1 sa -> su = (0)+1 sat -> sun = (1)+1 = 2
- final cost is last entry in the matrix
- 48 iterations = m*n
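- a rough Ruby sketch of the matrix (dynamic-programming) approach - my illustration, not the code from the repo:

# Levenshtein distance via an (m+1) x (n+1) matrix of substring costs
def levenshtein(a, b)
  m, n = a.length, b.length
  d = Array.new(m + 1) { |i| Array.new(n + 1) { |j| i.zero? ? j : (j.zero? ? i : 0) } }
  (1..m).each do |i|
    (1..n).each do |j|
      cost = a[i - 1] == b[j - 1] ? 0 : 1
      d[i][j] = [d[i - 1][j] + 1,        # deletion
                 d[i][j - 1] + 1,        # insertion
                 d[i - 1][j - 1] + cost  # substitution
                ].min
    end
  end
  d[m][n] # final cost is the last entry in the matrix
end

levenshtein("saturday", "sunday") # => 3, in m*n = 48 iterations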
- Peter Norvig has paper on how Google did spelling
- word counts - higher count higher probability
- edit distance between words, weighted with probability
- cache found corrections
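- very rough sketch of the Norvig-style idea (word counts + edit distance) - assumes the levenshtein helper above and a hypothetical WORD_COUNTS hash, not Google's or the gem's actual code:

# Pick the known word closest to the typo, breaking ties by how common it is
WORD_COUNTS = Hash.new(0)  # e.g. built by counting words in a large text corpus

def correct(word)
  best = WORD_COUNTS.keys.min_by { |w| [levenshtein(word, w), -WORD_COUNTS[w]] }
  best || word
end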
- did_you_mean gem
- in better_errors - helps find NoMethodError typos
- more algorithms
- Jaro Winkler distance - for large number of distance calculations
- rosetta code
Benchmarking Ruby - github.com/davy/benchmarking_ruby - @davystevenson
- why benchmark
- certainty in performance
- benchmark in gems, ruby itself
-
require 'benchmark'
Benchmark.bm do |x|
  x.report { n.times { stuff } }
end
- Pros: easy, in stdlib
- Cons: variable fiddling, difficult output, boilerplate code
-
benchmark/ips
- iterations per second- like benchmark
- how many iterations in 100ms
- run for 5 seconds
- comparison
- Pros: a bit less fiddly, measures iterations per second, same syntax, compare
- Cons: gem, snapshot view into performance
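- a minimal benchmark-ips sketch (my example, not from the slides):

require 'benchmark/ips'

Benchmark.ips do |x|
  x.report("String#+")  { "a" + "b" }
  x.report("String#<<") { "a" << "b" }
  x.compare!  # prints which report was fastest and by how much
end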
-
benchmark/bigo
- determine big-O of an operation - uses a generator (e.g. generate an array of a given size)
- size changing over time
- print/generate charts
- Pros: range of inputs, charts
- Cons: gem, longer runtime, not always applicable
- compare, compares to known big-o notation lines
- can constrain parameters of test set
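- I didn't note benchmark-bigo's exact API; the idea, sketched with plain benchmark-ips across a range of input sizes:

require 'benchmark/ips'
require 'set'

# Run the same operation against growing inputs to see how cost scales with size
[100, 1_000, 10_000].each do |size|
  array = (1..size).to_a.shuffle
  set   = array.to_set
  Benchmark.ips do |x|
    x.report("Array#include? n=#{size}") { array.include?(0) }  # O(n)
    x.report("Set#include?   n=#{size}") { set.include?(0) }    # ~O(1)
    x.compare!
  end
end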
- how to benchmark effectively
- is environment consistent?
- verify/test actual results
- don't change behavior over runs
- only modifying one thing?
- accidentally mutating objects? (see sketch after this list)
- set controls (understand how other code to setup benchmark affects)
- using random effectively? (is random going to change behavior)
- what are you benchmarking - best case, worst case, or average case?
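- example of the "accidentally mutating objects" pitfall (my own illustration): the array shrinks across iterations, so later iterations measure something different:

require 'benchmark/ips'

list = (1..10_000).to_a

Benchmark.ips do |x|
  # BAD: Array#shift mutates list, so later iterations see a smaller (eventually empty) array
  x.report("mutating")     { list.shift }
  # BETTER: work on a copy so every iteration measures the same thing
  x.report("non-mutating") { list.dup.shift }
end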
- Terraformer
- converts types
- geography related tasks
- convex hull algorithm
- Jarvis march (simple code) - cost depends on input size; worse if points are on the edge
- monotone chain (better; doesn't change based on the types of objects)
- test average and worst cases to pick
- Conclusion
- Verify assumptions, control changes
- Learn more about code, ruby
- When Benchmark? just use benchmark.ips
- When Benchmark.ips? all the time? good for spot checks and expansive analysis
- When Benchmark.bigo? range of sizes, results in chart form, ok if it takes a while
- More
- Writing Fast Ruby - Erik Michaels-Ober
- Fast ruby repo
- Derailed Benchmarks - for Rails; RAM required over time
Ruby performance secrets - Alexander Dymo
- ruby-performance-book.com
- how to understand what's wrong, how to find your own perf tips & best practices
- why is it slow sometimes? garbage collection
- how to check object creation / allocation
- profiler - needs patched ruby (railsexpress) - 2.1.4 no patch right now
- gem install ruby-prof
- kcachegrind/qcachegrind
ruby-prof -p call-tree --mode=allocations > out
- look at source code
- gdb -
r -e 'ruby code'
- memoization and block conversion (&block) are costly - they create objects
-
- bang methods (func!) - modify in place (except when they don't), should be faster (except when it isn't)
ruby-prof -p call-tree --mode=memory > out
- challenge all tips/tricks/practices
- dont guess, profile
- do not need a patched interpreter for CPU ruby-prof, only for memory/alloc modes
- ObjectSpace - a profiler is better for the whole view (see the sketch below)
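- rough sketch of doing the same allocation profiling from code with ruby-prof (API from memory; per the talk, allocation/memory modes need the patched railsexpress ruby):

require 'ruby-prof'

RubyProf.measure_mode = RubyProf::ALLOCATIONS  # or MEMORY, WALL_TIME, ...

result = RubyProf.profile do
  1_000.times { "foo" + "bar" }   # code under investigation
end

# Call-tree output can be loaded into kcachegrind/qcachegrind
RubyProf::CallTreePrinter.new(result).print(STDOUT)

# Quick, unpatched alternative for a rough view of live object counts:
GC.start
p ObjectSpace.count_objects[:T_STRING]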
Real World Ruby Performance at Scale - Aaron Quint (@aq) - Paperless Post - github.com/quirkey, github.com/paperlesspost
- tips and tricks are the Cliffs Notes of tech learning
- step 1: acceptance - its your fault
- "x doesn't scale" is BS: scale for what? to what degree? with what hardware? etc.
- step 2: diagnosis
- metrics - measurements - mmm, numbers - milliseconds
- playing golf (lowest num lines/code)
- vertical = a request descending through the stack; horizontal = tiers
- vertical - single action or code path
- horizontal - more nodes, faster hardware
- step 3: treatment
- features vs speed - opposing forces
- fast means being stable too
- example - json for days - paper browser
- json -> javascript
- cache invalidated
- shared cached sections
- uncached perf still a problem
- ppprofiler
- benchmark
- rblineprof
- notification counts
- memoryprofiler
- markdown output
- profiling numbers are relative
- stackprof and stackprof-remote
- read between the lines, other stuff going on
- samples whats on stack at a given point
- doesn't allocate heap memory, doesn't greatly affect perf
- flamegraphs - tall and wide are places to optimize
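- stackprof usage, roughly (sketch from memory; check the gem README):

require 'stackprof'

# Sampling profiler: records what's on the stack every `interval` microseconds.
# raw: true keeps the full sample stream so flamegraphs can be built later.
StackProf.run(mode: :wall, raw: true, out: 'tmp/stackprof.dump', interval: 1000) do
  expensive_work   # hypothetical method standing in for the code path to sample
end

# Then inspect with the CLI:
#   stackprof tmp/stackprof.dump --text
#   stackprof tmp/stackprof.dump --method 'SomeClass#slow_method'
#   stackprof tmp/stackprof.dump --flamegraph > flamegraph.js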
- sometimes it is your fault, for being cheap, sometimes you can throw money at the problem
- start with a hitlist
- start at the top, work way down (most used -> least used)
- number of req * 90% response time = "total worst time" (sketched below)
- like New Relic already has
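- the hitlist math with made-up numbers:

# "total worst time" = number of requests * 90th-percentile response time
endpoints = {
  "GET /feed"   => { requests: 500_000, p90_ms: 80  },
  "POST /cards" => { requests: 20_000,  p90_ms: 900 },
}

hitlist = endpoints.sort_by { |_, e| -(e[:requests] * e[:p90_ms]) }
hitlist.each { |name, e| puts "#{name}: #{e[:requests] * e[:p90_ms]} ms total worst time" }
# GET /feed comes first (40,000,000 ms), so start there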
- big wins are not the point
- small changes over time
- keep doing it
- if you’re not failing, you’re not being honest
- don't just make tools, learn to use them