Keynote - Social Coding Contract

  • startups consume opensource
  • poor contributors
  • opensource - intersection between philosophy and economics
  • awfulness vs progress - both grow as time goes on until there's a breaking point
  • opensource is progress
    • .c and .h files out there randomly
    • makefiles help
    • jar files
    • gemfile (bundler) - dependencies discovered and resolved
    • npm - huge tree of dependencies - windows max file path limit
  • dependencies are hard to update
    • come back a year later and try to update everything
    • downstream changes with loose versioning
    • need -> convenience -> complexity -> risk -> mystery
  • maintainer
    • early adopter - awesome, then disappears
    • late adopter - stable vs stale - ask for unseen features, ask for more
      • better customers than users? paid options / dual license?
    • should be OK to say no (but help them to do it)
    • trolls :-(
    • no maintainer is forever
    • what happens when the maintainer leaves, goes away, etc.?
  • cloud services
    • all are going to shut down at some point
    • most opensource is centralized for convenience
    • what if rubygems/npm disappear? what would a decentralized repo look like?
  • adoption requires trust
    • explicit trust vs implicit trust
    • marketing
  • security
    • open
    • an uncomfortable number of people read the source in order to exploit it
    • as software gets more popular, the importance of an audit goes up, but the motivation to audit goes down because of implicit trust in the popular project
  • communication
    • scale up fidelity - chat, pairing, video, in person
  • systems programmers
    • conservative, cautious
    • embedded and realtime failures have grave consequences
  • using dependencies
    • understanding debt - outsourcing understanding of a problem

@schneems - Richard Schneeman - Text distance - github.com/schneems/going_the_distance bit.ly/going_the_distance

  • Edit distance = cost to change one word into another
  • Hamming distance (signal distance)
    • measures errors in a string, only if same length
    • detect and correct errors in binary data and telecommunications
    • does not include insert/delete, only substitution
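
    not from the talk - a minimal Ruby sketch of Hamming distance (substitutions only, equal-length strings):

      # Hamming distance: number of positions where two equal-length strings differ
      def hamming(a, b)
        raise ArgumentError, "strings must be the same length" unless a.length == b.length
        a.chars.zip(b.chars).count { |x, y| x != y }
      end

      hamming("karolin", "kathrin") # => 3
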
  • Levenshtein distance - http://en.wikipedia.org/wiki/Levenshtein_distance
    • calculate deletion/insertion
    • calculate distance between every substring
    • take minimum
    • naive recursive version is expensive - O(n^n), ~1647 iterations for this example
    • has a lot of repeats
    • calculate with a matrix (sketch below)
      • rough example: rows (1->7) build up "" => "saturday", columns (1->6) build up "" => "sunday"
      • each cell is the distance between two prefixes: "s" -> "a" = 1, "su" -> "s" = (0)+1, "sa" -> "su" = (0)+1, "sat" -> "sun" = (1)+1 = 2
      • final cost is last entry in the matrix
      • 48 iterations (m*n)
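
    a minimal Ruby sketch of the matrix approach (mine, not the speaker's code):

      # Levenshtein distance via an (m+1) x (n+1) matrix: cell [i][j] is the
      # distance between the first i chars of a and the first j chars of b.
      def levenshtein(a, b)
        m, n = a.length, b.length
        d = Array.new(m + 1) { |i| Array.new(n + 1) { |j| i.zero? ? j : (j.zero? ? i : 0) } }
        (1..m).each do |i|
          (1..n).each do |j|
            cost = a[i - 1] == b[j - 1] ? 0 : 1
            d[i][j] = [d[i - 1][j] + 1,        # deletion
                       d[i][j - 1] + 1,        # insertion
                       d[i - 1][j - 1] + cost  # substitution
                      ].min
          end
        end
        d[m][n] # final cost is the last entry in the matrix
      end

      levenshtein("saturday", "sunday") # => 3
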
  • Peter Norvig has a write-up on how Google did spelling correction
    • word counts - higher count higher probability
    • edit distance between words, weighted with probability
    • cache found corrections
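
    a rough Ruby sketch of the idea, reusing the levenshtein method above (word list and counts are made up, not from the talk):

      # hypothetical word counts from a corpus
      WORD_COUNTS = { "the" => 80_000, "they" => 9_000, "then" => 8_000, "than" => 5_000 }

      # suggest the candidate with the lowest edit distance, preferring the
      # more frequent (more probable) word on ties, and cache the result
      def correct(word, counts = WORD_COUNTS)
        @corrections ||= {}
        @corrections[word] ||= counts.keys.min_by { |w| [levenshtein(word, w), -counts[w]] }
      end

      correct("thene") # => "then"
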
  • did_you_mean gem
    • in better_errors - helps find NoMethodErrors
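
    for example (did_you_mean is bundled with Ruby 2.3+; exact message wording varies by version):

      "hello".uppcase
      # NoMethodError: undefined method `uppcase' for "hello":String
      # Did you mean?  upcase
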
  • more algorithms
    • Jaro-Winkler distance - for large numbers of distance calculations
    • Rosetta Code has reference implementations

Benchmarking Ruby - github.com/davy/benchmarking_ruby - @davystevenson

  • why benchmark
    • certainty in performance
    • benchmark in gems, ruby itself
  • require 'benchmark'
    • Benchmark.bm { |x| x.report { n.times { stuff } } } (fuller example below)
    • Pros: easy, in stdlib
    • Cons: variable fiddling, difficult output, boilerplate code
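
    a fuller stdlib example (mine, not from the talk):

      require 'benchmark'

      n = 100_000
      Benchmark.bm(10) do |x|
        x.report("String#+")  { n.times { "a" + "b" } }
        x.report("String#<<") { n.times { "a" << "b" } }
      end
      # prints user/system/total/real columns per report
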
  • benchmark/ips - iterations per second
    • like benchmark
    • how many iterations in 100ms
    • run for 5 seconds
    • comparison
    • Pros: a bit less fiddly, measures iterations per second, same syntax, compare
    • Cons: gem, snapshot view into performance
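
    same example with benchmark-ips and compare (mine, not from the talk):

      require 'benchmark/ips'

      Benchmark.ips do |x|
        x.report("String#+")  { "a" + "b" }
        x.report("String#<<") { "a" << "b" }
        x.compare! # prints which report is faster and by how much
      end
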
  • benchmark/bigo - determine big-o of operation
    • generator, generate array etc
    • size changing over time
    • print/generate charts
    • Pros: range of inputs, charts
    • Cons: gem, longer runtime, not always applicable
    • compare - compares results to known big-O reference lines
    • can constrain parameters of test set
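
    the gem automates this and renders charts; as a rough illustration of the underlying idea (not the benchmark-bigo API), a hand-rolled loop over input sizes with benchmark-ips:

      require 'benchmark/ips'

      # time Array#include? (worst case: element absent) at several sizes
      [1_000, 10_000, 100_000].each do |size|
        array = (0...size).to_a.shuffle
        Benchmark.ips do |x|
          x.report("include? n=#{size}") { array.include?(-1) }
        end
      end
      # iterations/sec should drop roughly 10x per 10x size increase (linear)
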
  • how to benchmark effectively
    • is environment consistent?
    • verify/test actual results
    • don't change behavior over runs
    • only modifying one thing?
    • accidentally mutating objects?
    • set controls (understand how the setup code around the benchmark affects results)
    • using random effectively? (will randomness change behavior between runs?)
    • what are you benchmarking - best case, worst case, or average case?
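
    one concrete pitfall as a sketch (my example, not from the talk) - a destructive method mutates the shared object between iterations:

      require 'benchmark/ips'

      array = (1..10_000).to_a.shuffle

      Benchmark.ips do |x|
        # bug: sort! mutates `array`, so every iteration after the first
        # sorts an already-sorted array and measures different work
        x.report("sort!")     { array.sort! }

        # better: sort a fresh copy each time (and set a control for the
        # dup cost, which is now part of the measurement)
        x.report("dup.sort!") { array.dup.sort! }
      end
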
  • Terraformer
    • converts types
    • geography related tasks
    • convex hull algorithm
      • Jarvis march (simple code) - cost depends on input, worse if points are on the hull edge
      • monotone chain (better, doesn't change based on the types of objects)
      • test average and worst cases to pick (sketch below)
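
      a hypothetical sketch of testing both cases (jarvis_march and monotone_chain are placeholder names, not Terraformer's real methods):

        require 'benchmark/ips'

        average_case = Array.new(5_000) { [rand, rand] } # random cloud: few hull points
        worst_case   = Array.new(5_000) { |i| a = 2 * Math::PI * i / 5_000.0; [Math.cos(a), Math.sin(a)] } # every point on the hull

        { "average" => average_case, "worst" => worst_case }.each do |label, points|
          Benchmark.ips do |x|
            x.report("jarvis march (#{label})")   { jarvis_march(points) }   # placeholder
            x.report("monotone chain (#{label})") { monotone_chain(points) } # placeholder
            x.compare!
          end
        end
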
  • Conclusion
    • Verify assumptions, control changes
    • Learn more about code, ruby
    • When Benchmark? just use benchmark.ips
    • When Benchmark.ips? all the time? good for spot checks and expansive analysis
    • When Benchmark.bigo? range of sizes, results in chart form, ok if it takes a while
  • More
    • Writing Fast Ruby - Erik Michaels-Ober
    • fast-ruby repo
    • derailed_benchmarks - for Rails, RAM required over time

Ruby performance secrets - Alexander Dymo

  • ruby-performance-book.com
  • how to understand what's wrong, how to find your own perf tips & best practices
  • why is Ruby slow sometimes? garbage collection
  • how to check object creation / allocation
    • allocation profiler - needs patched ruby (railsexpress) - no patch for 2.1.4 right now
    • gem install ruby-prof
    • kcachegrind/qcachegrind
    • ruby-prof -p call-tree --mode=allocations > out
    • look at source code
    • gdb - r -e 'ruby code'
    • memoization and block conversion (&block) are gotchas - they create objects
  • func! methods modify in place (except when they don't), should be faster (except when they aren't)
    • ruby-prof -p call-tree --mode=memory > out
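
    for example (mine, not the speaker's) - a bang method can return nil instead of the receiver when nothing changed:

      s = "hello"
      s.sub!("x", "y")   # => nil (no match, nothing modified in place)
      s.upcase!          # => "HELLO" (modified in place)
      s.upcase!          # => nil (already upcased - easy to break method chains)
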
  • challenge all tips/tricks/practices
  • don't guess, profile
  • don't need a patched interpreter for CPU profiling with ruby-prof, only for memory/allocations
  • ObjectSpace - a profiler is better for the whole view (counting sketch below)
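
    a quick allocation count with plain ObjectSpace (my sketch; no patched interpreter needed):

      # count string allocations across a block; GC is disabled so freed
      # objects don't hide allocations
      def count_strings
        GC.disable
        before = ObjectSpace.count_objects[:T_STRING]
        yield
        ObjectSpace.count_objects[:T_STRING] - before
      ensure
        GC.enable
      end

      count_strings { 1_000.times { "a" + "b" } }
      # => a few thousand (each iteration allocates the literals plus the result)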

Real World Ruby Performance at Scale - Aaron Quint - @aq - Paperless Post - github/quirkey / paperlesspost

  • tips and tricks are the CliffsNotes of tech learning
  • step 1: acceptance - it's your fault
    • "X doesn't scale" is BS: scale for what? to what degree? with what hardware? etc.
  • step 2: diagnosis
    • metrics - measurements - mmmnumbers - milliseconds
    • playing golf (lowest num lines/code)
    • vertical = a request descending through the stack, horizontal = the tiers
    • vertical - single action or code path
    • horizontal - more nodes, faster hardware
  • step 3: treatment
  • features vs speed - opposing forces
    • fast means being stable too
  • example - json for days - paper browser
    • json -> javascript
    • cache invalidated
    • shared cached sections
    • uncached perf still a problem
    • ppprofiler
      • benchmark
      • rblineprof
      • notification counts
      • memoryprofiler
      • markdown output
  • profiling numbers are relative
  • stackprof and stackprof-remote
    • read between the lines, other stuff going on
    • samples whats on stack at a given point
    • doesn't allocate heap memory, doesn't greatly affect perf
    • flamegraphs - tall and wide are places to optimize
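
    basic stackprof usage looks roughly like this (my sketch; stackprof-remote builds on this for running processes):

      require 'stackprof'

      # sampling profiler: records what's on the stack at intervals rather
      # than tracing every call, so overhead stays low
      StackProf.run(mode: :wall, out: 'stackprof-wall.dump', raw: true) do
        10_000.times { { a: 1, b: 2 }.map { |k, v| "#{k}=#{v}" }.join("&") }
      end

      # then inspect the dump from the shell, e.g.:
      #   stackprof stackprof-wall.dump --text
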
  • sometimes it is your fault, for being cheap, sometimes you can throw money at the problem
  • start with a hitlist
    • start at the top, work way down (most used -> least used)
    • number of req * 90% response time = “total worst time”
    • like newrelic already has
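
    the scoring is simple to sketch (endpoint names and numbers below are made up):

      # requests and 90th-percentile response time (ms) per endpoint
      endpoints = {
        "POST /cards"    => { requests: 120_000, p90_ms: 310 },
        "GET /cards/:id" => { requests: 900_000, p90_ms: 45  },
        "GET /feed"      => { requests: 250_000, p90_ms: 180 },
      }

      # "total worst time" = number of requests * 90% response time;
      # work the hitlist from the biggest number down
      hitlist = endpoints.sort_by { |_, s| -(s[:requests] * s[:p90_ms]) }
      hitlist.each { |path, s| puts "#{path}\t#{s[:requests] * s[:p90_ms]}" }
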
  • big wins are not the point
    • small changes over time
    • keep doing it
  • if you’re not failing, you’re not being honest
  • don't just make tools, learn to use them