Categories
Uncategorized

Data science reading list for Monday, October 29, 2018: The worst data science article, 5 basic stats concepts you need to know, Bayes, democratization, and web scraping

A terrible “data skills” article that you should read, but only as a warning

I remember the hype that surrounded the web in the late 1990s. I also remember the copious amount of well-intentioned misinformation that made the rounds as writers attempted to capitalize on that hype. It’s now data science’s turn, if this bit of “advertorial” in Harvard Business Review — Prioritize Which Data Skills Your Company Needs with This 2×2 Matrix — is any indication.

Written by Chris Littlewood, chief innovation and product officer of filtered.com (I’m not going to help them by linking to their site), a company that purports to use AI to “lift productivity by making learning recommendations”, the article clearly highlights the author’s ignorance and HBR’s willingness to publish any article that has to do with data or data science. To the credit of the readers, a number of them registered with the site simply to be able to post comments pointing out how nonsensical the article was.

Treat this article as an object lesson in technology hype, as well as a sign that data science skills are seen as valuable.

The 5 Basic Statistics Concepts Data Scientists Need to Know

Forget that the article mentioned above said that mathematics and statistics aren’t useful data skills — you can’t do data science without them! You’ll need to understand these 5 concepts (in addition to others):

  1. Statistical features
  2. Probability distributions
  3. Dimensionality reduction
  4. Under- and oversampling
  5. Bayesian statistics

This article in Towards Data Science provides a brief overview.

Data Skeptic: Bayesian Updating

One of the better data science podcasts out there is Kyle Polich’s Data Skeptic, which has been around since 2014 and has over 400 episodes. The podcast features short mini-episodes explaining high-level concepts in data science, and longer interview segments with researchers and practitioners.

I’ve just started working my way through this podcast, and have used the example in episode 5, Bayesian Updating, to explain Bayes’ Theorem to people who avoided studying probability and stats. Give it a listen, then check out the rest of the podcast episodes!
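If you’d like to see Bayesian updating in code, here’s a minimal sketch. It is not the example from the podcast episode: the function name bayes_update and all of the probabilities are hypothetical, picked only to show how each posterior becomes the next prior. It also assumes the observations are independent of one another.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Return P(hypothesis | evidence) using Bayes' Theorem."""
    # P(E) = P(E|H) * P(H) + P(E|not H) * P(not H)
    evidence = likelihood_if_true * prior + likelihood_if_false * (1.0 - prior)
    return likelihood_if_true * prior / evidence


# Hypothetical numbers, chosen only for illustration.
prior = 0.01                   # P(H): belief before seeing any evidence
p_evidence_given_h = 0.9       # P(E | H)
p_evidence_given_not_h = 0.1   # P(E | not H)

belief = prior
for observation in range(1, 4):
    # Each posterior becomes the prior for the next observation.
    belief = bayes_update(belief, p_evidence_given_h, p_evidence_given_not_h)
    print(f"after observation {observation}: {belief:.4f}")
```

Even with a small prior, a few pieces of reasonably reliable evidence move the belief a long way, which is the intuition that makes Bayes’ Theorem click for a lot of people.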

The Democratization of Data Science

Here’s a Harvard Business Review article on data science that’s actually worth reading:

Intelligent people find new uses for data science every day. Still, despite the explosion of interest in the data collected by just about every sector of American business — from financial companies and health care firms to management consultancies and the government — many organizations continue to relegate data-science knowledge to a small number of employees.

That’s a mistake — and in the long run, it’s unsustainable. Think of it this way: Very few companies expect only professional writers to know how to write. So why ask only professional data scientists to understand and analyze data, at least at a basic level?

Data Science Skills: Web scraping using python

Another article from Towards Data Science:

One of the first tasks that I was given in my job as a Data Scientist involved Web Scraping. This was a completely alien concept to me at the time, gathering data from websites using code, but is one of the most logical and easily accessible sources of data. After a few attempts, web scraping has become second nature to me and one of the many skills that I use almost daily.

In this tutorial I will go through a simple example of how to scrape a website to gather data on the top 100 companies in 2018 from Fast Track. Automating this process with a web scraper avoids manual data gathering, saves time and also allows you to have all the data on the companies in one structured file.
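To give a feel for what a scraper does, here’s a rough sketch that uses only Python’s standard library. It is not the tutorial’s code: SAMPLE_HTML is a made-up stand-in for a downloaded page, TableParser is a hypothetical helper name, and the company names are placeholders rather than real Fast Track data. A real scraper would fetch the page over HTTP and would need to cope with messier markup.

```python
from html.parser import HTMLParser
import csv
import io

# Hypothetical stand-in for a downloaded page; a real scraper would fetch
# this HTML over HTTP instead of hard-coding it.
SAMPLE_HTML = """
<table>
  <tr><th>Rank</th><th>Company</th></tr>
  <tr><td>1</td><td>Example Co</td></tr>
  <tr><td>2</td><td>Sample Ltd</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # Start a new row.
            self.rows.append([])
        elif tag in ("td", "th"):
            # Start a new, empty cell in the current row.
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


parser = TableParser()
parser.feed(SAMPLE_HTML)

# Write the rows out as CSV, the kind of structured file the excerpt mentions.
output = io.StringIO()
csv.writer(output).writerows(parser.rows)
print(output.getvalue())
```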


Maritime DevCon: June 18th in Moncton


If you’re a developer out in the Maritimes, you might want to check out Derek Hatchard’s Maritime Dev Con, which takes place on June 18th in Moncton. It’s a single-afternoon, two-track conference – which means you should be able to take time out to attend it – covering a number of topics including:

  • .NET and ASP.NET
  • Java
  • iPhone development
  • Ruby
  • Python
  • Groovy
  • NoSQL and MongoDB
  • “Rockstar Estimating Skills”

Maritime Dev Con has a registration fee that won’t hurt your wallet – it’s a mere CAD$19!

I’m a big fan of small, regional gatherings like Maritime Dev Con and its western counterpart Prairie DevCon. Each region has its own specializations and needs that a by-locals, for-locals conference can do a better job of serving, and the smaller size of these conferences allows for more back-and-forth between audience and presenter, and between attendees. Support your local conference!

This article also appears in Canadian Developer Connection.


If You Speak Python, Science Also Needs Your Brain!

Samuel L. Jackson from "Snakes on a Plane" talking on a phone and holding a snake: "Python! Do you speak it?"

Yesterday, I wrote about an opportunity to help a University of Toronto grad student build a tool to help programmers build and debug database queries. Today, I present a similar opportunity.

Once again, the student in question is one of Greg Wilson’s grad students. You might recognize Greg’s name – he’s the co-editor of the must-read O’Reilly book Beautiful Code. This student, Mike Conley, is looking for Python (and yes, IronPython counts) coders. If you’re an undergraduate programming student in the Greater Toronto Area, you’ll want to check this out:

Subjects are needed to take part in a study concerning peer evaluation and grading. Participants will be asked to complete small, fun programming exercises, and peer grade other submissions. Time needed for the study is approximately 1.5hrs and takes place in person in the Bahen Center at the University of Toronto.

Subjects should be undergraduate computer science students with programming experience in Python.

Participants will be entered into a draw for a $100 Best Buy gift card.

If you think you can help Mike with his project, drop him a line!

This article also appears in Canadian Developer Connection.


Open Source Language Roundtable Webcast: Wednesday, July 22nd


O’Reilly’s conference on Open Source, OSCON, takes place this week in San Jose, California. One of the events taking place at OSCON is the Open Source Language Roundtable, the abstract for which appears below:

We all have our favorite languages in our tool-belt, but is there a ‘best’ overall language? If anyone can hash that out, it will be the members of this roundtable discussion, some of the stars of the open source language space. This wide-ranging session, hosted and moderated by the O’Reilly Media editorial staff, and broadcast live on the web, will try to identify the best and worst features of each language, and which are best for various types of application development.

The roundtable will be moderated by O’Reilly Media’s James Turner and will cover the following languages, listed below with the corresponding panelist:

  • Java: Rod Johnson (SpringSource)
  • Perl: Jim Brandt (Perl Foundation)
  • PHP: Laura Thomason (Mozilla)
  • Python: Alex Martelli (Google)
  • Ruby: Brian Ford (Engine Yard)

You can catch this roundtable even if you’re not going to be at OSCON because O’Reilly is webcasting the event. It takes place this Wednesday, July 22nd at 10 pm EDT (7 pm Pacific) and is expected to run 90 minutes. It costs nothing to catch the webcast and you’ll even be able to ask the panelists questions via chat, but you’ll need to register.


Hanselman Podcast on IronPython / A Great Book Deal

This article also appears in Canadian Developer Connection.

Cover of "IronPython in Action"

When I got into web development, I considered myself a latecomer to the game, and that was in 1999. In the five years I’d been working professionally as a developer, my apps were strictly desktop – multimedia CD-ROM stuff done in Director (then a product of Macromedia) and business productivity apps written in pre-.NET VB and Java-a-la-JBuilder.

The company with whom I’d landed a contract had a contrarian tech lead. It seemed that the web app world was building their stuff on Linux, Perl and MySQL, and this guy was all about BSD, Python and PostgreSQL. In 1999 terms, he was a freak even amongst the freaks.

I had a pretty full schedule that summer, followed by a one-week vacation at Burning Man, followed by the start of my contract at this new company. The tech lead wanted me to be ready to do some coding on my first day in, so I brought a copy of O’Reilly’s Learning Python along with my laptop to Black Rock Desert, hoping to squeeze in some hacking time at the big desert bacchanal. Luckily, Burning Man is pretty mellow during the day, and in an additional stroke of luck, the neighbouring camp was sharing AC power from their “eggbeater” windmill. I learned Python by writing sample apps in an extremely distracting environment, and because of that, I fell quite in love with the language. Any language that you can learn while naked people playing the tuba on unicycles are circling you has to be a good one.

That’s why I’m glad to see that implementations like IronPython exist, and that they tie into things like the .NET framework and Silverlight. IronPython’s performance is quite close to standard Python, and I use it along with IronRuby as my scripting language for automating tasks and doing little “housekeeping” things on my systems. I’m not using IronPython to the degree that Michael Foord is – he’s using it for full-on .NET applications instead of C# or VB! Scott Hanselman talks with him about working with IronPython as his primary development language in the latest edition of his Hanselminutes podcast.

As an added bonus, the blog entry for the podcast has a special limited-time coupon code that will save you 40% off the price of Manning Publications’ IronPython in Action (which Foord co-wrote), and the discount applies to both the dead-tree and PDF versions of the book. At 40% off, the PDF version is a mere USD$16.50 (CAD$20.14 at the time of this writing).


Named Parameters in Method Calls: Python Si, Ruby No

"Hello My Name Is" sticker

In an earlier article, Default and Named Parameters in C# 4.0 / Sith Lord in Training, I wrote about how C# 4.0 – that’s the version coming out with the next release of Visual Studio, known as Visual Studio 2010 – is going to provide support for named parameters.

In that article, I also incorrectly stated that Ruby supported named parameters. Luckily, Jörg W Mittag spotted my mistake and corrected me in a comment. I’ve since corrected the article and thought I’d show you how I got it wrong in the first place.

Ruby and My Named Parameter Goof

I had a vague recollection of Ruby accepting named parameters. I figured I’d be empirical, so I fired up irb – the Ruby REPL shell – and put together a quick little method to see if the recollection was correct:

# Ruby 1.8.6
def test_named_params(first, second)
    puts "#{first}\n#{second}"
end

Once put together, I made some test calls to the method:

# irb session (Ruby 1.8.6)
irb(main):> test_named_params("alpha", "beta")
alpha
beta

=> nil
irb(main):> test_named_params(first = "alpha", second = "beta")
alpha
beta

=> nil

Seeing that the interpreter didn’t choke on that named parameter call, I thought to myself “Vague recollection confirmed, Ruby supports named parameters!” and wrote the blog article.

Had my brain actually been firing on all cylinders, I would’ve given the method a proper test by providing the named parameters out of the order in which they appear in the method signature. Here’s what I would’ve seen:

# irb session (Ruby 1.8.6)
irb(main):> test_named_params(second = "alpha", first = "beta")
alpha
beta

=> nil

Uh-oh. If named parameters worked, the first output line would be “beta” and the second would be “alpha”. Clearly something’s wrong with my recollection.

Let’s try some non-existent named parameters – say, ones involving current entertainment news headlines – just to see what happens:

# irb session (Ruby 1.8.6)
irb(main):> test_named_params(lindsay_lohan_dui = "alpha",
jim_cramer_smackdown = "beta")
alpha
beta

=> nil

Even with nonsensical named parameters, the method is still accepting the values in order. Why is that?

Just about everything in Ruby has a return value (which can be anything, including nil). You can see for yourself in irb – here’s a quick do-nothing method definition:

# irb session (Ruby 1.8.6)
irb(main):> def do_nothing
irb(main):> end
=> nil

As you can see, defining a method returns a value of nil.

As Jörg pointed out, Ruby assignment statements return a value: the value used in the assignment. Once again, for proof, I’ll use an example from an irb session. In the example below, assigning the string "alpha" to the variable first also returns the string "alpha":

# irb session (Ruby 1.8.6)
irb(main):> first = "alpha"
=> "alpha"

In the call to test_named_params, the Ruby interpreter was interpreting my “named parameters” as assignment statements. first = "alpha" evaluates to plain old "alpha", but so does second = "alpha" (and for that matter, so does lindsay_lohan_dui = "alpha"). Each assignment statement in my parameter list was evaluated, and then those values were passed to the method in positional order.
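For readers who think in Python, the same trap can be reproduced with assignment expressions. This is only an analogy, not something from the Ruby session: it assumes Python 3.8 or later (the := operator doesn’t exist in earlier versions), and show_params is a hypothetical stand-in for my test method.

```python
# Requires Python 3.8+ for the ":=" assignment expression.
def show_params(first, second):
    return "%s\n%s" % (first, second)


# Each ":=" assigns to a variable in the *caller's* scope and evaluates to the
# assigned value, so these are two positional arguments, not named ones.
result = show_params((second := "alpha"), (first := "beta"))
print(result)

# The caller now has two new variables as a side effect.
print(first, second)
```

Just as in the irb session, the names on the left of each assignment have nothing to do with the method’s parameter names; only the order of the values matters.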

Python Supports Named Parameters

After getting the comment from Jörg and correcting my article, I wondered why I thought Ruby supported named parameters. Then it hit me – it’s Python.

So I fired up the Python REPL and put together this quick little method:

# Python 3.0
def test_named_params(first, second):
    print("%s\n%s" % (first, second))

And this time, I decided to be a little more thorough in my testing:

# Python 3.0 REPL
>>> test_named_params("alpha", "beta")
alpha
beta

>>> test_named_params(first = "alpha", second = "beta")
alpha
beta

>>> test_named_params(second = "alpha", first = "beta")
beta
alpha

And some additional searching on the web confirmed that yes, Python method calling does in fact support named parameters.
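Here’s a slightly fuller sketch of how Python treats keyword arguments, as a script rather than a REPL session. The function name show_params and the default value are my own additions for illustration. The last call reuses the nonsense names from the Ruby experiment to show that Python checks keyword names against the signature.

```python
def show_params(first, second="default"):
    return "%s\n%s" % (first, second)


# Keyword arguments are matched by name, so order does not matter.
swapped = show_params(second="alpha", first="beta")

# A parameter with a default value can simply be left out.
defaulted = show_params("alpha")

# Unlike the Ruby calls above, a name that is not in the signature is an error.
try:
    show_params(lindsay_lohan_dui="alpha", jim_cramer_smackdown="beta")
    unknown_name_error = None
except TypeError as error:
    unknown_name_error = error

print(swapped)
print(defaulted)
print("TypeError raised:", unknown_name_error is not None)
```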

So in conclusion, when it comes to named parameters, it’s Python si, Ruby no…and C# pronto.


Security Through Python

While this is actually about neither computer security nor the Python programming language (which I’ll cover in some articles soon), I thought my Pythonista readers might find it amusing…

Guy freaking out at seeing two pythons on the dashboard of a car

…for the real story behind this photo, see this article in the Daily Mail.