Categories
Uncategorized

Data science reading list for Wednesday, November 7, 2018: The job — working together to build trust, the kinds of data scientist, why mothers should do data science, and why not to be a generalist

To build trust in data science, work together

From the Cornell Chronicle:

As data science systems become more widespread, effectively governing and managing them has become a top priority for practitioners and researchers. While data science allows researchers to chart new frontiers, it requires varied forms of discretion and interpretation to ensure its credibility. Central to this is the notion of trust – how do we reliably know the trustworthiness of data, algorithms and models?

The kinds of data scientist

From Harvard Business Review:

In 2012, HBR dubbed data scientist “the sexiest job of the 21st century”. It is also, arguably, the vaguest. To hire the right people for the right roles, it’s important to distinguish between different types of data scientist. There are plenty of different distinctions that one can draw, of course, and any attempt to group data scientists into different buckets is by necessity an oversimplification. Nonetheless, I find it helpful to distinguish between the deliverables they create. One type of data scientist creates output for humans to consume, in the form of product and strategy recommendations. They are decision scientists. The other creates output for machines to consume like models, training data, and algorithms. They are modeling scientists.

Three reasons why mothers should consider a career in data science

From LinkedIn:

For several women, the time during their pregnancy is one of overwhelming happiness, and at times, worry. We worry about things like childbirth and not knowing what to do with our baby after he or she is born. Women with careers have an added worry; we think about how this adorable new addition to our family will impact our careers.

One thing that I’ve discovered over the past four years is that having certain skills can reduce uncertainty around our careers. I’m a mom of two little girls and have a career in data that has provided me with more flexibility and less stress. Below, I outline the three reasons why mothers should consider a career in data science.

Why you shouldn’t be a data science generalist

From Towards Data Science:

I work at a data science mentorship startup, and I’ve found there’s a single piece of advice that I catch myself giving over and over again to aspiring mentees. And it’s really not what I would have expected it to be.

Rather than suggesting a new library or tool, or some resume hack, I find myself recommending that they first think about what kind of data scientist they want to be.

The reason this is crucial is that data science isn’t a single, well-defined field, and companies don’t hire generic, jack-of-all-trades “data scientists”, but rather individuals with very specialized skill sets.

To see why, just imagine that you’re a company trying to hire a data scientist. You almost certainly have a fairly well-defined problem in mind that you need help with, and that problem is going to require some fairly specific technical know-how and subject matter expertise. For example, some companies apply simple models to large datasets, some apply complex models to small ones, some need to train their models on the fly, and some don’t use (conventional) models at all.

Each of these calls for a completely different skill set, so it’s especially odd that the advice that aspiring data scientists receive tends to be so generic: “learn how to use Python, build some classification/regression/clustering projects, and start applying for jobs.”

 


A new mantra worth considering



Many businesses still follow Jurassic Park’s model for IT spending…

(They also should’ve done a better job vetting the one IT guy they hired.)

Categories
Current Events, Tampa Bay, Uncategorized

What’s happening in the Tampa Bay tech/entrepreneur/nerd scene (Week of Monday, November 5, 2018)

Every week, I compile a list of events for developers, technologists, tech entrepreneurs, and nerds in and around the Tampa Bay area. We’ve got a lot of events going on this week, and here they are!

Monday, November 5

 

Tuesday, November 6

I’m not eligible to vote in the U.S. (I’m a Canadian citizen here on a green card), but if you are, go vote before you do anything extracurricular today!

 

 

Wednesday, November 7

Thursday, November 8

Friday, November 9

Saturday, November 10

 

Sunday, November 11


Data science reading list for Friday, November 2, 2018: Education, aspirations, and job descriptions

With Student Interest Soaring, Berkeley Creates New Data-Sciences Division

From Chronicle of Higher Education:

Berkeley’s move follows MIT’s announcement last month that it was investing $1 billion in a new college of artificial intelligence. But leaders at Berkeley say their disclosure of the division today was driven by an imminent international search for a director, who will hold the title of associate provost, putting the program on an institutional par with Berkeley’s colleges and schools. They explain that in creating a division rather than a new college, they are reflecting the way data science has become woven into every discipline.

Berkeley has been planning the division for four years, said David Culler, interim dean for data sciences, and has been rolling it out incrementally through a new data-sciences major approved last year, and corresponding growth in data-science courses. Enrollment in “Foundations of Data Science” has soared from 100 in 2015 to 1,300 in 2018. Enrollment in the upper-level “Principles and Techniques of Data Science” has grown from 100 in 2016 to 800 students. The emerging program has served as a “pilot” for the division, which is now set to evolve under a new director.

The core of the data-science curriculum, said Culler, is computer science and statistics, with additional depth courses in optimization and visualization. But students will also be required to have a “domain emphasis” that would most likely synthesize material from various other departments. For instance, a data-science student’s exploration of social inequality might include courses in sociology, ethnic studies, economics, and philosophy.

‘With a basic degree, you can learn data science on the job’

From SiliconRepublic:

Next week at the National Analytics Conference, [Jennifer Cruise from the Aon Centre for Innovation and Analytics] will be on a panel where she expects to discuss several aspects and challenges that businesses face relating to data, including how to deal with the abundance of information that is now available and, of course, the key issues of skills and resources.

“You can only truly exploit the data if you get the right people in that space, and there’s a double whammy,” she said. “On the one hand, you have a lack of hands-on resources. Skilled data scientists are hard to come by and things are changing quickly, so people who are qualified need to stay on top of things. Then, you also have a gap in the leadership space – the people who can advise you how to turn [data] into revenue for your company, or how to use your data to become more operationally efficient.”

8 common questions from aspiring data scientists, answered

From Tech in Asia:

So, you want to become a data scientist? Great. But you have zero experience and have no clue how to get started in this field. I get it. I’ve been there and I definitely feel you. This is why this post is for you.

All the questions below came from the community through my LinkedIn post, email, and other channels. I hope that by sharing my experience, you will be enlightened on how to pursue a data science career and make your learning journey fun.

OPINION: How to craft effective data science job descriptions


From ComputerWorldHK:

In today’s data science job market, demand far outstrips supply, said Chris Nicholson, co-founder and CEO of artificial intelligence and deep learning company Skymind, and co-creator of the open source framework Deeplearning4j. That means organizations must resist the temptation to seek candidates with every last required data science skill in favor of hiring for potential and then training on the job, he said.

“A lot of data science has to do with statistics, math and experimentation—so you’re not necessarily looking for someone with a computer science or software engineering background, though they should have some programming experience,” Nicholson said. “You want folks from physical science, math, physics, natural sciences backgrounds; people who are trained to think about statistical ideas and use computational tools. They need to have the ability to look at data and use tools to manipulate it, explore correlations and produce data models that make predictions.”

Because a data scientist’s job isn’t to engineer entire systems, minimal programming experience is fine, Nicholson said. After all, most organizations can rely on software engineering, DevOps, or IT teams to build, manage and maintain infrastructure in support of data science efforts. Instead, strong data science candidates often have a background in science and should be proficient with data science tools in one or more different stacks.


Data science reading list for Thursday, November 1, 2018: Free data science books for beginners with limited budgets

If you want to get into data science with a limited budget, this reading list is for you — it’s all about data science and related books that you can get for free!

Allen B. Downey’s free Python and math books

          

Allen B. Downey is a believer in free books, and has a whole article explaining why. Here are its concluding paragraphs:

A free book is the root of a tree of potential adaptations, translations, and entirely new books that branch out from the original. Free books transform readers into proof-readers, editors, anthologists, correspondents, contributors, collaborators, writers and authors.

If you are thinking about writing a book, start soon, release early and often, give up control but do a little policing, keep a contributor list, and make it free.

He’s written a number of free books, and the ones most applicable to data science are:

Bayesian Methods for Hackers

Bayesian Methods for Hackers is described as “an intro to Bayesian methods and probabilistic programming from a computation/understanding-first, mathematics-second point of view”, and its key chapters are available online, for free, in Jupyter notebook form. The authors recommend reading it by cloning the book’s Jupyter notebook repo and running the notebooks on your local machine.

The Python Data Science Handbook

Another Python/data science book in Jupyter notebook form! This one assumes that you’re familiar with Python, as it’s all about the libraries that are most used for data science and machine learning: NumPy, Pandas, Matplotlib, and Scikit-Learn.

You can read it online.

R Programming for Data Science

From the book site:

This book brings the fundamentals of R programming to you, using the same material developed as part of the industry-leading Johns Hopkins Data Science Specialization. The skills taught in this book will lay the foundation for you to begin your journey learning data science.

This book is about the fundamentals of R programming. You will get started with the basics of the language, learn how to manipulate datasets, how to write functions, and how to debug and optimize code. With the fundamentals provided in this book, you will have a solid foundation on which to build your data science toolbox.

This book is available for free in PDF, EPUB, and MOBI formats (there’s a $20 suggested price, but you can pay what you want).

R for Data Science

From the book site:

This book will teach you how to do data science with R: You’ll learn how to get your data into R, get it into the most useful structure, transform it, visualise it and model it. In this book, you will find a practicum of skills for data science. Just as a chemist learns how to clean test tubes and stock a lab, you’ll learn how to clean data and draw plots—and many other things besides. These are the skills that allow data science to happen, and here you will find the best practices for doing each of these things with R. You’ll learn how to use the grammar of graphics, literate programming, and reproducible research to save time. You’ll also learn how to manage cognitive resources to facilitate discoveries when wrangling, visualising, and exploring data.

You can read it online.

The Art of Data Science

From the book site:

This book writes down the process of data analysis with a minimum of technical detail. What we describe is not a specific “formula” for data analysis, but rather is a general process that can be applied in a variety of situations. Through our extensive experience both managing data analysts and conducting our own data analyses, we have carefully observed what produces coherent results and what fails to produce useful insights into data. This book is a distillation of our experience in a format that is applicable to both practitioners and managers in data science.

This book is available for free in PDF, EPUB, and MOBI formats (there’s a $20 suggested price, but you can pay what you want).


Jupyter rising

First of all, if you’re interested in a one-day conference that also gives you a chance to enjoy Florida’s warm winter and Disney World, check out DevFest Florida 2019. It takes place on Saturday, January 19, 2019, and I’ll be giving the Jumping into Jupyter Notebooks presentation, which will largely be a hands-on code-along-with-me exercise (or just watch, if you like) showing just what you can do with a Jupyter notebook, Python, and some data. If you know a little Python and are new to Jupyter notebooks, data science, or both, you’ll want to catch my presentation!

At this point, you might be asking “What are Jupyter Notebooks, anyway?”

 

Jupyter notebooks are a kind of computational notebook, a class of software that creates documents that mix:

  • Stuff that you’d expect to find in a typical document, such as text, pictures, and multimedia, and
  • stuff that you wouldn’t expect to find in a typical document, such as code and its output.

To borrow a paragraph from an article that just appeared in Nature, Why Jupyter is data scientists’ computational notebook of choice:

Jupyter is a free, open-source, interactive web tool known as a computational notebook, which researchers can use to combine software code, computational output, explanatory text and multimedia resources in a single document. Computational notebooks have been around for decades, but Jupyter in particular has exploded in popularity over the past couple of years. This rapid uptake has been aided by an enthusiastic community of user–developers and a redesigned architecture that allows the notebook to speak dozens of programming languages — a fact reflected in its name, which was inspired, according to co-founder Fernando Pérez, by the programming languages Julia (Ju), Python (Py) and R.

You may want to think of a Jupyter notebook as a wiki with a REPL. Its contents are divided into cells, each of which contains either:

  • Narrative content, which you enter in Markdown, or
  • Code (and, if it runs, its output), which you can enter in Python or nearly four dozen other programming languages.
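
To make the idea of cells a little more concrete, here’s a rough sketch of what a two-cell notebook might look like under the hood. It assumes the version 4 notebook file format, in which a notebook is a JSON document holding a list of cells. The helper names (markdown_cell, code_cell) and the cell contents are made up for illustration, and the field names reflect my understanding of that format rather than anything definitive:

```python
import json


def markdown_cell(text):
    """Build a narrative cell; its source is Markdown text."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source):
    """Build a code cell that hasn't been run yet, so it has no output."""
    return {
        "cell_type": "code",
        "execution_count": None,  # stays empty until the cell is run
        "metadata": {},
        "outputs": [],            # output gets stored here after a run
        "source": source,
    }


# A notebook is (assuming format version 4) a dictionary with a list of cells.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        markdown_cell("# Hello, notebook\nThis cell is narrative text."),
        code_cell("print(2 + 2)"),
    ],
}

# Serialize to the JSON text that would be saved in an .ipynb file.
notebook_json = json.dumps(notebook, indent=1)

# List each cell's type and the first line of its source.
for cell in notebook["cells"]:
    print(cell["cell_type"], "|", cell["source"].splitlines()[0])
```

If you saved notebook_json to a file with an .ipynb extension, Jupyter should be able to open it as a notebook with one Markdown cell and one unexecuted code cell. In practice, you’d let Jupyter create and save notebooks for you rather than build them by hand.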

Jupyter notebooks’ format lends itself well to a number of research and educational uses. Once again, from the Nature article:

Computational notebooks are essentially laboratory notebooks for scientific computing. Instead of pasting, say, DNA gels alongside lab protocols, researchers embed code, data and text to document their computational methods. The result, says Jupyter co-creator Brian Granger at California Polytechnic State University in San Luis Obispo, is a “computational narrative” — a document that allows researchers to supplement their code and data with analysis, hypotheses and conjecture.

For data scientists, that format can drive exploration. Notebooks, Barba says, are a form of interactive computing, an environment in which users execute code, see what happens, modify and repeat in a kind of iterative conversation between researcher and data. They aren’t the only forum for such conversations — IPython, the interactive Python interpreter on which Jupyter’s predecessor, IPython Notebook, was built, is another. But notebooks allow users to document those conversations, building “more powerful connections between topics, theories, data and results”, Barba says.

Researchers can also use notebooks to create tutorials or interactive manuals for their software. This is what Mackenzie Mathis, a systems neuroscientist at Harvard University in Cambridge, Massachusetts, did for DeepLabCut, a programming library her team developed for behavioural-neuroscience research. And they can use notebooks to prepare manuscripts, or as teaching aids. Barba, who has implemented notebooks in every course she has taught since 2013, related at a keynote address in 2014 that notebooks allow her students to interactively engage with — and absorb material from — lessons in a way that lectures cannot match. “IPython notebooks are really a killer app for teaching computing in science and engineering,” she said.

Ed. note: Before they were called Jupyter notebooks, they were called IPython notebooks.

Jupyter notebooks have recently received big boosts from big names. One of them is economist Paul Romer, who won the 2018 Nobel Prize in Economics — he’s a convert from Mathematica to Python and Jupyter notebooks:

Another big Jupyter booster isn’t from academia — it’s Netflix, where Jupyter notebooks are the most popular data tool:

Keep an eye on Jupyter notebooks. I’m pretty sure you’ll see them more often quite soon.

Getting into Jupyter notebooks

If you’re interested in trying them out, you may find these links handy: