Category: Quick Tips

Data science data type equivalents

In this post, we will explore equivalents of data types between the two popular languages of data science: R and Python.

Although R has traditionally been the language favored by statisticians for data analysis, Python has emerged as a serious contender in the last few years. As a flexible, general-purpose programming language, Python has drawn the community's attention, especially since the development of two packages, NumPy and pandas. Together with Python's built-in types, these packages provide equivalents for R's core data types, listed below:

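R             Python
vector        numpy.array (1-D)
matrix/array  numpy.ndarray (2-D / n-D); numpy.matrix also exists but is no longer recommended
data.frame    pandas.DataFrame; a single column is a pandas.Series
list          list (or dict, for a named list)
factor        pandas.Categorical, or a pandas.Series with dtype "category"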
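As a rough illustration of the table, here is a minimal Python sketch that creates each equivalent, with a comparable R expression in a comment. It assumes NumPy and pandas are installed; the variable names and sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# R: v <- c(1, 2, 3)
v = np.array([1, 2, 3])  # 1-D array

# R: m <- matrix(1:6, nrow = 2)
# R fills matrices column by column, so order="F" reproduces R's layout
m = np.arange(1, 7).reshape((2, 3), order="F")

# R: df <- data.frame(name = c("a", "b", "c"), score = c(90, 85, 77))
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [90, 85, 77]})
col = df["score"]  # selecting one column gives a pandas.Series

# R: l <- list(1, "two", c(3, 4))
l = [1, "two", [3, 4]]
# R: nl <- list(x = 1, y = "two")  (a named list maps naturally to a dict)
named = {"x": 1, "y": "two"}

# R: f <- factor(c("low", "high", "low"), levels = c("low", "high"))
f = pd.Categorical(["low", "high", "low"], categories=["low", "high"])
s = pd.Series(["low", "high", "low"], dtype="category")

print(m)
print(list(f.categories), list(f.codes))
```

One difference worth knowing: R stores a factor's levels as 1-based integer codes, while `f.codes` in pandas is 0-based, so the codes here are 0, 1, 0 rather than R's 1, 2, 1.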

Why do programmers prefer to work on Macs?

Have you ever come across a coder working? Perhaps while getting morning coffee at a Starbucks, you may have seen a computer screen like this: white text, black background, and lots of words that probably make no sense to a bystander. But why is it that these people often work on Macs? To explain, I have to go into some history on programming.

Around 1970, two technological developments emerged that shaped modern programming: the C programming language, which became a lingua franca, and the popular UNIX operating system, which was largely written in C. C has remained one of the most popular languages ever since. UNIX, however, was a proprietary system that was not accessible to the common user. For that reason, groups such as the Free Software Foundation (FSF) began writing free replacements for UNIX software in the 1980s. In the early 1990s, these efforts were combined with the Linux kernel to form the popular open source GNU/Linux ecosystem of today, just in time for the dotcom boom.

As the World Wide Web spread, open source software grew in popularity alongside it, thanks to its accessibility and availability. Examples included the Apache web server, the PHP web programming language, and the MySQL database. Again because they were freely available, these were commonly run together on GNU/Linux, forming the LAMP stack (Linux, Apache, MySQL, PHP). These technologies continue to power a large percentage of web applications today.

When developers work on a local copy of code on their computer, they try to mirror the components of a live system closely for compatibility. So the preference is partly a legacy of this history: developers want compatibility with the systems their code runs on, and a familiar environment to work in. macOS is also a UNIX system under the hood, and it offers many of the same command-line utilities found on any UNIX that are useful to a programmer.

Other common reasons can’t be overlooked, of course: hardware (speed, battery life, etc.), software compatibility (Apple makes both its own hardware and software), and simple community preference.

Running long processes without needing to be logged in

Ever need to index a large number of objects overnight but don’t want to leave your computer in the office while it is working? Maybe you have a lot of commands to type, and if your internet connection goes down, you have to start over from scratch?

Here is a quick and painless way to run a long script without needing to stay logged in while you wait for it to finish.

First, connect to the VPN and log in to the server where you want to run the long process.

Then, enter the following command:

screen -R -D

Next, simply run the script that will take a really long time:

./this_script_will_take_many_days_to_finish.sh

Detach from the session by pressing CTRL-A, then d (lowercase).

Then log out:

exit

Check the progress

Anytime you want to check the progress of the script:

screen -R -D

When you are done checking, press CTRL-A, then d, to detach again.

There you go! If you have a long process or you simply want to be able to continue your work even if your internet connection dies, give this a shot!

Websites for Coding Tutorials

Thinking about learning to code? Here are two websites with online, hands-on courses that will help you get started:

Codecademy is, of course, a popular choice. There are numerous free courses focused on web development, covering HTML, CSS, JavaScript, PHP, Ruby, Python, SQL, and more. The lessons go step by step, are interactive, and do a good job of simplifying and explaining the concepts. The website also has a fairly active forum community for questions if you get stuck. It makes a good sampler, even for complete beginners.

For data-science-oriented courses, DataCamp offers some free courses in Python and R, the two most popular languages in the field. If Codecademy’s interactive format appeals to you, this website will feel very similar. Again, there are a fair number of free courses to get you started.

Getting into coding can be overwhelming at first, and the hands-on experience was helpful for me personally to get acclimated and more comfortable learning. For that reason, these two websites make it easy to take the first step. And remember: if you get stuck, you can always contact us at the Research Commons for support. 

Data Visualization Resources

As a developer or researcher, you may need to create an engaging visualization to present data at some point or another. This is meant to be a quick-and-dirty reference for those needs.

Here is a reference list of data visualization tools: http://selection.datavisualization.ch. While not exhaustive, it can be a starting point when searching for the right tool for the job. Among those listed are two JavaScript libraries that I have used myself: D3.js, a popular library for creating interactive vector-based visualizations, and Leaflet, a lightweight library for creating visually appealing maps. The website also lists some datasets and showcases that may be useful.

Perhaps more important is selecting the right type of visualization for your data. For that, here is a post I have found helpful that goes into further detail: https://infogram.com/page/choose-the-right-chart-data-visualization.

And if you have questions regarding data visualization, the Research Commons is a great resource! 

Data Analysis: R or Python?

In recent years, Python has emerged as a serious contender to R in the ecosystem of data science. But can it be considered a replacement for R?

The answer is simple: neither language can be declared “better” than the other. While there are many differences between the two languages, they can generally be used to achieve similar results. Both are open source and have active, supportive communities. Because of their origins, R is more statistically oriented, while Python is more general-purpose.

So which one should you use? That will depend on the problem you are trying to solve, what tools you are familiar with, and personal preference. If you are trying to make that choice, here is a good starting point that will go into more detail in comparing the two languages for data analysis: https://www.dataquest.io/blog/python-vs-r

 

Public Data from Tech Giants

The tech industry has recently come under fire for its data collection and retention practices. I will leave that topic for the companies to sort out and focus instead on some of the more benevolent and interesting datasets they provide to the public.

During my research in Digital Humanities, I came across a few data sources from Google, Amazon, and Facebook that may interest researchers and those who simply like to explore. While this is not an exhaustive list, I will share some of them here:

  • Google Public Data Explorer: An online interface to search and visualize some of the datasets hosted by Google. To my knowledge, the raw data is not downloadable.
  • Public Datasets on Google Cloud: These are accessible and analyzable on Google Cloud using a Google account; however, there is a monthly quota on queries, beyond which charges are incurred.
  • Google Trends / Google Correlate: Lets the user explore trends in Google searches temporally and geographically. Google Correlate further builds on Google Trends to help find and compare patterns across searches that align with real-world events.
  • Google Ngram Viewer: Allows the user to chart the frequency of words or phrases (n-grams) across corpora generated from items in Google Books, dated 1500-2008. Raw data can be downloaded, but keep in mind that it is very large, so consider disk space and network bandwidth before doing so.
  • AWS Open Data Registry: A repository of datasets, along with detailed usage examples, available on AWS. These are hosted by third parties, and will each have their own license agreement.
  • Facebook Graph API: This one is more technically demanding than the others. It represents the underlying objects within the Facebook platform (users, posts, media, etc.) and gives a glimpse into how they are connected to one another.

Lint Your Code

In the grand scheme of things, writing code is easy. The challenge is writing clean code, which has many characteristics, from readability to conformance with best practices (e.g., the SOLID principles).

One step toward writing cleaner code is using a linter: a tool that scans your code for syntactic and stylistic errors. Linters can help you catch bugs before you even run tests against your code. Think of them as a free second set of eyes, and yes, there are many free linters you can add to your workflow.

There are online tools you can use, and some text editors and IDEs have this functionality built in, but below are some editor-independent linters for popular languages and file formats.

CSS: csslint
Docker: hadolint
HTML: HTMLHint
JavaScript: eslint, jshint
JSON: jsonlint
PHP: phplint
Python: pycodestyle, pylint
Ruby: RuboCop
SCSS: scss-lint
Shell: ShellCheck
SQL: SQLint
XML: xmllint (part of libxml2)
YAML: yamllint
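To make the idea concrete, here is a toy Python sketch of the kinds of checks a linter runs. The `toy_lint` function and its messages are hypothetical and not part of any tool listed above. It reports a syntax error if the code does not parse, flags lines longer than 79 characters (the PEP 8 limit that pycodestyle checks by default), and flags imported names that are never used, a check pylint also performs. Real linters do far more, so treat this as an illustration only.

```python
import ast

MAX_LINE_LENGTH = 79  # PEP 8's recommended limit; pycodestyle reports E501 above it


def toy_lint(source):
    """Return a list of human-readable problems found in a Python source string.

    Hypothetical illustration only; use pycodestyle or pylint for real work.
    """
    # 1. Syntactic errors: if the code does not parse, stop here.
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"line {exc.lineno}: syntax error: {exc.msg}"]

    problems = []

    # 2. Stylistic check: lines longer than the limit.
    for lineno, line in enumerate(source.splitlines(), start=1):
        if len(line) > MAX_LINE_LENGTH:
            problems.append(
                f"line {lineno}: line too long ({len(line)} > {MAX_LINE_LENGTH})"
            )

    # 3. Likely bug: names bound by an import but never referenced.
    imported = {}  # name -> line number of the import
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name == "*":
                    continue  # star imports cannot be checked this way
                # "import os.path" binds "os"; "import numpy as np" binds "np"
                name = alias.asname or alias.name.split(".")[0]
                imported[name] = node.lineno
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    for name, lineno in sorted(imported.items(), key=lambda item: item[1]):
        if name not in used:
            problems.append(f"line {lineno}: '{name}' imported but unused")

    return problems


print(toy_lint("import os\nimport sys\n\nprint(sys.argv)\n"))
# ["line 1: 'os' imported but unused"]
```

Because it only walks the syntax tree, this sketch misses names referenced indirectly (for example, inside strings or `__all__`), one of many cases that dedicated linters handle properly.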

Bookmarking the staff intranet

The staff intranet, which includes NewsNotes and links to the document registry, staff directory, forms, and staff resources, is no longer linked from the public (patron-facing) website. If you haven’t already bookmarked it, the address for the intranet is: go.osu.edu/osulstaff

 

Easy screen recording in Windows using the Microsoft Office Mix Add-in

 

As instructors, help desk operators, support staff, and researchers, we often find ourselves in situations where the easiest way to describe a process or problem would be in real time, with video. If you work on a Mac, this is easy: macOS includes a program called QuickTime, which has an option to record your screen (with or without sound). For Windows users, however, this process has always been a bit more difficult. Unless you are a gamer (and use an Xbox), Windows doesn’t provide built-in software for screen recording, which means individuals are often left to find their own solutions. And if you are like me, this has meant trying or purchasing a wide range of products over the years to find something that works.

But I do create a lot of videos. One of the hazards of writing software or doing research is that people ask questions, and often a video may be the best way to provide an answer. For software in particular, I find that a video tutorial provides a level of clarity that is difficult to achieve with text alone. So when I was recently asked what software I use to create these quick tutorials, I wanted to share.

The Office Mix Add-in

About a year ago, I stumbled onto a new add-in being developed for PowerPoint called the Office Mix Add-in. In PowerPoint, the add-in was part of a new template for creating interactive presentations. That was interesting, but what I found more interesting was that the add-in provided a way to create high-quality screen captures in MP4 format (great for sharing on YouTube).

So how does it work?  Well, the first thing you need to do is get the add-in.  You can get the add-in from the Microsoft Office Website at: https://mix.office.com/en-us/Home

Once the Office Mix add-in has been installed, you can start using it.  To use the add-in:

  • Open PowerPoint
  • Start a blank presentation and click the Mix tab

    [Screenshot: the Office Mix Add-in toolbar]

The Office Mix Add-in has a lot of options, including tools for creating quizzes and for converting your presentation to a video. However, if you want to create a screen recording, select the Screen Recording option.

When selected, you’ll see the following control added to the top of your primary monitor:

[Screenshot: Office Mix Add-in Recording Menu]

Here, you can select the area of the screen to capture, whether to capture audio from your microphone, and whether the recording should capture your mouse movements. Once you’ve set a recording area, the record option is enabled. Click record to start recording.

When the recording is complete, the video will be embedded in your blank PowerPoint presentation. If you just want the video file, right-click the video and select the “save to media” option. Otherwise, the video stays embedded as part of your presentation.

[Screenshot: Office Mix Add-in: Save As Video]

 

–tr

 

 
