Search for a String in Multiple Files

In support of an earlier post, Fetching NHL Play by Play game data, I was recently asked how one could quickly search for a specific string in multiple JSON files recursively. If you are running macOS or Linux, grep is your best friend! In a nutshell, grep prints lines that contain a match for a pattern. The following sample command combines grep and cut to list the games (files) that contain the string “Montréal Canadiens”:

Want to try it out? You can download and extract the sample data, which contains all the play-by-play games from the 2016-2017 season.

grep -H -R "Montréal Canadiens" /data/20162017/*.json | cut -d: -f1
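One thing worth knowing about the command above: grep -H prints one record per matching line, so a file with several matching lines shows up several times after cut. The sketch below illustrates this on hypothetical sample files created in a throwaway directory (the file names and JSON contents are made up for the demo, not taken from the real data set), and shows that appending sort -u leaves one entry per file.

```shell
# Hypothetical sample data: two tiny "game" files in a temporary directory.
demo_dir=$(mktemp -d)
printf '%s\n' '{"home": "Montréal Canadiens",' ' "winner": "Montréal Canadiens"}' > "$demo_dir/game1.json"
printf '%s\n' '{"home": "Toronto Maple Leafs"}' > "$demo_dir/game2.json"

# game1.json has two matching lines, so its path is printed twice here.
raw=$(grep -H -R "Montréal Canadiens" "$demo_dir" | cut -d: -f1)

# sort -u collapses the repeats into a single entry per file.
unique=$(grep -H -R "Montréal Canadiens" "$demo_dir" | cut -d: -f1 | sort -u)

printf 'raw:\n%s\n' "$raw"
printf 'unique:\n%s\n' "$unique"

# Clean up the demo files.
rm -rf "$demo_dir"
```

If each of your JSON files is stored as a single line, the two results will be identical and the extra sort -u is harmless.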


The above command outputs the path and file name. If you just want the file name, use the command below, which pipes the results through awk (a text processing and data extraction tool):

grep -rl "Montréal Canadiens" /data/20162017 | awk -F/ '{ print $NF }'
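Note that the command above searches every file under the directory, not only JSON files. As a minimal sketch of one way to restrict the search, the example below uses find to select *.json files and hands them to grep -l, then strips the directories with the same awk call. The directory layout and file contents are hypothetical and exist only for the demo.

```shell
# Hypothetical sample tree: JSON files at two depths plus one non-JSON file.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/20162017/playoffs"
printf '%s\n' '{"away": "Montréal Canadiens"}' > "$demo_dir/20162017/game1.json"
printf '%s\n' '{"home": "Montréal Canadiens"}' > "$demo_dir/20162017/playoffs/game2.json"
printf '%s\n' '{"home": "Ottawa Senators"}' > "$demo_dir/20162017/game3.json"
printf '%s\n' 'Montréal Canadiens notes' > "$demo_dir/20162017/notes.txt"

# find picks only *.json files (recursively); grep -l prints each matching
# file once, and -F treats the search string as literal text, not a regex.
matches=$(find "$demo_dir/20162017" -type f -name '*.json' \
  -exec grep -lF -- "Montréal Canadiens" {} + | awk -F/ '{ print $NF }' | sort)

printf '%s\n' "$matches"

# Clean up the demo files.
rm -rf "$demo_dir"
```

Here notes.txt contains the string but is skipped because it does not match the *.json filter, and game3.json is skipped because it does not contain the string.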


If you are running Windows, you could use findstr or PowerShell. The following findstr command outputs the files in which the string “Montréal Canadiens” is present:

findstr /m /s /i /p /c:"Montréal Canadiens" "C:\data\20162017\*.*"

And for PowerShell, use the following command:

Get-ChildItem -Recurse C:\data\20162017\*.* | Select-String -Pattern "Montréal Canadiens" | Select-Object -Unique Path

Or a shorter version using aliases:

dir -recurse C:\data\20162017\*.* | sls -pattern "Montréal Canadiens" | select -unique path

*If you want just the filenames, not full paths, replace Path with Filename.

Some explanation of the PowerShell commands:

  • Get-ChildItem -Recurse *.* returns all files in the current directory and all its subdirectories.
  • Select-String -Pattern “Montréal Canadiens” searches those files for the given pattern “Montréal Canadiens”.
  • Select-Object -Unique Path returns only the file path for each match; the -Unique parameter eliminates duplicates.

That’s it! While this only scratches the surface and does not go into detail about the different arguments for each tool (that’s for you to explore further!), it demonstrates how the search can be accomplished. If you are a data wrangler who manipulates data files, or an aspiring data engineer or data scientist, these tools and commands should be in your toolbox.

