Search for a String in Multiple Files

In support of an earlier post, Fetching NHL Play by Play game data, I was recently asked how one could quickly search for a specific string in multiple JSON files, recursively. If you are running macOS or Linux, grep is your best friend! In a nutshell, grep prints the lines that contain a match for a pattern.

Want to try it out? You can download and extract the sample data, which contains all the play-by-play games from the 2016-2017 season.

The following sample grep and cut command lists the games (files) that contain the string “Montréal Canadiens”:

grep -H -R --include="*.json" "Montréal Canadiens" /data/20162017 | cut -d: -f1 | sort -u
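To see why the cut step is needed, here is a minimal sketch that runs the same kind of pipeline on throwaway data. The temporary directory, the file names, and the JSON content are all made up for illustration; they are not part of the sample data set.

```shell
# Build a tiny throwaway data set (hypothetical files, not the real NHL data).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/playoffs"
printf '{"home": "Montréal Canadiens"}\n' > "$demo_dir/game1.json"
printf '{"home": "Ottawa Senators"}\n' > "$demo_dir/game2.json"
printf '{"away": "Montréal Canadiens"}\n' > "$demo_dir/playoffs/game3.json"

# grep -H prefixes every matching line with "<path>:", and -R descends into
# subdirectories. cut -d: -f1 keeps only the text before the first colon,
# which is the path of the file that matched.
matches=$(grep -H -R "Montréal Canadiens" "$demo_dir" | cut -d: -f1 | sort)
printf '%s\n' "$matches"
```

Only game1.json and playoffs/game3.json should be listed, since game2.json does not contain the string. Note that this approach assumes the paths themselves contain no colon.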


The above command outputs the path and file name. If you just want the file name, use the command below, which pipes the results through awk (a text processing and data extraction tool):

grep -rl "Montréal Canadiens" /data/20162017 | awk -F/ '{ print $NF }'
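If you run this kind of search often, the pipeline can be wrapped in a small shell function. The sketch below is one possible way to do it; the function name list_matching_names is hypothetical, and the -F flag (treat the pattern as a fixed string rather than a regular expression) is an addition that the command above does not use.

```shell
# Hypothetical helper: print the base names of all files under DIR
# that contain PATTERN.
# Usage: list_matching_names PATTERN DIR
list_matching_names() {
  pattern=$1
  dir=$2
  # -r: search subdirectories recursively
  # -l: print each matching file once, instead of every matching line
  # -F: treat the pattern as a literal string (assumption: no regex needed)
  # awk -F/ splits each path on "/" and prints the last field ($NF),
  # which is the file name.
  grep -rlF -- "$pattern" "$dir" | awk -F/ '{ print $NF }' | sort
}
```

With the sample data extracted to /data/20162017, it would be called as: list_matching_names "Montréal Canadiens" /data/20162017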


If you are running Windows, you can use findstr or PowerShell. The following findstr command outputs the files that contain the string “Montréal Canadiens”:

findstr /m /s /i /p /c:"Montréal Canadiens" "C:\data\20162017\*.*"

For PowerShell, use the following command:

Get-ChildItem -Recurse C:\data\20162017\*.* | Select-String -Pattern "Montréal Canadiens" | Select-Object -Unique Path

Or a shorter version using aliases:

dir -recurse C:\data\20162017\*.* | sls -pattern "Montréal Canadiens" | select -unique path

Note: if you want just the file names, not the full paths, replace Path with Filename.

Some explanation of the PowerShell commands:

  • Get-ChildItem -Recurse *.* returns all files in the current directory and all its subdirectories.
  • Select-String -Pattern “Montréal Canadiens” searches those files for the given pattern “Montréal Canadiens”.
  • Select-Object -Unique Path returns only the file path for each match; the -Unique parameter eliminates duplicates.

That’s it! This post only scratches the surface and does not go into detail about the different arguments for each tool (that’s for you to explore further), but it shows how the search can be done on each platform. If you are a data wrangler who manipulates data files, or an aspiring data engineer or data scientist, these tools and commands belong in your toolbox.

