Search for a String in Multiple Files
In support of an earlier post, Fetching NHL Play by Play game data, I was recently asked how one could quickly search for a specific string across multiple JSON files, recursively. If you are running macOS or Linux, grep is your best friend! In a nutshell, grep prints the lines that contain a match for a pattern.
Want to try it out? You can download and extract the sample data, which contains all the play-by-play games from the 2016-2017 season. The following sample command combines grep and cut to list the games (files) that contain the string “Montréal Canadiens”:
grep -H -R "Montréal Canadiens" /data/20162017/*.json | cut -d: -f1
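The above command outputs the path and file name, once for every matching line, so the same file can be listed several times. If you just want the file name, listed once per file, use the command below (grep -l prints each matching file once, and awk, a text processing and data extraction tool, keeps only the last path component):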
grep -rl "Montréal Canadiens" /data/20162017 | awk -F/ '{ print $NF }'
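If you would rather experiment before pointing these commands at real data, the following is a minimal sketch that builds a tiny throwaway data set and runs the same kind of search against it. The file names, directory layout and JSON content are made up for illustration and are not the real play-by-play format. It also assumes a grep that supports --include (GNU grep and the macOS/BSD grep both do, but it is not part of POSIX).

```shell
#!/bin/sh
# Create a temporary directory with three made-up "game" files.
# Names and contents are hypothetical, for illustration only.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/20162017/playoffs"
printf '%s\n' '{"home": "Montréal Canadiens", "away": "Boston Bruins"}' \
    > "$demo_dir/20162017/game1.json"
printf '%s\n' '{"home": "Ottawa Senators", "away": "Boston Bruins"}' \
    > "$demo_dir/20162017/game2.json"
printf '%s\n' '{"home": "New York Rangers", "away": "Montréal Canadiens"}' \
    '{"note": "Montréal Canadiens win"}' \
    > "$demo_dir/20162017/playoffs/game3.json"

# -r recurses into subdirectories, -l prints each matching file once,
# --include limits the search to files whose name ends in .json.
matches=$(grep -rl --include='*.json' "Montréal Canadiens" "$demo_dir" | sort)
echo "$matches"

# Same search, keeping only the last path component (the file name).
names=$(grep -rl --include='*.json' "Montréal Canadiens" "$demo_dir" \
    | awk -F/ '{ print $NF }' | sort)
echo "$names"

# -c counts the matching lines in a single file instead of printing them.
count=$(grep -c "Montréal Canadiens" "$demo_dir/20162017/playoffs/game3.json")
echo "$count"

# Clean up the temporary data set.
rm -rf "$demo_dir"
```

Restricting the search with --include is a design choice worth considering when a data directory also holds non-JSON files, since a plain grep -r will search everything it finds.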
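If you are running Windows, you could use findstr or PowerShell. The following findstr command outputs the files in which the string “Montréal Canadiens” is present (/m prints only the file name, /s searches subdirectories, /i ignores case, /p skips files with non-printable characters and /c: treats the text as a literal search string):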
findstr /m /s /i /p /c:"Montréal Canadiens" "C:\data\20162017\*.*"
And for PowerShell, use the following command:
Get-ChildItem -Recurse C:\data\20162017\*.* | Select-String -Pattern "Montréal Canadiens" | Select-Object -Unique Path
Or a shorter version using aliases:
dir -recurse C:\data\20162017\*.* | sls -pattern "Montréal Canadiens" | select -unique path
Note: if you want just the file names, not the full paths, replace Path with Filename.
Some explanation about the PowerShell commands:
- Get-ChildItem -Recurse *.* returns all files in the current directory and all its subdirectories.
- Select-String -Pattern “Montréal Canadiens” searches those files for the given pattern “Montréal Canadiens”.
- Select-Object -Unique Path returns only the file path for each match; the -Unique parameter eliminates duplicates.
That’s it! This only scratches the surface and does not go into detail about the different arguments for each tool (that’s for you to explore further!), but it demonstrates how the search can be accomplished. If you are a data wrangler who manipulates data files, or an aspiring data engineer or data scientist, these tools and commands should be in your toolbox.
Enjoy!