Bug 1310 - Find in files has problems with large files
Summary: Find in files has problems with large files
Alias: None
Product: Xamarin Studio
Classification: Desktop
Component: General ()
Version: Trunk
Hardware: PC All
Importance: Normal enhancement
Target Milestone: Future Cycle (TBD)
Assignee: Mike Krüger
Depends on:
Reported: 2011-10-06 11:31 UTC by Mikayla Hutchinson [MSFT]
Modified: 2016-12-16 19:58 UTC

Is this bug a regression?: ---
Last known good build:

crash log (8.69 KB, text/plain)
2011-11-08 14:45 UTC, Michael Karayani

Description Mikayla Hutchinson [MSFT] 2011-10-06 11:31:24 UTC
Find in Files currently reads all files into strings. This is really problematic with large files. At the very least it should have a cap on the filesize that it loads in this way.

Alternatively, maybe it could convert the search string into binary using various encodings, and match this on a binary stream. This would also enable searching binary files better.

Regexes will still need to use strings, so they'll need the cap.
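
The binary-stream idea above can be sketched roughly as follows. This is a minimal Python illustration, not the Find in Files code: the function names, the list of encodings, and the chunk size are all assumptions made for the example. The search string is encoded once per candidate encoding, and the file is scanned in fixed-size chunks, with a small overlap carried between chunks so matches that straddle a boundary are still found.

```python
def encoded_needles(pattern, encodings=("utf-8", "utf-16-le", "utf-16-be")):
    """Encode the search string once per candidate encoding (hypothetical list)."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    needles = {}
    for enc in encodings:
        # Identical byte sequences are kept only once.
        needles.setdefault(pattern.encode(enc), enc)
    return needles


def find_in_stream(stream, pattern, chunk_size=64 * 1024):
    """Yield (byte_offset, encoding) for each match without loading the whole file."""
    needles = encoded_needles(pattern)
    overlap = max(len(n) for n in needles) - 1
    tail = b""   # bytes carried over from the previous chunk
    base = 0     # absolute offset of the first byte of tail + chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf = tail + chunk
        for needle, enc in needles.items():
            pos = buf.find(needle)
            while pos != -1:
                # Matches lying entirely inside the carried-over tail were
                # already reported on the previous iteration.
                if pos + len(needle) > len(tail):
                    yield base + pos, enc
                pos = buf.find(needle, pos + 1)
        keep = buf[-overlap:] if overlap else b""
        base += len(buf) - len(keep)
        tail = keep
```

Memory use is bounded by the chunk size plus the longest encoded pattern, regardless of file size. The stream is expected to be opened in binary mode, and offsets are byte offsets, so mapping a hit back to a line and column would need extra work.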
Comment 1 Michael Karayani 2011-11-08 14:45:15 UTC
Created attachment 849
crash log

Find in Files has problems not only with big files but also with small ones when there are a lot of them. For me it crashes when there are more than 1500-2000 files in the pad.
Comment 2 Jeffrey Stedfast 2011-11-09 15:30:34 UTC
The core RegexSearch() and Search() logic really needs to be moved into FileProvider so that they have the context to be able to search via a stream, rather than the contents of the entire file read into one big string.

Unfortunately, System.Text.RegularExpressions.Regex does not support searching a stream... sigh.

The Search() method can probably be hacked up to work on a stream (although we'd have to check the pattern string for \n's to figure out how many lines our buffer needs to contain to match against). Annoying, but doable.
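
For illustration, the line-buffer idea for a plain-text Search() might look something like the Python sketch below. It is not the MonoDevelop implementation; `search_lines` and its helper are hypothetical names. The number of \n's in the pattern decides how many lines the sliding window holds, and each line is searched exactly once as the first line of a window.

```python
from collections import deque


def _columns_in_first_line(window, pattern):
    """Return columns of matches that start inside the first line of the window."""
    text = "".join(window)
    limit = len(window[0])
    columns = []
    pos = text.find(pattern)
    while 0 <= pos < limit:
        columns.append(pos)
        pos = text.find(pattern, pos + 1)
    return columns


def search_lines(lines, pattern):
    """Yield (line_number, column) per match; lines are 1-based, columns 0-based.

    `lines` is any iterable of lines that keep their trailing newline,
    for example an open text file.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    size = pattern.count("\n") + 1   # lines needed to hold one full match
    window = deque()
    first = 1                        # line number of window[0]
    for line in lines:
        window.append(line)
        if len(window) == size:
            for col in _columns_in_first_line(window, pattern):
                yield first, col
            window.popleft()
            first += 1
    # Drain the remaining lines so matches near the end of the file are not lost.
    while window:
        for col in _columns_in_first_line(window, pattern):
            yield first, col
        window.popleft()
        first += 1
```

Only as many lines as the pattern can span are held in memory at once. A single enormous line would still be loaded whole, so this helps with long files rather than with files that contain no line breaks.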

I don't know how to get around the Regex limitation.

Imposing an arbitrary file size limitation seems lame. Perhaps the way to do it is to "sample" the file to see if it is even textual (try reading a few lines of text and converting to UTF-8?), if that fails, then just assume no matches... I don't think we actually want to match non-text files anyway, do we?
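
A rough sketch of that sampling check, in Python for illustration only: `looks_like_text` and the 4 KB sample size are assumptions, not anything in the codebase. It reads a small prefix, rejects it if it contains NUL bytes, and otherwise tries to decode it as UTF-8.

```python
import codecs


def looks_like_text(path, sample_size=4096):
    """Guess whether a file is textual by decoding a small prefix as UTF-8."""
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    if b"\0" in sample:
        # NUL bytes are a common sign of binary data (assumption: this also
        # rejects UTF-16 text, which would need a separate check).
        return False
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # The incremental decoder tolerates a multi-byte sequence that is
        # cut off at the end of the sample, but still rejects invalid bytes.
        decoder.decode(sample)
    except UnicodeDecodeError:
        return False
    return True
```

Files that fail the check would simply be treated as having no matches, as suggested above. The check only inspects the prefix, so a file that turns binary later on would still be searched.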

Real text files aren't *generally* likely to be so large that we won't be able to load them (unless they are perhaps massive log files or uuencoded blobs or something?).

That said, I saw some opportunities to reduce the number of copies of the loaded file content as a string and committed a patch to try and limit it to 1 string content buffer per file. That should help things, but might not be enough.
Comment 3 Jeffrey Stedfast 2011-11-09 15:50:04 UTC
I guess we could actually do the same multi-line hack for Regex (except what do we do if something like \n+ is in the pattern?).

If we didn't have to support multi-line searches, this would be so much easier... would anyone be opposed to us dropping multi-line matching support?
Comment 4 Greg Munn 2016-12-16 16:48:17 UTC
Is this still an issue?
Comment 5 Mikayla Hutchinson [MSFT] 2016-12-16 18:08:44 UTC
It appears this has been fixed: https://github.com/mono/monodevelop/blob/master/main/src/core/MonoDevelop.Ide/MonoDevelop.Ide.FindInFiles/FindReplace.cs

It only reads the full file into memory when using regexes or replace patterns.

A couple ways we could improve further:
* when replacing, only read full file into memory when it finds a match in that file.
* for regex search, use a streaming regex engine
* don't search files over a particular size (i.e. stop multi-gigabyte files from OOMing).

Probably not worth worrying about them right now though.
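
As a hedged illustration of the first and third suggestions, a replace could stream through the file first and only load it fully once a match is known to exist, skipping anything over a size cap. The Python below is a sketch under assumptions: the function names, the 256 MiB cap, and the UTF-8 default are made up for the example and are not taken from FindReplace.cs.

```python
import os


def stream_contains(path, needle, chunk_size=64 * 1024):
    """Return True if the byte string occurs in the file, reading chunk by chunk."""
    tail = b""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            buf = tail + chunk
            if needle in buf:
                return True
            # Carry enough bytes over to catch a match split across chunks.
            tail = buf[-(len(needle) - 1):] if len(needle) > 1 else b""
    return False


def replace_in_file(path, old, new, max_bytes=256 * 1024 * 1024, encoding="utf-8"):
    """Replace plain text in a file; returns 'skipped', 'unchanged' or 'replaced'."""
    if not old:
        raise ValueError("search text must not be empty")
    if os.path.getsize(path) > max_bytes:
        return "skipped"      # arbitrary cap to avoid running out of memory
    if not stream_contains(path, old.encode(encoding)):
        return "unchanged"    # no match, so the file is never fully loaded
    # newline="" keeps the file's original line endings intact.
    with open(path, "r", encoding=encoding, newline="") as f:
        text = f.read()
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(text.replace(old, new))
    return "replaced"
```

Files without a match cost only one chunk-sized buffer each, which is the common case in a large solution. Regex replace would still need the full string, as noted earlier in this thread.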
Comment 6 Greg Munn 2016-12-16 19:58:11 UTC
ok, lowering the priority and marking confirmed as something that can still be done.