Whole Tomato Software Forums
Whole Tomato Software Forums
Main Site | Profile | Register | Active Topics | Members | Search | FAQ
User name:
Password:
Save Password
Forgot your password?

 All Forums
 Visual Assist
 Feature Requests
 Include optimization.
 New Topic  Reply to Topic
 Printer Friendly
Author Previous Topic Topic Next Topic  

Rain Dog
Ketchup Master

88 Posts

Posted - Apr 27 2007 :  2:29:38 PM  Show Profile  Reply with Quote
I think it would be a handy feature to have some functionality that will tell you whether or not a particular source file already has a particular include through one of its other included files already including this particular file.

for example:

foo.h:
#include <string>
#include "bar.h"
void foo(int f);

foo.cpp:
#include "foo.h"
#include <string>
#include "bar.h"

void foo(int f){return f*2;}


In foo.cpp we would be told that <string> and "bar.h" are already included and are not needed here. An extension of this feature would also be able to tell us that "bar.h" and <string> are entirely unneeded in foo.cpp/h because there are no references to any symbols in these other files.

This feature is similar to ReSharper's "import optimization" feature that tells you when you have extra imports that are not needed.

feline
Whole Tomato Software

United Kingdom
18750 Posts

Posted - Apr 30 2007 :  07:44:36 AM  Show Profile  Reply with Quote
An interesting idea, but currently not something we could try to do. Consider the following code:

foo.h

#ifdef WIN32
    #include "bar.h"
    #include <string>
#elif UNIX
    #include "../bar.h"
    #include "homegrown_string.h>
#else
    #include <string.h>
#endif


suddenly there is no single answer to the question, since the code is targeting several OS's, and has different rules for each OS. If ReSharper only supports C# (a guess from memory) then they don't really have this problem.

zen is the art of being at one with the two'ness
Go to Top of Page

sl@sh
Tomato Guru

Switzerland
204 Posts

Posted - May 02 2007 :  06:03:20 AM  Show Profile  Reply with Quote
I certainly would have liked such a feature when I recently refactored some 300 source files to remove unneeded includes (well, from the headers, mostly).

However, be aware that doing this will require more than locally changing one or two files - everytime a header file changes, implementation files might need changing too. For instance if you remove an include from a header because, within the header, you don't need it, you might find that some of the implementation files including this header did rely on that include in the first place. Or even worse - other headers including the changed headers relied on it - making it much more difficult to spot what kind of problem you just created.

I know what I'm talking of: I didn't refactor the 300 files I mentioned above because I wanted to refactor 300 files, but because I wanted to simplify just one header which created problems for me - and then had to unravel all the strings of dependencies through virtually all of the project. (that's what you get when you touch legacy code... )
Go to Top of Page

feline
Whole Tomato Software

United Kingdom
18750 Posts

Posted - May 02 2007 :  09:58:44 AM  Show Profile  Reply with Quote
I have seen, and experienced similar problems myself, probably why people don't rush to do these sort of jobs.

zen is the art of being at one with the two'ness
Go to Top of Page

Rain Dog
Ketchup Master

88 Posts

Posted - May 07 2007 :  9:19:06 PM  Show Profile  Reply with Quote
The problem you describe is exactly the kind of reason why one would need tool based support. It becomes a huge task to do everything manually so for the most part it never gets done.

On the issue of preprocessor statements, VA already handles the preprocessor very well. It just becomes a matter of "for this particular set of preprocessor defined symbols, what is *not* used."

Also, something else that tool support can do is determine whether or not a particular header file requires an included file or if only a forward declared class/struct can do.
Go to Top of Page

feline
Whole Tomato Software

United Kingdom
18750 Posts

Posted - May 08 2007 :  1:19:18 PM  Show Profile  Reply with Quote
I agree that this is where you need a tool. Do you have a thought / idea for how this could work, allowing for the complexities of #ifdef statements?

So far the closest I have come is the concept of scanning the file for #include statements, and reporting for each statement if it is "directly" required by the current file.

This would help for the current file, but still leave you with all the other files that break when you change the current file. The other problem is to define if a #include is required. How far, how many files away, do you dig?

I am prepared to suggest this, to see what people think, if we can think up a solution that sounds workable. I just want to make sure we have thought about any major problems first

zen is the art of being at one with the two'ness
Go to Top of Page

sl@sh
Tomato Guru

Switzerland
204 Posts

Posted - May 09 2007 :  04:42:42 AM  Show Profile  Reply with Quote
Well, when I did the refactoring mentioned above, what I did was pretty much straight forward. It did involve regular compiling (on a per file basis), but that was only required because I needed to find out whether taking away a particular include would break a file or not. So instead of compiling, a method that can determine whether a particular include file is needed locally would be sufficient.

What I did was basically this
  • pick a .h, .cpp or .c file

  • remove all existing #include statements (i. e. comment out - it helped to see which includes were used, although technically that wasn't really neccessary)

  • if it's a .h file, find a corresponding .c or .cpp - or if that doesn't exist, create one - then remove all the includes from that .c or .cpp as well (except for the include to the file I was working on, obviously)

  • try compiling (either the .c or .cpp i was working on, or the .c or .cpp corresponding to the .h I was working on), then pick the very first symbol the compiler doesn't recognize

  • insert forward declaration (if possible) or else insert an include that contains the missing symbol's definition, then repeat the above (compiling) until all compiler errors are gone (in case of analyzing headers the header was considered done once there was no more compiler error referring to the header)

  • while performing the above step, each time I needed to add another include statement, recursively apply the algorithm to the newly included header



It was the last step (recursion) that was responsible for my need to go through virtually all of the project, and you can imagine how long that took.

Let's assume VA could provide a method "FindMissingDeclaration" that scans a file for the first symbol that isn't defined, either locally or through forward declarations or includes.

The next method needed would be "InsertMissingDeclaration", which for such a symbol would insert a forward declaration at the top of the file, if applicable, or else insert an #include statement to include the header with the missing declaration. Obviously, in order to catch recursion, this method would need to return a handle to the include file it inserted - if it did insert one.

The complete algorithm "FixDependencies" would look like this:
SelectedHeader.removeIncludes()
while (NULL != MissingSymbol = SelectedHeader.FindMissingDeclaration()) {
    if (NULL != InsertedHeader = SelectedHeader.InsertMissingDeclaration(MissingSymbol)) {
        InsertedHeader.FixDependencies();
    }
}


Unfortunately, this is only the first part. This algorithm only recurses through header files, and every source file (including other header files!) that #includes any of those headers that were modified could potentially be broken. Thus, VA would need to scan the whole solution for source files for #include statements to any of the affected headers. Actually, if VA finds other headers that are indirectly affected, this process would have to be repeated - yet another recursion!

I think the algorithm could be implemented without too many problems. However, testing it, and taking care of corner cases could be ... challenging.

What bugs me a little however is that basically, for this algorithm to work properly, the code needs to be syntactically correct. All of it! Probably it would be possible to cope with some kinds of erroneous code, but overall I would expect all kinds of weird results if applying such a wide-ranging algorithm to an erroneous code base.

Maybe it would probably be better to only go for a lower level solution, such as the methods FindMissingSymbol and InsertMissingDeclaration above. Another method could be useful: "FindDependentSources" could list all source files that include the file you're currently editing - so, if you're about to streamline a header file, you can easily see which other files might be broken by such a change.
Go to Top of Page

feline
Whole Tomato Software

United Kingdom
18750 Posts

Posted - May 09 2007 :  08:22:38 AM  Show Profile  Reply with Quote
sl@sh you have done a very good job of putting my feeling of unease about this into words, and explaining it.

At my old job we had some "terrible" legacy code that was deep in the system, and was responsible for a large number of our include file problems. We had one infamous header that was nothing by several hundred #include statements

Of course the official advice was to never use that include file, since it pulled in to much code and made the compiles to slow.

The alternative, once you got past "hello world" and needed to pull in high level library headers... spend several hours trying to find the correct mix, and ordering of the header files

Fixing the problem headers was a tempting thought, but the code base was so sprawling, and the problems so many that no one ever tried, and then the whole code base was marked as "legacy" anyway.

So while I understand the need for a tool to do this, I also have some idea of just how hard it might be to make it work well, especially for people targeting multiple platforms.

zen is the art of being at one with the two'ness
Go to Top of Page

Rain Dog
Ketchup Master

88 Posts

Posted - May 15 2007 :  6:44:19 PM  Show Profile  Reply with Quote
I dont think that the code would need to be syntactically correct, as those symbols would just be ignored and whose dependancies would be ignored until the code was correct. There are already a few problems where VA will think a file is included but the compiler thinks otherwise (mostly based on macro confusion it appears).

Since VA already handles the preprocessor well and can determine which symbols are and are not defined in the preprocessor, I don;t think it is required for VA to handle code segments that are blocked out due to #ifdef, etc unless the #ifdef condition was true in which case VA seems to not have problems with handling those sections anyways.

Feline, the problem you describe is exactly again why one needs tool support. To take it to the next level beyond merely not including erroneous unneeded files, a tool could suggest or determine when only a forward declaration would be necessary.
Go to Top of Page

Rain Dog
Ketchup Master

88 Posts

Posted - May 15 2007 :  6:51:36 PM  Show Profile  Reply with Quote
I would like the feature to work similar to how ReSharper or Refactor! operate, in that they parse the file and determine what symbols are used *in* the file and what is the smallest subset of included files (namespaces in .NET of course) that contain all the used symbols. It could be either a constant running thing or run on demand. I would prefer constant running (IE: Intelligent background execution) and then for files that do not need to be included, to exclude them. For i guess downlevel files from a header, i think it would make sense to explicitly place those "optimized" includes directly into the cpp/h files that explicitly require them, for example in the case where a.cpp and b.cpp include foo.h which includes a.h and b.h, a.cpp does not need b.h, so b.h is removed from foo.h and placed into b.cpp as b.cpp depends on foo.h to contain b.h.
Go to Top of Page

feline
Whole Tomato Software

United Kingdom
18750 Posts

Posted - May 16 2007 :  08:03:26 AM  Show Profile  Reply with Quote
quote:
Originally posted by Rain Dog

Since VA already handles the preprocessor well and can determine which symbols are and are not defined in the preprocessor, I don;t think it is required for VA to handle code segments that are blocked out due to #ifdef, etc unless the #ifdef condition was true in which case VA seems to not have problems with handling those sections anyways.


*um* since when? I must have missed that happening.

One of the core points of my concern here is that VA does NOT handle this.

As far as I am aware C# does not support #ifdef blocks, so there is no need to handle this, or the problems that come with it if you are targeting .NET code with such a tool.

zen is the art of being at one with the two'ness
Go to Top of Page

Rain Dog
Ketchup Master

88 Posts

Posted - May 18 2007 :  12:03:26 AM  Show Profile  Reply with Quote
VA handles it because it's obviously able to parse code that has preprocessor definitions such as macros, etc unless I was mistaken.

C# does support #ifdef and some others.
Go to Top of Page

feline
Whole Tomato Software

United Kingdom
18750 Posts

Posted - May 18 2007 :  08:14:25 AM  Show Profile  Reply with Quote
I think we are talking about totally different things. VA does understand macro's, although this is not always perfect.

Given the code:

#ifdef _DEBUG
    // do something
#else
    // do something else
#endif


VA will parse both blocks. The reason for this is that VA is designed to be active on the code you are working on, even if that code is not going to be compiled given your current settings.

But now consider the following piece of code:

#ifdef WIN32
// include windows headers
#else
// include UNIX headers
#endif


#ifdef WIN32
    // use windows API for file manipulation
#else
    // use UNIX API for file manipulation
#endif


I ended up writing and maintaining this code myself at my old job. The functions were for generating unique file names, and were simply different on the two OS's

The same source code was compiled on two different OS's, but the Visual Studio solution only knew about windows. Make files were used to compile a *different* group of files under UNIX, but as many library files as possible were used on both OS's.

If a tool is asked to analyse this code it is going to report that all UNIX headers should be deleted, since they don't even exist.


Now how about the simpler case:

#ifdef _DEBUG
#include <debugging.h>
#endif


what is a tool supposed to conclude about this header file, and the conditional code that uses it? Remember that a lot of windows header files are full of conditional code, so looking inside header files to work out if they are needed, or not, quickly becomes "difficult".

Do you see what I am thinking about now?

zen is the art of being at one with the two'ness
Go to Top of Page
  Previous Topic Topic Next Topic  
 New Topic  Reply to Topic
 Printer Friendly
Jump To:
© 2023 Whole Tomato Software, LLC Go To Top Of Page
Snitz Forums 2000