Function Hijacking Mitigation

by Walter Bright
Digital Mars
http://www.digitalmars.com

software is more complex
more reliance on module interfaces
users combine modules from multiple sources
- users have no control over those modules
- module authors have no knowledge of the users
modules must be improvable
users must have programmatic notification of breaking changes

we'll cover function hijacking:
how adding reasonable declarations to a module
can wreak havoc on users in C++ and Java
then look at solutions in the D programming language

Global Function Hijacking

An application imports two modules:

X from the XXX Corporation
Y from the YYY Corporation

X and Y are unrelated to each other, and are used for completely different purposes.

module X;

void foo();
void foo(long);

module Y;

void bar();

The application program would look like:

import X;
import Y;

void abc()
{
  foo(1); // calls X.foo(long)
}

void def()
{
  bar();  // calls Y.bar()
}

so far, so good
application is tested and works
application is shipped
time goes by
application programmer moves on
application is put in maintenance mode

and then...

YYY Corporation, responding to customer requests, adds a type A and a function foo(A):

module Y;

void bar();
class A;
void foo(A);

application maintainer gets the latest version of Y
recompiles
no problems
but then...

YYY Corporation expands the functionality of foo(A), adding a function foo(int):

module Y;

void bar();
class A;
void foo(A);
void foo(int);

Suddenly something unexpected happens to our application:

import X;
import Y;

void abc()
{
  foo(1); // calls Y.foo(int)
          // not X.foo(long)
}

void def()
{
  bar();
}

The problem is, this is exactly how overloading is supposed to work: Y.foo(int) is a better match for the argument 1 than X.foo(long)!

Mitigation?

The module developer can mitigate by:

using namespaces
using 'unique' name prefixes
using obscure names

But that's no guarantee, and there's nothing the user can do about it.

Fixing the Language

The first stab:

by default, functions can only overload against other functions in the same module
if a name is found in more than one module, it must be fully qualified to be used
to overload functions from multiple modules together, an alias declaration is used to merge the overloads

application maintainer now gets a compilation error that foo is defined in both module X and module Y

works, but is a little restrictive
no way foo(A) would be confused with foo() or foo(long)
must be a better way

Overload Sets

An overload set is a group of functions with the same name declared in the same scope.

X.foo() and X.foo(long) form one overload set
Y.foo(A) and Y.foo(int) form another overload set

Our method for resolving a call to foo becomes:

Perform overload resolution independently on each overload set
If there is no match in any overload set, then error
If there is a match in exactly one overload set, then go with that
If there is a match in more than one overload set, then error

Most Importantly

Even if one overload set has a BETTER match than another, it is still an error. A call must match in only one overload set.

void abc()
{
 foo(1); // matches Y.foo(int)
         // matches X.foo(long)
         // error!
 A a;
 foo(a); // matches Y.foo(A)
         // no match in X
 foo();  // matches X.foo()
         // no match in Y
}
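The resolution rule and the calls in abc() can be sketched as a small D program. This is a toy model under stated assumptions, not the compiler's implementation: types are plain strings, the only implicit conversion modeled is int to long, and the names Func, OverloadSet, accepts and resolveSet are hypothetical. It only decides which overload set a call belongs to; ordinary overload resolution would then pick the best function inside that set.

```d
import std.stdio : writeln;

// Toy model (hypothetical, not the D compiler's code): a function is
// just its list of parameter types, written as strings.
struct Func
{
    string[] params;
}

// An overload set: same-named functions declared in the same module.
struct OverloadSet
{
    string name;
    Func[] funcs;
}

// Assumption: the only implicit conversion modeled is int -> long.
bool converts(string arg, string param)
{
    return arg == param || (arg == "int" && param == "long");
}

// True if function f can be called with argument types args.
bool accepts(Func f, string[] args)
{
    if (f.params.length != args.length)
        return false;
    foreach (i, p; f.params)
        if (!converts(args[i], p))
            return false;
    return true;
}

// Resolve each overload set independently; exactly one set may match.
// A better match in one set does NOT excuse a match in another.
string resolveSet(OverloadSet[] sets, string[] args)
{
    string found; // null until some set matches
    foreach (s; sets)
    {
        bool hit = false;
        foreach (f; s.funcs)
            if (accepts(f, args))
            {
                hit = true;
                break;
            }
        if (!hit)
            continue;
        if (found !is null)
            return "error: matches in more than one overload set";
        found = s.name;
    }
    return found is null ? "error: no match" : found;
}

// X.foo() and X.foo(long); Y.foo(A) and Y.foo(int).
OverloadSet[] fooSets()
{
    return [
        OverloadSet("X.foo", [Func(null), Func(["long"])]),
        OverloadSet("Y.foo", [Func(["A"]), Func(["int"])]),
    ];
}

void main()
{
    auto sets = fooSets();
    writeln(resolveSet(sets, ["int"])); // foo(1): error, matches both sets
    writeln(resolveSet(sets, ["A"]));   // foo(a): Y.foo
    writeln(resolveSet(sets, []));      // foo():  X.foo
}
```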

To overload foo across X and Y, merge the two overload sets with alias:

import X;
import Y;

alias X.foo foo;
alias Y.foo foo;

void abc()
{
 foo(1); // calls Y.foo(int)
         // not X.foo(long)
}
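Once the two sets are merged with alias, ordinary overload resolution runs over a single set, so the best match wins even though it comes from a different module. A toy D sketch of that ranking, with hypothetical names and only the one-argument overloads modeled (an exact match ranks above an implicit int to long conversion):

```d
import std.stdio : writeln;

// Toy model (hypothetical): a one-parameter function and its parameter type.
struct Func
{
    string name;
    string param;
}

// Match levels: 2 = exact, 1 = implicit int -> long conversion, 0 = none.
int matchLevel(string arg, string param)
{
    if (arg == param)
        return 2;
    if (arg == "int" && param == "long")
        return 1;
    return 0;
}

// Ordinary overload resolution over ONE merged set: the best level wins;
// two functions tied at the best level are ambiguous.
string resolveMerged(Func[] merged, string arg)
{
    string best;
    int bestLevel = 0;
    bool tie = false;
    foreach (f; merged)
    {
        const level = matchLevel(arg, f.param);
        if (level > bestLevel)
        {
            best = f.name;
            bestLevel = level;
            tie = false;
        }
        else if (level > 0 && level == bestLevel)
            tie = true;
    }
    if (bestLevel == 0)
        return "error: no match";
    return tie ? "error: ambiguous" : best;
}

// After `alias X.foo foo; alias Y.foo foo;` the one-argument foos form one set.
Func[] mergedFoo()
{
    return [Func("X.foo(long)", "long"), Func("Y.foo(A)", "A"), Func("Y.foo(int)", "int")];
}

void main()
{
    writeln(resolveMerged(mergedFoo(), "int"));  // Y.foo(int): exact beats int -> long
    writeln(resolveMerged(mergedFoo(), "long")); // X.foo(long)
}
```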

Hijacking can still happen here, but only because the user deliberately merged the overload sets.

Derived Class Member Function Hijacking

Imagine a class A coming from AAA Corporation:

module M;

class A { }

Application code derives from A and adds a virtual member function foo:

import M;

class B : A
{
 void foo(long);
}

void abc(B b)
{
 b.foo(1); //calls B.foo(long)
}

AAA Corporation (who cannot know about B) extends A's functionality by adding foo(int):

module M;

class A
{
  void foo(int);
}

Assume Java-style overloading rules: base class member functions overload right alongside derived class functions.

import M;

class B : A
{
  void foo(long);
}

void abc(B b)
{
 b.foo(1); //calls A.foo(int)
}

A.foo(int) has hijacked the call intended for B.foo(long).

Mitigation

In C++, functions in a derived class hide all the functions of the same name in a base class,

even if a function in the base class would be a better match.

Overloading across base and derived classes can still be requested explicitly with a using declaration.

D follows the same method, with an alias declaration in place of using.
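A minimal D sketch of that explicit merge, with return values made up for illustration: B's own foo(long) would otherwise hide A's foo, and the alias brings A.foo(int) back into B's overload set. `alias foo = A.foo;` is the newer spelling of the `alias A.foo foo;` form used elsewhere in this talk.

```d
import std.stdio : writeln;

class A
{
    string foo(int x) { return "A.foo(int)"; }
}

class B : A
{
    // Following the hiding rule above, B's own foo would hide A's foo.
    // This alias deliberately merges A.foo into B's overload set,
    // the D counterpart of a C++ using declaration.
    alias foo = A.foo;

    string foo(long x) { return "B.foo(long)"; }
}

void main()
{
    auto b = new B;
    writeln(b.foo(1));  // A.foo(int): exact match, visible via the alias
    writeln(b.foo(1L)); // B.foo(long)
}
```

Because the merge is spelled out in B, any hijacking that results is the user's own choice, just as with the module-level alias.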

Base Class Member Function Hijacking

Hijacking can go the other way, too.

A derived class can hijack a base class member function!

module M;

class A
{
    void def() { }
}

Application code derives from A and adds a virtual member function foo:

import M;

class B : A
{
  void foo();
}

void abc(B b)
{
  b.def(); // calls A.def()
}

AAA Corporation once again knows nothing about B.

AAA adds a function foo()

and uses it to implement some new functionality of A

module M;

class A
{
  void foo();

  void def()
  {
    foo(); // expects A.foo()
           // but gets B.foo()
  }
}

B.foo() has hijacked A.foo()!

Shouldn't A.foo() be non-virtual?

What if A expects A.foo() to be overridden?
But B.foo() is not designed to override A.foo()!

There's no way to safely add functionality to A.

Solution: Qualify with override

To override a function in a base class, the derived class function must use the override storage class.

It is an error if a function:

overrides without using the override storage class
uses override without overriding anything

class C
{
  void foo();
  void bar();
}
class D : C
{
  override void foo(); // ok
  void bar();          // error
                       // overrides C.bar()
  override void abc(); // error
                       // no C.abc()
}
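A runnable D version of the legal part of this example, with return values added for illustration; the two rejected declarations are kept only as comments, since they would stop the program from compiling. A call through a C reference reaches D.foo because D explicitly says override.

```d
import std.stdio : writeln;

class C
{
    string foo() { return "C.foo"; }
    string bar() { return "C.bar"; }
}

class D : C
{
    override string foo() { return "D.foo"; } // ok: intentional override

    // Both of these are rejected by the rule above, so they stay commented out:
    // string bar() { return "D.bar"; }          // error: overrides C.bar() without override
    // override string abc() { return "D.abc"; } // error: no C.abc() to override
}

void main()
{
    C c = new D;
    writeln(c.foo()); // D.foo: virtual dispatch to the marked override
    writeln(c.bar()); // C.bar: D does not override bar
}
```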

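eliminates this form of hijacking: once A adds foo(), B.foo() no longer compiles until B's maintainer either marks it override or renames it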

Derived Class Member Function Hijacking #2

module A;

class A
{
  void def()
  {
    foo(1);
  }

  void foo(long);
}

foo(long) is a virtual function that provides a specific functionality.

Our user overrides foo(long):

import A;

class B : A
{
  override void foo(long);
}

void abc(B b)
{
  b.def(); // eventually calls
           // B.foo(long)
}

The call to foo(1) inside A.def() correctly winds up calling B.foo(long).

A's designer decides to optimize things, and adds an overload for foo:

module A;

class A
{
  void def()
  {
    foo(1);
  }

  void foo(long);
  void foo(int);
}

Again, our user class:

import A;

class B : A
{
  override void foo(long);
}

void abc(B b)
{
  b.def(); //eventually calls
           //A.foo(int)
}

B is no longer overriding A's foo!

It's been hijacked by the base class.

B needs to add another function:

class B : A
{
    override void foo(long);
    override void foo(int);
}
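A runnable D sketch of this fix, assuming (hypothetically) that B wants both overloads to behave like its foo(long), so its foo(int) simply forwards. The call foo(1) in A.def() resolves to foo(int) and is then dispatched virtually to B.

```d
import std.stdio : writeln;

class A
{
    // foo(1) picks foo(int), the exact match, then dispatches virtually.
    string def() { return foo(1); }

    string foo(long x) { return "A.foo(long)"; }
    string foo(int x) { return "A.foo(int)"; }
}

class B : A
{
    override string foo(long x) { return "B.foo(long)"; }

    // Hypothetical choice: route foo(int) to B's foo(long) behavior.
    override string foo(int x) { return foo(cast(long) x); }
}

void main()
{
    writeln((new A).def()); // A.foo(int)
    writeln((new B).def()); // B.foo(long), reached through B.foo(int)
}
```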

But there's no indication this must be done.

A's vtbl[] looks like:

A.vtbl[0] = &A.foo(long);
A.vtbl[1] = &A.foo(int);

B's vtbl[] looks like:

B.vtbl[0] = &B.foo(long);
B.vtbl[1] = &A.foo(int);

The call foo(1) in A.def() resolves to foo(int), which is actually a call through vtbl[1].

We'd really like A.foo(int) to be inaccessible from a B object.

Solution: Fix the vtbl[]

The solution is to rewrite B's vtbl[] as:

B.vtbl[0] = &B.foo(long);
B.vtbl[1] = &error;

calling vtbl[1] on a B object now calls error() instead

which throws an exception

the problem isn't caught at compile time, but at least it's caught
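The vtbl[] patch can be modeled in D with arrays of function pointers. This is a toy model, not how the compiler actually lays out or fills vtables: the slot numbers follow the slides, and the function names (A_foo_long, hiddenError, and so on) are hypothetical stand-ins for the real entries.

```d
import std.stdio : writeln;

// Toy model (hypothetical names): a vtbl[] as an array of function pointers,
// slot [0] = foo(long), slot [1] = foo(int), as on the slides.
alias Slot = string function();

string A_foo_long() { return "A.foo(long)"; }
string A_foo_int() { return "A.foo(int)"; }
string B_foo_long() { return "B.foo(long)"; }

// Stub for a slot whose function B has hidden; it can only throw.
string hiddenError()
{
    throw new Exception("A.foo(int) is hidden by B");
}

Slot[] aVtbl() { return [&A_foo_long, &A_foo_int]; }
Slot[] bVtblNaive() { return [&B_foo_long, &A_foo_int]; }  // slot 1 inherited
Slot[] bVtblFixed() { return [&B_foo_long, &hiddenError]; } // slot 1 patched

void main()
{
    // A.def() calls foo(1), i.e. slot 1, whatever the object's class.
    writeln(aVtbl()[1]());      // A.foo(int)
    writeln(bVtblNaive()[1]()); // A.foo(int): B's override silently bypassed
    try
    {
        bVtblFixed()[1]();
    }
    catch (Exception e)
    {
        writeln("caught: ", e.msg); // caught: A.foo(int) is hidden by B
    }
}
```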

Conclusion

Function hijacking is a nasty problem
Users have no reliable defense in C++ or Java
Need to change the language to fix

Discussion

Who works on very large projects?
Has anyone been caught by hijacking?
Who uses source code analyzers?

- which ones?
- do they find real bugs?
- do they detect hijacking?